Week 2 — Maybe I should have taken stats
Amelia S
Hi guys! Welcome back to yet another week of my blog!!
Now that I have finished cleaning my spreadsheet of diabolical proportions, I can finally move on to coding. I (in theory) know how to code, and R is a pretty decent language: not overly finicky, and reasonably good at telling you what is causing an error. Even so, on Monday I spent over an hour trying to print the results of my analyses, only to realize that the error came from passing my arguments in the wrong order. But we stay silly.
When my code is working, I’m mainly running two different analyses on my data: neural networks and random forests. Neural networks are loosely modeled on the human brain and use layers of nodes (neurons) to connect the inputs (in this case influenza titers) to the outputs (in this case birth cohorts). Each data point gets a score (kind of) for each possible output and is classified as the category with the highest score. I then print out the stats to check how accurate the model was, and am disappointed because the computer thinks everyone was born between 1961 and 1970, except for 3 people who actually were born between 1961 and 1970.
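For anyone curious what "highest score wins" looks like in practice, here is a minimal sketch in Python (not the R code from this project). The cohort labels, the scores, and the "actual" answers are all made up for illustration, and the softmax step is an assumption about how the scores get turned into probabilities. It also shows why a model that puts everyone in one cohort scores badly.

```python
import math

# Hypothetical birth-cohort labels, for illustration only.
COHORTS = ["1951-1960", "1961-1970", "1971-1980"]


def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    biggest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the cohort whose score (and so probability) is highest."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return COHORTS[best]


def accuracy(predicted, actual):
    """Fraction of people whose predicted cohort matches their real one."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# Made-up scores for one person: the middle cohort has the highest score.
example_scores = [0.2, 2.5, 1.1]
prediction = classify(example_scores)
print(prediction)

# Made-up ground truth: a model that says "1961-1970" for everyone
# is only right for the people who really are in that cohort.
actual = ["1951-1960", "1961-1970", "1971-1980", "1961-1970"]
lazy_predictions = ["1961-1970"] * len(actual)
print(accuracy(lazy_predictions, actual))
```

A real network learns the scores from the titer data; the sketch only covers the last step, where scores become a single predicted cohort.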

Random forests are another machine learning method, and they operate using a forest (that is, a large number) of decision trees. Each tree is trained on a random subset of the input data and independently arrives at its own result; combining those results tends to make the forest as a whole more accurate than a single tree. They don’t print out pretty diagrams for me like neural nets do, but I can show you what they actually spit out when showing me if they worked (this one kind of did).
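The "many trees, each on a random subset, then combine" idea can be sketched in a few lines of Python (again, not the R code from this project). To keep it short, each "tree" here is a single yes/no split on one made-up titer value, which is a big simplification of a real decision tree. The data, the "older"/"younger" labels, and all function names are hypothetical.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-split 'tree': pick the threshold that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [label for x, label in rows if x <= threshold]
        right = [label for x, label in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(
            1 for x, label in rows
            if (left_label if x <= threshold else right_label) != label
        )
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def train_forest(rows, n_trees, rng):
    """Train each tree on its own bootstrap sample (rows drawn with replacement)."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(train_stump(sample))
    return forest


def predict_forest(forest, x):
    """Every tree votes; the most common answer wins."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Made-up data: (titer against one hypothetical recent strain, cohort label).
rows = [(10, "older"), (20, "older"), (40, "older"),
        (80, "younger"), (160, "younger"), (320, "younger")]

forest = train_forest(rows, n_trees=25, rng=random.Random(0))
print(predict_forest(forest, 5), predict_forest(forest, 640))
```

Because each tree sees a slightly different sample, a tree that gets an unlucky sample can be outvoted by the rest, which is the whole point of using a forest.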


Though it’s annoying that the computer programs are really bad with old people (see Fig. 2), it makes sense. Even with just a human brain, you can see that people have higher concentrations of antibodies for the flu strains that circulated around the time they were born, which is helpful for computer analysis. However, because the earliest strain of flu I have data for is from 1968, it’s much harder to classify people born well before that.
I spent a lot of this week on StackExchange trying to understand what was breaking in my code, but now that the models mostly work, I can focus on fine-tuning them, and hopefully by this time next week not everyone will be born in the 1960s!
Otherwise, I’ve started reading FLU: The Story of the Great Influenza Pandemic of 1918 and the Search for the Virus That Caused It by Gina Kolata. It’s been a little gory thus far, but super interesting! I’ll talk more about Spanish Flu next week (probably) so look forward to that!
Thanks for reading, see you for week 3!!
