Week 2 — Maybe I should have taken stats

Amelia S -

Hi guys! Welcome back to yet another week of my blog!!

Now that I have finished cleaning my spreadsheet of diabolical proportions, I can finally move on to coding. I (in theory) know how to code, and R is a pretty decent language – not overly finicky and semi-decent at letting you know what is causing errors. However, on Monday I spent over an hour trying to print the results of my analyses, only to realize the error was because my arguments were backwards. But we stay silly.

When my code is working, however, I’m mainly using two different analyses on my data – neural networks and random forests. Neural networks are supposed to emulate the human brain and use layers of nodes (neurons) to connect the inputs (in this case influenza titers) to the outputs (in this case birth cohorts). Each data point is given a score (kind of) for each of the outputs and is classified as the output category with the highest score. I then print out the stats to check how accurate it was, and am disappointed because the computer thinks everyone was born between 1961 and 1970, except 3 people who actually were born between 1961 and 1970.
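To make the "highest score wins" step concrete, here is a minimal sketch in Python (not the R code from this project). The cohort labels, the scores, and the function names are all made up for illustration; a real neural net would compute the scores from the titers.

```python
# Hypothetical cohort labels and made-up scores, for illustration only.
COHORTS = ["1951-1960", "1961-1970", "1971-1980"]


def classify(scores):
    """Return the cohort whose score is highest."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return COHORTS[best_index]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# One row of scores per person, one score per cohort.
score_rows = [
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.5, 0.4],
    [0.1, 0.8, 0.1],
]
true_cohorts = ["1951-1960", "1961-1970", "1971-1980", "1961-1970"]

predictions = [classify(row) for row in score_rows]
print(predictions)
print(accuracy(predictions, true_cohorts))
```

In this toy data every row scores highest for 1961-1970, so the accuracy only reflects how many people really were in that cohort, which is the same failure described above.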

Neural Network (example)
Fig. 1; example plot of a neural net I was using. This one didn’t do so well (~20% accuracy) but it looks pretty!

Random forests are another method of machine learning, and they operate using a forest (that is, a large number) of decision trees. Each tree is trained on a random subset of the input data and arrives at its own result independently. The forest then combines those results (for classification, by majority vote), which usually makes it more accurate as a whole than any single tree. They don’t print out pretty diagrams for me like neural nets do, but I can show you what they actually spit out when showing me if they worked (this one kind of did).
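The two ingredients described above (random subsets of the data, then combining the trees) can be sketched in Python. This is a toy under stated assumptions, not the R code from this project: each "tree" is a hypothetical one-split stump, the (titer, label) rows are invented, and all names are made up for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows):
    """A one-split stand-in for a real decision tree.

    Each row is (titer, label). The split point is the mean titer of the
    sample, and each side predicts its most common label.
    """
    threshold = sum(t for t, _ in rows) / len(rows)
    low = Counter(label for t, label in rows if t <= threshold)
    high = Counter(label for t, label in rows if t > threshold)
    overall = Counter(label for _, label in rows).most_common(1)[0][0]
    low_label = low.most_common(1)[0][0] if low else overall
    high_label = high.most_common(1)[0][0] if high else overall
    return lambda titer: low_label if titer <= threshold else high_label


def forest_predict(stumps, titer):
    """Each stump votes; the most common vote is the forest's answer."""
    votes = Counter(stump(titer) for stump in stumps)
    return votes.most_common(1)[0][0]


# Invented toy data: (titer, label).
data = [(10, "older"), (20, "older"), (30, "older"),
        (160, "younger"), (320, "younger"), (640, "younger")]

rng = random.Random(0)  # fixed seed so reruns give the same forest
forest = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(forest_predict(forest, 15), forest_predict(forest, 500))
```

A real random forest grows full decision trees and also picks a random subset of the input variables at each split; the sketch leaves both out to keep the sampling and voting visible.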

confusion matrix generated from a random forest
Fig. 2; confusion matrix and stats for a random forest I ran (the goal is for everything to be along the diagonal)
graph of error vs. trees, with colorful lines
Fig. 3; what the random forest spits out when I tell it to plot itself. I have no clue what those lines mean, but pretty colors! (this is the same analysis as the confusion matrix above)
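For anyone wondering why the goal in Fig. 2 is to have everything on the diagonal, here is a minimal Python sketch of how a confusion matrix is counted. The labels and predictions are invented, and putting true cohorts in rows and predicted cohorts in columns is just a choice made for this sketch.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are true cohorts, columns are predicted cohorts."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Invented labels and predictions, for illustration only.
labels = ["1951-1960", "1961-1970", "1971-1980"]
actual = ["1951-1960", "1951-1960", "1961-1970",
          "1961-1970", "1971-1980", "1971-1980"]
predicted = ["1961-1970", "1951-1960", "1961-1970",
             "1961-1970", "1971-1980", "1961-1970"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)

# Diagonal cells are the cases where predicted == actual.
diagonal = sum(matrix[i][i] for i in range(len(labels)))
overall_accuracy = diagonal / len(actual)
print(overall_accuracy)
```

Every count off the diagonal is a person placed in the wrong cohort, so a perfect model would leave all off-diagonal cells at zero.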

Though it’s annoying that the computer programs are really bad with old people (see Fig. 2), it makes sense. Even with just a human brain, you can see that people have higher concentrations of antibodies for the flu strains that were circulating around when they were born, which is helpful for computer analysis. However, because the earliest strain of flu I have data for is from 1968, it’s much harder to classify people born a while before that.

I spent a lot of this week on StackExchange trying to understand what was breaking in my code, but now that the models mostly work, I can focus on fine-tuning them, and hopefully by this time next week not everyone will be born in the 1960s!

Otherwise, I’ve started reading FLU: The Story of the Great Influenza Pandemic of 1918 and the Search for the Virus That Caused It by Gina Kolata. It’s been a little gory thus far, but super interesting! I’ll talk more about Spanish Flu next week (probably) so look forward to that!

Thanks for reading, see you for week 3!!


Comments:


    ryan_t
    The diagrams look so cool! Hopefully you'll be able to fine tune them well. What's your favorite part of your internship thus far?
    leah_b
    Hi Amelia! I’ve taken stats and this still looks intimidating to me lol. I’m glad you are getting through it though. Good luck and keep pushing through!!
    Nathaniel Green
    It's really cool that you're learning R and that you're analyzing data now! Relevant (?) XKCD: https://xkcd.com/2173/
    elliot_d
    All of these visualizations of data are very scary and also beautiful! You should ramble on these subjects sometime. It sounds super interesting!
    lily_h
    Amelia, this is really impressive! Where did you learn about random forests and neural networks?
      amelia_s
      Thanks Lily! My internship advisor recommended them to me, though I learned the actual methods for them from the internet mostly!
