Week 7 - Data Analysis

Arnab M -

Hi guys, it’s Arnab, and this is my seventh weekly update on my senior research project: Exploring the Genomic Effects of PNPLA7 Mutations on Cerebral Palsy through RNA Sequencing.

This week I spent almost all of my time wrapping up the bioinformatics coding portion of my research project by taking on the daunting task of starting Overrepresentation Analysis and Gene Set Enrichment Analysis (GSEA). First, I worked through the Overrepresentation Analysis vignette to learn how to properly code and carry out the analysis. Most of the code was straightforward, and I had my results table (a data frame) in no time. However, when I saw the results, “disappointing” would be an understatement. Of the 16 genes I identified as significant with DESeq2, only 12 had enough annotation data in the Gene Ontology database or any other gene database to work with. This already placed heavy constraints on our possible results, and the best I got out of Overrepresentation Analysis was a single pathway related to metabolic processes, containing only 3 genes. It was interesting, to say the least, but the postdoc reassured me that this is completely normal for RNA-Seq projects: my sample size was not nearly as large as in some of their previous projects, and even those often failed to yield results.

I then moved on to the code for Gene Set Enrichment Analysis, and as I processed the data I found some insightful and interesting results. I’m going to leave you on a cliffhanger until next week to find out exactly what I found, but I assure you it is well worth the wait.
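For readers wondering what Overrepresentation Analysis actually computes: it generally asks whether a pathway or GO term contains more of your significant genes than chance alone would predict, and a common way to score this is a one-sided hypergeometric test. Below is a minimal pure-Python sketch of that test. Only the 12 annotated genes and the 3-gene overlap echo this post; the background of 20,000 genes and the 40-gene term size are hypothetical numbers chosen for illustration, not values from my analysis, and this is not necessarily the exact procedure the vignette's package uses.

```python
from math import comb


def ora_pvalue(universe_size: int, pathway_size: int, n_selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe_size: N, all annotated background genes considered
    pathway_size:  K, background genes annotated to the pathway/GO term
    n_selected:    n, significant genes that have annotations
    overlap:       k, significant genes that fall inside the term
    """
    denom = comb(universe_size, n_selected)
    upper = min(pathway_size, n_selected)
    # Sum the probabilities of seeing k, k+1, ..., up to the max possible overlap
    total = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, n_selected - i)
        for i in range(overlap, upper + 1)
    )
    return total / denom  # exact big-int arithmetic, converted to float at the end


# Hypothetical numbers: 20,000 background genes, a 40-gene term,
# 12 annotated significant genes, 3 of which fall in the term.
p = ora_pvalue(universe_size=20_000, pathway_size=40, n_selected=12, overlap=3)
print(f"p = {p:.2e}")
```

In practice, ORA tools run this test for every term and then correct for multiple testing (for example with Benjamini-Hochberg), which is one reason a short list of 12 genes rarely produces many significant terms.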

Until next week, see you soon!


    Arnab, sounds like you are definitely overcoming some hurdles with your research. Can you explain the Gene Ontology database?
    Of course! Gene Ontology (GO) is basically a massive database that annotates genes with the functions, processes, and cellular locations they are associated with. GO organizes its terms into 3 categories (ontologies): Molecular Function, Biological Process, and Cellular Component. GO is used heavily in bioinformatics pipelines and helps researchers draw broad inferences from their data thanks to how specific its terms are and the huge number of genes it covers. Its terms are organized hierarchically, with more specific terms sitting under more general ones, but a few lines of code can account for this, and overall it is an extremely useful tool in the world of data analysis. Hope this answers your question!
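    To make the “few lines of code” concrete: GO tools typically follow the “true path rule”, meaning a gene annotated to a specific term also counts toward every more general parent term above it. Here is a minimal Python sketch of that propagation on a tiny, made-up hierarchy. The term links and gene names (GENE_A, GENE_B, GENE_C) are hypothetical and simplified, not real GO data; real tools read the full ontology file instead.

```python
from collections import defaultdict

# Toy, simplified hierarchy: child term -> parent terms (hypothetical, not real GO data)
PARENTS = {
    "biological_process": [],
    "metabolic process": ["biological_process"],
    "lipid metabolic process": ["metabolic process"],
    "phospholipid metabolic process": ["lipid metabolic process"],
}

# Hypothetical direct annotations: gene -> its most specific terms
DIRECT = {
    "GENE_A": ["phospholipid metabolic process"],
    "GENE_B": ["lipid metabolic process"],
    "GENE_C": ["metabolic process"],
}


def ancestors(term, parents):
    """Return the term itself plus all of its ancestors by walking up the DAG."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(direct, parents):
    """Map each term to every gene annotated to it or to any of its descendants."""
    genes_per_term = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for t in ancestors(term, parents):
                genes_per_term[t].add(gene)
    return dict(genes_per_term)


for term, genes in sorted(propagate(DIRECT, PARENTS).items()):
    print(term, sorted(genes))
```

    This propagation is why general terms like “metabolic process” collect many genes while very specific terms collect only a few.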
