“The Meat (No More Potatoes)”

Jad C -

I wonder if each blog post reads like an episode of a TV show. If that’s the case, then Blog Post 3 was certainly a filler episode.

In contrast, this is the episode where the show moves past the potatoes and gets into the meat of the issue.

First, let’s contextualize my data collection.

Part One: 

I’m scraping online articles from major U.S. media outlets and performing sentiment analysis on their titles to determine whether U.S. major media overall leans towards Israel, stays neutral, or leans towards Palestine.

Part Two Part One:

I’m exporting a dataset from Kaggle to perform sentiment analysis on tweets about the Israel-Palestine war. The data only covers October to December of 2023, so it is supposed to represent earlier coverage.

Part Two Part Two:

Same deal here. I’m exporting a dataset from Kaggle in CSV format to perform sentiment analysis on Reddit comments about the Israel-Palestine war. The data runs up to the present day, so it is supposed to represent modern coverage.

Recently, I finished Part Two Part One. Originally, I imported the CSV into Google Sheets and installed GPT Workspace, a Chrome extension that should be able to process the data and perform sentiment analysis. However, it did not work as planned. There’s also a credit limit, so before I got it to output a single data point, it had already cut me off for the month.

So, I wrote a script in Cursor that uses VADER sentiment analysis, and somehow, it actually worked. Now, I have 5,000 randomly selected tweets with sentiment scores from +1 (pro-Israel) to -1 (pro-Palestine), as well as an average.

My average is looking significantly different from the neutral range of -0.05 to 0.05 that I’m treating as my null hypothesis, so I might have some legitimate findings here. However, my findings actually differ from my hypothesis: I predicted that major media would be significantly pro-Israel, while these results seem to lean pro-Palestine.

Still, there could be some error here. My data is sourced more heavily from left-leaning media publications, so that could easily be the explanation. I’ll continue to run more tests and see if I get any different results.
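For anyone curious what a VADER pass like this could look like, here is a minimal sketch. It is not the exact script from this project. It assumes the Kaggle export is a CSV with a column named "text" (a hypothetical column name) and that the vaderSentiment package is installed. Keep in mind that VADER's compound score only measures positive versus negative tone on a scale from -1 to +1; reading that as pro-Israel versus pro-Palestine is an interpretation layered on top, not something the library reports. The 95% interval uses a simple normal approximation and is only a rough check on whether the average sits outside the -0.05 to 0.05 neutral band.

```python
import csv
import math
import random
from statistics import mean, stdev


def load_texts(path, column="text"):
    """Read one text column from a CSV file (the column name is an assumption)."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f) if row.get(column)]


def make_vader_scorer():
    """Return a function that maps text to VADER's compound score in [-1, 1]."""
    # Imported lazily so the rest of the sketch works without the package.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def label(compound, band=0.05):
    """Bucket a compound score using the conventional +/-0.05 neutral band."""
    if compound >= band:
        return "positive"
    if compound <= -band:
        return "negative"
    return "neutral"


def summarize(texts, scorer, n=5000, seed=42):
    """Score a random sample of texts and report the mean with a rough 95% interval."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(texts, min(n, len(texts)))
    scores = [scorer(t) for t in sample]

    avg = mean(scores)
    # Normal-approximation half-width; zero when there is only one score.
    half = 1.96 * stdev(scores) / math.sqrt(len(scores)) if len(scores) > 1 else 0.0

    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for s in scores:
        counts[label(s)] += 1

    return {
        "n": len(scores),
        "mean": avg,
        "ci95": (avg - half, avg + half),
        "counts": counts,
    }
```

With a file such as tweets.csv (a placeholder name), calling summarize(load_texts("tweets.csv"), make_vader_scorer()) would return the sample size, the average compound score, the rough interval, and how many tweets fall into each bucket. If the whole interval lies below -0.05 or above 0.05, the average is at least not sitting in the neutral band.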

Next week is Part One and Part Two Part Two.

Comments:

    archisha_r
    Hey Jad, this is a pretty interesting procedure! Why did you use a VADER sentiment analysis over other models and how accurate was it when it captured the nuances in political language or sarcasm?
    krish_s
    Great blog post, Jad! Because your data already has a lot of sourcing from left-leaning outlets, do you plan on making it a priority to add more sourcing from right-leaning outlets to kind of balance it out? If not, how else do you plan on reducing this error because it seems like it could have a big impact on your results?
