Week 6: Analyzing AI Exposure Through Polynomial Regression
Welcome back to another week of my senior project! After exploring linear regression in the previous post, this week I used polynomial regression to analyze the relationship between various demographic factors and AI exposure, to see whether a more flexible model reveals non-linear patterns that linear regression didn't capture.
Note: I applied polynomial regression only to quantitative variables (such as income and age), not to categorical variables (such as sex, race, and level of education).
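One mechanical reason this restriction makes sense: in a regression, a categorical variable enters as 0/1 indicator (dummy) columns, and squaring 0 or 1 gives back the same value. A "squared" dummy is therefore an exact copy of the original column, so it adds no curvature and only makes the model perfectly collinear. The short Python sketch below uses made-up values to show the difference.

```python
import numpy as np

# Hypothetical 0/1 dummy column (made-up values, e.g. an indicator for one category).
dummy = np.array([0, 1, 1, 0, 1])
# Squaring a 0/1 indicator leaves it unchanged, so a "quadratic" term
# would just duplicate the original column (perfect collinearity).
dummy_squared = dummy ** 2
print(np.array_equal(dummy, dummy_squared))  # True

# A quantitative variable such as age (made-up values) does change when squared,
# so a quadratic term can capture curvature.
age = np.array([22, 35, 47, 58, 63])
print(np.array_equal(age, age ** 2))  # False
```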
In this post, I’ll walk you through how I approached polynomial regression and how I selected the right degree for the model, and then I’ll dive into the results.
Polynomial Regression: How I Chose the Right Degree
As I explained in my Week 4 post, the most difficult part of polynomial regression is selecting the optimal degree. Here’s how I approached it:
1. Testing Different Degrees: I tried polynomial regressions of varying degrees (1, 2, 3, and 4) for each variable to test how well they fit the data. A higher degree allows for more flexibility in the model, but it also increases the risk of overfitting, where the model fits the noise in the data rather than the actual underlying trends.
2. Evaluating Model Performance: To decide on the best degree, I compared each model’s Mean Squared Error (MSE) on the training and test datasets. A good polynomial model will have a low MSE on both sets, indicating that it is both accurate and generalizable. Because the training MSE can only stay the same or drop as the degree increases, the test MSE is what reveals overfitting, so I looked for the point where increasing the degree no longer meaningfully lowered the test MSE.
3. Choosing the Right Degree: After testing different degrees, I found that a degree of 2 (quadratic) provided the best balance between fitting the data well and avoiding overfitting. For most variables, a higher degree (3 or 4) didn’t improve the model much, so I kept it simple and used the quadratic model for all of them.
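The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not necessarily the exact code behind the results below: it uses synthetic data (a quadratic trend plus noise) in place of the survey data, NumPy's `numpy.polynomial.Polynomial.fit` for the fits, an 80/20 train/test split, and a hypothetical rule of picking the smallest degree whose test MSE is within 5% of the best. MSE here is the average of the squared differences between observed and predicted values.

```python
import numpy as np

# Synthetic, illustrative data only (NOT the project's data):
# a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(18, 65, size=200)  # stands in for a quantitative variable like age
y = 0.5 + 0.01 * x + 0.001 * x**2 + rng.normal(0, 0.1, size=x.size)

# Random 80/20 train/test split.
idx = rng.permutation(x.size)
train, test = idx[:160], idx[160:]


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return float(np.mean((y_true - y_pred) ** 2))


results = {}
for degree in (1, 2, 3, 4):
    # Fit on the training set only. Polynomial.fit maps x to [-1, 1]
    # internally, which keeps higher-degree fits numerically stable.
    model = np.polynomial.Polynomial.fit(x[train], y[train], deg=degree)
    train_mse = mse(y[train], model(x[train]))
    test_mse = mse(y[test], model(x[test]))
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")

# Hypothetical selection rule: the smallest degree whose test MSE is
# within 5% of the best test MSE observed.
best_test = min(te for _, te in results.values())
chosen = min(d for d, (_, te) in results.items() if te <= 1.05 * best_test)
print("chosen degree:", chosen)
```

The tolerance is a design choice: preferring the smallest "good enough" degree mirrors the idea of stopping once a higher degree no longer meaningfully lowers the test MSE.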
Regression Results
Now, let’s go over the results of the polynomial regressions for each demographic factor.
You can find the full regression results here.
AI Exposure and Age
The positive linear and quadratic terms indicate an upward-sloping curve that is concave up. This means AI exposure increases with age, and the rate of increase is faster for older workers.
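The sign reasoning used for each variable comes from the derivatives of the fitted quadratic. Writing the model with generic coefficients (β0, β1, β2 are labels for illustration, not values from the results):

```latex
\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2,
\qquad
\frac{d\hat{y}}{dx} = \beta_1 + 2\beta_2 x,
\qquad
\frac{d^2\hat{y}}{dx^2} = 2\beta_2
```

For age, both β1 and β2 are positive, so for any positive age x the slope β1 + 2β2x is positive and grows as x grows: exposure rises, and it rises faster at older ages. The sign of β2 alone sets the concavity (up if positive, down if negative).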
AI Exposure and Wage
The positive linear and negative quadratic terms mean that the slope of the curve starts out positive but becomes less positive as wages rise (the curve is concave down). This means that workers in low-wage jobs had relatively little exposure to AI, and exposure rose sharply as wages moved into the middle to upper ranges. At even higher wages, however, the increase in AI exposure leveled off.
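One way to check whether "leveled off" describes the fitted curve is to locate its turning point. For a quadratic y = b0 + b1*x + b2*x**2 with b2 < 0, the slope hits zero at x* = -b1 / (2 * b2) and turns negative beyond it. If x* lies above the highest observed wage, the curve only flattens within the data, which matches a plateau; if x* lies inside the range, the model predicts that exposure actually declines at the top. The Python sketch below uses made-up coefficients and a made-up wage range (b1, b2, x_min, x_max are placeholders, not estimates from the regression results).

```python
def turning_point(b1: float, b2: float) -> float:
    """Return the x where the slope b1 + 2*b2*x of y = b0 + b1*x + b2*x**2 equals zero."""
    if b2 == 0:
        raise ValueError("b2 must be nonzero for the curve to have a turning point")
    return -b1 / (2 * b2)


def describe_quadratic(b1: float, b2: float, x_min: float, x_max: float):
    """Summarize a fitted quadratic's shape relative to the observed range [x_min, x_max]."""
    shape = "concave up" if b2 > 0 else "concave down"
    kind = "minimum" if b2 > 0 else "maximum"
    x_star = turning_point(b1, b2)
    inside = x_min <= x_star <= x_max  # does the turning point fall within the data?
    return shape, kind, x_star, inside


# Hypothetical coefficients and wage range (NOT the project's estimates).
shape, kind, x_star, inside = describe_quadratic(b1=0.04, b2=-0.0002, x_min=10, x_max=80)
print(shape, kind, round(x_star, 1), inside)  # concave down maximum 100.0 False
```

With these placeholder values the peak sits beyond the observed range, so within the data the curve would only flatten, which is the situation where "plateau" is a fair description.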
AI Exposure and Duration of U.S. Residency
The negative linear and positive quadratic terms mean that the slope of the curve starts out negative but becomes less negative as years of residency increase, eventually turning positive (the curve is concave up). This suggests that newer immigrants initially work in sectors less vulnerable to AI, but over time, as they move into higher-skilled positions, their exposure to AI grows. Note, though, that a concave-up curve does not level off: the fitted model implies that exposure rises faster, not slower, the longer immigrants have lived in the U.S.
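Where this curve bottoms out follows from setting the slope of the fitted quadratic ŷ = β0 + β1x + β2x² to zero (generic symbols for illustration, not values from the results). With β1 < 0 and β2 > 0, the turning point is a minimum at a positive number of years:

```latex
\frac{d\hat{y}}{dx} = \beta_1 + 2\beta_2 x = 0
\;\Longrightarrow\;
x^{*} = -\frac{\beta_1}{2\beta_2} > 0,
\qquad
\frac{d\hat{y}}{dx} < 0 \text{ for } x < x^{*},
\quad
\frac{d\hat{y}}{dx} > 0 \text{ for } x > x^{*}
```

Respondents with fewer than x* years of residency sit on the falling part of the curve and those with more sit on the rising part; plugging the estimated coefficients into this formula gives the number of years at which predicted exposure is lowest.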
What’s Next?
In the upcoming weeks, I’ll be exploring the relationship between occupation type (white-collar vs. blue-collar) and AI exposure. I’ll also group occupations by industry (e.g., healthcare, IT, manufacturing) to uncover sector-specific patterns.
Thanks again for following along, and feel free to share any questions or thoughts in the comments below!