Week 9 – Finalizing My Results: Confounding Variables, Multicollinearity, and Interactions

Akshita K

Welcome back to Week 9 of my senior project! This week, I’ve been finalizing my results, addressing multicollinearity, exploring potential lag effects over time, and running regression interactions to explain the seemingly contradictory results between my initial findings and my results after controlling for confounding variables.

Multicollinearity

Multicollinearity occurs when two or more independent variables in a regression model are highly correlated, making it difficult to determine their individual effects on the dependent variable. When this happens, our model might produce misleading conclusions.

How can we detect and address multicollinearity?

One of the most common ways to detect multicollinearity is the Variance Inflation Factor (VIF). VIF measures how much the variance of a variable’s estimated coefficient is inflated due to collinearity with the other predictors in the model. Because many of my variables are categorical and have more than one degree of freedom, I will be using the adjusted Generalized Variance Inflation Factor, GVIF^(1/(2*Df)), which stays comparable across variables with different degrees of freedom.

GVIF Interpretation:

  • Adjusted GVIF = 1 → No correlation with the other predictors (ideal)
  • Adjusted GVIF > 2 → Moderate to strong multicollinearity (the variable is a candidate for removal from the model)
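
For readers curious about what goes into this number, here is a minimal Python sketch of the adjusted GVIF. It assumes the standard determinant-based definition (GVIF = det(R_term) × det(R_rest) / det(R), where R is the correlation matrix of the predictor columns). The function name `adjusted_gvif`, the toy design matrix, and the term names are illustrative assumptions, not the code used for this project.

```python
import numpy as np


def adjusted_gvif(X, terms):
    """Return GVIF^(1/(2*Df)) for each term.

    X     : 2-D array of predictor columns (dummy-coded, no intercept column).
    terms : dict mapping a term name to the list of column indices it owns.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)        # correlation matrix of all columns
    det_all = np.linalg.det(R)
    result = {}
    for name, cols in terms.items():
        rest = [j for j in range(X.shape[1]) if j not in cols]
        det_term = np.linalg.det(R[np.ix_(cols, cols)])
        det_rest = np.linalg.det(R[np.ix_(rest, rest)]) if rest else 1.0
        gvif = det_term * det_rest / det_all
        # Df is the number of columns the term contributes to the design.
        result[name] = gvif ** (1.0 / (2 * len(cols)))
    return result


# Toy design (made-up numbers): "a" and "b" are strongly related,
# and "c" is a categorical term coded as two dummy columns (Df = 2).
X = np.array([
    [1.0, 1.1, 1, 0],
    [2.0, 1.9, 0, 1],
    [3.0, 3.2, 0, 0],
    [4.0, 3.8, 1, 0],
    [5.0, 5.1, 0, 1],
    [6.0, 6.0, 0, 0],
])
print(adjusted_gvif(X, {"a": [0], "b": [1], "c": [2, 3]}))
```

For a term with a single column, this reduces to the square root of the ordinary VIF, which is why the adjusted values can be compared across numeric and categorical variables.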

My GVIF Results:

Variable            GVIF^(1/(2*Df))
Sex                 1.057
Age                 1.471
Education           1.652
Race/Ethnicity      1.061
Income              1.543
Years in the U.S.   1.095
Occupation Type     2.512
Industry            1.089
Region              1.011
Year                1.001

 

In the table above, the adjusted GVIF for occupation type is about 2.5, above the cutoff of 2 and well above every other variable. Intuitively, this makes sense, as occupation type (white-collar or blue-collar) is closely tied to variables like income and education.

To reduce redundancy, I decided to remove Occupation Type from the final regression.
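
The screening step can be written in a few lines of Python. The values below are copied from my GVIF results; the dictionary name and the `THRESHOLD` constant are just illustrative choices.

```python
# Adjusted GVIF values, GVIF^(1/(2*Df)), copied from the results in this post.
gvif_results = {
    "Sex": 1.057,
    "Age": 1.471,
    "Education": 1.652,
    "Race/Ethnicity": 1.061,
    "Income": 1.543,
    "Years in the U.S.": 1.095,
    "Occupation Type": 2.512,
    "Industry": 1.089,
    "Region": 1.011,
    "Year": 1.001,
}

THRESHOLD = 2.0  # rule-of-thumb cutoff used in this post

# Keep only the variables whose adjusted GVIF exceeds the cutoff.
flagged = [name for name, value in gvif_results.items() if value > THRESHOLD]
print(flagged)  # ['Occupation Type']
```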

Lag Effects

This week I also explored lag effects: the idea that the dependent variable, AI exposure, may respond to an independent variable with a delay rather than in the same year. This matters when modeling trends over time.

However, after incorporating a lag variable in my model, the results showed no meaningful time lag, suggesting that AI exposure remains relatively stable year-to-year for each occupation, at least within the scope of my dataset.
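
As a rough illustration of what "incorporating a lag variable" can look like, here is a minimal Python sketch that attaches last year's value of a predictor to each row. The helper `add_lag`, the column names, and the rows are all hypothetical; they are not my dataset, and my actual lag specification may differ.

```python
def add_lag(rows, group_key, time_key, value_key, lag=1):
    """Return copies of rows with an extra '<value_key>_lag<lag>' field.

    The lagged value is looked up by (group, time - lag), so a missing
    year yields None instead of silently borrowing an older row.
    """
    lookup = {(r[group_key], r[time_key]): r[value_key] for r in rows}
    lag_name = f"{value_key}_lag{lag}"
    return [
        {**r, lag_name: lookup.get((r[group_key], r[time_key] - lag))}
        for r in rows
    ]


# Hypothetical occupation-year rows; names and numbers are made up.
rows = [
    {"occupation": "clerk", "year": 2020, "income": 40},
    {"occupation": "clerk", "year": 2021, "income": 42},
    {"occupation": "clerk", "year": 2023, "income": 45},  # 2022 is missing
    {"occupation": "welder", "year": 2020, "income": 50},
    {"occupation": "welder", "year": 2021, "income": 51},
]

for row in add_lag(rows, "occupation", "year", "income"):
    print(row)
```

The lagged column would then enter the regression as one more predictor. A coefficient close to zero on that column is what a result of "no meaningful time lag" would look like.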

My Final Results

While some trends remained the same, some effects changed drastically after controlling for confounding variables.

You can find the full regression results here.

  1. Sex – Female workers are more exposed to AI than male workers.
  2. Age – AI exposure increases with age, and the rate of increase is faster for older workers.
  3. Education – In a reversal of earlier results, more education is now associated with lower AI exposure after controlling for income (more on this in the next section).
  4. Race – From most exposed to least exposed: Asian/Pacific Islander, American Indian/Alaskan Native, Hispanic/Latino, Black/African American, White.
  5. Income – AI exposure increases with income.
  6. U.S. Residency – Recent immigrants are more likely to work in high-AI-exposure jobs at first, but over time their exposure decreases as they move into less exposed positions, eventually leveling off.
  7. Industry – After controlling for confounding variables, the ranking of industries by AI exposure changed completely. Here’s the updated list from most to least exposed:
    1. Information Technology
    2. Science, Engineering, & Technical Services
    3. Business, Finance, Management, & Real Estate
    4. Wholesale, Retail, & Manufacturing
    5. Professional Services & Administrative Support (e.g., Marketing, Accounting, Bookkeeping, Clerical, Consulting, Administrative Support)
    6. Media and Communications
    7. Transportation and Logistics
    8. Agriculture, Natural Resources, and Mining
    9. Construction/Repair
    10. Arts, Entertainment, and Hospitality
    11. Public Services and Education
    12. Healthcare and Life Sciences
  8. Region – The U.S. region exposure rankings changed as well. For example, workers in the Pacific Division now experience the highest exposure, while those in the West North Central Division experience some of the lowest.

[Figure: AI exposure across U.S. regions, from most to least exposed]

Interaction Effects

Lastly, I explored interactions — how two variables together might affect AI exposure differently than when considered separately — to explain the discrepancies between my initial and final results.

Example: Education & Income

When I modeled education and income separately, each showed a positive relationship with AI exposure. However, once I included them together in the same model, a surprising shift occurred:

  • Income remained positively associated with AI exposure
  • Education’s effect became negative

This suggests that the earlier positive relationship between education and AI exposure was confounded by income. Education and income are strongly correlated, so in simpler models, education may have acted as a proxy for income. Once I controlled for income, however, the “true” effect of education became clear: among people with the same income, those with more education tend to have slightly lower AI exposure.
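
The sign flip described above can be reproduced on simulated data. The sketch below is only an illustration: the numbers are synthetic (not my dataset), the effect sizes are assumptions chosen to mimic the pattern, and the helper `ols` is a hypothetical convenience wrapper around NumPy's least-squares solver. It fits three models: education alone, education controlling for income, and a version that adds an education × income product term, which is what a formal interaction term looks like.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic, standardized variables (NOT the project's data).
education = rng.normal(size=n)
income = 0.8 * education + rng.normal(scale=0.6, size=n)  # tied to education
exposure = 1.0 * income - 0.3 * education + rng.normal(scale=0.5, size=n)


def ols(y, *predictors):
    """Least-squares coefficients: intercept first, then one per predictor."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


alone = ols(exposure, education)                # education only
joint = ols(exposure, education, income)        # controlling for income
interact = ols(exposure, education, income, education * income)  # product term

print(f"education alone:        {alone[1]:.2f}")
print(f"education given income: {joint[1]:.2f}")
print(f"income given education: {joint[2]:.2f}")
print(f"education x income:     {interact[3]:.2f}")
```

In this simulation, education looks positive on its own because it carries income's effect with it, and turns negative once income is in the model. The product term stays near zero here because no interaction was built into the simulated data.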

What’s Next?

Next week, I’ll wrap up my research by developing policy recommendations to address the inequalities I found in AI-related job displacement.

Thanks for following along! Please feel free to share any thoughts or questions in the comments—I’d love to hear them.

Comments:

    aashi_h
    Hi Akshita, I love your project! Now that we are in the final stretch, what would you say has been the most exciting part of the whole process?
    akshita_k
    Thank you for your question, Aashi! I think the most exciting part has been uncovering how different factors that seem straightforward on their own (like education or income) actually behave very differently once you control for other variables. Running those interaction terms and seeing the shift in the effect of education once income was included was intriguing.
