Week 4: An Introduction to Polynomial Regression

Akshita K

Welcome back! This week, I have been studying polynomial regression and how I can apply it to my research.

What is Polynomial Regression?

Last week, I introduced different types of regression models, including linear and logistic regression. However, real-world data is often more complex than a simple straight-line relationship. This is where polynomial regression becomes useful.

While the formula for linear regression is:

y = β₀ + β₁x + ε,

a polynomial regression takes the following form:

y = β₀ + β₁x + β₂x² + … + βₙxⁿ + ε
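In code, fitting this model is nearly identical to fitting the linear one; the polynomial terms are just extra powers of x. A minimal sketch with NumPy on synthetic data (the coefficients, noise level, and seed below are made up for illustration, not from my project data):

```python
import numpy as np

# Synthetic data from a known quadratic: y = 2 + 0.5x + 1.5x^2 + noise
# (all values here are illustrative)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 2 + 0.5 * x + 1.5 * x**2 + rng.normal(0, 0.5, size=x.size)

# polyfit returns least-squares coefficients from highest to lowest power
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(f"estimated: y = {b0:.2f} + {b1:.2f}x + {b2:.2f}x^2")
```

With enough data, the estimated coefficients land close to the true β values; the same call with `deg=1` gives the linear model from last week.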

The Challenge: Finding the Right Degree (n)

One of the most challenging aspects of polynomial regression is determining the optimal degree (n).

Avoiding Underfitting

If we assume too low a polynomial degree when the actual trend is more complex, our model will underfit the data. This means it oversimplifies patterns and fails to capture important nuances in job displacement trends.

Avoiding Overfitting

On the other hand, using too high a polynomial degree can lead to overfitting, where the model captures noise instead of meaningful patterns. Overfitting makes predictions highly sensitive to small fluctuations in data, leading to unreliable results and losing generalizability.
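One way to see both failure modes at once is to compare training error with error on held-out data across several degrees. A rough sketch on synthetic data (the quadratic truth, seed, and degrees tried are all illustrative):

```python
import numpy as np

# Synthetic data whose true trend is quadratic (values are illustrative)
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 80)
y = 1 + x - 0.5 * x**2 + rng.normal(0, 1.0, size=x.size)

# Hold out every fourth point so we can measure generalization
test_mask = np.zeros(x.size, dtype=bool)
test_mask[::4] = True
x_tr, y_tr = x[~test_mask], y[~test_mask]
x_te, y_te = x[test_mask], y[test_mask]

mse = {}
for deg in (1, 2, 10):
    coeffs = np.polyfit(x_tr, y_tr, deg)
    tr_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    te_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    mse[deg] = (tr_mse, te_mse)
    print(f"degree {deg:2d}: train MSE {tr_mse:.2f}, test MSE {te_mse:.2f}")
```

The underfit line (degree 1) misses the curvature, so both errors are high; the high-degree fit always drives training error down further, while its held-out error typically stays worse than the well-chosen degree's.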

How can we determine the optimal degree?

There are multiple ways to do this:

1. Visual Inspection

One simple and intuitive way to determine the degree is by visually inspecting the data. You can try different degrees and pick the one that best captures the general shape of the data without becoming overly complex.
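In practice, this means fitting several candidate degrees, evaluating each fit on a dense grid, and plotting the curves over the scatter. A rough sketch (synthetic data; the degrees tried are arbitrary):

```python
import numpy as np

# Synthetic, gently curved data (illustrative only)
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-2, 2, 40))
y = np.sin(1.5 * x) + rng.normal(0, 0.2, size=x.size)

# Evaluate each candidate fit on a dense grid; plotting these curves over
# the scatter (e.g. with matplotlib) shows which degree tracks the overall
# shape without wiggling through individual points.
x_grid = np.linspace(-2, 2, 200)
curves = {deg: np.polyval(np.polyfit(x, y, deg), x_grid) for deg in (1, 3, 9)}
for deg, y_fit in curves.items():
    print(f"degree {deg}: fitted values range {y_fit.min():.2f} to {y_fit.max():.2f}")
```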

2. AIC/BIC

The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) penalize models for having too many parameters. By incorporating this penalty, both criteria help prevent overfitting and favor simpler models that still fit the data well.
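For a least-squares fit with Gaussian errors, both criteria can be computed directly from the residual sum of squares. A sketch of degree selection this way on synthetic data, using the common convention of k = degree + 1 parameters (conventions for counting parameters vary slightly):

```python
import numpy as np

# Synthetic data with a quadratic trend (values are illustrative)
rng = np.random.default_rng(3)
x = np.linspace(0, 4, 60)
y = 0.5 + 2 * x - 0.4 * x**2 + rng.normal(0, 0.3, size=x.size)

def aic_bic(x, y, deg):
    # Gaussian log-likelihood up to constants gives n*log(RSS/n);
    # k counts the polynomial coefficients (deg + 1)
    n = x.size
    rss = np.sum((np.polyval(np.polyfit(x, y, deg), x) - y) ** 2)
    k = deg + 1
    return n * np.log(rss / n) + 2 * k, n * np.log(rss / n) + k * np.log(n)

scores = {deg: aic_bic(x, y, deg) for deg in range(1, 7)}
best_aic = min(scores, key=lambda d: scores[d][0])
best_bic = min(scores, key=lambda d: scores[d][1])
print(f"AIC picks degree {best_aic}, BIC picks degree {best_bic}")
```

Because BIC's penalty (k·ln n) grows with the sample size while AIC's stays at 2k, BIC tends to choose the smaller of the two degrees.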

3. Adjusted R-squared

Unlike ordinary R-squared, which never decreases as predictors are added, adjusted R-squared accounts for the number of predictors and penalizes unnecessary complexity. It helps the model strike a balance between fit and simplicity, guiding you to the degree that captures the underlying trends without overfitting the data.
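Adjusted R-squared is easy to compute from the residuals via the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors (here, the degree). A minimal sketch on synthetic cubic data (all values illustrative):

```python
import numpy as np

# Synthetic data with a cubic trend (values are illustrative)
rng = np.random.default_rng(4)
x = np.linspace(-2, 2, 50)
y = 1 - x + 0.8 * x**3 + rng.normal(0, 0.4, size=x.size)

def adjusted_r2(x, y, deg):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), p = deg predictors
    n, p = x.size, deg
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

for deg in range(1, 6):
    print(f"degree {deg}: adjusted R^2 = {adjusted_r2(x, y, deg):.3f}")
```

Unlike plain R², which can only go up as the degree increases, adjusted R² starts falling once extra terms stop paying for themselves.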

What’s next?

In the coming weeks, I will be running polynomial regressions, using demographic variables as the independent variables and AI exposure as the dependent variable. If you have any thoughts or questions about polynomial regression or my project, feel free to drop a comment below!


Comments:


    camille_bennett
    Hi Akshita, great description of a complicated process. You mentioned that choosing the right degree (n) is challenging. Can you describe a specific scenario or example where you encountered this issue, and how you handled it?
    akshita_k
    Thank you for your question, Ms. Bennett! As I run polynomial regressions to understand how different demographics are affected by AI exposure, I will have to choose the right degree for each model, depending on the data. A model that’s too simple (low n) might miss important patterns, while a model that’s too complex (high n) could overfit. To find the optimal degree, I’m thinking that I will either use adjusted R-squared (which I explained above) or Mean Squared Error (which I will explain in the next post) to choose an n that captures all important trends without unnecessary complexity.
