Week 6: Feature Implementation

Ishan B -

Last week, I discussed feature engineering and how it works, specifically the correlation between features such as the number of people with “less than a high school diploma” and the district they live in. The goal was to see whether the number of people at each education level is directly related to a district's population or whether other factors are involved. The dataset had no column for the population of a district, but it did include both the number and the percentage of people with “less than a high school diploma”. Using these two numbers, I was able to calculate the population of each district. This was the first feature I implemented, since the population of an area is one of the biggest factors in a business’s choice of where to open.
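The population calculation described above can be sketched roughly as follows. This is a minimal illustration, assuming the percentage is stored on a 0 to 100 scale. The function name and the example numbers are hypothetical and do not come from the dataset. If the percentage is measured against a subgroup (for example, adults only), the result estimates that subgroup's size rather than the full population.

```python
def estimate_population(count_less_than_hs: float, percent_less_than_hs: float) -> float:
    """Recover a district's total from a subgroup count and that subgroup's percentage share.

    Assumes percent_less_than_hs is on a 0-100 scale (an assumption, not confirmed by the dataset).
    """
    if percent_less_than_hs <= 0:
        # A zero or negative share cannot be inverted into a total.
        raise ValueError("percentage must be positive to recover a total")
    # count = total * percent / 100, so total = count * 100 / percent
    return count_less_than_hs * 100.0 / percent_less_than_hs


# Hypothetical district: 1,200 people make up 15% of the total.
estimated = estimate_population(1200, 15.0)
print(estimated)
```

Districts where the percentage is zero or missing would need separate handling, since the total cannot be recovered from them this way.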

This week, I am generating more features based on the different years provided in the dataset. One important metric for a business looking to open in a new area is the area's potential growth. The change in population over time is a good way to measure this, but other metrics can be analyzed as well. One that may matter for certain types of businesses is the change in education level over time. Certain businesses are more likely to attract certain demographics as customers, so being able to see, for example, the change in the number of people earning bachelor’s degrees could be useful and could influence where a business decides to open.
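A change-over-time feature like the ones described above could be computed in a couple of ways. The sketch below shows an absolute change and a relative (percent) change between two years. The function names and the example counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def absolute_change(old_value: float, new_value: float) -> float:
    """Raw difference between two years (for example, number of degree holders)."""
    return new_value - old_value


def relative_change(old_value: float, new_value: float) -> float:
    """Percent change from the earlier year to the later year."""
    if old_value == 0:
        # Percent change is undefined when the starting value is zero.
        raise ValueError("old_value must be non-zero to compute a percent change")
    return (new_value - old_value) / old_value * 100.0


# Hypothetical district: bachelor's degree holders in an earlier and a later year.
earlier, later = 2000, 2500
print(absolute_change(earlier, later))  # difference in number of people
print(relative_change(earlier, later))  # growth as a percentage of the earlier value
```

The relative version makes districts of different sizes comparable, while the absolute version keeps the scale of the change visible.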

Comments:

    tanay_n
    Hey Ishan! That sounds like a solid approach to feature engineering! When calculating the change in education level over time, are you looking at absolute changes (e.g., the number of people with bachelor’s degrees increasing by X) or relative changes (e.g., the percentage of the population with bachelor’s degrees increasing by Y%)? Also, how do you plan to handle missing or inconsistent data across different years?
    camille_bennett
    Hi Ishan, it's so interesting how you are analyzing different demographics. Are there any other changes over time that businesses may be interested in? Like housing prices? Or median age of the population?
    Rahul Patel
    Hey Ishan, I really like how you’re using indirect data to estimate population and analyze demographic trends—it’s a creative way to extract valuable insights! Tracking education level changes over time is especially interesting for businesses targeting specific demographics. Will your model weigh these factors differently based on industry type, or will it take a more standardized approach? Looking forward to your next update!
    ishan_b
    Thanks for the question Ms. Bennett! Age is definitely a big factor in who different businesses target. However, my current dataset only includes education levels. If time permits, I plan to add new columns to my dataset with information about age, income level, etc.
    ishan_b
    Thanks for the question Rahul! Initially, the model will not weigh factors differently based on industry, but once the initial model is finished training, I am planning to allow users to input what type of business they are opening to allow for results more specific to each business.
    ishan_b
    Thanks for the question Tanay! Most of my analysis will be on relative changes, because in areas with smaller populations the absolute changes are expected to be smaller, yet that data is still useful. As for missing or inconsistent data, the dataset I found has very few missing values, and I am training the model on the most recent years. If there are any inconsistent results, I plan to train the model on older years as well and see whether those results hold.
