Week 5: Feature Engineering
Ishan B -
Last week, I discussed the challenges I had while setting up the dataframe. The first challenge, converting the county names to numbers and keeping track of which number belongs to which name, was relatively easy. The second was harder, but using pandas’ built-in indexing, specifically iloc, I was able to loop through row 3 and set each of its elements as a column header.
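The two fixes described above can be sketched roughly as follows. This is a minimal, hypothetical reconstruction, not necessarily the code used in this project. The toy table, its column names, and the assumption that "row 3" means zero-based position 3 are all invented for illustration. `pd.factorize` is one possible way to number the counties, not necessarily the approach used here.

```python
import pandas as pd

# Hypothetical raw sheet: the real header names sit in a data row,
# not in the DataFrame's column labels (assumed layout for illustration).
raw = pd.DataFrame([
    ["Education data", None, None],
    [None, None, None],
    [None, None, None],
    ["County", "Population", "High school diploma"],
    ["Alpha", 1000, 300],
    ["Beta", 2000, 900],
])

HEADER_ROW = 3  # assumed zero-based position of "row 3"

# Loop through the header row with iloc and use each element as a column name.
new_columns = []
for col_idx in range(raw.shape[1]):
    new_columns.append(raw.iloc[HEADER_ROW, col_idx])

# Keep only the rows below the header row, then apply the new column names.
df = raw.iloc[HEADER_ROW + 1:].copy()
df.columns = new_columns
df = df.reset_index(drop=True)

# Encode county names as integer codes and keep a lookup back to the names.
codes, names = pd.factorize(df["County"])
df["County"] = codes
code_to_county = dict(enumerate(names))

print(df)
print(code_to_county)
```

`pd.factorize` numbers values in order of first appearance. As a result, `code_to_county` maps each integer code back to its original county name.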
This week, I am focusing on feature engineering, the part of machine learning where you prepare and choose the input features a model trains on. One way to decide which features matter is to measure the correlation between each input and the output. Features with little relationship to the output can then be removed so they do not confuse the model during training. Feature engineering is not a strictly required step in training a machine learning model, but I believe it can strengthen the model and improve its accuracy.
I may also add new features (columns) to my dataframe by combining existing columns. Sometimes the relationships between columns are more informative than the columns themselves, so by defining those relationships myself, I can help the model train better and give better results. I think this will be most helpful for the columns that count the number of people with “less than a high school diploma” or a “high school diploma”: dividing each count by the county’s population gives the percentage of people in each category. The same applies to the columns for “some college or associate’s degree” and “bachelor’s degree or higher.”
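Here is a minimal sketch of both ideas, using a small made-up table. The column names, the numbers, and the `Target` output column are hypothetical placeholders, because the real dataset's columns are not listed in this post. The sketch first adds a percentage column for each education count (count × 100 ÷ county population). It then ranks every input column by its Pearson correlation with the target using `DataFrame.corrwith`.

```python
import pandas as pd

# Hypothetical county-level counts; names and values are invented for illustration.
df = pd.DataFrame({
    "Population": [1000, 2000, 4000, 500],
    "Less than a high school diploma": [100, 300, 400, 100],
    "High school diploma": [300, 600, 1000, 150],
    "Some college or associate's degree": [300, 500, 1200, 150],
    "Bachelor's degree or higher": [300, 500, 1400, 100],
    "Target": [0.2, 0.3, 0.5, 0.1],  # hypothetical output column
})

education_cols = [
    "Less than a high school diploma",
    "High school diploma",
    "Some college or associate's degree",
    "Bachelor's degree or higher",
]

# New features: each education count as a percentage of the county population.
for col in education_cols:
    df[f"Percent {col}"] = df[col] * 100 / df["Population"]

# Pearson correlation (the corrwith default) of every input column with the target.
correlations = df.drop(columns="Target").corrwith(df["Target"])
print(correlations.sort_values(ascending=False))
```

Pearson correlation only measures linear relationships. A feature with a low score is therefore a candidate for removal, not proof that it is useless.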