Week 7: Model Training

Ishan B -

Last week, I finished feature engineering. I generated features such as the percentage of the population at each education level, as well as the ratio of people with bachelor’s degrees to people with master’s degrees. These features can help because certain business types are more likely to attract customers with certain education levels.
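
As a rough illustration of the kind of features described above, here is a minimal sketch using pandas. The column names and counts are made up for the example and are not taken from my actual Census dataset.

```python
import pandas as pd

# Hypothetical counts per area; the column names and numbers are
# illustrative only, not the project's real Census fields.
df = pd.DataFrame({
    "population_25_plus": [1000, 2500, 400],
    "bachelors": [300, 500, 100],
    "masters": [100, 250, 20],
})

# Percent of the population at each education level.
df["pct_bachelors"] = 100 * df["bachelors"] / df["population_25_plus"]
df["pct_masters"] = 100 * df["masters"] / df["population_25_plus"]

# Ratio of bachelor's degree holders to master's degree holders.
df["bachelors_to_masters"] = df["bachelors"] / df["masters"]

print(df[["pct_bachelors", "pct_masters", "bachelors_to_masters"]])
```

A ratio like this breaks down when an area has zero master's degree holders, so real data would probably need a check for that case.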

This week, I am beginning to train my machine learning model. The first model I will test is a scikit-learn (sklearn) linear regression model. Sklearn is an open-source Python library that provides many types of machine learning models, ranging from regression models to simple neural networks. Once I finish training the linear regression model, I will train other regression models from sklearn. This will let me compare the results and see which regression model best fits the data and gives the most accurate predictions. After that, if I have extra time, I will try training regression models with TensorFlow, Keras, or both. Testing all of these models should help ensure that the final model performs as well as it can on the available data.
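
The comparison plan above could be sketched roughly like this. This is only a sketch under assumptions: it uses synthetic data from sklearn's make_regression in place of my real dataset, and Ridge and RandomForestRegressor are just example choices for the "other regression models", not a final list.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 500 rows, 10 numeric features.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Hold out 20% of the rows so every model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Candidate models; the choices beyond LinearRegression are illustrative.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Train each model and record its root mean squared error on the test set.
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Lower RMSE means predictions are closer to the true values.
best = min(rmse, key=rmse.get)
for name, score in rmse.items():
    print(f"{name}: RMSE = {score:.2f}")
print("Lowest RMSE:", best)
```

Scoring every model on the same held-out test set keeps the comparison fair. On real data the ranking could come out differently, and each model would still need tuning before a final comparison.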

Comments:

    caitlin_e
    Hi Ishan! Training a machine learning model sounds both interesting and demanding, and I am looking forward to seeing how your project evolves over these next few weeks. Applying this knowledge to business growth is very innovative! How will you determine which regression models are the best trained? Are there certain indicators that you search for in each graph?
    ishan_b
    Thanks for the question, Caitlin! The main way I will determine which regression model is best is by training each type. The initial training of each model takes only a couple of hours, so I should be able to finish training every type of regression model in a couple of days. Once training is done, the hard part is tuning each model to see which one provides the best results. After tuning, I will compare the models on metrics such as root mean squared error (RMSE) and accuracy.
    camille_bennett
    Hi Ishan, sounds like you are getting a lot of experience in training machine learning models. Are there any external factors (such as economic trends) that might impact the reliability of your model’s predictions over time?
    tanay_n
    This is amazing Ishan! I'm curious, what challenges do you anticipate when transitioning from scikit-learn models to TensorFlow/Keras?
    ishan_b
    Thanks for the question, Ms. Bennett! External trends can definitely affect the reliability of my model over time. The main way I would counteract that is by re-training the model on the most up-to-date data from the US Census. That way, the updated dataset would already reflect those trends.
    ishan_b
    Thanks for the question, Tanay! The main challenge I anticipate when moving from scikit-learn to TensorFlow/Keras is that these libraries expect the data to be formatted in different ways. Formatting the data incorrectly could cause the models to train improperly.
