Week 10: Classifier Complete!
Johnny Y -
This week, I finished coding my logistic regression (LR) classifier. As a reminder, the goal is to take pre-computed text embeddings and predict one of three labels: “down”, “no change”, or “up”. The model is a single logistic regression layer.
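A single-layer setup like that is short enough to sketch in full. This is just an illustrative version, not my exact code; the embedding width (768 here) is a placeholder, since the real size depends on the pre-computed embeddings:

```python
import torch
import torch.nn as nn

EMBED_DIM = 768   # placeholder; actual width depends on the embeddings used
NUM_CLASSES = 3   # "down", "no change", "up"

class LRClassifier(nn.Module):
    """One linear layer; logistic regression over the embedding."""
    def __init__(self, embed_dim=EMBED_DIM, num_classes=NUM_CLASSES):
        super().__init__()
        self.linear = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        # Return raw scores (logits); nn.CrossEntropyLoss applies
        # the softmax internally during training.
        return self.linear(x)

model = LRClassifier()
logits = model(torch.randn(4, EMBED_DIM))  # batch of 4 fake embeddings
print(logits.shape)  # torch.Size([4, 3])
```

The layer outputs one score per class, and the largest score is the predicted label.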
The data came in CSV format, with each row holding a vector (basically a long list of numbers) and a label. My first job was to clean up the data so every row was the same length, since rows can only be stacked into a proper PyTorch tensor if they all match. The rows for the first 30 days lacked an ARIMA prediction, so I manually padded each of them with a 0 term.
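The padding idea can be sketched in a few lines. This is a toy illustration (the numbers are made up, and I padded by hand rather than with a loop like this), but it shows why equal-length rows matter:

```python
import torch

# Toy rows: the short one stands in for an early day missing the ARIMA term.
rows = [
    [0.1, 0.2, 0.3],        # missing one feature
    [0.4, 0.5, 0.6, 0.7],   # full-length row
]

# Pad every row with 0.0 until all rows share the longest row's length.
width = max(len(r) for r in rows)
padded = [r + [0.0] * (width - len(r)) for r in rows]

# Equal-length rows can now be stacked into one rectangular tensor.
features = torch.tensor(padded)
print(features.shape)  # torch.Size([2, 4])
```

Without the padding step, `torch.tensor(rows)` would fail because the rows are ragged.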
The model itself just followed the logistic regression logic I’ve described in previous blog posts. The only major hiccup was the progress bar not rendering properly, which was easily fixed by importing from the plain tqdm module instead of tqdm.notebook.
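For anyone who hits the same thing: the fix really is just a one-line import swap. The notebook version draws a widget-based bar that only works inside Jupyter with ipywidgets set up, while the plain version prints to the console anywhere:

```python
# from tqdm.notebook import tqdm  # widget bar; breaks outside Jupyter
from tqdm import tqdm             # plain console bar; works everywhere

total = 0
for i in tqdm(range(100)):  # wraps any iterable with a progress bar
    total += i
print(total)  # 4950
```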
At first, the model just predicted the same label over and over, which wasn’t helpful. So, I added a tiny feature: after every training round (or “epoch”), I printed the first ten predictions alongside their true labels. I then tweaked the training parameters until the predictions showed sufficient variety.
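Here’s roughly what that diagnostic looks like in a training loop. This is a hedged sketch with random placeholder data and made-up sizes, not my actual pipeline, but the print line at the end of each epoch is the idea:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(50, 8)           # 50 fake embeddings of width 8
y = torch.randint(0, 3, (50,))   # fake labels: 0=down, 1=no change, 2=up

model = nn.Linear(8, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

    # Diagnostic: first ten predictions next to the true labels.
    # A collapsed model shows the same number ten times in a row.
    preds = model(X).argmax(dim=1)
    print(f"epoch {epoch}: pred {preds[:10].tolist()} true {y[:10].tolist()}")
```

Eyeballing those two lists each epoch makes a collapsed model obvious immediately, long before any accuracy metric would flag it.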
Current results aren’t great, so I’ll be meeting with my advisor next week to see how we can improve them. The current model is pretty simple, so we may turn to something more sophisticated (adding another layer, etc.). Manually entering more training data is also a possibility.