Week #1 — The Dataset
Sachin C -
This week, I began gathering data to train the artificial intelligence model I intend to run on the microcontrollers. A model's quality scales with the data it is trained on, so assembling a representative sample of training images is essential for an unbiased experiment.
- Curating Existing Datasets: I am sourcing labeled image datasets from platforms like Kaggle, ImageNet, and OpenAI’s datasets.
- These sources provide high-quality training images that are already labeled and tagged, which saves me considerable annotation effort.
- Collecting Custom Data: Using tools like OpenCV for image processing and Python scripts to automate data collection, I am capturing real-world images that closely resemble the use case of my model.
- Using these scripts ensures that my data is relatively standardized.
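As a rough illustration of what such a standardization script might look like, here is a minimal sketch in plain NumPy (the function name, target resolution, and processing steps are my own assumptions, not the author's actual script):

```python
import numpy as np

TARGET_SIZE = (96, 96)  # assumed target resolution for a microcontroller-sized model

def standardize(image: np.ndarray, size=TARGET_SIZE) -> np.ndarray:
    """Convert an RGB image to grayscale, resize it (nearest neighbor),
    and normalize pixel values to [0, 1]."""
    # Grayscale via the standard luminance weights
    gray = image[..., :3] @ np.array([0.299, 0.587, 0.114])
    # Nearest-neighbor resize: sample evenly spaced source rows/columns
    rows = np.linspace(0, gray.shape[0] - 1, size[0]).astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, size[1]).astype(int)
    resized = gray[np.ix_(rows, cols)]
    # Normalize 0-255 pixel values to floats in [0, 1]
    return (resized / 255.0).astype(np.float32)

# Example: a fake 240x320 RGB "capture" standing in for a real camera frame
frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
out = standardize(frame)
print(out.shape, out.dtype)  # (96, 96) float32
```

In a real pipeline the fake frame would be replaced by images read with OpenCV (e.g. `cv2.imread`), but the standardization logic stays the same.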
- Data Preprocessing: I am using libraries like Pandas and NumPy to clean, filter, and structure the dataset.
- Structuring the dataset is important so that I can minimize noise: irrelevant or random data points that obscure underlying patterns and add no useful information.
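To illustrate the kind of cleaning and filtering this step involves, here is a minimal Pandas sketch; the column names, sample rows, and filtering rules are assumptions made up for the example:

```python
import pandas as pd

# Hypothetical labels table: one row per image, with a class label
df = pd.DataFrame({
    "filename": ["a.jpg", "b.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "label":    ["cat",   "dog",   "dog",   None,    "cat"],
})

# Drop duplicate image entries and rows with a missing label
clean = df.drop_duplicates(subset="filename").dropna(subset=["label"])

# Keep only the classes the model will actually be trained on
allowed = {"cat", "dog"}
clean = clean[clean["label"].isin(allowed)].reset_index(drop=True)

print(len(clean))                               # 3 rows remain
print(clean["label"].value_counts().to_dict())  # {'cat': 2, 'dog': 1}
```

Dropping duplicates and unlabeled rows up front is one simple way to reduce the noise described above before any training happens.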
- Augmenting Data: Since deep learning models perform better with diverse datasets, I am applying image augmentation techniques using TensorFlow and OpenCV.
- These techniques ensure that the model can recognize images with slight variations (e.g., a darker background, or an image rotated 90 degrees).
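The augmentations described above can be sketched in plain NumPy (the author uses TensorFlow and OpenCV; these NumPy versions are equivalent in spirit, and the brightness factor is my own assumed value):

```python
import numpy as np

def rotate_90(image: np.ndarray) -> np.ndarray:
    """Rotate an image 90 degrees counterclockwise."""
    return np.rot90(image)

def darken(image: np.ndarray, factor: float = 0.5) -> np.ndarray:
    """Simulate a darker scene by scaling pixel intensities down."""
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def flip_horizontal(image: np.ndarray) -> np.ndarray:
    """Mirror the image left to right."""
    return image[:, ::-1]

# Apply each augmentation to a random test image
img = np.random.randint(0, 256, size=(96, 96, 3), dtype=np.uint8)
augmented = [rotate_90(img), darken(img), flip_horizontal(img)]
print([a.shape for a in augmented])  # all (96, 96, 3)
```

Each augmented copy is added to the training set alongside the original, so the model sees the same subject under several plausible variations.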
The goal is to compile a well-structured dataset that balances efficiency and accuracy while being lightweight enough to function on a microcontroller. Over the next few weeks, I’ll continue refining my dataset before moving on to model training and optimization.
I am very excited to check back in next week to update you all on my progress!