Week #1 — The Dataset

Sachin C

This week, I began gathering the data I will use to train the artificial intelligence model that will run on the microcontrollers. An AI model is only as good as the data it learns from, so collecting a representative sample of training images is essential for an unbiased experiment.

  • Curating Existing Datasets: I am sourcing labeled image datasets from platforms such as Kaggle, ImageNet, and OpenAI’s datasets.
    • These sources provide high-quality training images that are already labeled and tagged, so I do not have to annotate them myself.
  • Collecting Custom Data: Using OpenCV for image processing and Python scripts to automate collection, I am capturing real-world images that closely resemble the conditions my model will face in use.
    • Because the scripts capture and process every image the same way, the resulting data is relatively standardized.
  • Data Preprocessing: I am using libraries such as Pandas and NumPy to clean, filter, and structure the dataset.
    • Structuring the dataset lets me minimize noise: irrelevant or random data points that obscure the underlying patterns without adding useful information.
  • Augmenting Data: Since deep learning models generalize better when trained on diverse data, I am applying image augmentation techniques using TensorFlow and OpenCV.
    • These techniques help the model recognize images with slight variations (e.g., a darker background, or an image rotated 90 degrees).
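As a minimal sketch of how curated downloads could be indexed, the snippet below assumes each dataset is unpacked with one folder per class (a layout many Kaggle image datasets use, though not all) and builds a Pandas table of file paths and labels. The function names, the `path`/`label` columns, and the extension list are hypothetical choices for illustration, not part of any dataset's own tooling.

```python
from pathlib import Path
from typing import Iterable, Union

import pandas as pd

# Hypothetical whitelist of image file types to keep.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def manifest_from_paths(paths: Iterable[Union[str, Path]]) -> pd.DataFrame:
    """Build a path/label table, taking each image's label from its parent folder name."""
    rows = [
        {"path": str(p), "label": Path(p).parent.name}
        for p in paths
        if Path(p).suffix.lower() in IMAGE_EXTENSIONS  # skip readmes, CSVs, etc.
    ]
    return pd.DataFrame(rows, columns=["path", "label"])


def build_manifest(root: Union[str, Path]) -> pd.DataFrame:
    """Index every image under a folder laid out as root/<class_name>/<image>."""
    files = sorted(p for p in Path(root).rglob("*") if p.is_file())
    return manifest_from_paths(files)
```

Calling `build_manifest("downloads/my_dataset")` (a hypothetical path) would return one row per image, which makes it easy to merge several sources into a single table before cleaning.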
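The cleaning step could look roughly like this minimal sketch, which assumes the dataset is tracked as a Pandas table with hypothetical `path` and `label` columns. It normalizes label text, drops rows with missing values or duplicate files, and keeps only the classes the model should learn; NumPy then scales pixel values into [0, 1]. The specific rules are illustrative choices, not the exact pipeline described in this post.

```python
import numpy as np
import pandas as pd


def clean_manifest(df: pd.DataFrame, allowed_labels: set[str]) -> pd.DataFrame:
    """Drop rows that would add noise rather than signal."""
    out = df.copy()
    # Normalize label text so "Cat " and "cat" count as the same class.
    out["label"] = out["label"].str.strip().str.lower()
    # Remove rows missing either a file path or a label.
    out = out.dropna(subset=["path", "label"])
    # Remove files that were listed more than once.
    out = out.drop_duplicates(subset="path")
    # Keep only the classes the model is meant to learn.
    out = out[out["label"].isin(allowed_labels)]
    return out.reset_index(drop=True)


def to_model_input(images: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels (0 to 255) to float32 values in [0, 1]."""
    return images.astype(np.float32) / 255.0
```

Scaling pixels to a fixed range is a common preprocessing step; a microcontroller deployment may later quantize the model back to 8-bit values, so the scaling should match whatever the final model expects.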
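To illustrate the variations mentioned above (a darker image and a 90-degree rotation, plus a horizontal flip), here is a sketch written in plain NumPy rather than TensorFlow or OpenCV, so it runs with neither installed; `tf.image.rot90`, `tf.image.adjust_brightness`, and `cv2.rotate` offer comparable operations in those libraries. The function names and the 0.5 to 0.9 darkening range are hypothetical.

```python
import numpy as np


def darken(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities by `factor` (below 1 darkens), clipped to the uint8 range."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def augment(image: np.ndarray, rng: np.random.Generator) -> list[np.ndarray]:
    """Return the original image followed by three simple variants."""
    variants = [image]
    # Rotate 90 degrees counter-clockwise (np.rot90's default direction).
    variants.append(np.rot90(image))
    # Mirror left to right.
    variants.append(np.fliplr(image))
    # Darken by a random factor between 0.5 and 0.9.
    variants.append(darken(image, rng.uniform(0.5, 0.9)))
    return variants
```

Augmented copies are typically generated only for the training split, so that validation images stay representative of the real inputs the model will see.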

The goal is to compile a well-structured dataset that lets me balance accuracy against efficiency, so that the resulting model is lightweight enough to run on a microcontroller. Over the next few weeks, I’ll continue refining the dataset before moving on to model training and optimization.

I am very excited to check back in next week to update you all on my progress!
