Datachef - Preparing AI ready Datasets

A powerful platform for automating AI dataset transformation, enabling seamless integration with Machine Learning models.

Created on 2nd June 2023

The problem Datachef solves

Most AI engineers spend a large share of their time on data preprocessing and cleaning, time they could instead spend fine-tuning their models. Training any AI model requires structured numeric data, because ML is built on mathematics.
For image and audio data, engineers have to convert the data into matrices; for text, they have to tokenize it before feeding it into ML models.

Datachef automates this whole process.
For images, Datachef performs the following preprocessing:

  1. Resizes
  2. Compresses
  3. Color Channel Correction
  4. Noise Reduction
  5. Enhancement
  6. Converts into Pixel Matrix
  7. Label Encoding

For text datasets:

  1. Converts text into lowercase and removes special characters
  2. Stop words removal
  3. Stemming of words
  4. Tokenization
  5. Label Encoding
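
The text steps above can be sketched in plain Python. This is only an illustration, not Datachef's actual code: the function names, the tiny stop-word list, and the suffix-stripping `naive_stem` (a crude stand-in for a real stemmer) are all assumptions. Splitting into tokens happens before stop-word removal here simply because filtering needs separate words.

```python
import re

# Tiny illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of", "to", "in", "this"}


def naive_stem(word):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_text(text):
    """Lowercase, drop special characters, tokenize, remove stop words, stem."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace special characters with spaces
    tokens = text.split()                      # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def encode_labels(labels):
    """Map each distinct label to an integer, using sorted order of the class names."""
    classes = sorted(set(labels))
    mapping = {name: i for i, name in enumerate(classes)}
    return [mapping[label] for label in labels], mapping


tokens = preprocess_text("The cats are RUNNING, and jumped!")
codes, mapping = encode_labels(["spam", "ham", "spam"])
```

Note that the naive stemmer produces rough stems such as "runn" for "running"; a proper stemming algorithm would handle such cases better.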

For Audio:

  1. Noise Reduction
  2. Extracts MFCCs (Mel-frequency cepstral coefficients) from audio spectrograms as features

For Classification/regression datasets:

  1. Duplicate Removal
  2. Missing Value Handling
  3. Drop Irrelevant Attributes
  4. Drop Correlated Columns
  5. Label Encoding
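
As a rough sketch of three of these steps (duplicate removal, missing value handling, and dropping correlated columns), the snippet below works on a list of row dictionaries using only the standard library. The helper names, the mean-fill strategy, the 0.95 correlation threshold, and the sample columns are all assumptions made for illustration; the source does not say how Datachef implements these steps.

```python
from math import sqrt


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing_with_mean(rows, column):
    """Replace None in a numeric column with the mean of the present values."""
    present = [r[column] for r in rows if r[column] is not None]
    mean = sum(present) / len(present)
    return [{**r, column: mean if r[column] is None else r[column]} for r in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(rows, columns, threshold=0.95):
    """Drop any column strongly correlated with a column that was already kept."""
    kept = []
    for col in columns:
        values = [r[col] for r in rows]
        if all(abs(pearson(values, [r[k] for r in rows])) < threshold for k in kept):
            kept.append(col)
    dropped = [c for c in columns if c not in kept]
    reduced = [{k: v for k, v in r.items() if k not in dropped} for r in rows]
    return reduced, dropped


# Hypothetical sample data: one duplicate row, one missing age, one redundant column.
rows = [
    {"height_cm": 150, "height_mm": 1500, "age": 30},
    {"height_cm": 150, "height_mm": 1500, "age": 30},
    {"height_cm": 160, "height_mm": 1600, "age": None},
    {"height_cm": 170, "height_mm": 1700, "age": 22},
    {"height_cm": 180, "height_mm": 1800, "age": 41},
]
unique = drop_duplicates(rows)
filled = fill_missing_with_mean(unique, "age")
reduced, dropped = drop_correlated(filled, ["height_cm", "height_mm", "age"])
```

Each helper returns new rows rather than modifying its input, which makes it straightforward to record what every step changed.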

We also provide preprocessing for time series data.
In addition, we generate logs for each operation performed on the dataset, which help AI engineers understand the processed dataset.

Bonus Feature: CHATBOT
We have created a chatbot with which engineers can discuss Machine Learning. The chatbot can also generate model code.

NOTE: We only provide dataset preprocessing; for models, we can only offer suggestions through our chatbot.

Target Audience:

  1. People who are learning AI
  2. Faculty, Students
  3. ML Engineers

Challenges we ran into

We needed to make Datachef flexible enough to work with all kinds of datasets. We got stuck on extracting labels from the uploaded zip file and encoding them for image datasets.
We solved this by iterating through the folders to get their names and using scikit-learn's LabelEncoder to encode those names as labels.
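
The folder-name approach can be sketched roughly as follows, assuming the zip contains one folder per class with the image files inside it. To keep the example dependency-free, the small `encode` function stands in for scikit-learn's LabelEncoder by assigning integers to the sorted class names; `labels_from_zip`, `encode`, and the sample file names are hypothetical and not taken from Datachef's code.

```python
import io
import zipfile
from pathlib import PurePosixPath


def labels_from_zip(zip_file):
    """Return (file name, class folder) pairs for every file inside a folder."""
    samples = []
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            if len(path.parts) < 2:
                continue  # a file at the archive root has no class folder
            samples.append((info.filename, path.parent.name))
    return samples


def encode(labels):
    """Assign integers to sorted class names (stand-in for LabelEncoder)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[label] for label in labels], classes


# Build a small in-memory zip so the sketch runs without any files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("dogs/1.jpg", b"")
    archive.writestr("cats/1.jpg", b"")
    archive.writestr("dogs/2.jpg", b"")
buffer.seek(0)

samples = labels_from_zip(buffer)
codes, classes = encode([label for _, label in samples])
```

Here the immediate parent folder is treated as the label, so an archive with an extra top-level wrapper folder would still work as long as each image sits directly inside its class folder.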
