Created on 2nd June 2023
Most AI engineers spend a large share of their time on data preprocessing and data cleaning, time they could otherwise spend fine-tuning their models. Training any AI model requires structured, numeric data, because ML is built on mathematics.
For image and audio data, engineers have to convert the data into matrices, and for text, they have to tokenize it before feeding it into ML models.
Datachef automates this whole process:
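To make the "data as numbers" idea concrete, here is a minimal sketch in plain Python. The helper names `build_vocab` and `encode` are hypothetical and chosen for illustration; this is not Datachef's code, and real pipelines typically use a dedicated tokenizer and an array library.

```python
# Hypothetical helpers for illustration only; not Datachef's actual code.

def build_vocab(texts):
    """Map each distinct lowercase word to an integer id (0 is reserved for unknown words)."""
    vocab = {"<unk>": 0}
    for text in texts:
        for word in text.lower().split():
            # len(vocab) is evaluated before insertion, so ids grow 1, 2, 3, ...
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn a sentence into a list of integer ids; unseen words map to 0."""
    return [vocab.get(word, 0) for word in text.lower().split()]


# Text: whitespace tokenization followed by an id lookup.
corpus = ["the cat sat", "the dog sat down"]
vocab = build_vocab(corpus)
ids = encode("the bird sat", vocab)  # "bird" is not in the vocabulary

# Image: a 2x2 grayscale image as a matrix, with pixel values scaled to [0, 1].
pixels = [[0, 128], [255, 64]]
matrix = [[p / 255 for p in row] for row in pixels]
```

Whitespace splitting is the simplest possible tokenizer and is used here only to keep the sketch self-contained.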
For Images, Datachef does the following preprocessing:
For Text dataset:
For Audio:
For Classification/regression datasets:
We also provide preprocessing for time series data.
Apart from this, we generate logs for each operation we perform on a dataset, which helps the AI engineer understand the processed dataset.
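As a rough illustration of per-operation logging, the sketch below uses Python's standard `logging` module. The function `drop_missing`, the logger name, and the message format are assumptions made for this example; the format of Datachef's real logs is not described here.

```python
import logging

# Assumed log format for illustration; Datachef's actual format may differ.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("preprocessing")


def drop_missing(rows):
    """Remove rows that contain a None value and log how many rows were kept."""
    kept = [row for row in rows if None not in row]
    logger.info("drop_missing: %d rows in, %d rows out", len(rows), len(kept))
    return kept


rows = [(1, 2.0), (2, None), (3, 4.5)]
clean = drop_missing(rows)
```

Logging the row counts before and after each step is one simple way to let an engineer trace how the dataset changed.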
Bonus Feature: CHATBOT
We have created a chatbot that engineers can talk to about machine learning. The chatbot can also generate model code.
NOTE: We only provide preprocessing of datasets; through our chatbot, we can only suggest models that could be implemented.
Target Audience:
We needed to make Datachef flexible enough to work with all kinds of datasets. We got stuck on extracting labels from the uploaded zip file and encoding them for image datasets.
We iterated through the folders to get their names and used scikit-learn's LabelEncoder to encode those names as labels. This solved the label problem in image dataset preprocessing.
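The folder-name approach can be sketched roughly as follows. It assumes the archive is laid out as `<class_name>/<image file>`, and the function name `labels_from_zip` is hypothetical; only `zipfile` and scikit-learn's `LabelEncoder` are real APIs here, and this is not necessarily how Datachef implements it.

```python
import io
import zipfile
from pathlib import PurePosixPath

from sklearn.preprocessing import LabelEncoder


def labels_from_zip(zip_file):
    """Return (paths, encoded_labels, encoder) for an archive laid out as <class_name>/<image>.

    Hypothetical helper: the layout is an assumption, not Datachef's documented format.
    """
    with zipfile.ZipFile(zip_file) as archive:
        # Keep only files that sit inside at least one folder; skip directory entries.
        paths = [
            name for name in archive.namelist()
            if not name.endswith("/") and len(PurePosixPath(name).parts) >= 2
        ]
    # The immediate parent folder of each file is treated as its class name.
    names = [PurePosixPath(p).parent.name for p in paths]
    encoder = LabelEncoder()
    encoded = encoder.fit_transform(names)
    return paths, encoded, encoder


# Build a tiny in-memory archive to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("cats/1.jpg", b"")
    archive.writestr("dogs/1.jpg", b"")
    archive.writestr("cats/2.jpg", b"")
buffer.seek(0)

paths, encoded, encoder = labels_from_zip(buffer)
original = encoder.inverse_transform([1])[0]  # map an integer label back to its folder name
```

`LabelEncoder` assigns integers in sorted order of the class names, and `inverse_transform` recovers the folder name, so predictions can be reported with their original labels.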