An AI model “sees” an image and generates a caption for it. Image captioning is commonly used for auto-tagging and for generating metadata that makes images easier to organize and search. It is also used in marketing campaigns, for example to generate taglines for products. Participants are expected to train a custom vision model on available open image datasets. Many high-tech companies and organisations hold huge image datasets that are difficult to work with directly. Generating captions for such a dataset helps sort it more efficiently, because text is much easier to segregate than images.
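To illustrate why captions make an image collection easier to organize and search, here is a minimal sketch in Python. It is not the project's actual code. The file names, the captions, the stopword list, and the helper names `build_tag_index` and `search` are all hypothetical, and the captions stand in for whatever a trained captioning model would produce.

```python
from collections import defaultdict

# Hypothetical captions, standing in for the output of a captioning model.
captions = {
    "img_001.jpg": "a dog running on the beach",
    "img_002.jpg": "two people riding bicycles on a street",
    "img_003.jpg": "a dog sleeping on a sofa",
}

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "the", "on", "in", "of", "two"}


def build_tag_index(captions):
    """Map each non-stopword caption word to the set of images it describes."""
    index = defaultdict(set)
    for image, caption in captions.items():
        for word in caption.lower().split():
            if word not in STOPWORDS:
                index[word].add(image)
    return index


def search(index, query):
    """Return images whose captions contain every non-stopword query word."""
    words = [w for w in query.lower().split() if w not in STOPWORDS]
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


index = build_tag_index(captions)
print(search(index, "dog"))
print(search(index, "dog beach"))
```

Once every image has a caption, grouping or searching the collection reduces to ordinary text processing, which is the point made above about text being easier to segregate than images.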
The dataset we started with gave low accuracy, so we spent some time choosing another dataset that would be better suited to training the model.