When a person goes missing, the police can upload a picture of that person, which is stored in the database. When members of the public encounter a suspicious person, they can capture a picture, or take video footage from public cameras, and upload it to our portal. The face recognition model in our system then tries to find a match in the database by comparing face encodings.
With a conventional deep learning method, the model has to be trained on a large number of labelled images of each person, over a large number of epochs. This method may not be suitable here, because the model would need to be retrained every time a new person is added.
In our approach, the model is trained on fewer images per person, yet it can be used for new people without retraining. This approach is called one-shot learning.
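The idea of adding new people without retraining can be illustrated with a minimal sketch. It assumes that a pretrained model (not shown) has already turned each face image into an embedding vector; the `Gallery` class, its method names, and the threshold value are hypothetical and chosen only for illustration, not taken from our implementation.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Gallery:
    """Hypothetical store of one reference embedding per known person."""

    def __init__(self, threshold=1.0):
        # Assumed cut-off: distances above it are treated as "no match".
        self.threshold = threshold
        self.entries = {}

    def enroll(self, person_id, embedding):
        # Adding a person only stores a vector; no model training happens.
        self.entries[person_id] = list(embedding)

    def match(self, embedding):
        """Return (person_id, distance) of the nearest entry, or (None, distance)."""
        best_id, best_dist = None, float("inf")
        for person_id, reference in self.entries.items():
            dist = euclidean_distance(embedding, reference)
            if dist < best_dist:
                best_id, best_dist = person_id, dist
        if best_dist <= self.threshold:
            return best_id, best_dist
        return None, best_dist
```

In this sketch, enrolling a newly reported missing person is a single `enroll` call, and an uploaded photo is checked with `match`. A suitable threshold would have to be tuned on validation data.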
One-shot learning is an object categorization problem in computer vision. Whereas most machine learning based object categorization algorithms require training on hundreds or thousands of images and very large datasets, one-shot learning aims to learn information about object categories from one, or only a few, training images.
We therefore use FaceNet, which was introduced by Google in 2015 and supports one-shot learning.
The main difference between FaceNet and other techniques is that it learns a mapping from face images directly to embeddings, rather than relying on an intermediate bottleneck layer for recognition or verification tasks.
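FaceNet is trained with a triplet loss, which pulls the embedding of an anchor image toward a positive image of the same person and pushes it away from a negative image of a different person. One of the biggest challenges in using the triplet loss to generate accurate embeddings of face images is selecting triplets that actually contribute to producing quality representations.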
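To make the triplet selection problem concrete, the following sketch computes the triplet loss for plain Python vectors and filters "semi-hard" negatives, that is, negatives that are farther from the anchor than the positive but still inside the margin. The function names and the margin value of 0.2 are assumptions made for illustration; this is not our training code.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Loss = max(d(a, p) - d(a, n) + margin, 0), with squared distances."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def semi_hard_negatives(anchor, positive, candidates, margin=0.2):
    """Keep negatives with d(a, p) < d(a, n) < d(a, p) + margin."""
    d_pos = squared_distance(anchor, positive)
    return [
        n for n in candidates
        if d_pos < squared_distance(anchor, n) < d_pos + margin
    ]
```

A negative that is already far from the anchor gives a loss of zero and therefore no training signal, which is why the choice of triplets matters.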
The minimum resolution for any standard image should be 16×16. A picture with a resolution below 16×16 is called a low-resolution image. Such images typically come from small-scale standalone cameras, such as CCTV cameras in streets, ATM cameras, and supermarket security cameras. Because these cameras are not close to the face, the face occupies only a small part of the frame, and the captured face region is smaller than 16×16. Such a low-resolution image carries little information, since most facial detail is lost, which makes recognizing faces a big challenge. We address this problem by including each person's images at different resolutions in the dataset and training our model on that dataset.
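One simple way to produce several resolutions of the same image is block averaging, sketched below for a grayscale image stored as a list of rows. The function names and the choice of factors are illustrative assumptions; the resizing method actually used to build our dataset may differ.

```python
def downsample(image, factor):
    """Shrink a grayscale image by averaging each factor x factor block.

    Rows and columns that do not fill a complete block are dropped.
    """
    height, width = len(image), len(image[0])
    result = []
    for r in range(0, height - height % factor, factor):
        row = []
        for c in range(0, width - width % factor, factor):
            block = [
                image[r + i][c + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


def resolution_variants(image, factors=(1, 2, 4)):
    """Map each downsampling factor to the corresponding smaller image."""
    return {factor: downsample(image, factor) for factor in factors}
```

With these assumed factors, a 64×64 face crop would yield 64×64, 32×32, and 16×16 versions that can all be added to the training set.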
Illumination changes the appearance of a face drastically. It has been found that the difference between two images of the same face under different illumination can be larger than the difference between two different faces under the same illumination. We address this problem by including the same images under different lighting variations in the dataset.
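Lighting variations of an existing image can be simulated in several ways. The sketch below uses gamma correction on an 8-bit grayscale image stored as a list of rows; this technique, the function names, and the gamma values are assumptions for illustration and are not necessarily how our dataset was produced.

```python
def adjust_gamma(image, gamma):
    """Apply gamma correction to 8-bit pixel values (0 to 255).

    gamma < 1 brightens the image, gamma > 1 darkens it.
    """
    return [
        [round(255 * (pixel / 255) ** gamma) for pixel in row]
        for row in image
    ]


def illumination_variants(image, gammas=(0.5, 1.0, 2.0)):
    """Return one gamma-adjusted copy of the image per gamma value."""
    return [adjust_gamma(image, gamma) for gamma in gammas]
```

Each variant keeps the same identity label as the original image, so the model sees the same face under brighter and darker conditions.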
Discussion