Our target audience is people who don't want to use an external keyboard but still want an immersive, fun experience playing games with basic controls, like Dino Run or pinball. Long stretches of keyboard use can also take a toll on the body, so we believe gesture-based control is the future, and this is our small contribution toward that vision. The app can map up to 5 different gestures, and hence 5 different keys, such as jump and duck in Dino Run or the two flippers and table shake in pinball.
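As a rough illustration of that gesture-to-key mapping, here is a minimal sketch assuming a pynput-style keyboard controller; the gesture names and key bindings are hypothetical placeholders, not our exact configuration.

```python
# Illustrative sketch: tap one key per recognized gesture.
# Assumes the pynput library; gesture names and keys are hypothetical.
from pynput.keyboard import Controller, Key

keyboard = Controller()

# Up to 5 gestures, one key each, e.g. Dino Run (jump/duck)
# or pinball (two flippers, table shake).
GESTURE_TO_KEY = {
    "fist": Key.space,   # jump
    "palm": Key.down,    # duck
    "thumb": "z",        # left flipper
    "pinky": "/",        # right flipper
    "peace": Key.shift,  # table shake
}

def press_for_gesture(gesture: str) -> None:
    """Press and release the key mapped to the recognized gesture, if any."""
    key = GESTURE_TO_KEY.get(gesture)
    if key is not None:
        keyboard.press(key)
        keyboard.release(key)
```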
A big issue was segmenting the hand out of the camera image in a format that a trained model can easily recognize. We chose to crop a fixed area from the camera frame and enlarge it using super resolution. The upsampled image is then segmented with a threshold-based method, and the resulting black-and-white image is passed to the model. The dataset we trained our model on consisted of similar thresholded images of hand gestures.
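A minimal sketch of that crop, upscale, and threshold pipeline, assuming OpenCV's contrib dnn_superres module with a pretrained ESPCN model; the crop coordinates, model file, and the choice of Otsu thresholding are illustrative assumptions, not necessarily our exact parameters.

```python
# Sketch of the crop -> super-resolve -> threshold preprocessing described above.
# Assumes opencv-contrib-python (cv2.dnn_superres) and a pretrained SR model file.
import cv2

CROP = (50, 50, 250, 250)  # x1, y1, x2, y2 of the fixed capture region (assumed)

sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("ESPCN_x4.pb")  # hypothetical path to a pretrained SR model
sr.setModel("espcn", 4)      # 4x upscaling

def preprocess(frame):
    """Crop the fixed region, upscale it, and return a thresholded B/W mask."""
    x1, y1, x2, y2 = CROP
    hand = frame[y1:y2, x1:x2]
    hand = sr.upsample(hand)  # super-resolution enlargement of the crop
    gray = cv2.cvtColor(hand, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    # Otsu thresholding segments the hand into a black-and-white image
    # matching the format the model was trained on.
    _, mask = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return mask
```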
A good dataset that matched our input was also hard to find: most datasets either had infrared images, too few images per class, or too few classes. We settled on one with thresholded images, 5 classes, and ~8k images in total. Accuracy is still a bit spotty, since gestures need to be within a certain range of orientations for the model to recognize them, but we plan to train our model on more data to fix that.
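For context, a classifier for a dataset like this could look like the following sketch, assuming Keras; the architecture, image size, and directory layout are assumptions for illustration, not our exact model.

```python
# Minimal sketch of a 5-class classifier for thresholded gesture images.
# Assumes TensorFlow/Keras and a hypothetical gestures/ folder with one
# subfolder per class.
import tensorflow as tf

IMG_SIZE = (64, 64)  # assumed input resolution

train_ds = tf.keras.utils.image_dataset_from_directory(
    "gestures/",
    color_mode="grayscale",  # thresholded B/W inputs
    image_size=IMG_SIZE,
    batch_size=32,
)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(*IMG_SIZE, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # one output per gesture
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```

Besides more data, rotation augmentation during training is one common way to make a model like this less sensitive to gesture orientation.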