AllTalks: One platform, connecting All

AllTalks is a real-time web application. It provides users with built-in features such as chat, speech recognition, audio-video conferencing, and gesture recognition with translated subtitles.

Created on 8th May 2022

The problem AllTalks: One platform, connecting All solves

Web conferencing tools are available in plenty, but not every platform is suited to group meetings with a large number of users, and few are convenient for users with hearing and speech impairments. There is no "one size fits all" solution, so a real-time system for gesture detection as well as speech recognition is needed to enable successful communication.

Sign language is the medium through which hearing- and speech-impaired people share their feelings and thoughts with others, but their communication with able-bodied people is restricted because most people never learn sign language. Vision-based solutions can overcome some of these difficulties, and they appear to be the best choice for raw data collection. AllTalks converts sign language into text, and speech into text, and displays the result on screen where everyone can read it. It is also a cheap, portable, and efficient solution.

In the ongoing pandemic, the scope of video conferencing has increased tremendously. AllTalks gives deaf and speech-impaired people a chance to socialize with others, and gives people without such disabilities a way to communicate with them. Users without hearing and speech impairments also become aware of common signs used in Indian Sign Language, spreading awareness and bridging the gap between these two groups.

Moreover, vision-based approaches do not require the user to wear anything: video cameras capture images of the bare hands, which are then processed and analyzed using computer vision techniques. This kind of hand gesture recognition is simple, natural, and convenient for users, and it is currently the most popular approach to gesture recognition. However, several challenges remain to be addressed, such as illumination changes, background clutter, and partial or full occlusion.
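To make the pipeline concrete, here is a minimal sketch of the capture-classify-subtitle loop described above. It assumes a pre-trained Keras classifier saved as gesture_model.h5 and a small label list; the file name, input size, and gesture classes are hypothetical placeholders, not the project's actual artifacts.

```python
# Minimal sketch of the vision-based pipeline: grab webcam frames,
# classify the gesture, and overlay the prediction as a subtitle.
import cv2
import numpy as np
import tensorflow as tf

LABELS = ["hello", "thanks", "yes", "no"]  # placeholder gesture classes
model = tf.keras.models.load_model("gesture_model.h5")  # assumed artifact

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Preprocess: resize to the model's (assumed) input size, normalise pixels.
    img = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
    probs = model.predict(img[np.newaxis, ...], verbose=0)[0]
    label = LABELS[int(np.argmax(probs))]
    # Overlay the predicted sign as an on-screen subtitle.
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                1.0, (0, 255, 0), 2)
    cv2.imshow("AllTalks gesture preview", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

In a conferencing setting the same prediction would be attached to the outgoing stream as a subtitle rather than drawn locally, but the capture-and-classify core stays the same.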

Challenges we ran into

Working with sockets was a whole new experience for our team, and so was extracting the audio from the ongoing stream and passing it through the data channels; understanding this whole flow and googling things out took a lot of time. Besides, training the gesture detection model within the stipulated time period was a task in itself.

Building a gesture recognition model was quite an experience, yet there were some milestones we had to clear for a successful and accurate model. The model had to be precise for a gesture captured at any angle and against any background. If the background was the same colour as the hand, or close to it, the model still had to distinguish between the two. Some gestures are almost identical; for example, V and 2 share the same hand shape, so deciding which one is meant is difficult. The model also had to work under different lighting setups, whether a user is in dim light or under heavy light exposure, and irrespective of whether the user is wearing something on their hands, such as a glove.

There were a lot of errors to sort out while building the model; we encountered frequent errors even during the initial installation of TensorFlow and had to refer to YouTube or Stack Overflow. Collecting datasets was a task in itself: we had to capture frames for each particular action from all different angles while changing the background and adjusting the lighting, and image processing was required for static gestures such as alphabets and digits (a sketch of the capture step appears below). Some actions were region-dependent, so we had to look for standard ones, practice them, and then train our model on them.

Sufficient time was devoted to research, jotting down the required technologies and understanding them. During implementation there were issues in creating the chat-box feature and displaying it when another person enters the room. By dividing the work collaboratively, we finally got it done.
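To illustrate the dataset-collection step, here is a hypothetical sketch of a capture script: it records labelled webcam frames for one gesture at a time, so a session can be repeated against different angles, backgrounds, and lighting. The directory layout, label, and frame count are placeholders, not the project's actual values.

```python
# Hypothetical dataset-collection sketch: save labelled webcam frames
# for one gesture per session, to be re-run under varied conditions.
import os
import cv2

GESTURE = "hello"          # label being recorded in this session (placeholder)
OUT_DIR = os.path.join("dataset", GESTURE)
NUM_FRAMES = 200           # frames to capture per session (placeholder)
os.makedirs(OUT_DIR, exist_ok=True)

cap = cv2.VideoCapture(0)
saved = 0
while saved < NUM_FRAMES:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Recording - press 's' to save, 'q' to quit", frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord("s"):
        # Save the current frame under the gesture's label.
        cv2.imwrite(os.path.join(OUT_DIR, f"{GESTURE}_{saved:04d}.jpg"), frame)
        saved += 1
    elif key == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

And here is a minimal sketch of a room-based chat relay of the kind behind the chat-box feature, assuming a python-socketio server; the event names ("join", "chat"), payload shape, and port are assumptions rather than AllTalks' actual protocol.

```python
# Room-based chat relay sketch using python-socketio; event names,
# payload shape, and port are assumptions, not the project's protocol.
import eventlet
import socketio

sio = socketio.Server(cors_allowed_origins="*")
app = socketio.WSGIApp(sio)

@sio.event
def join(sid, room):
    # Put the connecting client into the requested meeting room.
    sio.enter_room(sid, room)
    sio.emit("system", f"A new participant joined {room}", room=room)

@sio.event
def chat(sid, data):
    # Relay a chat message to everyone else in the same room.
    sio.emit("chat", data, room=data["room"], skip_sid=sid)

if __name__ == "__main__":
    eventlet.wsgi.server(eventlet.listen(("", 5000)), app)
```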
