Are you ready to break down language barriers and communicate effortlessly in American Sign Language (ASL)? Introducing One Mudra, your ultimate ASL translation companion. Developed by Mannat V Jain, a passionate 16-year-old student in the United States, One Mudra revolutionizes the way we interact with and understand ASL.
One Mudra is more than just a mobile application; it's a gateway to seamless communication. Harnessing the power of cutting-edge AI technology, One Mudra, with its bi-directional communication ability, bridges the gap between different sign languages and numerous global languages in real time. Open-sourced to trainers, it allows the app to be trained in various sign languages, expanding its reach and inclusivity. Whether you're a fluent sign language user or someone eager to learn, One Mudra empowers users to express themselves effectively and understand sign language conversations effortlessly.
One Mudra's AI feature automatically corrects grammatical errors, ensuring clear and accurate communication in real time.
Say goodbye to language barriers. One Mudra instantly translates ASL signs into various global languages, ensuring smooth communication in any setting.
From English and Spanish to Mandarin and French, One Mudra supports a wide range of languages, making it accessible to users worldwide.
A backend for training the model allows the larger community of sign language trainers to contribute, and lets users create personalized signs.
The world's first digitally enabled, bi-directional communication tool, converting both sign to text and text to sign.
Designed with simplicity in mind, One Mudra offers an intuitive interface for users of all ages and backgrounds. Navigate effortlessly and start translating with just a few taps.
Using One Mudra is as simple as pointing your device's camera at ASL signs. The app instantly detects and translates the signs into your preferred language, allowing for seamless communication in any situation. Whether you're having a conversation with a Deaf friend, attending an ASL class, or exploring a new culture, One Mudra is your trusted companion every step of the way.
Hearing loss is one of modern society’s most understated and overlooked clinical conditions. As per the WHO, over 430 million people around the world require rehabilitation to address “disabling hearing loss.” Another large segment of sign language users is ‘hearing nonverbal children.’ They are nonverbal due to conditions such as Down syndrome, autism, cerebral palsy, trauma, and brain or speech disorders.
The model itself is an LSTM (Long Short-Term Memory), a type of recurrent neural network. It uses Google’s MediaPipe to layer a landmark mesh onto the hands and face of the user, then tracks the mesh to determine motion and interprets the motion as text (which it learns to do from the training dataset). As a second step, it takes this converted text and runs a text-to-speech algorithm to say it aloud in the desired language.
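A minimal sketch of the landmark-to-sequence step of this pipeline, assuming MediaPipe-style (x, y, z) landmark arrays are already available per frame (the actual camera and MediaPipe calls are omitted; all function names here are illustrative, not the app's real API):

```python
import numpy as np

N_HAND = 21   # MediaPipe Hands tracks 21 landmarks per hand
N_FACE = 468  # MediaPipe Face Mesh tracks 468 landmarks

def frame_to_features(left_hand, right_hand, face):
    """Flatten one frame's (x, y, z) landmarks into a single feature vector.
    Missing parts (e.g., a hand out of view) are zero-filled so every frame
    has the same length, as the LSTM expects a fixed feature size."""
    parts = []
    for lm, n in ((left_hand, N_HAND), (right_hand, N_HAND), (face, N_FACE)):
        parts.append(np.zeros(n * 3) if lm is None else np.asarray(lm).reshape(-1))
    return np.concatenate(parts)

def frames_to_sequence(frames):
    """Stack per-frame feature vectors into a (timesteps, features) array,
    the input shape an LSTM consumes."""
    return np.stack([frame_to_features(*f) for f in frames])

# Simulated 30-frame clip (real data would come from the camera + MediaPipe).
rng = np.random.default_rng(0)
clip = [(rng.random((N_HAND, 3)), None, rng.random((N_FACE, 3))) for _ in range(30)]
seq = frames_to_sequence(clip)
print(seq.shape)  # (30, 1530): 30 frames x (21 + 21 + 468) * 3 features
```

The resulting sequence is what gets fed to the recurrent model; the text-to-speech stage would then run on the predicted label.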
I chose to use an LSTM model for my project, a type of RNN (Recurrent Neural Network), because of the comparatively little training data it needs. I’ve outlined the benefits of the RNN architecture in the table below. Plain RNNs struggle to retain information over long sequences (the vanishing-gradient problem); LSTMs solve this problem by introducing a memory cell, which can store information over a prolonged period. The cell has gates that control the flow of information into and out of the cell, allowing the network to selectively retain or forget information as needed. This is important for recognizing a vast library of gestures whose frequency of repetition is low.
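The memory cell and its gates can be made concrete with a single LSTM time step written out in NumPy. This is a textbook sketch of the standard LSTM equations, not the project's actual model code (which would use a deep-learning framework):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters for the
    forget (f), input (i), candidate (g), and output (o) gates."""
    z = W @ x + U @ h_prev + b            # shape (4 * hidden,)
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # gates in (0, 1)
    g = np.tanh(g)                                  # candidate values
    c = f * c_prev + i * g   # memory cell: selectively forget, then write
    h = o * np.tanh(c)       # exposed hidden state
    return h, c

# Tiny demo: run a random 10-step sequence through the cell.
rng = np.random.default_rng(1)
n_in, n_hid = 8, 4
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for x in rng.standard_normal((10, n_in)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The forget gate `f` multiplying `c_prev` is exactly the mechanism that lets the network retain information about a gesture's earlier frames over a prolonged period.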
The principal effort is to train the computer to recognize the spatial coordinates (x, y, z) of sign gestures. Although the gestures are intended to be standardized, recognition almost always encounters variations because each user signs slightly differently, much as handwriting varies from person to person.
To reduce these variations, the program is trained on each gesture of this new, unifying language with an increasing number of runs, until it reaches an acceptable level of accuracy.
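One common way to reduce such signer-to-signer variation, before any training, is to normalize the raw coordinates so that hand position and hand size cancel out. This is a generic preprocessing sketch under that assumption (the landmark indices follow MediaPipe's hand numbering; whether the project applies this exact step is an assumption):

```python
import numpy as np

def normalize_hand(landmarks):
    """Normalize 21 (x, y, z) hand landmarks: translate so the wrist
    (landmark 0) sits at the origin, then scale so the wrist-to-middle-
    fingertip distance equals 1. Two users making the same sign at
    different distances from the camera then yield similar coordinates."""
    pts = np.asarray(landmarks, dtype=float)
    pts = pts - pts[0]                  # wrist at origin
    scale = np.linalg.norm(pts[12])     # landmark 12: middle fingertip
    return pts / scale if scale > 0 else pts

# The same hand shape, shifted and rescaled, maps to the same coordinates.
rng = np.random.default_rng(2)
hand = rng.random((21, 3))
shifted_scaled = hand * 2.5 + np.array([0.3, -0.1, 0.8])
a, b = normalize_hand(hand), normalize_hand(shifted_scaled)
print(np.allclose(a, b))  # True
```

Normalization like this shrinks the variation the model must absorb, so fewer training runs per gesture are needed to reach a given accuracy.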
An extract of a sample run of my code is pasted below. These runs can extend to thousands of lines to decipher iconography organized as follows: deictic (location or time), motor (general gestures), symbolic (representational), iconic (understandable concepts), and metaphoric (conveyed through analogy).
By using machine learning techniques, it is possible to create a system that can recognize sign language with a high degree of accuracy and translate it into any spoken or written language in real time. This system would use a database of signs (organized around deictic, motor, symbolic, iconic, and metaphoric categories), and use machine learning algorithms to analyze and interpret the signs in real time. The system would then generate spoken or written text that represents the meaning of the signs being used, allowing people who use sign language to communicate with those who do not understand it.
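At inference time, such a real-time system typically slides a fixed-length window over the incoming frames and smooths the per-window predictions so a single noisy window does not flip the caption. The sketch below illustrates that loop with a stand-in classifier; the gesture labels and the `classify` function are placeholders, not the project's trained model:

```python
import numpy as np
from collections import Counter, deque

GESTURES = ["hello", "thanks", "emergency"]  # illustrative labels only

def classify(window):
    """Stand-in for the trained LSTM: a real system would call
    model.predict() on the (timesteps, features) window here."""
    return int(window.sum()) % len(GESTURES)

def translate_stream(frames, window_len=30, vote_len=5):
    """Slide a fixed-length window over the frame features and smooth
    the predictions with a majority vote over the last few windows,
    emitting a caption only when the stable label changes."""
    recent = deque(maxlen=vote_len)
    captions = []
    for t in range(window_len, len(frames) + 1):
        recent.append(classify(frames[t - window_len:t]))
        label = Counter(recent).most_common(1)[0][0]
        if not captions or captions[-1] != label:
            captions.append(label)
    return [GESTURES[i] for i in captions]

# Simulated 60-frame feature stream (real input: camera -> landmark features).
rng = np.random.default_rng(3)
stream = rng.random((60, 1530))
result = translate_stream(stream)
print(result)
```

The emitted labels are what a downstream text-to-speech stage would voice in the listener's chosen language.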
There is an optimal number of training epochs and runs-per-gesture for my translator system. (An epoch refers to a complete iteration through a dataset during training of a machine learning model.) My program functioned best at the dual equilibrium of 40 runs/gesture and 2000 epochs, suggesting that these values are near-optimal for such projects. Adding more data only results in longer runtimes while making no statistically significant enhancement in accuracy. (In some outlier results, an increase in runs/gesture actually reduced accuracy, an observation that needs greater scrutiny.)
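Finding that equilibrium amounts to stopping when extra epochs stop paying for themselves. The sketch below automates that decision on a synthetic saturating accuracy curve (the curve is illustrative only, not my measured data; thresholds are assumptions):

```python
import math

def synthetic_accuracy(epoch):
    """Synthetic stand-in for an observed accuracy curve: rises quickly,
    then plateaus near 97% (illustrative, not real measurements)."""
    return 0.97 * (1 - math.exp(-epoch / 600))

def find_plateau(accuracy_fn, step=100, min_gain=0.005, max_epochs=5000):
    """Stop adding epochs once another `step` epochs improves accuracy
    by less than `min_gain` -- i.e., once the curve has plateaued."""
    epoch = step
    while epoch + step <= max_epochs:
        gain = accuracy_fn(epoch + step) - accuracy_fn(epoch)
        if gain < min_gain:
            return epoch
        epoch += step
    return max_epochs

best = find_plateau(synthetic_accuracy)
print(best)  # plateaus near 2000 epochs on this synthetic curve
```

The same stop-on-plateau logic explains why pushing past 2000 epochs mostly buys runtime rather than accuracy.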
A “run” is defined as a single execution of the training process. I used 40 runs/gesture to train the machine and arrive at a 97% degree of accuracy. These are sample images of sequential runs. At 40 runs/gesture, the desired level of accuracy was observed; further runs provided only marginal gains (see graph above) once the curve began to plateau. The program can be trained with significantly more runs/gesture to achieve further gains. An increase in the number of gestures expands the program’s “vocabulary,” a measure of its “intelligence,” and hence its precision and utility.
This process is typically done using a specific algorithm and is often iterative, with the computer being trained over multiple rounds, or epochs, of the data similar to those displayed on the previous page. The quality and quantity of the data are crucial for the performance of the model, and it is also important to have a diverse and representative dataset. Both conditions will be met by the proposed plan to train the computer on 300 sign languages, each with an initial set of 100 gestures.
The principles of training a computer involve providing the computer with a large dataset of examples, along with corresponding outputs or labels. The computer then uses this data to learn patterns and relationships and adjust its internal parameters in order to improve its performance on new, unseen examples.
In the two graphs on the right, I have compared the accuracies of the various models, each of which was run with a different number of epochs. The 2000- and 3000-epoch models exhibit accuracies of 96% and 93% respectively, while the 1000-epoch model reaches only 76%.
The graph in green compares the latencies of the different models. This analysis shows that the 2000-epoch model is ideal: it takes 2 seconds, versus 3 seconds for the 1000-epoch model and 5 seconds for the 3000-epoch model.
I trained the LSTM on a custom-collected dataset of 900 videos (30 gestures × 30 recordings per gesture), each of which I recorded by hand. On this dataset I ran the training epochs (one epoch is one complete iteration over the training dataset, in which the system runs through the entire set of data, progressively updating its weights and biases), and the model returned a loss and a categorical-accuracy value for each epoch.
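Organizing those 900 clips for training means stacking them into the (samples, timesteps, features) layout an LSTM expects, with the gesture index as the label. A sketch of that assembly step, assuming 30 frames per clip and a hypothetical `load_clip` loader (the frame count and feature size are assumptions, not confirmed by the source):

```python
import numpy as np

N_GESTURES, N_RECORDINGS = 30, 30   # 30 gestures x 30 recordings = 900 clips
N_FRAMES, N_FEATURES = 30, 1530     # assumed per-clip frame count / feature size

def build_dataset(load_clip):
    """Assemble (X, y) in the (samples, timesteps, features) layout an
    LSTM trains on. `load_clip(g, r)` is a hypothetical loader returning
    recording r of gesture g as an (N_FRAMES, N_FEATURES) array."""
    X = np.empty((N_GESTURES * N_RECORDINGS, N_FRAMES, N_FEATURES))
    y = np.empty(N_GESTURES * N_RECORDINGS, dtype=int)
    for g in range(N_GESTURES):
        for r in range(N_RECORDINGS):
            X[g * N_RECORDINGS + r] = load_clip(g, r)
            y[g * N_RECORDINGS + r] = g   # label = gesture index
    return X, y

# Simulated loader standing in for reading recorded keypoint files from disk.
rng = np.random.default_rng(4)
X, y = build_dataset(lambda g, r: rng.random((N_FRAMES, N_FEATURES)))
print(X.shape, y.shape)  # (900, 30, 1530) (900,)
```

With labels one-hot encoded, per-epoch loss and categorical accuracy then fall out of any standard training loop over `(X, y)`.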
The Results
In this demonstration, I’m showing how a new shortcut can be used to convey a powerful message. By using a new gesture, one not rooted in any existing sign language, the shortcut can be adopted by anyone around the world who has a hearing disability. I coded one small gesture to quickly say: “This is an emergency.”
The result of the gesture recognition is displayed in the blue ribbon. (I’m showing closed captioning in English. But it could be in any language.) In the next stage, I can add a voice synthesizer which can convert this text into any spoken language from the thousands spoken around the world. All of this is possible on a smartphone app.
Epochs And Runs: Epochs and runs are related to the training process of machine learning models. An epoch is a complete iteration through all the training data, while a run is a single execution of the training process. Typically, a machine learning model is trained for multiple epochs, with each run consisting of multiple epochs. The number of epochs and runs can affect the performance of the model, with more epochs and runs leading to a better fit to the training data, but also increasing the risk of overfitting.
My project seeks to demonstrate how advanced AI capabilities can be leveraged to create a free, easy-to-use, low-latency sign language translator that can translate gestures in sign language to text and speech. Furthermore, the model ran locally with no privacy or security risks: the data was never stored or transmitted. With these features, the system is ready for implementation in everyday life.
The applications of such translative technology are near-limitless: running conversions between multiple different sign languages at will, adding closed captions to ASL videos for immediate understanding by global viewers, and more. Applications range from the most basic (e.g., ordering a pizza) to the critically important, such as describing an emergency or communicating effectively with a first responder, which can have life-and-death consequences.
The main technology can support the creation of various affiliated technology-enabled platforms, such as:
This technology can be used to make it easier for hearing-disabled and hearing nonverbal children to be educated.
This is probably one of the most important applications. Once a more natural conversation is possible between a hearing-disabled patient and a doctor, a more accurate and nuanced diagnosis will be possible.
This can exponentially expand the number of jobs and professions in which the hearing disabled can participate. This will have a direct consequence on their quality of life.
Create an App and launch this as a free service.
Nearly all alternative, existing systems built for object detection run in the cloud, i.e., on systems controlled by external companies, where data breaches are common and privacy is lost. To remedy this, my entire system runs on the user’s device, ensuring that there can be no violations of user privacy, as data is neither transmitted nor stored. Furthermore:
It will be possible to enable end-to-end encryption for all such conversations, similar to WhatsApp or Signal messaging, thus ensuring security.
Images and videos that are generated as part of any conversation are neither recorded, nor transmitted.
It will be possible to enable in-goggle translation of sign language gestures without the need for a laptop or external webcam.
Retinal movement detection can account for differences in visual acuity among recipients of the translated communication.
I am Mannat Jain, a student at Garden City High School in New York. I developed One Mudra, a sign language translation companion. I am creating the world’s first AI-based digital translator to enable hearing and speech-impaired individuals to communicate with ease.
Mannat Jain won Highest Honors, Most Distinguished Categorical Project in Physics, the LISTMELA Award of Excellence in STEM, and was a NY State Science Congress Finalist for his project, “Using Machine Learning to Translate Sign Language Gestures to Text and Speech.”
Ready to experience the power of seamless ASL translation? Join the One Mudra community today and embark on a journey of communication without boundaries. Download One Mudra from the App Store or Google Play Store and unlock a world of possibilities.
Join Now
Have questions, feedback, or suggestions? We'd love to hear from you! Connect with us on social media or reach out via email to share your thoughts and experiences with One Mudra. Break down barriers, connect with confidence, and discover the power of One Mudra today.