
One Mudra, Breaking Barriers

The World's First AI-Based Machine Learning Sign Language Translator



Testimonials

Founder & General Secretary
Deaf Can Foundation
Bhopal (M.P.)

Breaking Communication Barriers with One Mudra

Are you ready to break down language barriers and communicate effortlessly in American Sign Language (ASL)? Introducing One Mudra, your ultimate ASL translation companion. Developed by Mannat V Jain, a passionate 16-year-old student studying in the United States, One Mudra revolutionizes the way we interact and understand ASL.

...

About One Mudra

One Mudra is more than just a mobile application; it's a gateway to seamless communication. Harnessing the power of cutting-edge AI technology, One Mudra, with its bi-directional communication ability, bridges the gap between different sign languages and numerous global languages in real time. Open-sourced to trainers, it allows the app to be trained in various sign languages, expanding its reach and inclusivity. Whether you're a fluent sign language user or someone eager to learn, One Mudra empowers users to express themselves effectively and understand sign language conversations effortlessly.

Key Features

...
AI-Powered Automatic Grammar Correction

One Mudra's AI feature automatically corrects grammatical errors, ensuring clear and accurate communication in real time.

...
Real-Time Translation

Say goodbye to language barriers. One Mudra instantly translates ASL signs into various global languages, ensuring smooth communication in any setting.

...
Translates to more than 100 languages

From English and Spanish to Mandarin and French, One Mudra supports a wide range of languages, making it accessible to users worldwide.

...
Community & Custom training tool

A backend training tool that lets the larger community of sign language trainers contribute to the model, and lets users create personalised signs.

...
Bi-Directional Communication

The world's first digitally enabled bi-directional communication, converting both sign to text and text to sign.

...
User-Friendly Interface

Designed with simplicity in mind, One Mudra offers an intuitive interface for users of all ages and backgrounds. Navigate effortlessly and start translating with just a few taps.

How It Works

Using One Mudra is as simple as pointing your device's camera at ASL signs. The app instantly detects and translates the signs into your preferred language, allowing for seamless communication in any situation. Whether you're having a conversation with a Deaf friend, attending an ASL class, or exploring a new culture, One Mudra is your trusted companion every step of the way.

Abstract
The Problem: Hearing Loss Affects A Large, Underserved Population of 430 Million

Hearing loss is one of modern society’s most understated and overlooked clinical conditions. As per the WHO, over 430 million people around the world require rehabilitation to address “disabling hearing loss.” Another large segment of sign language users is ‘hearing nonverbal children,’ who are nonverbal due to conditions such as Down syndrome, autism, cerebral palsy, trauma, and brain or speech disorders.

Existing Solutions Are Not Scalable, Nor Cost-Efficient

  • Sign languages (e.g., American Sign Language) are used by less than 2% of the hearing disabled. Comprehension within the broader society, i.e., the ‘hearing population,’ is also low.
  • No common standards: There are more than 300 sign languages around the world, each with its own grammar and vocabulary. It is not easy to translate one sign language into another (e.g., ASL to Chinese Sign Language).
  • Prosthetic devices (i.e., hearing aids) are expensive, especially for low-income countries, and do not provide a solution for disabling hearing loss.
  • Consequences: Social isolation and depression amongst those with disabling hearing loss. The absence of effective communication has real-life consequences, e.g., during interactions with first responders.

My Machine Learning Solution Runs on a Smartphone, Is Free, Is Accessible in Any Language, and Expands the Population With Whom Anyone Who Is Hearing Disabled Can Communicate to Include the Entire World.

  • My machine learning platform unifies the complex, non-standardized gestures of 300 discrete sign languages into any spoken language on the planet, thus enlarging two-way communication exponentially.
  • The process involves training a machine learning model on a dataset of new symbols that can unify the 300 global sign languages. The trained model recognizes these hand and body poses and gestures through pose estimation, converts them to text, and then to spoken words in any language. An animation engine then reverses the process, converting spoken words into text or gestures, allowing for two-way communication.

Introduction
LSTM Models

The model itself is an LSTM (Long Short-Term Memory), a type of Recurrent Neural Network, which uses Google’s MediaPipe to layer a mesh onto the hands and face of the user. It then tracks the mesh to determine motion and interprets the motion as text (which it has learned to do from the training dataset). As a second step, it takes this converted text and runs a text-to-speech algorithm to speak it aloud in the desired language.
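The per-frame input to such a model is a flat vector of landmark coordinates. A minimal sketch of that flattening step follows; the landmark count matches MediaPipe's 21-point hand model, but the helper function and the zero-vector convention for missed detections are illustrative, not the project's actual code:

```python
# Sketch: flattening per-frame (x, y, z) landmarks into a fixed-length
# feature vector, since an LSTM expects one vector per time step.
# 21 keypoints per hand matches MediaPipe's hand model.

NUM_HAND_LANDMARKS = 21

def flatten_landmarks(landmarks, expected=NUM_HAND_LANDMARKS):
    """Turn a list of (x, y, z) landmark tuples into a flat feature vector.
    If the hand was not detected, return a zero vector of the same length
    so every frame has an identical shape."""
    if not landmarks:
        return [0.0] * (expected * 3)
    return [coord for point in landmarks for coord in point]

# One detected frame: 21 landmarks -> 63 features
frame = [(0.1 * i, 0.2 * i, 0.0) for i in range(NUM_HAND_LANDMARKS)]
vector = flatten_landmarks(frame)

# A missed detection still yields a 63-length vector of zeros
missing = flatten_landmarks([])
```

Padding missed detections with zeros keeps every sequence rectangular, which is what a batched LSTM requires.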

Establishing the Underlying Logic and Principles of Organization: The Long Short-Term Memory

I chose an LSTM model for my project, a type of RNN (Recurrent Neural Network), because of the comparatively little training data it needs. I’ve outlined the benefits of the RNN architecture in the table below. Plain RNNs, however, struggle to retain information across long sequences. LSTMs solve this problem by introducing a memory cell, which can store information over a prolonged period. The cell has gates that control the flow of information into and out of the cell, allowing the network to selectively retain or forget information as needed. This is important for recognizing a vast library of gestures whose frequency of repetition is low.
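To illustrate the gating described above, here is one time step of a single-unit LSTM cell written out by hand. The shared weight values are illustrative constants; real models (e.g. `tf.keras.layers.LSTM`) learn separate weights per gate and vectorize this across many units:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w=0.5, u=0.5, b=0.0):
    """One time step of a toy single-unit LSTM cell."""
    f = sigmoid(w * x + u * h_prev + b)    # forget gate: keep old memory?
    i = sigmoid(w * x + u * h_prev + b)    # input gate: admit new info?
    g = math.tanh(w * x + u * h_prev + b)  # candidate memory content
    o = sigmoid(w * x + u * h_prev + b)    # output gate: expose memory?
    c = f * c_prev + i * g                 # memory cell update
    h = o * math.tanh(c)                   # hidden state (the output)
    return h, c

# Feed a short sequence: a spike followed by silence. The memory cell
# carries information from the first input across the later steps.
h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c)
```

The key point is the additive cell update `c = f * c_prev + i * g`: because old memory is scaled rather than squashed through a nonlinearity each step, information can persist across many frames of a gesture.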

Principles of Spatial Organization

The principal effort is to train the computer to recognize the spatial coordinates (x, y, z) of sign gestures which, while intended to be standardized, almost always involve recognition variations because each user “writes” a gesture slightly differently.

To reduce these variations, the program is trained to recognize each gesture of this new, unifying language over an increasing number of runs, until it reaches an acceptable level of accuracy.

An extract of a sample run of my code is pasted below. The output of such runs can stretch to thousands of lines, deciphering iconography organized as follows: deictic (location or time), motor (general gestures), symbolic (representational), iconic (understandable concepts), and metaphoric (conveyed through analogy).

Sample Echo Run
Sample Demo Run
Methodology
Hypothesis and Variables

Hypothesis

By using machine learning techniques, it is possible to create a system that can recognize sign language with a high degree of accuracy and translate it into any spoken or written language in real time. This system would use a database of signs (organized around the deictic, motor, symbolic, iconic, and metaphoric categories) and machine learning algorithms to analyze and interpret the signs in real time. The system would then generate spoken or written text representing the meaning of the signs being used, allowing people who use sign language to communicate with those who do not understand it.

There is an optimal number of training epochs and runs-per-gesture for my translator system. (An epoch refers to a complete iteration through a dataset during training of a machine learning model.) My program functioned best at the dual equilibrium of 40 runs/gesture and 2000 epochs/cycle, suggesting that this is the optimal configuration for such projects. Adding more data only results in longer runtimes, while making no statistically significant enhancement in accuracy. (In some outlier results, an increase in the number of runs/gesture caused reduced accuracy, an observation that needs greater scrutiny.)

Independent Variables

  • Number of epochs necessary to train the model on a two-gesture database.
  • Number of runs/gesture.

Dependent Variables

  • The time it takes to train the model.
  • The speed of recognition.
  • The accuracy of the recognition.

Platform

  • This project was built in Jupyter, an interactive development environment commonly used to write code for ML and AI-related tasks.
  • In the project, I integrated systems including Google MediaPipe (which provides face-mesh and hand tracking) and TensorFlow, an open-source library for machine learning.
  • The construction can be broken down into:
    1. Using MediaPipe to generate face meshes (to track the user's motion).
    2. Using TensorFlow to build an LSTM that can understand the data.
    3. Recording nearly 100 gestures.
    4. Training the system on that data.
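Steps 3 and 4 above amount to grouping the recorded frames into fixed-length sequences and pairing each sequence with a gesture label, which is the shape an LSTM trainer expects. A minimal sketch, assuming 30 frames per recording; the gesture names and dummy frame vectors are hypothetical stand-ins for real keypoint data:

```python
# Sketch: assembling (sequences, labels) from per-gesture recordings.
FRAMES_PER_SEQUENCE = 30  # one recording = 30 consecutive frames

def build_dataset(recordings):
    """recordings: dict mapping gesture name -> list of recordings,
    each recording being a list of per-frame feature vectors.
    Returns (sequences, labels), aligned index-by-index."""
    gesture_names = sorted(recordings)  # stable name -> integer label order
    sequences, labels = [], []
    for label, name in enumerate(gesture_names):
        for rec in recordings[name]:
            if len(rec) == FRAMES_PER_SEQUENCE:  # drop incomplete captures
                sequences.append(rec)
                labels.append(label)
    return sequences, labels

# Two hypothetical gestures, two recordings each, dummy 63-feature frames
dummy_frame = [0.0] * 63
data = {
    "hello":     [[dummy_frame] * 30, [dummy_frame] * 30],
    "emergency": [[dummy_frame] * 30, [dummy_frame] * 30],
}
X, y = build_dataset(data)
```

The resulting `X` has shape (samples, 30, 63), ready to feed to a sequence model, with `y` holding the integer class for each sample.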

The Methodology: Training the Machine

A “run” is defined as a single execution of the training process. I used 40 runs/gesture to train the machine and arrive at a 97% degree of accuracy. These are sample images of sequential runs. At 40 runs/gesture, the desired level of accuracy was observed; further runs provided only marginal gains (see graph above) once the curve began to plateau. The program can be trained with significantly more runs/gesture to achieve higher accuracy. An increase in the number of gestures is a measure of the “vocabulary” of the program, and of its ‘intelligence,’ and hence of its precision and utility.

Results

This process is typically done using a specific algorithm and is often iterative, with the computer being trained over multiple rounds, or epochs, of the data, similar to the ones displayed on the previous page.

The principles of training a computer involve providing the computer with a large dataset of examples, along with corresponding outputs or labels. The computer then uses this data to learn patterns and relationships and adjust its internal parameters in order to improve its performance on new, unseen examples.

In the two graphs on the right, I have compared the accuracies of the various models, each of which was run with a different number of epochs. The 2000- and 3000-epoch models exhibit accuracies of 96% and 93% respectively, while the 1000-epoch model reaches only 76%.

The graph in green compares the latencies of the different models. The result of this analysis is that the 2000-epoch model is ideal: it takes 2 seconds, versus 3 seconds for the 1000-epoch model and 5 seconds for the 3000-epoch model.

Quality and Quantity of Data is Key

The quality and quantity of the data is crucial for the performance of the model, and it's also important to have a diverse and representative dataset. Both of these conditions will be met in the manner in which it is proposed to train the computer with 300 sign languages, each with their 100 initial gestures.

I trained the LSTM on a custom-collected dataset of 900 videos (30 gestures × 30 recordings/gesture), all collected by hand. On these gestures, I ran the epochs (one epoch is one complete iteration over the training dataset, in which the system runs through the entire set of data, progressively updating its weights and biases), and the model returned a loss and a categorical-accuracy value for each epoch.
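The categorical-accuracy value reported after each epoch can be sketched as follows; the prediction and label values here are illustrative, not results from the actual model:

```python
# Sketch: categorical accuracy = fraction of samples whose
# highest-probability class matches the true label.

def categorical_accuracy(probabilities, labels):
    """probabilities: list of per-class probability lists (one per sample).
    labels: list of true class indices."""
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        if predicted == label:
            correct += 1
    return correct / len(labels)

# Three illustrative samples over two classes; the first two are correct.
preds = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
truth = [0, 1, 1]
acc = categorical_accuracy(preds, truth)
```

Watching this value (alongside the loss) across epochs is how the plateau around 2000 epochs described in the Results was identified.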

The Results

In this demonstration, I’m showing how a new shortcut can be used to convey a powerful message. By using a new gesture, one not rooted in any existing sign language, the shortcut can be adopted by anyone around the world who has a hearing disability. I coded one small gesture to quickly say: “This is an emergency.”

The result of the gesture recognition is displayed in the blue ribbon. (I’m showing closed captioning in English. But it could be in any language.) In the next stage, I can add a voice synthesizer which can convert this text into any spoken language from the thousands spoken around the world. All of this is possible on a smartphone app.

Conclusion

The Technology

Epochs And Runs: Epochs and runs are related to the training process of machine learning models. An epoch is a complete iteration through all the training data, while a run is a single execution of the training process. Typically, a machine learning model is trained for multiple epochs, with each run consisting of multiple epochs. The number of epochs and runs can affect the performance of the model, with more epochs and runs leading to a better fit to the training data, but also increasing the risk of overfitting.
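The epoch/run relationship described above can be sketched as follows; `train_one_epoch` is a hypothetical stand-in for a real pass over the training data:

```python
# Sketch: a "run" is one full execution of training, and within a run
# the model iterates over the whole dataset once per epoch.

def train_one_epoch(loss):
    """Stand-in for one complete pass over the training data:
    here it simply shrinks the loss by a fixed factor."""
    return loss * 0.9

def run_training(epochs):
    """One 'run': repeat the epoch step `epochs` times,
    recording the loss after each epoch."""
    history = []
    loss = 1.0
    for _ in range(epochs):
        loss = train_one_epoch(loss)
        history.append(loss)
    return history

history = run_training(epochs=5)
```

In the real experiments, this history of per-epoch loss and accuracy values is what reveals overfitting: past the optimum, training loss keeps falling while accuracy on new gestures stops improving.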

Conclusion and Future Applications

My project seeks to demonstrate how advanced AI capabilities can be leveraged to create a free, easy-to-use, and low-latency sign language translator that converts gestures in sign language to text and speech. Furthermore, the model ran locally with no privacy or security risks: the data was never stored or transmitted. With these features, the system is ready for implementation in everyday life.

The applications of such translative technology are near-limitless: running conversions between multiple different sign languages at will, adding closed captions to ASL videos to allow immediate understanding by global viewers, and more. Applications range from the most basic (e.g., ordering a pizza) to matters of critical importance, such as describing an emergency or communicating effectively with a first responder, which can have life-and-death consequences.

The main technology can support the creation of various affiliated technology-enabled platforms, such as:

Next Step:

Create an App and launch this as a free service.

Risk Mitigation

Nearly all existing alternative systems built for object detection run in the cloud, i.e., on a system controlled by an external company, where data breaches are common and privacy is lost. To remedy this, my entire system runs on the user’s device, ensuring that there can be no violations of user privacy, as data is neither transmitted nor stored. Furthermore:

Privacy

It will be possible to enable end-to-end encryption for all such conversations, similar to WhatsApp or Signal messaging, thus ensuring security.

Biometric

Images and videos that are generated as part of any conversation are neither recorded, nor transmitted.

  • The entire gesture-to-text-to-speech translation happens on the originating handheld device, and the only content transmitted is the spoken word, which is encrypted.
  • This addresses privacy concerns, especially where (hearing nonverbal) children are concerned, and also opens up clinical uses for this product.

Future Experiments & Next Steps

Free to Use Mobile App

  • Exponentially expand access and reach globally
  • Create APIs to enable real-time communication during Zoom/Teams meetings or webinars by providing an intuitive and user-friendly interface for participants to interact using sign language.

Portable deciphering of sign language:

It will be possible to enable in-goggle translation of sign language gestures without the need for a laptop or external webcam.

Biometric:

Retinal movement detection can account for differences in visual acuity among recipients of the translated communication.



Mannat V Jain

Developer Of One Mudra

Meet the Developer

I am Mannat Jain, a student at Garden City High School in New York. I developed One Mudra, a sign language translation companion. I am creating the world’s first AI-based digital translator to enable hearing and speech-impaired individuals to communicate with ease.

...

Science Award

Mannat Jain won Highest Honors, Most Distinguished Categorical Project in Physics, the LISTMELA Award of Excellence in STEM, and was a NY State Science Congress Finalist for his project, “Using Machine Learning to Translate Sign Language Gestures to Text and Speech.”

...

Join the One Mudra Community

Ready to experience the power of seamless ASL translation? Join the One Mudra community today and embark on a journey of communication without boundaries. Download One Mudra from the App Store or Google Play Store and unlock a world of possibilities.


Connect with Us

Have questions, feedback, or suggestions? We'd love to hear from you! Connect with us on social media or reach out via email to share your thoughts and experiences with One Mudra. Break down barriers, connect with confidence, and discover the power of One Mudra today.