Real-Time Sign Language Captioning for Video Chats
Supervisor: Mrs. K. Vindhya Rani (Assistant Professor, MVGRCOE)
Abstract
This project proposes a solution for enabling accessible communication for the deaf and hard-of-hearing community during video chats. The system uses machine learning and computer vision to detect sign language gestures in real time and translate them into text captions. The goal is to provide a more inclusive communication platform that facilitates seamless virtual conversations for everyone, regardless of hearing ability. The project integrates tools for video processing, gesture recognition, and low-latency communication into an efficient, accessible web-based platform.
Project Highlights
Real-time gesture detection through the user's webcam for instant translation of sign language.
Seamless peer-to-peer video calls with integrated captioning features.
Web-based application with cross-platform compatibility and minimal setup required.
Recognition model achieving approximately 90% accuracy in controlled environments.
Modular architecture allowing easy extension to support additional gestures or languages.
Dataset & Model Overview
The system is built on a custom-designed dataset of approximately 100 sign language classes, with roughly 100 images per class. To ensure robustness and accuracy:
Double-handed signs contain 100 images per class.
Single-handed signs include 50 images per hand (left and right).
Data augmentation techniques such as rotation, flipping, and scaling were applied to simulate real-world variability (a minimal augmentation sketch follows this list).
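The augmentation step can be illustrated with a short OpenCV script. The sketch below is a minimal, hypothetical example, not the project's exact pipeline: the folder layout, file names, and rotation/scale ranges are assumptions; it generates rotated, flipped, and rescaled variants of each stored training image.

```python
# Minimal augmentation sketch (hypothetical paths and parameters): rotate,
# flip, and scale each training image to simulate real-world variability.
import os
import cv2
import numpy as np

def augment_image(img):
    """Return a few simple variants of one training image."""
    h, w = img.shape[:2]
    variants = []

    # Small random rotation around the image centre.
    angle = np.random.uniform(-15, 15)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    variants.append(cv2.warpAffine(img, M, (w, h)))

    # Horizontal flip (left/right hand symmetry).
    variants.append(cv2.flip(img, 1))

    # Mild scaling, then resize back to the original resolution.
    scale = np.random.uniform(0.9, 1.1)
    scaled = cv2.resize(img, None, fx=scale, fy=scale)
    variants.append(cv2.resize(scaled, (w, h)))

    return variants

if __name__ == "__main__":
    src_dir = "dataset/raw"        # assumed layout: dataset/raw/<class>/<image>.jpg
    dst_dir = "dataset/augmented"
    for cls in os.listdir(src_dir):
        os.makedirs(os.path.join(dst_dir, cls), exist_ok=True)
        for name in os.listdir(os.path.join(src_dir, cls)):
            img = cv2.imread(os.path.join(src_dir, cls, name))
            if img is None:
                continue
            for i, aug in enumerate(augment_image(img)):
                cv2.imwrite(os.path.join(dst_dir, cls, f"{i}_{name}"), aug)
```

Keeping augmented copies alongside the originals lets the same landmark-extraction and training scripts run unchanged over the enlarged dataset.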
A Random Forest classifier was chosen for its effectiveness with structured landmark data and its suitability for real-time use. Mediapipe is used for hand landmark extraction, and Flask provides the API layer that ties the model into the web application; a minimal training sketch follows below.
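Under these choices, training reduces to fitting scikit-learn's RandomForestClassifier on flattened landmark coordinates. The sketch below is illustrative only: the file names (features.npy, labels.npy, sign_rf.pkl), the fixed 126-value feature layout, the split ratio, and the tree count are assumptions rather than the project's actual configuration.

```python
# Minimal training sketch (hypothetical file names and parameters): fit a
# Random Forest on flattened Mediapipe hand-landmark coordinates and save it.
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed format: features.npy is (n_samples, 126) -- 2 hands x 21 landmarks
# x (x, y, z), zero-padded for single-handed signs; labels.npy holds the
# corresponding sign-class names.
X = np.load("features.npy")
y = np.load("labels.npy")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))

with open("sign_rf.pkl", "wb") as f:
    pickle.dump(clf, f)
```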
System Architecture & Workflow
Video feed captured using getUserMedia API.
Hand landmark detection using Mediapipe Hands.
Feature extraction from landmark points followed by classification using Random Forest.
Flask API returns recognized signs in real time for captioning during video calls (an inference sketch follows this list).
WebRTC & PeerJS enable live, low-latency video streaming between users.
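Steps 2 through 4 of this workflow can be sketched as a single Flask service that accepts a frame, runs Mediapipe Hands, and classifies the flattened landmarks with the trained Random Forest. The endpoint name (/predict), the request format (base64-encoded JPEG in JSON), the fixed two-hand feature length, and the model file name are assumptions made for illustration, not the project's exact API.

```python
# Minimal inference sketch (endpoint, field names, and model file are
# assumptions): decode a webcam frame, extract Mediapipe hand landmarks,
# classify them with the trained Random Forest, and return the sign as JSON.
import base64
import pickle

import cv2
import numpy as np
import mediapipe as mp
from flask import Flask, request, jsonify

app = Flask(__name__)
model = pickle.load(open("sign_rf.pkl", "rb"))          # hypothetical model file
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)

@app.route("/predict", methods=["POST"])                # hypothetical route
def predict():
    # Frame arrives as a base64-encoded JPEG in the JSON body.
    data = base64.b64decode(request.json["frame"])
    frame = cv2.imdecode(np.frombuffer(data, np.uint8), cv2.IMREAD_COLOR)

    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return jsonify({"sign": None})

    # Flatten (x, y, z) of all detected landmarks into one feature vector,
    # zero-padded to the assumed two-hand length used at training time.
    feats = []
    for hand in results.multi_hand_landmarks:
        for lm in hand.landmark:
            feats.extend([lm.x, lm.y, lm.z])
    feats += [0.0] * (126 - len(feats))

    sign = model.predict([feats])[0]
    return jsonify({"sign": str(sign)})

if __name__ == "__main__":
    app.run(port=5000)
```

In this arrangement the browser captures frames via getUserMedia, posts them to the endpoint, and overlays the returned sign as a caption on the ongoing WebRTC/PeerJS call.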
Key Technologies & Tools
Frontend: HTML, CSS, JavaScript, WebRTC, PeerJS
Backend: Node.js, Flask, Socket.io
Machine Learning & Computer Vision: OpenCV, Mediapipe, Random Forest Classifier
Development Tools: GitHub, VS Code, Flask API, Data Augmentation Tools
Testing & Validation
Several testing procedures were conducted to ensure system reliability:
API response testing to verify real-time recognition performance (a latency-check sketch follows this list).
Model accuracy testing using controlled hand gestures and varied lighting conditions.
Connection testing to validate stable peer-to-peer video calls.
Live testing with multiple users to ensure robust, real-time functionality.
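One simple way to exercise the API-response test is to post a sample frame repeatedly and record round-trip times. The sketch below is a hedged example against the hypothetical /predict endpoint from the earlier sketch; the sample image and request count are assumptions.

```python
# Minimal latency-check sketch (endpoint and payload format are assumptions):
# send a sample frame to the recognition API repeatedly and report timings.
import base64
import statistics
import time

import requests

URL = "http://localhost:5000/predict"     # hypothetical endpoint from the sketch above

with open("sample_frame.jpg", "rb") as f:  # hypothetical test image
    payload = {"frame": base64.b64encode(f.read()).decode("ascii")}

timings = []
for _ in range(50):
    t0 = time.perf_counter()
    resp = requests.post(URL, json=payload, timeout=5)
    timings.append((time.perf_counter() - t0) * 1000)
    resp.raise_for_status()

print(f"median latency: {statistics.median(timings):.1f} ms")
print(f"p95 latency:    {sorted(timings)[int(0.95 * len(timings))]:.1f} ms")
```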
Real-World Applications
Inclusive online education platforms for hearing-impaired students.
Corporate video conferencing tools promoting workplace diversity.
Healthcare consultations for deaf patients to improve accessibility.
Social media platforms to encourage diverse, barrier-free communication.
Future Enhancements
Integration of deep learning models (e.g., Convolutional Neural Networks, Recurrent Neural Networks) for higher accuracy.
Support for dynamic (continuous) sign language gestures and full-sentence recognition.
Deployment of mobile applications with optimized performance for on-the-go use cases.
Automatic speech-to-sign and sign-to-speech translation for full-duplex communication.
Conclusion
This project demonstrates the feasibility of integrating real-time sign language recognition into video conferencing systems. By combining landmark-based gesture recognition, a lightweight machine learning model, and robust low-latency communication protocols, the system provides a functional, scalable, and socially impactful solution. This work lays the foundation for more inclusive digital environments and promotes accessible technology for all.