Pages: 35-46
Subham Pandey, Sumaiya Tahseen, Rohit Pathak, Hina Parveen, Maruti Maurya
This work proposes a vision-based approach to real-time sign language translation for Indian Sign Language (ISL). The system uses state-of-the-art deep learning architectures, including Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, and Transformer-based encoder-decoder models, for gesture recognition in both isolated and continuous forms. Data preprocessing techniques such as Dynamic Time Warping (DTW) were applied to augment and normalize gesture sequences from a custom ISL dataset and public ASL datasets. Model performance was quantitatively evaluated using precision, recall, F1-score, BLEU, ROUGE, CER (character error rate), and WER (word error rate). The Transformer-based model outperformed the other architectures, achieving a BLEU score of 0.74 and a classification accuracy of 96.1%. The developed desktop application enables real-time ISL-to-English translation at 18 FPS without requiring external sensors, while ablation studies validate the benefits of multimodal fusion and pose-language alignment. This work demonstrates a robust, scalable approach to non-intrusive sign language translation, advancing accessibility for the DHH community.
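To illustrate the kind of DTW-based normalization the abstract refers to, the following Python sketch warps a variable-length gesture keypoint sequence onto a fixed-length reference template. It is a minimal, hypothetical example and not the authors' implementation; the function name dtw_align, the landmark dimensions, and the averaging scheme for mapped frames are assumptions made for illustration.

import numpy as np

def dtw_align(seq, reference):
    """Warp a gesture keypoint sequence (T, D) onto a reference sequence (M, D)."""
    n, m = len(seq), len(reference)
    # Accumulated cost matrix, initialised to infinity except the origin.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq[i - 1] - reference[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack the optimal warping path from (n, m) to (1, 1).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    # For each reference frame, average the source frames mapped onto it.
    aligned = np.zeros((m, seq.shape[1]))
    counts = np.zeros(m)
    for si, ri in path:
        aligned[ri] += seq[si]
        counts[ri] += 1
    return aligned / np.maximum(counts, 1)[:, None]

# Example: warp a 40-frame gesture onto a 30-frame template
# (e.g. 21 hand landmarks x (x, y) coordinates = 42 features per frame).
gesture = np.random.rand(40, 42)
template = np.random.rand(30, 42)
print(dtw_align(gesture, template).shape)  # (30, 42)

In a pipeline like the one described, such alignment would bring isolated gesture clips to a common temporal length before feeding them to the CNN/LSTM or Transformer models.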
Transformer-based Encoder-Decoder, Spatiotemporal Gesture Modeling, Indian Sign Language (ISL), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Dynamic Time Warping (DTW), Real-time Sign Language Translation