Improving Speech Recognition Accuracy with Deep Learning Models

Authors

  • Taraneh Ranjbar, Department of Computer Science, Tarbiat Modares University (Author)

Keywords:

Speech recognition, deep learning, neural networks, acoustic modeling, language modeling, feature extraction, accuracy improvement

Abstract

Deep learning has substantially advanced speech recognition, yet achieving high accuracy across diverse acoustic environments and languages remains a challenge. This study examines how deep learning models can improve speech recognition accuracy, focusing on advanced neural network architectures and training techniques. By training on large-scale datasets and applying transfer learning, our approach adapts to varied linguistic and acoustic conditions, improving robustness and precision.

We introduce a hybrid model incorporating convolutional neural networks (CNNs) and recurrent neural networks (RNNs), specifically designed to capture temporal dependencies and spatial hierarchies inherent in speech signals. This model architecture is augmented with attention mechanisms, which selectively focus on pertinent features, enhancing the model's ability to generalize across different speakers and dialects. Additionally, the implementation of data augmentation and noise-injection strategies during training further bolsters the model's resilience to environmental variations.
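The abstract does not say how noise is injected during training. As a minimal sketch of one common approach, the Python snippet below mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio (SNR). The function name `add_noise_at_snr`, the 440 Hz test tone, the 16 kHz sample rate, and the 10 dB target are all illustrative assumptions, not details taken from the study.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise mixed in at `snr_db` dB.

    Hypothetical helper for illustration; assumes a non-empty list of floats.
    """
    rng = rng or random.Random()
    # Average power of the clean signal.
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the drawn noise so its empirical power matches the target.
    drawn_power = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(noise_power / drawn_power)
    return [s + scale * n for s, n in zip(signal, noise)]


# Toy input: one second of a 440 Hz tone sampled at 16 kHz (illustrative values).
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise_at_snr(clean, snr_db=10, rng=random.Random(0))
```

In a training pipeline, the SNR would typically be sampled at random for each utterance so that the model sees a range of noise levels. Recorded background noise could replace the Gaussian samples without changing the scaling logic.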

Our experimental results on benchmark datasets show a significant reduction in word error rate (WER) compared to traditional speech recognition systems. The proposed model consistently outperforms baseline models across multiple metrics, indicating its suitability for real-world scenarios in which speech recognition systems must operate reliably under suboptimal conditions. The findings also underscore the importance of model interpretability: the attention mechanism offers insight into feature importance and the model's decision processes.
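For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The sketch below computes it with a standard word-level edit distance. The function name `word_error_rate` and the example sentences are illustrative assumptions; the abstract does not describe the scoring tool actually used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.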

In conclusion, this research contributes a novel deep learning framework that substantially enhances speech recognition accuracy. The integration of CNNs, RNNs, and attention mechanisms, coupled with rigorous training protocols, presents a compelling solution to the challenges of modern speech recognition tasks. This approach sets the stage for future explorations into more adaptive and context-aware speech recognition technologies, fostering advancements in human-computer interaction.
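The abstract does not specify which form of attention the model uses. As a generic illustration only, the sketch below shows softmax attention pooling over a sequence of frame-level feature vectors: each frame receives a score, the scores are normalized into weights that sum to one, and the output is the weighted sum of the frames. The function name `attention_pool` and the toy frames and scores are assumptions; in a trained model, learned layers would produce the scores rather than a hand-written list.

```python
import math


def attention_pool(frames, scores):
    """Softmax-normalize one score per frame; return (weights, weighted sum of frames)."""
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    # Weighted sum of the frame vectors, one output value per feature dimension.
    context = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return weights, context


# Three toy 2-dimensional frames; the third frame has the highest score.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]
weights, context = attention_pool(frames, scores)
```

Inspecting `weights` shows which frames contributed most to the pooled representation. This is the sense in which attention weights are often read as a rough indicator of feature importance.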

Published

2023-12-31

Issue

Article Type

Articles

How to Cite

Improving Speech Recognition Accuracy with Deep Learning Models. (2023). International Journal of Advanced Human Computer Interaction, 1(1). https://www.ijahci.com/index.php/ijahci/article/view/91