This study addresses the challenges of low accuracy and high computational demands in Tibetan speech recognition by investigating the application of end-to-end networks. We propose a decoding strategy that integrates Connectionist Temporal Classification (CTC) and attention mechanisms, capitalizing on the benefits of automatic alignment and attention weight extraction. The Conformer a... https://www.lolasalinas.com/
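The abstract does not say how the CTC and attention scores are integrated. As a minimal sketch, assuming the common log-linear interpolation used in hybrid CTC/attention decoding, candidate transcripts could be rescored as below. The names `joint_score`, `rescore`, and `ctc_weight`, the dictionary layout, and the default weight of 0.3 are all hypothetical and not taken from the paper.

```python
def joint_score(ctc_logp: float, att_logp: float, ctc_weight: float = 0.3) -> float:
    """Interpolate CTC and attention log-probabilities for one hypothesis.

    Assumption: score = w * log p_ctc + (1 - w) * log p_att, with 0 <= w <= 1.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


def rescore(hypotheses: list[dict], ctc_weight: float = 0.3) -> dict:
    """Return the hypothesis with the highest joint score.

    Each hypothesis is a dict with hypothetical keys 'text', 'ctc_logp', 'att_logp'.
    """
    return max(
        hypotheses,
        key=lambda h: joint_score(h["ctc_logp"], h["att_logp"], ctc_weight),
    )


# Illustrative values only; these are not results from the study.
candidates = [
    {"text": "hyp_a", "ctc_logp": -4.0, "att_logp": -1.0},
    {"text": "hyp_b", "ctc_logp": -2.0, "att_logp": -1.5},
]
best = rescore(candidates, ctc_weight=0.3)
```

With `ctc_weight=0.0` the attention decoder alone decides the ranking; raising the weight lets the CTC alignment score penalize hypotheses that the attention decoder prefers but that align poorly with the audio.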