Application of Siamese Long Short-Term Memory to a Voice-Based Reading Verification System
DOI:
https://doi.org/10.29407/ewwjye08
Keywords:
LSTM, MFCC, Siamese Network, Reading Verification, Voice Similarity
Abstract
Voice-based reading verification is an approach that can support automatic evaluation of spoken readings, particularly in applications that need to compare readings without relying on text transcription. The approach matters because it can provide a more objective and efficient assessment than manual evaluation. This study discusses the application of the Siamese Long Short-Term Memory (LSTM) method to a voice-based reading verification system as an initial implementation. The system is designed in several stages: speech signal preprocessing, feature extraction using Mel-Frequency Cepstral Coefficients (MFCC) and pitch, modeling with a Siamese LSTM architecture, and similarity measurement using Euclidean distance. Two voice inputs, a reference recording and a test recording, are processed in parallel to produce a similarity score that serves as the basis for the decision. The results show that the Siamese LSTM method can be implemented in a voice-based reading verification system and can distinguish pairs of readings with different levels of similarity. These results indicate that the approach has the potential to serve as a foundation for developing a more comprehensive voice-based reading verification system in the future.
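The comparison stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the weights are random and untrained, biases are omitted, the 14-dimensional frames are synthetic stand-ins for MFCC-plus-pitch vectors (13 + 1 is an assumed split), and the names `init_params`, `encode`, `verify` and the threshold of 0.5 are hypothetical. It shows only the Siamese idea, in which one LSTM encoder with shared weights embeds both recordings and the Euclidean distance between the two embeddings drives the decision.

```python
import math
import random


def init_params(input_dim, hidden_dim, seed=0):
    """Random, UNTRAINED weights for one LSTM layer (biases omitted for brevity).

    One matrix per gate (input, forget, output, candidate), each acting on the
    concatenation [x_t, h_{t-1}]. Both Siamese branches share this one dict.
    """
    rng = random.Random(seed)
    width = input_dim + hidden_dim
    return {
        gate: [[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(hidden_dim)]
        for gate in ("i", "f", "o", "g")
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(params, frames):
    """Run the LSTM over a sequence of feature frames; return the final hidden state."""
    hidden_dim = len(params["i"])
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in frames:
        z = list(x) + h
        i = [sigmoid(a) for a in matvec(params["i"], z)]
        f = [sigmoid(a) for a in matvec(params["f"], z)]
        o = [sigmoid(a) for a in matvec(params["o"], z)]
        g = [math.tanh(a) for a in matvec(params["g"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def verify(params, reference, test, threshold=0.5):
    """Embed both inputs with the SAME weights and compare the embeddings.

    The threshold is a hypothetical value; in practice it would be tuned
    on validation pairs after training.
    """
    distance = euclidean(encode(params, reference), encode(params, test))
    return distance, distance <= threshold


# Synthetic frames standing in for per-frame MFCC + pitch features.
rng = random.Random(1)
reference = [[rng.uniform(-1, 1) for _ in range(14)] for _ in range(20)]
other = [[rng.uniform(-1, 1) for _ in range(14)] for _ in range(25)]

params = init_params(input_dim=14, hidden_dim=8)
d_same, accepted_same = verify(params, reference, reference)
d_diff, accepted_diff = verify(params, reference, other)
print("identical pair:", d_same, accepted_same)
print("different pair:", d_diff, accepted_diff)
```

Because both branches use the same weights, an identical pair always maps to a distance of zero, and the two recordings may differ in length since only the final hidden state is compared. With untrained weights the distance for a non-identical pair carries no meaning; a trained model would typically learn the weights with a pairwise objective such as a contrastive loss, which the abstract does not specify.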
License
Copyright (c) 2026 Muhamad Nur Sabani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Copyright on any article is retained by the author(s).
- The author grants the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The article and any associated published material are distributed under the Creative Commons Attribution-ShareAlike 4.0 International License.





