Winter2018

Deep Language Learning
Brain Stimulation
Neurotechnology may play a role in the foreign-language curricula of the future. A $12 million DARPA grant to Johns Hopkins University (Baltimore, MD) and collaborating institutions is exploring whether the ability to learn a foreign language can be enhanced by using electrical stimulation of the vagus nerve to modulate activity in the auditory and speech areas of the brain (e.g., Engineer et al., 2015).
Brave New Language-Learning World
DNN-powered speech technology is likely to play an increasingly prominent role in language-learning curricula. As computational power increases and costs fall, simulation technology will enable a student to inhabit a virtual language world for hours on end. This is likely the future of language instruction, for there is no better way to learn a foreign tongue than to live in a community where it is spoken. Will it matter that the language community exists only virtually? Virtual reality gaming devices, such as the Oculus Rift™, will only improve over time, enhancing their educational potential. Indeed, language learning could become a “killer app” for educational VR. Stay tuned.
References
Arık, S. O., Chrzanowski, M., Coates, A., Diamos, G., Gibiansky, A., Kang, Y., Li, X., Miller, J., Ng, A., Raiman, J., Sengupta, S., and Shoeybi, M. (2017). Deep voice: Real-time neural text-to-speech. Proceedings of Machine Learning Research, 34th International Conference on Machine Learning, Sydney, Australia, August 6-11, 2017, vol. 70, pp. 195-204.
Bach, N., Eck, M., Charoenpornsawat, P., Köhler, T., Stüker, S., Nguyen, T., Hsiao, R., Waibel, A., Vogel, S., Schultz, T., and Black, A. W. (2007). The CMU TransTac 2007 eyes-free and hands-free two-way speech-to-speech translation system. Proceedings of the International Workshop on Spoken Language Translation 7.
Baur, C., Chua, C., Gerlach, J., Rayner, M., Russell, M., Strik, H., and Wei, X. (2017). Overview of the 2017 spoken CALL shared task. Proceedings of the 7th International Speech Communication Association Workshop on Speech and Language Technology in Education, Stockholm, Sweden, August 25-26, 2017, pp. 71-78. https://doi.org/10.21437/SLaTE.2017-13.
Bax, S. (2003). CALL: Past, present and future. System 31, 13-28.
Bernstein, J., and Cheng, J. (2007). Logic and validation of fully automatic spoken English test. In Holland, M., and Fisher, F. P. (Eds.), The Path of Speech Technologies in Computer Assisted Language Learning: From Research Toward Practice. Routledge, Florence, KY, pp. 174-194.
Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer-Verlag, New York.
Chang, S., Shastri, L., and Greenberg, S. (2000). Automatic phonetic transcription of spontaneous speech (American English). Proceedings of the 6th International Conference on Spoken Language Processing, Beijing, China, October 16-20, 2000, vol. 4, pp. 330-333.
Chapelle, C. A., and Sauro, S. (Eds.). (2017). The Handbook of Technology and Second Language Teaching and Learning. Wiley-Blackwell, Hoboken, NJ.
Chelba, C., and Jelinek, F. (2000). Structured language modeling. Computer Speech and Language 14, 283-332.
Chorowski, J., Bahdanau, D., Serdyuk, D., Cho, K., and Bengio, Y. (2015). Attention-based models for speech recognition. Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, December 7-12, 2015, vol. 1, pp. 577-585.
Cole, R. A., Fanty, M., Noel, M., and Lander, T. (1994). Telephone speech corpus development at CSLU. Proceedings of the Third International Conference on Spoken Language Processing (ICSLP1994), Yokohama, Japan, September 18-22, 1994, pp. 1815-1818.
Davis, S. B., and Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics Speech and Signal Processing 28, 357-364.
Engineer, C. T., Engineer, N. D., Riley, J. R., Seale, J. D., and Kilgard, M. P. (2015). Pairing speech sounds with vagus nerve stimulation drives stimulus-specific cortical plasticity. Brain Stimulation 8, 637-644.
Eskenazi, M. (2009). An overview of spoken language technology for education. Speech Communication 51, 832-844.
Franco, H., Neumeyer, L., Ramos, M., and Bratt, H. (1999). Automatic detection of phone-level mispronunciation for language learning. Proceedings of the 6th European Conference on Speech Communication and Technology (EUROSPEECH’99), Budapest, Hungary, September 5-9, 1999, pp. 851-854.
Furui, S. (1986). On the role of spectral transition for speech perception. The Journal of the Acoustical Society of America 80, 1016-1025.
Goodfellow, I., Courville, A., and Bengio, Y. (2016). Deep Learning. MIT Press, Cambridge, MA.
Greenberg, S. (1999). Speaking in shorthand: A syllable-centric perspective for understanding pronunciation variation. Speech Communication 29, 159-176.
Greenberg, S., and Chang, S. (2000). Linguistic dissection of Switchboard-corpus automatic speech recognition systems. International Speech Communication Association Workshop on Automatic Speech Recognition: Challenges for the New Millennium, Paris, France, September 18-20, 2000, pp. 195-202.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer-Verlag, New York, p. 204.
Jolliffe, I. T. (2002). Principal Component Analysis, 2nd ed. Springer-Verlag, New York.
Kawahara, H., Masuda-Katsuse, I., and de Cheveigné, A. (1999). Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds. Speech Communication 27, 187-207.
Lee, A. (2016). Language-Independent Methods for Computer-Assisted Pronunciation Training. PhD Thesis, Massachusetts Institute of Technology, Cambridge, MA.
Lee, A., and Glass, J. (2013). Pronunciation assessment via a comparison-based system. Proceedings of Speech and Language Technology in Education (SLaTE 2013), Grenoble, France, August 30 to September 1, 2013, pp. 122-126.
Lee, A., and Glass, J. (2015). Mispronunciation detection without nonnative training data. Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech 2015), Dresden, Germany, September 6-10, 2015, pp. 643-647.
Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., Tang, J., and Liu, H. (2018). Feature selection: A data perspective. Association for Computing Machinery Computing Surveys 50(6), 94.
Liou, C.-Y., Cheng, W.-C., Liou, J.-W., and Liou, D.-R. (2014). Autoencoder for words. Neurocomputing 139, 84-96.
26 | Acoustics Today | Winter 2018