Page 27 - Winter Issue 2018

2016). A European project, the "Spoken CALL Shared Task" (Baur et al., 2017), offers an illustration of how online evaluation and feedback may operate in the future. The competitive task was based on data collected from a speech-enabled online tool used to help young Swiss German teens practice skills in English conversation. Items were prompt-response pairs, where the prompt is a piece of German text and the response is a recorded English audio file. The task was to "accept" or "reject" responses that may or may not be grammatically and linguistically correct. The task involved more than conventional ASR because it also involved the ability to discern semantically and grammatically appropriate responses using natural language processing. The winning entry (from the University of Manchester, UK) used an ASR system trained with DNNs.

A somewhat different approach is used by the "Virtual Language Tutor" (Wik, 2011), which is an embodied conversational agent that can be spoken to and that, in turn, can talk back (via speech synthesis) to the student. The agent guides, encourages, and provides feedback for mastering a foreign language (initially Swedish).

The Future of CALL
Several trends in language-learning software are worth noting. Most will likely be enabled through some form of deep learning, among which are the following:

Gamification
FluentU uses real-world video containing music videos, movie trailers, news, and inspiring talks and turns them into personalized language-learning lessons. Lingo Arcade, Mindsnacks, and DigitalDialects are just a few of the many offerings for learning a foreign language within a game-based structure. These examples illustrate several ways to "gamify" dialogue learning for language students.

Virtual Language Learning
Applications such as ImmerseMe and Mondly place the student in simulated, real-life scenarios, such as a bakery or restaurant, where language skills can be practiced in an engaging way. In these apps, ASR evaluates the student's responses and offers feedback.

Intelligent Language Tutors
Applications such as Duolingo are starting to use "chatbots" to interact with students on a variety of topics to enhance vocabulary and grammar skills. These bots are driven by a combination of ASR, natural language processing, and other forms of artificial intelligence to guide the student through language lessons in naturalistic settings.

Automatic Language Translation
A Defense Advanced Research Projects Agency (DARPA)-funded project, TransTac (Bach et al., 2007), was an early, albeit limited, attempt to provide automatic language translation in a handheld box (for deployment in the Middle East). Among the languages offered were Iraqi Arabic and Dari. Waverly Labs sells the Pilot™, an earbud-enabled app that performs simultaneous translation in near real time for over a dozen languages. Google Translate offers the ability to translate from one language to another; among the languages offered for paired translation are English, French, German, Italian, Portuguese, Russian, and Spanish. Google also provides an optical version (using a smartphone camera) that translates signs and other text into one's native language. Microsoft has demonstrated simultaneous translation between English and Mandarin Chinese powered by a DNN that can meld the speaker's voice characteristics with the translated speech. These applications are not especially useful (yet) because they lack the semantic precision and emotional nuance emblematic of human communication, so they are best reserved for simple scenarios such as grocery shopping and sightseeing.

Speech Synthesis
The quality and naturalness of speech synthesis have greatly improved, largely due to the ability of DNNs to simulate voices with realism. Baidu's Deep Voice (Arik et al., 2017), Amazon's Polly, Microsoft's Cortana, and Google's Cloud Text-to-Speech (TTS) applications all use DNNs. Google offers TTS in a dozen languages. DeepMind's WaveNet (van den Oord et al., 2017) offers highly realistic synthesis for English and Japanese in multiple voices.

Voice Conversion
Speech synthesis has improved to the point where it is now possible to transform or meld the voice characteristics of one talker into another while preserving intelligibility. Current state-of-the-art systems (Toda et al., 2016) use a special-purpose vocoder (e.g., STRAIGHT, Kawahara et al., 1999; WORLD, Morise et al., 2016) as the synthesis engine. Two of the more advanced voice conversion systems use DNNs, which include long short-term memory (LSTM)-based recurrent neural networks (Sun et al., 2015) or sequence-to-sequence learning (Miyoshi et al., 2017).
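The analyze-map-resynthesize structure of voice conversion can be sketched in a few lines. This is a deliberately simplified illustration, not any of the systems cited above: a plain short-time Fourier analysis stands in for a true vocoder such as STRAIGHT or WORLD, and a fixed spectral warp (the invented `toy_warp` below) stands in for the trained LSTM or sequence-to-sequence network that performs the frame-wise mapping in real systems.

```python
# Sketch of a voice-conversion pipeline: analyze frames, map each
# frame's magnitude spectrum, resynthesize by overlap-add. The spectral
# "network" here is a hypothetical linear warp, for illustration only.
import numpy as np

def frames(x, size=512, hop=128):
    n = 1 + (len(x) - size) // hop
    return np.stack([x[i * hop : i * hop + size] for i in range(n)])

def convert(x, warp, size=512, hop=128):
    """Analysis -> per-frame spectral mapping -> overlap-add synthesis."""
    win = np.hanning(size)
    F = np.fft.rfft(frames(x, size, hop) * win, axis=1)   # analysis stage
    mag, phase = np.abs(F), np.angle(F)
    mag = warp(mag)                      # trained DNN would map frames here
    F2 = mag * np.exp(1j * phase)        # keep source phase for intelligibility
    y = np.zeros((F.shape[0] - 1) * hop + size)
    for i, frame in enumerate(np.fft.irfft(F2, n=size, axis=1)):
        y[i * hop : i * hop + size] += frame * win        # overlap-add
    return y

def toy_warp(mag):
    # Crude stand-in: shift spectral energy up two bins, loosely
    # mimicking a shorter vocal tract.
    out = np.zeros_like(mag)
    out[:, 2:] = mag[:, :-2]
    return out

x = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)    # 1 s, 220-Hz tone
y = convert(x, toy_warp)
```

The design point the sketch preserves is the one in the text: the quality of the analysis/synthesis engine (the vocoder) and the quality of the learned frame mapping are separate concerns, which is why research systems pair a special-purpose vocoder with a DNN.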

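Returning to the Spoken CALL Shared Task described earlier, the core accept/reject decision can also be sketched. The entries in that competition used trained DNN and NLP models; the version below is a hypothetical toy that assumes the ASR stage has already produced a transcript and that graded items come with reference answers. The reference sentences and the overlap threshold are invented for illustration.

```python
# Toy accept/reject grader in the spirit of the Spoken CALL Shared Task:
# accept a transcript if it overlaps sufficiently with any reference
# answer. Real systems use trained NLP models, not word overlap.
def grade(transcript, references, threshold=0.75):
    """Return ('accept'|'reject', best overlap score) for a transcript."""
    words = set(transcript.lower().split())
    best = 0.0
    for ref in references:
        ref_words = set(ref.lower().split())
        best = max(best, len(words & ref_words) / max(len(ref_words), 1))
    return ("accept" if best >= threshold else "reject"), best

refs = ["i would like a single room for two nights",
        "i want a single room for two nights please"]
print(grade("I would like a single room for two nights", refs))
# -> ('accept', 1.0)
```

Even this toy shows why the task is "more than conventional ASR": the decision depends on comparing meaning-bearing content against expected responses, which in practice requires natural language processing rather than transcription alone.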