Deep learning – The day language technologies became a Christmas present

It is said that the third Monday of every January is the saddest day of the year. It does not take deep learning to feel so. A long holiday period has ended, and the next one is several months away. Overspent, overstuffed, with no more presents to exchange, with winter settling in across the Northern Hemisphere and missing the drinks and chocolates that sent our sugar levels sky high, many of us start booking holidays in the sun. Let’s turn the clocks back to Christmas, though, and we will remember the last few weeks as the Christmas when language technologies made it to the top of the list. Millions of people, literally, have opened boxes containing an electronic assistant with a rapidly improving ability to use human language. There are two main products: Amazon’s Echo, which features the Alexa digital assistant and sold more than 5m units, and Google Home. In essence, Echo is a desktop computer in the shape of a cylinder. There is no keyboard, no mouse, no monitor, no interface – just voice.

A picture of what Amazon's Echo looks like, a black cylinder with speakers

Amazon – Echo’s Alexa

Google Home can do quite a few things Echo can’t, for example holding contextual conversations or sending images and videos to your TV set, and it seems pretty good at playing music. Back in March, Amazon released its own API for the Alexa Voice Service (AVS), the service which powers the Amazon Echo, Echo Dot and Amazon Tap. The move means that you can use Amazon’s Alexa in third-party hardware. Amazon’s final goal is to sell hardware, and it has become very good at doing so. Google has a long history of glasses, cars and collateral products that have been discontinued. Although the best artificial intelligence engineers favor Google over most companies, the competition is becoming a war for talent. Amazon acquired ailing machine translation firm Safaba in late 2015, not so much for the company’s technology as to bring in chief scientist Alon Lavie and create Amazon’s own MT and R&D department.

Back to Alexa. Ask it for the weather (the typical easy question), to play music, to order a taxi, to tell you about your commute from home to work, or even to pick a joke for you, and Alexa will comply.

These two have made the first technology of this kind, Apple’s Siri, a second choice (my experience with Sierra’s voice control on a laptop is far from satisfactory) and Microsoft’s Cortana a minority choice (so far).

List of things Siri says it can do

Siri says it can do a lot of things but….

it cannot understand “Find Echo Alexa photos on Internet”

So how did computers become so clever at interpreting language and tackling the problems of human language? More importantly, will the problem of communication across languages then become a thing of the past?

How did machine learning become so clever in language technologies?

For many years, the basic idea was to code rules into software: “IF” this happened, “THEN” something else had to happen. In translation, this meant building a list of grammar rules that analyzed the source language, for better or worse, and another set of grammar rules for reproducing the meaning in the target language. We all know that the initial optimism of the 1950s led to the now infamous (and flawed) ALPAC Report, which put a stop to research in machine translation. Human-language technologies disappeared from the research scene for decades, until the availability of bilingual data sets renewed (academic) interest in them in the 1990s, this time within the realm of pattern recognition and machine learning.
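The “IF … THEN” approach can be sketched in a few lines of Python. This is only a toy illustration, not any historical system: the five-word English-Spanish lexicon, the single re-ordering rule and the function name `translate_rule_based` are all made up for the example.

```python
# Toy rule-based translator. The lexicon and word classes are hypothetical
# and tiny; a real system needed thousands of hand-written rules.
LEXICON = {"the": "el", "red": "rojo", "car": "coche", "is": "es", "big": "grande"}
ADJECTIVES = {"red", "big"}
NOUNS = {"car"}


def translate_rule_based(sentence):
    words = sentence.lower().split()
    i = 0
    while i < len(words) - 1:
        # Rule: IF an adjective directly precedes a noun, THEN swap them,
        # because Spanish usually places the adjective after the noun.
        if words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(LEXICON.get(word, word) for word in words)


print(translate_rule_based("The red car is big"))  # el coche rojo es grande
```

Every new word, exception or language pair means more hand-written rules, which is a large part of why the approach scaled so poorly.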

Nowadays, most automated translation systems are based on some form or another of statistics. Practically all incorporate self-learning or updating routines (what we once called DIY MT) so that translation engines can improve gradually with new data.

In the case of speech recognition (a closely related science, also heavily dependent on pattern recognition), the software is fed sound files on the one hand and approved, human-written transcriptions on the other. By matching the two, a pattern of equivalences is established. The system thus learns to “predict” with a high degree of probability which sound should result in a particular transcription.

This is not far removed from machine translation, hence the ultimate goal of the “speech-to-speech” translator. Machine Translation, or Automated Translation as it has been known lately, gathers parallel texts (bitexts) that have previously been translated by humans. Algorithms count how often strings of words (n-grams) in one language appear alongside strings of words in the other. The closer the languages, the better the resulting translation, as less re-ordering is required. There are other features to improve the final output, obviously. Having large monolingual samples to create a “language model” helps smooth out the final result to a certain degree, as the system’s statistical guesswork is narrowed down noticeably. However, the last ten years of SMT (statistical machine translation) have produced enough engines, metrics and examples to show that languages that are not even remotely related grammatically, or that are highly inflected (Slavic languages like Russian or Polish, Baltic languages), perform worse with pure statistics, as grammatical cases affect the statistics negatively. Some companies have been able to “beat Google”, the benchmark and mother of all machine translation services. PangeaMT beat Google in English-Korean, for example.
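To give a feel for how a “language model” narrows down the guesswork, here is a minimal sketch of a bigram (2-gram) model with add-one smoothing. It is an illustration under stated assumptions, not how any production engine works: the three-sentence sample, the two candidate outputs and the function names are invented for the example.

```python
import math
from collections import Counter

# Hypothetical monolingual sample in the target language.
SAMPLE = [
    "the red car is fast",
    "the red house is big",
    "the car is red",
]


def train_bigram_model(sentences):
    """Count single words (as contexts) and adjacent word pairs."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that starts a pair
        bigrams.update(zip(tokens, tokens[1:]))  # every adjacent pair
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence, with add-one smoothing for unseen pairs."""
    vocab_size = len(unigrams) + 1  # +1 for the end-of-sentence marker
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


unigrams, bigrams = train_bigram_model(SAMPLE)
# Two candidate outputs a translation engine might propose.
candidates = ["the car red is fast", "the red car is fast"]
best = max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))
print(best)  # the red car is fast
```

The model has never been told any grammar; it simply prefers the word order it has seen more often, which is the sense in which more monolingual data smooths the output.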

The statistical approach has been made possible by huge improvements in computing. Most of us carry a smartphone in our pockets, which packs several times more technology than our grandparents ever handled in their whole lifetime. The wide availability of data, which makes customization possible at great speed, is another factor. Lastly, there is the latest buzzword: deep learning. Google introduced neural machine translation in 9 language pairs in November 2016, following a tested improvement in translations from Chinese to English. The nine languages are a mixed bag of “easy transfer” (English, Spanish, Portuguese, French), somewhat difficult (German, Chinese), and known hard nuts to crack like Japanese, Korean and Turkish.

What is deep learning?

Answering this question would require a full article. In short, let’s imagine that you lean out of the window and look at the sky. It is cloudy, grey. It is likely to rain and, although you have the tickets, live only 5 minutes from public transport and your best friend has said she’ll come with you to the concert, you decide you won’t go. This decision has taken you milliseconds. For you, the possibility of getting wet on the way to or from the concert outweighs all other factors (your friend coming with you, having tickets, using only public transport downtown). You use powerful neurons to think and take a fairly logical decision: “I have already spent money on tickets, my friend is coming and the concert is just a short ride away, but I hate the rain, and I don’t fancy getting wet at all in the middle of winter”. What seems so logical to a human isn’t necessarily so to an algorithm (let’s call it a machine). Teaching it to take several factors into consideration, and adding an element of uncertainty by weighing some factors more heavily than others, is close to what users see as speech recognition or machine translation magic.
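The concert decision can be written down as a single artificial “neuron”: each factor gets a weight, the weighted factors are added up, and the sum is squashed into a probability. The sketch below is only an illustration; the weights are hand-picked for the example, whereas a real network stacks many such units and learns its weights from examples.

```python
import math

# Hypothetical, hand-picked weights: positive values push towards going,
# negative values push towards staying at home.
WEIGHTS = {
    "has_tickets": 1.0,
    "friend_is_coming": 1.0,
    "short_ride": 0.5,
    "looks_like_rain": -4.0,  # "I hate the rain" outweighs everything else
}
BIAS = 0.0


def probability_of_going(factors):
    """Weighted sum of the factors, squashed to the range 0..1 by a sigmoid."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in factors.items())
    return 1.0 / (1.0 + math.exp(-z))


cloudy_day = {"has_tickets": 1, "friend_is_coming": 1, "short_ride": 1, "looks_like_rain": 1}
clear_day = dict(cloudy_day, looks_like_rain=0)

print(round(probability_of_going(cloudy_day), 2))  # below 0.5: stay at home
print(round(probability_of_going(clear_day), 2))   # above 0.5: go to the concert
```

Changing a single weight changes the decision, and training a network amounts to adjusting such weights, automatically and across a very large number of units, until its outputs match the examples it is given.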

A neural system uses several layers of digital “neurons” and the connections between them (i.e. a digital neural network), which might resemble the image we have of our own neurons at work. Such systems require a lot of GPU (not CPU) computing power and are becoming extremely good at learning from the examples we humans provide. The disadvantage compared to purely statistical or hybrid systems is the huge computing power required to train and retrain the models. This cannot happen on most company servers. Echo and Google Home therefore use the software on the device merely as a capture tool, and the calculations are performed at Amazon’s and Google’s own data centers. The user’s data is transferred to their systems and so, potentially, is the way the user uses the system. The trade-off is that the system improves with usage and becomes more and more reliable and useful.

The implications for privacy, with Big Brother watching, are clear. However, if the use of online email, social media and smartphones is anything to go by, Amazon and Google are going to help us a lot and become more intrusive, yet indispensable, in our daily lives very soon.


Reference

  • Can Amazon Echo beat Google in the long run: http://www.forbes.com/sites/quora/2016/10/24/can-amazon-echo-beat-google-in-the-long-run/
  • 8 things Alexa can do that Google’s Assistant can’t: https://www.cnet.com/how-to/things-alexa-can-do-that-google-home-cant/
  • 9 things Google Home can do that Alexa can’t: https://www.cnet.com/how-to/amazon-echo-alexa-vs-google-home/
