Speech-to-speech translation opens a world of communication

Jacqui Griffiths
17 May 2015

Technology that translates one spoken language into another, known as speech-to-speech translation, could transform the way we communicate. Recent technology advances are helping researchers make rapid progress toward tools that will enable us to communicate naturally and fluidly, regardless of the languages we speak.

In science fiction, the universal translator that enabled instant communication between different cultures in Star Trek was invented around the year 2150. In reality, we may be ahead of schedule.

At the 2014 Code Conference, an annual media and technology gathering, Microsoft publicly demonstrated its Skype Translator app for the first time. Speaking in English, Gurdeep Singh Pall, corporate vice president of online video chat company Skype, and Microsoft CEO Satya Nadella conversed with German-speaking Microsoft employee Diana Heinrichs; the app translated their conversation in real time.

Such demonstrations have captured the public’s imagination and spurred research teams worldwide to develop translation tools that will help us understand each other better, regardless of the languages we speak. Ultimately, such tools could erase many of the challenges of international travel, global commerce and inter-cultural communications, eliminating barriers while protecting cultural differences.

PRESENT IMPERFECT

The latest speech-to-speech tools combine numerous technologies, including new neural-network learning methods, patterned after human-brain behavior, that significantly improve upon previous efforts.

Accurate, real-time speech-to-speech translation remains a work in progress, however. Even with a reduced rate of errors – the average has fallen from about 20% incorrect words in 2010 to about 12% in 2013 – computers are not yet capable of handling all aspects of conversation.
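To make figures like these concrete, the sketch below computes a word error rate, a common way to score speech recognition output: the number of word substitutions, deletions and insertions needed to turn the system's output into a reference transcript, divided by the length of the reference. This is an assumption about how such percentages are typically measured, not a description of the method behind the numbers quoted above; the function name and the sample sentences are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not from any real system): a ten-word reference.
reference = "when is the next bus due at the main station"
one_error = "when is the next bus do at the main station"
two_errors = "when is the next bus do at the man station"

print(word_error_rate(reference, one_error))   # one wrong word in ten
print(word_error_rate(reference, two_errors))  # two wrong words in ten
```

On this measure, a drop from roughly 20% to 12% corresponds to going from about two wrong words in every ten to a little more than one.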

“Current speech-to-speech focuses on fairly literal translations of sentences,” said Sean Colbath, senior scientist at Raytheon BBN Technologies, a Massachusetts-based subsidiary of aerospace-and-defense giant Raytheon that specializes in acoustics, signal processing and related information technology. “It doesn’t recognize memes, context or conversational ambiguity. For instance, it might stall at a name or translate it literally. Or, if you ask when the bus is due and then ask what the fare is, it won’t link the two sentences and understand that you’re asking about the bus fare.”

Still, speech-to-speech technology has made tremendous strides.

INDUSTRY ENTHUSIASM

Speech-translation technologies, which until recently have had only niche applications, are hitting the mainstream and attracting major investors. Facebook, for example, acquired the company behind the Jibbigo speech-translation app; Google introduced speech-to-speech translations for 80 languages as part of Google Translate; and AT&T Labs, the US-based research and development division of multinational telecommunications provider AT&T, is driving research using cloud-based speech recognition, language translation and speech synthesis engines.
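The three engines named above (speech recognition, language translation and speech synthesis) are typically chained one after another. The sketch below shows only that chaining. Every function in it is a hypothetical placeholder rather than any vendor's API: the "recognizer" simply decodes text bytes and the "translator" is a three-word lookup table, standing in for the real engines.

```python
from typing import Callable

# Hypothetical stage signatures; real engines expose far richer interfaces.
Recognizer = Callable[[bytes], str]   # audio in, source-language text out
Translator = Callable[[str], str]     # source text in, target text out
Synthesizer = Callable[[str], bytes]  # target text in, audio out


def speech_to_speech(audio: bytes, recognize: Recognizer,
                     translate: Translator, synthesize: Synthesizer) -> bytes:
    """Run the three stages in sequence; errors in one stage carry into the next."""
    source_text = recognize(audio)
    target_text = translate(source_text)
    return synthesize(target_text)


# Toy stand-ins so the sketch runs end to end (assumptions, not real engines).
def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the bytes are already a transcript


TOY_LEXICON = {"good": "guten", "morning": "Morgen", "thanks": "danke"}


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the bytes are synthesized audio


result = speech_to_speech(b"good morning", toy_recognize, toy_translate, toy_synthesize)
print(result.decode("utf-8"))
```

Because each stage consumes the previous stage's output, a misrecognized word is translated and spoken as if it were correct, which is one reason recognition accuracy matters so much to the overall result.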

“Technologies related to speech-to-speech translation have greatly improved,” said Srinivas Bangalore, a principal member of the technical staff at AT&T Labs. “While error-free translation might never be a reality, pragmatic services with a good user interface design can mitigate those limitations, and services like these are already achieving practical relevance.”

DEFINED CONVERSATIONS

Today’s speech-to-speech translation works best in situations where the subject of conversation is sufficiently limited for the technology to cope. “Speech-to-speech can’t pick up subtle messages like context, body language or emotions,” said Neil Payne, who recently left his role as marketing director of UK-based translation agency Kwintessential to backpack around Southeast Asia. “But it can have specific uses such as between a doctor and patient, where there are parameters around the subjects discussed.”

Alan Black, a computer scientist and speech synthesis expert at Carnegie Mellon University’s Language Technologies Institute in Pittsburgh, concurs. “At the moment, speech-to-speech translation is most useful in situations where you need to communicate with people who don’t know another language, such as international rescue operations,” he said. “For example, we get refugees from Myanmar. The local medical school provides care for them, but the doctors don’t speak their languages and there are not enough human translators available. Speech-to-speech technology is very useful in such situations.”

Deployment in these limited contexts builds experience that can be used to develop the technology for broader applications.

“We’ve developed speech-to-speech technology for the US military,” Raytheon BBN’s Colbath said. “Our research aims to figure out the science behind making speech-to-speech work. But we’re moving beyond military applications into areas like borders and customs. The conversation here will be broader – travelers may fall ill, be seeking asylum or asking for information – but it still has limits. That’s where we’ll work to bring in more shared context and meaning, to make the conversation flow.”

BREAKING DOWN BARRIERS

As the technology’s parameters expand, so will its ability to transform communication for a wide range of users.

“Speech-to-speech translation can bring international business communication to the next level by virtually removing the language barrier,” said Olivier Fontana, director of product marketing for Microsoft/Bing Translator in Microsoft Research’s Machine Translation group in Redmond, Washington.

Aaron Davis, computational linguist and former CTO of Lingotek, an automated translation-tool provider based in Lehi, Utah, agrees. “Combined with Web-based, real-time communication technology, speech-to-speech translation could enable interesting applications for international, multi-user video conferences,” he said. “Providing translation or subtitles for people who are more comfortable speaking another language would give you confidence that your message is being communicated accurately.”

Davis believes speech-to-speech technology could also have exciting applications in the entertainment industry. “Video gamers already use audio prompts, but they could be communicating through chat that’s translating to their gaming partners halfway across the world.”

Another promising application: enhancing relationships. “Speech-to-speech translation will open new opportunities for geographically dispersed friends and family to stay connected,” Fontana said. “For example, a grandmother in China could speak to her grandchildren in the UK, even if they don’t share a language.”

CULTURAL CONNECTION

It might seem logical that speech-to-speech translation could reduce interest in learning to speak other languages, but researchers say that doesn’t seem to be the case, at least not yet. “Research around speech-to-speech technology tends to indicate a cultural benefit,” Davis said. “When people are not forced to learn English to communicate, they tend to preserve their culture better, starting with language.”

AT&T Labs’ Bangalore believes speech-to-speech technology will prompt more communication between people of different cultures. “Equipped with translation technologies, people are likely to communicate more with people of different languages, thus broadening their linguistic and cultural horizons,” he said.

Fontana agrees. “Speech-to-speech translation technology will democratize and demystify language learning,” he said. “It will enable non-speakers to communicate with people they would never have been able to communicate with. It will provide a back-up tool that enables new language learners to feel more confident in trying out their skills.”

FUTURE PROGRESSIVE

Although seamless, real-time translation may still be years away, speech-to-speech technology is expanding to support a broader range of interactions.

Still, Davis cautions against a “good enough” approach to the technology’s development. “If we accept some flaws in the translation and deploy the application widely, we’ll reach a plateau because we’re not pushing to perfect it anymore,” he said. “The error rate may be only 10%, but subtle nuances can be lost; that 10% could be vital communication.”

Carnegie Mellon’s Black believes that, as long as momentum is maintained, speech-to-speech technology will develop to meet even more needs and expectations. “As with other artificial intelligence, we’ll keep moving the boundary every time the technology gets better,” he said. “So we’ll never perfect it. But we will eventually be able to support casual conversations, because people want that capability and are investing in developing it.”

See the Skype Translator demo:
https://www.youtube.com/watch?v=cJIILew6l28