Named after Henry Higgins, the phonetics professor in Bernard Shaw’s play Pygmalion, Higgins the software promises to offer millions of users worldwide what Higgins the professor offered Eliza Doolittle: accent improvement.
Although closely related to speech-to-text software, Higgins is substantially different. It is speech-to-speech software: it transforms accented speech into speech that sounds like native English.
Notably, Higgins achieves this without attempting to recognize relationships between sounds and written words. When Higgins hears a word, it does not know what the word means, but it can repeat it almost without an accent.
Higgins is based on sophisticated machine learning (ML) algorithms. As the name suggests, the software analyzes huge sets of data, called training corpora in industry parlance, and infers millions of statistical dependencies. In a sense, it learns the rules of the game by example.
Machine learning has been extensively applied to analyzing and processing written text. For example, Google’s famous “Did you mean?” spelling corrector is based on this approach. When analyzing written text, an ML system is fed millions of random documents written in a specific language. It quickly learns that certain sequences of letters occur much more often than others. For instance, it learns that Pigmalion almost never occurs, while Pygmalion is quite common. The spelling corrector then suggests that a user who typed Pigmalion change it to Pygmalion. Notably, it does so without consulting Webster’s dictionary or knowing anything about what Pygmalion means.
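The frequency-based idea described above can be illustrated with a minimal sketch. It is not Google’s actual corrector: the function names (`train`, `edits1`, `suggest`) and the tiny corpus in the usage note are hypothetical, and the sketch only considers candidates one edit away from the typed word.

```python
from collections import Counter
import string


def train(corpus_words):
    """Count how often each word occurs in the training corpus."""
    return Counter(w.lower() for w in corpus_words)


def edits1(word):
    """Return every string one edit (delete, swap, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)


def suggest(word, counts):
    """Suggest a nearby word only if it is more frequent than the typed one."""
    word = word.lower()
    # Keep only neighbours that were actually seen in the corpus.
    candidates = {w for w in edits1(word) if w in counts}
    candidates.add(word)
    # Sorting makes the choice deterministic if two candidates tie.
    best = max(sorted(candidates), key=lambda w: counts[w])
    return best if counts[best] > counts[word] else word
```

With a toy corpus such as `train(["pygmalion"] * 50 + ["pigmalion"])`, calling `suggest("Pigmalion", counts)` picks the far more frequent spelling. No dictionary or word meaning is involved, only counts.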
Essentially, the Higgins Accent Improvement System does to sounds what Google’s spelling corrector does to written words. Before becoming operational, Higgins is trained on a huge corpus of digital recordings made by native English speakers, as well as by people with accents. It learns that certain short sound bites, lasting about 1 second and typically representing one syllable, occur very often, while others almost never occur. When training is completed, Higgins is ready to correct pronunciation.
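As a rough illustration of counting how often short sound fragments occur, the sketch below splits recordings into fixed-size frames, rounds each sample to a coarse grid, and tallies the resulting patterns. Everything here is an assumption made for illustration: the function names, the frame size, the quantization step, and the use of plain lists of numbers in place of real audio. It does not describe how Higgins actually represents sound.

```python
from collections import Counter


def frames(samples, size):
    """Split a list of samples into consecutive, non-overlapping frames."""
    return [tuple(samples[i:i + size])
            for i in range(0, len(samples) - size + 1, size)]


def quantize(frame, step=0.5):
    """Round each sample to a coarse grid so similar fragments compare equal."""
    return tuple(round(x / step) for x in frame)


def fragment_counts(recordings, size, step=0.5):
    """Count how often each quantized fragment occurs across all recordings."""
    counts = Counter()
    for samples in recordings:
        for frame in frames(samples, size):
            counts[quantize(frame, step)] += 1
    return counts
```

Fragments with high counts play the role of common spellings in the text example, and fragments with very low counts play the role of likely misspellings.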
This unique approach has a number of tremendous advantages. Firstly, though only American English, or “CNN English”, is currently supported, Higgins can be trained to correct accents in almost any other language. All that is needed is to feed Higgins hours of recordings made by native speakers of French, German, Finnish or Swahili.
Secondly, Higgins preserves the speaker’s individuality. It does not replace an accented word with a recording made by a CNN anchor. It analyzes wave-fronts, as well as wavelet transforms, and slightly reshapes them. An abnormal wave-front is gently pushed towards a more common one, using what are known in the industry as simulated annealing algorithms. As a result, male voices remain distinctly male, high-pitched voices remain high, and intonations do not change much either. In most cases, one can still recognize the speaker. It is still Eliza Doolittle, but now she speaks the Queen’s English, not Cockney.
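The idea of gently pushing an unusual fragment towards a more common one, while keeping most of the original, can be sketched as follows. This is deliberately not simulated annealing and uses no wavelets: it swaps in a plain linear blend towards the nearest prototype. The `nearest` and `nudge` functions, the `strength` parameter, and the representation of a fragment as a short list of numbers are all hypothetical.

```python
import math


def nearest(frame, prototypes):
    """Return the common prototype closest to the frame (Euclidean distance)."""
    return min(prototypes, key=lambda p: math.dist(frame, p))


def nudge(frame, prototypes, strength=0.3):
    """Move the frame part of the way towards its nearest common prototype.

    strength=0 leaves the frame unchanged; strength=1 replaces it entirely.
    """
    target = nearest(frame, prototypes)
    return [x + strength * (t - x) for x, t in zip(frame, target)]
```

A small `strength` keeps the result close to the speaker’s original fragment, which is the property the paragraph above describes; a value of 1 would amount to substituting someone else’s recording.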
Although it provides reasonable quality out of the box, Higgins works better if trained on typical accents. Currently, Chinese, Indian, French and Russian accents are supported.
The drawback of the Higgins approach is that working in near real time requires huge computational resources. Currently, the delay is barely noticeable and is similar to that of an average conversation over the Internet. At present, only two public networks in the world can deliver adequate throughput: Skype, owned by eBay (NASDAQ: EBAY), with 276 million registered users around the world, and Google Talk, operated by Google, Inc. (NASDAQ: GOOG). Higgins is also licensing its technology to Tandberg of Lysaker,
“We are honored to work with the leaders of the industry, and grateful to Google and Skype for selecting our technology,” said Ashutosh Gupta, CEO and co-founder of Higgins. “We hope our service will help millions of users around the world communicate more easily, reducing misunderstandings and increasing the productivity of remote meetings.”
“The modern knowledge economy requires the unimpeded exchange of information between team members working in remote locations. Strong accents, combined with the often poor quality of cellular phone connections, legacy long-distance telephone networks and video conferencing equipment, make such meetings annoying and unproductive.”
“I flew 4 times to
“Now, with Higgins, it is better than face-to-face,” continues Petr Kislodrishtchenko, Higgins’s CTO. “When we were beta-testing our system at a major American corporation, we noticed that workers of Russian, Indian and Chinese origin began scheduling video-conferencing meetings over Tandberg, even though all of them were on the same campus, sometimes only blocks away from each other.”
“Realizing this opportunity, Rhonda, a Motorola outsourcing contractor in
However, it is Higgins’s potential to cut unnecessary travel that attracted the attention of Al Gore, former Vice President and Nobel Prize winner, who agreed to join Higgins’s Board of Directors to represent Kleiner Perkins Caufield and Byers, a legendary venture capital firm where he is a Partner and which has invested in the company.
“We are excited about Higgins’s potential,” said Al Gore. “Not only will it help companies worldwide save billions in travel costs, it will also help humanity reduce airline fuel consumption, thus restricting carbon dioxide emissions. Recent NASA climate models indicate that deploying Higgins on Skype and Google Talk alone will reduce CO2 emissions by 3.14 billion metric tons over the next 21 years. This, in turn, will translate into lowering the Earth’s temperature by 0.2718 °C.”