SAN JOSE, Calif. — Chinese Web giant Baidu will make available what it claims are three of the largest datasets related to Chinese voice recognition in an effort to attract developers. Its Project Prometheus also includes a $1 million fund to invest in efforts related to voice and machine learning.
The initiative is part of DuerOS, Baidu’s platform for natural-language services. Earlier this year, the Web giant, known as the Google of China, formally launched DuerOS and a variety of third-party products using it.
Baidu will gradually open three large datasets: one for far-field wake-word detection, one for far-field speech recognition and one for what it calls multi-turn conversations. The data can be used to train new smart voice systems or services.
The wake-word data consists of about 500,000 voice clips of five to ten popular Chinese wake words. It includes the wake word to activate DuerOS devices, “xiaodu xiaodu.”
The speech-recognition dataset will include thousands of hours of spoken Mandarin. The third dataset is made up of thousands of dialogues across the ten domains DuerOS currently serves.
Web giants such as Baidu typically guard the large datasets they accumulate because they are seen as part of their strategic advantage. Baidu’s goal is to enable many small groups to use the data to expand Baidu’s offerings and drive the technology ahead.
“In the age of AI, data is the new oil,” said Guoguo Chen, Baidu’s principal architect for DuerOS, in a press statement.
Even giants such as Amazon and Google do not yet support Chinese in their Alexa and Google Assistant products, in part because of the complexity of the language.
Interestingly, Baidu invited Björn Hoffmeister, senior manager of Amazon Machine Learning, to speak about the field at an event in Silicon Valley today where Baidu launched Prometheus. Baidu is taking a page from Facebook, which has tried to spawn open-source work among partners to gain leverage over larger rivals.
Under Project Prometheus, Baidu will work with universities and other researchers to conduct joint training, course design and workshops. The effort is geared to attract talent to the field as well as make Baidu a center of technical work in the area.
Baidu claims more than 100 branded devices, from refrigerators and air conditioners to TV set-top boxes and smart speakers, currently use DuerOS.
— Rick Merritt, Silicon Valley Bureau Chief, EE Times