PORTLAND, Ore. -- Apple's Siri started the trend, followed by Google's Now and Microsoft's Cortana, but they are all just copycats of a flawed original, according to the University of Michigan (Ann Arbor). Yes, they all answer verbal questions about things you can look up on the Internet, but the University of Michigan's free open-source Sirius one-ups all three by doing things that they can't, and it allows users to customize it to do things that none have ever done before.
"We've put together the best of the best open-source algorithms — in many cases from the same sources as the others — plus added capabilities that Siri, Now and Cortana will have to add to keep up with Sirius," Jason Mars, U-M assistant professor of computer science and engineering and co-director of Clarity Lab where Sirius was developed, told EE Times.
For example, with Sirius you can take a snapshot of a building, monument, animal — almost anything — and then ask questions about it, such as what are its operating hours (for a restaurant), or when was this built (for a monument) or what is its natural diet (for an animal). Siri, Now and Cortana have access to all the same databases as Sirius, but just have not worked as hard at putting them together to answer queries that require using more than one at a time.
Video: Programming whiz kid at University of Michigan (Ann Arbor), professor Jason Mars (above), explains how he grafted together all the best open-source software components to make Sirius better than Siri, Now and Cortana. (Source: University of Michigan)
Mars, together with professor Lingjia Tang, co-director of Clarity Lab, and doctoral candidates Johann Hauswald and Yiping Kang, has worked hard to one-up Siri, Now and Cortana, but not to go into competition with them. Actually, their original motivation was to investigate what types of resources will be required of cloud services in the future. In order to test their hypotheses, they needed an application from the future. Since those aren't readily available, by definition, they had to create one. After investigating the open-source Unix resources available, they settled on creating a better Siri, thus the name Sirius.
Professors Lingjia Tang and Jason Mars with doctoral candidates Johann Hauswald and Yiping Kang enjoying their success with the Sirius project.
(Source: University of Michigan)
"The project has been so successful that now we are not only still investigating what resources the servers of the future will need, since most of Siri's, Now's and Cortana's services are performed in cloud servers, but we are also investigating how the quad- and octal-core processors in handhelds can help off-load some of the workload from the servers."
Like Frankenstein, Sirius was stitched together from pieces found here and there in the Unix community, with the server side running on any Unix-based machine, from cheap $200 Linux white boxes to medium-priced $500 Mac minis to expensive $3,000 Apple Mac Pros. Since the interface is web-based, any smartphone can access Sirius from your own personal cloud server (or from a professional cloud server that downloads the free Sirius code).
Sirius can work just like Google's Now using text queries or spoken queries.
(Source: University of Michigan)
For speech recognition, the researchers grafted several pieces onto Sirius on the server side, including Carnegie Mellon University's Sphinx, the open-source Kaldi toolkit and Germany's RWTH Aachen "RASR." For the question-and-answer engine, the researchers grafted on OpenEphyra (the predecessor of IBM's Watson). For image recognition, the researchers grafted on SURF created by the Swiss company Kooaba (recently acquired by Qualcomm).
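How such grafted components might fit together can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the Sirius code or its API: the class name, the three stage signatures and the toy stand-in functions are hypothetical, and a real deployment would wrap engines such as Sphinx, OpenEphyra or a SURF-based matcher behind the same kind of interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures: each stage is just a callable, so any
# speech, image or question-answering engine could be wrapped to fit.
SpeechToText = Callable[[bytes], str]
ImageMatcher = Callable[[bytes], str]
QuestionAnswerer = Callable[[str], str]


@dataclass
class AssistantPipeline:
    """Chains speech recognition, optional image matching and Q&A."""
    speech_to_text: SpeechToText
    image_matcher: ImageMatcher
    question_answerer: QuestionAnswerer

    def answer(self, audio: bytes, image: Optional[bytes] = None) -> str:
        # Step 1: turn the spoken query into text.
        question = self.speech_to_text(audio)
        # Step 2: if a snapshot came with the query, name its subject
        # and attach that name to the question.
        if image is not None:
            subject = self.image_matcher(image)
            question = f"{question} (subject: {subject})"
        # Step 3: hand the combined text query to the Q&A engine.
        return self.question_answerer(question)


# Toy stand-ins (assumptions for illustration only, not real engines).
def fake_speech_to_text(audio: bytes) -> str:
    # Pretends the "audio" is already UTF-8 text.
    return audio.decode("utf-8")


def fake_image_matcher(image: bytes) -> str:
    # Always reports the same subject, whatever the image bytes are.
    return "monument"


def fake_question_answerer(question: str) -> str:
    # Echoes the question instead of searching a database.
    return f"[answer to: {question}]"


if __name__ == "__main__":
    pipeline = AssistantPipeline(
        fake_speech_to_text, fake_image_matcher, fake_question_answerer
    )
    print(pipeline.answer(b"when was this built", image=b"raw-image-bytes"))
```

Keeping each stage behind a plain callable is one way to let a speech or image engine be swapped without touching the rest of the chain.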
Sirius can also answer questions about pictures or videos, telling you what they are, their historical origin and so forth.
(Source: University of Michigan)
Mars' ultimate goal is to make sure that servers are ready to offer the user a quick, rich experience from the limited resources of wearables, by beefing up the cloud servers with the right hardware, as well as to off-load some of the load from the servers by making better use of the octal- and hex-core processors to be available on the smartphones of the future.
In rough figures, the researchers have calculated that using the smartphone to translate speech into text will increase the workload on servers by 100 times over simple text queries, since the server has to figure out what the speech-to-text query means. If voice were to become the primary way of framing web searches, the data-center servers would have to beef up by 165 times. But by adding the right resources to servers, such as GPUs, and by programming smartphones more smartly, that server burden can be reduced. The researchers' napkin calculations estimate that with GPUs as standard equipment, servers could be sped up by 10 times, and that by adding FPGAs they could be sped up by 16 times.
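To see how those napkin figures combine, here is a small back-of-the-envelope sketch. It uses only the numbers quoted above (165 times, 10 times, 16 times) and rests on a simplifying assumption the article does not state: that an accelerator speedup reduces the required server scale-up proportionally. The function and constant names are hypothetical.

```python
# Figures quoted in the article.
VOICE_SCALE_UP = 165.0  # data-center growth needed if voice becomes primary
GPU_SPEEDUP = 10.0      # estimated server speedup with GPUs
FPGA_SPEEDUP = 16.0     # estimated server speedup with FPGAs


def remaining_scale_up(scale_up: float, speedup: float) -> float:
    """Scale-up still needed after acceleration.

    Assumption (not from the article): a speedup of N means each server
    handles N times the queries, so the needed growth shrinks by N.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return scale_up / speedup


if __name__ == "__main__":
    for name, speedup in (("GPU", GPU_SPEEDUP), ("FPGA", FPGA_SPEEDUP)):
        left = remaining_scale_up(VOICE_SCALE_UP, speedup)
        print(f"{name}: about {left:.1f} times the current capacity")
```

Under that assumption, the data center would still have to grow to roughly 16.5 times its size with GPUs and about 10.3 times with FPGAs, which suggests why the researchers also look at shifting work to the smartphone.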
The coolest result found by Mars' research group was that personalized open-source digital assistants can be easily crafted by using the APIs included with Sirius for smartphones and wearables, regardless of the OS. By adding what Mars calls "fancy algorithms," every user should be able to customize their Sirius to perform optimally for their particular applications.
Sirius will be announced on March 14th at the International Conference on Architectural Support for Programming Languages and Operating Systems (Istanbul, Turkey) in a paper titled "Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers." That same day Sirius will become a free download that yields voice access to Wikipedia, but can be customized to access any database. In fact, the researchers are working with IBM to create a compatible database for academic advising.
You will also be able to download the paper and all the details about Sirius on the same website.
— R. Colin Johnson, Advanced Technology Editor, EE Times
aplumb: I agree. I think it began as a Linux-head project--the guys who are notoriously anti-commercial, which is why it is free. The name could be a problem for a commercial product, but I'm not sure it will be if there is no money being made for which to sue. But as you mentioned, I'm not a lawyer. Thanks for the comment.
TanjB: Sirius will be available for download tomorrow after the paper is presented and needs to be installed on a Unix-based server. Demos will be available at the conference, but it's in Istanbul, Turkey.
While it's great to see more open source options like this, I *really* wish they had chosen a different/unique name. Inspired by Siri(us) perhaps, but not derived from it. Knowing how litigious Apple can get protecting its iStuff trademarks and associated IP, I would think thrice before incorporating a project called Sirius into a commercial product.
Insert 'I Am Not A Lawyer' Disclaimer Here, but I am a member of the Open Source Hardware Association because I care about this sort of thing. :-)
You can ask those commercial systems what the natural diet of a cat is (using voice) and get good answers. Claiming they don't is a red flag for hype.
There is a big step from using a known cat picture and asking a question well suited for Wikipedia when using WP as the database, versus recognising a cat in a freely taken picture and asking about it in a supermarket when connected to an agent like Cortana which has access to thousands of times more sources than just WP.
It is good to see this stuff being taught and algorithms in the public domain. It is also good to see academia getting into the swim of understanding what cloud computing involves on the hardware side (although you can go to Open Compute and learn a lot more).
There is a lot more to scaling this up to beat the commercial systems. It involves some hard work and very subtle challenges due to scale, ambiguities, context, and intent. Nice work which can stand on its own merits, a shame they feel they need to claim they are giant killers when they are still ankle biters.