UK semantic technology company True Knowledge has released Evi, a mobile app that competes with Siri.
The mobile app is available on the Android Market and on iTunes. You can pose queries to either by speaking or typing. The Android app uses Google’s ASR speech technology and the iTunes app uses Nuance.
True Knowledge has been developing a natural answering question answering system since 2007. You can query the True Knowledge online via a Web interface. Tty the following links for some examples:
The Evi app has a number of additional features beyond the Web-based True Knowledge QA system and these wil probably be expanded on in the months to come.
Here’s a word cloud that visualizes the 200 most significant words extracted from over 400 papers from our research group over the past ten years. Significance was estimated by tf-idf where the idf data is from a collection of newswire articles (thanks Paul!). The word cloud was created with Wordle.
The First Mid-Atlantic Student Colloquium on Speech, Language and Learning is a one-day event to be held at the Johns Hopkins University in Baltimore on Friday, 23 September 2011. Its goal is to bring together students taking computational approaches to speech, language, and learning, so that they can introduce their research to the local student community, give and receive feedback, and engage each other in collaborative discussion. Attendance is open to all and free but space is limited, so online registration is requested by September 16. The program runs from 10:00am to 5:00pm and will include oral presentations, poster sessions, and breakout sessions.
The Mid-Atlantic Student Colloquium on Speech, Language and Learning is a one day, free event bringing together faculty, researchers and students from universities in the Mid-Atlantic area working in Speech/Language/ML. The colloquium is an opportunity for students to present preliminary or completed work and to network with other students, faculty and researchers working in related fields. The event will be held in Baltimore MD at the Johns Hopkins University on Friday 23 September 2011.
Students are encouraged to submit one-page abstracts by Monday, August 15 describing ongoing, planned, or completed research projects, including previously published results and negative results. Student research in any field applying computational methods to any aspect of human language, including speech and learning, from all areas of computer science, linguistics, engineering, neuroscience, information science, and related fields, is welcome. Submissions and presentations must be made by students or postdocs. See the call for papers for more information.
Accepted submissions will be presented as posters and each will also be given a one-minute presentation during a poster spotlight session. A small number of submissions will be selected to be presented as talks, on the basis of diversity and general interest.
Student-led breakout sessions of one hour will also be held to discuss papers on topics of interest and stimulate interaction and discussion. Topics and suggested papers for breakout sessions should be submitted by students alongside abstracts.
Microsoft Research and Bing are jointly hosting the Speller Challenge. The goal is to build the best service that could propose alternative spellings for search queries submitted to Bing. Entries must be submitted for the challenge in the form of a REST-based web service, and they will be judged based on their expected F1 score against a test set sampled from real Bing queries.
For development purposes, they are making available a TREC evaluation dataset through their Web-NGram service. Refer to this page for detailed evaluation measures and REST service specs.
The dataset consists of over 386M blog posts, news articles, classifieds, forum posts and social media content in a month including events such as the Tunisian revolution and the Egyptian protests. The content includes the syndicated text, its original HTML as found on the web, annotations and metadata (e.g., author information, time of publication and source URL), and boilerplate/chrome extracted content. The data is formatted as Spinn3r’s protostreams – an extension to Google protobuffers. It is also broken down by date, content type and language making it easy to work with selected data.
See the ICWSM Data Challenge pages for more information on the challenge task, its associated ICWSM workshop and procedures for data access.
On the eve of the big Jeopardy! match, Peter Norvig’s opinion piece in the New York Post (!) today, The Machine Age looks at AI’s progress over the past sixty years and lays out six surprising lessons we’ve learned.
The things we thought were hard turned out to be easier.
Dealing with uncertainty turned out to be more important than thinking with logical precision.
Learning turned out to be more important than knowing.
Current systems are more likely to be built from examples than from logical rules.
The focus shifted from replacing humans to augmenting them.
The partnership between human and machine is stronger than either one alone.
When took Pat Winston’s undergraduate AI class in 1970, only the first of those ideas was current. It’s a good essay.
Of course, after we we’ve exploited the new data-driven, statistical paradigm for the next decade or so, we’ll probably have to go back to figuring out how to get logic back into the framework.
Recorded Future is a Boston-based startup with backing from Google and In-Q-Tel uses sophisticated linguistic and statistical algorithms to extract time-related information from streams of Web data about entities and events. Their goal is to help their clients to understand how the relationships between entities and events of interest are changing over time and make predictions about the future.
“Conventional search engines like Google use links to rank and connect different Web pages. Recorded Future’s software goes a level deeper by analyzing the content of pages to track the “invisible” connections between people, places, and events described online.
”That makes it possible for me to look for specific patterns, like product releases expected from Apple in the near future, or to identify when a company plans to invest or expand into India,” says Christopher Ahlberg, founder of the Boston-based firm.
A search for information about drug company Merck, for example, generates a timeline showing not only recent news on earnings but also when various drug trials registered with the website clinicaltrials.gov will end in coming years. Another search revealed when various news outlets predict that Facebook will make its initial public offering.
That is done using a constantly updated index of what Ahlberg calls “streaming data,” including news articles, filings with government regulators, Twitter updates, and transcripts from earnings calls or political and economic speeches. Recorded Future uses linguistic algorithms to identify specific types of events, such as product releases, mergers, or natural disasters, the date when those events will happen, and related entities such as people, companies, and countries. The tool can also track the sentiment of news coverage about companies, classifying it as either good or bad.”
Pricing for access to their online services and API starts at $149 a month, but there is a free Futures email alert service through which you can get the results of some standing queries on a daily or weekly basis. You can also explore the capabilities they offer through their page on the 2010 US Senate Races.
“Rather than attempt to predict how the the races will turn out, we have drawn from our database the momentum, best characterized as online buzz, and sentiment, both positive and negative, associated with the coverage of the 29 candidates in 14 interesting races. This dashboard is meant to give the view of a campaign strategist, as it measures how well a campaign has done in getting the media to speak about the candidate, and whether that coverage has been positive, in comparison to the opponent.”
Their blog reveals some insights on the technology they are using and much more about the business opportunities they see. Clearly the company is leveraging named entity recognition, event recognition and sentiment analysis. A short A White Paper on Temporal Analytics has some details on their overall approach.
Training Examples QA is a site created by Joseph Turian where “data geeks ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization!”
It’s a close knock off of the popular stack overflow site and appears to be very well done.
If it catches on in the relevant research communities, it could be a very useful resource. (via LingPipe blog)
PCWorld has a story, Google VP Mayer Describes the Perfect Search Engine, with some interesting comments on semantic search from Marissa Mayer, Google’s vice president of Search Products & User Experience.
“IDGNS: What’s the status of semantic search at Google? You have said in the past that through “brute force” — analyzing massive amounts of queries and Web content — Google’s engine can deliver results that make it seem as if it understood things semantically, when it really functions using other algorithmic approaches. Is that still the preferred approach?
Mayer: We believe in building intelligent systems that learn off of data in an automated way, [and then] tuning and refining them. When people talk about semantic search and the semantic Web, they usually mean something that is very manual, with maps of various associations between words and things like that. We think you can get to a much better level of understanding through pattern-matching data, building large-scale systems. That’s how the brain works. That’s why you have all these fuzzy connections, because the brain is constantly processing lots and lots of data all the time.
IDGNS: A couple of years ago or so, some experts were predicting that semantic technology would revolutionize search and blindside Google, but that hasn’t happened. It seems that semantic search efforts have hit a wall, especially because semantic engines are hard to scale.
Mayer: The problem is that language changes. Web pages change. How people express themselves changes. And all those things matter in terms of how well semantic search applies. That’s why it’s better to have an approach that’s based on machine learning and that changes, iterates and responds to the data. That’s a more robust approach. That’s not to say that semantic search has no part in search. It’s just that for us, we really prefer to focus on things that can scale. If we could come up with a semantic search solution that could scale, we would be very excited about that. For now, what we’re seeing is that a lot of our methods approximate the intelligence of semantic search but do it through other means.”
I interpret these comments to mean that Google’s management still views the concept of semantic search (and the Semantic Web) as involving better understanding of the intended meaning of text in documents and queries. The W3C’s web of data model is still not on their radar.
Wolfram|Alpha is an interesting query answering system developed by Wolfram Research that is a blend of a question answering system and a Semantic Web alternative. It tries to interpret and answer queries expressed as a sequence of words from a large collection of interlinked tables. Oh, and Mathematica is in thrown in for free. A free Web version was released last Spring.
The news today is that Wolfram|Alpha has released an API, as noted in their blog:
“The API allows your application to interact with Wolfram|Alpha much like you do on the web—you send a web request with the same query string you would type into Wolfram|Alpha’s query box and you get back the same computed results. It’s just that both are in a form your application can understand. There are plenty of ways to tweak and control the results, as well.”
The pricing plan runs from $60/month for 1000 (6 cents a query) queries to $220K for up to 10M queries/month (2.2 cents a query). programming language bindings are available for Java, PHP, Perl, Python, Ruby and .NET.
Their original web interface remains free, but the TOS specifies that it “may be used only by a human being using a conventional web browser to manually enter queries one at a time.”
HealthBase is a ‘semantic search engine’ for healthcare information that is driven by content mined from “millions of authoritative health sources” including WebMD, Wikipedia, PubMed, and Mayo Clinic’s health site. Techcrunch first described it as the ultimate medical content search engine but then had a follow up article reporting that HealthBase thinks you can get rid of jews with alcohol and salt. Language Log had some more fun exploring HealthBase.
We thought we’d see what HealthBase thought of the Semantic Web and it turns out that if you are experiencing the Semantic Web as a condition there are several recommended treatments.
and as a treatment itself, HealthBase is pretty positive.