Conrad Barski, M.D. will give a talk on “How To Tell Stuff To Your Computer — The Enigmatic Art of Knowledge Representation” at UMBC at 1:00pm on Friday 17 October in Lecture Hall 8 in the ITE building.
Barski maintains an interesting site, Lisperati, that has graphical introductions to a number of topics, including Lisp, Haskell, and Emacs, as well as serving as the home of FringeDC, an informal group of people interested in “fringe” programming languages.
Here’s the abstract for his talk.
“Have you ever wondered how we take information from the “real world” and put it into our computers? When we do this, do we lose parts of the information? Are some concepts just too hard to turn into ones and zeroes? How is our ability to enter information limited by the data structures we use inside of our computers? These questions enter into a science that is rarely discussed: The science of Knowledge Representation.
My presentation on KR will include some navel gazing, but also some nitty-gritty practical examples of Description Logics, RDF, and other modern approaches to capturing complicated information within a computer. We will also discuss some likely future directions this field may head into.”
Dr. Barski is a Medical Software Developer working on cardiology procedure documentation for Wolters Kluwer Health. He is also currently working on a textbook on the Common Lisp programming language.
You can submit a question either before, during or after the talk here.
Evri is another entry into the ‘semantic search’ space and has recently opened up a beta site with the slogan Search less, understand more. Evri is a startup launched by Vulcan Inc, a company founded by Paul Allen in 1986 as a private investment and R&D firm.
Here’s part of how Evri describes itself on their (FAQ).
“What is Evri doing? Evri is creating a map of connections between people, places, and things on the web. You’ll use this map to find the things you’re interested in. Instead of searching by keywords and looking for relevant results, Evri will lead you to other relevant articles, images, and video based on what you’re reading.
… Where does Evri get its information? We search the World Wide Web and gather content from as many highly regarded information sources as we can find, and we’re adding more sources all the time.”
Saying that Evri does ‘semantic search’ is not quite right — their initial focus is on providing widgets for blogs and other web sites that use the text on the page to recommend links to other, related information.
Evri appears to have developed an underlying ontology that is used to organize their knowledge of “people, products and things”, capturing both a type taxonomy and relations. Some of this is revealed in the beta² part of their site, Evri’s Garden. There is also a query system over their knowledge base that supports complex search queries.
The current push, though, seems to be to get bloggers to add an Evri widget to their blogs that will pop up a window with links to related articles and information.
This is an interesting development that is worth watching.
Databases are a fundamental technology for most information systems, especially those based on the web. A group of senior database researchers met recently to assess the state of database research, as documented in their report. So, where did the Semantic Web fit into their vision?
“In late May, 2008, a group of database researchers, architects, users and pundits met at the Claremont Resort in Berkeley, California to discuss the state of the research field and its impacts on practice. This was the seventh meeting of this sort in twenty years, and was distinguished by a broad consensus that we are at a turning point in the history of the field, due both to an explosion of data and usage scenarios, and to major shifts in computing hardware and platforms. Given these forces, we are at a time of opportunity for research impact, with an unusually large potential for influential results across computing, the sciences and society. This report details that discussion, and highlights the group’s consensus view of new focus areas, including new database engine architectures, declarative programming languages, the interplay of structured and unstructured data, cloud data services, and mobile and virtual worlds.”
It’s a good report with lots of interesting things in it and definitely worth reading, but I was disappointed to find that it makes no mention of the Semantic Web, RDF, OWL, ontologies, AI, knowledge bases, or reasoning. Here’s a word cloud (generated with Wordle) of the report, which provides a 10,000-foot view of its content.
The report says that it was “surprisingly easy for the group to reach consensus on a set of research topics to highlight for investigation in coming years”. Those topics are:
Revisiting Database Engines
Declarative Programming for Emerging Platforms
The Interplay of Structured and Unstructured Data
Cloud Data Services
Mobile Applications and Virtual Worlds
There is clearly overlap between the database and semantic web communities in the first three topics.
David Huynh completed his PhD at MIT CSAIL last year and joined MetaWeb a few months ago, where he has been working on new and better interfaces to explore the data encoded in their Freebase system. He recently released Parallax as a prototype browsing interface for Freebase. Here is a video that shows the interface in action.
Freebase is “an open database of the world’s information” that is constructed by a Wiki-like collaborative community. In many ways it is like the Semantic Web model, with two big differences: (1) the data is stored centrally rather than distributed across the Web and (2) the representation system is not based on RDF but rather uses a custom built object-oriented data representation language.
Freebase is a great resource. Much of the data is extracted from Wikipedia, so its content has a large overlap with DBpedia. But it is also relatively easy to upload additional information in various structured forms and many have done so, resulting in an extended coverage.
This is clearly a system in the Web of Data space, along with the Linking Open Data effort, and having it available should offer a way for us all to explore the consequences of some of the underlying design decisions.
“2008-06-20: The Semantic Web Deployment Working Group has published a Candidate Recommendation of RDFa in XHTML: Syntax and Processing. Web documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience. RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. See the group’s RDFa implementation report. The Working Group also updated the companion document RDFa Primer. Learn more about the Semantic Web and the HTML Activity.”
Achieving candidate recommendation status is a significant step toward becoming a W3C recommendation. Congratulations to the working group for all of their efforts in developing RDFa.
Joshua Tauberer, a UPenn Linguistics graduate student, maintains rdf:about as a resource of information on the semantic web language RDF. It’s a concise collection of information that manages not to overwhelm and includes good Quick Intro and RDF in Depth pages.
“We invite submissions to the sixth annual Semantic Web Challenge, the premiere event for demonstrating practical progress towards achieving the vision of the Semantic Web. The central idea of the Semantic Web is to extend the current human-readable web by encoding some of the semantics of resources in a machine-processable form. Moving beyond syntax opens the door to more advanced applications and functionality on the Web. Computers will be better able to search, process, integrate and present the content of these resources in a meaningful, intelligent manner.
As the core technological building blocks are now in place, the next challenge is to show off the benefits of semantic technologies by developing integrated, easy to use applications that can provide new levels of Web functionality for end users on the Web or within enterprise settings. Applications submitted should demonstrate clear practical value that goes above and beyond what is possible with conventional web technologies alone.
Unlike in previous years, the Semantic Web Challenge of 2008 will consist of two tracks: the Open Track and the Billion Triples Track. The key difference between the two tracks is that the Billion Triples Track requires the participants to make use of the data set –a billion triples– provided by the organizers. The Open Track has no such restrictions.
As before, the Challenge is open to everyone from academia and industry. The authors of the best applications will be awarded prizes and featured prominently at special sessions during the conference.”
“Swoogle has indexed millions of Semantic Web Documents, but how do I know that mine has been indexed?” Here is a simple way: try your URL using the Swoogle Track Back Service. Here I list several examples to show how it works:
——————————————————————————–
About this URL
The latest ping on [2006-01-29] shows its status is [Succeed, changed into SWD].
Its latest cached original snapshot is [2006-01-29 (3373 bytes)]
Its latest cached NTriples snapshot is [2006-01-29 (41 triples)].
——————————————————————————–
We have found 7 cached versions.
2006-01-29: Original Snapshot (3373 bytes), NTriples Snapshot (41 triples)
2005-08-25: Original Snapshot (3373 bytes), NTriples Snapshot (41 triples)
2005-07-16: Original Snapshot (2439 bytes), NTriples Snapshot (35 triples)
2005-05-20: Original Snapshot (2173 bytes), NTriples Snapshot (30 triples)
2005-04-10: Original Snapshot (1909 bytes), NTriples Snapshot (28 triples)
2005-02-25: Original Snapshot (1869 bytes), NTriples Snapshot (27 triples)
2005-01-24: Original Snapshot, NTriples Snapshot (31 triples)
We may also check the growth of FOAF documents.
http://www.csee.umbc.edu/~dingli1/foaf.rdf
——————————————————————————–
About this URL
The latest ping on [2006-01-29] shows its status is [Succeed, changed into SWD].
Its latest cached original snapshot is [2006-01-29 (6072 bytes)]
Its latest cached NTriples snapshot is [2006-01-29 (98 triples)].
——————————————————————————–
We have found 6 cached versions.
2006-01-29: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-07-16: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-06-19: Original Snapshot (5053 bytes), NTriples Snapshot (80 triples)
2005-04-17: Original Snapshot (3142 bytes), NTriples Snapshot (50 triples)
2005-04-01: Original Snapshot (1761 bytes), NTriples Snapshot (29 triples)
2005-01-24: Original Snapshot, NTriples Snapshot (29 triples)
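The growth visible in a listing like the one above can be computed mechanically. The following is a minimal sketch, assuming the listing has been copied as plain text in exactly the line format shown here; the function names `parse_listing` and `triple_growth` are illustrative and are not part of Swoogle.

```python
import re

# Lines copied from the FOAF listing above.
LISTING = """\
2006-01-29: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-07-16: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-06-19: Original Snapshot (5053 bytes), NTriples Snapshot (80 triples)
2005-04-17: Original Snapshot (3142 bytes), NTriples Snapshot (50 triples)
2005-04-01: Original Snapshot (1761 bytes), NTriples Snapshot (29 triples)
2005-01-24: Original Snapshot, NTriples Snapshot (29 triples)
"""

# The byte count is optional because the oldest entry omits it.
LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}): Original Snapshot(?: \((?P<bytes>\d+) bytes\))?"
    r", NTriples Snapshot \((?P<triples>\d+) triples\)"
)


def parse_listing(text):
    """Return (date, byte_count_or_None, triple_count) tuples, oldest first."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            size = int(m.group("bytes")) if m.group("bytes") else None
            rows.append((m.group("date"), size, int(m.group("triples"))))
    # ISO dates sort correctly as strings.
    return sorted(rows, key=lambda row: row[0])


def triple_growth(rows):
    """Change in triple count between each pair of consecutive snapshots."""
    return [(later[0], later[2] - earlier[2])
            for earlier, later in zip(rows, rows[1:])]


rows = parse_listing(LISTING)
growth = triple_growth(rows)
for date, delta in growth:
    print(date, delta)
```

For this listing, the document went from 29 to 98 triples between January and July 2005 and did not change afterwards.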
Finally, this service may also help us learn the life cycle of a semantic web document: it was created, actively maintained, lingered around for a while and finally died (i.e. went offline).
——————————————————————————–
About this URL
The latest ping on [2006-02-02] shows its status is [Failed, http code is not 200 (or406)].
Its latest cached original snapshot is [2005-03-09 (15809 bytes)]
Its latest cached NTriples snapshot is [2005-03-09 (149 triples)].
——————————————————————————–
We have found 3 cached versions.
2005-03-09: Original Snapshot (15809 bytes), NTriples Snapshot (149 triples)
2005-02-25: Original Snapshot (12043 bytes), NTriples Snapshot (149 triples)
2005-01-26: Original Snapshot, NTriples Snapshot (145 triples)
NOTICE: Yesterday we posted a form that directed you to the Swoogle trackback service. Unfortunately, the form failed when it was called from outside our firewall because a Swoogle API key is required. We didn’t notice at first, because we were inside the firewall when we tested it. When we did, we deleted the post, but PlanetRDF had already picked up the post and it was still in our database. Now the form has been removed, but you can still go to the Swoogle web site and try the trackback service there.
We’ve set up a Google group, Swooglers, for users of the Swoogle Semantic Web search engine. Anyone can browse the archive and join, but only members can post messages. Replies are sent to the whole group. We’re not exactly sure what Swooglers will have to talk about, but it might be a place to share your experiences in using Swoogle, ask other users for advice, etc.
Recently Cláudio Fernandes asked on several semantic web mailing lists
“Can someone point me to some huge owl/rdf files? I’m writing a owl parser with different tools, and I’d like to benchmark them all with some really really big files.”
I just ran some queries over Swoogle’s collection of 850K RDF documents collected from the web. Here are the 100 largest RDF documents and OWL documents, respectively. Document size was measured in terms of the number of triples. For this query, a document was considered to be an OWL document if it used a namespace that contained the string OWL.
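The selection rule described above is simple enough to sketch. This is only an illustration, not the query that was actually run against Swoogle's database: the `RdfDocument` record, the sample URLs and counts, and the choice of a case-insensitive match on "owl" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RdfDocument:
    # Hypothetical record; Swoogle's real schema is not shown in the post.
    url: str
    triple_count: int
    namespaces: tuple


def is_owl_document(doc):
    """Heuristic from the post: some namespace contains the string OWL.

    Matching case-insensitively is an assumption of this sketch.
    """
    return any("owl" in ns.lower() for ns in doc.namespaces)


def largest(docs, n=100, owl_only=False):
    """Return the n largest documents, measured by number of triples."""
    pool = [d for d in docs if is_owl_document(d)] if owl_only else list(docs)
    return sorted(pool, key=lambda d: d.triple_count, reverse=True)[:n]


# Made-up sample data for illustration only.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
docs = [
    RdfDocument("http://example.org/a.rdf", 1200, (RDF_NS,)),
    RdfDocument("http://example.org/b.owl", 5000, (RDF_NS, OWL_NS)),
    RdfDocument("http://example.org/c.owl", 300, (RDF_NS, OWL_NS)),
]

top_rdf = [d.url for d in largest(docs, n=2)]
top_owl = [d.url for d in largest(docs, n=2, owl_only=True)]
print(top_rdf, top_owl)
```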
Currently, the version of Swoogle you get by going to http://swoogle.umbc.edu/ is Swoogle 2. Its database has been trapped in amber since last summer, when it was corrupted, preventing us from adding new data. We put our efforts into a reimplementation, Swoogle 3, which will be released early next week. The data reported here is from Swoogle 3's database.
We noticed Jose Vidal using a great idea on his publication list, which we’ve added to the ebiquity site’s publication page. Jose augments his paper descriptions with data from Google Scholar (GS): a link to the GS data, the number of citing papers, and a list of their GS data.
We think GS is likely to be increasingly important in the academic/scholarly community. It’s a way to find papers, of course, but it also helps judge their significance to the field as measured by the number of citations. Citation counting is the traditional way of measuring the impact of a paper. Using Google Scholar’s citations to measure impact has its problems, a topic we’ve posted on before and one that is also discussed in bibliometric circles, but it’s free and convenient, a combination that’s hard to beat. (Writing this, I wonder if anyone has tried applying a recursive model like the one used in PageRank to citation graphs. If not, this would be an interesting experiment to do.)
Here’s how our paper listings now work. We augmented the RGB paper ontology to give the paper class a new metadata property, googleKey, that is then used to derive the other properties: the number of citations and links to the GS description and the list of citing papers. Right now getting the GS key is done manually since automating it reliably is not trivial. But we do have a link on the paper display that makes it easier to find the key by querying GS with the paper title and showing the results. If the paper is in GS, it will probably be on the first page.
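To make the recursive idea concrete, here is a minimal sketch of a PageRank-style power iteration over a tiny citation graph. The graph, the paper labels, the damping factor of 0.85, and the function name `citation_rank` are all assumptions chosen for illustration. A paper scores highly when it is cited by papers that themselves score highly, rather than simply when it has many citations.

```python
def citation_rank(cites, damping=0.85, iterations=50):
    """PageRank-style scores for a citation graph.

    `cites` maps each paper to the list of papers it cites.
    """
    papers = set(cites)
    for targets in cites.values():
        papers.update(targets)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(rank[p] for p in papers if not cites.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in papers}
        # Each paper passes an equal share of its score to every paper it cites.
        for p in papers:
            targets = cites.get(p, [])
            for t in targets:
                new_rank[t] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: A and B both cite C, and C cites D.
graph = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
scores = citation_rank(graph)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

Plain citation counting would put C first with two citations. In this sketch D, with a single citation, ends up ahead because its one citation comes from the most highly ranked citing paper.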
Every night, an agent (well, ok, a cron job) checks Google Scholar to update the citation counts for all of the papers that have a GS key.
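The shape of such a nightly job might look like the sketch below. It is not the site's actual code: the `Paper` record, the `update_citation_counts` function, and the lookup function passed in are hypothetical, and the step that really talks to Google Scholar is deliberately left as a caller-supplied function, with a stub standing in for it here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Paper:
    # Hypothetical record; the real site stores papers in its own database.
    title: str
    google_key: Optional[str] = None
    citations: int = 0


def update_citation_counts(
    papers: Iterable[Paper],
    fetch_count: Callable[[str], Optional[int]],
) -> int:
    """Refresh counts for papers that have a GS key; return how many changed."""
    changed = 0
    for paper in papers:
        if paper.google_key is None:
            continue  # not yet matched to a Google Scholar record
        count = fetch_count(paper.google_key)
        if count is not None and count != paper.citations:
            paper.citations = count
            changed += 1
    return changed


# Stub lookup standing in for the real Google Scholar query.
fake_counts = {"key-1": 12, "key-2": 3}

papers = [
    Paper("Paper with a key", google_key="key-1", citations=10),
    Paper("Unchanged paper", google_key="key-2", citations=3),
    Paper("Newly accepted paper"),
]
changed = update_citation_counts(papers, fake_counts.get)
print(changed, [p.citations for p in papers])
```

Passing the lookup in as a function keeps the update logic testable without network access, and papers without a key are skipped until an author supplies one.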
Our lab members tend to enter papers into the site’s database as soon as they are accepted for publication, which is long before they show up in Google Scholar and even longer before they begin to accrue citations. So authors will have to periodically check recently entered papers and update them with their GS keys when available. It will take some weeks or more before we’ve processed all of the old papers to look up their GS Key. Once we’ve done so, I think it should be easy to maintain it.
SemNews is a prototype application being developed by UMBC Ph.D. student Akshay Java that uses a sophisticated text understanding system to interpret summaries of news stories, publishes the results on the semantic web and provides browsing and query services over them. The project is the result of a collaboration between UMBC’s Institute for Language and Information Technologies and Ebiquity Laboratory with partial support from the Lockheed Martin Corporation.
SemNews monitors a number of news source RSS feeds and processes new stories as they are published. After extracting a story’s metadata, its news summary is interpreted by the OntoSem text analyzer which does a syntactic, semantic, and pragmatic analysis of the text, resulting in its text meaning representation or TMR. A TMR is a language-neutral description (an interlingua) of the meaning conveyed in a natural language text. In addition to providing information about the lexical-semantic dependencies in the text, the TMR represents stylistic factors, discourse relations, speaker attitudes, and other pragmatic factors present in the discourse structure. In doing so, the TMR captures not only the meaning of individual elements in the text, but also the relations between those elements, and captures both propositional and non-propositional components of textual meaning. OntoSem’s TMRs are represented in a custom frame-based representation language and grounded in the Mikrokosmos ontology, an extensive ontology with over 30K concepts and nearly 400K entities.
Each story’s metadata and TMR are translated into the Semantic Web language OWL via the OntoSem2OWL translator developed for this project. The results are then added to a special collection indexed by the Swoogle search engine and also put into an RDF triple store. These are used to support several services enabling people and agents to semantically browse, query and visualize the stories in the collection, enabling access to information that would otherwise not be easy to find using simple keyword-based search.
For example, one can browse through the story collection via the ontology to find stories that involve certain concepts, such as a terrorist organization; find all stories that involve entities in OntoSem’s onomasticon, such as al Qaeda or Karbala; visualize the stories on a map based on the locations they reference; or construct an arbitrary query, such as finding “stories in which the nation named Afghanistan was the location of a bombing event.” Users can also define semantic “alerts” as queries over the RDF triple store and/or the Swoogle collection. For each alert, SemNews will generate an RSS feed of the results.
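To illustrate what a query like the Afghanistan example involves, here is a minimal sketch of triple-pattern matching over a small in-memory set of triples. The vocabulary (`mentions`, `type`, `location`, `name`, `Bombing`, `Nation`) and the story identifiers are invented for the example and are not the terms used by OntoSem, the Mikrokosmos ontology, or SemNews, which queries a real RDF triple store rather than a Python list.

```python
def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(triples, patterns):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                candidate = match(pattern, triple, bindings)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


# Made-up triples using a hypothetical vocabulary.
triples = [
    ("story1", "mentions", "event1"),
    ("event1", "type", "Bombing"),
    ("event1", "location", "nation1"),
    ("story2", "mentions", "event2"),
    ("event2", "type", "Bombing"),
    ("event2", "location", "nation2"),
    ("story3", "mentions", "event3"),
    ("event3", "type", "Meeting"),
    ("event3", "location", "nation1"),
    ("nation1", "type", "Nation"),
    ("nation1", "name", "Afghanistan"),
    ("nation2", "type", "Nation"),
    ("nation2", "name", "Iraq"),
]

# "Stories in which the nation named Afghanistan was the location of a bombing event."
patterns = [
    ("?story", "mentions", "?event"),
    ("?event", "type", "Bombing"),
    ("?event", "location", "?place"),
    ("?place", "type", "Nation"),
    ("?place", "name", "Afghanistan"),
]

stories = sorted({s["?story"] for s in query(triples, patterns)})
print(stories)
```

A keyword search for "Afghanistan" and "bombing" would also return the third story if its text happened to mention both words, which is the kind of false match that the structured query avoids.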
The SemNews system is currently a research prototype that is being used to refine the underlying technologies and to explore how the sophisticated automatic linguistic processing of text can be integrated into the Semantic Web and conventional web applications. Ongoing work on SemNews includes an evaluation of its semantic recall and precision as well as a service that can group and cluster stories based on their semantic representations.
For more information
Akshay Java, Tim Finin and Sergei Nirenburg, Text understanding agents and the Semantic Web, Proceedings of the 39th Hawaii International Conference on System Sciences, Kauai HI, 4-6 January, 2006.
Sergei Nirenburg and Victor Raskin, Ontological Semantics, September 2004, The MIT Press, Cambridge.
Li Ding, Tim Finin, Anupam Joshi, Yun Peng, Rong Pan and Pavan Reddivari, Search on the Semantic Web, IEEE Computer, October 2005.