UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments
Semantic Web

Archive for the 'Semantic Web' Category

NOSQL: distributed key-value data stores

July 2nd, 2009, by Tim Finin, posted in Database, Semantic Web, Web

ComputerWorld has an article on the “nosql” movement and a recent nosql meetup held in San Francisco, No to SQL? Anti-database movement gains steam. Nosql systems are distributed, non-relational data stores that typically use a simple key-value approach to indexing and retrieving data and use a simple procedural query API rather than a sophisticated declarative query language.
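
To make the contrast concrete, here is a minimal sketch of the two access styles. The KeyValueStore class is a hypothetical in-memory stand-in, not the API of any of the systems mentioned in this post, and Python's built-in sqlite3 module stands in for a relational database.

```python
import sqlite3


class KeyValueStore:
    """Toy in-memory stand-in for a key-value store (hypothetical API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


# Key-value style: the application asks for one key at a time
# through a small procedural API.
kv = KeyValueStore()
kv.put("user:1", {"name": "alice", "city": "Baltimore"})
kv.put("user:2", {"name": "bob", "city": "Boston"})
kv_result = kv.get("user:2")["name"]

# Relational style: the question is stated declaratively in SQL
# and the database decides how to answer it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "alice", "Baltimore"), (2, "bob", "Boston")],
)
sql_result = db.execute(
    "SELECT name FROM users WHERE city = ?", ("Boston",)
).fetchone()[0]

print(kv_result, sql_result)
```

In the key-value version, a question like "who lives in Boston?" has no direct answer: the application would have to scan the values or maintain a second index keyed by city. That is the trade-off these systems accept in exchange for simplicity and easier distribution.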

“The inaugural get-together of the burgeoning NoSQL community crammed 150 attendees into a meeting room at CBS Interactive. Like the Patriots, who rebelled against Britain’s heavy taxes, NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.

“Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system],” said Jon Travis, principal engineer at Java toolmaker SpringSource, one of the 10 presenters at the NoSQL confab (PDF). NoSQL-based alternatives “just give you what you need,” Travis said.”

There were presentations on eight different ‘nosql’ databases: Voldemort, Cassandra, Dynomite, HBase, Hypertable, CouchDB, VPork and MongoDB, as well as general presentations by Google’s Jonas Karlsson and Cloudera’s Todd Lipcon.

Johan Oskarsson of Last.fm wrote a debriefing post on his blog.

“The relatively young but rapidly growing “nosql” community met last Thursday in San Francisco. The idea was to give attendees a solid introduction to how distributed, non relational databases work as well as an overview of the various projects out there.”

and provides links to the presentation slides and videos. You can also search for NOSQL on Vimeo to get the videos.

I learned of this meeting on Hacker News, where you can find some interesting comments.

Of course, there are many popular key-value stores that are not designed to support the highly scalable, distributed needs of many Web applications. I found, for example, that as a persistent RDF store for rdflib, Sleepycat outperformed MySQL.

CFP: JWS special issue on Semantic Web and Social Media

June 27th, 2009, by Tim Finin, posted in Blogging, Semantic Web, Social media, Wikipedia
Important dates:

  • abstracts: 21 Sept 09
  • submissions: 01 Oct 09
  • notification: 15 Dec 09
  • final copy: 15 Jan 10
  • publication: April 10

The Journal of Web Semantics will publish a special issue on Data Mining and Social Network Analysis for integrating Semantic Web and Web 2.0 in the spring of 2010. The special issue will be edited by Bettina Berendt, Andreas Hotho and Gerd Stumme and initial abstracts for papers must be submitted via the Elsevier EES system by September 21, 2009.

The special issue invites contributions that show how synergies between Semantic Web and Web 2.0 techniques can be successfully used. Since both communities work on network-like data structures, analysis methods from different fields of research could form a link between those communities. Techniques can be - but are not limited to - social network analysis, graph analysis, machine learning and data mining methods.

Relevant topics include

  • ontology learning from Web 2.0 data
  • instance extraction from Web 2.0 systems
  • analysis of Blogs
  • discovering social structures and communities
  • predicting trends and user behaviour
  • analysis of dynamic networks
  • using content of the Web for modelling
  • discovering misuse and fraud
  • network analysis of social resource sharing systems
  • analysis of folksonomies and other Web 2.0 data structures
  • analysis of Web 2.0 applications and their data
  • deriving profiles from usage
  • personalized delivery of news and journals
  • Semantic Web personalization
  • Semantic Web technologies for recommender systems
  • ubiquitous data mining in Web (2.0) environment
  • applications

Bing vs. Google, side by side comparison

June 1st, 2009, by Tim Finin, posted in Google, Security, Semantic Web, Social media, sEARCH

Microsoft’s new Bing search engine is getting a lot of interest. Glenn McDonald posts about a nice side-by-side Bing vs Google comparator that he developed. It makes it easy to compare how the two services do on a range of different types of searches. Here are the ones that Glenn said he found useful in developing his initial opinion.

I sense from some of these queries that he is probing the systems where an advanced search engine can exploit a little bit of semantic knowledge. For example, it might recognize that a user’s query “boston to asheville” matches a common pattern “[place] to [place]”, and that she is probably interested in information about how to travel from the first location to the second. It seems like Google has been working on adding more such patterns, at least for the low-hanging fruit.
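
Here is a rough sketch of what recognizing such a pattern could look like. It is only an illustration using a regular expression; the pattern and the parse_route_query function are hypothetical, and I have no idea how either search engine actually does this.

```python
import re

# Hypothetical pattern for queries of the form "<origin> to <destination>".
ROUTE_PATTERN = re.compile(
    r"^\s*(?P<origin>.+?)\s+to\s+(?P<destination>.+?)\s*$",
    re.IGNORECASE,
)


def parse_route_query(query):
    """Return (origin, destination) if the query looks like a travel query, else None."""
    match = ROUTE_PATTERN.match(query)
    if match is None:
        return None
    return match.group("origin"), match.group("destination")


print(parse_route_query("boston to asheville"))
print(parse_route_query("semantic web"))
```

The surface pattern alone is not enough: a query like “how to cook rice” also matches it. The bit of semantic knowledge comes in when the system checks that both sides name known places before treating the query as a travel request.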

Of course, if everyone hits this site it may get throttled or blocked by either or both of the search engines. @Glenn, would you be willing to share your code?

(spotted on Hacker News)

Price Waterhouse Coopers bullish on the Semantic Web

May 29th, 2009, by Tim Finin, posted in AI, Database, Semantic Web

Price Waterhouse Coopers is one of the largest “professional services” organizations and has always been strong on technology consulting and advice. The Spring issue of their quarterly Technology Forecast journal focuses on the Semantic Web. This is from the table of contents:

  • 04 Spinning a data Web. Semantic Web technologies could revolutionize enterprise decision making and information sharing. Here’s why.
  • 20 Making Semantic Web connections. Linked Data technology can change the business of enterprise data management.
  • 16 Traversing the Giant Global Graph. Tom Scott of BBC Earth describes how everyone benefits from interoperable data.
  • 28 From folksonomies to ontologies. Uche Ogbuji of Zepheira discusses how early adopters are introducing Semantic Web to the enterprise.
  • 40 How the Semantic Web might improve cancer treatment. M. D. Anderson’s Lynn Vogel explores new techniques for combining clinical and research data.
  • 46 Semantic technologies at the ecosystem level. Frank Chum of Chevron talks about the need for shared ontologies in the oil and gas industry.

You can download the free 58-page report here. You can also read a note on the issue in ReadWriteWeb, which focuses on linked data and interoperability.

“A new PricewaterhouseCoopers Technology report explains how the Semantic Web and Linked Data can help enterprises manage their large scale data better. The PwC Center for Technology and Innovation team spent several months researching and analyzing the problem of data silos in enterprises - and what solutions are being developed to help with that problem. The answer, according to PwC, is Semantic Web techniques. PwC believes that the Semantic Web offers a practical way to address the problem of large-scale data integration. …”

(Spotted on public-lod@w3.org)

Google Wave as a new communication model

May 28th, 2009, by Tim Finin, posted in Agents, Google, Semantic Web, Social media

Google Wave looks interesting. Google describes it as “a new tool for communication and collaboration on the web” and it’s a funny mix of email, instant messaging, wikis, and Facebook wall interactions. Or maybe IRC for the new century. This is from a post, Went Walkabout. Brought back Google Wave, on the Google blog.

“A “wave” is equal parts conversation and document, where people can communicate and work together with richly formatted text, photos, videos, maps, and more. Here’s how it works: In Google Wave you create a wave and add people to it. Everyone on your wave can use richly formatted text, photos, gadgets, and even feeds from other sources on the web. They can insert a reply or edit the wave directly. It’s concurrent rich-text editing, where you see on your screen nearly instantly what your fellow collaborators are typing in your wave. That means Google Wave is just as well suited for quick messages as for persistent content — it allows for both collaboration and communication. You can also use “playback” to rewind the wave and see how it evolved.”

Google Wave is not available yet, but you can sign up to be notified when it’s launched.

Here’s a random thought. Our models for communication in multiagent systems (e.g., KQML and FIPA) were informed by, if not based on, email and, to a lesser degree, IM. If Wave is a useful new communication model for humans, does it have a counterpart for software agents? If so, I suspect that ideas from the Semantic Web will be useful in providing “rich content” for agents.

For more views, see posts by O’Reilly, TechCrunch, BusinessWeek and Gabor Cselle.

Wolfram Alpha is live, API description online

May 15th, 2009, by Tim Finin, posted in Semantic Web

Wolfram|Alpha is live. A document describing the Wolfram|Alpha API can be found in Google’s cache.

Stephen Wolfram wrote today in a blog post, Wolfram|Alpha Is Launching: Made Possible by Mathematica, on its relation to Mathematica.

“Wolfram|Alpha defines a new direction in computing—that would have simply not have been possible without Mathematica, and that in time will add some remarkable new dimensions to Mathematica itself. In terms of technology, Wolfram|Alpha is a uniquely complex software system, which has been entirely developed and deployed with Mathematica and Mathematica technologies. … When we launch Wolfram|Alpha this weekend, it will be running Mathematica on about 10,000 processor cores, using gridMathematica-based parallelism. And every single query that comes into the system will be served with webMathematica.”

And now, for a real test…

(spotted on Hacker News)

UPDATE: (5/18) The API document is officially now available.

Google supports RDFa and Microformats

May 12th, 2009, by Tim Finin, posted in Google, RDF, Semantic Web

Google has announced that it will begin to recognize structured information encoded as metadata in either RDFa or microformats and use the metadata in search results snippets for reviews and people.

“Structured data makes the web a better place. It also helps Google better understand and present your page in search results. … Google’s first use of this data will be in search results snippets for two kinds of objects: Reviews and People. Providing more detail in search results helps users to understand the value of your pages. When users get more information showing how your page is relevant to their search, they’re more likely to click through to see the full page. … At Google, we believe in openness, so we are using two open standards to allow you to annotate structured data on your site: microformats and RDFa. Both standards allow markup of information on your pages.”

This is a case where Google is following Yahoo, which announced more general support for RDFa and microformats last fall in its SearchMonkey.

We expect that this is work in progress. While it’s great that Google is supporting RDFa annotations, they are asking people to start with the new RDF vocabulary defined at their site http://www.data-vocabulary.org/ rather than reusing or integrating with existing, widely used vocabularies. Let’s hope that they embrace the LOD vision in the near future.
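
To give a feel for what such annotations look like to a program, here is a small sketch that pulls RDFa property values out of a review snippet using Python's standard html.parser module. The namespace URI and the property names (v:itemreviewed, v:reviewer, v:rating) are my assumptions for illustration and should be checked against the vocabulary Google publishes. The RDFaPropertyCollector class is hypothetical and far simpler than a real RDFa processor.

```python
from html.parser import HTMLParser

# Illustrative markup; the vocabulary prefix and property names are assumptions.
SNIPPET = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Review">
  <span property="v:itemreviewed">Example Pizza</span>
  <span property="v:reviewer">A. Reader</span>
  <span property="v:rating">4.5</span>
</div>
"""


class RDFaPropertyCollector(HTMLParser):
    """Collect the text of elements that carry an RDFa `property` attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # property name of the element being read

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


collector = RDFaPropertyCollector()
collector.feed(SNIPPET)
review = {name: text.strip() for name, text in collector.properties.items()}
print(review)
```

This ignores nesting, the typeof attribute, and prefix resolution, all of which a real RDFa parser has to handle. It does show why shared vocabularies matter: the consumer only understands the property names it already knows.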

Got a linguistic fluff problem?

May 4th, 2009, by Tim Finin, posted in Semantic Web

Finally, a way to remove all of that annoying ‘linguistic fluff’! A BBC article on Wolfram Alpha describes it as better than Google.

“A web tool that ‘could be as important as Google’, according to some experts, has been shown off to the public. Wolfram Alpha is the brainchild of British-born physicist Stephen Wolfram. The free program aims to answer questions directly, rather than display web pages in response to a query like a search engine.”

But wait, there’s more…

“In addition, he said, the system had got ‘pretty good at removing linguistic fluff’, the kinds of words that are not necessary for the system to find and compute the relevant data.”

ShamWow!

Google flu trends: Web searches as sensors

April 26th, 2009, by Tim Finin, posted in Google, Semantic Web, Social media, sEARCH

Google has had a special “flu trends” site up for many months that provides “up-to-date estimates of flu activity in the United States based on aggregated search queries.”

They have found that how many people search for flu-related topics is a leading indicator for reports on how many people actually have flu symptoms. They believe that this metric “may indicate flu activity up to two weeks ahead of traditional flu surveillance systems”. Click on the flash video below to see the relationship between the flu searches and flu symptoms.

So, is Google magic? The explanation for why changes in the level of flu searches precede changes in the level of flu symptoms is more mundane.

“So why bother with estimates from aggregated search queries? It turns out that traditional flu surveillance systems take 1-2 weeks to collect and release surveillance data, but Google search queries can be automatically counted very quickly. By making our flu estimates available each day, Google Flu Trends may provide an early-warning system for outbreaks of influenza.”
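
One simple way to look for a leading indicator is to correlate one series against shifted copies of the other. The sketch below does that with plain Python. The weekly numbers are synthetic, made up so that reports trail searches by exactly two weeks, and the method is only an illustration, not the model described in the Nature paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_lead(searches, reports, max_lead):
    """Find the lead (in weeks) at which searches best match later reports."""
    scores = {}
    for lead in range(max_lead + 1):
        # Pair searches in week t with reports in week t + lead.
        xs = searches[: len(searches) - lead]
        ys = reports[lead:]
        scores[lead] = pearson(xs, ys)
    return max(scores, key=scores.get), scores


# Synthetic weekly counts: reports are the searches delayed by two weeks.
searches = [3, 4, 6, 9, 14, 20, 17, 12, 8, 5, 4, 3]
reports = [3, 3] + searches[:-2]

lead, scores = best_lead(searches, reports, max_lead=4)
print(lead)
```

On real data the correlation would be noisier and the lead less clear cut, and a high correlation in past seasons says nothing about how the signal behaves in an unusual one.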

You can get the details in a recent article in Nature:

J. Ginsberg, M. Mohebbi, R. Patel, L. Brammer, M. Smolinski and L. Brilliant, Detecting influenza epidemics using search engine query data, Nature 457, 1012-1014 (19 February 2009).

Of course, such leading indicators may not correlate well if there is a “black swan” flu epidemic or even if there is an unfounded fear of one. Sometimes the crowds are wise, but often not. Remember when we all thought technology stocks or real estate was a good thing to invest in?

The Google site also allows you to look at the data by state as well. Click on the image below to try it out.



Web of Data, Services and Identities

April 18th, 2009, by Tim Finin, posted in Semantic Web, Web, Web 2.0

ReadWriteWeb has a post up on The Web of Data: Creating Machine-Accessible Information that focuses on Linked Open Data.

“In the coming years, we will see a revolution in the ability of machines to access, process, and apply information. This revolution will emerge from three distinct areas of activity connected to the Semantic Web: the Web of Data, the Web of Services, and the Web of Identity providers. These webs aim to make semantic knowledge of data accessible, semantic services available and connectable, and semantic knowledge of individuals processable, respectively. In this post, we will look at the first of these Webs (of Data) and see how making information accessible to machines will transform how we find information.”

I did find the three ‘Webs’ mentioned in their intro (data, services and identity providers) to be interesting. The first two are standard components of the envisioned future Web, but their third, a web of identity providers, less so. I am unsure whether it’s meant to refer to authentication services and protocols (e.g., OAuth) or maybe some kind of named entity recognition services for text. The former is certainly necessary for web services and APIs to work more seamlessly, but doesn’t seem to me to be as significant a problem as developing highly interoperable and integrable Webs of data and services. Of course, I am probably unaware of the subtleties involved in getting this right while maintaining security and appropriate privacy. In any case, I look forward to the articles to follow.

Tutorial: Hadoop on Windows with Eclipse

April 9th, 2009, by Tim Finin, posted in High performance computing, MC2, Multicore Computation Center, Programming, Semantic Web, cloud computing

Hadoop has become one of the most popular frameworks to exploit parallelism on a computing cluster. You don’t actually need access to a cluster to try Hadoop, learn how to use it, and develop code to solve your own problems.

UMBC Ph.D. student Vlad Korolev has written an excellent tutorial, Hadoop on Windows with Eclipse, showing how to install and use Hadoop on a single computer running Microsoft Windows. It also covers the Eclipse Hadoop plugin, which enables you to create and run Hadoop projects from Eclipse. In addition to step-by-step instructions, the tutorial has short videos documenting the process.

If you want to explore Hadoop and are comfortable developing Java programs in Eclipse on a Windows box, this tutorial will get you going. Once you have mastered Hadoop and have developed your first project using it, you can go about finding a cluster to run it on.
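
If you have not seen the MapReduce model that Hadoop implements, the sketch below simulates its three steps (map, shuffle, reduce) for the classic word count problem. It is plain Python running in one process, not Hadoop's Java API, and all of the function names are my own.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["the semantic web", "the web of data"])
print(counts)
```

On a cluster, the map calls run in parallel on different machines and the framework handles the shuffle, moving each key's values to the machine that reduces them. The programmer still writes only the map and reduce functions.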

Cloudera offers a simpler Hadoop distribution

March 18th, 2009, by Tim Finin, posted in Google, High performance computing, MC2, Multicore Computation Center, Semantic Web, Social media, cloud computing

We are early in the era of big data (including social and/or semantic) and more and more of us need the tools to handle it. Monday’s NYT had a story, Hadoop, a Free Software Program, Finds Uses Beyond Search, on Hadoop and Cloudera, a new startup that is offering its own Hadoop distribution that is designed to be easier to install and configure.

“In the span of just a couple of years, Hadoop, a free software program named after a toy elephant, has taken over some of the world’s biggest Web sites. It controls the top search engines and determines the ads displayed next to the results. It decides what people see on Yahoo’s homepage and finds long-lost friends on Facebook.”

Three top engineers from Google, Yahoo and Facebook, along with a former executive from Oracle, are betting it will. They announced a start-up Monday called Cloudera, based in Burlingame, Calif., that will try to bring Hadoop’s capabilities to industries as far afield as genomics, retailing and finance. The company has just released its own version of Hadoop. The software remains free, but Cloudera hopes to make money selling support and consulting services for the software. It has only a few customers, but it wants to attract biotech, oil and gas, retail and insurance customers to the idea of making more out of their information for less.

Cloudera’s distribution, currently based on Hadoop v0.18.3, uses RPM and comes with a Web-based configuration aid. The company also offers some free basic training in MapReduce concepts, using Hadoop, developing appropriate algorithms and using Hive.
