Archive for the 'Semantic Web' Category
February 13th, 2011, by Tim Finin, posted in AI, Datamining, Machine Learning, NLP, Semantic Web
On the eve of the big Jeopardy! match, Peter Norvig’s opinion piece in today’s New York Post (!), The Machine Age, looks at AI’s progress over the past sixty years and lays out six surprising lessons we’ve learned.
- The things we thought were hard turned out to be easier.
- Dealing with uncertainty turned out to be more important than thinking with logical precision.
- Learning turned out to be more important than knowing.
- Current systems are more likely to be built from examples than from logical rules.
- The focus shifted from replacing humans to augmenting them.
- The partnership between human and machine is stronger than either one alone.
When I took Pat Winston’s undergraduate AI class in 1970, only the first of those ideas was current. It’s a good essay.
Of course, after we’ve exploited the new data-driven, statistical paradigm for the next decade or so, we’ll probably have to go back to figuring out how to get logic back into the framework.
February 12th, 2011, by Tim Finin, posted in Machine Learning, Semantic Web, Social media
The current (11 February 2011) issue of Science is a special issue on Dealing with Data. It includes a collection of free, online articles that “highlights both the challenges posed by the data deluge and the opportunities that can be realized if we can better organize and access the data.” Some of the articles are drawn from three sister publications: Science Signaling, Science Translational Medicine and Science Careers.
From the issue’s introduction:

“Scientific innovation has been called on to spur economic recovery; science and technology are essential to improving public health and welfare and to inform sustainability; and the scientific community has been criticized for not being sufficiently accountable and transparent. Data collection, curation, and access are central to all of these issues.
…
As you will discover, two themes appear repeatedly: Most scientific disciplines are finding the data deluge to be extremely challenging, and tremendous opportunities can be realized if we can better organize and access the data.”
One of the great things about the “data deluge” is that there is something in it for almost all computer science researchers, including those working in areas like machine learning, data mining, NLP, visualization, semantic web, security and privacy, social media, high performance computing and HCI. Some of the articles caught our eye, and here are still more that look very interesting:
- Climate Data Challenges in the 21st Century, J. T. Overpeck et al.
- Challenges and Opportunities of Open Data in Ecology, O. J. Reichman et al.
- Challenges and Opportunities in Mining Neuroscience Data, H. Akil et al.
- The Disappearing Third Dimension, T. Rowe and L. R. Frank
- Advancing Global Health Research Through Digital Technology and Sharing Data, T. Lang
- More Is Less: Signal Processing and the Data Deluge, R. G. Baraniuk
- Access to Stem Cells and Data: Persons, Property Rights, and Scientific Progress, D. J. H. Mathews et al.
- On the Future of Genomic Data, S. D. Kahn
- Conquering the Data Mountain, N. R. Gough and M. B. Yaffe
- Power to the People: Participant Ownership of Clinical Trial Data, S. F. Terry and P. F. Terry
- Surfing the Tsunami, E. Pain
- Sharing Data in Biomedical and Clinical Research, K. Travis
February 8th, 2011, by Tim Finin, posted in Semantic Web
In today’s ebiquity meeting, Curt Tilmes showed an interesting figure comparing how often a particular dataset (MODIS snow cover data) was mentioned in a paper with how often it was formally cited. It’s a good example of how far we still need to go in formally capturing the provenance of data and the information derived from it.
The figure is from:
Parsons, Mark A.; Duerr, Ruth; Minster, Jean-Bernard. Data Citation and Peer Review. Eos, Transactions American Geophysical Union, Volume 91, Issue 34, p. 297-298. 2010.
December 25th, 2010, by Varish Mulwad, posted in RDF, Semantic Web
The goal and vision of the Semantic Web is to create a Web of connected and interlinked data (items) which can be shared and reused by all. Sharing and opening up “raw data” is great, but the Semantic Web isn’t just about sharing data. To create a Web of data, one needs interlinking between data. In 2006, Sir Tim Berners-Lee introduced the notion of linked data, in which he outlined the best practices for creating and sharing data on the Web. To encourage people and governments to share data, he recently developed a five-star rating system for open data.

The highest rating is for data that links to other people’s data to provide context. While the Semantic Web has been growing steadily, there is a lot of data that is still in raw format. A study by Google researchers shows that there are 154 million tables with high quality relational data on the World Wide Web. The US government, along with seven other nations, has started sharing data publicly. Not all of this data is in RDF or conforms to the best practices for publishing and sharing linked data.
Here in the Ebiquity Research Lab, we have been focusing on converting data in tables and spreadsheets into RDF. Our focus is not on generating just RDF, but on generating high quality linked data (or, as Berners-Lee now calls it, “5 star data”). Our goal is to build a completely automated framework for interpreting tables and generating linked data from them.

As part of our preliminary research, we have already developed a baseline framework which can link the table column headers to classes from ontologies in the linked data cloud datasets, link the table cells to entities in the linked data cloud and identify relations between table columns and map them to properties in the linked data cloud. You can read papers related to our preliminary research at [1]. We will use this blog as a medium to publish updates in our pursuit of creating “5-star” data for the Semantic Web.
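The three linking steps described above (headers to classes, cells to entities, column pairs to properties) can be illustrated with a minimal sketch. This is not the lab's framework: the lookup dictionaries are hypothetical stand-ins for the actual linking algorithms, the DBpedia URIs are used only as illustrative targets, and the function name `table_to_triples` is made up for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical lookup tables standing in for the real linking steps.
HEADER_TO_CLASS = {
    "City": "http://dbpedia.org/ontology/City",
    "Country": "http://dbpedia.org/ontology/Country",
}
CELL_TO_ENTITY = {
    "Baltimore": "http://dbpedia.org/resource/Baltimore",
    "Paris": "http://dbpedia.org/resource/Paris",
    "United States": "http://dbpedia.org/resource/United_States",
    "France": "http://dbpedia.org/resource/France",
}
COLUMNS_TO_PROPERTY = {
    ("City", "Country"): "http://dbpedia.org/ontology/country",
}


def table_to_triples(header, rows):
    """Return sorted N-Triples lines for the cells that could be linked."""
    triples = set()
    for row in rows:
        # Step 2: map each cell to an entity URI (None if it is unknown).
        entities = {col: CELL_TO_ENTITY.get(cell) for col, cell in zip(header, row)}
        # Step 1: type each linked entity with the class of its column header.
        for col, entity in entities.items():
            cls = HEADER_TO_CLASS.get(col)
            if entity and cls:
                triples.add(f"<{entity}> <{RDF_TYPE}> <{cls}> .")
        # Step 3: relate entities in column pairs that map to a known property.
        for (subj_col, obj_col), prop in COLUMNS_TO_PROPERTY.items():
            subj, obj = entities.get(subj_col), entities.get(obj_col)
            if subj and obj:
                triples.add(f"<{subj}> <{prop}> <{obj}> .")
    return sorted(triples)


header = ["City", "Country"]
rows = [
    ["Baltimore", "United States"],
    ["Paris", "France"],
    ["Atlantis", "France"],  # "Atlantis" has no entity, so it yields no new triples
]
triples = table_to_triples(header, rows)
for line in triples:
    print(line)
```

Cells that cannot be linked are simply skipped here; a real system would need to rank candidate entities and handle ambiguity, which this sketch does not attempt.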
If you are a data publisher, go grab some Linked Data star badges at [2]. You can show your support for the open data movement by getting t-shirts, mugs and bumper stickers from [3]! (All profits go to W3C.)
Happy Holidays! Let 2011 be yet another step forward in the open data movement!
[1] – http://ebiquity.umbc.edu/person/html/Varish/Mulwad/?pub=on#pub
[2] – http://lab.linkeddata.deri.ie/2010/lod-badges/
[3] – http://www.cafepress.co.uk/w3c_shop
November 19th, 2010, by Tim Finin, posted in GENERAL, Privacy, Semantic Web, Web
Sir Tim Berners-Lee discusses the principles underlying the Web and the need to protect them in an article from the December issue of Scientific American, Long Live the Web.
“The Web evolved into a powerful, ubiquitous tool because it was built on egalitarian principles and because thousands of individuals, universities and companies have worked, both independently and together as part of the World Wide Web Consortium, to expand its capabilities based on those principles.
The Web as we know it, however, is being threatened in different ways. Some of its most successful inhabitants have begun to chip away at its principles. Large social-networking sites are walling off information posted by their users from the rest of the Web. Wireless Internet providers are being tempted to slow traffic to sites with which they have not made deals. Governments—totalitarian and democratic alike—are monitoring people’s online habits, endangering important human rights.
If we, the Web’s users, allow these and other trends to proceed unchecked, the Web could be broken into fragmented islands. We could lose the freedom to connect with whichever Web sites we want. The ill effects could extend to smartphones and pads, which are also portals to the extensive information that the Web provides.
Why should you care? Because the Web is yours. It is a public resource on which you, your business, your community and your government depend. The Web is also vital to democracy, a communications channel that makes possible a continuous worldwide conversation. The Web is now more critical to free speech than any other medium. It brings principles established in the U.S. Constitution, the British Magna Carta and other important documents into the network age: freedom from being snooped on, filtered, censored and disconnected.”
Near the end of the long feature article, he mentions the Semantic Web’s linked data as one of the major new technologies the Web will give birth to, provided the principles are upheld.
“A great example of future promise, which leverages the strengths of all the principles, is linked data. Today’s Web is quite effective at helping people publish and discover documents, but our computer programs cannot read or manipulate the actual data within those documents. As this problem is solved, the Web will become much more useful, because data about nearly every aspect of our lives are being created at an astonishing rate. Locked within all these data is knowledge about how to cure diseases, foster business value and govern our world more effectively.”
One of the benefits of linked data is that it makes data integration and fusion much easier. The benefit comes with a potential risk, which Berners-Lee acknowledges.
“Linked data raise certain issues that we will have to confront. For example, new data-integration capabilities could pose privacy challenges that are hardly addressed by today’s privacy laws. We should examine legal, cultural and technical options that will preserve privacy without stifling beneficial data-sharing capabilities.”
The risk is not unique to linked data, and new research is underway, in our lab and elsewhere, on how to also use Semantic Web technology to protect privacy.
October 30th, 2010, by Tim Finin, posted in AI, Datamining, Google, Machine Learning, NLP, sEARCH, Semantic Web, Social media
Recorded Future is a Boston-based startup with backing from Google and In-Q-Tel that uses sophisticated linguistic and statistical algorithms to extract time-related information about entities and events from streams of Web data. Their goal is to help clients understand how the relationships between entities and events of interest are changing over time and to make predictions about the future.
A recent Technology Review article, See the Future with a Search, describes it this way.
“Conventional search engines like Google use links to rank and connect different Web pages. Recorded Future’s software goes a level deeper by analyzing the content of pages to track the “invisible” connections between people, places, and events described online.
“That makes it possible for me to look for specific patterns, like product releases expected from Apple in the near future, or to identify when a company plans to invest or expand into India,” says Christopher Ahlberg, founder of the Boston-based firm.
A search for information about drug company Merck, for example, generates a timeline showing not only recent news on earnings but also when various drug trials registered with the website clinicaltrials.gov will end in coming years. Another search revealed when various news outlets predict that Facebook will make its initial public offering.
That is done using a constantly updated index of what Ahlberg calls “streaming data,” including news articles, filings with government regulators, Twitter updates, and transcripts from earnings calls or political and economic speeches. Recorded Future uses linguistic algorithms to identify specific types of events, such as product releases, mergers, or natural disasters, the date when those events will happen, and related entities such as people, companies, and countries. The tool can also track the sentiment of news coverage about companies, classifying it as either good or bad.”
Pricing for access to their online services and API starts at $149 a month, but there is a free Futures email alert service through which you can get the results of some standing queries on a daily or weekly basis. You can also explore the capabilities they offer through their page on the 2010 US Senate Races.
“Rather than attempt to predict how the races will turn out, we have drawn from our database the momentum, best characterized as online buzz, and sentiment, both positive and negative, associated with the coverage of the 29 candidates in 14 interesting races. This dashboard is meant to give the view of a campaign strategist, as it measures how well a campaign has done in getting the media to speak about the candidate, and whether that coverage has been positive, in comparison to the opponent.”
Their blog reveals some insights into the technology they are using and much more about the business opportunities they see. Clearly the company is leveraging named entity recognition, event recognition and sentiment analysis. Their short A White Paper on Temporal Analytics has some details on their overall approach.
September 12th, 2010, by Tim Finin, posted in Semantic Web, Social media, Web
Facebook has rolled out Facebook Browser as what sounds like a simple and effective idea: recommend pages based on a user’s country and social network. My impression is mixed, however. While I like its top recommendation for me, I am already a fan. Its suggestions for the celebrities category are a bust: Rush Limbaugh, Glenn Beck, Michelle Malkin, Mark Levin, Red Green and Bill O’Reilly. And Movies? Don’t even go there! Maybe it’s trying to tell me I need a new set of friends? Inside Facebook summarizes Facebook Browser this way:
“Facebook has launched a new way to “Discover Facebook’s Popular Pages” called Browser. It shows icons of Pages that are popular in a user’s country, but factors in which Pages which are popular amongst their unique friend network. When the Page icons are hovered over they display a Like button. Browser could cause popular Pages to get more popular, widening the gap between them and smaller Pages, similar to the frequently criticized and since abandoned Twitter Suggested User List.”
I think the idea is sound, though, and I like my Facebook friends. So, my conclusion is that Facebook needs to tweak the algorithm.
September 6th, 2010, by Tim Finin, posted in Semantic Web
The Semantic Web Science Association (SWSA) is seeking statements of interest from organizations or consortia interested in hosting the 11th International Semantic Web Conference, ISWC 2012. The conference series moves regularly between the Americas, Europe, and the Asia/Pacific region, and we expect that the 2012 edition will be held in the Americas in late October or early November 2012.
Organizations wishing to host ISWC 2012 should contact SWSA President Professor James Hendler (swsa-president@aifb.uni-karlsruhe.de) who will work with the SWSA members who are co-ordinating the bidding process for ISWC 2012.
The process comprises two stages. During the first stage, statements of interest are solicited through an open call that requests responses using a simple form. Once the first stage is complete, SWSA will shortlist a number of applicants, who will be invited to submit a full proposal using a standard form and budget template. More information about the ISWC Conference Series and the bidding process for hosting a conference in the series can be found in the ISWC Conference Guide.
The important dates for applying to host a Conference in 2012 are:
- September 30, 2010: Deadline for receiving statements of interest
- November 15, 2010: Notifications to shortlisted bids are sent out
- January 15, 2011: Formal applications received from shortlisted bids
- March 1, 2011: SWSA decides on location for the 2012 Conference
September 2nd, 2010, by Tim Finin, posted in Privacy, Semantic Web, Social media, Twitter
Twitter’s planned shortening of all links via its t.co service is about to happen. The initial motivation was security, according to Twitter:
“Twitter’s link service at http://t.co is used to better protect users from malicious sites that engage in spreading malware, phishing attacks, and other harmful activity. A link converted by Twitter’s link service is checked against a list of potentially dangerous sites. When there’s a match, users can be warned before they continue.”
Declan McCullagh reports that Twitter announced in an email message that when someone clicks “on these links from Twitter.com or a Twitter application, Twitter will log that click.” Such information is extremely valuable. Given Twitter’s tens of millions of active users, just knowing how often certain URLs are clicked indicates what entities and topics are of interest at the moment.
“Our link service will also be used to measure information like how many times a link has been clicked. Eventually, this information will become an important quality signal for our Resonance algorithm—the way we determine if a Tweet is relevant and interesting.”
Associating the clicks with a user, IP address, location or device can yield even more information — like what you are interested in right now. Moreover, Twitter now has a way to associate arbitrary annotation metadata with each tweet. Analyzing all of this data can identify, for example, communities of users with common interests and the influential members within them.
Note that Twitter has not said it will do this or even that it will record and keep any user-identifiable information along with the clicks. They might just log the aggregate number of clicks in a window of time. But going the next step and capturing the additional information would be, in my mind, irresistible, even if there was no immediate plan to use it.
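To make the distinction concrete, the aggregate-only option (counting clicks per URL in a window of time while keeping nothing that identifies the user) might look like the following sketch. It is purely illustrative and says nothing about how Twitter actually logs clicks; the one-hour window, the event format and the function name `aggregate_clicks` are all assumptions.

```python
from collections import Counter

WINDOW_SECONDS = 3600  # assumed window size of one hour


def aggregate_clicks(click_events, window_seconds=WINDOW_SECONDS):
    """Count clicks per (window_start, url).

    Each event is a (unix_timestamp, url) pair. No user, IP address,
    location or device is part of the input, so none can be retained.
    """
    counts = Counter()
    for timestamp, url in click_events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, url)] += 1
    return counts


# Hypothetical click stream: three clicks in the first hour, one in the second.
events = [
    (1000, "http://example.org/a"),
    (1800, "http://example.org/a"),
    (3599, "http://example.org/b"),
    (3600, "http://example.org/a"),
]
counts = aggregate_clicks(events)
for (window_start, url), n in sorted(counts.items()):
    print(window_start, url, n)
```

Capturing "the additional information" would amount to adding user or device fields to each event and to the counting key, which is exactly the step that changes the privacy picture.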
Search engines like Google already link clicks to users and IP addresses and use the information to improve their ranking algorithms and probably in many other ways. But what is troubling is the seemingly inexorable erosion of our online privacy. There will be no way to opt out of having your link wrapped by the t.co service and no announced way to opt out of having your clicks logged.
August 24th, 2010, by Tim Finin, posted in Google, sEARCH, Semantic Web, Social media
Microsoft’s Bing team announced on their blog that the Bing search engine is “powering Yahoo!’s search results” in the US and Canada for English queries. Yahoo also has a post on their Yahoo! Search Blog.
The San Jose Mercury News reports:
“Tuesday, nearly 13 months after Yahoo and Microsoft announced plans to collaborate on Internet search in hopes of challenging Google’s market dominance, the two companies announced that the results of all Yahoo English language searches made in the United States and Canada are coming from Microsoft’s Bing search engine. The two companies are still racing to complete the transition of paid search, the text advertising links that run beside and above the standard search results, before the make-or-break holiday period — a much more difficult task.”
Combining the traffic from Microsoft and Yahoo will give Bing a more significant share of the Web search market. That should help by providing both companies with a larger stream of search-related data that can be exploited to improve search relevance, ad placement and trend spotting. It will also help to foster competition with Google focused on developing better search technology.
Hopefully, Bing will be able to benefit from the good work done at Yahoo! on adding more semantics to Web search.
August 18th, 2010, by Tim Finin, posted in GENERAL, Semantic Web, Social media
Analog computers were a hot idea — in the 1950s! But I find this intriguing because I’ve come around to the position that a lot of our human “intelligence” is the result of acquiring and using probabilistic models. So supporting this in hardware might be a big win, especially for low-cost, low-power devices. It will also support lots of other common tasks in social computing, image processing and language technology.
Technology Review has a short article, A New Kind of Microchip, on a computer chip being developed by Lyric Semiconductor that processes signals representing probabilities rather than digital bits.
“A computer chip that performs calculations using probabilities, instead of binary logic, could accelerate everything from online banking systems to the flash memory in smart phones and other gadgets. … And because that kind of math is at the core of many products, there are many potential applications. “To take one example, Amazon’s recommendations to you are based on probability,” says Vigoda. “Any time you buy [from] them, the fraud check on your credit card is also probability [based], and when they e-mail your confirmation, it passes through a spam filter that also uses probability.”
All those examples involve comparing different data to find the most likely fit. Implementing the math needed to do this is simpler with a chip that works with probabilities, says Vigoda, allowing smaller chips to do the same job at a faster rate. A processor that dramatically speeds up such probability-based calculations could find all kinds of uses.”
Lyric’s chip is called LEC and was developed with support from DARPA. According to Wired, it is 30 times smaller than current digital error correction technology. Although small, it yields “a Pentium’s worth of computation,” according to Lyric CEO Vigoda. His 2003 dissertation at MIT was on a related topic, Analog Logic: Continuous-Time Analog Circuits for Statistical Signal Processing.
You can also read about the LEC chip in a story in yesterday’s NYT, A Chip That Digests Data and Calculates the Odds.
August 15th, 2010, by Tim Finin, posted in Policy, Privacy, Security, Semantic Web, Social media
Privacy continues to be an important topic surrounding social media systems. A big part of the problem is that virtually all of us have a difficult time thinking about what information about us is exposed and to whom and for how long. As UMBC colleague Zeynep Tufekci points out, our intuitions in such matters come from experiences in the physical world, a place whose physics differs considerably from the cyber world.
Bruce Schneier offered a taxonomy of social networking data in a short article in the July/August issue of IEEE Security & Privacy. A version of the article, A Taxonomy of Social Networking Data, is available on his site.
“Below is my taxonomy of social networking data, which I first presented at the Internet Governance Forum meeting last November, and again — revised — at an OECD workshop on the role of Internet intermediaries in June.
- Service data is the data you give to a social networking site in order to use it. Such data might include your legal name, your age, and your credit-card number.
- Disclosed data is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
- Entrusted data is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data once you post it — another user does.
- Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it’s basically the same stuff as disclosed data, but the difference is that you don’t have control over it, and you didn’t create it in the first place.
- Behavioral data is data the site collects about your habits by recording what you do and who you do it with. It might include games you play, topics you write about, news articles you access (and what that says about your political leanings), and so on.
- Derived data is data about you that is derived from all the other data. For example, if 80 percent of your friends self-identify as gay, you’re likely gay yourself.”
I think most of us understand the first two categories and can easily choose or specify a privacy policy to control access to information in them. The rest, however, are more difficult to think about and can lead to a lot of confusion when people are setting up their privacy preferences.
As an example, I saw some nice work at the 2010 IEEE International Symposium on Policies for Distributed Systems and Networks on “Collaborative Privacy Policy Authoring in a Social Networking Context” by Ryan Wishart et al. from Imperial College that addressed the problem of incidental data in Facebook. For example, if I post a picture and tag others in it, each of the tagged people can contribute additional policy constraints that can narrow access to it.
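One simple way to picture "constraints that can narrow access" is as set intersection: the poster proposes an audience and each tagged person can only remove viewers from it. The sketch below is a simplification for illustration, not the policy model from the Wishart et al. paper; the names and the function `effective_audience` are hypothetical.

```python
def effective_audience(owner_audience, tagged_constraints):
    """Intersect the owner's audience with every tagged person's allowed set.

    Constraints can only narrow access: a viewer sees the photo only if the
    owner and every tagged person all permit it.
    """
    audience = set(owner_audience)
    for allowed in tagged_constraints.values():
        audience &= set(allowed)
    return audience


# The poster is willing to share the photo with four friends.
owner_audience = {"alice", "bob", "carol", "dave"}
# Each tagged person lists who may see photos they appear in.
tagged_constraints = {
    "erin": {"alice", "bob", "carol"},
    "frank": {"bob", "carol", "dave"},
}
viewers = effective_audience(owner_audience, tagged_constraints)
print(sorted(viewers))
```

Because intersection never adds members, no tagged person can widen the audience beyond what the poster chose, which matches the "narrow access" behavior described above.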
Lorrie Cranor gave an invited talk at the workshop on Building a Better Privacy Policy and made the point that even P3P privacy policies are difficult for people to comprehend.
Having a simple ontology for social media data could help us move forward toward better privacy controls for online social media systems. I like Schneier’s broad categories and wonder what a more complete treatment defined using Semantic Web languages might be like.
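As a very rough starting point, well short of a treatment in Semantic Web languages, Schneier's six categories can be written down as an enumeration, with the three post-related ones (disclosed, entrusted, incidental) distinguished by who wrote an item, whose page it is on and who it is about. The function `classify_post` and its parameters are hypothetical and cover only those three categories.

```python
from enum import Enum


class DataCategory(Enum):
    """Schneier's six categories of social networking data."""
    SERVICE = "service"
    DISCLOSED = "disclosed"
    ENTRUSTED = "entrusted"
    INCIDENTAL = "incidental"
    BEHAVIORAL = "behavioral"
    DERIVED = "derived"


def classify_post(user, author, page_owner, mentions):
    """Classify a posted item from `user`'s point of view.

    Returns None when the item is neither written by nor about `user`.
    Service, behavioral and derived data are not posts, so they are
    outside the scope of this function.
    """
    if author == user:
        # Posted by the user: on their own page or on someone else's.
        if page_owner == user:
            return DataCategory.DISCLOSED
        return DataCategory.ENTRUSTED
    if user in mentions:
        # Posted by someone else but about the user.
        return DataCategory.INCIDENTAL
    return None


print(classify_post("tim", "tim", "tim", set()))
print(classify_post("tim", "tim", "zeynep", set()))
print(classify_post("tim", "bruce", "bruce", {"tim"}))
```

Even this toy version shows why the later categories are harder to reason about: for entrusted and incidental data, the person affected is not the one who controls the page.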