UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments
Semantic Web

Archive for the 'Semantic Web' Category

RPI exports data.gov information as linked data

November 6th, 2009, by Tim Finin, posted in Semantic Web

UMBC alumnus Joab Jackson has an article in Government Computer News, Tim Berners-Lee: Machine-readable Web still a ways off, reporting on the International Semantic Web Conference held outside of Washington, DC at the end of October. The article uses data.gov to illustrate the challenges and opportunities for the Semantic Web. Data.gov is a site whose purpose “is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.”

Jackson quotes Tim Berners-Lee:

“When you look at putting government data on the Web, one of the concerns is … to not just put it out there on Excel files on Data.gov,” he said. “You should put these things in” the Resource Description Framework.

and later describes a project at RPI, led by another UMBC alumnus, Li Ding, to republish information from data.gov in RDF.

“Our goal is to make the whole thing shareable and replicable for others to re-use,” said project researcher Li Ding. By rendering data into RDF, it can be more easily interposed with other sets of data to create entirely new datasets and visualizations, Ding said. He showed a Google Map-based graphic that interposed RDF-versions of two different data sources from the Environmental Protection Agency, originally rendered in CSV files.


data.gov information as linked data
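The article does not describe RPI's conversion tooling, but the basic move from a CSV file to RDF can be sketched in a few lines of Python: each row becomes a resource and each non-empty cell becomes one triple, written in N-Triples syntax. This is a minimal illustration under stated assumptions, not the project's method; the namespace `http://example.org/epa/`, the column names and the sample rows are all hypothetical.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for illustration only; not the RPI project's URI scheme.
BASE = "http://example.org/epa/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Make one resource per CSV row and one triple per non-empty, non-ID cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}record/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # the ID column names the subject; empty cells are skipped
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Two rows of made-up data in the shape of a small CSV export (hypothetical).
sample = "facility_id,name,state\nF1,Plant A,MD\nF2,Plant B,VA\n"
for triple in csv_to_ntriples(sample, "facility_id"):
    print(triple)
```

Once two datasets are expressed as triples about shared URIs, combining them can be as simple as concatenating the triple lists, which is roughly the kind of interposition Ding describes.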

Dashboard shows data Google has about you

November 5th, 2009, by Tim Finin, posted in Google, Privacy, Semantic Web, Social media, Web

Google added a great new service, Dashboard, that summarizes data stored for a Google account — see MY ACCOUNT>PERSONAL SETTINGS>DASHBOARD.

“Designed to be simple and useful, the Dashboard summarizes data for each product that you use (when signed in to your account) and provides you direct links to control your personal settings. Today, the Dashboard covers more than 20 products and services, including Gmail, Calendar, Docs, Web History, Orkut, YouTube, Picasa, Talk, Reader, Alerts, Latitude and many more. The scale and level of detail of the Dashboard is unprecedented, and we’re delighted to be the first Internet company to offer this — and we hope it will become the standard.”

This is a good move on Google’s part. But although a lot of information is included, it’s not everything that Google knows about you. For example, it does not include data in cookies, click-through data from search results, or information from companies Google has acquired, such as DoubleClick. Still, it is a big step in a positive direction.

New York Times publishes Linked Open Data

October 30th, 2009, by Tim Finin, posted in Ontologies, RDF, Semantic Web

Like many newspapers, the New York Times links the first mention of well-known entities in its articles to a reference page. For example, a mention of Barack Obama links to a page that collects basic information on President Obama along with links to relevant stories and other resources that the Times has created.

Now the Times is also using RDF to publish some of its information as linked open data. Yesterday the Times announced the publication of an LOD collection covering about 5,000 people at http://data.nytimes.com/ under a Creative Commons Attribution 3.0 License, and it plans to put its full collection of 30K topics online soon.

“Over the last several months we have manually mapped more than 5,000 person name subject headings onto Freebase and DBPedia. And today we are pleased to announce the launch of http://data.nytimes.com and the release of these 5,000 person name subject headings as Linked Open Data.

Over the next several months, we plan to expand http://data.nytimes.com to include each of the nearly 30,000 subject headings we use to power Times Topics pages, a collection that includes locations, organizations and descriptors in addition to person names.”
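The announcement says the headings were mapped onto Freebase and DBpedia but does not say how the mapping is encoded. In linked data, identity links of this kind are commonly written with `owl:sameAs`, so here is a minimal Python sketch under that assumption. The local topic URI is hypothetical and does not reflect the Times's actual URI scheme; the DBpedia URI follows DBpedia's usual `http://dbpedia.org/resource/` pattern.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical local URI standing in for a newspaper subject heading,
# mapped to an external DBpedia resource.
MAPPINGS = {
    "http://example.org/topics/person/obama_barack": [
        "http://dbpedia.org/resource/Barack_Obama",
    ],
}


def same_as_triples(mappings: dict[str, list[str]]) -> list[str]:
    """Emit one owl:sameAs triple (N-Triples syntax) per local/external pair."""
    lines = []
    for local_uri, external_uris in mappings.items():
        for external_uri in external_uris:
            lines.append(f"<{local_uri}> <{OWL_SAME_AS}> <{external_uri}> .")
    return lines


for line in same_as_triples(MAPPINGS):
    print(line)
```

A consumer that already knows facts about the DBpedia resource can then treat them as facts about the local heading, and vice versa.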

OWL 2 becomes a W3C recommendation

October 27th, 2009, by Tim Finin, posted in AI, KR, OWL, Ontologies, Semantic Web

OWL 2, the new version of the Web Ontology Language, officially became a W3C standard yesterday. From the W3C press release:

“Today W3C announces a new version of a standard for representing knowledge on the Web. OWL 2, part of W3C’s Semantic Web toolkit, allows people to capture their knowledge about a particular domain (say, energy or medicine) and then use tools to manage information, search through it, and learn more from it. Furthermore, as an open standard based on Web technology, it lowers the cost of merging knowledge from multiple domains.”
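OWL 2 reasoners do much more than this, but one of the simplest inferences behind “learn more from it” is that `rdfs:subClassOf` is transitive. The toy Python sketch below uses a hypothetical medical class hierarchy, not drawn from any real ontology, to show how a few stated axioms entail new ones.

```python
from collections import defaultdict

# Toy, hypothetical class hierarchy: (subclass, superclass) pairs.
AXIOMS = [
    ("Influenza", "ViralInfection"),
    ("ViralInfection", "InfectiousDisease"),
    ("InfectiousDisease", "Disease"),
]


def subclass_closure(axioms):
    """Return all (sub, super) pairs entailed by the transitivity of subClassOf."""
    direct = defaultdict(set)
    for sub, sup in axioms:
        direct[sub].add(sup)
    entailed = set()
    for start in list(direct):
        stack, seen = list(direct[start]), set()
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue  # guards against cycles in the hierarchy
            seen.add(cls)
            entailed.add((start, cls))
            stack.extend(direct.get(cls, ()))
    return entailed


for sub, sup in sorted(subclass_closure(AXIOMS)):
    print(f"{sub} subClassOf {sup}")
```

Three stated axioms yield six subclass relationships, including the unstated fact that influenza is a disease.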

WolframAlpha releases API

October 16th, 2009, by Tim Finin, posted in AI, KR, NLP, Ontologies, Semantic Web

Wolfram|Alpha is an interesting query answering system developed by Wolfram Research that is a blend of a question answering system and a Semantic Web alternative. It tries to interpret queries expressed as a sequence of words and answer them from a large collection of interlinked tables. Oh, and Mathematica is thrown in for free. A free Web version was released last spring.

The news today is that Wolfram|Alpha has released an API, as noted in their blog:

“The API allows your application to interact with Wolfram|Alpha much like you do on the web—you send a web request with the same query string you would type into Wolfram|Alpha’s query box and you get back the same computed results. It’s just that both are in a form your application can understand. There are plenty of ways to tweak and control the results, as well.”
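To make “send a web request with the same query string” concrete, the Python sketch below only builds the request URL and does not send it. The endpoint `http://api.wolframalpha.com/v2/query` and the `appid`/`input` parameter names are assumptions for illustration, not details given in the post, so check Wolfram's API documentation before relying on them; `YOUR_APP_ID` is a placeholder.

```python
from urllib.parse import urlencode

# Assumed endpoint and parameter names (appid, input); confirm against the
# official Wolfram|Alpha API documentation before use.
API_ENDPOINT = "http://api.wolframalpha.com/v2/query"


def build_query_url(query: str, app_id: str) -> str:
    """Encode the same free-text string a user would type into the query box."""
    return f"{API_ENDPOINT}?{urlencode({'appid': app_id, 'input': query})}"


# "YOUR_APP_ID" is a placeholder; a real request needs an issued application ID.
url = build_query_url("population of Maryland", "YOUR_APP_ID")
print(url)
```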

The pricing plan runs from $60/month for 1,000 queries (6 cents a query) to $220K for up to 10M queries/month (2.2 cents a query). Programming language bindings are available for Java, PHP, Perl, Python, Ruby and .NET.
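The per-query figures are simply the price divided by the query volume. This small check assumes the $220K tier is a monthly price, as the “/month” suggests, and that the whole quota is used.

```python
def cost_per_query(monthly_price_usd: float, queries_per_month: int) -> float:
    """Average price per query, assuming the whole monthly quota is used."""
    return monthly_price_usd / queries_per_month


low_tier = cost_per_query(60, 1_000)             # $0.06 per query
high_tier = cost_per_query(220_000, 10_000_000)  # $0.022 per query
print(f"{low_tier * 100:.1f} vs {high_tier * 100:.1f} cents per query")
```

If a customer uses fewer queries than the quota, the effective per-query price is correspondingly higher.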

Their original web interface remains free, but the TOS specifies that it “may be used only by a human being using a conventional web browser to manually enter queries one at a time.”

Gaydar, Facebook and privacy

October 6th, 2009, by Tim Finin, posted in Machine Learning, Privacy, Semantic Web, Social media

In the fall of 2007, two MIT students carried out a class project exploring how presumably private data could be inferred from an online social networking system. Their experiment was to predict the sexual orientation of Facebook users who make their basic information public by analyzing friendship associations. As reported in the Boston Globe last month, the students had not yet published their results.

Well, now they have, in the October issue of First Monday, “one of the first openly accessible, peer–reviewed journals on the Internet”.

The paper has a lot of detail on the methodology for collecting the data and how it was analyzed. Here’s the abstract.

“Public information about one’s coworkers, friends, family, and acquaintances, as well as one’s associations with them, implicitly reveals private information. Social networking Web sites, e–mail, instant messaging, telephone, and VoIP are all technologies steeped in network data — data relating one person to another. Network data shifts the locus of information control away from individuals, as the individual’s traditional and absolute discretion is replaced by that of his social network. Our research demonstrates a method for accurately predicting the sexual orientation of Facebook users by analyzing friendship associations. After analyzing 4,080 Facebook profiles from the MIT network, we determined that the percentage of a given user’s friends who self–identify as gay male is strongly correlated with the sexual orientation of that user, and we developed a logistic regression classifier with strong predictive power. Although we studied Facebook friendship ties, network data is pervasive in the broader context of computer–mediated communication, raising significant privacy issues for communication technologies to which there are no neat solutions.”

As we noted previously, this data-mining exercise only accesses information that Facebook users explicitly choose to make public. The authors note that their analysis “relies on public self–identification of same–gender interest in Facebook profiles as a sentinel value for LGB identity”. The privacy vulnerability is that, by default, a Facebook account’s friendship relations are public, and you cannot control the privacy settings of your friends. So if you leave your friend list public and many of your Facebook friends open up their profiles, it may be possible to draw reasonable inferences about your age, gender, political leanings, sexual preferences and other attributes.
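The shape of this inference can be sketched generically: for each user, compute the share of their public friend list that publicly discloses some attribute, then fit a one-feature logistic regression on that share. The Python below is not the authors' code, features or data; the friend graph, disclosures and training labels are synthetic and hypothetical, and it only illustrates why public friend lists leak information.

```python
import math

# Synthetic, hypothetical friend graph and public disclosures.
FRIENDS = {"u1": ["a", "b", "c", "d"], "u2": ["c", "d", "e"]}
DISCLOSED = {"a", "b"}  # friends who publicly disclose the attribute


def disclosed_fraction(user, friends, disclosed):
    """Share of a user's public friend list that publicly discloses the attribute."""
    circle = friends.get(user, [])
    if not circle:
        return 0.0
    return sum(1 for f in circle if f in disclosed) / len(circle)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """One-feature logistic regression fitted by batch gradient descent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy training set: feature = disclosed fraction, label = 1 if the attribute holds.
xs = [0.0, 0.05, 0.1, 0.6, 0.7, 0.8]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
print(disclosed_fraction("u1", FRIENDS, DISCLOSED), sigmoid(w * 0.5 + b))
```

Note that the target user discloses nothing here; the feature is built entirely from other people's public choices, which is the heart of the privacy problem described above.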

Blackbook, a graph analytic platform for semantic web data

October 3rd, 2009, by Tim Finin, posted in Semantic Web, Social media, Web

In next week’s ebiquity meeting (10:15 EDT Tue 10/6), Lance Byrd and Set Cruz will talk about Blackbook, a graph analytic processing platform for semantic web data.

Blackbook3 is an RDF middleware framework for integrating data and executing algorithms that relies on open standards and “best-of-breed” open source technologies, including Jena, Lucene, JAAS, D2RQ, Hadoop, HBase and Solr. Blackbook3 has a plug-and-play, loosely–coupled architecture, supports SOAP and REST interfaces, offers SPARQL and linked data endpoints and can run in environments where high confidentiality is required.
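A SPARQL endpoint of the kind mentioned here is typically queried over HTTP following the W3C SPARQL Protocol: the query text goes in a `query` parameter and the client asks for a results format with an `Accept` header. The Python sketch builds such a request without sending it; the endpoint URL is hypothetical, since the post does not give Blackbook3's endpoint paths.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the post does not give Blackbook3's endpoint URLs.
ENDPOINT = "http://localhost:8080/sparql"

QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build, without sending, a SPARQL Protocol 'query via GET' request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
print(req.full_url)
# Against a live endpoint, urllib.request.urlopen(req) would send it.
```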

The talk will discuss the current and future use cases for Blackbook3 as well as broader knowledge discovery and dissemination issues for RDF applications. You can participate remotely via Dimdim starting at 10:15 EDT October 6.

Free draft of new Easley/Kleinberg book on Networks, Crowds, and Markets

October 1st, 2009, by Tim Finin, posted in Semantic Web, Social media, Web

David Easley and Jon Kleinberg have made available a free pre-publication draft of a new book, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, to be published by Cambridge University Press in 2010. The book is based on an interdisciplinary undergraduate course, Networks, that they teach at Cornell.

They describe their book this way:

“Over the past decade there has been a growing public fascination with the complex “connectedness” of modern society. This connectedness is found in many incarnations: in the rapid growth of the Internet and the Web, in the ease with which global communication now takes place, and in the ability of news and information as well as epidemics and financial crises to spread around the world with surprising speed and intensity. These are phenomena that involve networks, incentives, and the aggregate behavior of groups of people; they are based on the links that connect us and the ways in which each of our decisions can have subtle consequences for the outcomes of everyone else.
    Networks, Crowds, and Markets combines different scientific perspectives in its approach to understanding networks and behavior. Drawing on ideas from economics, sociology, computing and information science, and applied mathematics, it describes the emerging field of study that is growing at the interface of all these areas, addressing fundamental questions about how the social, economic, and technological worlds are connected.”

Download the 828-page (!) draft of Networks, Crowds, and Markets in PDF here.

$1M Netflix Prize goes to BellKor’s Pragmatic Chaos

September 21st, 2009, by Tim Finin, posted in AI, Machine Learning, Semantic Web, Social media

Netflix announced today that BellKor’s Pragmatic Chaos team was awarded the $1M Netflix Grand Prize.

“It is our great honor to announce the $1M Grand Prize winner of the Netflix Prize contest as team BellKor’s Pragmatic Chaos for their verified submission on July 26, 2009 at 18:18:28 UTC, achieving the winning RMSE of 0.8567 on the test subset. This represents a 10.06% improvement over Cinematch’s score on the test subset at the start of the contest. We congratulate the team of Bob Bell, Martin Chabbert, Michael Jahrer, Yehuda Koren, Martin Piotte, Andreas Töscher and Chris Volinsky for their superb work advancing and integrating many significant techniques to achieve this result.”
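The contest was scored by root-mean-square error (RMSE) between predicted and actual ratings, and the 10.06% figure is, we assume here, the relative reduction in RMSE versus Cinematch. Under that assumption, the two numbers in the announcement also imply Cinematch's starting test-subset RMSE. A small Python sketch:

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between paired predictions and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def improvement(new_rmse, baseline_rmse):
    """Percentage reduction in RMSE relative to a baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


def implied_baseline(new_rmse, improvement_pct):
    """Invert improvement(): the baseline RMSE implied by a reported gain."""
    return new_rmse / (1.0 - improvement_pct / 100.0)


baseline = implied_baseline(0.8567, 10.06)
print(f"implied Cinematch test RMSE: about {baseline:.4f}")
```

The implied baseline comes out to roughly 0.9525.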

Netflix also announced that it will hold a new contest, Netflix Prize 2, with details to be released later.

What about The Ensemble’s last-minute entry, the one that seemed to top BellKor’s?

“Team BellKor’s Pragmatic Chaos edged out team The Ensemble with the winning submission coming just 24 minutes before the conclusion of the nearly three-year-long contest. Historically the Leaderboard has only reported team scores on the quiz subset. The Prize is awarded based on teams’ test subset score. Now that the contest is closed we will be updating the Leaderboard to report team scores on both the test and quiz subsets.”

As part of the final submission, teams were required to submit papers describing their approach; the winning team delivered three.

The New York Times Bits blog also has an article, Netflix Awards $1 Million Prize and Starts a New Contest.

HealthBase semantic search is very positive about the Semantic Web

September 3rd, 2009, by Tim Finin, posted in NLP, Semantic Web, Search

HealthBase is a ‘semantic search engine’ for healthcare information that is driven by content mined from “millions of authoritative health sources” including WebMD, Wikipedia, PubMed, and Mayo Clinic’s health site. TechCrunch first described it as the ultimate medical content search engine but then ran a follow-up article reporting that HealthBase thinks you can get rid of Jews with alcohol and salt. Language Log had some more fun exploring HealthBase.

We thought we’d see what HealthBase thought of the Semantic Web and it turns out that if you are experiencing the Semantic Web as a condition there are several recommended treatments.

and as a treatment itself, HealthBase is pretty positive.

Can infodemiology help manage a Swine Flu pandemic?

September 2nd, 2009, by Tim Finin, posted in Mobile Computing, Semantic Web, Social media

The Washington Post reports that Flu Trackers Encourage Patients to Blog About It. There was quite a bit of discussion about this back in April with the first wave of H1N1 (swine flu) concerns (e.g., Google flu trends: Web searches as sensors). The article mentions Google Flu Trends and HealthMap, but I was surprised by some of the other new ideas it describes. Plus, I learned a catchy new term for this: infodemiology.

One idea is to further exploit mobile phone technology.

Boston-based HealthMap’s automated system sends out an hourly Web “crawler” that hunts for flu information in seven languages. Its creators on Tuesday launched a cellphone application called “Outbreaks Near Me” that can alert users to illnesses nearby. “If you move into a zone where there’s an outbreak, your phone would actually alert you,” said John Brownstein, assistant professor of pediatrics at Children’s Hospital in Boston, where HealthMap is based. The application also allows users to send back to HealthMap their own flu alerts.
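The article does not say how “Outbreaks Near Me” decides that a phone has entered an outbreak zone. One plausible approach is a simple distance check between the phone's position and each reported outbreak, using the haversine great-circle formula. The outbreak coordinates and the 25 km alert radius below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def outbreaks_near(position, outbreaks, radius_km):
    """Names of reported outbreaks within radius_km of the phone's position."""
    lat, lon = position
    return [name for name, (o_lat, o_lon) in outbreaks.items()
            if haversine_km(lat, lon, o_lat, o_lon) <= radius_km]


# Hypothetical outbreak reports and alert radius.
OUTBREAKS = {"Baltimore": (39.29, -76.61), "Boston": (42.36, -71.06)}
print(outbreaks_near((39.25, -76.71), OUTBREAKS, radius_km=25))
```

A real service would likely use outbreak regions rather than single points, but the alerting logic has the same form.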

And another is to recruit a population sample willing to serve as active sensors by reporting their own status and experiences.

Locally, Maryland has launched a “flu watcher” program in which volunteers report their health conditions weekly via the Internet. Project officials say the state is the first in the country to have such a system: the Maryland Resident Influenza Tracking Survey.

“We get people to sign up online and give us their e-mail address,” said Rene Najera, an epidemiologist with the Maryland Department of Health and Mental Hygiene. “They give us their county of residence, their month and year of birth. We don’t get too personal with them. We just want some basic demographics. Every week . . . we send them a survey . . . ‘Did you have any fever? Did you have any cough? Did you have any sore throat in the week previous?’ ” he said. If the answer is yes, more detailed questions are asked. So far, 740 people across the state have signed up.

And the Maryland system is not the only one — see the Australian Flutracking system for another, which gets responses from about 6,000 people.

Researchers at the National University of Singapore have developed a system called FluLog that will use Bluetooth to locate people who have been in proximity to someone who has become infected.

It’s a high-tech version of a process called “contact tracing,” said Mehul Motani of the National University of Singapore’s Faculty of Engineering. Typically, he said “when you have a suspected case, you interview the suspected case, and you ask them: ‘Where have you been? . . . Who have you been in sustained contact with?’ ” The idea is to locate others who might get sick.
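FluLog's internals are not described, but the automated version of the question “who have you been in sustained contact with?” can be sketched as a lookup over proximity logs. This Python sketch assumes a hypothetical log of Bluetooth sightings and a hypothetical rule that 15 cumulative minutes in range counts as sustained contact.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """One Bluetooth proximity record (hypothetical schema)."""
    device_a: str
    device_b: str
    start_min: int     # start time, minutes since an arbitrary epoch
    duration_min: int  # how long the two devices stayed in range


def contacts_of(case, sightings, since_min, min_total_min=15):
    """Devices whose cumulative time near `case` since `since_min` meets the threshold."""
    minutes_near = defaultdict(int)
    for s in sightings:
        if s.start_min < since_min:
            continue  # outside the look-back window
        if s.device_a == case:
            minutes_near[s.device_b] += s.duration_min
        elif s.device_b == case:
            minutes_near[s.device_a] += s.duration_min
    return sorted(d for d, total in minutes_near.items() if total >= min_total_min)


LOG = [
    Sighting("alice", "bob", 100, 20),
    Sighting("carol", "alice", 150, 5),
    Sighting("alice", "dave", 10, 30),   # too early for the window below
    Sighting("erin", "alice", 200, 10),
    Sighting("alice", "erin", 220, 10),
]
print(contacts_of("alice", LOG, since_min=60))
```

Summing short encounters (as with erin here) lets repeated brief contacts count toward the threshold; a stricter design might require one continuous sighting instead.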

Many of these systems raise serious privacy issues, of course. But the examples discussed in the article (only some of which are mentioned here) are all voluntary.

It would be great if some of these systems could expose their data as RDF, making it available as part of the web of linked data.

RAEng report on Social, legal and ethical issues of autonomous systems

August 21st, 2009, by Tim Finin, posted in AI, Agents, Semantic Web, Social media, Technology Impact

The Royal Academy of Engineering has released a report on the social, legal and ethical issues involving autonomous systems — systems that are adaptive, learn and can make decisions without the intervention or supervision of a human.

The report, Autonomous Systems: Social, Legal and Ethical Issues (pdf), was based on a roundtable discussion “from a wide range of experts, looking at the areas where autonomous systems are most likely to emerge first, and discussing the broad ethical issues surrounding their uptake.”

While autonomous systems have broad applicability, the report focuses on two areas: transportation (e.g., autonomous road vehicles) and personal care (e.g., smart homes).

“Autonomous systems, such as fully robotic vehicles that are “driverless” or artificial companions that can provide practical and emotional support to isolated people, have a level of self-determination and decision making ability with the capacity to learn from past performance. Autonomous systems do not experience emotional reactions and can therefore perform better than humans in tasks that are dull, risky or stressful. However they bring with them a new set of ethical problems. What if unpredicted behaviour causes harm? If an unmanned vehicle is involved in an accident, who is responsible – the driver or the systems engineer? Autonomous vehicles could provide benefits for road transport with reduced congestion and safety improvements but there is a lack of a suitable legal framework to address issues such as insurance and driver responsibility.

The technologies for smart homes and patient monitoring are already in existence and provide many benefits to older people, such as allowing them to remain in their own home when recovering from an illness, but they could also lead to isolation from family and friends. Some users may be unfamiliar with the technologies and be unable to give consent to their use.”

The RAEng report recommends “engaging early in public consultation” and working to establish “appropriate regulation and governance so that controls are put in place to guide the development of these systems”.

rdfs:seeAlso Autonomous tech ‘requires debate’; Scientists ponder rules and ethics of robo helpers; Robot cats could care for older Britons.

(via Mike Wooldridge)
