 | Web 
Archive for the 'Web' Category
November 6th, 2009, by Tim Finin, posted in Semantic Web
UMBC alumnus Joab Jackson has an article in Government Computer News, Tim Berners-Lee: Machine-readable Web still a ways off, reporting on the International Semantic Web Conference help outside of Washington DC at the end of October. The article uses data.gov to illustrate the challenges and opportunities for the Semantic Web. Data.gov is a site whose purpose “is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.”
Jackson quotes Tim Berners-Lee
“When you look at putting government data on the Web, one of the concerns is … to not just put it out there on Excel files on Data.gov,” he said. “You should put these things in” the Resource Description Framework.
and later describes a project at RPI to republish information from data.gov in RDF leaded by another UMBC alumnus, Li Ding.
“Our goal is to make the whole thing shareable and replicable for others to re-use,” said project researcher Li Ding. By rendering data into RDF, it can be more easily interposed with other sets of data to create entirely new datasets and visualizations, Ding said. He showed a Google Map-based graphic that interposed RDF-versions of two different data sources from the Environmental Protection Agency, originally rendered in CSV files.
Edit | Bookmark@del.icio.us | Trackback | No Comments »
November 5th, 2009, by Tim Finin, posted in Google, Privacy, Semantic Web, Social media, Web
Google added a great new service, Dashboard, that summarizes data stored for a Google account — see MY ACCOUNT>PERSONAL SETTINGS>DASHBOARD.
“Designed to be simple and useful, the Dashboard summarizes data for each product that you use (when signed in to your account) and provides you direct links to control your personal settings. Today, the Dashboard covers more than 20 products and services, including Gmail, Calendar, Docs, Web History, Orkut, YouTube, Picasa, Talk, Reader, Alerts, Latitude and many more. The scale and level of detail of the Dashboard is unprecedented, and we’re delighted to be the first Internet company to offer this — and we hope it will become the standard.”
This is a good move on Google’s part. But while there’s a lot of information included, it’s not everything that Google knows about you — e.g., data in cookies, click throughs data from search results and information from companies it’s acquired, like Doublclick. Still, it is a big step in a positive direction.
Edit | Bookmark@del.icio.us | Trackback | 2 Comments »
November 4th, 2009, by Tim Finin, posted in Security, Social media
Yesterday was the first time a truly voter verifiable voting system was used in any binding government election, thanks in part to work being carried out at UMBC’s Cyber Defense Lab under the direction of Alan Sherman.
Takoma Park, MD used the Scantegrity system for its municipal election after testing it in a mock election last April. Technology Review has a story, First Test for Election Cryptography, that quotes Anne Sergeant, the chair of the Takoma Park board of elections
“Before trying Scantegrity in an official election, the city held a mock vote in April to work out kinks in the system. In that test, she says, about 30 percent of participants went home and used the system to verify their votes. Sergeant says that Scantegrity representatives talked extensively with voters and election officials after the April test and have improved their system accordingly. “I hope we can provide an experience where people walk away and say, ‘That was awesome,’” she says. “It’s a goal to which we aspire.”
The Scantegrity system was created by a group of universities, including UMBC. A voter uses a paper ballot marked with invisible ink, which is exposed with a special marker. That marker reveals a code, which the voter can then use to check online whether their vote was tabulated correctly.
Ben Adida has been auditing the election and documenting the process on his blog.
See also the ComputerWorld story, E-voting system lets voters verify their ballots are counted, and audio report on WAMU.
Edit | Bookmark@del.icio.us | Trackback | No Comments »
October 30th, 2009, by Tim Finin, posted in Ontologies, RDF, Semantic Web
Like many newspapers, the New York Times links the first mention of well known entitles in its articles to a reference page. For example, a mention of Barack Obama links to a page which is a collection of basic information on President Obama and links to relevant stories and other resources that the Times has created.
Now the Times is also using RDF to publish some of information as linked open data. Yesterday the Times announced the publication of an LOD collection covering about 5,000 people at http://data.nytimes.com/ under under a Creative Commons 3.0 Attribution License and plan to put their full collection of 30K topics online soon.
“Over the last several months we have manually mapped more than 5,000 person name subject headings onto Freebase and DBPedia. And today we are pleased to announce the launch of http://data.nytimes.com and the release of these 5,000 person name subject headings as Linked Open Data.
…
Over the next several months, we plan to expand http://data.nytimes.com to include each of the nearly 30,000 subject headings we use to power Times Topics pages, a collection that includes locations, organizations and descriptors in addition to person names.”
Edit | Bookmark@del.icio.us | Trackback | No Comments »
October 29th, 2009, by Tim Finin, posted in Social media, Web
DARPA will hold the DARPA Network Challenge to explore how “broad-scope problems can be solved using Internet-based technologies.
“To mark the 40th anniversary of the Internet, DARPA has announced the DARPA Network Challenge, a competition that will explore the role the Internet and social networking plays in the timely communication, wide area team-building and urgent mobilization required to solve broad scope, time-critical problems.
The challenge is to be the first to submit the locations of ten moored, 8 foot, red weather balloons located at ten fixed locations in the continental United States. Balloons will be in readily accessible locations and visible from nearby roadways.”
According to the rules, the balloons will be on display from 10:00AM to 4:00PM on Saturday, 5 December 2009. A prize of $40,000 will be awarded to the first participant to submit the latitude and longitude of all ten weather balloons within the contest period, which ends on 14 December 2009.
Edit | Bookmark@del.icio.us | Trackback | 1 Comment »
October 27th, 2009, by Tim Finin, posted in AI, KR, OWL, Ontologies, Semantic Web
OWL 2, the new version of the Web Ontology Language, officially became a W3C standard yesterday. From the W3C press release:
“Today W3C announces a new version of a standard for representing knowledge on the Web. OWL 2, part of W3C’s Semantic Web toolkit, allows people to capture their knowledge about a particular domain (say, energy or medicine) and then use tools to manage information, search through it, and learn more from it. Furthermore, as an open standard based on Web technology, it lowers the cost of merging knowledge from multiple domains.”
Edit | Bookmark@del.icio.us | Trackback | No Comments »
October 25th, 2009, by Tim Finin, posted in AI, Agents, Social media
Golden Balls is a UK game show with a final round, Split or Steal, that is similar to the prisoner’s dilemma. The two contestants have to simultaneously choose to split the prize or try to steal it. If both choose split, they each get half. If one chooses split and the other steal, than the stealer gets it all. If they both choose steal, neither gets anything. While the payoff matrix is not exactly that for the PD, it has a similar effect on the strategy. Check out this video of a Split or Steal round for £100,000. (Spotted on Hacker News)
Edit | Bookmark@del.icio.us | Trackback | No Comments »
October 16th, 2009, by Tim Finin, posted in AI, KR, NLP, Ontologies, Semantic Web
Wolfram|Alpha is an interesting query answering system developed by Wolfram Research that is a blend of a question answering system and a Semantic Web alternative. It tries to interpret and answer queries expressed as a sequence of words from a large collection of interlinked tables. Oh, and Mathematica is in thrown in for free. A free Web version was released last Spring.
The news today is that Wolfram|Alpha has released an API, as noted in their blog:
“The API allows your application to interact with Wolfram|Alpha much like you do on the web—you send a web request with the same query string you would type into Wolfram|Alpha’s query box and you get back the same computed results. It’s just that both are in a form your application can understand. There are plenty of ways to tweak and control the results, as well.”
The pricing plan runs from $60/month for 1000 (6 cents a query) queries to $220K for up to 10M queries/month (2.2 cents a query). programming language bindings are available for Java, PHP, Perl, Python, Ruby and .NET.
Their original web interface remains free, but the TOS specifies that it “may be used only by a human being using a conventional web browser to manually enter queries one at a time.”
Edit | Bookmark@del.icio.us | Trackback | No Comments »
October 6th, 2009, by Tim Finin, posted in Machine Learning, Privacy, Semantic Web, Social media
In the Fall of 2007, two MIT students carried out a class project exploring how presumably private data could be inferred from an online social networking system. Their experiment was to predict the sexual orientation of Facebook users who make their basic information public by analyzing friendship associations. As reported in the Boston Globe last month, the students’ had not yet published their results.
Well, now they have — in the October issue of the First Monday, “one of the first openly accessible, peer–reviewed journals on the Internet”.
The paper has a lot of detail on the methodology for collecting the data and how it was analyzed. Here’s the abstract.
“Public information about one’s coworkers, friends, family, and acquaintances, as well as one’s associations with them, implicitly reveals private information. Social networking Web sites, e–mail, instant messaging, telephone, and VoIP are all technologies steeped in network data — data relating one person to another. Network data shifts the locus of information control away from individuals, as the individual’s traditional and absolute discretion is replaced by that of his social network. Our research demonstrates a method for accurately predicting the sexual orientation of Facebook users by analyzing friendship associations. After analyzing 4,080 Facebook profiles from the MIT network, we determined that the percentage of a given user’s friends who self–identify as gay male is strongly correlated with the sexual orientation of that user, and we developed a logistic regression classifier with strong predictive power. Although we studied Facebook friendship ties, network data is pervasive in the broader context of computer–mediated communication, raising significant privacy issues for communication technologies to which there are no neat solutions.”
As we had previously noted, this datamining exercise only accesses information that Facebook users explicitly choose to make public. The authors note that their analysis “relies on public self–identification of same–gender interest in Facebook profiles as a sentinel value for LGB identity”. The privacy vulnerability is that the default setting for a Facebook account is that friendship relations are public and you can not control the privacy settings of your friends. So if your leave your friend list public and many of your Facebook friends open up their profiles, it may be possible to draw reasonable inferences about your age, gender, political leanings, sexual preferences and other attributes.
Edit | Bookmark@del.icio.us | Trackback | 2 Comments »
October 3rd, 2009, by Tim Finin, posted in Semantic Web, Social media, Web
In next week’s ebiquity meeting (10:15 EDT Tue 10/6), Lance Byrd and Set Cruz will talk about Blackbook, a graph analytic processing platform for semantic web data.
Blackbook3 is an RDF middleware framework for integrating data and executing algorithms that relies on open standards and “best-of-breed” open source technologies, including Jena, Lucene, JAAS, D2RQ, Hadoop, HBase and Solr. Blackbook3 has a plug-and-play, loosely–coupled architecture, supports SOAP and REST interfaces, offers SPARQL and linked data endpoints and can run in environments where high confidentiality is required.
The talk will discuss the current and future use cases for Blackbook3 as well as broader knowledge discovery and dissemination issues for RDF applications. You can participate remotely via dimdim starting at 10:15 EDT October 6.
Edit | Bookmark@del.icio.us | Trackback | 3 Comments »
October 1st, 2009, by Tim Finin, posted in Semantic Web, Social media, Web
David Easley and Jon Kleinberg have made available a free pre-publication draft of a new book, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, to be published by Cambridge University Press in 2010. The book is based on an inter-disciplinary undergraduate course, Networks, that they teach at Cornell.
They say about their book
“Over the past decade there has been a growing public fascination with the complex “connectedness” of modern society. This connectedness is found in many incarnations: in the rapid growth of the Internet and the Web, in the ease with which global communication now takes place, and in the ability of news and information as well as epidemics and financial crises to spread around the world with surprising speed and intensity. These are phenomena that involve networks, incentives, and the aggregate behavior of groups of people; they are based on the links that connect us and the ways in which each of our decisions can have subtle consequences for the outcomes of everyone else. Networks, Crowds, and Markets combines different scientific perspectives in its approach to understanding networks and behavior. Drawing on ideas from economics, sociology, computing and information science, and applied mathematics, it describes the emerging field of study that is growing at the interface of all these areas, addressing fundamental questions about how the social, economic, and technological worlds are connected.”
Download the 828-page (!) draft of Networks, Crowds, and Markets in pdf here.
Edit | Bookmark@del.icio.us | Trackback | No Comments »
September 22nd, 2009, by Tim Finin, posted in Privacy, Social media
The New York Times reports that the data for the Netflix Prize 2 will include more information about the anonymous users:
“Netflix was so pleased with the results of its first contest that it announced a second one on Monday. The new contest will present contestants with demographic and behavioral data, including renters’ ages, gender, ZIP codes, genre ratings and previously chosen movies — but not ratings. Contestants will then have to predict which movies those people will like.”
As others have noted this will make it much easier to “de-anonymize” individuals in the collection.
As an experiment, I checked the zip code where I grew up and found that it had about 3900 people in the 2000 census. So, given an age and gender you would have a set of about 40 people. With just a little bit of additional information, one could narrow this to a specific individual.
For example, Narayanan and Shmatikov showed (Robust De-anonymization of Large Sparse Datasets) that this could be done with the dataset from the first Netflix Grand Prize by mining information from IMDB. Think of how much more powerful such attacks would be with the new dataset.
Edit | Bookmark@del.icio.us | Trackback | No Comments »
|  |
|  |