New York Times publishes Linked Open Data

October 30th, 2009

Like many newspapers, the New York Times links the first mention of well-known entities in its articles to a reference page. For example, a mention of Barack Obama links to a page which is a collection of basic information on President Obama and links to relevant stories and other resources that the Times has created.

Now the Times is also using RDF to publish some of this information as linked open data. Yesterday the Times announced the publication of an LOD collection covering about 5,000 people under a Creative Commons 3.0 Attribution License and said it plans to put its full collection of 30K topics online soon.

“Over the last several months we have manually mapped more than 5,000 person name subject headings onto Freebase and DBPedia. And today we are pleased to announce the launch of and the release of these 5,000 person name subject headings as Linked Open Data.

Over the next several months, we plan to expand to include each of the nearly 30,000 subject headings we use to power Times Topics pages, a collection that includes locations, organizations and descriptors in addition to person names.”
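As a rough illustration of what a mapping like this looks like as data, the sketch below writes two links in N-Triples form. The Times identifier and the Freebase-style URI are made-up placeholders (the announcement quoted above does not give them), and using owl:sameAs as the linking property is my assumption, not something stated in the announcement.

```python
# Sketch only: URIs under example.org are hypothetical placeholders.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def to_ntriple(subject, predicate, obj):
    """Serialize one triple of URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def same_as_links(local_uri, external_uris):
    """Link one local subject heading to its counterparts in other datasets."""
    return [to_ntriple(local_uri, OWL_SAME_AS, ext) for ext in external_uris]


nyt_person = "http://example.org/nyt/person/obama_barack"  # hypothetical
links = same_as_links(
    nyt_person,
    [
        "http://dbpedia.org/resource/Barack_Obama",
        "http://example.org/freebase/barack_obama",  # hypothetical stand-in
    ],
)
for line in links:
    print(line)
```

Each printed line says that the local heading and the external resource denote the same person, which is what lets a linked data client follow the connection from one dataset to another.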

Win $40K in the DARPA Network Challenge

October 29th, 2009

DARPA will hold the DARPA Network Challenge to explore how “broad-scope problems can be solved using Internet-based technologies.”

“To mark the 40th anniversary of the Internet, DARPA has announced the DARPA Network Challenge, a competition that will explore the role the Internet and social networking plays in the timely communication, wide area team-building and urgent mobilization required to solve broad scope, time-critical problems.

The challenge is to be the first to submit the locations of ten moored, 8 foot, red weather balloons located at ten fixed locations in the continental United States. Balloons will be in readily accessible locations and visible from nearby roadways.”

According to the rules, the balloons will be on display from 10:00AM to 4:00PM on Saturday, 5 December 2009. A prize of $40,000 will be awarded to the first participant to submit the latitude and longitude of all ten weather balloons within the contest period, which ends on 14 December 2009.

OWL 2 becomes a W3C recommendation

October 27th, 2009

OWL 2, the new version of the Web Ontology Language, officially became a W3C standard yesterday. From the W3C press release:

“Today W3C announces a new version of a standard for representing knowledge on the Web. OWL 2, part of W3C’s Semantic Web toolkit, allows people to capture their knowledge about a particular domain (say, energy or medicine) and then use tools to manage information, search through it, and learn more from it. Furthermore, as an open standard based on Web technology, it lowers the cost of merging knowledge from multiple domains.”

Prisoner’s Dilemma and the Golden Balls game show

October 25th, 2009

Golden Balls is a UK game show with a final round, Split or Steal, that is similar to the prisoner’s dilemma. The two contestants have to simultaneously choose to split the prize or try to steal it. If both choose split, they each get half. If one chooses split and the other steal, then the stealer gets it all. If they both choose steal, neither gets anything. While the payoff matrix is not exactly that for the PD, it has a similar effect on the strategy. Check out this video of a Split or Steal round for £100,000. (Spotted on Hacker News)
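To see where the payoff matrix departs from the PD, here is a minimal sketch based on my reading of the rules described above. The function and variable names are my own. It suggests that stealing is only weakly dominant: it is never worse than splitting, but against a stealer both choices pay nothing, whereas in a true PD defecting is strictly better in every case.

```python
PRIZE = 100_000  # pounds, as in the round mentioned above


def payoff(me, other, prize=PRIZE):
    """Return my winnings given both choices ('split' or 'steal')."""
    if me == "split" and other == "split":
        return prize / 2
    if me == "steal" and other == "split":
        return prize
    return 0  # I split and was robbed, or we both stole


choices = ("split", "steal")
for other in choices:
    row = {me: payoff(me, other) for me in choices}
    print(f"opponent plays {other}: {row}")

# Weak dominance: steal is never worse than split, and strictly better once.
steal_never_worse = all(payoff("steal", o) >= payoff("split", o) for o in choices)
steal_sometimes_better = any(payoff("steal", o) > payoff("split", o) for o in choices)
print(steal_never_worse, steal_sometimes_better)
```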

WolframAlpha releases API

October 16th, 2009

Wolfram|Alpha is an interesting query answering system developed by Wolfram Research that is a blend of a question answering system and a Semantic Web alternative. It tries to interpret and answer queries expressed as a sequence of words from a large collection of interlinked tables. Oh, and Mathematica is thrown in for free. A free Web version was released last Spring.

The news today is that Wolfram|Alpha has released an API, as noted in their blog:

“The API allows your application to interact with Wolfram|Alpha much like you do on the web—you send a web request with the same query string you would type into Wolfram|Alpha’s query box and you get back the same computed results. It’s just that both are in a form your application can understand. There are plenty of ways to tweak and control the results, as well.”

The pricing plan runs from $60/month for 1,000 queries (6 cents a query) to $220K for up to 10M queries/month (2.2 cents a query). Programming language bindings are available for Java, PHP, Perl, Python, Ruby and .NET.
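The per-query figures can be checked with a few lines of arithmetic. This sketch assumes each tier's quota is fully used and that the $220K figure is a monthly price; the tier labels are mine, not Wolfram's.

```python
def cents_per_query(monthly_price_usd, queries_per_month):
    """Effective cost per query, in cents, if the whole quota is used."""
    return 100 * monthly_price_usd / queries_per_month


# (monthly price in USD, queries per month); labels are hypothetical.
tiers = {
    "smallest plan": (60, 1_000),
    "largest plan": (220_000, 10_000_000),
}
for name, (price, queries) in tiers.items():
    print(f"{name}: {cents_per_query(price, queries):.1f} cents per query")
```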

Their original web interface remains free, but the TOS specifies that it “may be used only by a human being using a conventional web browser to manually enter queries one at a time.”

Gaydar, Facebook and privacy

October 6th, 2009

In the Fall of 2007, two MIT students carried out a class project exploring how presumably private data could be inferred from an online social networking system. Their experiment was to predict the sexual orientation of Facebook users who make their basic information public by analyzing friendship associations. As reported in the Boston Globe last month, the students had not yet published their results.

Well, now they have, in the October issue of First Monday, “one of the first openly accessible, peer–reviewed journals on the Internet”.

The paper has a lot of detail on the methodology for collecting the data and how it was analyzed. Here’s the abstract.

“Public information about one’s coworkers, friends, family, and acquaintances, as well as one’s associations with them, implicitly reveals private information. Social networking Web sites, e–mail, instant messaging, telephone, and VoIP are all technologies steeped in network data — data relating one person to another. Network data shifts the locus of information control away from individuals, as the individual’s traditional and absolute discretion is replaced by that of his social network. Our research demonstrates a method for accurately predicting the sexual orientation of Facebook users by analyzing friendship associations. After analyzing 4,080 Facebook profiles from the MIT network, we determined that the percentage of a given user’s friends who self–identify as gay male is strongly correlated with the sexual orientation of that user, and we developed a logistic regression classifier with strong predictive power. Although we studied Facebook friendship ties, network data is pervasive in the broader context of computer–mediated communication, raising significant privacy issues for communication technologies to which there are no neat solutions.”

As we had previously noted, this data mining exercise only accesses information that Facebook users explicitly choose to make public. The authors note that their analysis “relies on public self–identification of same–gender interest in Facebook profiles as a sentinel value for LGB identity”. The privacy vulnerability is that the default setting for a Facebook account is that friendship relations are public and you cannot control the privacy settings of your friends. So if you leave your friend list public and many of your Facebook friends open up their profiles, it may be possible to draw reasonable inferences about your age, gender, political leanings, sexual preferences and other attributes.

Open problems in metabiology.
(We are all random walks in program space.)

October 4th, 2009

Gregory Chaitin is on tour promoting his new field – metabiology. As Chaitin conceives it, metabiology is the study of the evolution of computer programs, with the goal of proving theorems concerning the circumstances under which evolution occurs. Its ultimate goal, as the name suggests, is proving that under Earth-like conditions, DNA-based computers must evolve.

Key to Chaitin’s notion of evolution is something he calls creativity, and he explored this idea a little bit in a talk at the University of Toronto’s Centre for Mathematical Medicine. To understand his first theorems in this area, you need to (roughly) understand the Busy Beaver problem of Tibor Rado. A good precis is here. Essentially, a busy beaver is a Turing machine that operates as long as possible, and then halts. The Busy Beaver function, BB(n), is the highest whole number produced by an n-bit busy beaver.

So, to Chaitin’s first theorem in metabiology …

He begins with a single organism – a Turing machine. He mutates this organism, and then either keeps the original and throws away the mutant, or vice versa, depending on which is more fit. The fitness function is based on the Busy Beaver problem. If the mutant halts, and, upon halting, produces a higher whole number than the original, then the mutant wins. If not, it loses.
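The mutate-and-select loop described above can be sketched with a toy model. To be clear, this is not Chaitin's construction: the three-instruction language, the fixed program length and all the names are my own inventions, and every program in this toy halts, so it sidesteps exactly the part (halting) that makes the real model hard. It only shows the shape of the loop: mutate, compare outputs, keep the winner.

```python
import random

# Hypothetical toy instruction set; every program built from it halts.
OPS = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: 2 * x,
    "sqr": lambda x: x * x,
}


def run(organism):
    """Fitness: the whole number the program outputs, starting from 1."""
    value = 1
    for op in organism:
        value = OPS[op](value)
    return value


def mutate(organism, rng):
    """Replace one randomly chosen instruction with a random one."""
    mutant = list(organism)
    mutant[rng.randrange(len(mutant))] = rng.choice(list(OPS))
    return mutant


def evolve(organism, generations, rng):
    """Keep the mutant only if it outputs a strictly higher number."""
    for _ in range(generations):
        mutant = mutate(organism, rng)
        if run(mutant) > run(organism):
            organism = mutant
    return organism


rng = random.Random(0)  # fixed seed so runs are repeatable
start = ["inc"] * 6
final = evolve(start, 200, rng)
print(run(start), "->", run(final))
```

By construction the output never decreases from one generation to the next. In this toy the best possible output is bounded and easy to compute, which is precisely what is not true in the Busy Beaver setting.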

Now, BB(n) is uncomputable. In fact, it has no computable bound. Nevertheless, Chaitin shows that random mutations will, in exponential time (in the number of bits, n, in the organism), result in the computation of the Busy Beaver function for n!
(That was an exclamation point, not a factorial sign.)

In other words, evolution causes fitness to increase faster than any computable function. Chaitin calls this “evidence of biological creativity”. This is a nice result, but is one that Chaitin finds less than satisfactory. In real life evolution is cumulative, while Chaitin’s proof requires assuming that evolution sometimes starts over from scratch. He really wants to prove an evolutionary process that is, in some sense, cumulative, in addition to being creative. His second theorem uses his infamous halting probability, Ω, to construct a cumulative path through program space to arbitrary levels of complexity. But this also doesn’t satisfy Chaitin, since the process is unstable, in a sense that he didn’t really explain.

Beyond these two theorems, the field is open. Things to work on seem to be:

i. Without changing the model, can Chaitin’s desired result (cumulative evolution) be proved?

ii. Part of the utility of Chaitin’s fitness function is that it explicitly rewards complexity. This fits with the observation that life, in general, evolves to become more complex. But complexity is, I think, typically seen as an epiphenomenon of fitness, and not as the very definition of fitness. Can a “Darwinian” fitness function be chosen such that complexity is not explicitly rewarded AND such that life can be proven to evolve to arbitrary complexity?

iii. Once we exhaust the limits of what we can prove without an environment, what happens when we introduce an environment, which interacts with the organism, exchanges information with the organism, and which can change, suddenly or gradually?

Of course, (iii) might not be necessary. If (ii) can be proven, then, in a sense, case closed: life must evolve. Some might even say that (ii) isn’t necessary.

But I suspect that Chaitin expects (i) to be very hard. Hence his enthusiasm. In fact, I suspect that he suspects that Ω is going to be all over metabiology, and that some of its fundamental questions will prove to be (mathematically) unknowable.

But algorithmic information theory (AIT) is only one extra-biological approach to evolution. Another is thermodynamics. Eric Chaisson, for example, argues that the Earth, bathed in solar radiation, has a natural tendency towards lower entropy and higher complexity. Is an AIT/Thermodynamics synthesis possible? Google says: Yes, (and it’s been around a while).

Blackbook, a graph analytic platform for semantic web data

October 3rd, 2009

In next week’s ebiquity meeting (10:15 EDT Tue 10/6), Lance Byrd and Set Cruz will talk about Blackbook, a graph analytic processing platform for semantic web data.

Blackbook3 is an RDF middleware framework for integrating data and executing algorithms that relies on open standards and “best-of-breed” open source technologies, including Jena, Lucene, JAAS, D2RQ, Hadoop, HBase and Solr. Blackbook3 has a plug-and-play, loosely–coupled architecture, supports SOAP and REST interfaces, offers SPARQL and linked data endpoints and can run in environments where high confidentiality is required.

The talk will discuss the current and future use cases for Blackbook3 as well as broader knowledge discovery and dissemination issues for RDF applications. You can participate remotely via dimdim starting at 10:15 EDT October 6.

Free draft of new Easley/Kleinberg book on Networks, Crowds, and Markets

October 1st, 2009

David Easley and Jon Kleinberg have made available a free pre-publication draft of a new book, Networks, Crowds, and Markets: Reasoning About a Highly Connected World, to be published by Cambridge University Press in 2010. The book is based on an inter-disciplinary undergraduate course, Networks, that they teach at Cornell.

They say about their book:

“Over the past decade there has been a growing public fascination with the complex “connectedness” of modern society. This connectedness is found in many incarnations: in the rapid growth of the Internet and the Web, in the ease with which global communication now takes place, and in the ability of news and information as well as epidemics and financial crises to spread around the world with surprising speed and intensity. These are phenomena that involve networks, incentives, and the aggregate behavior of groups of people; they are based on the links that connect us and the ways in which each of our decisions can have subtle consequences for the outcomes of everyone else.
    Networks, Crowds, and Markets combines different scientific perspectives in its approach to understanding networks and behavior. Drawing on ideas from economics, sociology, computing and information science, and applied mathematics, it describes the emerging field of study that is growing at the interface of all these areas, addressing fundamental questions about how the social, economic, and technological worlds are connected.”

Download the 828-page (!) draft of Networks, Crowds, and Markets in pdf here.