UMBC eBiquity Blog
Tim Finin, 3:50pm 6 November 2009
UMBC alumnus Joab Jackson has an article in Government Computer News, Tim Berners-Lee: Machine-readable Web still a ways off, reporting on the International Semantic Web Conference held outside of Washington DC at the end of October. The article uses data.gov to illustrate the challenges and opportunities for the Semantic Web. Data.gov is a site whose purpose “is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government.”
Jackson quotes Tim Berners-Lee
“When you look at putting government data on the Web, one of the concerns is … to not just put it out there on Excel files on Data.gov,” he said. “You should put these things in” the Resource Description Framework.
and later describes a project at RPI to republish information from data.gov in RDF, led by another UMBC alumnus, Li Ding.
“Our goal is to make the whole thing shareable and replicable for others to re-use,” said project researcher Li Ding. By rendering data into RDF, it can be more easily interposed with other sets of data to create entirely new datasets and visualizations, Ding said. He showed a Google Map-based graphic that interposed RDF-versions of two different data sources from the Environmental Protection Agency, originally rendered in CSV files.
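To make the CSV-to-RDF idea concrete, here is a minimal sketch of how tabular rows might be turned into RDF triples in N-Triples syntax. It is not the RPI project's code: the `http://example.org/dataset/` namespace, the column names and the sample rows are all invented for illustration, and a real conversion would choose vocabulary terms and datatypes much more carefully.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and sample rows, invented for illustration only.
BASE = "http://example.org/dataset/"

SAMPLE_CSV = """site_id,state,reading
A1,MD,12.5
B2,NY,7.0
"""

def csv_to_ntriples(text, key_column, base=BASE):
    """Emit one subject per CSV row and one triple per remaining column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}record/{quote(row[key_column])}>"
        for column, value in row.items():
            if column == key_column:
                continue
            predicate = f"<{base}property/{quote(column)}>"
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines

triples = csv_to_ntriples(SAMPLE_CSV, "site_id")
for triple in triples:
    print(triple)
```

Once two datasets share identifiers for the same things (sites, states, and so on), their triples can simply be concatenated and queried together, which is roughly the kind of mixing the article describes.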
Categories: Semantic Web,
Tags: linked data,
Related posts: • Joel Sachs on Linked Data, 10:30am Oct 1, ITE 325b; • Reuters Calais to support Semantic Web Linked Data in next release; • New York Times publishes Linked Open Data; Comments: none
Tim Finin, 9:40am 5 November 2009
This post on the CACM Blog caught my eye and shows that we still have a long way to go before computing is taken seriously in US secondary education, let alone K-12.
AP CS no Longer Counts for High School Graduation in Georgia (for now)
“Up until September, Georgia and Texas were the (only) two states in the US that accepted a computer science course as fulfilling high school graduation requirements. In Texas, the Advanced Placement Computer Science (AP CS) course fulfilled a mathematics requirement. In Georgia, it fulfilled a fourth science course requirement. As of October, however, Georgia has rescinded that decision. … ”
I wonder how other countries treat computing and informatics in primary and secondary education.
Categories: CS, GENERAL,
Tags: Computer Science; education,
Related posts: • College Board eliminates AP computer science AB test; • TBL joins Southhampton; • Computer Science 2.0; Comments: none
Tim Finin, 8:46am 5 November 2009
Google added a great new service, Dashboard, that summarizes data stored for a Google account — see MY ACCOUNT>PERSONAL SETTINGS>DASHBOARD.
“Designed to be simple and useful, the Dashboard summarizes data for each product that you use (when signed in to your account) and provides you direct links to control your personal settings. Today, the Dashboard covers more than 20 products and services, including Gmail, Calendar, Docs, Web History, Orkut, YouTube, Picasa, Talk, Reader, Alerts, Latitude and many more. The scale and level of detail of the Dashboard is unprecedented, and we’re delighted to be the first Internet company to offer this — and we hope it will become the standard.”
This is a good move on Google’s part. But while there’s a lot of information included, it’s not everything that Google knows about you, e.g., data in cookies, click-through data from search results and information from companies it has acquired, like DoubleClick. Still, it is a big step in a positive direction.
Categories: Google, Privacy, Semantic Web, Social media, Web,
Related posts: • Analyzing AOL search data shows click through rates for search rank; • Some Germans think Google knows too much; • RSS extensions by Google Base; Comments: 2
Tim Finin, 7:45am 4 November 2009
Yesterday was the first time a truly voter verifiable voting system was used in any binding government election, thanks in part to work being carried out at UMBC’s Cyber Defense Lab under the direction of Alan Sherman.
Takoma Park, MD used the Scantegrity system for its municipal election after testing it in a mock election last April. Technology Review has a story, First Test for Election Cryptography, that quotes Anne Sergeant, the chair of the Takoma Park board of elections:
“Before trying Scantegrity in an official election, the city held a mock vote in April to work out kinks in the system. In that test, she says, about 30 percent of participants went home and used the system to verify their votes. Sergeant says that Scantegrity representatives talked extensively with voters and election officials after the April test and have improved their system accordingly. “I hope we can provide an experience where people walk away and say, ‘That was awesome,’” she says. “It’s a goal to which we aspire.”
The Scantegrity system was created by a group of universities, including UMBC. A voter uses a paper ballot marked with invisible ink, which is exposed with a special marker. That marker reveals a code, which the voter can then use to check online whether their vote was tabulated correctly.
Ben Adida has been auditing the election and documenting the process on his blog.
See also the ComputerWorld story, E-voting system lets voters verify their ballots are counted, and the audio report on WAMU.
Categories: Security, Social media,
Related posts: • Scantegrity cryptographic voting system to be used in binding governmental election; • 2007 Collegiate Voting Systems Competition; • UMBC Professor Alan Sherman on electronic voting; Comments: none
Tim Finin, 1:00pm 30 October 2009
Like many newspapers, the New York Times links the first mention of well known entities in its articles to a reference page. For example, a mention of Barack Obama links to a page which is a collection of basic information on President Obama and links to relevant stories and other resources that the Times has created.
Now the Times is also using RDF to publish some of its information as linked open data. Yesterday the Times announced the publication of an LOD collection covering about 5,000 people at http://data.nytimes.com/ under a Creative Commons 3.0 Attribution License and plans to put its full collection of 30K topics online soon.
“Over the last several months we have manually mapped more than 5,000 person name subject headings onto Freebase and DBPedia. And today we are pleased to announce the launch of http://data.nytimes.com and the release of these 5,000 person name subject headings as Linked Open Data.
…
Over the next several months, we plan to expand http://data.nytimes.com to include each of the nearly 30,000 subject headings we use to power Times Topics pages, a collection that includes locations, organizations and descriptors in addition to person names.”
Categories: Ontologies, RDF, Semantic Web,
Related posts: • New API to make the New York Times programmable; • Joel Sachs on Linked Data, 10:30am Oct 1, ITE 325b; • Blogrunner: the New York Times robot in the newsroom; Comments: none
Tim Finin, 7:21pm 29 October 2009
DARPA will hold the DARPA Network Challenge to explore how “broad-scope problems can be solved using Internet-based technologies.”
“To mark the 40th anniversary of the Internet, DARPA has announced the DARPA Network Challenge, a competition that will explore the role the Internet and social networking plays in the timely communication, wide area team-building and urgent mobilization required to solve broad scope, time-critical problems.
The challenge is to be the first to submit the locations of ten moored, 8 foot, red weather balloons located at ten fixed locations in the continental United States. Balloons will be in readily accessible locations and visible from nearby roadways.”
According to the rules, the balloons will be on display from 10:00AM to 4:00PM on Saturday, 5 December 2009. A prize of $40,000 will be awarded to the first participant to submit the latitude and longitude of all ten weather balloons within the contest period, which ends on 14 December 2009.
Categories: Social media, Web,
Tags: darpa; internet,
Related posts: • DARPA Grand Challenge; • Darpa Grand Challenge qualifiers; • DARPA’s Tony Tether on Urban Challenge and Computer Science research; Comments: one
Tim Finin, 11:05pm 27 October 2009
OWL 2, the new version of the Web Ontology Language, officially became a W3C standard yesterday. From the W3C press release:
“Today W3C announces a new version of a standard for representing knowledge on the Web. OWL 2, part of W3C’s Semantic Web toolkit, allows people to capture their knowledge about a particular domain (say, energy or medicine) and then use tools to manage information, search through it, and learn more from it. Furthermore, as an open standard based on Web technology, it lowers the cost of merging knowledge from multiple domains.”
Categories: AI, KR, OWL, Ontologies, Semantic Web,
Related posts: • W3C anounces RDFa as a candidate recommendation; • Cleverset recomendation engine uses statistical relational learning; • Netflix to release user rating data; Comments: none
Tim Finin, 9:32pm 25 October 2009
Golden Balls is a UK game show with a final round, Split or Steal, that is similar to the prisoner’s dilemma. The two contestants have to simultaneously choose to split the prize or try to steal it. If both choose split, they each get half. If one chooses split and the other steal, then the stealer gets it all. If they both choose steal, neither gets anything. While the payoff matrix is not exactly that for the PD, it has a similar effect on the strategy. Check out this video of a Split or Steal round for £100,000. (Spotted on Hacker News)
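The payoff rules above are small enough to write down and check directly. The sketch below encodes them for a £100,000 prize and tests whether one choice weakly dominates the other; the function names are my own and nothing here comes from the show itself.

```python
PRIZE = 100_000  # pounds, the jackpot mentioned in the post

CHOICES = ("split", "steal")

def payoff(a, b, prize=PRIZE):
    """Return (payout to A, payout to B) for choices 'split' or 'steal'."""
    if a == "split" and b == "split":
        return prize // 2, prize // 2
    if a == "steal" and b == "split":
        return prize, 0
    if a == "split" and b == "steal":
        return 0, prize
    return 0, 0  # both steal: nobody gets anything

def weakly_dominates(x, y):
    """True if x never pays A less than y does and pays strictly more at least once."""
    diffs = [payoff(x, b)[0] - payoff(y, b)[0] for b in CHOICES]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

for a in CHOICES:
    for b in CHOICES:
        print(a, b, payoff(a, b))
print("steal weakly dominates split:", weakly_dominates("steal", "split"))
```

This is where the game parts ways with the textbook prisoner’s dilemma: stealing is never worse than splitting, but if the other player steals you get nothing either way, so the dominance is only weak rather than strict.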
Categories: AI, Agents, Social media,
Related posts: • UMBC hosts Baltimore site for two day Global Game Jam; • the evolution of cooperative behaviour; • Game theoretic analysis of the toilet seat problem; Comments: none
Tim Finin, 12:16am 16 October 2009
Wolfram|Alpha is an interesting query answering system developed by Wolfram Research that is a blend of a question answering system and a Semantic Web alternative. It tries to interpret and answer queries expressed as a sequence of words from a large collection of interlinked tables. Oh, and Mathematica is thrown in for free. A free Web version was released last Spring.
The news today is that Wolfram|Alpha has released an API, as noted in their blog:
“The API allows your application to interact with Wolfram|Alpha much like you do on the web—you send a web request with the same query string you would type into Wolfram|Alpha’s query box and you get back the same computed results. It’s just that both are in a form your application can understand. There are plenty of ways to tweak and control the results, as well.”
The pricing plan runs from $60/month for 1000 queries (6 cents a query) to $220K for up to 10M queries/month (2.2 cents a query). Programming language bindings are available for Java, PHP, Perl, Python, Ruby and .NET.
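The per-query figures are just the plan price divided by the query allowance. A quick check, where the tier labels `entry` and `top` are my own shorthand rather than Wolfram's plan names:

```python
def cents_per_query(price_usd, queries):
    """Cost of a single query in US cents when the full allowance is used."""
    return 100 * price_usd / queries

# (price in dollars, queries included); tier labels are hypothetical.
plans = {"entry": (60, 1_000), "top": (220_000, 10_000_000)}

for name, (price, queries) in plans.items():
    print(name, round(cents_per_query(price, queries), 2), "cents per query")
```

These rates assume the whole allowance is used; an application that sends fewer queries than its plan includes pays more per query in practice.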
Their original web interface remains free, but the TOS specifies that it “may be used only by a human being using a conventional web browser to manually enter queries one at a time.”
Categories: AI, KR, NLP, Ontologies, Semantic Web,
Related posts: • Wolfram Alpha is live, API description online; • Dust up in DC!; • Wolfram Alpha: an alternative to Google, the Semantic Web and Cyc?; Comments: none
Tim Finin, 8:00am 6 October 2009
In the Fall of 2007, two MIT students carried out a class project exploring how presumably private data could be inferred from an online social networking system. Their experiment was to predict the sexual orientation of Facebook users who make their basic information public by analyzing friendship associations. As reported in the Boston Globe last month, the students had not yet published their results.
Well, now they have, in the October issue of First Monday, “one of the first openly accessible, peer–reviewed journals on the Internet”.
The paper has a lot of detail on the methodology for collecting the data and how it was analyzed. Here’s the abstract.
“Public information about one’s coworkers, friends, family, and acquaintances, as well as one’s associations with them, implicitly reveals private information. Social networking Web sites, e–mail, instant messaging, telephone, and VoIP are all technologies steeped in network data — data relating one person to another. Network data shifts the locus of information control away from individuals, as the individual’s traditional and absolute discretion is replaced by that of his social network. Our research demonstrates a method for accurately predicting the sexual orientation of Facebook users by analyzing friendship associations. After analyzing 4,080 Facebook profiles from the MIT network, we determined that the percentage of a given user’s friends who self–identify as gay male is strongly correlated with the sexual orientation of that user, and we developed a logistic regression classifier with strong predictive power. Although we studied Facebook friendship ties, network data is pervasive in the broader context of computer–mediated communication, raising significant privacy issues for communication technologies to which there are no neat solutions.”
As we had previously noted, this datamining exercise only accesses information that Facebook users explicitly choose to make public. The authors note that their analysis “relies on public self–identification of same–gender interest in Facebook profiles as a sentinel value for LGB identity”. The privacy vulnerability is that the default setting for a Facebook account is that friendship relations are public and you cannot control the privacy settings of your friends. So if you leave your friend list public and many of your Facebook friends open up their profiles, it may be possible to draw reasonable inferences about your age, gender, political leanings, sexual preferences and other attributes.
Categories: Machine Learning, Privacy, Semantic Web, Social media,
Tags: Datamining; Facebook; social graph; social network,
Related posts: • Changes in FaceBook default privacy policy; • Project Gaydar and privacy in Facebook and other online social networking systems; • Canada: facebook violates privacy law; Comments: 2
joel, 9:57pm 4 October 2009
Gregory Chaitin is on tour promoting his new field – metabiology. As Chaitin conceives it, metabiology is the study of the evolution of computer programs, with the goal of proving theorems concerning the circumstances under which evolution occurs. Its ultimate goal, as the name suggests, is proving that under Earth-like conditions, DNA-based computers must evolve.
Key to Chaitin’s notion of evolution is something he calls creativity, and he explored this idea a little bit in a talk at the University of Toronto’s Centre for Mathematical Medicine. To understand his first theorems in this area, you need to (roughly) understand the Busy Beaver problem of Tibor Rado. A good precis is here. Essentially, a busy beaver is a Turing machine that operates as long as possible, and then halts. The Busy Beaver function, BB(n), is the highest whole number produced by an n-bit busy beaver.
So, to Chaitin’s first theorem in metabiology …
He begins with a single organism – a Turing machine. He mutates this organism, and then either keeps the original and throws away the mutant, or vice versa, depending on which is more fit. The fitness function is based on the Busy Beaver problem. If the mutant halts, and, upon halting, produces a higher whole number than the original, then the mutant wins. If not, it loses.
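The mutate-and-keep-the-fitter loop is easy to sketch, with one big caveat: Chaitin's fitness test asks whether a mutated program halts and what number it outputs, which cannot be computed in general. The toy below swaps in a trivially computable stand-in (read the organism's bits as a binary number), so it shows only the shape of the selection loop, not his model or his result. All names are my own.

```python
import random

def fitness(organism):
    """Stand-in fitness: the bits read as a binary number (NOT a Busy Beaver score)."""
    return int("".join(map(str, organism)), 2)

def mutate(organism, rng):
    """Return a copy of the organism with one randomly chosen bit flipped."""
    i = rng.randrange(len(organism))
    mutant = organism.copy()
    mutant[i] ^= 1
    return mutant

def evolve(n_bits, steps, seed=0):
    """Run the single-organism loop: mutate, then keep whichever is fitter."""
    rng = random.Random(seed)
    organism = [0] * n_bits
    history = [fitness(organism)]
    for _ in range(steps):
        mutant = mutate(organism, rng)
        if fitness(mutant) > fitness(organism):
            organism = mutant  # the mutant wins; the original is discarded
        history.append(fitness(organism))
    return organism, history

final, history = evolve(n_bits=8, steps=200)
print(final, history[-1])
```

Because a mutant replaces the original only when it is strictly fitter, the recorded fitness never decreases from one step to the next.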
Now, BB(n) is uncomputable. In fact, it has no computable bound. Nevertheless, Chaitin shows that random mutations will, in exponential time (on the number of bits, n, in the organism), result in the computation of the Busy Beaver function for n!
(That was an exclamation point, not a factorial sign.)
In other words, evolution causes fitness to increase faster than any computable function. Chaitin calls this “evidence of biological creativity”. This is a nice result, but is one that Chaitin finds less than satisfactory. In real life evolution is cumulative, while Chaitin’s proof requires assuming that evolution sometimes starts over from scratch. He really wants to prove an evolutionary process that is, in some sense, cumulative, in addition to being creative. His second theorem uses his infamous halting probability, Ω, to construct a cumulative path through program space to arbitrary levels of complexity. But this also doesn’t satisfy Chaitin, since the process is unstable, in a sense that he didn’t really explain.
Beyond these two theorems, the field is open. Things to work on seem to be:
i. Without changing the model, can Chaitin’s desired result (cumulative evolution) be proved?
ii. Part of the utility of Chaitin’s fitness function is that it explicitly rewards complexity. This fits with the observation that life, in general, evolves to become more complex. But complexity is, I think, typically seen as an epiphenomenon of fitness, and not as the very definition of fitness. Can a “Darwinian” fitness function be chosen such that complexity is not explicitly rewarded AND such that life can be proven to evolve to arbitrary complexity?
iii. Once we exhaust the limits of what we can prove without an environment, what happens when we introduce an environment, which interacts with the organism, exchanges information with the organism, and which can change, suddenly or gradually?
Of course, (iii) might not be necessary. If (ii) can be proven, then, in a sense, case closed: life must evolve. Some might even say that (ii) isn’t necessary.
But I suspect that Chaitin expects (i) to be very hard. Hence his enthusiasm. In fact, I suspect that he suspects that Ω is going to be all over metabiology, and that some of its fundamental questions will prove to be (mathematically) unknowable.
But algorithmic information theory (AIT) is only one extra-biological approach to evolution. Another is thermodynamics. Eric Chaisson, for example, argues that the Earth, bathed in solar radiation, has a natural tendency towards lower entropy and higher complexity. Is an AIT/Thermodynamics synthesis possible? Google says: Yes, (and it’s been around a while).
Categories: GENERAL, Metabiology, Theory of computation,
Tags: AIT; evolution; metabiology,
Related posts: • The Google walks from London to Paris; • A-Space: a social networking site for intelligence analysts; • Lecture notes on AI metaheuristic algorithms; Comments: none
Tim Finin, 10:34am 3 October 2009
In next week’s ebiquity meeting (10:15 EDT Tue 10/6), Lance Byrd and Set Cruz will talk about Blackbook, a graph analytic processing platform for semantic web data.
Blackbook3 is an RDF middleware framework for integrating data and executing algorithms that relies on open standards and “best-of-breed” open source technologies, including Jena, Lucene, JAAS, D2RQ, Hadoop, HBase and Solr. Blackbook3 has a plug-and-play, loosely–coupled architecture, supports SOAP and REST interfaces, offers SPARQL and linked data endpoints and can run in environments where high confidentiality is required.
The talk will discuss the current and future use cases for Blackbook3 as well as broader knowledge discovery and dissemination issues for RDF applications. You can participate remotely via dimdim starting at 10:15 EDT October 6.
Categories: Semantic Web, Social media, Web,
Related posts: • Google planning to leverage social network data ?; • Semantic Web job trends; • Parallel Semantic Search; Comments: 3