Settles’s original post, On “Geek” Versus “Nerd”, has a brief, but good, explanation of the method and data.
In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.
:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB users speed up their development cycles dramatically when compared to the source Freebase dumps.
:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a “requester-paid basis”, estimated at US$3.00 per download. There are also :BaseKB Gold releases, which are periodic snapshots of :BaseKB Now. These can be downloaded free via BitTorrent or purchased as a Blu-ray disc.
It looks like it’s worth checking out!
Google is offering a free online MOOC-style course, “Making Sense of Data”, from March 18 to April 4, taught by Amit Deutsch (Google) and Joe Hellerstein (Berkeley).
Interestingly, it doesn’t require programming or database skills: “Basic familiarity with spreadsheets and comfort using a web browser is recommended. Knowledge of statistics and experience with programming are not required.” The course will use Google’s Fusion Tables service for managing and visualizing data.
In the first Ebiquity meeting of the semester, Vlad Korolev will talk about his work on using RDF to capture, represent and use provenance information for big data experiments.
PROB: A tool for Tracking Provenance and Reproducibility of Big Data Experiments
10-11:30am, ITE346, UMBC
Reproducibility of computations and data provenance are very important goals to achieve in order to improve the quality of one’s research. Unfortunately, despite some efforts made in the past, it is still very hard to reproduce computational experiments with a high degree of certainty. The Big Data phenomenon in recent years makes this goal even harder to achieve. In this work, we propose a tool that helps researchers improve the reproducibility of their experiments by automatically keeping provenance records.
A free PDF version of the new second edition of Mining of Massive Datasets by Anand Rajaraman, Jure Leskovec and Jeffrey Ullman is available. New chapters on mining large graphs, dimensionality reduction, and machine learning have been added. Related material from Professor Leskovec’s recent Stanford course on Mining Massive Data Sets is also available.
The ninth Ontology Summit starts on Thursday, January 16 with the theme “Big Data and Semantic Web Meet Applied Ontology.” The event kicks off a three-month series of weekly online meetings on Thursdays that feature presentations from expert panels and discussions with all of the participants. The series will culminate with a two-day symposium on April 28-29 in Arlington, VA. The sessions are free and open to all, including researchers, practitioners and students.
The first virtual meeting will be held 12:30-2:30 (EST) on Thursday, January 16 and will introduce the nine different topical tracks in the series, their goals and organizers. Audio connection is via phone (206-402-0100, 141184#) or Skype with a shared screen and participant chatroom. See the session page for more details.
This year’s Ontology Summit is an opportunity for building bridges between the Semantic Web, Linked Data, Big Data, and Applied Ontology communities. On the one hand, the Semantic Web, Linked Data, and Big Data communities can bring a wide array of real problems (such as performance and scalability challenges and the variety problem in Big Data) and technologies (automated reasoning tools) that can make use of ontologies. On the other hand, the Applied Ontology community can bring a large body of common reusable content (ontologies) and ontological analysis techniques. Identifying and overcoming ontology engineering bottlenecks is critical for all communities.