Exploring the meanings of geek vs. nerd

January 3rd, 2015

click image for higher-resolution version

Mark Liberman pointed out a nice use of pmi to explore the difference in meaning of geek vs. nerd done last year by Burr Settles using Twitter data.

Settles’s original post, On “Geek” Versus “Nerd”, has a brief, but good, explanation of the method and data.

:BaseKB offered as a better Freebase version

July 15th, 2014


In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.

:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB users speed up their development cycles dramatically when compared to the source Freebase dumps.

:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a “requester-paid basis”, estimated at $3.00US per download. There are also BaseKB Gold releases which are periodic :BaseKB Now snapshots. These can be downloaded free via Bittorrent or purchased as a Blu Ray disc.

It looks like it’s worth checking out!

Google MOOC: Making Sense of Data

February 26th, 2014

Google is offering a free, online MOOC style course on ‘Making Sense of Data‘ from March 18 to April 4 taught by Amit Deutsch (Google) and Joe Hellerstein (Berkeley).

Interestingly, it doesn’t require programming or database skills: “Basic familiarity with spreadsheets and comfort using a web browser is recommended. Knowledge of statistics and experience with programming are not required.” The course will use Google’s Fusion Tables service for managing and visualizing data

Tracking Provenance and Reproducibility of Big Data Experiments

February 8th, 2014

In the first Ebiquity meeting of the semester, Vlad Korolev will talk about his work on using RDF for to capture, represent and use provenance information for big data experiments.

PROB: A tool for Tracking Provenance and Reproducibility of Big Data Experiments

10-11:30am, ITE346, UMBC

Reproducibility of computations and data provenance are very important goals to achieve in order to improve the quality of one’s research. Unfortunately, despite some efforts made in the past, it is still very hard to reproduce computational experiments with high degree of certainty. The Big Data phenomenon in recent years makes this goal even harder to achieve. In this work, we propose a tool that aids researchers to improve reproducibility of their experiments through automated keeping of provenance records.

Free copy of Mining Massive Datasets

January 18th, 2014

A free PDF version of the new second edition of Mining of Massive Datasets by Anand Rajaraman, Jure Leskovec and Jeffey Ullman is available. New chapters on mining large graphs, dimensionality reduction, and machine learning have been added. Related material from Professor Leskovec’s recent Stanford course on Mining Massive Data Sets is also available.

2014 Ontology Summit: Big Data and Semantic Web Meet Applied Ontology

January 14th, 2014


The ninth Ontology Summit starts on Thursday, January 16 with the theme “Big Data and Semantic Web Meet Applied Ontology.” The event kicks off a three month series of weekly online meetings on Thursdays that feature presentations from expert panels and discussions with all of the participants. The series will culminate with a two day symposium on April 28-29 in Arlington VA. The sessions are free and open to all, including researchers, practitioners and students.

The first virtual meeting will be held 12:30-2:00 2:30 (EST) on Thursday, January 16 and will introduce the nine different topical tracks in the series, their goals and organizers. Audio connection is via phone (206-402-0100, 141184#) or Skype with a shared screen and participant chatroom. See the session page for more details.

This year’s Ontology Summit is an opportunity for building bridges between the Semantic Web, Linked Data, Big Data, and Applied Ontology communities. On the one hand, the Semantic Web, Linked Data, and Big Data communities can bring a wide array of real problems (such as performance and scalability challenges and the variety problem in Big Data) and technologies (automated reasoning tools) that can make use of ontologies. On the other hand, the Applied Ontology community can bring a large body of common reusable content (ontologies) and ontological analysis techniques. Identifying and overcoming ontology engineering bottlenecks is critical for all communities.

The 2014 Ontology Summit is chaired by Michael Gruninger and Leo Obrst.