Archive for the 'Big data' Category
February 15th, 2015, by Tim Finin, posted in Big data
ACM Tech Talk
Studying Internet Latency via TCP Queries to DNS
Dr. Yannis Labrou
Principal Data Architect, Verisign
1:30-2:30pm Friday, 27 February 2015, ITE 456, UMBC
Every day Verisign processes upwards of 100 billion authoritative DNS requests for .COM and .NET from all corners of the earth. The vast majority of these requests use the UDP protocol. Because UDP is connectionless, it is impossible to passively estimate the latency of UDP-based requests. A very small percentage of requests, however, arrive over TCP, providing a means to estimate the latency of specific requests and paths for a subset of the hosts that interact with Verisign’s network infrastructure.
In this work, we combine this relatively small number of data points from TCP (on the order of a few hundred million per day) with the much larger dataset of all DNS requests. Our focus is the analysis of real-world, imperfect data at very large scale, with the goals of understanding network latency at an unprecedented magnitude, identifying large-volume, high-latency clients, and improving their latency. We discuss the techniques we used for data selection and analysis, and we present the results of a variety of analyses, such as regional and country patterns, estimates of query latency for different countries and network locations, and techniques for identifying high-latency clients.
It is important to note that the latency results we report are based on passive measurements from, essentially, the entire Internet. For this experiment we have no control over the client side: where clients are located, what software they run, how they are configured, or how congested their networks are. This is significantly different from latency studied with active measurement infrastructures such as PlanetLab, RIPE Atlas, ThousandEyes, and Catchpoint.
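To make the passive-measurement idea concrete, here is a minimal, hypothetical sketch (not Verisign’s actual pipeline) of how one could estimate round-trip latency from TCP-based DNS traffic captured at the server: the gap between the server’s SYN-ACK and the client’s ACK approximates one RTT. It assumes the scapy library and an illustrative capture file named dns_tcp.pcap.

from scapy.all import rdpcap, TCP, IP

def estimate_rtts(pcap_path):
    """Estimate client RTTs from TCP handshakes in a server-side capture."""
    synack_times = {}   # (client_ip, client_port) -> timestamp of SYN-ACK
    rtts = []
    for pkt in rdpcap(pcap_path):
        if not (IP in pkt and TCP in pkt):
            continue
        flags = int(pkt[TCP].flags)
        if (flags & 0x12) == 0x12:
            # SYN+ACK sent by the server; remember when it left.
            synack_times[(pkt[IP].dst, pkt[TCP].dport)] = float(pkt.time)
        elif (flags & 0x10) and not (flags & 0x02):
            # Bare ACK from the client completes the handshake.
            key = (pkt[IP].src, pkt[TCP].sport)
            if key in synack_times:
                rtts.append(float(pkt.time) - synack_times.pop(key))
    return rtts

rtts = estimate_rtts("dns_tcp.pcap")
if rtts:
    median_ms = sorted(rtts)[len(rtts) // 2] * 1000
    print(f"median RTT: {median_ms:.1f} ms over {len(rtts)} handshakes")

The talk’s analysis then combines such per-client latency estimates with the far larger UDP query logs to characterize latency by region, country and network location.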
Dr. Yannis Labrou is Principal Data Architect at Verisign Labs, where he leads efforts to create value from the wealth of data that Verisign’s operations generate every day. He brings to Verisign 20 years of experience in conceiving, creating and bringing innovations to fruition, combining thinking big with working through the details of turning ideas into reality. He has done so in an academic environment, at a startup company, while conducting government- and DoD/DARPA-sponsored research, and for a global Fortune 200 company.
Before joining Verisign, Dr. Labrou was a Senior Researcher at Fujitsu Laboratories of America; Director of Technology and member of the executive staff of PowerMarket, an enterprise application software start-up; and a Research Assistant Professor at UMBC. He received his Ph.D. in Computer Science from UMBC, where his research focused on software agents, and a Diploma in Physics from the University of Athens, Greece. He has authored more than 40 peer-reviewed publications with almost 4,000 citations and has been awarded 14 patents from the USPTO. His current research focuses on data across its entire lifecycle, from generation to monetization.
– more information and directions: http://bit.ly/UMBCtalks –
January 14th, 2015, by Tim Finin, posted in Agents, AI, Big data, Ontologies, Semantic Web, Web
The theme of the 2015 Ontology Summit is Internet of Things: Toward Smart Networked Systems and Societies. The Ontology Summit is an annual series of events (first started by Ontolog and NIST in 2006) that involve the ontology community and communities related to each year’s theme.
The 2015 Summit will hold a virtual discourse over the next three months via mailing lists and online panel sessions held as augmented conference calls. The Summit will culminate in a two-day face-to-face workshop on 13-14 April 2015 in Arlington, VA. The Summit’s goal is to explore how ontologies can play a significant role in the realization of smart networked systems and societies in the Internet of Things.
The Summit’s initial launch session will take place from 12:30pm to 2:00pm EST on Thursday, January 15th and will include overview presentations from each of the four technical tracks. See the 2015 Ontology Summit for more information, the schedule and details on how to participate in these free and open events.
January 3rd, 2015, by Tim Finin, posted in Big data, NLP
Mark Liberman pointed out a nice use of PMI (pointwise mutual information) to explore the difference in meaning of geek vs. nerd, done last year by Burr Settles using Twitter data.
Settles’s original post, On “Geek” Versus “Nerd”, has a brief, but good, explanation of the method and data.
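For readers who want the gist of the method, here is a small illustrative sketch (not Settles’s actual code) of the PMI calculation: PMI(word, category) = log2(p(word, category) / (p(word) p(category))), computed from co-occurrence counts. The words and counts below are made up for illustration.

import math
from collections import Counter

# Hypothetical counts: how often each word co-occurs with "geek" or "nerd" in tweets.
cooc = {
    "geek": Counter({"#io13": 120, "culture": 80, "math": 15}),
    "nerd": Counter({"#io13": 10, "culture": 40, "math": 95}),
}

total = sum(sum(c.values()) for c in cooc.values())
p_cat = {cat: sum(c.values()) / total for cat, c in cooc.items()}
p_word = Counter()
for c in cooc.values():
    for w, n in c.items():
        p_word[w] += n / total

def pmi(word, cat):
    """PMI of a word with a category; -inf if they never co-occur."""
    p_joint = cooc[cat][word] / total
    if p_joint == 0:
        return float("-inf")
    return math.log2(p_joint / (p_word[word] * p_cat[cat]))

for w in ["#io13", "culture", "math"]:
    print(f"{w:10s} PMI(geek)={pmi(w, 'geek'):+.2f}  PMI(nerd)={pmi(w, 'nerd'):+.2f}")

Words with strongly positive PMI toward one category and negative toward the other are the ones that best distinguish the two terms, which is exactly how Settles built his geek-vs-nerd plot.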
July 15th, 2014, by Tim Finin, posted in Big data, KR, Ontologies, RDF, Semantic Web
In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.
:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB users speed up their development cycles dramatically when compared to the source Freebase dumps.
:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a requester-pays basis, estimated at US$3.00 per download. There are also :BaseKB Gold releases, which are periodic snapshots of :BaseKB Now; these can be downloaded free via BitTorrent or purchased on a Blu-ray disc.
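Since :BaseKB is distributed as standard RDF, any RDF toolkit can read it. As a minimal sketch, the snippet below loads a small (hypothetical) N-Triples sample with rdflib; the real dumps are large and would normally go into a triple store rather than into memory.

from rdflib import Graph

g = Graph()
g.parse("basekb-sample.nt", format="nt")   # hypothetical small sample file

# Count how many distinct predicates the sample uses.
predicates = {p for _, p, _ in g}
print(f"{len(g)} triples, {len(predicates)} distinct predicates")

# SPARQL queries work on the in-memory graph as well.
for row in g.query("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"):
    print("triples via SPARQL:", row.n)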
It looks like it’s worth checking out!
February 26th, 2014, by Tim Finin, posted in Big data, Google
Google is offering a free, online MOOC-style course, ‘Making Sense of Data’, from March 18 to April 4, taught by Amit Deutsch (Google) and Joe Hellerstein (Berkeley).
Interestingly, it doesn’t require programming or database skills: “Basic familiarity with spreadsheets and comfort using a web browser is recommended. Knowledge of statistics and experience with programming are not required.” The course will use Google’s Fusion Tables service for managing and visualizing data.
February 8th, 2014, by Tim Finin, posted in Big data, High performance computing, Ontologies, Semantic Web
In the first Ebiquity meeting of the semester, Vlad Korolev will talk about his work on using RDF to capture, represent and use provenance information for big data experiments.
PROB: A tool for Tracking Provenance and Reproducibility of Big Data Experiments
10:00-11:30am, ITE 346, UMBC
Reproducibility of computations and data provenance are important goals for improving the quality of one’s research. Unfortunately, despite past efforts, it is still very hard to reproduce computational experiments with a high degree of certainty, and the Big Data phenomenon of recent years makes this goal even harder to achieve. In this work, we propose a tool that helps researchers improve the reproducibility of their experiments by automatically keeping provenance records.
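To give a flavor of RDF-based provenance, here is a minimal, hypothetical sketch that records one experimental step using the W3C PROV-O vocabulary with rdflib. This is not PROB’s actual data model; the namespace and resource names are invented for illustration.

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/experiment/")   # hypothetical namespace

g = Graph()
g.bind("prov", PROV)

run = EX["wordcount-run-42"]     # an activity: one execution of a job
script = EX["wordcount.py"]      # the entity (code) the run used
output = EX["counts.tsv"]        # the entity the run generated

g.add((run, RDF.type, PROV.Activity))
g.add((run, PROV.used, script))
g.add((run, PROV.startedAtTime,
       Literal("2014-02-08T10:00:00", datatype=XSD.dateTime)))
g.add((output, RDF.type, PROV.Entity))
g.add((output, PROV.wasGeneratedBy, run))

print(g.serialize(format="turtle"))

Recording which code, inputs and parameters produced each output in this machine-readable form is what makes it possible to check, and eventually re-run, a big data experiment.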
January 18th, 2014, by Tim Finin, posted in Big data, Datamining, Machine Learning, Semantic Web
A free PDF version of the new second edition of Mining of Massive Datasets by Anand Rajaraman, Jure Leskovec and Jeffrey Ullman is available. New chapters on mining large graphs, dimensionality reduction, and machine learning have been added. Related material from Professor Leskovec’s recent Stanford course on Mining Massive Data Sets is also available.
January 14th, 2014, by Tim Finin, posted in Big data, KR, Ontologies, Semantic Web
The ninth Ontology Summit starts on Thursday, January 16 with the theme “Big Data and Semantic Web Meet Applied Ontology.” The event kicks off a three month series of weekly online meetings on Thursdays that feature presentations from expert panels and discussions with all of the participants. The series will culminate with a two day symposium on April 28-29 in Arlington VA. The sessions are free and open to all, including researchers, practitioners and students.
The first virtual meeting will be held 12:30-2:30pm (EST) on Thursday, January 16 and will introduce the nine different topical tracks in the series, their goals and organizers. Audio connection is via phone (206-402-0100, 141184#) or Skype, with a shared screen and participant chatroom. See the session page for more details.
This year’s Ontology Summit is an opportunity for building bridges between the Semantic Web, Linked Data, Big Data, and Applied Ontology communities. On the one hand, the Semantic Web, Linked Data, and Big Data communities can bring a wide array of real problems (such as performance and scalability challenges and the variety problem in Big Data) and technologies (automated reasoning tools) that can make use of ontologies. On the other hand, the Applied Ontology community can bring a large body of common reusable content (ontologies) and ontological analysis techniques. Identifying and overcoming ontology engineering bottlenecks is critical for all communities.
The 2014 Ontology Summit is chaired by Michael Gruninger and Leo Obrst.