<?xml version="1.0"?>

<!DOCTYPE owl [
  <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
  <!ENTITY owl "http://www.w3.org/2002/07/owl#">
  <!ENTITY cc "http://web.resource.org/cc/#">
  <!ENTITY event "http://ebiquity.umbc.edu/ontology/event.owl#">
  <!ENTITY person "http://ebiquity.umbc.edu/ontology/person.owl#">
  <!ENTITY assert "http://ebiquity.umbc.edu/ontology/assertion.owl#">]>

<!--
  This ontology document is licensed under the Creative Commons
  Attribution License. To view a copy of this license, visit
  http://creativecommons.org/licenses/by/2.0/ or send a letter to
  Creative Commons, 559 Nathan Abbott Way, Stanford, California
  94305, USA.
-->

<rdf:RDF 
  xmlns:rdf = "&rdf;"
  xmlns:rdfs = "&rdfs;"
  xmlns:xsd = "&xsd;"
  xmlns:owl = "&owl;"
  xmlns:cc = "&cc;"
  xmlns:event = "&event;"
  xmlns:person = "&person;"
  xmlns:assert = "&assert;">
  <event:Event rdf:about="http://ebiquity.umbc.edu/event/html/id/238/Probabilistic-Approximate-Algorithms-for-Distributed-Data-Mining-in-Peer-to-Peer-Networks">
    <rdfs:label><![CDATA[Probabilistic Approximate Algorithms for Distributed Data Mining in Peer-to-Peer Networks]]></rdfs:label>
    <event:title><![CDATA[Probabilistic Approximate Algorithms for Distributed Data Mining in Peer-to-Peer Networks]]></event:title>
    <event:speaker><person:PhDStudent rdf:about="http://ebiquity.umbc.edu/person/html/id/421/"><person:name><![CDATA[Souptik Datta]]></person:name><rdfs:label><![CDATA[Souptik Datta]]></rdfs:label></person:PhDStudent></event:speaker>
    <event:startDate rdf:datatype="&xsd;dateTime">2008-04-28T11:15:00-05:00</event:startDate>
    <event:abstract><![CDATA[Peer-to-peer (P2P) computing is emerging as a new distributed computing
paradigm for novel applications that involve the exchange of information
among peers with little centralized coordination. Analyzing data
distributed in P2P networks requires peer-to-peer data mining algorithms
that can mine the data without centralizing it. However, replicating
the result of centralized data mining exactly is often expensive in
terms of communication. Approximate algorithms can be a realistic and
communication-efficient alternative in this case. This dissertation
concentrates on developing approximate data mining algorithms for P2P
networks that closely estimate the results of centralized data mining
algorithms with a probabilistic guarantee while using minimal
communication.
<p>
The dissertation introduces the concept of approximate local algorithms,
which estimate data mining results within a desired accuracy bound, with
a user-specified probabilistic guarantee, by operating within a spatial
locality of the executing node. As a foundation for probabilistic
approximation in P2P networks, it proposes a random-walk-based uniform
data sampling approach that removes the bias and dependence in sampling
caused by varying degrees of connectivity and varying sizes of shared
data. The sampling technique is then applied to develop approximate
local algorithms for two specific data mining problems in P2P networks:
K-means clustering and frequent itemset mining. Two K-means clustering
algorithms are developed. One extends the centralized K-means algorithm
to dynamic distributed peer-to-peer environments. The other provides a
probabilistic guarantee on the accuracy of the clustering result in a
static P2P network. A frequent itemset mining algorithm, developed as a
direct application of the uniform data sampling technique, discovers
most of the frequent itemsets with high probability using bounded
communication.
<p>
The main contribution of this research is the concept of approximate
local algorithms for data mining in P2P networks that provide a
probabilistic guarantee of accuracy. The work builds a basic tool for
approximate data analysis in P2P networks, namely a uniform data sampling
technique. It also develops communication-efficient approximate local
algorithms for mining data distributed in such networks. The algorithms
developed here provide data mining results within the desired accuracy
level and probabilistic guarantee, and they show good scalability with
low communication overhead.

]]></event:abstract>
    <event:tag><![CDATA[dissertation]]></event:tag>
    <event:tag><![CDATA[datamining]]></event:tag>
    <event:tag><![CDATA[p2p]]></event:tag>
    <event:tag><![CDATA[privacy]]></event:tag>
    <event:host><person:Collaborator rdf:about="http://ebiquity.umbc.edu/person/html/Hillol/Kargupta/"><person:name><![CDATA[Hillol Kargupta]]></person:name><rdfs:label><![CDATA[Hillol Kargupta]]></rdfs:label></person:Collaborator></event:host>
  </event:Event>

  <rdf:Description rdf:about="">
    <cc:License rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
  </rdf:Description>

</rdf:RDF>
