<?xml version="1.0"?>

<!DOCTYPE owl [
	<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
	<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
	<!ENTITY owl "http://www.w3.org/2002/07/owl#">
	<!ENTITY cc "http://web.resource.org/cc/#">
	<!ENTITY event "http://ebiquity.umbc.edu/ontology/event.owl#">
	<!ENTITY person "http://ebiquity.umbc.edu/ontology/person.owl#">
	<!ENTITY assert "http://ebiquity.umbc.edu/ontology/assertion.owl#">
]>

<!--

This ontology document is licensed under the Creative Commons
Attribution License. To view a copy of this license, visit
http://creativecommons.org/licenses/by/2.0/ or send a letter to
Creative Commons, 559 Nathan Abbott Way, Stanford, California
94305, USA.

-->

<rdf:RDF 
		xmlns:rdf = "&rdf;"
		xmlns:rdfs = "&rdfs;"
		xmlns:xsd = "&xsd;"
		xmlns:owl = "&owl;"
		xmlns:cc = "&cc;"
		xmlns:event = "&event;"
		xmlns:person = "&person;"
		xmlns:assert = "&assert;">
	<event:Event rdf:about="http://ebiquity.umbc.edu/event/html/id/212/Detecting-Spam-Blogs-An-Adaptive-Online-Approach-">
		<rdfs:label><![CDATA[Detecting Spam Blogs: An Adaptive Online Approach]]></rdfs:label>
		<event:title><![CDATA[Detecting Spam Blogs: An Adaptive Online Approach]]></event:title>
		<event:speaker>
			<person:Alumnus rdf:about="http://ebiquity.umbc.edu/person/html/Pranam/Kolari">
				<person:name><![CDATA[Pranam Kolari]]></person:name>
				<rdfs:label><![CDATA[Pranam Kolari]]></rdfs:label>
			</person:Alumnus>
		</event:speaker>
		<event:startDate rdf:datatype="&xsd;dateTime">2007-09-25T14:00:00-05:00</event:startDate>
		<event:endDate rdf:datatype="&xsd;dateTime">2007-09-25T16:00:00-05:00</event:endDate>
		<event:location><![CDATA[325b ITE]]></event:location>
		<event:abstract><![CDATA[Weblogs, or blogs, are an important new way to publish information, engage
in discussions, and form communities on the Internet. Blogs are a global
phenomenon, and with numbers well over 100 million they form the core of
the emerging paradigm of Social Media. While the utility of blogs is
unquestionable, a serious problem now afflicts them: spam. Spam
blogs, or splogs, are blogs with auto-generated or plagiarized content
whose sole purpose is hosting profitable contextual ads and/or
inflating the importance of linked-to sites. Though estimates vary, splogs
account for more than 50% of blog content and present a serious
threat to the continued utility of blogs.
<p>
Splogs impact search engines that index the entire Web or just the
blogosphere by increasing computational overhead and reducing user
satisfaction. Hence, search engines try to minimize the influence of
spam, both prior to indexing and after indexing, by eliminating
splogs, comment spam, social media spam, or generic web spam. In
this work we further the state of the art of splog detection prior to
indexing.
<p>
First, we have identified and developed techniques that are effective
for splog detection in a supervised machine learning setting. While
some of these are novel, a few others confirm the utility of techniques
that have worked well for e-mail and Web spam detection in a new domain,
namely the blogosphere. Specifically, our techniques identify spam blogs
using the URL, home page, and syndication feeds. To make our techniques
usable prior to indexing, our effort emphasizes fast
online detection.
<p>
Second, to effectively utilize identified techniques in a real-world
context, we have developed a novel system that filters out spam in a
stream of update pings from blogs. Our approach applies filters
serially, in order of increasing detection cost, which better supports
balancing cost and effectiveness. We have used such a system to support
multiple blog-related projects, both internally and externally.
<p>
Next, motivated by these experiences and by input from real-world
deployments of our techniques for over a year, we have developed an
approach for updating classifiers in an adversarial setting. We show
how an ensemble of classifiers can co-evolve and adapt when used on
a stream of unlabeled instances susceptible to concept drift. We
discuss how our system is amenable to such evolution by describing
approaches that can feed into it.
<p>
Finally, over the course of this work we have characterized the
specific nature of spam blogs along various dimensions, formalized the
problem, and raised general awareness of the issue. We are the first
to formalize and address the problem of spam in blogs and identify
the general problem of spam in Social Media. We discuss how lessons
learned can guide follow-up work on spam in social media, an important
new problem on the Web.
]]></event:abstract>
		<event:tag><![CDATA[blog]]></event:tag>
		<event:tag><![CDATA[social media]]></event:tag>
		<event:tag><![CDATA[spam]]></event:tag>
		<event:tag><![CDATA[splog]]></event:tag>
	</event:Event>

<rdf:Description rdf:about="">
	<cc:license rdf:resource="http://creativecommons.org/licenses/by/2.0/" />
</rdf:Description>

</rdf:RDF>
