UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments
15 October 2008, 18:32:43 EDT  

Archive for March, 2007

MSM Citations in Republican, Democrat Blogs

March 19th, 2007, by Akshay Java, posted in Uncategorized

A number of qualitative and quantitative analyses of mainstream media (MSM) sources have sparked heated debates about bias and trustworthiness (or, in Stephen Colbert’s lingo, shall we say “truthiness”? ;-) ) in the MSM. Commentary on news and current affairs used to be the exclusive prerogative of a handful of political analysts on news channels and sites. Today, blogs and citizen journalism are the new form of punditry. Their importance is also being recognized by some of the 2008 presidential aspirants.

So the question is: which MSM sources are going to play an important role in the blogosphere during the election year? To analyze this, we first look at the most cited MSM sources from the ICWSM dataset, shown on the right (the complete list is here). Next, we use a list of 113 Republican and 144 Democrat blogs. This list was compiled using a dataset provided by Dr. Lada Adamic and by querying Technorati. We count the number of citations of each MSM source in each of the two sets. The MSM sources most frequently cited by Democrat and Republican blogs are as follows:

These counts also include multiple citations from the same blog. We would like to rank the list in a more meaningful way, one that indicates how “influential” an MSM source is for a particular group. To do this, we first use KL-divergence-based scoring to measure the difference between the distributions of MSM citations in the two groups. For example, an MSM source gets a high score in the Democrat listing if it has a high probability of being linked to by each of the Democrat blogs in the set while having a low chance of being linked to by Republican blogs (and vice versa for the Republican listing). We also modify the scoring function to give more weight to citations from multiple distinct blogs than to many links from a single blog. The final scoring function produces a ranked list of MSM sources based on their preference of being linked to by either Republican or Democrat blogs. This shows some interesting results (complete lists here and here).
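The post does not give the exact scoring formula, so what follows is only a minimal Python sketch of one way such a score could be computed, under our own assumptions. Each group's citations are a list of (blog, msm_source) pairs. Citation probabilities are add-alpha smoothed. A source is scored by its pointwise term p·log(p/q) in KL(P_group || P_other), and that term is then multiplied by the fraction of the group's distinct blogs that cite the source, as a stand-in for the distinct-blog modification. The function names, the `breadth` weight and the sample data are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def citation_stats(citations):
    """Count links per MSM source and the distinct blogs linking to each.

    `citations` is a list of (blog, msm_source) pairs, one per link,
    so repeated links from the same blog are all counted.
    """
    counts = Counter(src for _, src in citations)
    blogs_per_src = defaultdict(set)
    for blog, src in citations:
        blogs_per_src[src].add(blog)
    return counts, blogs_per_src

def rank_sources(group, other, n_group_blogs, alpha=1.0):
    """Rank MSM sources by how distinctively `group` cites them vs. `other`.

    Score = p * log(p / q) * breadth, where p and q are add-alpha smoothed
    citation probabilities in the two groups (p * log(p / q) is the source's
    term in KL(P_group || P_other)) and breadth is the fraction of the
    group's blogs that cite the source (a hypothetical distinct-blog weight).
    """
    g_counts, g_blogs = citation_stats(group)
    o_counts, _ = citation_stats(other)
    vocab = set(g_counts) | set(o_counts)
    g_total = sum(g_counts.values()) + alpha * len(vocab)
    o_total = sum(o_counts.values()) + alpha * len(vocab)
    scores = {}
    for src in vocab:
        p = (g_counts[src] + alpha) / g_total
        q = (o_counts[src] + alpha) / o_total
        breadth = len(g_blogs.get(src, ())) / n_group_blogs
        scores[src] = p * math.log(p / q) * breadth
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up citations, for illustration only.
dem = [("d1", "nytimes"), ("d2", "nytimes"), ("d3", "nytimes"),
       ("d1", "washpost"), ("d1", "foxnews")]
rep = [("r1", "foxnews"), ("r2", "foxnews"), ("r1", "nytimes")]
print([src for src, _ in rank_sources(dem, rep, n_group_blogs=3)])
# ['nytimes', 'washpost', 'foxnews']
```

Swapping the two groups (and passing the Republican blog count) gives the Republican listing. Multiplying by `breadth` is just one simple way to discount a source that a single prolific blog links to many times; the post does not say how its own modification works.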


Of course, this does not explain any bias of the MSM source itself, but it provides a good indication of the sources that might influence Republicans and Democrats. Here is a question for our readers: “Bias seems to be quite subjective, depending on which side of the political spectrum one associates with. Do you think it would even be possible to agree on the neutrality of an MSM source?”

Limitations

  1. The popularity of the blog that links to the MSM is not considered here, and it would be useful to incorporate it.
  2. The results are limited to a small sample of Republican/Democrat blogs.
  3. No content analysis was performed and results are solely based on citations.
  4. The presence of a link does not always indicate influence, and we need to use Link Polarity to improve the scoring function.
  5. There is scope for improvement in the ranking function itself. But I think it's a first-order approximation (to rank distinctively Democrat vs. Republican MSM preferences).

Conclusions

MSM is influential, and each community shows selective preferences for different sources. Some of the sources categorized as MSM in the dataset have an almost blog-like quality. As people rely on blogs for information and opinions, the indirect influence of MSM sources (and perhaps their biases) cannot be ignored. While blogs and MSM seem to have an almost symbiotic relationship, (IMHO) this election season might see fierce competition between the two.

[Acknowledgment: Buzzmetrics for the dataset, Dr. Lada Adamic for the Republican/Democrat labels]

Journal of Web Semantics blog

March 18th, 2007, by Tim Finin, posted in Uncategorized

The JWS Blog is a new blog about the Journal of Web Semantics published by Elsevier.

The blog will be run by the JWS editors-in-chief, with occasional posts by area editors and editors of special issues. We will use it for news and announcements about the journal, including calls for papers, descriptions of upcoming special issues and their deadlines, availability of preprints, publication of online material, etc.

The Journal of Web Semantics is an interdisciplinary journal based on research and applications of various subject areas that contribute to the development of a knowledge-intensive and intelligent service Web. These areas include knowledge technologies, ontology, agents, databases and the semantic grid; disciplines such as information retrieval, language technology, human-computer interaction and knowledge discovery are obviously of major relevance as well. All aspects of Semantic Web development are covered. The publication of large-scale experiments and their analysis is also encouraged, to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services. The journal emphasizes the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications.

TwitterVision: see who is twittering around the world

March 18th, 2007, by Tim Finin, posted in Uncategorized

Twittervision might give TV a new meaning. It’s a nicely done Twitter mashup that gives a real-time view of twittering around the world. While I was watching, I saw Pranam twittering! It reminds me of one of Dinah Washington’s songs:

“Radio was great, now it’s out of date
TV is the thing this year”

links for 2007-03-18

March 18th, 2007, by Tim Finin, posted in Uncategorized

Splogs are phishing and infecting visitors with trojans

March 18th, 2007, by Tim Finin, posted in Uncategorized

Splogs send visitors to phishing sites like this one.

The Internet security firm Fortinet has issued an advisory, Malicious Code Appears on Blogger.com, that identifies new splogs intended to expose visitors to malware. The advisory gives two examples found on Blogger. In one, scripts are used to redirect visitors to Pharmacy Express, a phishing site. The other is a site devoted to the Honda CR450 that infects visitors with the JavaScript Wonka Trojan. Stories on the alert have been written by PC World and CNET.

To date, the splogs we have studied were created to host advertisements and/or raise the pagerank of affiliated sites. It’s inevitable, I think, that blogs will become a vehicle for other uncooperative, unsavory or outright illicit behavior. Their current software and service infrastructure make blogs the easiest and cheapest way to create web sites and populate them with a stream of fresh content. It’s nature, red in tooth and claw, after all.

Assignment Zero pairs Wired writers with citizen journalists

March 15th, 2007, by Tim Finin, posted in Uncategorized

Assignment Zero is an interesting experiment run jointly by NewAssignment.Net and Wired. The idea is to team professional journalists with citizen journalists on investigative stories. Media companies are all trying to figure out what they need to do to survive and thrive in the new information economy. Here’s another idea.

“Inspired by the open source movement, this is an attempt to bring journalists together with people in the public who can help cover a story. It’s a collaboration among NewAssignment.Net, Wired, and those who chose to participate.
     The investigation takes place in the open, not behind newsroom walls. Participation is voluntary; contributors are welcomed from across the Web. The people getting, telling and vetting the story are a mix of professional journalists and members of the public — also known as citizen journalists. This is a model I describe as “pro-am.”
     The “ams” are simply people getting together on their own time to contribute to a project in journalism that for their own reasons they support. The “pros” are journalists guiding and editing the story, setting standards, overseeing fact-checking, and publishing a final version.
     In this project, we’re trying to crowdsource a single story, and debut a site that makes other such reports possible down the road. But we don’t know yet how well our site and our methods work. Our ideas are crude because they are untested. By participating, you can help us figure this puzzle out. (source)

See Citizen Journalism Wants You! and also Wired Meets Assignment Zero on Wired News.

Hitwise on Fast Growing Social Networks - Implications

March 14th, 2007, by Pranam Kolari, posted in Uncategorized

Hitwise is reporting numbers on social network usage among Web users. This is what stood out:

The market share of visits to the custom category of the top 20 social networking sites increased by 11.5% from January 2007 to February 2007. Year-over-year (February 2006 - February 2007) category traffic was up 87%.

This raises an interesting question about the evolving behavior of Web users. At any given point in time, consider a user's attention to be on one of these categories of content:

  1. Social Networking Sites
  2. Commerce Sites
  3. Feed Readers
  4. Social Content (Blogs, Wikipedia, etc.)
  5. Contextual Advertisements
  6. Organic Search Results
  7. Rest of the Web

It’s well known that traffic to the first five categories is either growing or stable. So which of the last two categories is this growth biting into? In either case, we might soon see a headline from Hitwise that reads “Fewer users searching on the Web”, or something similar.

So what does this mean for Google et al.? Less revenue from self-hosted ads, of course, and consequently reduced margins. The solution: buy social networking sites and offer new services, a trend that will (and had better) continue.

(Via Micropersuasion)

VideoLectures.net: YouTube for Computer Science researchers

March 14th, 2007, by Tim Finin, posted in Uncategorized

VideoLectures.net is a new web service, still in beta, that provides “free video lectures from the world’s leading and prominent scientists”. The most common topics are drawn from computer science, with an emphasis on the Semantic Web and machine learning, although there is an obligatory lecture by Noam Chomsky.

The videos include lectures, tutorials, paper presentations and informal interviews. Most of the current ones are from recent conferences, including NIPS, ICML, ECML and ISWC, as well as various workshops and summer schools.

Users can leave comments, and the system recommends videos based on “visitors who watched this lecture also watched…”.
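The post does not say how VideoLectures.net computes these recommendations. As a hedged illustration only, a basic “visitors who watched this also watched” list can be built from item co-occurrence counts: for every pair of lectures watched by the same visitor, increment a counter, then recommend the lectures that co-occur most often with the current one. The function names, session data and lecture ids below are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def build_cooccurrence(sessions):
    """Count how often each pair of lectures was watched by the same visitor.

    `sessions` is an iterable of sets, one set of lecture ids per visitor.
    """
    co = defaultdict(Counter)
    for watched in sessions:
        for a, b in combinations(sorted(watched), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def also_watched(co, lecture, k=3):
    """Return up to k lectures most often watched alongside `lecture`."""
    return [other for other, _ in co.get(lecture, Counter()).most_common(k)]

# Hypothetical viewing sessions.
sessions = [
    {"chomsky_talk", "svm_tutorial"},
    {"svm_tutorial", "kernel_methods"},
    {"svm_tutorial", "kernel_methods", "chomsky_talk"},
    {"kernel_methods", "svm_tutorial"},
]
co = build_cooccurrence(sessions)
print(also_watched(co, "svm_tutorial"))  # ['kernel_methods', 'chomsky_talk']
```

Practical recommenders usually normalize such raw counts (for example, by each item's overall popularity) so that the most-watched lectures do not dominate every list.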

While I doubt that Google will pay a billion dollars for this company, it looks like a great resource for CS researchers.

Danny Hillis on Aristotle (The Knowledge Web)

March 14th, 2007, by Tim Finin, posted in Uncategorized

Three years ago Danny Hillis wrote “Aristotle” (The Knowledge Web), an essay that appeared on The Edge. The Edge has just added an addendum to his original essay. While it doesn’t add much real information, it does describe the thinking that apparently led to Freebase and frames the original essay in today’s context.

The Edge features interesting and often provocative articles along with commentary from its stable of “third culture intellectuals”. Reacting to Hillis’ essay are Douglas Rushkoff, Marc D. Hauser, Stewart Brand, Jim O’Donnell, Jaron Lanier, Bruce Sterling, Roger Schank, George Dyson, Howard Gardner, Seymour Papert, Freeman Dyson, Esther Dyson, Kai Krause and Pamela McCorduck. What is surprising about the original 2004 essay is that it makes no mention of the Semantic Web, RDF, OWL, DAML or the W3C. The only mention of any of these is in a comment from Jaron Lanier.

“There’s also a Knowledge Web that’s associated with the Semantic Web research community, which is led by Tim Berners-Lee.”

NOTE: Unfortunately, the URL put up by the good people at the Edge is wrong, so, at the time of this writing, you will have to read “ARISTOTLE” (THE KNOWLEDGE WEB) from Google’s cached copy.

Freebase blog

March 13th, 2007, by Tim Finin, posted in Uncategorized

Metaweb now has a blog on Freebase, Freebasics. There are still no clues to the underlying data model or how knowledge is represented.

-------- Original Message --------
Subject: Thank you for your interest in Freebase.com
Date: Wed, 14 Mar 2007 02:06:16 +0000
From: alpha@metaweb.com

We received your email registration. We’re excited and overwhelmed by the enthusiasm.

You may be wondering if “Information Wants to be Free” why is our Alpha test currently limited? For the time being, we’re open to a small set of users to get critical feedback. We’ll use that feedback to make it all a little less ‘alpha’. At that point, we’ll open up registration to a wider group.

For more on this and other common questions, see our FAQ: http://www.freebase.com/signin/faq

Also, check out our blog as further details are revealed publicly: http://roblog.freebase.com

ebiquity.umbc.edu offline

March 13th, 2007, by Tim Finin, posted in Uncategorized

The machine that we use to serve up many of our web sites got into a strange state sometime early this morning. It was responding to pings, but nothing else. We had to reboot it in a most extreme way: the power button would not work, for some reason, so we had to pull the plug. The web traffic stats show it was down from around 5:00am (UTC-4) until just after 11:00am.

ebiquity server outage

2007 TREC blog track

March 12th, 2007, by Tim Finin, posted in Uncategorized

The Text REtrieval Conference (TREC) is a series of workshops intended to encourage research within the IR community. Last year TREC featured a track on opinion extraction from blogs. Some details for the 2007 TREC blog track are now available. One task will be based on the 2006 opinion extraction task, extended to identify polarity.

We propose to add a related subtask, namely a text classification-related task, requiring participants to determine the polarity (or orientation) of the opinions in the retrieved documents, namely whether the opinions are positive or negative.
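The proposal does not prescribe a method. As a rough, hypothetical baseline only, opinion polarity can be approximated by counting words from small positive and negative lexicons. The word lists and labels below are made up for illustration; participating systems would typically use far larger lexicons or trained classifiers.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "like", "best"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "worst", "poor"}

def polarity(text):
    """Label text positive, negative, mixed, or none by lexicon word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos == neg == 0:
        return "none"       # no opinion words found
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "mixed"          # equal, non-zero counts

print(polarity("I love this phone, the screen is great"))  # positive
print(polarity("Good camera, bad battery"))                 # mixed
```

A word-count baseline like this ignores negation, so “not good” still counts as positive; that is one reason the track treats polarity as a classification task in its own right.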

In addition to a refined version of the 2006 TREC Blog opinion track, 2007 will also have a blog distillation (feed search) task intended to find blogs “with a principle, recurring interest” in a topic described by a query.

Blog search users often wish to identify blogs about a given topic, which they can subscribe to and read on a regular basis. This user task is most often manifested in two scenarios:

  • Filtering: The user subscribes to a repeating search in their RSS reader.
  • Distillation: The user searches for blogs with a recurring central interest, and then adds these to their RSS reader.

For TREC 2007, we are recommending that the TREC Blog track investigates the latter scenario – Blog Distillation. The Blog Distillation Task can be summarized as Find me a blog with a principle, recurring interest in X. For a given area X, systems should suggest feeds that are principally devoted to X over the timespan of the feed, and would be recommended to subscribe to as an interesting feed about the X (ie a user may be interested in adding it to their RSS reader).
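As a crude, hypothetical baseline for the distillation task (not a method from the track description), a feed could be scored by the fraction of its posts that mention any query term, ignoring feeds with too few posts to show a recurring interest. The `Post` type, the `min_posts` threshold and the simple term matching below are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class Post:
    feed: str   # hypothetical feed identifier
    text: str

def distill(posts, query, min_posts=3):
    """Rank feeds by the share of their posts that mention a query term."""
    terms = set(query.lower().split())
    totals, hits = {}, {}
    for p in posts:
        tokens = set(re.findall(r"[a-z0-9']+", p.text.lower()))
        totals[p.feed] = totals.get(p.feed, 0) + 1
        if terms & tokens:
            hits[p.feed] = hits.get(p.feed, 0) + 1
    # Feeds with fewer than min_posts posts cannot show a recurring interest.
    scores = {f: hits.get(f, 0) / n for f, n in totals.items() if n >= min_posts}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

posts = [
    Post("A", "semantic reasoning notes"), Post("A", "more semantic stuff"),
    Post("A", "owl and semantic tools"), Post("A", "lunch today"),
    Post("B", "cats"), Post("B", "dogs"), Post("B", "semantic once"),
    Post("B", "birds"), Post("C", "semantic"),
]
print(distill(posts, "semantic web"))  # [('A', 0.75), ('B', 0.25)]
```

A fuller approach would also reward on-topic posts spread across the feed's timespan rather than clustered in one burst, which is closer to the “recurring” part of the task definition.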
