Measuring political bias in media

December 30th, 2006, by Tim Finin, posted in Uncategorized

Which headline would make you buy a newspaper?

  • Winning the global war on terror by repealing the death tax and solving the medical liability crisis
  • Winning war in Iraq by bringing our troops home and raising fuel economy standards

I ran across an article on Slate, How To Speak Republican … or Democratic, that talks about an academic paper and its methodology for measuring political slant in newspapers.

Gentzkow, Matthew Aaron and Shapiro, Jesse M., What Drives Media Slant? Evidence from U.S. Daily Newspapers, November 13, 2006.

The working paper by University of Chicago economists Gentzkow and Shapiro is mainly concerned with explaining why newspapers are biased. Their answer, not surprisingly, is profit. (They are economists, after all.) Their methodology and results are of interest to those of us who would like to develop better techniques for detecting and measuring source bias.

From the paper:

“In this paper, we propose a new index of ideological slant in news coverage, and compute it for a large sample of U.S. daily newspapers. … Our slant index measures the frequency with which newspapers use language that would tend to sway readers to the right or to the left on political issues. To do this, we examine the set of all phrases used by members of Congress in the 2005 Congressional Record, and identify those that are used much more frequently by one party than by another. We then index newspapers by the extent to which the use of politically charged phrases in their news coverage resembles the use of the same phrases in the speech of a congressional Democrat or Republican. Underlying this approach is a revealed preference assumption; namely, that the language chosen by speakers with a political agenda will tend to persuade listeners to support that agenda.”

The paper’s appendix has a list of the two- and three-word phrases they used to measure bias.
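
To get a feel for the phrase-counting idea, here is a toy sketch in Python. It is not the authors' index, which selects phrases statistically from the Congressional Record. The function name slant_score, the tiny phrase lists, and the assignment of phrases to a side are all assumptions made for illustration, loosely based on the two example headlines above.

```python
def slant_score(text, right_phrases, left_phrases):
    """Return a score in [-1, 1]: +1 if only 'right' phrases occur, -1 if only 'left'.

    A toy measure for illustration only, not the index from the paper.
    """
    # Normalize case and whitespace so multi-word phrases match reliably.
    normalized = " ".join(text.lower().split())
    right = sum(normalized.count(p) for p in right_phrases)
    left = sum(normalized.count(p) for p in left_phrases)
    if right + left == 0:
        return 0.0  # no charged phrases found
    return (right - left) / (right + left)


# Illustrative phrase lists (assumed, not taken from the paper's appendix).
RIGHT = ["war on terror", "death tax"]
LEFT = ["war in iraq", "troops home"]

headline_1 = ("Winning the global war on terror by repealing the death tax "
              "and solving the medical liability crisis")
headline_2 = ("Winning war in Iraq by bringing our troops home "
              "and raising fuel economy standards")

print(slant_score(headline_1, RIGHT, LEFT))  # 1.0
print(slant_score(headline_2, RIGHT, LEFT))  # -1.0
```

A real index would need far larger phrase lists and some way to weight phrases by how strongly they separate the two parties.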

To test their slant identification methodology, they compared their results to those from Mondo Times, an online media guide. I thought that these human judgments of political bias provide an interesting resource as well.

Wii like the Ebiquity blog

December 29th, 2006, by Tim Finin, posted in Uncategorized

Harry Chen installed the WiiPress plugin on our blog. It automatically renders an optimized version when visitors visit using the Wii’s Opera browser. Check out the screenshots.

[Screenshots: Ebiquity blog on a Wii]

Harry is the only one I know with a Wii. I’ll admit to being jealous. But, I have to give it to him for getting up at 4:30am to go stand in line at Best Buy to get it. As soon as they are available, we’ll get one for the lab. It will be used to support our research in pervasive computing. That’s our story and we are sticking to it.

Thanks, Harry.

Pan-American Intercollegiate Chess Championship

December 28th, 2006, by Tim Finin, posted in Uncategorized

There are many news stories on the Pan-American Intercollegiate Chess Championship going on this week in DC. I was happy to see that the article in the New York Times not only quoted Professor Alan Sherman but also talked about two students from the College of Engineering and Information Technology on UMBC’s A team: CS major Katerina Rohonyan and IS major Bruci Lopez. The two photos in the Times article were of Katerina and Bruci. Round three of the six-round tournament is going on right now, with the final round to finish on Saturday.

Looking at the wall charts, I was amazed to see that UT Dallas has ten players rated above 2425 on their two teams. It’s going to be hard for UMBC to defend its 2005 title and win, especially with our top player, Alexander Onischuk, out this semester.

Google blog search tops Technorati

December 28th, 2006, by Tim Finin, posted in Uncategorized

According to LeeAnn Prescott of Hitwise, Google Blog Search attracted more searchers than Technorati for the first time last week. She attributes the surge to links to Google’s Blog Search appearing on the Google News page and on the more>> menu. Since Google is constantly fiddling with such things, maybe the surge in their visits is temporary. On the other hand, I’ve found Google’s service to be slightly faster — both in indexing posts and in doing searches. Technorati has introduced a number of innovative features, yet many may prefer Google’s simplicity. This will be an interesting trend to watch.

Top Blog Posts and Referrers for 2006

December 28th, 2006, by Pranam Kolari, posted in Uncategorized

Bloggers have been publishing their top viewed posts for this year. Here’s our contribution to the Blogosphere, top ten pages, posts or categories viewed in 2006:

  1. Splog Software From Hell
  2. Posts on Swoogle
  3. 100 Most common RDF Namespaces
  4. ICWSM 2007 Blogs Dataset
  5. Welcome to the Splogosphere
  6. EZ Google maps for your web page
  7. Untangling ontologies on the Semantic Web
  8. Thieves use Bluetooth to find laptops to steal
  9. Big OWL documents on the Semantic Web
  10. Big RDF documents on the Semantic Web

While we are pleased that we created useful content, we would also like to acknowledge the referrers who helped us reach our audience.

  1. Google
  2. Stumble Upon
  3. Yahoo
  4. Swoogle
  5. Wikipedia
  7. Slashdot
  8. MSN
  9. Technorati
  10. DIGG

These are listed in order, with Google contributing more than 50%.

Is Web 2.0 another bubble?

December 27th, 2006, by Tim Finin, posted in Uncategorized

The Wall Street Journal weighs in on the question Is ‘Web 2.0’ Another Bubble? Well, the WSJ doesn’t address the question, exactly, but does host a dialectic between two VCs. The bottom line? Who knows. We’re already thinking about Web 3.0.

By the way, I was going to provide a link to Web 3.0 and just noticed that Wikipedia has deleted the Web 3.0 article and protected it so that it can not be recreated. I wonder what was there? I think the history has even been expunged. Maybe an oracle prophesied that Web 3.0 would grow up to kill its father and marry the VCs?

Patents and the semantic web

December 26th, 2006, by Tim Finin, posted in Uncategorized

Google’s search type options on the splash page and more menu keep changing for me by a process that I’ve never understood. Today Google’s patent search showed up on the short more menu (as opposed to the even more page), and the thought sprang to mind to see what patents mention the Semantic Web. So, here are US patents that Google finds for various words and phrases relating to the Semantic Web.

I’ve not had a chance to dig into these to see whether the semantic web is used in a significant way or just mentioned in passing. I don’t think that Google’s database includes disclosures. It would be interesting to look at the distribution of filing dates to look for any trends.

ClamAV great at identifying virus laden spam

December 22nd, 2006, by Tim Finin, posted in Uncategorized

Our sysadmins just installed Clamassassin to help manage spam and it works great. Clamassassin is a wrapper for Clam AntiVirus for use in procmail filters, much like SpamAssassin. ClamAV is a GPL anti-virus toolkit for UNIX designed for email scanning on mail gateways. It comes with freshclam, which updates the virus signature database several times a week.

For me, it’s filtering out more than 100 spam messages a day with no false positives that I’ve noticed. I’ve put this filter first, before the SpamAssassin filter, so some of these would have been caught anyway. But I’d guess it has halved the amount of spam that makes it to my mail reading client.

Getting data out of del.icio.us just got easier

December 20th, 2006, by Tim Finin, posted in Uncategorized

Niall Kennedy posts about a new del.icio.us JSON endpoint that, given a URL, returns the assigned tags and their frequencies. This will have a number of potential uses. It’s another way to rank sites on their popularity, at least with respect to del.icio.us users. In some cases, it can also be used to categorize web pages by their content.

Here are, for example, three pages that we maintain:

We’ve been playing around with various issues in identifying communities of political blogs. Note that checking the Daily Kos blog shows that it’s been posted over 2700 times and its popular tags have lots of clues to what it’s about:

left, politics, progressive, activism, liberal, political

Compare this to the results for Michelle Malkin’s blog, which has 259 posts with the following popular tags:

politics, conservative, political

I guess this also says something about the community of users: they lean to the left.
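
As a rough illustration of how such tag lists could feed a community-detection experiment, here is a small Python sketch that compares the two tag sets quoted above. The helper names jaccard and distinctive are made up for this example, and using Jaccard overlap is just one plausible choice, not something we have settled on.

```python
def jaccard(tags_a, tags_b):
    """Jaccard overlap of two tag collections: |A & B| / |A | B|."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distinctive(tags_a, tags_b):
    """Tags that appear in the first collection but not in the second, sorted."""
    return sorted(set(tags_a) - set(tags_b))


# Popular tags as quoted in this post.
daily_kos = ["left", "politics", "progressive", "activism", "liberal", "political"]
malkin = ["politics", "conservative", "political"]

print(jaccard(daily_kos, malkin))      # 2 shared tags out of 7 distinct ones
print(distinctive(daily_kos, malkin))  # ['activism', 'left', 'liberal', 'progressive']
print(distinctive(malkin, daily_kos))  # ['conservative']
```

The shared tags say both blogs are political, while the distinctive ones hint at which side each is on.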

More on measuring influence in the Blogosphere

December 20th, 2006, by Tim Finin, posted in Uncategorized

As we’ve blogged before, we’re trying to come up with models of influence in the Blogosphere. We began with political blogs, using datasets courtesy of Buzzmetrics and Lada Adamic. One of the tools we use is the polarity (sentiment) of the post-to-post links to infer the trust/influence/sentiment that should be reflected in a blog-to-blog link. Here is what our technique came up with for some “A list” political blogs.




[Figures: Polarity before trust propagation; Polarity after trust propagation]

MM is Michelle Malkin, AT is Atrios (Eschaton), DK is DailyKos and IP is Instapundit. So it looks like we got the sentiment polarity mostly right, even when there were no direct links between two blogs in our original dataset (rather limited, admittedly). Of course the errors are obvious as well: Atrios trusts Michelle Malkin? That will be the day :-) We’ve figured out why and are now tweaking the technique to fix such errors. Stay tuned for the detailed tech report.
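
For readers who want a concrete picture of what propagating signed links can look like, here is a generic Python sketch. It is not our actual technique, which will be described in the tech report. The function name propagate, the decay factor, and the rule of multiplying polarities along a path are all assumptions chosen for illustration.

```python
def propagate(polarity, steps=3, decay=0.5):
    """Infer indirect signed scores from direct link polarities.

    polarity maps (source, target) pairs to values in [-1, 1].
    A path's score is the product of its link polarities (an assumed rule),
    and a path with h extra hops is discounted by decay ** h.
    """
    total = dict(polarity)      # accumulated scores, starting with direct links
    frontier = dict(polarity)   # summed path products for the current path length
    for hop in range(1, steps):
        extended = {}
        for (a, b), p in frontier.items():
            for (c, d), q in polarity.items():
                if b == c and a != d:  # extend the path by one link, skip self-pairs
                    extended[(a, d)] = extended.get((a, d), 0.0) + p * q
        for pair, value in extended.items():
            total[pair] = total.get(pair, 0.0) + (decay ** hop) * value
        frontier = extended
    return total


# Hypothetical blogs A, B and C with no direct A -> C link.
links = {("A", "B"): 1.0, ("B", "C"): -1.0}
print(propagate(links)[("A", "C")])  # -0.5
```

Note that multiplying signs treats the enemy of an enemy as a friend, which is not always a reasonable inference.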

When did the Semantic Web enter our lexicon?

December 18th, 2006, by Tim Finin, posted in Uncategorized

When was the term Semantic Web first used for the W3C’s vision of embedding a web of data in machine-understandable form in the World Wide Web? The earliest reference I can find on the Web is from one of Tim Berners-Lee’s design documents from January 1997. The ideas and some of the technology go back a bit further, of course, but I’m interested in the evolution of the term. Such questions might seem easy for us amateur (or lazy) scholars to answer by searching such resources as Google’s News Archive, Internet Newsgroups, Google Scholar and the Internet Archive. However, I have to report that I found more than a few errors in the metadata and content for both Google’s news archive and book search. I started this investigation some months ago, and some of the results no longer seem available. In any case, here are some notes on my investigation.

Measuring distance in social networks

December 17th, 2006, by Tim Finin, posted in Uncategorized

Yehuda Koren, Stephen North and Chris Volinsky have an interesting paper in KDD 2006 on Measuring and Extracting Proximity in Networks. They looked at the problem of measuring the ‘distance’ between two nodes in a large graph, such as the DBLP co-author graph. The desired distance measure is not a simple shortest path, but a metric that takes into account all of the paths between the two nodes. Building on work by Chris Faloutsos, they propose a model based on conductance and show that it has many desirable properties. They have a demonstration web page that allows one to apply the algorithm on the DBLP co-author database as well as the IMDB database.
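
To see why a conductance-style measure rewards multiple paths, here is a small Python sketch that computes the ordinary effective conductance between two nodes, treating each edge as a resistor. This is the textbook electrical-network quantity, not the refined measure proposed in the paper. The function name and the toy graphs are assumptions for illustration, and the graph is assumed to be connected.

```python
def effective_conductance(edges, source, target):
    """Effective conductance between two nodes of a connected, undirected graph.

    edges maps (u, v) pairs to positive weights, read as edge conductances.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})
        adj.setdefault(v, {})
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    interior = [n for n in adj if n not in (source, target)]
    idx = {n: i for i, n in enumerate(interior)}
    k = len(interior)

    # Kirchhoff's current law at each interior node, with V(source) = 1 and
    # V(target) = 0, written as an augmented matrix [A | b].
    A = [[0.0] * (k + 1) for _ in range(k)]
    for u in interior:
        i = idx[u]
        for v, w in adj[u].items():
            A[i][i] += w
            if v == source:
                A[i][k] += w
            elif v != target:
                A[i][idx[v]] -= w

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, k + 1):
                    A[r][c] -= f * A[col][c]

    volt = {n: A[idx[n]][k] / A[idx[n]][idx[n]] for n in interior}
    volt[source], volt[target] = 1.0, 0.0
    # Current leaving the source under a unit voltage drop is the conductance.
    return sum(w * (1.0 - volt[v]) for v, w in adj[source].items())


one_path = {("a", "b"): 1.0, ("b", "c"): 1.0}
two_paths = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "d"): 1.0, ("d", "c"): 1.0}
print(effective_conductance(one_path, "a", "c"))   # 0.5
print(effective_conductance(two_paths, "a", "c"))  # 1.0
```

Both toy graphs have the same shortest-path distance between a and c, but the second path doubles the conductance.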

The algorithm computes a “connection subgraph”, which is a small portion of a network that captures the relationships between the nodes. In addition to helping visualize and explain the result, the connection subgraph could have many other uses.

As an example, consider the problem of deciding which A. Joshi is the co-PI on a proposal with Deborah McGuinness. The following shows the proximity scores for two of the A. Joshis to McGuinness in the DBLP data.

    Aravind Joshi to Deborah McGuinness: d = 0.001846
    [Figure: connection graph for Aravind Joshi to Deb McGuinness]
    Anupam Joshi to Deborah McGuinness: d = 0.3511
    [Figure: connection graph for Anupam Joshi to Deb McGuinness]

The demo page is fun to play with and experimenting with it complements reading the paper as a way to understand the algorithm.

