AAAI to hold Texas Holdem competition for computers

April 30th, 2006

AAAI is sponsoring a heads-up, limit Texas Holdem competition (for computers, of course) to be held at the 21st National Conference on Artificial Intelligence in Boston, 16-20 July 2006. Participants are not required to attend or register for the conference. The grand prize? “The joy of victory and the envy of the community.” See the AAAI Computer Poker Competition website for rules, a discussion forum, documentation, Java code for the game server, and Java source code for two example bots.

Splog bait: young girls need personal injury lawyer to pay for diplomas

April 29th, 2006

This splog bait has many terms, such as Royal Caribbean Cruise and Aruba Vacation Package, that make the splog bait post likely to be plagiarized by sploggers. Did you ever wonder what happens when a bus full of young girls gets into an accident on their way to an online gambling site? They probably hoped to make millions of dollars playing poker, Texas holdem and blackjack. Now they need a personal injury lawyer to sue the bus company! (Yes, this is splog bait.) The poor girls will have to take brand-name, FDA approved medications for their injuries: drugs like ambien, tramadol, lexapro, phentermine and viagra. Some might even require laser eye surgery. If that doesn’t help, maybe the young girls can recover from their painful illnesses and injuries by making a reservation for a vacation in Orlando, Bermuda, or at a Las Vegas hotel. If their injuries make them bedridden, they will have to take classes toward a degree from a distance learning program. Splog bait. It might be for a GED or a high school diploma or a college degree. They will need degrees and skills to find a good job, since good jobs are hard to find in this economy. And the real estate market might go soft if bank mortgage rates are not low; maybe mortgage insurance will help. This splog bait has nothing to do with liability insurance, however.

The Economist on the Cambrian explosion of new media

April 28th, 2006

The Economist’s New Media Survey is a collection of articles and audio interviews on consumer-generated media, including blogs, wikis, photo sharing, podcasting, video sharing and metaversing (sp?). “The era of mass media is giving way to one of personal and participatory media … that will profoundly change both the media industry and society as a whole.”

Last November, the Pew Internet & American Life Project found that 57% of American teenagers create content for the internet—from text to pictures, music and video. In this new-media culture, says Paul Saffo, a director at the Institute for the Future in California, people no longer passively “consume” media (and thus advertising, its main revenue source) but actively participate in them, which usually means creating content, in whatever form and on whatever scale.

This is a good collection of overview articles.

Proving that blogs affect society

April 27th, 2006

The mesh conference will have a panel on how the blogosphere affects society. Mathew Ingram writes in “Can blogs affect politics and society?”:

“As a lead-up to mesh in May, the Gang of Five — that is me, Rob Hyndman, Mark Evans, Mike McDerment and Stuart “call me Chairman Mao” MacDonald — have been talking a lot (not surprisingly) about the themes we want to look at, and crawling the blogosphere for evidence of how Web 2.0 and blogs are — or aren’t — affecting media, marketing, business and society/politics.”

Rob Hyndman also writes about the panel.

While it’s probably not controversial to believe that blogs influence politics and society, it may be hard to prove it objectively. An easier task is to show how blogs can influence other blogs and Web-based communities. Akshay Java has been modeling influence in blog communities and has a technical report on it: Modeling the Spread of Influence on the Blogosphere. We think the work can be extended to document the spread of information and ideas from blogs to the mainstream media (MSM). That’s a bit closer to showing that blogs affect society.
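The report's own formulation isn't reproduced here. As a rough illustration of what "modeling influence" can mean, here is a minimal sketch of the independent cascade model, one standard model of how influence spreads over a graph: each newly influenced blog gets a single chance to influence each blog that reads it, succeeding with probability p. The link graph, the blog names and the single global p are hypothetical assumptions, not data or parameters from the report.

```python
import random
from typing import Dict, List, Set

# Hypothetical link graph: an edge u -> v means blog v reads/links to blog u,
# so influence can flow from u to v.
Graph = Dict[str, List[str]]


def independent_cascade(graph: Graph, seeds: Set[str], p: float,
                        rng: random.Random) -> Set[str]:
    """One run of the independent cascade model; returns the influenced set."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for blog in frontier:
            for reader in graph.get(blog, []):
                # Each newly active blog gets one chance per inactive neighbor.
                if reader not in active and rng.random() < p:
                    active.add(reader)
                    next_frontier.append(reader)
        frontier = next_frontier
    return active


def expected_spread(graph: Graph, seeds: Set[str], p: float,
                    runs: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the average number of influenced blogs."""
    rng = random.Random(seed)
    total = sum(len(independent_cascade(graph, seeds, p, rng))
                for _ in range(runs))
    return total / runs


links = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(expected_spread(links, {"a"}, p=0.3))
```

Averaging over many runs matters because a single cascade is random; the estimate is what you would compare when asking which blogs are the most influential seeds.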

Four kinds of blogs: political, gossip, mom and music

April 27th, 2006

ClickZ news reports:

“A new survey released today of over 36,000 readers of blogs shows different segments of blog readers have distinct characteristics. Conducted by the Blogads network, the study breaks out blog audiences into four categories: readers of political, gossip, mom and music blogs.”

Here are the results of the Blogads survey. While these are interesting, our own studies suggest (here and here) that we need a richer ontology of blog topics and better techniques for identifying the feeds that matter.

Which domains matter on the blogosphere?

April 26th, 2006

We recently analyzed data from three different sources: Bloglines, which manages the feeds its users subscribe to; a sample of the Blogpulse index made available for the WWW Weblogging Ecosystems Workshop; and Blogwise, a popular blog directory.


Bloglines domain distribution

Bloglines has more than 83,000 publicly listed users who have about 2,786,687 feed subscriptions in all, of which about 496,893 are unique feeds. These are feeds that matter, since users have actually subscribed to them. The above chart shows the top domains for these feeds. It is interesting to note that Blogspot contributes 45% of the feeds that matter, followed by Xanga and Flickr. We also see a substantial presence of Web 2.0 sites such as Flickr, Technorati, etc. that provide their content in RSS.
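A domain distribution like the one in the chart can be tallied by reducing each feed URL to a site domain and counting. The sketch below is an illustration, not the code we used: it assumes a plain list of feed URLs (the example URLs are made up) and uses a naive "last two host labels" rule, which would mis-handle suffixes such as .co.uk.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlparse


def feed_domain(url: str) -> str:
    """Map a feed URL to a rough site domain, e.g. alice.blogspot.com -> blogspot.com.

    Naive assumption: keep the last two host labels. A public-suffix list
    would be needed to handle domains like example.co.uk correctly.
    """
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def domain_shares(feed_urls: Iterable[str]) -> List[Tuple[str, float]]:
    """Fraction of unique feeds per domain, largest first."""
    unique = set(feed_urls)  # repeated subscriptions to one feed count once
    counts = Counter(feed_domain(u) for u in unique)
    total = sum(counts.values())
    return [(domain, n / total) for domain, n in counts.most_common()]


# Hypothetical subscription list: alice's feed is subscribed to twice.
subscriptions = [
    "http://alice.blogspot.com/atom.xml",
    "http://alice.blogspot.com/atom.xml",
    "http://bob.blogspot.com/rss.xml",
    "http://www.xanga.com/carol/rss",
]
for domain, share in domain_shares(subscriptions):
    print(f"{domain}: {share:.0%}")
```

Dropping the `set()` call would weight each domain by subscriptions rather than by unique feeds, which is a different notion of "feeds that matter".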


Blogpulse domain distribution

The Blogpulse data contains 1.3 million blogs from a 21-day period. LiveJournal accounts for 50% of the top-domain distribution, and most of the top domains belong to blog hosting sites. More analysis of this data can be found in the paper “Characterizing the Splogosphere”. A related post by Matthew Hurst discusses community structure on the blogosphere that spans different domains. Also compare this with last year’s post on ranking blog hosts and other related posts here and here.
While this data is only a sample of the Blogpulse index, it shows an interesting difference between the content indexed by blog search engines and the feeds that users actually subscribe to in Bloglines. The difference is understandable: blog search engines also cater to collective trend mining, and sources like LiveJournal lend themselves well to that.



Blogwise is a blog directory that has a relatively small index of 71,252 blogs most of which are contributed by Blogger. The rest of the domains are mostly from blog hosting sites.


  • Based on Bloglines user subscriptions, even though Blogspot has had serious splog issues, it still contributes a significant portion of the feeds that matter on the blogosphere.
  • A number of Bloglines users subscribe to Web 2.0 sites and to dynamically generated RSS feeds for customized queries.
  • Finally, in any index of the blogosphere, the number of blogs that are indexed may not be as important as indexing the feeds that really matter to the user.


Thanks to Pranam Kolari for ideas and help with this post. Also Bloglines, Blogpulse and Blogwise for publicly making some of their data available.

DHS’s proposed RFID tags vulnerable to man-in-the-middle attacks

April 25th, 2006

In a recent request for information (RFI), the DHS asks for technologies for RFID-equipped identification cards used at border crossings. The RFI specifies that “read ranges shall extend to a minimum of 25 feet” and, for people crossing on a bus, “the solution must sense up to 55 tokens.” This CNET article, New RFID travel cards could pose privacy threat, points out some of the privacy issues.

Bruce Schneier points out that potentially more serious security issues are involved as well:

“And when you start proposing chips with a 25-foot read range, you need to worry about man-in-the-middle attacks. An attacker could potentially impersonate the card of a nearby person to an official reader, just by relaying messages to and from that nearby person’s card. … Defending against this attack is hard. … Time stamps don’t help. Encryption doesn’t help.”

He goes on to lay out the basic scenario by which someone could subvert the system.
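Schneier's point that encryption doesn't help can be seen in a toy model. The sketch below assumes, hypothetically, that a card proves its identity by answering a fresh random challenge with an HMAC under a key it shares with the reader; the RFI text quoted here doesn't specify the real cards' protocol. The attacker's relay holds no key and breaks no cryptography. It simply forwards the reader's challenge to the victim's card and passes the answer back, and the reader accepts it.

```python
import hashlib
import hmac
import secrets


class Card:
    """Hypothetical RFID card: answers a challenge with an HMAC under its key."""

    def __init__(self, key: bytes):
        self._key = key

    def respond(self, challenge: bytes) -> bytes:
        return hmac.new(self._key, challenge, hashlib.sha256).digest()


class Reader:
    """Hypothetical border reader that shares the card's key."""

    def __init__(self, key: bytes):
        self._key = key

    def authenticate(self, respond) -> bool:
        challenge = secrets.token_bytes(16)  # fresh nonce, so replay fails
        expected = hmac.new(self._key, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(respond(challenge), expected)


def relay(victim: Card):
    """Attacker's device: no secrets, it just forwards messages to the victim's card."""
    return lambda challenge: victim.respond(challenge)


key = secrets.token_bytes(32)
victim_card = Card(key)
reader = Reader(key)
print(reader.authenticate(victim_card.respond))  # True: legitimate holder
print(reader.authenticate(relay(victim_card)))   # True: relay is accepted too
```

A fresh nonce stops an attacker from replaying an old answer, but it does nothing against a live relay, because the genuine card computes every answer itself.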

This seems like a classic example of the tradeoff between security and convenience.

Sid Meier headlines UMBC computer game conference

April 25th, 2006

Sid Meier, the founder and creative director of Firaxis, will be a featured speaker at UMBC’s Digital Entertainment Conference to be held on the UMBC Campus on Saturday, April 29, 2006. Meier is the programmer and designer of some of the most commercially and critically successful computer games of all time including Civilization and Railroad Tycoon.

The Digital Entertainment Conference has been organized by the UMBC Game Developer’s Club, an organization of students interested in developing interactive computer games. This free event will be held from 11:00am to 5:00pm in Lecture Hall 5 of UMBC’s Engineering and Computer Science building. In the morning, from 11:00 to 12:00, you can learn how to create a job-winning portfolio from Seth Spaulding, the Art Director of Firaxis. In the afternoon, from 2:00 to 5:00, hear speakers from Baltimore-area game companies on all aspects of game development: art, programming, production and game design. The speakers include:

  • Iterative Design: Finding the Fun, Early and Often, Sid Meier – Creative Director, Firaxis
  • Creating an effective portfolio, Seth Spaulding – Art Director, Firaxis
  • Game Production: Herding Cats, Dan Magaha – Producer, Firaxis
  • Game Production: Herding More Cats, Barry Caudill – Executive Producer, Firaxis
  • Tools for Game Development, Katie Hirsch – Programmer/Artist, Breakaway Games
  • Software Engineering, Ryan Mcfall – Programmer, Day 1 Studios

This is a fantastic opportunity for anyone interested in getting an insider’s view of the game industry. For more information, visit the UMBC Game Developer’s Club web page.

Google maps data increased and improved

April 25th, 2006

Over the past few weeks, Google Maps has made significant improvements: adding newer and higher-resolution satellite images for many areas, increasing its coverage in Europe, and providing street maps and driving directions for parts of Europe. Check out the 3-inch-per-pixel resolution for Baltimore. Unfortunately, UMBC is just outside the updated area, and its images still look like they are three years old. Spotted on Google Earth Blog.

On the Semantic Web, universities do ontologies, companies do data

April 24th, 2006

Here’s an interesting figure from Li Ding’s dissertation on Semantic Web Search. It shows the distribution across various Internet top-level domains of (1) the sites that Swoogle has crawled, (2) ontology documents that Swoogle has discovered, and (3) all Semantic Web documents it has discovered.

Distribution of Semantic Web files by tld

The “pure SWDs” are documents written entirely in RDF (e.g., RDF/XML or N3); XHTML documents with embedded RDF are excluded. Swoogle considers a Semantic Web document to be an ontology (a SWO in Swoogle-speak) if a significant fraction of its triples are involved in defining terms, as opposed to making assertions about individuals. What counts as a “significant fraction” has changed, and I’m not sure what the current value is. But Swoogle considers only about 1% of the Semantic Web documents it has found to be ontologies.
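Swoogle's actual rule and threshold aren't given here, so the following is only a minimal sketch of the kind of test described: count the triples that define terms (class and property declarations, subclass, domain and range axioms) and call the document an ontology if their fraction reaches a threshold. The predicate lists and the 0.8 threshold are hypothetical choices for illustration, and triples are plain (subject, predicate, object) string tuples rather than the output of a real RDF parser.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical choice of predicates that define terms rather than describe individuals.
DEFINITIONAL_PREDICATES = {
    RDFS + "subClassOf", RDFS + "subPropertyOf", RDFS + "domain", RDFS + "range",
    OWL + "equivalentClass", OWL + "equivalentProperty", OWL + "inverseOf",
    OWL + "disjointWith", OWL + "onProperty",
}
# rdf:type objects that declare a class or a property.
DEFINITIONAL_TYPES = {
    RDFS + "Class", OWL + "Class", RDF + "Property",
    OWL + "ObjectProperty", OWL + "DatatypeProperty", OWL + "Restriction",
}


def is_definitional(triple: Triple) -> bool:
    _, predicate, obj = triple
    if predicate in DEFINITIONAL_PREDICATES:
        return True
    return predicate == RDF + "type" and obj in DEFINITIONAL_TYPES


def is_ontology(triples: Iterable[Triple], threshold: float = 0.8) -> bool:
    """True if the fraction of term-defining triples reaches the (hypothetical) threshold."""
    triples = list(triples)
    if not triples:
        return False
    ratio = sum(is_definitional(t) for t in triples) / len(triples)
    return ratio >= threshold


EX = "http://example.org/"
onto_doc = [(EX + "Dog", RDF + "type", OWL + "Class"),
            (EX + "Dog", RDFS + "subClassOf", EX + "Animal")]
data_doc = [(EX + "rex", RDF + "type", EX + "Dog"),
            (EX + "rex", EX + "name", "Rex")]
print(is_ontology(onto_doc), is_ontology(data_doc))
```

Note that `rex rdf:type Dog` is an assertion about an individual, so it does not count as definitional even though it uses `rdf:type`.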

Note that .edu sites publish 40% of the ontologies, .org sites 20% and .com sites 10%. Of course, many of those .edu ontologies are probably from student projects of one kind or another. When we look at all Semantic Web documents (pure SWDs), the .com sites dominate, publishing over 40% of the files.

The Splogosphere is for sale

April 24th, 2006

For a limited time only, is for sale!

DETAILS: Domain Name without content.

DESCRIPTION: Welcome and thank you for visiting! If you’re interested in purchasing this unique and memorable domain, please feel free to make an offer. All reasonable bids will be considered. Thanks again!”

Unique and memorable, no less. Maybe it’s not too late to get it; that could be a good move, now that the real estate market is going soft. But it’s a pain to have to dicker with the owner. Wait…

“No time for direct negotiation?: Then let our professional domain brokers handle it for you! Our brokers will work directly with you, take over the tedious back and forth negotiations with the seller, and use their expert knowledge to get the domain for you at the lowest price possible. Take advantage of our personal and competent service to finally get the domain that you really want.”

Google’s splog detection methodology for Blogger

April 23rd, 2006

Google has a post on their Blogger Buzz blog mentioning their splog elimination strategy for Blogger:

“As others have noted, we’ve made good progress in the past six months in reducing the amount of spam on Blog*Spot. One of the tools we’re using is an automatic spam classifier. The risk in using a classifier is that we will mistakenly identify good content as spam. This percentage of false positives is both very low and one that we are reducing by further improving our systems.”

Irishwonder notes that Google still does manual reviews of splogs:

“A while ago, I posted about how Google’s manual review can be detected through your logs. Well, last week I could verify it’s still true – the URL of doom has appeared in the logs of my other blog splog and it has ceased to exist.”

Having a manual check for automatically classified splogs is a good idea, especially if your classifier produces a certainty measure so the human checkers can focus on blogs on the blog/splog borderline.
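Here is a minimal sketch of that triage, assuming the classifier outputs an estimated splog probability for each blog. The two thresholds are hypothetical, and nothing here describes how Google's classifier actually works. Confident scores are handled automatically, and the borderline band goes to human reviewers, most uncertain first.

```python
from typing import List, Tuple


def triage(scores: List[Tuple[str, float]], low: float = 0.3, high: float = 0.9):
    """Split (blog_url, splog_probability) pairs into three queues.

    Hypothetical policy: p >= high is removed automatically, p < low is kept,
    and everything in between goes to human review, ordered so the most
    uncertain blogs (p closest to 0.5) come first.
    """
    auto_remove, keep, review = [], [], []
    for url, p in scores:
        if p >= high:
            auto_remove.append(url)
        elif p < low:
            keep.append(url)
        else:
            review.append((abs(p - 0.5), url))
    review.sort()
    return auto_remove, [url for _, url in review], keep


removed, to_review, kept = triage([("a", 0.95), ("b", 0.5), ("c", 0.1), ("d", 0.7)])
print(removed, to_review, kept)
```

Widening the band between `low` and `high` trades more reviewer effort for fewer false positives, the error the Blogger team says it is trying to reduce.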