Wikipedia research papers

February 28th, 2008

Mike Bergman has a comprehensive list of about 100 papers on Wikipedia as a knowledge source.

“Since about 2005 — and at an accelerating pace — Wikipedia has emerged as the leading online knowledge base for conducting semantic Web and related research. The system is being tapped for both data and structure. Wikipedia has arguably replaced WordNet as the leading lexicon for concepts and relations. Because of its scope and popularity, many argue that Wikipedia is emerging as the de facto structure for classifying and organizing knowledge in the 21st century.”

This complements a similar list on Wikipedia itself, Wikipedia in academic studies.

“Below is an incomplete list of academic conference presentations, peer-reviewed papers and other types of academic writing which focus on Wikipedia as their subject. Works that mention Wikipedia only in passing are unlikely to be listed. Unpublished works of presumably academic quality are listed in a dedicated section.”

(spotted on the dbpedia mailing list)

Hand, foot, circles and sixes

February 28th, 2008

I think our nervous systems must be wired up a bit strangely.

hands and feet, circles and sixes

Join the ICWSM community on CrowdVine

February 26th, 2008

We invite you to join the ICWSM 2008 social networking community site hosted by CrowdVine. ICWSM 2008 is the Second International Conference on Weblogs and Social Media, which will take place in Seattle between March 30 and April 2. If you are coming to ICWSM next month, you can use this site to help plan and shape the event, find and connect with people at the conference, and share your ideas and comments. If you aren’t able to make it to Seattle, it will provide a way for you to engage even though you can’t be there. Joining the ICWSM community on CrowdVine is easy and free, so please check it out.

No spam on Twitter?!

February 25th, 2008

Can it be true? Russell Beattie posts that on Twitter there are nearly a million users, and no spam or trolls. Spam does exist on Twitter, of course, but it does seem to be less of a problem than in the blogosphere, on the Web, or in email. Maybe it’s because search engines don’t treat tweets like Web pages or blog posts.

Wisdom of the crowd control?

February 24th, 2008

Slate has an interesting article, The Wisdom of the Chaperones — Digg, Wikipedia, and the myth of Web 2.0 democracy, that explores who controls some of the popular social media sites. It turns out that the social web is more hegemonic than we thought.

wikipedia hegemony

“Social-media sites like Wikipedia and Digg are celebrated as shining examples of Web democracy, places built by millions of Web users who all act as writers, editors, and voters. In reality, a small number of people are running the show. According to researchers in Palo Alto, 1 percent of Wikipedia users are responsible for about half of the site’s edits. The site also deploys bots—supervised by a special caste of devoted users—that help standardize format, prevent vandalism, and root out folks who flood the site with obscenities. This is not the wisdom of the crowd. This is the wisdom of the chaperones.” (link)

The work cited is by the Augmented Social Cognition research group at PARC. See, for example, their post on the behavior of the most active Wikipedians. Very interesting.

I think it’s even worse, in many ways, on Digg, which the article also discusses.

“The same undemocratic underpinnings of Web 2.0 are on display at Digg is a social-bookmarking hub where people submit stories and rate others’ submissions; the most popular links gravitate to the site’s front page. The site’s founders have never hidden that they use a “secret sauce”—a confidential algorithm that’s tweaked regularly—to determine which submissions make it to the front page. Historically, this algorithm appears to have favored the site’s most active participants. Last year, the top 100 Diggers submitted 44 percent of the site’s top stories. In 2006, they were responsible for 56 percent.” (link)

Will rule by the few always be the case? Who knows. The article does point out that the moderation system used by Slashdot helps to broaden the elite and also describes a simple “write one, rate two” policy used by Helium, a site new to me. Helium is a community for freelance writers that helps them connect with publishers who will pay for articles on their topics. The publishers are vetted, so students seeking to buy term papers will have to look elsewhere.

Call for ISWC 2008 posters and demos

February 24th, 2008

The call for ISWC 2008 posters and demos for the Seventh International Semantic Web Conference is out. The poster/demo session is an opportunity for presenting late-breaking results, ongoing research projects, and speculative or innovative work in progress. Posters and demos are intended to provide authors and participants with the ability to connect with each other and to engage in discussions about the work. Technical posters, reports on Semantic Web software systems, descriptions of completed work, and work in progress are all welcome.

I’ve always found the ISWC poster and demo session to be very stimulating and, in many ways, much more interesting than the regular paper presentation sections. Plus there is lots of food and drink available.

Submissions are due by 25 July 2008. For further information and for any questions regarding the event or submissions, contact the ISWC 2008 posters and demonstration co-chairs, Chris Bizer and Anupam Joshi.

Google slow to index blog posts?

February 24th, 2008

Last week I noticed that some of our blog posts took a long time to show up in the Google Blog search index. During the past year, Google has been very fast at indexing blog posts, typically taking less than five minutes from the time a post is made to when it shows up in their blog search index. But this week it seemed that our posts, or at least some of them, took more than twelve hours to be indexed.

Yesterday I tried to track a post I made on the IT job market, which I wrote just before 11:00am (GMT-5). It showed up in Google Feed Reader quickly enough but had not yet appeared in Google Blog Search when I finally went to bed 14 hours later. When I checked at 9:00am today, it was there, so it took somewhere between 14 and 22 hours.

It’s not the case that all posts are being delayed: do a Google Blog search for a popular term (e.g., TV) sorted by date and you’ll see posts made in the past few minutes. Nor do I think it’s related to PageRank, since their blog search ingest is based on pings rather than crawling. Besides, our blog enjoys a reasonable rank. Finally, it can’t be the case that Google’s systems are being overwhelmed by new blogs, because the growth of the blogosphere has slowed.

So I’m puzzled about what is going on. (goomtitag)

Update 1: Posted at 9:49, in Google Feed Reader at 10:14, indexed by Google Blog Search by ~19:15 and in Google’s main index about the same time. Maybe this is a clue — it used to be the case that a post hit the blog index within a few minutes and showed up in the main index after about twelve hours. This post hit both indexes around the same time — after about ten hours. Maybe there is now just one (logical) index.
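The delays in Update 1 can be checked with a little arithmetic. The sketch below assumes all three times fall on the same day; the date itself is a placeholder and the helper name `at` is made up for illustration.

```python
from datetime import datetime

# Placeholder date: the update does not say which day it was,
# only that the three events happened on the same day (assumed).
DAY = "2008-02-24"


def at(hhmm):
    """Build a datetime on the placeholder day from an HH:MM string."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")


posted = at("09:49")          # time the post was published
in_reader = at("10:14")       # time it appeared in the feed reader
in_blog_search = at("19:15")  # approximate time it appeared in blog search

reader_delay = in_reader - posted
index_delay = in_blog_search - posted

print("feed reader delay:", reader_delay)
print("blog search delay:", index_delay)
```

The blog search delay works out to a bit under nine and a half hours, which matches the "about ten hours" estimate given the approximate 19:15 reading.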

Update 2: Hmmm. Another post seems to have made it into Google’s main index before it got into the blog search index. I imagine that Google revisited our blog home page as part of its regular crawl and picked up the new post.

US Government predicts growing IT job market

February 23rd, 2008

The US Bureau of Labor Statistics releases regular projections for changes in demand for different job categories. The Computing Research Association’s blog compares the changes in the BLS predictions for professional-level IT positions and speculates on the factors involved.

Trends in BLS projections for the IT job market

While the projected growth is slowing, the actual number of predicted new jobs has gone up in the latest report. Jay Vegso of the CRA comments:

“It would be easy to see the series of lowered growth projections as signs of trouble within the IT workforce. But there are two other factors to consider: (1) in the 2006-2016 report, expectations for growth lowered also for the overall workforce, and (2) it probably has taken some time for the BLS to assess a relatively new group of occupations that is evolving rapidly (as seen also in the swings in computer science degree production). All in all, in each of its reports BLS predicted that the professional level IT occupations would enjoy high salaries and more than twice the growth rate of the overall workforce.”

Choosing a career in the IT field still looks like a good choice.

Exascale computing targets million fold increase in supercomputing

February 22nd, 2008

Sandia and Oak Ridge national laboratories have established the Institute for Advanced Architectures to work toward computers that are a million times faster than today’s supercomputers.

“An exaflop is a thousand times faster than a petaflop, itself a thousand times faster than a teraflop. Teraflop computers —the first was developed 10 years ago at Sandia — currently are the state of the art. They do trillions of calculations a second. Exaflop computers would perform a million trillion calculations per second.” (link)
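The ratios in the quote are easy to verify. This is only a quick arithmetic sketch using the standard SI prefixes (tera = 10^12, peta = 10^15, exa = 10^18); the constant names are my own.

```python
# Floating-point operations per second for each prefix (standard SI values).
TERAFLOP = 10**12  # a trillion calculations per second
PETAFLOP = 10**15
EXAFLOP = 10**18   # a million trillion calculations per second

# Each step up is a factor of a thousand.
peta_over_tera = PETAFLOP // TERAFLOP
exa_over_peta = EXAFLOP // PETAFLOP

# So an exaflop machine is a million times faster than a teraflop one.
exa_over_tera = EXAFLOP // TERAFLOP

print(peta_over_tera, exa_over_peta, exa_over_tera)
```

That factor of a million between teraflop and exaflop machines is where the "million fold increase" in the title comes from.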

Initial funding of $7.4M is provided by congressional mandate from the National Nuclear Security Administration and the Department of Energy’s Office of Science.

How to use XFN (XML Friends Network)

February 21st, 2008

Brian Suda has a good, practical article on XFN: XFN encoding, extraction, and visualizations.

“In this article I will take a good look at XFN – the microformat for describing relationships between people. I will look briefly at what it is and the basic markup needed to add the information to your sites, before then going into depth, looking at the benefits you can get from that data by extracting it and using it in different ways.”

He covers the how and why of XFN and has good examples and code fragments. FOAF is only mentioned once in passing, however.
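To give a flavor of what extraction involves, here is a minimal sketch of my own (not taken from the article) that pulls XFN relationships out of a page using only Python's standard library. XFN values live in the rel attribute of ordinary links; the sample markup, the example.org URLs, and the names `XFNExtractor` and `extract_xfn` are all made up for illustration.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class XFNExtractor(HTMLParser):
    """Collect (href, [xfn values]) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        rel = attr.get("rel") or ""
        # rel is a space-separated list; keep only the XFN values.
        relations = [v for v in rel.lower().split() if v in XFN_VALUES]
        if relations and attr.get("href"):
            self.links.append((attr["href"], relations))


def extract_xfn(html):
    """Return all XFN-annotated links found in an HTML string."""
    parser = XFNExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical blogroll markup, for illustration only.
SAMPLE = """
<ul>
  <li><a href="http://example.org/alice" rel="friend met colleague">Alice</a></li>
  <li><a href="http://example.org/bob" rel="nofollow">Bob</a></li>
  <li><a href="http://example.org/me" rel="me">My other site</a></li>
</ul>
"""

links = extract_xfn(SAMPLE)
for href, relations in links:
    print(href, relations)
```

The link to Bob is skipped because nofollow is not an XFN value. The resulting pairs are the raw material for the kind of visualizations the article describes.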

ISWC 2008 call for doctoral consortium papers

February 20th, 2008

The 2008 International Semantic Web Conference Doctoral Consortium (DC) allows PhD students to present their work and obtain guidance from mentors as well as interact with other postgraduate students. Students who submit papers to the main conference are also invited to apply to the DC. All papers submitted to the DC track will undergo a thorough reviewing process with a view to providing detailed and constructive feedback. The best submissions will be selected for presentation at the ISWC 2008 DC sessions. Five page papers will be published in the main ISWC proceedings. The deadline for submissions is May 16. The ISWC 2008 DC is chaired by Diana Maynard.

Total lunar eclipse 10pm EST (GMT-5) Wed 2/20

February 19th, 2008

A total lunar eclipse will be visible from the US this Wednesday evening, February 20th, with the maximal effect at 10:26pm EST. The eclipse will be visible from North and South America, western Europe, and Africa.

Total lunar eclipse visible in the Americas 10pm EST (GMT-5) Wed 2/20 (image courtesy of NASA GSFC)