What makes a Wikipedia article good?

February 28th, 2007

It’s just like my high school English teacher said: the secret to writing well is to “rewrite, rewrite, rewrite”.

A note on news@nature.com, The more, the wikier, cites recent work on the Wikipedia process, starting with a preprint by Dennis Wilkinson and Bernardo A. Huberman of HP’s Information Dynamics Lab, which concludes that the more edits an article has received, the higher its quality.

Assessing the Value of Cooperation in Wikipedia. Since its inception six years ago, the online encyclopedia Wikipedia has accumulated 6.40 million articles and 250 million edits, contributed in a predominantly undirected and haphazard fashion by 5.77 million unvetted volunteers. Despite the apparent lack of order, the 50 million edits by 4.8 million contributors to the 1.5 million articles in the English-language Wikipedia follow certain strong overall regularities. We show that the accretion of edits to an article is described by a simple stochastic mechanism, resulting in a heavy tail of highly visible articles with a large number of edits. We also demonstrate a crucial correlation between article quality and number of edits, which validates Wikipedia as a successful collaborative effort.
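The abstract only says that the accretion of edits follows “a simple stochastic mechanism”. One simple mechanism of that kind, assumed here for illustration rather than taken from the paper, is multiplicative growth: in each period an article gains new edits in proportion to the edits it already has, scaled by a random factor. The Python sketch below (all parameter values are arbitrary, and edit counts are treated as continuous) shows how such a process concentrates edits in a small set of articles.

```python
import math
import random

def simulate_edit_counts(n_articles=2000, steps=100, mu=0.01, sigma=0.1, seed=0):
    """Multiplicative growth: each step, an article's edit count is
    multiplied by a random factor exp(mu + sigma * noise), so the new
    edits it receives are proportional to the edits it already has."""
    rng = random.Random(seed)
    counts = [1.0] * n_articles
    for _ in range(steps):
        counts = [c * math.exp(rng.gauss(mu, sigma)) for c in counts]
    return counts

def top_share(counts, fraction=0.01):
    """Share of all edits held by the top `fraction` of articles."""
    ordered = sorted(counts, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)

counts = simulate_edit_counts()
print(f"top 1% of articles hold {top_share(counts):.1%} of all edits")
```

Because each article’s log edit count is a sum of independent random increments, the counts come out roughly lognormal, which is one way a heavy tail of highly edited articles can arise.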

Another paper cited in the note, by Aniket Kittur, Bryan A. Pendleton, Bongwon Suh, and Todd Mytkowicz, is Power of the Few vs. Wisdom of the Crowd: Wikipedia and the Rise of the Bourgeoisie, submitted to alt.chi 2007.

Wikipedia has been a resounding success story as a collaborative system with a low cost of online participation. However, it is an open question whether the success of Wikipedia results from a “wisdom of crowds” type of effect in which a large number of people each make a small number of edits, or whether it is driven by a core group of “elite” users who do the lion’s share of the work. In this study we examined how the influence of “elite” vs. “common” users changed over time in Wikipedia. The results suggest that although Wikipedia was driven by the influence of “elite” users early on, more recently there has been a dramatic shift in workload to the “common” user. We also show the same shift in del.icio.us, a very different type of social collaborative knowledge system. We discuss how these results mirror the dynamics found in more traditional social collectives, and how they can influence the design of new collaborative knowledge systems.

(spotted on Smart Mobs)

How to fix a Powerbook G4 power adapter

February 26th, 2007

All products seem to know when their warranties are going to run out ;), unfortunately even Apple products. We have several Macs in our group, and suddenly the power adapters on three of them went kaput one after the other. After calling Apple, I found out that the warranties (three-year extended Apple support) had run out in Nov. 2006 and that we would need to buy three new adapters at $79 each.

As I found out from an old blog post, this is obviously not a new problem.

Usually one of three things goes wrong:

i) There is a snag in the wire, and it shorts itself, burning out at the snag. The snag usually develops very close to where the DC cable (the thinner wire) comes out of the power adapter; in fact, so close that we have to open the box to solder the wires after cutting out the burnt part.
ii) The pin at the DC end drops out or gets bent.
iii) The whole thing smokes, and that’s the end of it.

I had one of each. So I took the good end from the smoked one to replace the broken one. For the one with the burnt/snagged cable, I meant to cut out the burnt part and solder it back together.

But there were some more complications. For DC voltage I had expected two wires. It turns out that it’s actually a very thin coax, which makes it a bit harder to peel, separate, and solder. The outer conductor is ground, and the inner one carries 24.5V (see below).

It seems Apple never meant for these things to be opened. There are no screws, and I had to pry the case apart using a screwdriver and a hammer. Even worse, it’s not merely push-fit; it is super-glued together, which makes it even harder to take apart.

Here is what you need: (i) a small flat-head screwdriver, (ii) a hammer, (iii) a sharp razor blade or wire stripper, and (iv) a soldering gun. The image below also shows the inside of one plastic half, with some glue still sticking to it.

Once I had the two wires soldered together, it worked just fine (I tested it with a multimeter first, of course).

The other one didn’t turn out as I expected. After I managed to pry the plastic covers apart, I found other damage inside: some solder joints had come loose, and there wasn’t enough wire left to solder it back together. So I ended up trashing two, but nevertheless managed to make one good one.

Soldered together and insulated

I tied a knot around the soldered ends so the joint wouldn’t get pulled apart if the cable is accidentally strained. It doesn’t look pretty :) but it works fine.

The newer MacBook has a sturdier wire that will hopefully last longer than the older version’s. Also, the magnetic contacts won’t have the pin-related problems: there is no pin.

The I’s of the Blogosphere

February 25th, 2007

The token “I” (1, 2) can provide interesting cues about the blogosphere, beyond signifying the obviously personal nature of blog posts. “I” sometimes use it to study the growth of the blogosphere (between David Sifry’s reports, of course), or just for fun to see how frequently the indices of blog search engines are updated and whether any of them are in “breather” mode.

Two charts on the distribution of “I” in blog posts, one from BlogPulse and the other from Technorati.

BlogPulse reports that around 45% of all postings feature an “I”. Technorati indexes around 400,000 posts featuring “I” per day. Merging the two data points, Technorati indexes around 900,000 posts per day, or roughly 40,000 posts per hour, a number that has not changed for almost a year. Nothing new here: the English blogosphere has plateaued. What’s confusing is that this estimate does not agree with David Sifry’s figure from October 2006 of around 1.3 million postings per day, which puts my analysis off by around 50%. What am I missing here?
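The arithmetic behind the estimate, as a small Python sketch. It assumes, as the reasoning above implicitly does, that BlogPulse’s 45% share of posts containing “I” also holds for Technorati’s index; the constant names are ours.

```python
# Back-of-the-envelope estimate from the two data points above.
I_SHARE = 0.45                   # BlogPulse: ~45% of posts contain "I"
I_POSTS_PER_DAY = 400_000        # Technorati: ~400,000 "I" posts per day
SIFRY_POSTS_PER_DAY = 1_300_000  # David Sifry, October 2006

def estimate_total_posts(i_posts_per_day: float, i_share: float) -> float:
    """Scale the daily count of posts containing "I" up to all posts,
    assuming the same share of "I" posts in both indices."""
    return i_posts_per_day / i_share

total_per_day = estimate_total_posts(I_POSTS_PER_DAY, I_SHARE)
per_hour = total_per_day / 24
gap = SIFRY_POSTS_PER_DAY / total_per_day - 1  # how much higher Sifry's figure is

print(f"{total_per_day:,.0f} posts/day, {per_hour:,.0f} posts/hour")
print(f"Sifry's figure is {gap:.0%} higher")
```

Run as written, the gap comes out at about 46%, in line with the “around 50%” discrepancy.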

As an aside, this raises the question of how blogs are growing in non-US English-speaking regions, India for instance.

Of course, the same analysis can be done with other keywords, but none of them gives the same coverage, nor are they as temporally independent as “I”. Any other interesting uses of buzz charts?

Intelligence community embraces Web 2.0

February 24th, 2007

A Computerworld story, Top Secret: DIA embraces Web 2.0, discusses how the Defense Intelligence Agency is embracing new Web-based collaboration and integration tools. (spotted on Smart Mobs)

“The U.S. Department of Defense’s lead intelligence agency is using wikis, blogs, RSS feeds and enterprise “mashups” to help its analysts collaborate better when sifting through data used to support military operations. The Defense Intelligence Agency (DIA) is seeing “mushrooming” use of these various Web 2.0 technologies that are becoming critical to accomplishing missions that require intelligence sharing among analysts, said Lewis Shepherd, chief of DIA’s Requirements and Research Group at the Pentagon.”

One of the recent technology successes within the US intelligence community is Intellipedia, a set of wikis available on classified networks run by the US Government.

“DIA first launched a wiki it dubbed Intellipedia in 2004 on the Defense Department’s Joint Worldwide Intelligence Communications System (JWICS), a top-secret network that links all the government’s intelligence agencies.”

Another approach being used in the intelligence community is integrating information and services in real time with Web 2.0 techniques.

DIA last year began a project to create a data access layer in its architecture using a service-oriented architecture to pull together human intelligence (data gathered by people) and publicly available data gathered from the Internet and other sources into a single environment for analysis, Shepherd added. Analysis of data in this new environment will be done in part by using Web 2.0 applications, such as “mashups,” that collect RSS feeds, Google maps and data from the DIA network that users can access with a lightweight AJAX front end, he added. “Web 2.0 mashup fans on the Internet would be very much at home in the burgeoning environment of top-secret mashups, which use in some cases Google Earth and in some cases other geospatial, temporal or other display characteristics and top-secret data,” Shepherd said.

These are good examples of the movement within the intelligence and law enforcement communities from a “need to know” environment toward a “need to share” one. Traditional access control policies are often based on the concept of “need to know” and are typified by predefined and often rigid specifications of which principals and roles are pre-authorized to access what information. This can and does lead to systems that discourage the sharing of information by requiring principals to be known in advance, undermining interoperability, ignoring context, and being unresponsive to novel and unexpected situations. One of the recommendations of the 9/11 Commission was to find ways to move from this traditional perspective toward one that privileges the “need to share”.

While it may be easy to define what “need to share” means in terms of very high level organizational policy, it will be challenging to understand what new technical approaches and systems are needed to support it. In addition to wikis, blogs, feeds and web services, I suspect that other new ideas will help here. The Semantic Web offers a good approach to publishing, sharing and integrating data. Computational policies can address the contextual sharing of information. NLP and information extraction are important for acquiring information from open sources. Social network analysis, trust models, and reputation systems will also probably play key roles. Finally, machine learning is often an underlying approach to getting all of the components to work and integrate.
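As a rough illustration of the difference between the two perspectives, here is a minimal Python sketch contrasting a static “need to know” access list with a context-aware “need to share” rule. The principals, roles, resource names, and rule format are all hypothetical; real computational policy languages are far richer than this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    requester: str
    role: str
    resource: str
    context: frozenset = frozenset()  # facts about the situation, e.g. an active mission

# "Need to know": a predefined table of which principals may see which resources.
NEED_TO_KNOW_ACL = {("alice", "report-17")}

def need_to_know(req: Request) -> bool:
    return (req.requester, req.resource) in NEED_TO_KNOW_ACL

@dataclass(frozen=True)
class ShareRule:
    resource: str
    allowed_roles: frozenset
    required_context: str

# "Need to share": hypothetical rules keyed on role and context, so a
# principal who was not known in advance can still be granted access.
SHARE_RULES = [ShareRule("report-17", frozenset({"analyst"}), "mission:harbor-watch")]

def need_to_share(req: Request) -> bool:
    if need_to_know(req):  # pre-authorized principals keep their access
        return True
    return any(
        rule.resource == req.resource
        and req.role in rule.allowed_roles
        and rule.required_context in req.context
        for rule in SHARE_RULES
    )

bob = Request("bob", "analyst", "report-17", frozenset({"mission:harbor-watch"}))
print(need_to_know(bob), need_to_share(bob))  # prints: False True
```

Here bob, an analyst who was never pre-authorized, is refused by the access list but granted access by the contextual rule because his request is tied to the active mission.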

On making tags work

February 24th, 2007

LibraryThing has an interesting post on when tags work and when they don’t. LibraryThing is a social software service that lets its users catalog, tag, review, and rate books they have read and share the information with other users. The books you read and what you think about them seem like a good way to induce a social system. There is a finite and relatively small universe of books in print at any given time, and people often feel passionate about them.

Both LibraryThing and Amazon allow users to tag books. But with a tiny fraction of Amazon’s traffic, LibraryThing appears to have accumulated *ten times* as many book tags as Amazon—13 million tags on LibraryThing to about 1.3 million on Amazon.

The post describes a simple study comparing how the two communities tagged books. Its conclusion, while not profound, is good to keep in mind when thinking about adding a tagging feature to your system.

“There are a couple of lessons, but the most important is this: Tagging works well when people tag “their” stuff, but it fails when they’re asked to do it to “someone else’s” stuff. You can’t get your customers to organize your products, unless you give them a very good incentive. We all make our beds, but nobody volunteers to fluff pillows at the local Sheraton.”

They offer some specific suggestions for how to make tagging work in an ecommerce environment. I like this one: “Keep tagging social. Stop selling and start connecting. If you connect people up right, the selling will follow. Think Tupperware!”.

Planet social media research

February 23rd, 2007

Planet Social Media Research is a feed aggregator for blogs and feeds on social media research. Its scope is intended to be wide, covering research in many disciplines — technical, analytic, linguistic, cultural, social, policy, economic, etc. The site is hosted and managed by the UMBC ebiquity research group. We plan to maintain it as a non-commercial and ad-free resource for the research community. If you have any suggestions or recommendations for blogs or feeds to invite, please send them to planet-smr@cs.umbc.edu. Feeds can be for an entire blog, if most of its posts are relevant to research on social media, or for a category or topic within the blog, if available. We’re thinking about mechanisms for others to suggest an occasional post, as well.
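For readers curious how an aggregator of this kind works, here is a minimal Python sketch that merges several RSS 2.0 feeds into one newest-first list and can keep only the items in a single category of a blog, mirroring the whole-blog versus category feeds described above. It is not the software behind Planet Social Media Research; the feeds and example.org URLs are made up, and it reads feed text from strings rather than fetching it over HTTP.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def parse_items(rss_xml, category=None):
    """Yield (published, title, link) for each <item> in an RSS 2.0 document,
    keeping only items tagged with `category` when one is given."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        tags = {c.text for c in item.findall("category")}
        if category is not None and category not in tags:
            continue
        published = parsedate_to_datetime(item.findtext("pubDate"))
        yield published, item.findtext("title"), item.findtext("link")

def aggregate(feeds):
    """Merge (rss_xml, category_or_None) pairs into one list, newest first."""
    items = []
    for rss_xml, category in feeds:
        items.extend(parse_items(rss_xml, category))
    return sorted(items, key=lambda item: item[0], reverse=True)

# Two tiny hypothetical feeds: one read for a single category, one whole blog.
BLOG_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Tagging study</title><link>http://example.org/a/1</link>
<pubDate>Sat, 24 Feb 2007 10:00:00 -0500</pubDate><category>social media</category></item>
<item><title>Lunch menu</title><link>http://example.org/a/2</link>
<pubDate>Sun, 25 Feb 2007 12:00:00 -0500</pubDate><category>misc</category></item>
</channel></rss>"""
BLOG_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Blog growth</title><link>http://example.org/b/1</link>
<pubDate>Fri, 23 Feb 2007 09:30:00 +0000</pubDate></item>
</channel></rss>"""

merged = aggregate([(BLOG_A, "social media"), (BLOG_B, None)])
for published, title, link in merged:
    print(published.date(), title, link)
```

The off-topic “Lunch menu” post from Blog A is dropped because only its “social media” category is subscribed.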

StreetSmart shares traffic data via vehicular ad-hoc networks

February 23rd, 2007

Ebiquity lab member Sandor Dornbush is cited on UMBC’s home page for his entrepreneurial activities. This is part of the news about the $2M Kauffman Foundation grant to support entrepreneurship.

“Dornbush has found entrepreneurial ideas in things that bug him – being stuck in traffic and his dislike for making playlists for his iPod. Dornbush is part of a research group led by one of his mentors — computer science professor Zary Segall — which specializes in human aware computing-making computers wearable, ubiquitous and most importantly, able to sense and adjust to a user’s mood, surroundings and social situation. Last year, with the help of visual arts professor David Yager and the encouragement of Alex. Brown Center for Entrepreneurship Director Vivian Armor, Dornbush took his idea StreetSmart Traffic to the Greater Baltimore Technology Council’s “Mosh Pit” business plan competition and placed third overall. The idea uses peer-to-peer wireless communication to boost a standard GPS driving aid.” (source)

UMBC’s Alex. Brown Center for Entrepreneurship was established in 2000 to foster entrepreneurship among UMBC’s students and faculty. The center works closely with the Baltimore business community, including the Greater Baltimore Technology Council, which sponsors the MoshPit! to help university students experience every aspect of starting a business. Sandor says of his MoshPit! experience:

“The Mosh Pit competition was very fun and I learned a lot about how to develop and pitch a business plan,” said Dornbush. “I was kind of amazed by the support and attention that I got from the University when I did as well as I did. I would strongly recommend anybody with novel ideas that have market potential to pitch their idea.” (source)

Sandor developed the StreetSmart idea as part of his MS thesis, StreetSmart Traffic: Discovering and Disseminating Automobile Congestion Using VANETs, which explored the idea of using vehicular ad-hoc networks to enable vehicles to automatically share traffic data collected as they travel.

“We propose a system that uses a standard GPS driving aid, augmented with peer-to-peer wireless communication. This system could provide more accurate and complete traffic monitoring than existing systems, and do so at almost no cost to the service provider. StreetSmart has been evaluated in a simulation. The system uses a combination of clustering and epidemic communication to find and disseminate traffic information. This system is designed to accommodate dynamic traffic patterns. We ensure the privacy of the participating drivers so drivers will be willing to disclose their driving paths. This project could become a very useful system, saving millions of human hours and dollars.” (source)
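The abstract mentions epidemic communication for spreading traffic information. As a toy illustration of that idea alone (the clustering step is not modeled), the Python sketch below has vehicles on a circular road copy a single congestion report whenever they come within radio range of a vehicle already carrying it. The road length, radio range, speeds, and vehicle count are arbitrary assumptions, not values from the thesis or its simulation.

```python
import random

def simulate_epidemic(n_vehicles=50, road_km=10.0, radio_km=0.5,
                      max_steps=500, seed=1):
    """Epidemic (store-and-forward) spread of one congestion report.

    Vehicles move at constant random speeds around a circular road. Each
    step, any uninformed vehicle within radio range of a vehicle that
    already carries the report receives a copy. Returns the number of
    steps until every vehicle has the report, or None if max_steps runs out."""
    rng = random.Random(seed)
    pos = [rng.uniform(0, road_km) for _ in range(n_vehicles)]
    # Speeds in km per step, in both directions. The largest closing speed
    # (0.6) stays below the 2 * radio_km contact window, so two approaching
    # vehicles cannot skip past each other between steps.
    vel = [rng.uniform(0.05, 0.3) * rng.choice((-1, 1)) for _ in range(n_vehicles)]
    informed = [False] * n_vehicles
    informed[0] = True  # vehicle 0 is stuck in the jam and creates the report

    def gap(a, b):  # distance along the circular road
        d = abs(a - b)
        return min(d, road_km - d)

    for step in range(1, max_steps + 1):
        pos = [(p + v) % road_km for p, v in zip(pos, vel)]
        carriers = [pos[i] for i in range(n_vehicles) if informed[i]]
        for j in range(n_vehicles):
            if not informed[j] and any(gap(pos[j], c) <= radio_km for c in carriers):
                informed[j] = True
        if all(informed):
            return step
    return None

print(simulate_epidemic())
```

Only vehicles that carried the report at the start of a step forward it during that step, so the report advances at most one hop per step, a simple way to respect radio and processing delays.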

Some of Sandor’s recent papers are available on the ebiquity site.

Government Research to Track Online Networking

February 23rd, 2007

Researchers at Rutgers are leading an effort funded by the Department of Homeland Security to research techniques for monitoring social networks, news articles, Web blogs and other social media for indicators of potential terrorist activity.

“The Rutgers Center for Discrete Mathematics and Theoretical Computer Science will lead the team made up of researchers from the University of Southern California, the University of Illinois at Urbana-Champaign and the University of Pittsburgh. The group includes researchers from AT&T Laboratories, Bell Labs’/Lucent Technologies, Princeton University, Rensselaer Polytechnic Institute and Texas Southern University. Rutgers will get $1 million per year for three years. The DHS will fund the entire team $10.2 million over three years.” (source)

With the funds, the team has established the Center for Dynamic Data Analysis (DyDAn), one of four recently announced University Affiliate Centers of the Institute for Discrete Sciences, which is a joint project between the Department of Homeland Security (DHS) and several national laboratories, led by Lawrence Livermore National Laboratory. DyDAn will coordinate the other three new University Affiliate Centers located at the University of Illinois, the University of Pittsburgh, and the University of Southern California.

“DyDAn researchers will develop new techniques for drawing inferences from massive flows of data arriving continuously over time. Buried in such data are patterns and behaviors that are changing, often quite quickly. The DyDAn team will develop novel technologies to find these patterns and relationships in dynamic and sometimes massive datasets. The DyDAn research program spans topics in Information Management and Knowledge Discovery as well as foundational topics in Discrete Mathematics.” (source)

Among the initial projects are two involving social media. One will study the problem of analyzing large, dynamic multigraphs that arise from blogs. Another will develop algorithms for identifying hidden social structures in virtual communities with a goal of finding hidden groups, coalitions and leaders by non-semantic analysis of large communication networks.
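To give a flavor of what “non-semantic” analysis means, the Python sketch below looks only at who messaged whom, never at message content. It links pairs that exchanged at least a threshold number of messages, treats each connected component as a candidate group, and names the member with the most linked traffic as a candidate leader. This connected-components heuristic and its threshold are our assumptions for illustration, not the DyDAn project’s method, and the message log is made up.

```python
from collections import defaultdict

def hidden_groups(messages, min_messages=3):
    """Find candidate groups and leaders from (sender, receiver) pairs alone.

    Pairs who exchanged at least `min_messages` messages are linked; each
    connected component is a candidate group, and the member with the most
    linked traffic is reported as its candidate leader."""
    counts = defaultdict(int)
    for sender, receiver in messages:
        if sender != receiver:
            counts[frozenset((sender, receiver))] += 1

    graph = defaultdict(set)
    strength = defaultdict(int)
    for pair, n in counts.items():
        if n >= min_messages:
            a, b = sorted(pair)
            graph[a].add(b)
            graph[b].add(a)
            strength[a] += n
            strength[b] += n

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:  # depth-first search over one component
            person = stack.pop()
            if person not in members:
                members.add(person)
                stack.extend(graph[person] - members)
        seen |= members
        ordered = sorted(members)
        leader = max(ordered, key=lambda p: strength[p])  # ties go to the first name
        groups.append((tuple(ordered), leader))
    return sorted(groups, key=lambda g: (-len(g[0]), g[0]))

log = [("a", "b")] * 3 + [("c", "b")] * 4 + [("d", "e")] * 3 + [("a", "d")]
print(hidden_groups(log))
```

The single a-to-d message falls below the threshold, so the two clusters stay separate, and b, who carries the most traffic in the first cluster, is flagged as its likely leader.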

Canada honors Gosling for Java

February 22nd, 2007

James Gosling was honored for his role in inventing Java by being named an officer of the Order of Canada. The Order of Canada, Canada’s highest civilian honour, recognizes outstanding lifetime achievement and contributions to society and the country.

“James Gosling, a vice-president of Sun Microsystems Inc. of Santa Clara, Calif., has been named an officer of the Order of Canada, the office of the Governor General announced on Monday. … Gosling was responsible for the original design of the Java programming language and implemented the original compiler for the so-called Java virtual machine. Java programs are compiled or converted into machine code by a program called a compiler when they run.” (source)

It’s not often that computer scientists are recognized like this for their contributions to society. Sure, there are many ways we recognize our own, such as fellowships in professional societies or the Turing Award, but I suspect that the general public is mostly oblivious to these honors. It’s quite common for people to be lauded for building a successful business empire, like Bill Gates or Sergey Brin, but recognition at this level for a technical contribution is still rare.

Gosling richly deserves this honor. Java popularized object-oriented programming and introduced many important new ideas. It’s used daily by many practicing computer scientists and will probably continue to be one of the dominant languages in use for at least another decade.

When I hear Gosling’s name, the first thing that comes to mind is an early accomplishment. In 1981, while a CMU grad student, he released Gosling Emacs, the first implementation of Emacs for Unix. It freed a generation by allowing us to use Unix without having to suffer under vi. Parts of the program were considered so intricate that cryptic and ominous comments in the code warned hackers against tinkering with them.

RFID tagging hospital patients improves safety

February 20th, 2007

A BBC article, ‘Tagging’ improves patient safety, describes how RFID is being used in UK hospitals.

Hospital patients are used to wearing wristbands. But now those bands have gone high-tech.

At the Birmingham Heartlands Hospital, patients wear RFID wristbands with embedded personal data. When they arrive, they have a digital photo taken and loaded on to an electronic tag contained in a wristband worn throughout their stay.

Staff dealing with the tagged patients have access to PDAs with which they can scan the bands and also access patient details, via wifi, from a secure area on the hospital’s central computer system. A ‘traffic light’ system flashes up when a patient is ready for their operation, and as they go through the theater doors, a sensor reads the bar code on their wrist and their details are displayed on the theater’s computer screen.

Spotted on Smart Mobs.

Promoting your research on the web

February 18th, 2007

Harry Chen (UMBC BS’98, MS’00, PhD’04) has a post, How scientists should market themselves, commenting on Larry Page’s AAAS talk. Harry adds some good ideas and advice.

Last spring Professor Marie desJardins asked me to talk to her graduate class on Basic Research Skills. I chose to talk about why researchers should establish a presence on the web and effective ways to do it today.

Be on the Web. Researchers should develop a presence on the Web to make their work more visible. This presentation touches on some of the best practices and also a few of the worst. Topics include web pages, blogs, putting papers on the Web, and search engine optimization.

I’ve agreed to give a similar talk again this spring for the course, which is being taught by Professor Krishna Sivalingam. I am interested in seeing what changes I feel have to be made as I revisit the slides. If anyone has suggestions, please let me know.

A beme is a meme spread by blogs

February 18th, 2007

Tom Hayes coins beme as a new word for a meme “propagated by blogs and bloggers”.

“A beme is a turbo-charged meme made possible entirely by the existence of the network effect. A beme can be impactful because it is lurid–a photo of a panty-less Britney Spears, or humorous–a whimsical video of the band OKGO on treadmills, or gut-wrenching–the sad tirade by comedian Michael Richards. A beme can cement an idea with the public in a way that cannot be legislated or regulated. No legal effort by Cisco to enforce a trademark, for example, will make the public unlearn that Apple produces the iPhone.”

He says that bemerz, the people who do the propagation, can spread ideas “faster than any people in history”.

“That’s because a beme moves a billion times faster than a meme ever could. That’s the power of citizen-driven media networks. Do the math. There are nearly 60 million blogs, 600 million email users and many millions of social media citizens. Because we all can be bemerz, powerful enough to spread any idea to anyone, a beme today can be created, promulgated and soldered into social consciousness in a fraction of the time it took memes to spread 30 years ago when Professor Dawkins first made the observation.”

The essential characteristic of both genes and memes is that they spread from host to host because their nature encourages replication. Often, but not always, this is because they are beneficial to the host. Do bemes have this characteristic? Probably. An early Web 2.0 bemerz (i.e., beme host) might start to spread it and become a Web 2.0 consultant. What’s good for the beme is good for the bemerz. Can we differentiate bemes that are truly beneficial to their hosts (e.g., object-oriented programming) from those that are not (insert your favorite example here)?

Of course, along with rapid diffusion may come a short life. Genes can last for eons. Memes are probably shorter lived — maybe lasting for millennia or at least centuries. What’s the half-life of a beme? It’s probably on the order of a month.
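If bemes fade roughly exponentially (our assumption; the post only guesses at a half-life), the half-life idea can be made concrete. The Python sketch below computes how much activity remains after a given time and estimates a half-life from two counts of mentions taken some days apart; the function names and the counts in the example are made up.

```python
import math

def remaining_fraction(days, half_life_days):
    """Fraction of a beme's initial activity left after `days`,
    assuming exponential decay with the given half-life."""
    return 0.5 ** (days / half_life_days)

def estimate_half_life(count_start, count_later, days_between):
    """Half-life implied by two activity counts taken `days_between` apart,
    under the same exponential-decay assumption."""
    return days_between * math.log(2) / math.log(count_start / count_later)

# A one-month half-life leaves only about 0.02% of the initial buzz after a year.
print(f"{remaining_fraction(365, 30):.5f}")
# 1000 mentions falling to 250 over 60 days implies a half-life of about 30 days.
print(estimate_half_life(1000, 250, 60))
```

A meme with a half-life of a century would, by the same formula, still retain over 99% of its activity after a year, which makes the contrast with a month-long beme concrete.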

Lots of people, including our ebiquity research group, are studying how ideas and opinions spread through social media. It’s an interesting problem with many practical applications. I think this framing of the problem is broader than the beme idea. Yet it’s useful to have a short term, free of existing meanings, for the spread of mental objects via the Internet. There are several basic problems to attack: how can we recognize new bemes, can we track them back to their source or sources, who is influential in their spread, can the spread be controlled, can we find relations between bemes, what happens when bemes compete or cooperate, how can we track their mutation and evolution, can they reproduce sexually, is there a ‘natural selection’ at work for bemes, are they like selfish genes, what are good metrics to measure their strength, and how do bemes expire?