Wikipedia experiments with trusted editors to approve revisions

July 18th, 2008

The NYT Bits blog has a post, Wikipedia Tries Approval System to Reduce Vandalism on Pages, on Wikipedia’s proposed Flagged revisions/Sighted versions policy. The policy is already in use on the German version of Wikipedia.

“Wikipedia is considering a basic change to its editing philosophy to cut down on vandalism. In the process, the online encyclopedia anyone can edit would add a layer of hierarchy and eliminate some of the spontaneity that has made the site, at times, an informal source of news. It well could bring some law and order to the creative anarchy that has made the site a runaway success but also made it a target for familiar criticism.

The German site, which is particularly vexed by vandalism, uses the system to delay changes from appearing until someone in authority (a designated checker) has verified that the changes are not vandalism. Once a checker has signed off on the changes, they will appear on the site to any visitor; before a checker has signed off, the last, checker-approved version is what most visitors will see. (There are complicated exceptions, of course. When a “checker” makes a change, it appears immediately. And registered users, who make up less than 5 percent of Wikipedia users, will also see “unchecked” versions.)”

The process adds a new category of Wikipedians, Surveyors, who are “trusted editors” able to review the tentative modifications and promote them to be “sighted pages”. There is a public test-wiki for the English Wikipedia that allows people to try out the new software.
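The display rule the quote describes is simple enough to sketch as code. This is a toy simplification under my own names, not the actual MediaWiki FlaggedRevs implementation: checker edits are sighted immediately, registered users see the newest revision, and anonymous visitors see the last sighted one.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Article:
    revisions: list = field(default_factory=list)   # newest last
    last_sighted: Optional[str] = None              # last checker-approved revision

    def edit(self, text, editor_is_checker=False):
        self.revisions.append(text)
        if editor_is_checker:            # a checker's own edit appears immediately
            self.last_sighted = text

    def sight(self, approve):
        """A checker reviews the newest revision."""
        if approve:
            self.last_sighted = self.revisions[-1]

    def view(self, registered=False):
        # Registered users see the newest (possibly unchecked) revision;
        # anonymous visitors see the last sighted one, if any exists.
        if registered or self.last_sighted is None:
            return self.revisions[-1]
        return self.last_sighted

a = Article()
a.edit("v1", editor_is_checker=True)   # sighted immediately
a.edit("v2 (possible vandalism)")      # unchecked edit by an anonymous user
print(a.view())                  # anonymous visitor -> "v1"
print(a.view(registered=True))   # registered user   -> "v2 (possible vandalism)"
```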

Adding a system of positive endorsement from trusted editors is an interesting approach that I think could work well. It’s not invulnerable to subversion and gaming, but few non-oppressive systems are. Wikipedia works as well as it does because most people are usually reasonably cooperative. Even with the three qualifiers in the previous sentence, it works pretty well.

We are on a new server

July 11th, 2008

After several months of procrastination, we’re on a new server. Nicer, faster, hopefully more secure. Thanks to Filip, who helped make the transition painless!

A URL shortener with semantic and geo-spatial analysis

July 9th, 2008

The service is a URL shortener like TinyURL with a host of interesting features, as enumerated in the switchAbit blog.

1. History — we remember the last 15 shortened URLs you’ve created. They’re displayed on the home page next time you go back. Cookie-based.
2. Click/Referrer tracking — Every time someone clicks on a short URL we add 1 to the count of clicks for that page and for the referring page.
3. There’s a simple API for creating short URLs from your web apps.
4. We automatically create three thumbnail images for each page you link through, small, medium and large size. You can use these in presenting choices to your users.
5. We automatically mirror each page; you never know when you might need a backup.
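The core of features 2 and 3, short-code generation and click counting, fits in a few lines. This sketch uses names and a base-62 scheme of my own invention; it is not the service's actual implementation.

```python
import string
from collections import defaultdict

ALPHABET = string.ascii_letters + string.digits  # 62 symbols

class Shortener:
    def __init__(self):
        self.urls = []                      # numeric id -> long URL
        self.clicks = defaultdict(int)      # short code -> click count

    def shorten(self, long_url):
        self.urls.append(long_url)
        n, code = len(self.urls) - 1, ""
        while True:                         # encode the numeric id in base 62
            n, r = divmod(n, 62)
            code = ALPHABET[r] + code
            if n == 0:
                break
        return code

    def resolve(self, code):
        n = 0
        for ch in code:                     # decode base 62 back to the id
            n = n * 62 + ALPHABET.index(ch)
        self.clicks[code] += 1              # feature 2: count every click
        return self.urls[n]

s = Shortener()
code = s.shorten("http://example.com/a-very-long-url")
print(s.resolve(code))     # http://example.com/a-very-long-url
print(s.clicks[code])      # 1
```

A real service would also persist the tables and record the HTTP referrer alongside each click, as feature 2 describes.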

A post in ReadWriteWeb, Please Use This TinyURL of the Future, points out some interesting ‘semantic’ features.

“In the background, [the service] is analyzing all of the pages that its users create shortcuts to, using the Open Calais semantic analysis API from Reuters! Calais is something we’ve written about extensively here. [It] will use Calais to determine the general category and specific subjects of all the pages its users create shortcuts to. That information will be freely available to the developer community using XML and JSON APIs as well.
    As if that’s not a whole lot of awesome already, [the service] is also using the MetaCarta GeoParsing API to draw geolocation data out of all the web pages it collects.
    You want to see all the web pages related to the US Presidential election, Barack Obama and Asheville, North Carolina? Or about Technology, Google and The Dalles, Oregon? That will be what [it] delivers if it can build up a substantial database of pages. Once it does, it will open that data up to other developers as well.”

The idea of using a URL shortening service to identify significant or interesting Web pages for further processing is a new twist. It would be great if other services with catalogs of interesting pages did this as well. Eventually, this kind of analysis will be done on the entire Web, but for now it’s too expensive. This is an interesting intermediate step.
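The topic-and-place queries ReadWriteWeb describes amount to intersecting inverted indexes over the tagged pages. Here is a minimal sketch; the function names and sample tag data are invented for illustration, not drawn from any real API.

```python
from collections import defaultdict

topic_index = defaultdict(set)   # topic -> set of page URLs
place_index = defaultdict(set)   # place -> set of page URLs

def ingest(url, topics, places):
    """Record the topics and places extracted for one shortened page."""
    for t in topics:
        topic_index[t].add(url)
    for p in places:
        place_index[p].add(url)

def query(topics=(), places=()):
    """Return pages tagged with ALL the given topics and places."""
    sets = [topic_index[t] for t in topics] + [place_index[p] for p in places]
    return set.intersection(*sets) if sets else set()

# Invented sample data in the spirit of the article's example:
ingest("http://example.org/1", ["Politics", "Barack Obama"], ["Asheville, North Carolina"])
ingest("http://example.org/2", ["Technology", "Google"], ["The Dalles, Oregon"])
print(query(topics=["Barack Obama"], places=["Asheville, North Carolina"]))
```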

BBC interviews Tim Berners-Lee on Semantic Web

July 9th, 2008

The BBC broadcast an interview with Tim Berners-Lee on the future of the Internet. In the interview he talks about the Linking Open Data paradigm. On a related note, MIT recently announced that Tim has been named the 3Com Founders Professor of Engineering in the School of Engineering, with a joint appointment in the Department of Electrical Engineering and Computer Science.

HealthMap mines text for a global disease alert map

July 8th, 2008

HealthMap is an interesting Web site that displays a “global disease alert map” based on information extracted from a variety of text sources on the Web, including news, WHO and NGOs. HealthMap was developed as a research project by Clark Freifeld and John Brownstein of the Children’s Hospital Informatics Program, part of the Harvard-MIT Division of Health Sciences & Technology.


Their site says

“HealthMap brings together disparate data sources to achieve a unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health. This freely available Web site integrates outbreak data of varying reliability, ranging from news sources (such as Google News) to curated personal accounts (such as ProMED) to validated official alerts (such as World Health Organization). Through an automated text processing system, the data is aggregated by disease and displayed by location for user-friendly access to the original alert. HealthMap provides a jumping-off point for real-time information on emerging infectious diseases and has particular interest for public health officials and international travelers.”

The work was done in part with support from Google, as described in a story on ABC News, Researchers Track Disease With Google News, Money.

Twitterment, domain grabbing, and grad students who could have been rich!

July 8th, 2008

Here at Ebiquity, we’ve had a number of great grad students. One of them, Akshay Java, hacked out a search engine for Twitter posts around early April last year and named it Twitterment. He blogged about it here first. He did it without the benefit of the XMPP updates, by parsing the public timeline. It got talked about in the blogosphere (including by Scoble), got some press, and there was an article in the MIT Tech Review that used his visualization of some of the Twitter links. It even got talked about in Wired’s blog, something we found out only yesterday. We were also told that three days after the post in Wired’s blog, someone somewhere registered the domain (I won’t feed them pagerank by linking!) and set up a page that looks very similar to Akshay’s. It has Google Adsense, and of course just passes the query to Google with a site restriction to Twitter. So they’re poaching coffee and cookie money from the students in our lab 🙂
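Building a Twitter search this way boils down to repeatedly polling the public timeline and maintaining an inverted index; since the timeline only returns recent posts, each poll overlaps the last and duplicates must be skipped. A toy sketch of the idea (my own structure, not Akshay's actual code; the fetch function stands in for the 2008-era HTTP call):

```python
from collections import defaultdict

class TweetIndex:
    """Toy inverted index over tweets, in the spirit of Twitterment's
    public-timeline polling (no XMPP firehose needed)."""

    def __init__(self):
        self.seen = set()                 # tweet ids already indexed
        self.index = defaultdict(list)    # word -> list of tweet ids
        self.tweets = {}                  # id -> (user, text)

    def poll(self, fetch_timeline):
        """fetch_timeline() returns recent tweets as (id, user, text) tuples."""
        for tid, user, text in fetch_timeline():
            if tid in self.seen:          # polls overlap; skip duplicates
                continue
            self.seen.add(tid)
            self.tweets[tid] = (user, text)
            for word in text.lower().split():
                self.index[word].append(tid)

    def search(self, word):
        return [self.tweets[t] for t in self.index[word.lower()]]

idx = TweetIndex()
idx.poll(lambda: [(1, "akshay", "semantic web search"), (2, "tim", "linked data")])
idx.poll(lambda: [(2, "tim", "linked data"), (3, "eb", "search engines")])  # overlap
print(idx.search("search"))   # [('akshay', 'semantic web search'), ('eb', 'search engines')]
```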

So of course we played with Akshay’s hack, hosted it on one of our university boxes for a few months, but didn’t really have the bandwidth, compute, or time resources to keep it up. Startups such as Summize appeared later and provided similar functionality. For the last week or two we’ve been moving the Twitterment code to Amazon’s cloud to restart the service. Of course, today comes the news that Twitter might buy Summize, quasi-confirmed by Om Malik. Lesson to you grad students — if you come up with something clever, file an invention disclosure with your university’s tech transfer folks. And don’t listen to your advisors if they think that there isn’t a paper in what you’ve hacked — there may yet be a few million dollars in it 🙂

FringeDC Land of Lisp, 6pm 7/12/08, DC

July 6th, 2008

This FringeDC meeting looks like fun for Lispers in the DC area.

“Conrad Barski will be presenting excerpts from his new book for community feedback. Join us at Sova Espresso & Wine for a presentation from Conrad Barski, M.D. from the new book “Land of Lisp” published by No Starch Press, due this Fall. We’ll discuss Lisp and see never-before-seen comics and game examples from the book! Afterward, we’ll be talking over some wine, coffee and food at this great little hangout in DC’s H Street Corridor.”

FringeDC meeting: Land of Lisp, 12 July 2008

New FIPA/OMG standards for agents

July 6th, 2008

Jim Odell, the acting chair of the FIPA IEEE Computer Society standards committee, recently sent out an update to the members on current activities.

“FIPA is currently working with the OMG on agent standardization, including an SOA standard that includes agents (SOA-Pro) and an Agent Metamodel and Profile (AMP). The Agent Metamodel and Profile RFP has many companies that are participating, including (but not limited to): HP, Unisys, CSC, Deere & Co, Thales, Metropolitan Life, SINTEF, and DFKI. If you are interested in participating, please let me know.

Any comments on the Agent Metamodel and Profile (AMP) RFP are welcomed. (The above companies and RMIT have already submitted their suggestions.) The current release can be downloaded from:”

The OMG Agent Platform Special Interest Group page maintains links to documents about these emerging agent standards.

Textbook piracy via BitTorrent on the rise

July 2nd, 2008

The Chronicle of Higher Education has a story on students using BitTorrent to share scanned copies of textbooks. The article, Textbook Piracy Grows Online, Prompting a Counterattack From Publishers, starts off

“College students are increasingly downloading illegal copies of textbooks online, employing the same file-trading technologies used to download music and movies. Feeling threatened, book publishers are stepping up efforts to stop the online piracy. One Web site, called Textbook Torrents, promises more than 5,000 textbooks for download in PDF format, complete with the original textbook layout and full-color illustrations. Users must simply set up a free account and download a free software program that uses a popular peer-to-peer system called BitTorrent. Other textbook-download sites are even easier to use, offering digital books at the click of a mouse.”

Textbooks are an interesting niche for file sharing. They are certainly expensive, and publishers manage to publish new editions of popular titles almost every year, undermining the market for used texts. On the other hand, digitizing a textbook requires scanning it, which takes time, attention to detail, equipment, and labor. It’s not as simple as ripping a CD.

Update 7/7/08: The Chronicle of Higher Education has a follow-up story, Founder of Textbook-Download Site Says Offering Free Copyrighted Textbooks Is Act of ‘Civil Disobedience’.

“… But the founder of Textbook Torrents calls his actions “civil disobedience” against “the monopolistic business practices” of textbook publishers. The site’s founder, who asked to remain anonymous for fear of legal action against him, talked to The Chronicle over an Internet phone call last night and defended his creation, though he described it as operating in a “legal gray area.” He said he is an undergraduate at a college outside of the United States, though he would not name the institution or country, and that he operates the Web site from there. His biggest complaint: that textbooks are just too expensive, and that prices climb each year. “We’re showing both students and textbook publishers that this isn’t acceptable anymore,” he said. “A lot of users are absolutely fed up with the system.” He said he views the 64,000 registered users of his textbook-download site as votes against that system.”

Spammers are using Amazon EC2

July 1st, 2008

The Washington Post’s Security Fix blog has a post, Amazon: Hey Spammers, Get Off My Cloud!, reporting on allegations that spammers are starting to use Amazon’s Elastic Compute Cloud (EC2) servers. It only makes sense — you can sign up easily without committing to a contract of any length, the price is low, and the IP addresses are drawn from a wide range, making it hard to block them all. Besides, if Amazon’s EC2 IP addresses all get put in a spam blacklist, it will be bad for their many legitimate users. It may be tricky for Amazon to police this.
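The blacklisting dilemma is easy to see in code: cloud addresses come from a few large published blocks, so a range-based block catches legitimate users along with the spammers. A quick membership check using Python's standard library (the CIDR ranges below are made-up documentation examples, not Amazon's actual blocks):

```python
import ipaddress

# Hypothetical example ranges; a real list would come from the provider.
CLOUD_RANGES = [ipaddress.ip_network("203.0.113.0/24"),
                ipaddress.ip_network("198.51.100.0/24")]

def from_cloud(ip):
    """True if the address falls inside any of the published cloud blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUD_RANGES)

print(from_cloud("203.0.113.42"))   # True  -> but spammer or legitimate user?
print(from_cloud("192.0.2.1"))      # False
```

The check itself is trivial; the policy problem is that a True answer tells you nothing about whether the sender is one of the many legitimate EC2 customers.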

Blog comment spam magnet

July 1st, 2008

A good fraction of the comment spam that makes it through our Akismet filter is from people trying to add a comment to one of our posts about spam blogs or comments. Here’s an example from today’s batch, a comment on a two-year-old post, Blog comment spam with plagiarized text: hard to spot, from someone in Cameroon trying to promote their site.

“spam is a real problem in this day not just for .edu but for the entire internet world. Plagiarism is a problem too.”

It’s easy for me to classify this as spam: the comment was made on a very old post, is short, includes a reference to a site that looks commercial, and makes a few general and superficial statements that are not really tied to any of the post’s details.
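Those cues translate directly into a crude rule-based score; here is a toy version, with thresholds and weights that are entirely invented for illustration.

```python
from datetime import date

def spam_score(comment_text, link_count, post_date, today=date(2008, 7, 1)):
    """Crude comment-spam heuristic using the cues above; weights invented."""
    score = 0
    if (today - post_date).days > 365:      # comment on a very old post
        score += 2
    if len(comment_text.split()) < 25:      # short, generic text
        score += 1
    score += link_count                     # commercial-looking links
    generic = ("great post", "nice site", "is a real problem")
    if any(p in comment_text.lower() for p in generic):
        score += 1                          # superficial boilerplate phrasing
    return score

c = ("spam is a real problem in this day not just for .edu "
     "but for the entire internet world.")
print(spam_score(c, link_count=1, post_date=date(2006, 6, 15)))  # 5 -> likely spam
```

Real filters like Akismet combine far more signals, but even a scorer this simple separates the example above from an on-topic comment on a fresh post.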

I think it’s ironic that so many SEO wannabes try to spam posts about spam. I guess they just have spam on the brain. So, I offer up this post as food for the comment spammers and their search and comment tools.

akismet, anti-spam, antispam, automated, automated, automatic, backlink, backlinks, bad behavior, blacklist, block, blocking, blog, blogging, capcha, comment, comment spam, comments, human, keywords, links, links, nofollow, pagerank, people, plagiarize, plagiarism, rank, search engine optimization, seo, spam, spam blogs, spam comments, spam karma, spamming, splog, splog, splogs, steal, target, trackbacks, traffic, typepad, wordpress.

Splogs and politics

July 1st, 2008

Here’s something I never expected: splogs as a political issue. Actually, it’s allegations of political blogs being splogs — or rather, allegations that political blogs were accused of being splogs in order to get Google to block them. The NYT Bits blog has a post, Google and the Anti-Obama Bloggers, that describes the controversy.

“Did Google use its network of online services to silence critics of Barack Obama? That was the question buzzing on a corner of the blogosphere over the last few days, after several anti-Obama bloggers were unable to update their sites, which are hosted on Google’s Blogger service. … In an article that appeared on, the reporter Simon Owens spoke with some of the affected bloggers, who said they believed that Google had fallen prey to a campaign by activists supporting Senator Obama. According to the bloggers, the Obama supporters had clicked on a “flag” on the anti-Obama blogs alerting Google that they were spam.”

Maybe this is a good reason to rely on the judgment of machines, at least until they start running for office.