Wikipedia’s mobile site has officially launched and is running on a new server (in Ruby!).
Currently the site supports four mobile platforms: iPhone, Kindle, Android, and Palm Pre. Only the English and German versions are up, but support for more languages is said to be coming.
If you visit a Wikipedia page from a supported mobile device, you will be automatically redirected to the mobile version. You can click through to the regular page for editing or accessing other features not included in the mobile transcoding (e.g., history). You can also permanently disable the mobile redirects for your device, if you like.
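The redirect logic described above might look roughly like the following. This is only a sketch under my own assumptions, not Wikipedia's actual implementation: the User-Agent substrings, the `opted_out` flag and the function name `mobile_redirect` are all hypothetical, and the `en.m.wikipedia.org` host pattern is generalized from the English mobile URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed User-Agent substrings for the four supported platforms
# (illustrative only; real device detection is more involved).
SUPPORTED_DEVICE_TOKENS = ("iPhone", "Kindle", "Android", "webOS")

# Only the English and German versions are up so far.
SUPPORTED_LANGUAGES = ("en", "de")


def mobile_redirect(url, user_agent, opted_out=False):
    """Return the mobile URL to redirect to, or None to serve the regular page."""
    if opted_out:
        # The user has permanently disabled mobile redirects for this device.
        return None
    if not any(token in user_agent for token in SUPPORTED_DEVICE_TOKENS):
        return None
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    # Expect a host of the form <lang>.wikipedia.org.
    if len(labels) != 3 or labels[1:] != ["wikipedia", "org"]:
        return None
    if labels[0] not in SUPPORTED_LANGUAGES:
        return None
    mobile_host = f"{labels[0]}.m.wikipedia.org"
    return urlunsplit(
        (parts.scheme, mobile_host, parts.path, parts.query, parts.fragment)
    )
```

Calling `mobile_redirect("http://en.wikipedia.org/wiki/Alan_Turing", "Mozilla/5.0 (iPhone; U)")` returns `http://en.m.wikipedia.org/wiki/Alan_Turing`, while a desktop browser or an opted-out device gets `None` and stays on the regular page.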
You can get some idea of how the page rendering is simplified by looking at a page like http://en.m.wikipedia.org/wiki/Alan_Turing in a non-mobile browser. But the device-specific encoding makes this work much better on each device.
I like the way it looks on my Palm Pre, which differs from the iPhone encoding, and I think it will make Wikipedia much more usable on that device.
The special issue invites contributions that show how synergies between Semantic Web and Web 2.0 techniques can be successfully used. Since both communities work on network-like data structures, analysis methods from different fields of research could form a link between those communities. Relevant techniques include, but are not limited to, social network analysis, graph analysis, machine learning and data mining methods.
Relevant topics include:
ontology learning from Web 2.0 data
instance extraction from Web 2.0 systems
analysis of Blogs
discovering social structures and communities
predicting trends and user behaviour
analysis of dynamic networks
using content of the Web for modelling
discovering misuse and fraud
network analysis of social resource sharing systems
analysis of folksonomies and other Web 2.0 data structures
Wikinvest is a free, community-driven site that “wants to make investing easier by creating the world’s best source of investment information and investment tools”.
“Following the model of Wikipedia, the online encyclopedia that anyone can edit, Wikinvest is building a database of user-generated investment information on popular stocks. A senior at Yale writes about the energy industry, for example, while a former stockbroker covers technology and a mother in Arizona tracks children’s retail chains.
Wikinvest, which recently licensed some content to the Web sites of USA Today and Forbes, seeks to be an alternative to Web portals that are little more than “a data dump” of income statements and government filings, said Parker Conrad, a co-founder.
Users annotate stock charts with notes explaining peaks and valleys, edit company profiles and opine about whether to buy or sell. The site is creating a wire service with articles from finance blogs and building a cheat sheet to guide readers through financial filings by defining terms and comparing a company’s performance to competitors’.”
After a quick look, the site does look interesting. I may well be ready to trust the wisdom of the crowds over the platitudes of the pundits. The Microsoft article has a lot of useful data, lays out reasons both to buy and to sell, and lets registered members vote on whether they agree or not. Of course, I thought the reasons offered on both sides were valid; rather than being treated as simple propositions, their validity needs to be quantified.
For what it is worth, I note that the site is using MediaWiki. I wonder if there are unique opportunities to incorporate RDF and/or RDFa into such a site, perhaps encoding or annotating their WikiData.
This year’s Text Analysis Conference (TAC) has an interesting track focused on processing text to populate Wikipedia infoboxes, both for existing entities with missing values as well as newly discovered entities.
TAC has been run by the US National Institute of Standards and Technology (NIST) to encourage research in natural language processing and related applications. As in the NIST-sponsored MUC, TREC and ACE workshops, this is done by providing a large test collection, common evaluation procedures, and a forum for organizations to share their results. The first TAC was held this year and included 65 teams from 20 countries who participated in three tracks: question answering, summarization and recognizing textual entailments.
“The goal of the new Knowledge Base Population track is to augment an existing knowledge representation with information about entities that is discovered from a collection of documents. A snapshot of Wikipedia infoboxes will be used as the original knowledge source, and participants will be expected to fill in empty slots for entities that do exist, add missing entities and their learnable attributes, and provide links between entities and references to text supporting extracted information. The KBP task lies at the intersection of Question Answering and Information Extraction and is expected to be of particular interest to groups that have participated in ACE or TREC QA.”
This is an exciting task, and doing well in it will require a mixture of language processing, knowledge-based processing and (probably) machine learning.
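To make the "fill in empty slots" part of the task concrete, here is a toy sketch. It assumes an infobox is a simple dictionary whose empty slots hold `None`, and that some upstream extractor (not shown) has already produced `(slot, value, source_document)` triples. The function name and data layout are my own illustration, not anything specified by the track.

```python
def fill_empty_slots(infobox, extractions):
    """Fill empty infobox slots from extracted candidates.

    infobox:     dict mapping slot name -> value, with None marking an empty slot
    extractions: iterable of (slot, value, source_document) triples
    Returns a new infobox plus a dict recording which document supports
    each newly filled slot. Existing values are never overwritten, and
    slots the infobox does not define are ignored.
    """
    filled = dict(infobox)
    provenance = {}
    for slot, value, source in extractions:
        if slot in filled and filled[slot] is None:
            filled[slot] = value
            provenance[slot] = source
    return filled, provenance
```

The first candidate for a slot wins here, which is the simplest possible policy. A real system would need to rank competing candidates, and the hard part, of course, is the extraction itself rather than this bookkeeping.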
The TAC 2009 workshop will be co-located with TREC and held 16-17 November in Gaithersburg, MD. If you are interested in participating, you should register by March 3.
Wikirage is yet another way to track what’s happening in the world via changes in social media, in this case, Wikipedia. As the site suggests, “popular people in the news, the latest fads, and the hottest video games can be quickly identified by monitoring this social phenomenon.”
Wikirage lists the 100 Wikipedia pages that are being heavily edited over any of six time periods from the last hour to the last month. You can see the top 100 by your choice of six metrics: number of quality edits, unique editors, total edits, vandalism, reversions, or undos. Clicking on a result shows a monthly summary for the article, for example, December 2008 Gaza Strip airstrikes, which is at the top of today’s list for number of edits as I write. I understand the Gaza article, but what’s up with the Tasmanian tiger?
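The ranking itself is easy to picture. Here is a minimal sketch, assuming per-page counts for a chosen time window have already been collected into a dictionary. The metric names and the function `top_pages` are my own labels, not Wikirage's code.

```python
def top_pages(stats, metric, limit=100):
    """Return the titles of the `limit` pages with the highest value of `metric`.

    stats:  dict mapping page title -> dict of metric name -> count
            for one time window (e.g. the last hour or the last month)
    metric: e.g. "total_edits", "unique_editors", "reversions"
    Pages missing the metric are treated as having a count of zero.
    """
    ranked = sorted(
        stats.items(),
        key=lambda item: item[1].get(metric, 0),
        reverse=True,
    )
    return [title for title, _ in ranked[:limit]]
```

Switching the `metric` argument is all it takes to move between the different top-100 views; the time period would be handled by whatever builds `stats`.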
The interface has some other nice features, such as marking pages in red that have high revision, vandalism or undo rates and showing associated Wikipedia flags that indicate articles that need attention or don’t live up to standards. Wikirage is also available for the English, Japanese, Spanish, German and French language Wikipedias.
Wikirage was developed by Craig Wood and is a nicely done system.
Scientific journals are undergoing rapid evolution as they adapt to the Web and various forms of social media. As reported by Nature (Publish in Wikipedia or perish) and in ReadWriteWeb, the journal RNA Biology is experimenting with a connection to Wikipedia. Articles submitted for publication about new RNA molecules must also include a draft Wikipedia page that summarizes the work. The journal will then peer review the page before publishing it in Wikipedia.
Here are the guidelines from the RNA Biology site:
“To be eligible for publication the Supplementary Material must contain: (1) a link to a Wikipedia article preferably in a User’s space. Upon acceptance this can easily be moved into Wikipedia itself together with a reference to the published article.
…
At least one stub article (essentially an extended abstract) for the paper should be added to either an author’s userspace at Wikipedia (preferred route) or added directly to the main Wikipedia space (be sure to add literature references to avoid speedy deletion). This article will be reviewed alongside the manuscript and may require revision before acceptance. Upon acceptance the former articles can easily be exported to the main Wikipedia space. See below for guidelines on how to do this. Existing articles can be updated in accordance with the latest published results.”
This is definitely an interesting and forward-looking idea. Yet, I cannot help having the cynical thought that it’s also a great way for the journal to boost its page rank.
David Huynh completed his PhD at MIT CSAIL last year and joined MetaWeb a few months ago, where he has been working on new and better interfaces to explore the data encoded in their Freebase system. He recently released Parallax as a prototype browsing interface for Freebase. Here is a video that shows the interface in action.
Freebase is “an open database of the world’s information” that is constructed by a Wiki-like collaborative community. In many ways it is like the Semantic Web model, with two big differences: (1) the data is stored centrally rather than distributed across the Web and (2) the representation system is not based on RDF but rather uses a custom built object-oriented data representation language.
Freebase is a great resource. Much of the data is extracted from Wikipedia, so its content has a large overlap with DBpedia. But it is also relatively easy to upload additional information in various structured forms and many have done so, resulting in an extended coverage.
This is clearly a system in the Web of Data space, along with the Linking Open Data effort, and having it available should give us all a way to explore the consequences of some of the underlying design decisions.
“Wikipedia is considering a basic change to its editing philosophy to cut down on vandalism. In the process, the online encyclopedia anyone can edit would add a layer of hierarchy and eliminate some of the spontaneity that has made the site, at times, an informal source of news. It well could bring some law and order to the creative anarchy that has made the site a runaway success but also made it a target for familiar criticism.
…
The German site, which is particularly vexed by vandalism, uses the system to delay changes from appearing until someone in authority (a designated checker) has verified that the changes are not vandalism. Once a checker has signed off on the changes, they will appear on the site to any visitor; before a checker has signed off, the last, checker-approved version is what most visitors will see. (There are complicated exceptions, of course. When a “checker” makes a change, it appears immediately. And registered users, who make up less than 5 percent of Wikipedia users, will also see “unchecked” versions.)”
The process adds a new category of Wikipedians, Surveyors, who are “trusted editors” able to review the tentative modifications and promote them to be “sighted pages”. There is a public test-wiki for the English Wikipedia that allows people to try out the new software.
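The rule for which version a visitor sees can be sketched in a few lines. This is my own simplification of the description quoted above, not the actual MediaWiki extension: the `Revision` class and `revision_shown` function are hypothetical, and falling back to the latest revision when nothing has been sighted yet is an assumption on my part.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Revision:
    rev_id: int
    text: str
    sighted: bool  # True once a checker has signed off (or made the edit)


def revision_shown(
    history: Sequence[Revision], viewer_registered: bool
) -> Optional[Revision]:
    """Pick the revision to display; `history` is ordered oldest to newest."""
    if not history:
        return None
    if viewer_registered:
        # Registered users also see unchecked versions.
        return history[-1]
    for rev in reversed(history):
        if rev.sighted:
            # Most visitors see the last checker-approved version.
            return rev
    # Assumption: with no sighted version at all, show the latest one.
    return history[-1]
```

A change made by a checker would simply be stored with `sighted=True`, which is why it appears immediately.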
Adding a system of positive endorsement from trusted editors is an interesting approach that I think could work well. It’s not invulnerable to subversion and gaming, but few non-oppressive systems are. Wikipedia works as well as it does because most people are usually reasonably cooperative. Even with the three qualifiers in the previous sentence, it works pretty well.
Company’s site invites public to contribute, wiki-style, with rules to guard credibility
When I was young, encyclopedias were the Web. I was aware that there was a hierarchy of encyclopedias, with the World Book serving the low end for young students and the Encyclopedia Britannica for those in high school and beyond. The Britannica was so intellectual that it even uses funny letters in its name: Encyclopædia Britannica. My family had a mid-range encyclopedia set (Collier’s) and I spent many hours lost browsing through it.
Britannica started a Web version, Britannica online, in 1996 that is primarily a paid service ($70/year) with more limited free services. Now they are opening up their pages to allow the public to make suggested additions and changes, as reported in the San Jose Mercury News, Britannica opens its online pages.
“By inviting a larger range of people to contribute and collaborate, we can produce more coverage,” said Britannica spokesman Tom Panelas. “People in the community can contribute to the improvement of Encyclopedia Britannica.”
…
The new site will not be a free-for-all. The core encyclopedia will continue to be edited and will bear the imprimatur “Britannica Checked.” But Britannica will now let outsiders create articles, essays and multi-media presentations. There will be proper attribution. And Britannica still keeps gatekeepers; don’t expect an entry on “Baywatch” actress Pamela Anderson.
Earlier this year, Britannica announced a program granting free access to bloggers and online journalists.
“Bloggers, webmasters, online journalists and anyone else who publishes regularly on the Internet can now get free subscriptions to Britannica Online (www.britannica.com). Anyone interested in participating in Britannica’s new WebShare initiative can apply for a free subscription at http://signup.eb.com or get more information at http://britannicanet.com.”
These are clearly smart moves on Britannica’s part, as Wikipedia has shown that its users do a great job of keeping the entries accurate and up to date. A question in my mind is whether Britannica online’s paid subscriber base, even when augmented with free subscriptions, will be large enough, broad enough, and motivated enough to keep its entries current. A second issue is whether this commercial approach will benefit from the technological experimentation and enhancements that can come with an “open source” approach, e.g., what DBpedia and Freebase and others have done with Wikipedia content.
“Since about 2005 — and at an accelerating pace — Wikipedia has emerged as the leading online knowledge base for conducting semantic Web and related research. The system is being tapped for both data and structure. Wikipedia has arguably replaced WordNet as the leading lexicon for concepts and relations. Because of its scope and popularity, many argue that Wikipedia is emerging as the de facto structure for classifying and organizing knowledge in the 21st century.”
“Below is an incomplete list of academic conference presentations, peer-reviewed papers and other types of academic writing which focus on Wikipedia as their subject. Works that mention Wikipedia only in passing are unlikely to be listed. Unpublished works of presumably academic quality are listed in a dedicated section.”
“Social-media sites like Wikipedia and Digg are celebrated as shining examples of Web democracy, places built by millions of Web users who all act as writers, editors, and voters. In reality, a small number of people are running the show. According to researchers in Palo Alto, 1 percent of Wikipedia users are responsible for about half of the site’s edits. The site also deploys bots—supervised by a special caste of devoted users—that help standardize format, prevent vandalism, and root out folks who flood the site with obscenities. This is not the wisdom of the crowd. This is the wisdom of the chaperones.” (link)
The work cited is by the Augmented Social Cognition research group at PARC. See, for example, their post on the behavior of the most active Wikipedians. Very interesting.
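If you have per-user edit counts, checking a claim like "1 percent of users make about half of the edits" takes only a few lines. This is a back-of-the-envelope sketch of my own, not the PARC group's method; the function name and the rounding rule for the size of the top group are assumptions.

```python
def share_of_top_editors(edit_counts, fraction=0.01):
    """Fraction of all edits made by the most active `fraction` of users.

    edit_counts: iterable with one edit count per user
    fraction:    size of the top group, e.g. 0.01 for the top 1 percent
    The top group always contains at least one user.
    """
    counts = sorted(edit_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    top_n = max(1, round(len(counts) * fraction))
    return sum(counts[:top_n]) / total
```

With a made-up population of 100 users, one with 99 edits and the other 99 with a single edit each, `share_of_top_editors([99] + [1] * 99)` gives 0.5: the top 1 percent accounts for half of the edits.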
I think it’s even worse, in many ways, on Digg, which the article also discusses.
“The same undemocratic underpinnings of Web 2.0 are on display at Digg.com. Digg is a social-bookmarking hub where people submit stories and rate others’ submissions; the most popular links gravitate to the site’s front page. The site’s founders have never hidden that they use a “secret sauce”—a confidential algorithm that’s tweaked regularly—to determine which submissions make it to the front page. Historically, this algorithm appears to have favored the site’s most active participants. Last year, the top 100 Diggers submitted 44 percent of the site’s top stories. In 2006, they were responsible for 56 percent.” (link)
Will rule by the few always be the case? Who knows. The article does point out that the moderation system used by Slashdot helps to broaden the elite and also describes a simple “write one, rate two” policy used by Helium, a site new to me. Helium is a community for freelance writers that helps them connect with publishers who will pay for articles on their topics. The publishers are vetted, so students seeking to buy term papers will have to look elsewhere.