UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments

Archive for the 'Wikipedia' Category

Wikipedia mobile launches for iPhone, Palm Pre, Android and Kindle

July 5th, 2009, by Tim Finin, posted in Mobile Computing, Social media, Web, Wikipedia

Wikipedia’s mobile site has been officially launched and is running on a new server (in Ruby!).

Currently the site supports four mobile platforms: iPhone, Kindle, Android, and Palm Pre. Only the English and German versions are up, but support for more languages is said to be coming.

If you visit a Wikipedia page from a supported mobile device, you will be automatically redirected to the mobile version. You can click through to the regular page for editing or accessing other features not included in the mobile transcoding (e.g., history). You can also permanently disable the mobile redirects for your device, if you like.

You can get some idea of how the page rendering is simplified by viewing a page like http://en.m.wikipedia.org/wiki/Alan_Turing in a non-mobile browser. But the device-specific encoding makes this work much better on each device.

I like the way it looks on my Palm Pre, which differs from the iPhone encoding, and think it will make Wikipedia much more usable from it.

(via ReadWriteWeb)

CFP: JWS special issue on Semantic Web and Social Media

June 27th, 2009, by Tim Finin, posted in Blogging, Semantic Web, Social media, Wikipedia
Important dates:

  • abstracts: 21 Sept 09
  • submissions: 01 Oct 09
  • notification: 15 Dec 09
  • final copy: 15 Jan 10
  • publication: April 10

The Journal of Web Semantics will publish a special issue on Data Mining and Social Network Analysis for integrating Semantic Web and Web 2.0 in the spring of 2010. The special issue will be edited by Bettina Berendt, Andreas Hotho and Gerd Stumme. Initial abstracts for papers must be submitted via the Elsevier EES system by September 21, 2009.

The special issue invites contributions that show how synergies between Semantic Web and Web 2.0 techniques can be successfully used. Since both communities work on network-like data structures, analysis methods from different fields of research could form a link between those communities. Techniques include, but are not limited to, social network analysis, graph analysis, machine learning and data mining methods.

Relevant topics include

  • ontology learning from Web 2.0 data
  • instance extraction from Web 2.0 systems
  • analysis of Blogs
  • discovering social structures and communities
  • predicting trends and user behaviour
  • analysis of dynamic networks
  • using content of the Web for modelling
  • discovering misuse and fraud
  • network analysis of social resource sharing systems
  • analysis of folksonomies and other Web 2.0 data structures
  • analysis of Web 2.0 applications and their data
  • deriving profiles from usage
  • personalized delivery of news and journals
  • Semantic Web personalization
  • Semantic Web technologies for recommender systems
  • ubiquitous data mining in Web (2.0) environment
  • applications

Wikinvest offers the wisdom of the investing crowds

February 9th, 2009, by Tim Finin, posted in Social media, Wikipedia

Wikinvest is a free, community driven site that “wants to make investing easier by creating the world’s best source of investment information and investment tools”.

A story in today’s NYT, Offering Free Investment Advice by Anonymous Volunteers, says

“Following the model of Wikipedia, the online encyclopedia that anyone can edit, Wikinvest is building a database of user-generated investment information on popular stocks. A senior at Yale writes about the energy industry, for example, while a former stockbroker covers technology and a mother in Arizona tracks children’s retail chains.

Wikinvest, which recently licensed some content to the Web sites of USA Today and Forbes, seeks to be an alternative to Web portals that are little more than “a data dump” of income statements and government filings, said Parker Conrad, a co-founder.

Users annotate stock charts with notes explaining peaks and valleys, edit company profiles and opine about whether to buy or sell. The site is creating a wire service with articles from finance blogs and building a cheat sheet to guide readers through financial filings by defining terms and comparing a company’s performance to competitors’.”

After a quick look at the site, it does look interesting. I may well be ready to trust the wisdom of the crowds over the platitudes of the pundits. The Microsoft article has a lot of useful data, lays out reasons both to buy and to sell, and lets registered members vote on whether they agree or not. Of course, I thought the reasons offered on both sides were valid; rather than being treated as simple propositions, their validity needs to be quantified.

For what it is worth, I note that the site is using MediaWiki. I wonder if there are unique opportunities to incorporate RDF and/or RDFa into such a site, perhaps encoding or annotating their WikiData.

Extracting Wikipedia infobox values from text

January 27th, 2009, by Tim Finin, posted in NLP, Semantic Web, Social media, Wikipedia

This year’s Text Analysis Conference (TAC) has an interesting track focused on processing text to populate Wikipedia infoboxes, both for existing entities with missing values and for newly discovered entities.

TAC is run by the US National Institute of Standards and Technology (NIST) to encourage research in natural language processing and related applications. As in the NIST-sponsored MUC, TREC and ACE workshops, this is done by providing a large test collection, common evaluation procedures, and a forum for organizations to share their results. The first TAC was held in 2008 and included 65 teams from 20 countries who participated in three tracks: question answering, summarization and recognizing textual entailment.

TAC 2009 will include a new track on Knowledge Base Population coordinated by Paul McNamee of the Johns Hopkins University Human Language Technology Center of Excellence.

“The goal of the new Knowledge Base Population track is to augment an existing knowledge representation with information about entities that is discovered from a collection of documents. A snapshot of Wikipedia infoboxes will be used as the original knowledge source, and participants will be expected to fill in empty slots for entities that do exist, add missing entities and their learnable attributes, and provide links between entities and references to text supporting extracted information. The KBP task lies at the intersection of Question Answering and Information Extraction and is expected to be of particular interest to groups that have participated in ACE or TREC QA.”

This is an exciting task, and doing well in it will require a mixture of language processing, knowledge-based processing and (probably) machine learning.

The TAC 2009 workshop will be co-located with TREC and held 16-17 November in Gaithersburg, MD. If you are interested in participating, you should register by March 3.

Wikirage tracks what’s hot on Wikipedia

December 30th, 2008, by Tim Finin, posted in Social media, Wikipedia


Wikirage is yet another way to track what’s happening in the world via changes in social media, in this case, Wikipedia. As the site suggests, “popular people in the news, the latest fads, and the hottest video games can be quickly identified by monitoring this social phenomenon.”

Wikirage lists the 100 Wikipedia pages that are being heavily edited over any of six time periods from the last hour to the last month. You can see the top 100 by your choice of six metrics: number of quality edits, unique editors, total edits, vandalism, reversions, or undos. Clicking on a result shows a monthly summary for the article, for example, December 2008 Gaza Strip airstrikes, which is at the top of today’s list for number of edits as I write. I understand the Gaza article, but what’s up with the Tasmanian tiger?
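A ranking of this kind could be sketched roughly as follows. This is only an illustration and not how Wikirage is actually built: the `Edit` record, its fields, the metric names and the `top_pages` function are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Edit:
    """One hypothetical edit record; the fields are assumptions for this sketch."""
    page: str
    editor: str
    when: datetime
    kind: str  # e.g. "edit", "revert", "undo", "vandalism"


def top_pages(edits, now, window, metric="total_edits", n=100):
    """Return up to n (page, score) pairs for edits made within `window` of `now`.

    metric is "total_edits", "unique_editors", or the name of one edit kind.
    """
    cutoff = now - window
    recent = [e for e in edits if cutoff <= e.when <= now]

    if metric == "unique_editors":
        editors = defaultdict(set)
        for e in recent:
            editors[e.page].add(e.editor)
        scores = {page: len(names) for page, names in editors.items()}
    else:
        # Count every edit, or only edits of the one kind named by the metric.
        wanted = None if metric == "total_edits" else metric
        scores = defaultdict(int)
        for e in recent:
            if wanted is None or e.kind == wanted:
                scores[e.page] += 1

    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

Calling `top_pages(edits, datetime.now(), timedelta(hours=1), metric="unique_editors")` would then give the pages touched by the most distinct editors in the last hour, under the assumptions above.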

The interface has some other nice features, such as marking pages in red that have high revision, vandalism or undo rates and showing associated Wikipedia flags that indicate articles that need attention or don’t live up to standards. Wikirage is available for the English, Japanese, Spanish, German and French language Wikipedias.

Wikirage was developed by Craig Wood and is a nicely done system.

(via the Porn Sex Viagra Casino Spam site)

Journal requires authors to include Wikipedia article with submissions

December 18th, 2008, by Tim Finin, posted in Social media, Web, Wikipedia

Scientific journals are undergoing rapid evolution as they adapt to the Web and various forms of social media. As reported by Nature (Publish in Wikipedia or perish) and in ReadWriteWeb, the journal RNA Biology is experimenting with a connection to Wikipedia. Articles submitted for publication about new RNA molecules must also include a draft Wikipedia page that summarizes the work. The journal will then peer review the page before publishing it in Wikipedia.

Here are the guidelines from the RNA Biology site:

“To be eligible for publication the Supplementary Material must contain: (1) a link to a Wikipedia article preferably in a User’s space. Upon acceptance this can easily be moved into Wikipedia itself together with a reference to the published article.

At least one stub article (essentially an extended abstract) for the paper should be added to either an author’s userspace at Wikipedia (preferred route) or added directly to the main Wikipedia space (be sure to add literature references to avoid speedy deletion). This article will be reviewed alongside the manuscript and may require revision before acceptance. Upon acceptance the former articles can easily be exported to the main Wikipedia space. See below for guidelines on how to do this. Existing articles can be updated in accordance with the latest published results.”

This is definitely an interesting and forward-looking idea. Yet I cannot help having the cynical thought that it’s also a great way for the journal to boost its page rank.

Parallax: a better interface for Freebase

August 14th, 2008, by Tim Finin, posted in KR, Ontologies, Semantic Web, Social media, Wikipedia

David Huynh completed his PhD at MIT CSAIL last year and joined MetaWeb a few months ago, where he has been working on new and better interfaces to explore the data encoded in their Freebase system. He recently released Parallax as a prototype browsing interface for Freebase. Here is a video that shows the interface in action.



Freebase Parallax: A new way to browse and explore data from David Huynh on Vimeo.

Freebase is “an open database of the world’s information” that is constructed by a Wiki-like collaborative community. In many ways it is like the Semantic Web model, with two big differences: (1) the data is stored centrally rather than distributed across the Web and (2) the representation system is not based on RDF but rather uses a custom built object-oriented data representation language.

Freebase is a great resource. Much of the data is extracted from Wikipedia, so its content has a large overlap with DBpedia. But it is also relatively easy to upload additional information in various structured forms and many have done so, resulting in an extended coverage.

This is clearly a system in the Web of Data space, along with the Linking Open Data effort, and having it available should offer us all a way to explore the consequences of some of the underlying design decisions.

Wikipedia experiments with trusted editors to approve revisions

July 18th, 2008, by Tim Finin, posted in Semantic Web, Social media, Wikipedia

The NYT Bits blog has a post, Wikipedia Tries Approval System to Reduce Vandalism on Pages, on Wikipedia’s proposed Flagged revisions/Sighted versions policy. This policy is currently being used in the German version of Wikipedia.

“Wikipedia is considering a basic change to its editing philosophy to cut down on vandalism. In the process, the online encyclopedia anyone can edit would add a layer of hierarchy and eliminate some of the spontaneity that has made the site, at times, an informal source of news. It well could bring some law and order to the creative anarchy that has made the site a runaway success but also made it a target for familiar criticism.

The German site, which is particularly vexed by vandalism, uses the system to delay changes from appearing until someone in authority (a designated checker) has verified that the changes are not vandalism. Once a checker has signed off on the changes, they will appear on the site to any visitor; before a checker has signed off, the last, checker-approved version is what most visitors will see. (There are complicated exceptions, of course. When a “checker” makes a change, it appears immediately. And registered users, who make up less than 5 percent of Wikipedia users, will also see “unchecked” versions.)”

The process adds a new category of Wikipedians, Surveyors, who are “trusted editors” able to review the tentative modifications and promote them to be “sighted pages”. There is a public test-wiki for the English Wikipedia that allows people to try out the new software.
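The visibility rule described in the quoted passage can be written down as a small sketch. It is an illustration of the description only, not the actual MediaWiki extension: the `Revision` class, its fields and the `visible_revision` function are hypothetical names, and the case where no revision has been approved yet is not covered by the description, so the sketch simply returns `None` there.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    """A hypothetical revision record for this sketch."""
    rev_id: int
    by_checker: bool = False  # a change made by a checker appears immediately
    sighted: bool = False     # a checker has signed off on this revision


def is_approved(rev: Revision) -> bool:
    """A revision counts as approved if a checker made it or signed off on it."""
    return rev.by_checker or rev.sighted


def visible_revision(history: Sequence[Revision], registered: bool) -> Optional[Revision]:
    """Pick the revision a visitor sees; `history` is ordered oldest to newest."""
    if not history:
        return None
    if registered:
        # Registered users also see unchecked versions, so show the newest one.
        return history[-1]
    # Everyone else sees the most recent approved revision.
    for rev in reversed(history):
        if is_approved(rev):
            return rev
    # Assumption: the description does not say what happens when nothing
    # has been approved yet, so this sketch leaves that case undecided.
    return None
```

With a history of one sighted revision followed by two unchecked ones, an anonymous visitor gets the first revision while a registered user gets the third.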

Adding a system of positive endorsement from trusted editors is an interesting approach that I think could work well. It’s not invulnerable to subversion and gaming, but few non-oppressive systems are. Wikipedia works as well as it does because most people are usually reasonably cooperative. Even with the three qualifiers in the previous sentence, it works pretty well.

Encyclopedia Britannica to let readers contribute, à la Wikipedia

June 18th, 2008, by Tim Finin, posted in Semantic Web, Social media, Web, Wikipedia

Company’s site invites public to contribute, wiki-style, with rules to guard credibility

When I was young, encyclopedias were the Web. I was aware that there was a hierarchy of encyclopedias, with the World Book serving the low end for young students and the Encyclopedia Britannica for those in high school and beyond. The Britannica was so intellectual that it even used funny letters in its name: Encyclopædia Britannica. My family had a mid-range encyclopedia set (Colliers) and I spent many hours lost in browsing through it.

Britannica started a Web version, Britannica online, in 1996 that is primarily a paid service ($70/year) with more limited free services. Now they are opening up their pages to allow the public to make suggested additions and changes, as reported in the San Jose Mercury News, Britannica opens its online pages.

“By inviting a larger range of people to contribute and collaborate, we can produce more coverage,” said Britannica spokesman Tom Panelas. “People in the community can contribute to the improvement of Encyclopedia Britannica.”

The new site will not be a free-for-all. The core encyclopedia will continue to be edited and will bear the imprimatur “Britannica Checked.” But Britannica will now let outsiders create articles, essays and multi-media presentations. There will be proper attribution. And Britannica still keeps gatekeepers; don’t expect an entry on “Baywatch” actress Pamela Anderson.

Earlier this year, Britannica announced a program granting free access to bloggers and online journalists.

“Bloggers, webmasters, online journalists and anyone else who publishes regularly on the Internet can now get free subscriptions to Britannica Online (www.britannica.com). Anyone interested in participating in Britannica’s new WebShare initiative can apply for a free subscription at http://signup.eb.com or get more information at http://britannicanet.com.”

These are clearly smart moves on Britannica’s part, as Wikipedia has shown that its users do a great job of keeping the entries accurate and up to date. A question in my mind is whether Britannica online’s paid subscriber base, even when augmented with free subscriptions, will be large enough, broad enough, and motivated enough to keep its entries current. A second issue is whether this commercial approach will benefit from the technological experimentation and enhancements that can come with an “open source” approach, e.g., what DBpedia and Freebase and others have done with Wikipedia content.

Wikipedia research papers

February 28th, 2008, by Tim Finin, posted in Semantic Web, Social media, Web, Wikipedia

Mike Bergman has a comprehensive list of about 100 papers on Wikipedia as a knowledge source.

“Since about 2005 — and at an accelerating pace — Wikipedia has emerged as the leading online knowledge base for conducting semantic Web and related research. The system is being tapped for both data and structure. Wikipedia has arguably replaced WordNet as the leading lexicon for concepts and relations. Because of its scope and popularity, many argue that Wikipedia is emerging as the de facto structure for classifying and organizing knowledge in the 21st century.”

This complements a similar list on Wikipedia itself, Wikipedia in academic studies.

“Below is an incomplete list of academic conference presentations, peer-reviewed papers and other types of academic writing which focus on Wikipedia as their subject. Works that mention Wikipedia only in passing are unlikely to be listed. Unpublished works of presumably academic quality are listed in a dedicated section.”

(spotted on the DBpedia mailing list)

Wisdom of the crowd control?

February 24th, 2008, by Tim Finin, posted in Social media, Web, Web 2.0, Wikipedia

Slate has an interesting article, The Wisdom of the Chaperones — Digg, Wikipedia, and the myth of Web 2.0 democracy, that explores who controls some of the popular social media sites. It turns out that the social web is more hegemonic than we thought.


“Social-media sites like Wikipedia and Digg are celebrated as shining examples of Web democracy, places built by millions of Web users who all act as writers, editors, and voters. In reality, a small number of people are running the show. According to researchers in Palo Alto, 1 percent of Wikipedia users are responsible for about half of the site’s edits. The site also deploys bots—supervised by a special caste of devoted users—that help standardize format, prevent vandalism, and root out folks who flood the site with obscenities. This is not the wisdom of the crowd. This is the wisdom of the chaperones.” (link)

The work cited is by the Augmented Social Cognition research group at PARC. See, for example, their post on the behavior of the most active Wikipedians. Very interesting.
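A concentration figure like the “1 percent of users, half of the edits” number can be computed from a plain list of who made each edit. The sketch below is only an illustration and is not the PARC group’s method; the function name `top_share` and the input format (one editor name per edit) are assumptions.

```python
from collections import Counter


def top_share(edit_authors, percent=1):
    """Fraction of all edits made by the most active `percent` of editors.

    `edit_authors` holds one editor name per edit (an assumed input format).
    At least one editor is always counted as "the top".
    """
    counts = Counter(edit_authors)
    if not counts:
        return 0.0
    # Integer arithmetic avoids floating-point surprises in the cutoff.
    k = max(1, (len(counts) * percent) // 100)
    top_edits = sum(count for _, count in counts.most_common(k))
    return top_edits / sum(counts.values())
```

For instance, if one editor out of a hundred made 99 edits and the other 99 editors made one edit each, `top_share` reports that the top 1 percent made half of the edits.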

I think it’s even worse, in many ways, on Digg, which the article also discusses.

“The same undemocratic underpinnings of Web 2.0 are on display at Digg.com. Digg is a social-bookmarking hub where people submit stories and rate others’ submissions; the most popular links gravitate to the site’s front page. The site’s founders have never hidden that they use a “secret sauce”—a confidential algorithm that’s tweaked regularly—to determine which submissions make it to the front page. Historically, this algorithm appears to have favored the site’s most active participants. Last year, the top 100 Diggers submitted 44 percent of the site’s top stories. In 2006, they were responsible for 56 percent.” (link)

Will rule by the few always be the case? Who knows. The article does point out that the moderation system used by Slashdot helps to broaden the elite and also describes a simple “write one, rate two” policy used by Helium, a site new to me. Helium is a community for freelance writers that helps them connect with publishers who will pay for articles on their topics. The publishers are vetted, so students seeking to buy term papers will have to look elsewhere.
