UMBC ebiquity
2008

Archive for 2008

Neologism Web-based RDFS vocabulary editor

November 27th, 2008, by Tim Finin, posted in RDF, Semantic Web, Web, Web 2.0

Neologism is a simple web-based RDF Schema vocabulary editor and publishing system under development at DERI. It looks like a great lightweight tool for developing Semantic Web vocabularies and publishing them on the Web following current best practices. Its goal is to “dramatically reduce the time required to create, publish and modify vocabularies for the Semantic Web.” The system is not yet open for use, but there is a good online Neologism demo as well as a screencast showing how to use it.

Google to lay off 10,000 workers

November 24th, 2008, by Tim Finin, posted in GENERAL, Semantic Web

We’re looking at some tough times for companies that have been supported by ad revenues, like newspapers, magazines, broadcasters, and Google. Google?!? Yes, Google.

It is being reported that Google is set to lay off about one-third of its workforce.

“Google has been quietly laying off staff and up to 10,000 jobs could be on the chopping block according to sources. Since August, hundreds of employees have been laid off and there are reports that about 500 of them were recruiters for Google.

By law, Google is required to report layoffs publicly and with the SEC however, Google has managed to get around the legal requirement. In fact, one of the ways Google was able to meet Wall Street’s Q3 earnings expectations was by trimming “operational” expenses.

Google reports to the SEC that it has 20,123 employees but in reality it has 30,000. Why the discrepancy? Google classifies 10,000 of the employees as temporary operational expenses or “workers”. Google co-founder Sergey Brin said, “There is no question that the number (of workers) is too high”.

According to this article, the bulk of these “temporary workers” have been working at Google for years and moved “from job to job every few months so that their status remains temporary”.

Update I 11/25: This note on CNet, Google cutting contractor workforce, says that Google announced plans to trim its force of 10,000 contractors back in October, as reported by the SJ Mercury News.

Update II 11/25: Vint Cerf commented on Dave Farber’s IP mailing list that:

1. Google is still hiring but at a lower rate than before
2. For the past year, Google has been reducing the number of temporary staff and shifting jobs to employees.

Update III 11/25: Tim O’Reilly reports (also on IP) that WebGuild may not be the most reliable source of information on Google.

McNamee: Textual Representations for Corpus-Based Bilingual Retrieval, 9am Mon 11/24

November 20th, 2008, by Tim Finin, posted in NLP

Paul McNamee will defend his dissertation on Textual Representations for Corpus-Based Bilingual Retrieval at 9:00am Monday 24 November 2008 in ITE 325B. His mentor is Charles Nicholas and the dissertation committee includes Tim Finin, James Mayfield (JHU), Sergei Nirenburg and Doug Oard (UMCP). Here is the abstract.

The traditional approach to information retrieval is based on using words as the indexing and search terms for documents. One part of this research investigates alternative methods for representing text, including a method based on overlapping sequences of characters called n-gram tokenization. N-grams are studied in depth and one notable finding is that they achieve a 20% improvement in retrieval effectiveness over words in certain situations.

The other focus of this research is improving retrieval performance when foreign language documents must be searched and translation is required. In this scenario bilingual dictionaries are often used to translate user queries; however even among the most commonly spoken languages, for which large bilingual lexicons exist, dictionary-based translation suffers from several significant problems. These include: difficulty handling proper names, which are often missing; issues related to morphological variation since entries, or query terms, may not be lemmatized; and, an inability to robustly handle multiword phrases, especially non-compositional expressions. These problems can be addressed when translation is accomplished using parallel collections, sets of documents available in more than one language. Using parallel texts enables statistical translation of character n-grams rather than words or stemmed words, and with this technique highly effective bilingual retrieval performance is obtained. Translation of multiword expressions is also explored.

In this dissertation I present an overview of the field of cross-language information retrieval and then introduce the foundational concepts in n-gram tokenization and corpus-based translation. Then monolingual and bilingual experiments on test sets in 13 languages are described. Analysis of these experiments gives insight into: the relative efficacy of various tokenization methods; reasons why n-grams are effective; the utility of automated relevance feedback, in both monolingual and bilingual contexts; the interplay between tokenization and translation; and, how translation resource selection and size influence bilingual retrieval.
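The abstract describes n-gram tokenization only in prose. Here is a minimal Python sketch of overlapping character n-grams, assuming lowercasing, whitespace collapsing, padding with a space at each end, and n = 4; the dissertation's actual normalization and choice of n may differ, and the function name char_ngrams is hypothetical.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split text into overlapping character n-grams.

    Assumed normalization (illustrative only): lowercase, collapse runs of
    whitespace to one space, and pad with a space on each side so that
    n-grams spanning a word boundary record where words begin and end.
    """
    words = text.lower().split()
    if not words:
        return []
    padded = " " + " ".join(words) + " "
    if len(padded) < n:
        return [padded]  # too short to slide a window: keep the whole string
    # Slide a window of width n one character at a time.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

For example, char_ngrams("Juggling") yields ' jug', 'jugg', 'uggl', 'ggli', 'glin', 'ling' and 'ing '. The related form "juggler" shares ' jug', 'jugg' and 'uggl' with it, which illustrates one commonly cited reason n-grams help: morphological variants overlap in index terms without any stemmer or lemmatizer.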

Semantic Applications at age one

November 19th, 2008, by Tim Finin, posted in Semantic Web, Web, Web 2.0

After a year, Read/Write Web has revisited their review of 10 promising Semantic Web apps, producing 10 Semantic Apps to Watch – One Year Later.

“A lot can happen in one year on the Internet, so we thought we’d check back in with each of the 10 products and see how they’re progressing. What’s changed over the past year and what are these companies working on now? The products are, in no particular order: Freebase, Powerset, Twine, AdaptiveBlue, Hakia, Talis, TrueKnowledge, TripIt, Calais (was ClearForest), Spock.”

They plan to publish a completely new list of Semantic applications to watch as the next post in the series and ask people to leave suggestions in the post comments.

Maybe Read/Write Web will follow the model of Michael Apted’s Up series (7 Up, 14 Up, and so on) and keep reporting back to us on how these systems are doing, only every year instead of every seven, which on the Web probably amounts to the same thing.

3scale provides infrastructure for the programmable web

November 19th, 2008, by Tim Finin, posted in Semantic Web, Swoogle, Web, Web 2.0

3scale Networks is a Barcelona-based startup that is trying to fill a critical gap in helping organizations manage web services as a business or at least in a business-like manner.

“3scale provides a new generation of infrastructure for the web – point and click contract management, monitoring and billing for Web Services. The 3scale platform makes it easy for providers to launch their APIs, manage user access and, if desired, collect usage fees. Service users can discover services they need and sign up for plans on offer.” (source)

They have been operating a private beta system for a few months and have just announced that their public beta is open. Currently, signing up with 3scale and registering services is free; the only cost is a commission on any transaction fees your service charges. Once you’ve registered a service, you can install one of several 3scale plugins for your programming environment to connect your service to 3scale, and then configure one or more usage plans. 3scale runs on Amazon’s EC2 and S3 cloud computing services.
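As a rough illustration of what a "usage plan" means on the provider's side, here is a minimal Python sketch of per-key plan enforcement. It is not 3scale's plugin API: the names UsagePlan, UsageMeter and authorize are hypothetical, and it assumes a plan is nothing more than a cap on calls per billing period.

```python
from dataclasses import dataclass, field


@dataclass
class UsagePlan:
    """A hypothetical plan: a name plus a cap on calls per billing period."""
    name: str
    max_calls: int


@dataclass
class UsageMeter:
    """Toy in-memory meter keyed by API key; NOT 3scale's actual API.

    A real service would persist counts and reset them each period;
    both are omitted here to keep the sketch short.
    """
    plans: dict[str, UsagePlan]
    calls: dict[str, int] = field(default_factory=dict)

    def authorize(self, api_key: str) -> bool:
        """Record one call and return True if the key's plan still allows it."""
        plan = self.plans.get(api_key)
        if plan is None:
            return False  # key has not signed up for any plan
        used = self.calls.get(api_key, 0)
        if used >= plan.max_calls:
            return False  # plan limit reached for this period
        self.calls[api_key] = used + 1
        return True
```

With a plan capped at two calls, the first two authorize calls for that key succeed and the third is refused; a key with no plan is always refused. Keeping the check in one place is what lets a hosted service add monitoring or billing without changing the API's own code.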

3scale’s co-founder and technical lead is Steve Willmott, with whom we worked for many years when he was an academic doing research on multiagent systems. Several months ago he invited us to add Swoogle’s web service to 3scale’s private beta. We were pleased with how easy it was and look forward to exploring other ways to use 3scale.

A story in yesterday’s Washington Post, Manage Your API Infrastructure With 3scale Networks, has some more information.

Gladwell: 10,000 hours to success

November 16th, 2008, by Tim Finin, posted in GENERAL

The Guardian has an extract, A gift or hard graft?, from Malcolm Gladwell’s new book, Outliers: The Story Of Success, due out later this month. The piece introduces the idea that a key to becoming extraordinarily successful in a field is achieving early expertise, and that becoming an expert in a discipline requires on the order of 10,000 hours of practice. The 10K figure comes from the research of Anders Ericsson, who in the early 1990s studied violinists at the Berlin Academy of Music.

“The curious thing about Ericsson’s study is that he and his colleagues couldn’t find any “naturals” – musicians who could float effortlessly to the top while practising a fraction of the time that their peers did. Nor could they find “grinds”, people who worked harder than everyone else and yet just didn’t have what it takes to break into the top ranks. Their research suggested that once you have enough ability to get into a top music school, the thing that distinguishes one performer from another is how hard he or she works. That’s it. What’s more, the people at the very top don’t just work much harder than everyone else. They work much, much harder.”

The extract focuses on some of the most successful people in the computer industry (Bill Joy, Bill Gates, Steve Jobs, and others) and argues that another part of their success was being born at the right time, in 1954 or 1955. This made them about 20 years old when the first personal computers became available.

I’m going to seize on this as yet another personal excuse — I was born a half decade too early.

Reuters Calais to support Semantic Web Linked Data in next release

November 14th, 2008, by Tim Finin, posted in AI, GENERAL, RDF, Semantic Web

Thomson Reuters announced on their blog (Life in the Linked Data Cloud: Calais Release 4) that their next release of the Calais web-based information extraction service will support linked data.

“In that release we’ll go beyond the ability to extract semantic data from your content. We will link that extracted semantic data to datasets from dozens of other information sources, from Wikipedia to Freebase to the CIA World Fact Book. In short – instead of being limited to the contents of the document you’re processing, you’ll be able to develop solutions that leverage a large and rapidly growing information asset: the Linked Data Cloud.”

The new capabilities will be available in Release 4, which is expected out on 9 January 2009.

The change is based on Calais returning de-referenceable URIs for the entities it finds. Accessing those URIs will produce RDF with links to corresponding entities in DBpedia, Freebase and other sources of “Semantic Web” data. It will be very interesting to see how well their system does at mapping document entities (e.g., “secretary Rice”) to entities in the LOD cloud such as http://dbpedia.org/resource/Condoleezza_Rice. Requesting that URI with content type application/rdf+xml returns the RDF at http://dbpedia.org/data/Condoleezza_Rice, which contains the assertions DBpedia extracted from Wikipedia.
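The DBpedia lookup described above can be sketched with Python's standard library. This is a minimal illustration, assuming the /resource/ to /data/ naming pattern from the Condoleezza Rice example; the helper names data_uri_for and rdf_request are hypothetical, and DBpedia's live behavior may have changed since 2008.

```python
import urllib.request

RESOURCE_PREFIX = "http://dbpedia.org/resource/"
DATA_PREFIX = "http://dbpedia.org/data/"


def data_uri_for(resource_uri: str) -> str:
    """Map a DBpedia entity URI to the RDF document URI (assumed pattern).

    In practice the server tells the client where to go via an HTTP
    redirect; this helper only mirrors the naming seen in the example.
    """
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError("not a DBpedia resource URI: " + resource_uri)
    return DATA_PREFIX + resource_uri[len(RESOURCE_PREFIX):]


def rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
```

To actually fetch the RDF, pass the request to urllib.request.urlopen, which follows HTTP redirects by default (keeping the Accept header); the response's geturl() method then reports the document URL that was finally retrieved.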

Malcolm Gladwell (Geek Pop Star) on Outliers

November 12th, 2008, by Tim Finin, posted in GENERAL

New York magazine has an article (Geek Pop Star) on Malcolm Gladwell whose new book, Outliers: The Story of Success, is due out later this fall.

“Malcolm Gladwell’s elegant and wildly popular theories about modern life have turned his name into an adjective—Gladwellian! But in his new book, he seeks to undercut the cult of success, including his own, by explaining how little control we have over it.”

His book explains why I never became a hockey star: I was born too late in the year. A disproportionate number of top Canadian hockey players are born in the first half of the year. Gladwell’s explanation is that junior hockey leagues use January 1 as their eligibility cutoff (you must be 10 years old by January 1). So if you were born on January 2nd, you start playing with the advantage of being older, larger and stronger than your peers. I’m not sure that my August birthday explains my own poor skating skills, though.

This quote from the article addresses why Bill Gates did so well.

“Or take the case of Bill Gates. Gladwell cites a body of research finding that the “magic number for true expertise” is 10,000 hours of practice. “Practice isn’t the thing you do once you’re good,” Gladwell writes. “It’s the thing you do that makes you good.” Gladwell shows how Gates accumulated his 10,000 hours while in middle and high school in Seattle thanks to a series of nine incredibly fortunate opportunities—ranging from the fact that his private school had a computer club with access to (and money for) a sophisticated computer, to his childhood home’s proximity to the University of Washington, where he had access to an even more sophisticated computer. “By the time Gates dropped out of Harvard after his sophomore year to try his hand at his own computer software company,” Gladwell writes, “he’d been programming practically nonstop for seven consecutive years. He was way past 10,000 hours.” Yes, Gates is obviously brilliant, Gladwell concludes, but without the lucky breaks he had as a kid, he never could have had the opportunity to fulfill the true potential of that brilliance. How many similarly brilliant people never get that opportunity?”

I guess I spent my own 10,000 hours hacking Lisp too late in life.

JWS special issue on The Web of Data

November 11th, 2008, by Tim Finin, posted in Semantic Web

Axel Polleres and David Huynh are editing a special issue of the Journal of Web Semantics on The Web of Data that will appear in Summer 2009. Submitted papers are due by January 21, 2009. See the special issue call for papers for details.

Journal of Web Semantics blog 2.0

November 11th, 2008, by Tim Finin, posted in Semantic Web

We’ve moved the Journal of Web Semantics blog from a self-hosted WordPress installation to Google-hosted Blogger. We’ve moved the old posts (manually!), and the recommended public feed remains the same: http://feeds.feedburner.com/jwsBlog.

Our move was motivated by a desire to make it easier for more people to contribute to the blog, a need to streamline the maintenance of the JWS infrastructure, and a goal to make the tools we use independent of the institutions of the current editors-in-chief.

When we started the ebiquity blog back in 2003 it was on Blogger. After some months we moved to a self-hosted WordPress blog, which we continue to enjoy using for its flexibility, powerful features, and active community of developers and users.

I found it interesting to come back to Blogger for the new JWS blog and to see what’s new and what has remained the same.

CloudCamp DC, 3-9pm Wed 12 Nov, Chantilly VA (free)

November 10th, 2008, by Tim Finin, posted in High performance computing, Multicore Computation Center

There will be a free CloudCamp ‘unconference’ in Chantilly VA (outside DC) from 3pm to 9pm on Wednesday 12 November.

“CloudCamp is an unconference where early adapters of Cloud Computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged you to share your thoughts in several open discussions, as we strive for the advancement of Cloud Computing. End users, IT professionals and vendors are all encouraged to participate.”

Briggs on Constraint Generation and Reasoning in OWL, Noon Mon 17 Nov @ UMBC

November 10th, 2008, by Tim Finin, posted in OWL, RDF, Semantic Web, UMBC

Tom Briggs will defend his dissertation, Constraint Generation and Reasoning in OWL, at Noon on Monday 17 November 2008 in ITE 325b. His work has focused on automatically computing reasonable domain and range constraints for Semantic Web properties. Here’s the abstract:

The majority of OWL ontologies in the emerging Semantic Web are constructed from properties that lack domain and range constraints. Constraints in OWL are different from the familiar uses in programming languages and databases: they are actually type assertions about the individuals connected by the property. These assertions can add vital information to the model because they are assertions of type on the individuals involved, and they can also give information on how the defining property may be used.

Three different automated generation techniques are explored in this research: disjunction, least-common named subsumer, and vivification. Each algorithm is compared for its ability to generalize and for its performance impact on the reasoner. A large sample of ontologies from the Swoogle repository is used to compare the real-world performance of these techniques.

Finally, using generated facts is a type of default reasoning, and such facts may conflict with future assertions to the knowledge base. While general default reasoning is non-monotonic and undecidable, a novel approach is introduced to support efficient retraction of the default knowledge. Combined, these techniques enable robust and efficient generation of domain and range constraints, which will result in the inference of additional facts and improved performance for a number of Semantic Web applications.

Tom’s dissertation advisor is Professor Yun Peng.
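To make two ideas from the abstract concrete (domain constraints as type assertions, and proposing a domain via a least-common named subsumer), here is a minimal Python sketch. The class hierarchy, the individuals and the memberOf property are hypothetical; each individual is assumed to have exactly one asserted class. It illustrates the general technique named in the abstract, not Tom's implementation, and the disjunction and vivification techniques are not shown.

```python
from collections import deque

# Hypothetical class hierarchy: class -> set of direct superclasses.
SUBCLASS_OF: dict[str, set[str]] = {
    "Thing": set(),
    "Agent": {"Thing"},
    "Person": {"Agent"},
    "Organization": {"Agent"},
    "Student": {"Person"},
    "Professor": {"Person"},
}


def ancestors(cls: str) -> set[str]:
    """Return cls plus all of its direct and indirect superclasses."""
    seen, queue = {cls}, deque([cls])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def depth(cls: str) -> int:
    """Length of the longest superclass chain from cls up to a root."""
    parents = SUBCLASS_OF.get(cls, set())
    return 1 + max(depth(p) for p in parents) if parents else 0


def least_common_named_subsumer(classes: list[str]) -> str:
    """Most specific named class that is an ancestor of every input class."""
    if not classes:
        raise ValueError("need at least one class")
    common = set.intersection(*(ancestors(c) for c in classes))
    # Sorting first makes tie-breaking deterministic in multi-parent hierarchies.
    return max(sorted(common), key=depth)


def generate_domain(triples, prop: str, types: dict[str, str]) -> str:
    """Propose an rdfs:domain for prop from the asserted classes of its subjects."""
    return least_common_named_subsumer([types[s] for s, p, _ in triples if p == prop])


def infer_types_from_domain(triples, prop: str, domain: str) -> set:
    """rdfs:domain as a type assertion: each subject of prop gets rdf:type domain."""
    return {(s, "rdf:type", domain) for s, p, _ in triples if p == prop}
```

If a Student and a Professor are the only subjects of memberOf, the proposed domain is Person, and the domain then licenses an rdf:type Person assertion for each of them. Adding an Organization as a subject generalizes the proposal to Agent, a small instance of the generalization trade-off the abstract says the techniques are compared on.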

You are currently browsing the UMBC ebiquity weblog archives for the year 2008.
