Akismet is closing in on identifying a billion spam comments. I hope they capture the one that puts them over the top for posterity.
Archive for March, 2007
I noticed in Seth Ladd’s Semergence blog that the next version of Oracle’s RDF Database (11g) is expected to have native inferencing for a subset of OWL. This is in addition to faster querying and bulk-loading and “new SQL operators for enhancing a relational query using an ontology”. See this thread in Oracle’s Semantic Technologies Forum.
According to a recent presentation on 11g, the native OWL inferencing will include:
- Basics: class, subclass, property, subproperty, domain, range, type
- Property Characteristics: transitive, symmetric, functional, inverse functional, inverse
- Class comparisons: equivalence, disjointness
- Property comparisons: equivalence
- Individual comparisons: same, different
- Class expressions: complement
We’ve not yet tried 10g, but it’s on our short list of things to do. I guess this task just moved up the list.
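To give a feel for what inferencing over a few of those constructs means, here is a toy forward-chaining sketch in Python. It covers subclass, subproperty, domain, transitive, symmetric and inverse properties, and simply applies rules until nothing new is derived. It is only an illustration of the idea, not Oracle's implementation or API, and the triples use short made-up names instead of full URIs.

```python
# Toy forward-chaining reasoner for a few RDFS/OWL-style rules.
# Triples are (subject, predicate, object) tuples; names are made up.

def infer(triples):
    """Apply the rules repeatedly until no new triples appear (a fixpoint)."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            # symmetric: (s p o) implies (o p s)
            if (p, "type", "SymmetricProperty") in facts:
                new.add((o, p, s))
            for s2, p2, o2 in facts:
                # subClassOf is transitive, and instances inherit superclasses
                if p2 == "subClassOf" and o == s2 and p in ("subClassOf", "type"):
                    new.add((s, p, o2))
                # a statement using a subproperty also holds for the superproperty
                if p2 == "subPropertyOf" and p == s2:
                    new.add((s, o2, o))
                # domain: the subject of p is typed with the domain class
                if p2 == "domain" and p == s2:
                    new.add((s, "type", o2))
                # transitive: (s p o) and (o p o2) imply (s p o2)
                if p2 == p and o == s2 and (p, "type", "TransitiveProperty") in facts:
                    new.add((s, p, o2))
                # inverse: (p inverseOf q) links (s p o) and (o q s) in both directions
                if p2 == "inverseOf" and p == s2:
                    new.add((o, o2, s))
                if p2 == "inverseOf" and p == o2:
                    new.add((o, s2, s))
        if new <= facts:
            return facts
        facts |= new
```

For example, given ("Student", "subClassOf", "Person"), ("Person", "subClassOf", "Agent") and ("alice", "type", "Student"), the closure contains ("alice", "type", "Agent"). A real store does this far more efficiently (and over millions of triples), which is presumably the point of building it into the database.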
Stemming is simple, right? Well, no…
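A deliberately naive suffix stripper shows why. The suffix list below is a made-up example, not any published stemmer; real stemmers such as Porter's add conditions on what is left of the word before removing a suffix.

```python
# A deliberately naive stemmer: strip the longest matching suffix.
# The suffix list is a hypothetical example chosen to expose the problems.
SUFFIXES = ["ational", "ness", "ing", "ies", "ed", "ly", "s"]

def naive_stem(word):
    """Remove the first matching suffix, checking longer suffixes first."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word
```

With this, "running" becomes "runn" rather than "run", "sing" collapses to "s", and "news" is conflated with "new".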
Slashdot has a post today titled “Why the Semantic Web Will Fail” that points to a post of the same name by Stephen Downes. His argument is based on the belief that “The Semantic Web will never work because it depends on businesses working together, on them cooperating.” He says:
“But the big problem is they believed everyone would work together:
- would agree on web standards (hah!)
- would adopt a common vocabulary (you don’t say)
- would reliably expose their APIs so anyone could use them (as if)”
While the argument Stephen makes is grounded in his distrust of corporations, his second point above is off the mark, at least for RDF.
One of the features of the W3C’s model (based on RDF) is that it doesn’t push the idea that everyone should adopt the same vocabulary (or ontology) for a topic or domain. Instead it offers a way to publish vocabularies with some semantics, including how terms in one vocabulary relate to terms in another. In addition, the framework makes it trivial to publish data in which you mix vocabularies, making statements about a person, for example, using terms drawn from FOAF, Dublin Core and others.
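Here is a minimal sketch of what mixing vocabularies looks like in practice, written in plain Python that emits N-Triples. The FOAF, Dublin Core, RDF and OWL namespace URIs are the published ones; everything under example.org (the person, the paper and the local `fullName` term) is hypothetical. The last triple shows a local term being related to FOAF rather than everyone having to adopt a single vocabulary.

```python
# Mixing vocabularies in one RDF description, serialized as N-Triples.
# FOAF, Dublin Core, RDF and OWL namespaces are real; example.org URIs are hypothetical.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # hypothetical local data and vocabulary

def uri(u):
    return f"<{u}>"

def literal(text):
    # escape backslashes, quotes and newlines as N-Triples requires
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)

alice = uri(EX + "people/alice")
paper = uri(EX + "papers/42")

triples = [
    # FOAF terms describe the person...
    (alice, uri(RDF + "type"), uri(FOAF + "Person")),
    (alice, uri(FOAF + "name"), literal("Alice Example")),
    # ...Dublin Core terms describe a document she wrote...
    (paper, uri(DC + "title"), literal("A Paper")),
    (paper, uri(DC + "creator"), alice),
    # ...and a local term is declared equivalent to a FOAF term.
    (uri(EX + "vocab#fullName"), uri(OWL + "equivalentProperty"), uri(FOAF + "name")),
]

print(ntriples(triples))
```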
The RDF approach was designed with interoperability and extensibility in mind, unlike many other approaches. RDF is seeing increasing adoption and is showing up in products from Oracle, Adobe and Microsoft, for example.
If this approach doesn’t continue to flourish and help realize the envisioned “web of data” (and it might not, after all), it will have left some key concepts, tested and explored, on the table for the next push. IMHO, the ‘semantic web’ vision, a web of data for machines and their users, is inevitable.
Which feed readers work best on mobile devices.
Visualizing the citation graph of 800K papers from Nature classified into 776 topics.
Another post that misunderstands Swoogle’s goals. I wish we knew how to give them what they want.
Declan McCullagh and Anne Broache have an article on CNET that explores the question: who created the first blog?
“It may not be one of the Internet’s grandest accomplishments, but with the number of active bloggers hovering somewhere around 100 million, according to one estimate, there are some serious bragging rights to be claimed by the first person who provably laid fingers to keyboard in the traditional bloggy way.”
The article mentions some who come immediately to mind:
Was the first blogger the irascible Dave Winer? The iconoclastic Jorn Barger? Or was the first blogger really Justin Hall, a Web diarist and online gaming expert whom The New York Times Magazine once called the “founding father of personal blogging”? Or did all three merely make incremental improvements on earlier proto-blogs? The answer is most likely “yes” to all of the above. In truth, awarding the title “first blogger” …
The article also explores some earlier roots, like finger and .plan files. Those last two are interesting connections and make sense. Kind of. In my experience, the vast majority of people who used .plan files used them to document their general schedules and availability rather than to contemporaneously document their activities.
It’s a good article, overall.
“What we mean by ‘Web 3.0’ is that major web sites are going to be transformed into web services – and will effectively expose their information to the world.”
“A domainer is someone who earns a profit buying and selling domain names.”
Google’s patent application (filed 13 September 2005) for “Ranking blog documents” is being discussed around the web.
“A blog search engine may receive a search query. The blog search engine may determine scores for a group of blog documents in response to the search query, where the scores are based on a relevance of the group of blog documents to the search query and a quality of the group of blog documents. The blog search engine may also provide information regarding the group of blog documents based on the determined scores.”
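The core idea in that claim, a score that blends query relevance with a separate quality estimate for each blog, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the term-overlap relevance measure, the linear weighting and the precomputed `quality` value are hypothetical and are not the method described in the patent.

```python
# Hypothetical blend of relevance and quality for ranking blog documents.
# The relevance measure, weighting and quality values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class BlogDoc:
    url: str
    text: str
    quality: float  # assumed precomputed in [0, 1], e.g. from popularity and spam signals

def relevance(query, text):
    """Fraction of distinct query terms that appear in the document."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def rank(query, docs, weight=0.5):
    """Sort documents by weight*relevance + (1 - weight)*quality, best first."""
    def score(doc):
        return weight * relevance(query, doc.text) + (1 - weight) * doc.quality
    return sorted(docs, key=score, reverse=True)
```

The design point is that a keyword-stuffed splog can match a query perfectly and still lose to a genuine blog once the quality term is taken into account.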
The Google Operating System blog has a nice summary of the features Google mentions as useful in separating the blogs from the splogs. No surprises here.
Spotted on Micro Persuasion.
One interesting aspect of web spam is that the suckers include both web searchers and advertisers. The goal of spammers is to get the two together and watch the clicking.
A new article in the NYT, Researchers Track Down a Plague of Fake Web Pages, discusses results by a team of researchers from Microsoft and UC Davis.
“Tens of thousands of junk Web pages, created only to lure search-engine users to advertisements, are proliferating like billboards strung along freeways. Now Microsoft researchers say they have traced the companies and techniques behind them.
A technical paper published by the researchers says the links promoting such pages are generated by a small group of shadowy operators apparently with the acquiescence of some major advertisers, Web page hosts and advertising syndicators. … The finding is striking because it hints at the possibility of curbing the practice.
The researchers uncovered a complex scheme in which a small group, creating false doorway pages, works with operators of Web-based computers who profit by redirecting traffic passed from search engines in one direction and then sending advertisements acquired from syndicators in the opposite direction.”
The researchers will present their paper on redirection spam at WWW 2007:
Spam Double-Funnel: Connecting Web Spammers with Advertisers. Yi-min Wang, Ming Ma, Yuan Niu, and Hao Chen. To appear in Proceedings of the 16th International World Wide Web Conference (WWW2007).
Abstract: Spammers use questionable search engine optimization (SEO) techniques to promote their spam links into top search results. In this paper, we focus on one prevalent type of spam – redirection spam – where one can identify spam pages by the third-party domains that these pages redirect traffic to. We propose a five-layer, double-funnel model for describing end-to-end redirection spam, present a methodology for analyzing the layers, and identify prominent domains on each layer using two sets of commercial keywords – one targeting spammers and the other targeting advertisers. The methodology and findings are useful for search engines to strengthen their ranking algorithms against spam, for legitimate website owners to locate and remove spam doorway pages, and for legitimate advertisers to identify unscrupulous syndicators who serve ads on spam pages.
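The abstract's key observation, that spam pages can be identified by the third-party domains they redirect traffic to, can be illustrated with a simple counting sketch in Python. This is not the paper's five-layer methodology; the redirect map stands in for crawl data, and all the URLs in the example are hypothetical.

```python
# Sketch: count the third-party domains that doorway pages redirect through.
# The redirect map stands in for crawl data; all URLs used with it are hypothetical.
from collections import Counter
from urllib.parse import urlparse

def domain(url):
    return urlparse(url).hostname or ""

def redirect_chain(start, redirects, max_hops=10):
    """Follow start -> redirects[start] -> ... and return every URL visited."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # stop on redirect loops
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain

def third_party_domains(doorways, redirects):
    """For each doorway page, count every other domain its traffic passes through."""
    counts = Counter()
    for page in doorways:
        origin = domain(page)
        hops = {domain(u) for u in redirect_chain(page, redirects)[1:]}
        counts.update(d for d in hops if d and d != origin)
    return counts
```

Domains that show up behind many unrelated doorway pages are the ones worth a closer look; in the paper's terms, those are the narrow middle of the double funnel.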
The results of the finals for the 2007 ACM International Collegiate Programming Contest are in, with Warsaw University; Tsinghua University; St. Petersburg University of IT, Mechanics and Optics; and MIT placing first through fourth.
The contest has been running since the 1970s and is generally recognized as the oldest, largest and most prestigious programming contest in the world. This year over 6,000 teams began the multi-tiered competition, with 88 teams reaching the finals in Maihama, Japan. See the final problems that the teams had to solve and the final team standings.
GrandCentral.com offers “one number that rings all of your phones so you never miss a call again.”
Robert Scoble unloads on Microsoft’s Windows Live.