50 million Americans generate Web content

May 30th, 2006

The first time I heard of the concept of consumer generated content was at the AAAI Symposium on Computational Approaches to Analyzing Weblogs. I guess I had not been paying attention — I can see that it represents a sea change in the media. ClickZ News has an article on a recent Pew report on home broadband use that gives some figures:

Forty-eight million American adults have contributed some form of user-generated content on the Internet, it found. That’s 35 percent of Internet users. Of those adults who have posted content on the Web, 73 percent, or 31 million, have a broadband connection at home.

“[The Web is] shifting now to user-generated content; it shows people engaging with the Internet in a number of different ways in their lives,” said John Horrigan, associate director of research at Pew Internet & American Life Project. “It shows that people are pretty interested in using the technology to put something of themselves on the Internet, not just pull down information from the Internet.”

“There is an element of the Internet being the medium for creativity and the Internet being an outlet for creativity people bring to the Worldwide Web,” according to the report. It considers blogging, Web site creation, contribution of work on Web pages or blogs and submissions of artwork, photos, stories or videos as user-generated online content.

The Pew report on Home Broadband Adoption 2006 is available online for free.

Splogbait2: cheap insurance for home, car and real estate

May 28th, 2006

Do you want new insurance for your home, car, or hotel but can’t get it because of a credit problem or high mortgage debt? Don’t worry, you can refinance your loans and even have money left to attend film or acting schools. There you can get training on making video films and editing them on a digital computer. Or you might choose summer travel by cheap airline flights or rental cars and an extended stay at free UK or Las Vegas vacation hotels. While there, you can buy an auto, software or furniture on sale. As always, consult with a personal injury lawyer if injured. This splogbait2 uses the most profitable AdSense words.

BigOWLIM reasons over billions of RDF triples

May 28th, 2006

Sirma’s Ontotext Lab announced BigOWLIM, a new high-performance storage and inference layer for the Sesame RDF database. They demonstrated that it can handle more than a billion triples by loading the Lehigh LUBM benchmark and correctly answering the evaluation queries. Of course, this took a while: over 70 hours to load and build the model, which was materialized via forward chaining and comprised over 1.8B triples.

While their OWLIM system does reasoning and query processing in memory, BigOWLIM stores the model in binary files and uses them to answer queries and perform inference.
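
For readers unfamiliar with forward-chaining materialization, here is a toy sketch in Python. It is not BigOWLIM's rule set or storage scheme; the two RDFS-style rules and the ex: names are my own illustrative assumptions. The idea is simply to keep applying rules to the known triples until no new triples appear, so that queries can later be answered by plain lookup.

```python
# Toy forward-chaining materialization over RDF-style triples.
# Assumption: only two RDFS-style rules are applied; a real reasoner
# such as BigOWLIM supports a much richer rule set and disk-based storage.

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def materialize(triples):
    """Return the closure of `triples` under the two rules, computed to a fixed point."""
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                # Only pairs of the form (s p o), (o subClassOf o2) can fire a rule.
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    # (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:
                    # (x type A), (A subClassOf B) => (x type B)
                    new.add((s, TYPE, o2))
        if new <= closure:
            # Nothing new was inferred: the model is fully materialized.
            return closure
        closure |= new


# Hypothetical example data (the ex: names are made up for illustration).
facts = {
    ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:GraduateStudent"),
}
closure = materialize(facts)

# After materialization, "is alice a Person?" is a simple membership test.
print(("ex:alice", TYPE, "ex:Person") in closure)
```

The trade-off is the one the load figures above suggest: all of the inference cost is paid up front, and the stored model grows larger than the explicit data.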

There is a presentation from WWW2006 Developer Day. Evaluation copies of the beta version of BigOWLIM are available on request, and a free version of the in-memory OWLIM system is available to download.

5-digit blog spam

May 28th, 2006

Peter Kaminsky has noticed a strange phenomenon in the comments coming into his blog:

I’ve gotten a few of these curious spam comments recently. They’re fairly reasonable-looking, from a proxy IP (the ones I looked at have been associated with Tor, although I don’t know if Tor was used in this case) and they have an arbitrary five-digit number in the comment…

The comments have no link or other obvious spam purpose. What’s really interesting though is the large number of completely plausible explanations and nifty ideas the post has received in its comments. Via Boing Boing.

SPARQL: RDF data access for Web 2.0

May 27th, 2006


Leigh Dodds of Ingenta gave a great talk at XTech that convinced me that SPARQL is more significant than I had realized. Leigh’s premise is that

“Backed by the flexibility of the RDF data model, and consisting of both a query language and data access protocol SPARQL has the potential to become a key component in Web 2.0 applications. SPARQL could provide a common query language for all Web 2.0 applications.”

I think that this could pave the way for widespread use of RDF data and hence the larger Semantic Web vision.

Leigh’s paper and presentation slides are available online.

His presentation included some interesting examples of how to use a SPARQL server to easily extract and format data from RDF documents, such as RSS feeds, FOAF documents, photo sharing sites, or the BBC Programme Catalogue. The approach seems eminently practical, with software packages in many languages and good examples of client-side AJAX processing of SPARQL query results. I also learned about JSON, the lightweight data-interchange format that is easy for people to read and write and for programs (including JavaScript) to process. The new version of ARQ supports JSON, so it looks like the pieces are all there to promote easy experimentation.
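
To give a feel for how little code it takes to consume SPARQL results as JSON, here is a minimal Python sketch. It assumes the result document follows the layout of the W3C SPARQL Query Results JSON format (a "head" listing the variables and "results"/"bindings" holding one object per solution). The sample data and the function name are made up for illustration and do not come from Leigh's talk or from ARQ.

```python
import json

# Hypothetical result document for a query such as
#   SELECT ?name ?mbox WHERE { ?p foaf:name ?name . OPTIONAL { ?p foaf:mbox ?mbox } }
# The layout is assumed to follow the SPARQL Query Results JSON format.
SAMPLE = """
{
  "head": {"vars": ["name", "mbox"]},
  "results": {
    "bindings": [
      {"name": {"type": "literal", "value": "Alice"},
       "mbox": {"type": "uri", "value": "mailto:alice@example.org"}},
      {"name": {"type": "literal", "value": "Bob"}}
    ]
  }
}
"""


def bindings_to_rows(doc):
    """Flatten a parsed result document into a list of plain dicts.

    Variables left unbound in a solution (e.g. from OPTIONAL) map to None.
    """
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        row = {}
        for var in variables:
            row[var] = binding[var]["value"] if var in binding else None
        rows.append(row)
    return rows


rows = bindings_to_rows(json.loads(SAMPLE))
for row in rows:
    print(row["name"], row["mbox"])
```

The same few lines translate almost directly into client-side JavaScript, which is what makes the AJAX examples so appealing.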

flickr, from beta to gamma, del.icio.us next?

May 16th, 2006

The Yahoo home page is in the middle of interface changes, and so is flickr. Noticeable ones include:

  • Better use of home page real estate, and navigation.
  • Improved interface, and batch processing to organize pictures.
  • Group recommendations. (Great new feature!)

Batch processing on groups, which I would have loved to have, is still missing.

What’s next? del.icio.us?

Elsewhere, all positive (Digital Connection, InforNation, Antonescu, Visual Impact, Niall)

VANETs in Business

May 15th, 2006

I led a team of students who developed a business plan around my thesis topic, StreetSmart Traffic. Out of a field of 174 competitors we made it to 3rd place. With the help of Vivian Armour, Greg Stone, Vicky McAndrews and David Yager, our team beat out many good ideas. Our team included undergrad Zach Radtka and PhD students Jeremy Shopf and Alark Joshi. The Baltimore Business Journal covered the story. You can read about the winning teams at the GBTC site. I would strongly recommend this to any student with a commercial idea. The competition is fun and there is big money, $10k to the winner. With the money that I won I am going to dedicate this summer to developing prototype StreetSmart Traffic devices.

The judge who moved our team from 6th place to 3rd place is Steve Walker. He is a local venture capitalist with a lot of IT background, going back to ARPANET. He is going to speak tomorrow morning at the Visionaries in IT breakfast. I’ll be there; I think he’ll give an interesting talk. Anybody want to join me?

Is Google too googley for its own good?

May 15th, 2006

The Economist has a good article, Fuzzy maths, on the challenges that Google faces.

In a few short years, Google has turned from a simple and popular company into a complicated and controversial one.

RSS Micro feed search

May 14th, 2006

RSS Micro is a new feed search engine. Its index seems to be derived from a directory of feeds, rather than from a ping server, so the results are relatively splog free. Relatively. I found splogs in the search results for ‘personal injury’. Their coverage seems to be low, though; they don’t index this blog, for example. I wonder where they get their feeds? (Spotted on Micro Persuasion)

and now Sony comes out with TIVO anywhere!

May 13th, 2006

Well, nothing to do with TIVO really, but I just came across Sony’s LocationFree Player Pak.

Plug in all your video sources: your cable set-top box, DVD player, PS2 or Xbox. It basically ships all the media content (in interactive mode) to anywhere you get high-speed wireless access, whether in a hotel room, at the airport or at the ballgame ;) … using your laptop, PC or PSP.
Well, there is a small issue with not getting a static IP from your high-speed Internet service provider, but that is easily remedied by using a Dynamic DNS service. I think the idea is pretty neat. The Qwests, Comcasts and Pacific Bells of the world are not going to like this :) and at the same time the Ciscos and Junipers of the world must be drooling over their prospects ;)

Now some people might say that QoS might be an issue. That may be true; however, there is always the alternative of using a large amount of buffering. Storage is cheap!

Come to think of it, Sony itself has a large stake in the music industry: Sony Music! Remember the Sony rootkit scandal …

In making this mainstream, they have their own digital rights to worry about … but it’s just great for the consumer; cable prices nowadays are preposterous anyway.

I think there is a big “aha” factor here; competitors shouldn’t be too far behind.

Sony LocationFree Player Pak

Web 2.0 vs. Semantic Web vs. RDF

May 12th, 2006

Here’s a Google Trends chart comparing the number of Google searches for Web 2.0 versus Semantic Web versus RDF.

Google trend search for Web 2.0, Semantic Web, RDF
While I was not surprised that Web 2.0 overtook both in 2006, I was surprised that RDF consistently dominates Semantic Web. Note that the search term used for RDF is RDF -media. Click through to the search results and see some interesting information on cities and regions where the searches are popular. The US is 7th, behind Korea, Hong Kong, Ireland, Singapore, India and Taiwan.

We’re the children of shoemakers

May 11th, 2006

I’m looking forward to attending WWW 2006 later this month. Today I got a message from the event management company coordinating the conference with some “important last minute information” including when registration would be open, where to park, etc., all of the usual details attendees might be interested in. That’s good!

However, it came as a 3,200,000 byte attached Microsoft Word document. That’s bad!!

It’s also very ironic, given that this is a conference devoted to information sharing using the Web and its non-proprietary standards.

I could have received a 2K message with a few sentences and a link to a web page. That web page could have offered versions in English and other languages along with links to more information. There might even have been machine interpretable metadata to load into my calendar.

Instead I got a bloated document in a virus-prone, proprietary format that can only be read on some computers and then only if I have purchased expensive software. Forget about reading it on my phone. But the news isn’t all bad: at least I know the name of the person who created and edited the file, how many revisions she made and how long it took her.