Sanitize your database inputs

October 28th, 2007

Little Bobby Tables

Her daughter is named ‘Help, I’m trapped in a drivers license factory’
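
The joke in the comic turns on a student name that closes a SQL string and tacks on a DROP TABLE statement. A minimal sketch of the usual defense follows, assuming Python’s built-in sqlite3 module and a hypothetical students table (neither comes from the comic). The name is passed as a bound parameter, not spliced into the query text.

```python
import sqlite3

def add_student(conn, name):
    # The ? placeholder makes sqlite3 treat `name` purely as data,
    # so quotes and semicolons inside it are never parsed as SQL.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

def student_names(conn):
    # Return the stored names in insertion order.
    rows = conn.execute("SELECT name FROM students ORDER BY rowid")
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE students (name TEXT)")  # hypothetical schema

bobby = "Robert'); DROP TABLE students;--"
add_student(conn, "Alice")
add_student(conn, bobby)

names = student_names(conn)
print(names)
```

With the parameterized insert, the odd name is simply stored as text and the table survives.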

Google ads help fund retirement for some

October 28th, 2007

This week USA Today had an article, ‘Gray Googlers’ strike gold, on older Americans operating websites that make money on Google ads.

“Jerry Alonzy figured he’d be working into his 70s at least. As an independent handyman at the mercy of weather patterns near Hartford, Conn., he’d always made a decent income that rarely grew. Then he found Google, and his life changed. Alonzy, 57, now makes $120,000 a year from the ads Google places on his Natural Handyman website, and he couldn’t be more thrilled. “I put in two, maybe three hours a day on the site, and the checks pour in,” he says. “What’s not to like?””

Of course, this is not a guaranteed get-rich-quick scheme. You need the right niche to attract well-paying ads, a steady supply of new, quality content, and time to build up your PageRank. Note that Alonzy’s two to three hours a day add up to roughly 20 hours a week tending his site, not an insignificant amount of time. The story cites some other examples that are probably more typical of what one can expect.

While the upside of working with AdSense sounds exhilarating, it’s not that way for everybody. Scott says she posted an unsold novel on Google and earns about $5 a month from the AdSense ads on the site. Al Needham, 74, who runs a site about the care of bees from his home near Boston, reaps about $250 a month. “Forget about getting rich overnight,” says Alonzy. “It takes time to learn.”

Russian Government working to control the message on the Web

October 28th, 2007

The Washington Post has an article, Kremlin Seeks To Extend Its Reach in Cyberspace , on how the Russian government is increasingly using the Web to influence and control public information and opinion.

“After ignoring the Internet for years to focus on controlling traditional media such as television and newspapers, the Kremlin and its allies are turning their attention to cyberspace, which remains a haven for critical reporting and vibrant discussion in Russia’s dwindling public sphere.”

With more than one-third of new Web content now coming from users of social media sites, this effort is focused on blogs, which have been a problem for the Kremlin in the past.

“Some Russian Internet experts say a turning point came in 2004, when blogs and uncensored online publications helped drive a popular uprising in Ukraine after a pro-Moscow candidate was declared the winner of a presidential election.”

But, as we all know, a knowledgeable and active group can have an undue influence in social media systems. An example from the Washington Post story is telling.

“On April 14, an opposition movement held a march in central Moscow that drew hundreds of people; police detained at least 170, including the leader of the march, chess star Garry Kasparov. Pavel Danilin, a 30-year-old Putin supporter and blogger whose online icon is the fearsome robot of the “Terminator” movie, works for a political consulting company loyal to the Kremlin. He said he and his team, which included people from a youth movement called the Young Guard, quickly started blogging that day about a smaller, pro-Kremlin march held at the same time. They linked to one another repeatedly and soon, Danilin said, posts about the pro-Kremlin march had crowded out all the items about the opposition march on the Yandex Web portal’s coveted ranking of the top five Russian blog posts. “We played it beautifully,” Danilin said.”

In addition to promoting themselves, implicitly or explicitly, by pushing their message, governments can also crack down on voices they do not like.

“Prosecutors have begun to target postings on blogs or Internet chat sites, charging users with slander or extremism after they criticize Putin or other officials. Most such incidents have occurred outside Moscow, and federal officials deny that they signal any broader campaign to control the Internet.”

I am afraid that we will see much more of this from all kinds of governments and also from large and powerful businesses.

Predictify web-based prediction market

October 26th, 2007

Predictify is a new web-based prediction market that was mentioned in a Freakonomics post this week.

“Predictify provides a simple, fun way to engage in current and future newsworthy topics. Users can find events that interest them, predict the outcome, build a reputation based on accuracy, and even get paid real money when they’re right. Best of all, it’s free – no points or bets required.”

Registered users can make predictions in answer to a question and develop a reputation for accuracy. Anyone can ask a free question, which is limited to 100 results, all of which are public. You have to put up money (at least US$100) to ask a Premium question, which can receive up to 10,000 answers (at $1 each) that remain private. People who answer a Premium question may get a payout that’s a function of the total payout amount, their answer’s accuracy and their Predictify level. Predictify users advance through the levels (Beginner, Apprentice, Scholar, Expert, and Guru) within each topical category by making accurate predictions.

Here’s an example of a current question in the Pop Culture category with a payout pot of $1000: on the US television show The Office, which of the couples Jan & Michael and Pam & Jim will still be together on November 16th?

Finally, a way to cash in on all those hours you thought were wasted in front of the tube and Intertubes.

Powerset outsources query result evaluation to Mechanical Turk

October 21st, 2007

TechCrunch reports that Powerset is using Amazon’s Mechanical Turk to evaluate different search results for queries. TechCrunch has a screenshot of an example Turk query.

“See the screen shot… users are shown a query and a number of results and are asked to evaluate the relevancy of each result from five choices. In this case, the query is “revealing bikinis.” Users are asked to evaluate four sets of results within ten minutes, and are paid $0.02 for the effort.

I spoke with Powerset CEO Barney Pell this evening who confirmed that they are using Mechanical Turk to get human feedback on search results. He says the results are not all Powerset generated – rather, they show results from Powerset, Google and others to see which users prefer for a given query. He also says this is an ongoing project, and new ones will be added soon. Pell also said that Powerset plans to use Mechanical Turk over the long haul, even after launch. They’ll put actual user queries into Mechanical Turk in real time, add Powerset and competitor results and see which results people find more relevant. If results suggest Powerset isn’t more relevant, they’ll adjust their engine.”

This is a good example of how Amazon’s service can work. I was surprised at the low cost: two cents for a judgment! We have a number of projects where we need human assessments for training data. Instead of turning to our usual source, students and faculty, maybe we should explore using Mechanical Turk. In some cases, getting local people to do the judgments has been difficult. For example, we were interested in expanding our splog detection system to languages other than English, but didn’t have access to native speakers of the right languages.
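
The comparison Pell describes, showing workers results from several engines and seeing which they prefer, can be sketched as a simple tally. This is only an illustration. The 1-to-5 rating scale stands in for the “five choices” mentioned in the quote, and the judgment records, engine labels and function name are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

def mean_relevance(judgments):
    """Average the worker ratings for each (query, engine) pair."""
    ratings = defaultdict(list)
    for query, engine, rating in judgments:
        ratings[(query, engine)].append(rating)
    return {key: mean(values) for key, values in ratings.items()}

# Hypothetical judgments: (query, engine, rating),
# where 1 means irrelevant and 5 means very relevant.
judgments = [
    ("jaguar habitat", "engine_a", 4),
    ("jaguar habitat", "engine_a", 5),
    ("jaguar habitat", "engine_b", 2),
    ("jaguar habitat", "engine_b", 3),
]

scores = mean_relevance(judgments)
for (query, engine), score in sorted(scores.items()):
    print(f"{query!r} / {engine}: {score:.2f}")
```

The same per-item averages could serve as labels for training data, though a real project would also want to check how well the workers agree with one another.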

The Semantic Edge at the Web 2.0 Summit

October 20th, 2007

There’s been a lot of news about last week’s Web 2.0 Summit, much of it very interesting.


There was a panel on the ‘Semantic Edge’ that Read/Write Web wrote about in The New Era of Semantic Apps:

“I’m here at the Semantic Edge panel at the Summit, moderated by Tim O’Reilly and featuring W. Daniel Hillis (Co-Chairman and CTO, Applied Minds), Barney Pell (Founder and CEO of Powerset), Nova Spivack (Twine – see our review here). The panel starts with demos from each of the three speakers.”

The applications, Freebase, Twine and Powerset, are quite distinct and represent different perspectives on the Semantic Web, with only Nova Spivack’s Twine making use of the W3C’s RDF-based languages and technology. Shelley Powers blogs about the panel and Twine, with some good observations.

W3C RDFa syntax working draft is out

October 19th, 2007

The W3C’s Semantic Web Deployment Working Group has published the first working draft describing RDFa’s syntax: RDFa in XHTML: Syntax and Processing. This is a significant step for RDFa, and congratulations to the group for their effort. RDFa provides a standard way to embed semantic content expressed in RDF in HTML documents. More precisely, it lets one add RDF content to XHTML documents. RDFa will open up more use cases for the Semantic Web and may offer a way to embrace other approaches to adding semantic information to Web documents, such as microformats.
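
To give a flavor of what embedding RDF in XHTML looks like, here is a rough sketch in Python. The XHTML fragment uses RDFa’s `about` and `property` attributes with a Dublin Core prefix. The page URL, the values, the hardcoded prefix table and the `extract_literals` helper are all invented for illustration, and the helper handles only this one simple pattern; it does not follow the processing rules in the draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment annotated with RDFa attributes.
XHTML = """
<div xmlns="http://www.w3.org/1999/xhtml"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/posts/rdfa-draft">
  <h2 property="dc:title">RDFa syntax working draft is out</h2>
  <span property="dc:date">2007-10-19</span>
</div>
"""

# Assumption: prefixes are hardcoded here, not read from xmlns declarations.
PREFIXES = {"dc": "http://purl.org/dc/elements/1.1/"}

def extract_literals(xhtml):
    """Return (subject, predicate, literal) triples for elements that carry
    a `property` attribute, using the nearest enclosing `about` as subject."""
    triples = []

    def walk(element, subject):
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None and subject is not None:
            prefix, _, local = prop.partition(":")
            predicate = PREFIXES[prefix] + local
            triples.append((subject, predicate, (element.text or "").strip()))
        for child in element:
            walk(child, subject)

    walk(ET.fromstring(xhtml.strip()), None)
    return triples

triples = extract_literals(XHTML)
for triple in triples:
    print(triple)
```

The point of the sketch is that the same markup a browser renders as a heading and a date also yields machine-readable statements about the page.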

Sam Weller on Ray Bradbury, 4pm Wed 10/24/07 AOK Library

October 19th, 2007

Sam Weller, author of The Bradbury Chronicles, will lecture at 4:00pm on Wednesday 24 October in UMBC’s Albin O. Kuhn Library Gallery. Weller’s book is the only authorized biography of Bradbury.

The event site describes the talk this way:

“The talk focuses on the man behind the masterpiece. As his biographer, Sam Weller spent five years working very closely with Ray Bradbury. Weller will cover his own relationship with the author, offering a rare window into his private world. He will share many behind-the-scenes stories of the man behind Fahrenheit 451. His presentation surveys the genesis of Fahrenheit 451 and the cultural, historical and personal influences that contributed to its inspired creation in 1953. Finally, he will address the cultural relevance of Fahrenheit 451 and conclude with why Ray Bradbury matters today more than ever before.”

And an announcement notes that

“Copies of Fahrenheit 451 and an audio guide to the book will be given away at the event while supplies last.

We hope that the lecture will include a live phone interview with Ray Bradbury himself as a part of this event — we cannot guarantee it, but we are planning for it. A reception will follow the lecture.”

When I was young, Bradbury was my favorite author. His writing was an interesting mix of science fiction, fantasy, social commentary and descriptions of life in a small Midwestern town, not unlike the one I grew up in. Well, in my case any fantastical events only occurred in my mind.

For more information, see the event site or call the Library administrative offices at ext. 5-2356.

DARPA’s Tony Tether on Urban Challenge and Computer Science research

October 19th, 2007

CNET News has an interview with DARPA director Tony Tether. The interview, Newsmaker: DARPA sees inspiration as trophy of robot race, mostly focuses on the current $2M DARPA-sponsored autonomous vehicle race, Urban Challenge, which takes place November 3 in Victorville, CA.

In the interview, he was asked, “What are the top three advances to come out of DARPA in the last five years would you say?” I found his answer interesting.

“Let’s see, we’ve revolutionized the whole computer science industry by moving into cognitive processing, that is, computers that learn you as opposed to you having to learn them. Stanford Research, by the way, in Menlo Park is a major contractor in that area. We’ve also done a lot in biology, again for finding ways for people out in the battlefield to be able to survive their environment. Then wireless, I guess. If you take your cell phone, you might think that you’re wireless and you are. But there’s a big infrastructure called towers that really make it work. And what we proved and have developed is the ability to have no infrastructure and still have total cellular wireless type of communication. That’s important from a military viewpoint because when we go into an area, we don’t have time to build the towers. Now that’s also going to be a big commercial thing because if somebody doesn’t have to build the infrastructure to have a wireless network, that means that the cost for it is much less than somebody who does, (and it) gives them a great price advantage. Those are three, but I’m not supposed to have favorites.”

Spotted on AAAI’s AI in the news.

Is China redirecting access from search engines to Baidu?

October 18th, 2007

TechCrunch writes in Cyberwar: China Declares War On Western Search Sites that someone in China is redirecting search engine access to Baidu, China’s top search engine.

“Further to our earlier story on visitors to Google Blogsearch being redirected to Baidu in China, new reports have surfaced that would indicate that China has unilaterally blocked all three major search engines in China and is redirecting all requests to Baidu. Digital Marketing Blog posts that all requests to and sub-sites are being redirected to Baidu. Google Blogscoped forums indicate that is also being re-directed to Baidu, as well as confirming the Yahoo story and our earlier Google post. The re-direct would also appear to apply to”

Can any ebiquity readers in China confirm this? If so, please leave a comment.

Entire Jon Stewart Daily Show video content online

October 18th, 2007

Viacom’s Comedy Central channel today will launch a new site for The Daily Show With Jon Stewart that will host nearly 13,000 video clips covering the entire output of the show since it began in 1999. The site will be supported by advertisements.

This is, of course, a response to the presence of many Stewart clips on YouTube and the related $1B copyright-infringement suit. These clips have high value for many people. Not everyone watches Stewart’s show regularly or can even receive the comedy channel. Some don’t even have a television. But some of the segments on the show quickly enter our popular culture, so there is high interest in seeing them.

The LA Times reports:

“The database is searchable by both date and topic, making it a potential bonanza for students of American pop culture. If you want to see what host Jon Stewart has had to say about former First Lady Barbara Bush or ill-fated Kentucky Derby winner Barbaro, you can find the clips and put them in context by seeing what else was featured on the same day.

Going forward, however, Comedy Central plans to tap into the collective intelligence of its fans by allowing them to contribute to the process, a la Wikipedia, the user-created Internet encyclopedia.”

The article mentions that the segments have been tagged by Comedy Central writers. It’s interesting to see these ‘social web’ features, user-generated content and tagging, being used for this site.

Hatebook is a social networking site for suckers

October 15th, 2007

Hatebook is an “anti-social utility that connects you with the people YOU HATE.” Unique among social networking sites, you can use it to “upload blackmail material or publish lies, get the latest gossip from your enemies and friends • post photos and videos on your hate profile • tag your friends • … get hate points from disturbing people who live, study, or work around you …”. Hatebook has nothing but disdain for its own users, whom it addresses as “suckers”.

It’s a fairly intricate parody of Facebook and other social networking sites, but one you are likely to find amusing, if at all, for only a few hours. Unless, of course, you are seriously angry and hateful, in which case you might find Hatebook too annoying to tolerate.