Act before you think you think you think

June 28th, 2008

The WSJ has an article, Get Out of Your Own Way, on research suggesting that people often form intentions to act and make decisions well before they are conscious of the fact. Maybe this is like detecting the inferences made by the OWL reasoner or the classification of a low-level SVM model before the high-level Python code processes its results. This picture from the article sums it up nicely.

As usual, you’re always the last to know. At least this opens up new interpretations for the old excuse, “Hey, I was out of the loop!”.

NYC deploys wireless network for municipal employees

June 28th, 2008

Today’s New York Times has an article, With Wireless Network, City Agencies Have More Eyes in More Places, that describes a citywide wireless network that is operational and is expected to be largely completed by the end of the summer.

“Locating vehicles is one of the ways the Department of Sanitation and other city agencies are using the city’s new $500 million high-speed wireless secure data network, one of the largest of its kind in the world. The network, known as NYCWiN, was built by Northrop Grumman and by summer’s end will include about 400 cellular antennas covering 95 percent of the city.

The idea is for city agencies to use network-connected hand-held devices and tablet computers to increase efficiency and flexibility: Soon, police officers will be able to view photographs of suspects from their cars, fire chiefs will be able to watch live video of fires taken from traffic helicopters above, and housing inspectors will be capable of looking up building plans while on location.”

The article notes that other cities, including Oklahoma City, Tucson and Washington, are implementing similar wireless networks. One motivation is to provide a secure network for municipal workers, who cannot rely on commercial cellular networks that can quickly become overloaded in emergencies.

The Gotham Gazette has some information on the NYCWiN system’s specification:

“The original specifications for the network called for it to support multiple, simultaneous transmission of full-motion video or large files from and to anywhere in the city, real-time tracking of all city vehicles and control of traffic lights, continuous monitoring of air and water purity, transmission of patient vital signs from ambulances to receiving hospitals, and reliable voice communications to back up radio and cell phone signals. … NYCWiN is not technically Wi-Fi, since it will use licensed spectrum. Wi-Fi operates over a portion of the airwaves that the Federal Communications Commission has designated as unlicensed, or open to the public for use with any approved device. Nevertheless, in non-emergency conditions, NYCWiN will have a lot of unused capacity that could help civic projects keep their bandwidth costs down, as Dana Spiegel suggested.”

According to Paul Cosgrave, NYCWiN is not a Wi-Fi or WiMAX system but uses Universal Mobile Telecommunications System technology on the 2.5 GHz band to provide a broadband data network and IP services. The similar Washington DC system uses EV-DO and a different frequency band, 700 MHz.

Wireless Blog reports that NYC is “using IPWireless technology for their city-wide safety network with each cell site providing in-building coverage up to 3 to 5 miles from the cell site in an urban setting. It operates in a single channel of 5 or 10MHz of spectrum and supports voice over IP with full QOS based on SIP.”

Microsoft rumored to buy semantic search startup Powerset

June 26th, 2008

Venture Beat reports that Microsoft will acquire Powerset for a price “rumored to be slightly more than $100 million”. Powerset has been developing a Web search system that uses natural language processing technology acquired from PARC to more fully understand users’ queries and the text of the documents it indexes.

“By buying Powerset, Microsoft is hoping to close the perceived quality gap with Google’s search engine. The move comes as Microsoft CEO Steve Ballmer continues to argue that improving search is Microsoft’s most important task. Microsoft’s market share in search has steadily declined, dropping further and further behind first-place Google and second place Yahoo.

Google has generally dismissed Powerset’s semantic, or “natural language” approach as being only marginally interesting, even though Google has hired some semantic specialists to work on that approach in limited fashion. Google’s search results are still based primarily on the individual words you type into its search bar, and its approach does very little to understand the possible meaning created by joining two or more words together.”

If you put the query “Where is Mount Kilimanjaro” into the beta version of Powerset, it answers “Mount Kilimanjaro: Contained by Tanzania” in addition to showing web pages extracted from Wikipedia. That’s a pretty good answer.

Its response to “what is the Serengeti” is a little less precise. It reports seven things it knows about Serengeti: that it replaced “desert, Platinum, twilight and Caribbean Blue”, that it hosted ‘migration’, that it provided ‘draw’, that it gained ‘fame’, that it recorded ‘explorations’, that it rutted ‘season’ and that it boasted ‘Blue Wildebeests’. I’m just glad I don’t have a school report on the Serengeti due tomorrow!

Asking “Who is the president of Zimbabwe” results only in the fallback answer, which appears to be just the set of Wikipedia pages that the query words produce in an IR query. Compare this with the results of the Google query “who is the president of zimbabwe”.

By the way, the AskWiki system often does a better job on these kinds of questions. Asking “where is the Serengeti” produces the answer “The Serengeti ecosystem is located in north-western Tanzania and extends to south-western Kenya between latitudes 1 and 3 S and longitudes 34 and 36 E. It spans some 30,000 km.” It’s a bit of a hack, though. It seems to work by selecting the sentence or two in Wikipedia that best serves as an answer. See our post on AskWiki from last fall for more examples.

Still, Powerset is an ambitious system that shows promise. What they are trying to do is important and will eventually be done. They have shown real progress in the past two years, more than I had expected. I hope Microsoft can accelerate the development and find practical ways to improve Web search even if the ultimate goal of full language understanding is many years away.

Models? We don’t need no stinking models!

June 26th, 2008

Wired has an interesting article, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, that discusses the data-driven revolution that computers and the Web have unleashed. Science used to rely on developing models to explain and organize the world and make predictions. Now much of that can be done by correlating large amounts of data. The same holds for other disciplines (e.g., linguistics) and for businesses (think Google).

“All models are wrong, but some are useful.” So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.

Update: And then there is this counterpoint: Why the cloud cannot obscure the scientific method.

Journal of Web Semantics has high impact factor

June 25th, 2008

During the past year, the Journal of Web Semantics was added to the list of journals indexed by Thomson Reuters. Their most recent Journal Citation Report (2007) gives the JWS an impact factor of 3.41, which is the third highest out of the 92 titles in its category — Computer Science, Information Systems.

Thomson Reuters’ journal impact factor is a measure of the frequency with which the average article in a journal has been cited in a particular year. The 2007 impact factor is computed as the citations received in 2007 to all articles published in 2006 and 2005, divided by the number of “source items” published in 2006 and 2005.
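
The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the citation and article counts below are invented for the example and are not the actual figures for the JWS or any other journal.

```python
def impact_factor(citations_by_pub_year, source_items_by_pub_year, year):
    """Two-year impact factor for `year`.

    citations_by_pub_year: citations received in `year`, keyed by the
        publication year of the cited articles.
    source_items_by_pub_year: number of "source items" published, keyed
        by publication year.
    """
    window = (year - 1, year - 2)  # the two preceding publication years
    citations = sum(citations_by_pub_year[y] for y in window)
    items = sum(source_items_by_pub_year[y] for y in window)
    if items == 0:
        raise ValueError("no source items published in the two-year window")
    return citations / items


# Hypothetical counts, made up purely to show the arithmetic.
citations_in_2007 = {2006: 50, 2005: 70}   # citations made during 2007
source_items = {2006: 18, 2005: 22}        # items published in each year

print(impact_factor(citations_in_2007, source_items, 2007))  # 120 / 40 = 3.0
```

With these made-up numbers, 120 citations to 40 source items gives an impact factor of 3.0.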

Technology Review special issue on Web 2.0

June 24th, 2008

The July/August issue of Technology Review is focused on Web 2.0. The lead article, “The Business of Social Networks”, asks “Web 2.0–the dream of the user-built, user-centered, user-run Internet–has delivered on just about every promise except profit. Will its most prominent example, social networking, ever make any money?”

“Social networking is the fastest-growing activity on Web 2.0–the shorthand term for the new user-centered Internet, where everyone publicly modifies everyone else’s work, whether it’s an encyclopedia entry or a photo album. The growth of social networking is astonishing, and it has spread to sites of all sizes, which are increasingly intertwined as platforms open (see “Who Owns Your Friends?”). Even small players are soaring.”

There are quite a few interesting stories on various Web 2.0 topics. Visit the table of contents to see what’s available.

Web Science CACM cover article now online

June 23rd, 2008

The cover story of the July 2008 CACM (v51, n7) is Web Science by Jim Hendler, Nigel Shadbolt, Wendy Hall, Tim Berners-Lee, and Danny Weitzner. The article argues for an interdisciplinary approach to understanding the Web as an entity in its own right. It’s great that this article is freely available on the web. Ironically, figuring out what URL to use to link to it was a bit tricky and the pages are rendered as png images to protect the IP. But, it’s a good article that lays out an important new area of study in information systems.

“Despite the Web’s great success as a technology and the significant amount of computing infrastructure on which it is built, it remains, as an entity, surprisingly unstudied. Here, we look at some of the technical and social challenges that must be overcome to model the Web as a whole, keep it growing, and understand its continuing social impact. A systems approach, in the sense of “systems biology,” is needed if we are to be able to understand and engineer the future Web.”

What I find exciting is that one of the attributes that makes the Web so successful is that it is a system to which all can contribute. We need to make sure it remains that way and doesn’t devolve into a hegemonic structure.

Is it Lindsay Lohan or your friends who make you a binge drinker?

June 23rd, 2008

What determines our behavior or beliefs? Are we influenced by people who are well-known and popular leaders (political, social, religious) in our society, or by the few hundred people in our immediate social network (family, friends and co-workers)? It’s reasonable to assume that it varies by domain or topic, with your music preferences falling in the first category and your spiritual orientation in the second.

Paul Ormerod and Greg Wiltshire have a preprint of a paper, ‘Binge’ drinking in the UK: a social network phenomenon (pdf), that reports on a study suggesting that binge drinking spreads through “small world” social networks rather than by imitation of influentials in a “scale free” network.

“We analyse the recent rapid growth of ‘binge’ drinking in the UK. This means the consumption of large amounts of alcohol, especially by young people, leading to serious anti-social and criminal behaviour in urban centres. We show how a simple agent-based model, based on binary choice with externalities, combined with a small amount of survey data can explain the phenomenon. We show that the increase in binge drinking is a fashion-related phenomenon, with imitative behaviour spreading across social networks. The results show that a small world network, rather than a random or scale free, offers the best description of the key aspects of the data.”

It’s fascinating that with the right data, simulation models can help to answer such questions.
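
To give a feel for what such a simulation involves, here is a minimal sketch in Python. It is not the authors’ model: it assumes a simplified Watts-Strogatz-style small world network and a plain threshold rule (an agent adopts the behavior once a large enough share of its neighbours has) as a stand-in for “binary choice with externalities”. All names, parameters and values are hypothetical.

```python
import random


def small_world(n, k, p, rng):
    """Ring lattice of n nodes, each linked to its k nearest neighbours
    (k even); each lattice edge is redirected to a random node with
    probability p. A simplified Watts-Strogatz-style construction."""
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            if rng.random() < p:
                # Rewire: choose a new endpoint, avoiding self-loops
                # and nodes already linked to i.
                candidates = [c for c in range(n)
                              if c != i and c not in neighbours[i]]
                if candidates:
                    j = rng.choice(candidates)
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours


def simulate(neighbours, thresholds, seeds, steps):
    """Return the share of adopters after each step. An agent adopts
    (irreversibly) once the adopting share of its neighbours reaches
    its personal threshold."""
    adopted = set(seeds)
    history = [len(adopted) / len(neighbours)]
    for _ in range(steps):
        updated = set(adopted)
        for node, nbrs in neighbours.items():
            if node in adopted or not nbrs:
                continue
            share = sum(1 for m in nbrs if m in adopted) / len(nbrs)
            if share >= thresholds[node]:
                updated.add(node)
        adopted = updated
        history.append(len(adopted) / len(neighbours))
    return history


# Hypothetical parameters, chosen only for illustration.
rng = random.Random(0)
graph = small_world(n=200, k=6, p=0.1, rng=rng)
thresholds = {node: rng.uniform(0.1, 0.6) for node in graph}
seeds = rng.sample(range(200), 10)
history = simulate(graph, thresholds, seeds, steps=30)
print(f"initial share {history[0]:.2f}, final share {history[-1]:.2f}")
```

Comparing how far the behavior spreads on different network types, as the paper does, would mean swapping in other graph constructions and fitting the results to survey data, which this sketch does not attempt.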

Someday you will tell your children about this…

June 21st, 2008

A post on the UMBC Office of Information Technology blog announces that they are unplugging their modems.

End of UMBC Modem Services June 2009
UMBC was an early pioneer in offering modem service for Internet connectivity and we are currently among the last universities in Maryland still offering this. Next year, June 30, 2009 will mark the end of UMBC’s dial-up modem service for the campus community. After June 30, 2009 UMBC will no longer provide any dial-up Internet modem service for UMBC faculty, staff or students. …


W3C announces RDFa as a candidate recommendation

June 20th, 2008

The W3C has officially announced that RDFa is a candidate recommendation.

“2008-06-20: The Semantic Web Deployment Working Group has published a Candidate Recommendation of RDFa in XHTML: Syntax and Processing. Web documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience. RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. See the group’s RDFa implementation report. The Working Group also updated the companion document RDFa Primer. Learn more about the Semantic Web and the HTML Activity.”

Achieving candidate recommendation status is a significant step toward becoming a W3C recommendation. Congratulations to the working group for all of their efforts in developing RDFa.

First Obama-McCain Twitter debate starts tonight

June 20th, 2008

The Personal Democracy Forum is sponsoring a Twitter debate tonight on “technology and government” between representatives of Barack Obama and John McCain, to be moderated by Time magazine blogger Ana Marie Cox. A note on PDF has the details:

“The McCain campaign will be represented by Liz Mair, the online communications director of the Republican National Committee. The Obama campaign will be represented by Mike Nelson, a professor at Georgetown University who served in the Clinton White House under Vice President Gore on tech policy issues. He is an outside advisor to Obama’s campaign on issues of technology, media and telecommunications.”

Of course, it remains to be seen what kind of debate can happen if short talking points are further compressed into 140-character tweeting points. It will be an interesting experiment.

“Mike, Liz and Ana will be using their personal Twitter accounts, @mikenelson, @lizmair and @anamariecox, and we’ve also asked them to tag their responses with the hashtag #pdfdebate. We suggest that readers who want to follow along use a Twitter application like to track the conversation.”

The debate will start sometime tonight (Friday 20 June) and is expected to run through the end of the conference on Tuesday 24 June and maybe beyond.

The singularity: when machines become conscious

June 18th, 2008

The June 2008 IEEE Spectrum is a special report on The Singularity which has many short and provocative articles available online. This is what Wikipedia calls the technological singularity. The idea is that technological advances, especially those involving computers, seem to be accelerating and that at some point in the not too distant future, like in 20 years, we will be faced with the appearance of conscious machines with human or even super-human intelligence.

After all, Moore’s law says that the complexity of integrated circuits roughly doubles every two years. If we can maintain this exponential growth, and it’s held since the early 1970s, something big is bound to happen.
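
A quick back-of-the-envelope calculation shows what doubling every two years implies. The sketch below is in Python; the function name is made up for the example, and it assumes the doubling period stays fixed at exactly two years.

```python
def growth_factor(years, doubling_period=2.0):
    """Multiplicative growth after `years`, assuming one doubling
    every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


# Twenty more years at this pace is ten doublings.
print(growth_factor(20))  # 1024.0

# Growth since the early 1970s, taking 1971 as an assumed start year.
print(f"{growth_factor(2008 - 1971):,.0f}")
```

So another 20 years of Moore’s law would mean integrated circuits roughly a thousand times more complex than today’s.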

Of course, this is a popular scenario for science fiction books and films. As someone who has worked in the field of AI for more than 35 years, I’m skeptical. But these articles are fun to read and are very stimulating.