Spam bots at the gateway

July 12th, 2007

You might have noticed that many of our ebiquity web systems and services were a bit, well, flaky, last week. We experienced a number of security problems which were most likely all related. On July 3 someone complained via (I think) the comment link on UMBC’s main page that we were flooding their online form with spam. It turns out that someone was, somehow, able to launch a second httpd process using port 8080 on our Web computer. This was serving as a proxy, acting as a relay for requests from spam links placed in comments on forums and blogs back through our machine, which relayed them on to their ultimate source. The site that got the spam saw the links as coming from our server, making it look like we were the spammers.

We’ve still not been able to work out how this was done. I suspect it was a PHP buffer overflow attack. We killed the process and locked our machine down tight, and then discovered that several of our blogs had been compromised. We suspect this was because we were running a relatively old version of WordPress (2.0.4) that had some known vulnerabilities. Spammers were able to edit the templates of several of our blogs to add spam links to the footers. They mucked with the comment controls, disabled Akismet, and added over 800 spam comments to posts.

All of this came to a head on July 4, when most of our lab was away traveling or enjoying the (US) holiday. I was desperately trying to figure out what was going on, and why, by examining the web access logs and our MySQL query log. Anand and Filip were able to help, and we eventually got things quieted down by disabling most of the Web administrative functions and using iptables to close off ports and block some IPs.
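For anyone facing the same problem, here is a minimal Python sketch of the kind of log check that helps. It flags requests whose target is a full URL (or a CONNECT), which is what a client sends when it is using a web server as a forward proxy. It assumes Apache's Common Log Format, and the function names, the threshold, and the sample log lines (documentation-range IPs and example domains) are all made up for illustration; this is not the script we actually used.

```python
import re
from collections import Counter

# Assumes Apache Common/Combined Log Format: client IP first, request line in quotes.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<target>\S+)[^"]*"'
)

def proxy_request_counts(lines):
    """Count, per client IP, requests that treat the server as a forward proxy.

    An ordinary web server sees targets like "/index.html"; a request line such
    as "GET http://other-site.example/ HTTP/1.1" (or a CONNECT) means the client
    expects the server to relay the request somewhere else.
    """
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not parse as access-log entries
        target = m.group("target").lower()
        if m.group("method") == "CONNECT" or target.startswith(("http://", "https://")):
            counts[m.group("ip")] += 1
    return counts

def ips_to_block(lines, threshold=2):
    """Return IPs with at least `threshold` proxy-style requests, most active first."""
    return [ip for ip, n in proxy_request_counts(lines).most_common() if n >= threshold]

# Hypothetical log lines for illustration only.
sample = [
    '192.0.2.10 - - [04/Jul/2007:09:15:02 -0400] "GET http://spam.example/pills HTTP/1.1" 200 512',
    '192.0.2.10 - - [04/Jul/2007:09:15:07 -0400] "GET http://spam.example/casino HTTP/1.1" 200 498',
    '198.51.100.7 - - [04/Jul/2007:09:16:11 -0400] "GET /blog/index.php HTTP/1.1" 200 10240',
    '203.0.113.5 - - [04/Jul/2007:09:17:30 -0400] "CONNECT mail.example:25 HTTP/1.0" 405 230',
]
print(ips_to_block(sample))  # ['192.0.2.10']
```

The resulting list is only a starting point for deciding which addresses to block with iptables; a human still has to look at what each flagged client was doing.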

We’ve cleaned up the templates and killed the spam comments, updated our WordPress version, weeded out many unused user accounts, closed old posts to comments and trackbacks, and added some new instrumentation to our blogs and web server.

I learned a thing or two last week. Things are running reasonably smoothly now, but me, I’m still very paranoid. Hey! What was that sound?

Swoogle 2007

July 11th, 2007

We’ve made some recent improvements and bug fixes to the Swoogle Semantic Web search engine, the new version of which is henceforth known as “Swoogle 2007”. PhD student Lushan Han did all of the heavy lifting for this. Thanks, Lushan!

The biggest change is that Swoogle’s IR index is now updated incrementally, as new or modified Semantic Web documents are processed. When Swoogle processes an RDF document, it analyzes it to extract metadata, then adds or updates the metadata in Swoogle’s database and (re-)indexes information about the document in Swoogle’s IR engine. Previously, the information in the database was updated as documents were found, but the IR index was regenerated periodically in an offline batch process. Consequently, the two were not always synchronized. They now are, at least on a daily basis.
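To make the batch-versus-incremental distinction concrete, here is a toy inverted index in Python that can be updated one document at a time. It only illustrates the idea and is not Swoogle's code: the class and method names are hypothetical, and a real IR engine tokenizes, ranks and stores postings far more carefully.

```python
from collections import defaultdict

class IncrementalIndex:
    """Toy inverted index that can be updated one document at a time."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document URLs
        self.doc_terms = {}               # document URL -> terms indexed for it

    def index_document(self, url, text):
        """Add a new document, or re-index one whose content has changed."""
        self.remove_document(url)         # drop stale postings first
        terms = set(text.lower().split())
        self.doc_terms[url] = terms
        for term in terms:
            self.postings[term].add(url)

    def remove_document(self, url):
        """Remove every posting that points at `url` (no-op if it is unknown)."""
        for term in self.doc_terms.pop(url, set()):
            self.postings[term].discard(url)
            if not self.postings[term]:
                del self.postings[term]   # keep the index free of empty entries

    def search(self, term):
        """Return the URLs containing `term`, sorted for stable output."""
        return sorted(self.postings.get(term.lower(), set()))

idx = IncrementalIndex()
idx.index_document("http://example.org/a.rdf", "foaf Person name")
idx.index_document("http://example.org/b.rdf", "foaf knows")
idx.index_document("http://example.org/a.rdf", "dc title")  # a.rdf changed
print(idx.search("foaf"))  # ['http://example.org/b.rdf']
```

Because each update touches only the postings of the document being (re-)indexed, the index can stay in step with the database instead of waiting for the next full rebuild.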

If you want to see the documents that were added or updated today, you can use a term like “hasDateCache:2007-07-11” in your search. For example, this query finds new or changed RDF documents discovered today that use the foaf namespace.

Among the bug fixes are getting the “sort by date” option to work correctly on all pages of the result set, fixing the url: qualifier in Swoogle queries, and plugging some memory leaks.

Finally, we had been putting off some changes because we were running out of disk space for Swoogle. We have new hardware that gives us room to grow.

Swarm theory, natural and computational

July 10th, 2007

The current National Geographic magazine has a nice feature article, Swarm Theory, on the phenomenon in nature and our attempts to apply it to artificial systems.

“A single ant or bee isn’t smart, but their colonies are. The study of swarm intelligence is providing insights that can help humans manage complex systems, from truck routing to military robots.”

OpenMoko, an iPhone for the rest of us

July 10th, 2007

Could it be? An unlocked GSM smartphone running open source software based on Linux? It’s said that you can order a developer preview phone kit (Neo 1973, named for the year when mobile phones were introduced) from the store. The cost is $300 for the base model or $450 for the advanced kit. That site is not responding now, so check out Also, see Is your phone free? from the Register.

AI pioneer Donald Michie dies in automobile accident

July 9th, 2007

Professor Donald Michie died in an automobile accident this past Saturday. He was 83. Michie was an early AI researcher who founded the University of Edinburgh’s Department of Machine Intelligence and Perception and was generally considered one of the preeminent European AI researchers of the 1960s, 70s and 80s. See his obituary in the Telegraph for more information on his long and productive career. Michie was also known for his work developing code-breaking hardware and concepts at Bletchley Park and for many other contributions to computer science, such as the memoization technique.
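For readers who haven't met it, memoization (Michie called them “memo functions”) caches a function's results so that repeated calls with the same argument become table lookups. Here is a minimal Python sketch; the decorator name `memo` is our own, and Python's standard library offers the same thing ready-made as `functools.lru_cache`.

```python
def memo(f):
    """Wrap a one-argument function so each distinct argument is computed once."""
    cache = {}

    def wrapper(n):
        if n not in cache:
            cache[n] = f(n)  # first call: compute and remember
        return cache[n]      # later calls: just look it up

    wrapper.cache = cache    # exposed so the cache can be inspected
    return wrapper

@memo
def fib(n):
    # Without memoization this recursion takes exponential time;
    # with it, each fib(k) for k <= n is computed exactly once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(80))  # 23416728348467685
```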

Can prediction markets select best sellers?

July 8th, 2007

The current issue of the New Yorker has an article, The Science of Success, on prediction markets, in particular Media Predict, which runs markets for music, books, television, and movies.

“Last month, the publisher Simon & Schuster announced a partnership with a Web site called MediaPredict, which would use the collective judgment of readers to evaluate book proposals. The deal drew scorn from many, who saw it as evidence that publishers, in an era of stagnant sales, had so lost confidence in their own judgment that they were reduced to the methods of “American Idol.” Asking readers to weigh in on a book’s commercial prospects was a recipe for mediocrity, and the experiment was “doomed to fail.” Yet even the idea’s critics recognized that it was a response to a real problem: most books today are not economically successful, which means that much of the time and money that publishers invest in projects is wasted.”

Media Predict’s book market works like this. Aspiring authors submit a short book proposal. If selected, the proposal goes on the market for a fixed amount of time, allowing market players to buy puts and calls on it. If the proposal gets a publication deal or makes the Simon & Schuster short list within the time window, the shares are worth $100; if not, they are worth nothing.
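Here is a small worked sketch of the payoff arithmetic for a market whose shares pay $100 or nothing. It assumes a share can be bought (or sold short) at a single quoted price and held until the window closes; the prices and share counts are made up, and Media Predict's actual trading rules, such as fees or how short positions are handled, aren't described in the article.

```python
PAYOUT = 100.0  # a share pays $100 if the proposal succeeds in the window, else $0

def implied_probability(price):
    """Read a share price as the crowd's probability that the event happens."""
    return price / PAYOUT

def long_profit(price, shares, event_happens):
    """Profit from buying `shares` at `price` and holding them to expiry."""
    payout = PAYOUT if event_happens else 0.0
    return shares * (payout - price)

def short_profit(price, shares, event_happens):
    """Profit from selling `shares` short at `price`: the mirror image of buying."""
    return -long_profit(price, shares, event_happens)

def expected_long_profit(price, shares, my_probability):
    """Expected profit from buying if you believe the event has `my_probability`."""
    return shares * (my_probability * PAYOUT - price)

# Hypothetical: shares trade at $30, and a player spends $1,500 of play money on 50.
print(implied_probability(30.0))     # 0.3
print(long_profit(30.0, 50, True))   # 3500.0
print(long_profit(30.0, 50, False))  # -1500.0
print(short_profit(30.0, 50, False)) # 1500.0
```

Buying only makes sense, in expectation, when you think the event is more likely than the price implies; shorting is the bet that it is less likely.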

The music market works the same way: you predict whether new bands will get a record deal. The markets for movies and television are a bit different. Media Predict poses predictions, such as “will ABC’s Geico ‘Cavemen’ sitcom get on the air by October 15th?”, that players invest in.

Market players get $5,000 when they sign up. It’s not real money. So, what happens if you are good at this? Nothing for now, but they say they may introduce prizes and are also investigating the possibility of a real-money prediction exchange.

I’m selling the Caveman short.

Human agent swarm attempts Sudoku solution

July 8th, 2007

I heard this amazing story on NPR yesterday, Understanding Swarm Theory with Sudoku (listen).

John Carroll University professors Daniel Palmer and Marc Kirschenbaum assembled 81 people who tried to solve a sudoku puzzle on a nine-by-nine grid marked off on the grounds of the university. Each person represented an integer between one and nine, wearing a color-coded tee shirt bearing the corresponding number. They were asked to mill about and interact with one another to try to find a configuration that represented a sudoku solution, i.e., one person in each grid square and all nine colors represented in each row, column and 3×3 sub-grid.
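The constraint the human swarm was trying to satisfy is easy to state in code. This minimal Python check only tests whether a finished 9×9 configuration is a valid sudoku solution; it does not model how the participants searched for one, and the test grid is generated from a standard shifting pattern rather than taken from the puzzle they actually used.

```python
def is_valid_solution(grid):
    """Return True if a 9x9 grid of ints is a complete, valid sudoku solution."""
    digits = set(range(1, 10))
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column and 3x3 sub-grid must contain each digit exactly once.
    return all(group == digits for group in rows + cols + boxes)

# A valid grid built by shifting each row; swapping two cells in a row
# keeps that row valid but breaks two columns.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(is_valid_solution(solved))  # True
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]
print(is_valid_solution(broken))  # False
```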

After seven minutes of milling, jostling and negotiating, the human swarm was close to a solution, but frustrated. Several of the intelligent digits worked out a solution to the puzzle, took control and orchestrated the group into a valid solution. Palmer and Kirschenbaum plan to analyze the video of the exercise to see if they can discover algorithms or heuristics that can be used to help swarms of mobile robots coordinate their activities.

Workshop on privacy enforcement and accountability with semantics

July 2nd, 2007

A one-day workshop on Privacy Enforcement and Accountability with Semantics (PEAS) will be held at the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference in Busan, Korea, on 12 November 2007. Full research papers, short position papers, and demonstration papers are sought. Papers must be submitted by 27 July; authors will be notified by 31 August, and final versions are due 14 September.

Workshop topics include ontologies for privacy, techniques for privacy, anonymity, pseudonymity, and unlinkability, privacy management and enforcement, information hiding and watermarking, information provenance, inference channels, generalization of answers, privacy policy specifications and business rules, negotiations and incentives for cooperation enforcement, accountability, privacy and personalization, privacy and mobility, user- and context-awareness in privacy, security and trust, P3P, digital rights management, creative commons, pervasive technologies (RFID, cellular networks, wifi) and semantic web, case studies, prototypes, and experiences, desktop search and sharing.

Meta H-index measures scientific output of departments and labs

July 1st, 2007

A researcher’s H-index is a measure of her scientific output and is defined as the highest N such that she has N papers that each have at least N citations. Pete Shirley of the University of Utah has a page that lists the meta-H index for some Computer Science Departments.

“The meta h index of a department is the number of professors with h index higher or equal to h.”
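Both definitions come down to the same computation: sort the values in descending order and find the last rank N whose value is still at least N. The sketch below reads the quoted definition as “the largest h such that h professors have an h-index of at least h”, which is our interpretation; the function names and the citation counts and h-index values in the example are hypothetical.

```python
def h_index(values):
    """Largest N such that at least N of the values are >= N."""
    ranked = sorted(values, reverse=True)
    # In descending order, "value >= rank" holds for a prefix of the list,
    # so counting the positions where it holds gives the largest such N.
    return sum(1 for rank, v in enumerate(ranked, start=1) if v >= rank)

def meta_h_index(professor_h_indices):
    """Department-level index: the h-index of the professors' h-indices."""
    return h_index(professor_h_indices)

papers = [10, 8, 5, 4, 3]            # hypothetical citation counts for one researcher
print(h_index(papers))               # 4
department = [40, 25, 18, 12, 9, 3]  # hypothetical h-indices of six professors
print(meta_h_index(department))      # 5
```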

You can add your Department or research lab by sending him a datafile to feed to his program.

Monitoring urban traffic with mobile phones

July 1st, 2007

A recent BBC story, Beating congestion with mobiles, describes an interesting experiment in collecting real-time traffic information. MIT’s Real Time Rome aggregates data from mobile phones, buses and taxis in Rome to model traffic in real time. The project focuses on collection, analysis and visualization, and collects mobile phone information using Telecom Italia’s LocHNESs system.

“The LocHNESs system developed by Telecom Italia can estimate road traffic by anonymously locating mobile terminals in conversation status in a certain zone. It records the directional shift of terminals by plotting them. This information is used to generate traffic maps in real time by associating one or more velocity values with every territorial pixel and by estimating mean velocity on the roads. Traffic maps can be generated throughout the Italian territory that is covered by the TIM mobile network and can be accessed 24 hours.”
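The quoted description suggests a simple scheme: turn successive positions of each anonymized terminal into speeds, then average those speeds per territorial pixel. Here is a rough Python sketch of that idea under our own assumptions (positions already projected to metres, a hypothetical 250 m grid cell, each speed credited to the cell at the midpoint of its segment). It is an illustration only, not how LocHNESs actually works, which the quote doesn't describe in that detail.

```python
import math
from collections import defaultdict

CELL_SIZE_M = 250.0  # hypothetical size of one "territorial pixel", in metres

def cell_of(x, y, size=CELL_SIZE_M):
    """Map a position in metres to the (column, row) of its grid cell."""
    return (int(x // size), int(y // size))

def mean_speed_by_cell(fixes, size=CELL_SIZE_M):
    """Average speed (km/h) per grid cell.

    `fixes` is an iterable of (terminal_id, t_seconds, x_m, y_m) samples.
    Consecutive fixes of the same terminal give one speed estimate, which is
    credited to the cell containing the midpoint of that segment.
    """
    by_terminal = defaultdict(list)
    for tid, t, x, y in fixes:
        by_terminal[tid].append((t, x, y))
    speeds = defaultdict(list)
    for track in by_terminal.values():
        track.sort()  # order each terminal's fixes by time
        for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
            if t1 <= t0:
                continue  # skip duplicate timestamps
            kmh = math.hypot(x1 - x0, y1 - y0) / (t1 - t0) * 3.6
            speeds[cell_of((x0 + x1) / 2, (y0 + y1) / 2, size)].append(kmh)
    return {cell: sum(v) / len(v) for cell, v in speeds.items()}

# Two hypothetical terminals moving through the same cell.
fixes = [
    ("a", 0, 10.0, 10.0), ("a", 10, 110.0, 10.0),  # 100 m in 10 s = 36 km/h
    ("b", 0, 20.0, 50.0), ("b", 20, 120.0, 50.0),  # 100 m in 20 s = 18 km/h
]
print(mean_speed_by_cell(fixes))
```

Averaging many terminals per cell is what makes this workable at all: any single phone might be in a car, on a bus, or on foot.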

There is also a short article on this in the March 2007 Economist Technology Quarterly. A next step in these and related projects is deciding what to do with the data: how to distribute it, to whom, and for what purposes. Interesting work.