 | UMBC eBiquity Blog 
Tim Finin, 9:31pm 8 July 2008
HealthMap is an interesting Web site that displays a “global disease alert map” based on information extracted from a variety of text sources on the Web, including news, WHO and NGOs. HealthMap was developed as a research project by Clark Freifeld and John Brownstein of the Children’s Hospital Informatics Program, part of the Harvard-MIT Division of Health Sciences & Technology.
Their site says:
“HealthMap brings together disparate data sources to achieve a unified and comprehensive view of the current global state of infectious diseases and their effect on human and animal health. This freely available Web site integrates outbreak data of varying reliability, ranging from news sources (such as Google News) to curated personal accounts (such as ProMED) to validated official alerts (such as World Health Organization). Through an automated text processing system, the data is aggregated by disease and displayed by location for user-friendly access to the original alert. HealthMap provides a jumping-off point for real-time information on emerging infectious diseases and has particular interest for public health officials and international travelers.”
The work was done in part with support from Google, as described in a story on ABC News, Researchers Track Disease With Google News, Google.org Money.
Categories: NLP, Semantic Web, Social media, Web, Web 2.0,
Tags: GIS; Google; healthcare,
Related posts: • Sentiment mining for Wall Street; • The Semantic Naturalist: Ecoinformatics meets Semantic Web; • Long Live PGP — A New PGP Global Directory; Trackback:
link, Comments: none
Anupam Joshi, 2:59pm 8 July 2008
Here at Ebiquity, we’ve had a number of great grad students. One of them, Akshay Java, hacked out a search engine for Twitter posts around early April last year, and named it twitterment. He blogged about it here first. He did it without the benefit of the XMPP updates, by parsing the public timeline. It got talked about in the blogosphere, got some press, and there was an article in the MIT Tech Review that used his visualization of some of the Twitter links. It even got talked about in Wired’s blog, something we found out only yesterday. We were also told that three days after the post in Wired’s blog, someone somewhere registered the domain twitterment.com (I won’t feed them pagerank by linking!), and set up a page that looks very similar to Akshay’s. It has Google AdSense, and of course just passes the query to Google with a site restriction to Twitter. So they’re poaching coffee and cookie money from the students in our lab.
So of course we played with Akshay’s hack, hosted it on one of our university boxes for a few months, but didn’t really have the bandwidth or compute (or time) resources to keep up. Startups such as Summize appeared later and provided similar functionality. For the last week or two we’ve been moving the code of twitterment to Amazon’s cloud to restart the service. Of course, today comes the news that Twitter might buy Summize, quasi-confirmed by Om Malik. Lesson to you grad students: if you come up with something clever, file an invention disclosure with your university’s tech transfer folks. And don’t listen to your advisors if they think that there isn’t a paper in what you’ve hacked; there may yet be a few million dollars in it.
Categories: AI, Blogging, Datamining, Social media, Twitter, Web 2.0, cloud computing,
Tags: invention disclosure; search engine; Twitter; twitterment,
Related posts: • New CS grad enrollements in US up slightly in 2005; • .com Domain Names: What’s Left to Register?; • What are we Twittering?; Trackback:
link, Comments: 4
Tim Finin, 2:12pm 6 July 2008
This FringeDC meeting looks like fun for Lispers in the DC area.
“Conrad Barski will be presenting excerpts from his new book for community feedback. Join us at Sova Espresso & Wine for a presentation from Conrad Barski, M.D. from the new book “Land of Lisp” published by No Starch Press, due this Fall. We’ll discuss Lisp and see never-before-seen comics and game examples from the book! Afterward, we’ll be talking over some wine, coffee and food at this great little hangout in DC’s H Street Corridor.”

Categories: Programming,
Tags: lisp,
Related posts: • Land of Lisp: follow the simple rules; • Learn LISP from a comic book!?!; • Lisp in 500 lines of C; Trackback:
link, Comments: none
Tim Finin, 8:28am 6 July 2008
Jim Odell, the acting chair of the FIPA IEEE Computer Society standards committee, recently sent out an update to the members on current activities.
“FIPA is currently working with the OMG on agent standardization, including an SOA standard that includes agents (SOA-Pro) and an Agent Metamodel and Profile (AMP). The Agent Metamodel and Profile RFP has many companies that are participating, including (but not limited to): HP, Unisys, CSC, Deere & Co, Thales, Metropolitan Life, SINTEF, and DFKI. If you are interested in participating, please let me know.
Any comments on the Agent Metamodel and Profile (AMP) RFP are welcomed. (The above companies and RMIT have already submitted their suggestions.) The current release can be downloaded from: http://www.omg.org/cgi-bin/doc?ad/2008-06-02”
The OMG Agent Platform Special Interest Group page maintains links to documents about these emerging agent standards.
Categories: Agents,
Tags: FIPA; SOA,
Related posts: • FIPA to become an IEEE standards committee; • First FIPA IEEE meetings; • FIPA as an IEEE standards committee; Trackback:
link, Comments: none
Tim Finin, 10:11pm 2 July 2008
The Chronicle of Higher Education has a story on students using BitTorrent to share scanned copies of textbooks. The article, Textbook Piracy Grows Online, Prompting a Counterattack From Publishers, starts off:
“College students are increasingly downloading illegal copies of textbooks online, employing the same file-trading technologies used to download music and movies. Feeling threatened, book publishers are stepping up efforts to stop the online piracy. One Web site, called Textbook Torrents, promises more than 5,000 textbooks for download in PDF format, complete with the original textbook layout and full-color illustrations. Users must simply set up a free account and download a free software program that uses a popular peer-to-peer system called BitTorrent. Other textbook-download sites are even easier to use, offering digital books at the click of a mouse.”
Textbooks are an interesting niche for file sharing. They are surely expensive, and publishers manage to publish new editions of popular titles almost every year, undermining the market for used texts. On the other hand, digitizing a textbook requires scanning it, which takes time, attention to detail, equipment, and labor. It’s not as simple as ripping a CD.
Update 7/7/08: The Chronicle of Higher Education has a follow-up story, Founder of Textbook-Download Site Says Offering Free Copyrighted Textbooks Is Act of ‘Civil Disobedience’.
“… But the founder of Textbook Torrents calls his actions “civil disobedience” against “the monopolistic business practices” of textbook publishers. The site’s founder, who asked to remain anonymous for fear of legal action against him, talked to The Chronicle over an Internet phone call last night and defended his creation, though he described it as operating in a “legal gray area.” He said he is an undergraduate at a college outside of the United States, though he would not name the institution or country, and that he operates the Web site from there. His biggest complaint: that textbooks are just too expensive, and that prices climb each year. “We’re showing both students and textbook publishers that this isn’t acceptable anymore,” he said. “A lot of users are absolutely fed up with the system.” He said he views the 64,000 registered users of his textbook-download site as votes against that system.”
Categories: Social media, Web, Web 2.0,
Tags: bittorrent; file sharing,
Related posts: • The Rise and Fall of CORBA; • Free draft of book on Logic for Philosophy; • online book: Introduction to social network methods; Trackback:
link, Comments: 3
Tim Finin, 4:18pm 1 July 2008
The Washington Post’s Security Fix blog has a post, Amazon: Hey Spammers, Get Off My Cloud!, reporting on allegations that spammers are starting to use Amazon’s Elastic Compute Cloud (EC2) servers. It only makes sense: you can sign up easily without committing to a contract of any length, the price is low, and the IP addresses are drawn from a wide range, making it hard to block them all. Besides, if Amazon’s EC2 IP addresses all get put in a spam blacklist, it will be bad for their many legitimate users. It may be tricky for Amazon to police this.
Categories: Social media, Web, splog,
Tags: Amazon; cloud computing; EC2,
Related posts: • Web Service API Helps Amazon to Give Away the Store; • Put cloud computing in your shopping cart; • The State of Blogger (is it Splogger?); Trackback:
link, Comments: none
Tim Finin, 8:14am 1 July 2008
A good fraction of the comment spam that makes it through our Akismet filter is from people who are trying to add a comment to one of our posts about spam blogs or comments. Here’s an example from today’s batch: a comment on a two-year-old post, Blog comment spam with plagiarized text: hard to spot, sent from Cameroon and trying to promote the site africapresse.com.
“spam is a real problem in this day not just for .edu but for the entire internet world. Plagiarism is a problem too.”
It’s easy for me to classify this as spam since the comment was made on a very old post, is short, includes a reference to a site that looks commercial, and makes a few general and superficial statements that are not really tied to any of the post’s details.
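The four cues listed above (old post, short comment, commercial-looking link, statements not tied to the post) can be turned into a rough scoring sketch. This is only an illustration and not how Akismet works: the names `Comment`, `spam_signals` and `looks_like_spam`, the thresholds, and the ".com means commercial" rule are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Comment:
    text: str
    post_age_days: int      # age of the post being commented on
    author_url: str = ""    # site the commenter links to, if any


def _words(text):
    """Lowercased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def spam_signals(comment, post_text, old_post_days=365,
                 short_words=40, min_shared=2):
    """Return the four hand-picked cues as booleans (thresholds are guesses)."""
    comment_words = _words(comment.text)
    # Longer words shared with the post: a crude proxy for "tied to the post".
    long_comment = {w for w in comment_words if len(w) >= 5}
    long_post = {w for w in _words(post_text) if len(w) >= 5}
    host = urlparse(comment.author_url).hostname or ""
    return {
        "old_post": comment.post_age_days >= old_post_days,
        "short": len(comment_words) <= short_words,
        "commercial_link": host.endswith(".com"),
        "superficial": len(long_comment & long_post) < min_shared,
    }


def looks_like_spam(comment, post_text, threshold=3):
    """Flag the comment when at least `threshold` cues fire."""
    return sum(spam_signals(comment, post_text).values()) >= threshold
```

A real filter would learn such weights from labeled data; the point here is only that each cue in the paragraph is cheap to compute.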
I think it’s ironic that so many SEO wannabes try to spam posts about spam. I guess they just have spam on the brain. So, I offer up this post as food for the comment spammers and their search and comment tools.
akismet, anti-spam, antispam, automated, automated, automatic, backlink, backlinks, bad behavior, blacklist, block, blocking, blog, blogging, capcha, comment, comment spam, comments, human, keywords, links, links, nofollow, pagerank, people, plagiarize, plagiarism, rank, search engine optimization, seo, spam, spam blogs, spam comments, spam karma, spamming, splog, splog, splogs, steal, target, trackbacks, traffic, typepad, wordpress.
Categories: Social media, splog,
Related posts: • How to get more to get more comment spam; • Big spike in blog comment spam?; • Akismet does not like me; Trackback:
link, Comments: one
Tim Finin, 7:21am 1 July 2008
Here’s something I never expected: splogs as a political issue. Actually, it’s allegations of political blogs being splogs, or rather allegations of accusing political blogs of being splogs in order to get Google to block them. The NYT Bits blog has a post, Google and the Anti-Obama Bloggers, that describes the controversy.
“Did Google use its network of online services to silence critics of Barack Obama? That was the question buzzing on a corner of the blogosphere over the last few days, after several anti-Obama bloggers were unable to update their sites, which are hosted on Google’s Blogger service. … In an article that appeared on Bloggasm.com, the reporter Simon Owens spoke with some of the affected bloggers, who said they believed that Google had fallen prey to a campaign by activists supporting Senator Obama. According to the bloggers, the Obama supporters had clicked on a “flag” on the anti-Obama blogs alerting Google that they were spam.”
Maybe this is a good reason to rely on the judgment of machines, at least until they start running for office.
Categories: Social media, Web, splog,
Related posts: • Fighting kleptotorial splogs; • The other face to splogs; • Proving that blogs affect society; Trackback:
link, Comments: none
Tim Finin, 11:32am 28 June 2008
The WSJ has an article, Get Out of Your Own Way, on research suggesting that people often form intentions to act and make decisions well before they are conscious of the fact. Maybe this is like detecting the inferences made by the OWL reasoner or the classification of a low-level SVM model before the high-level Python code processes its results. This picture from the article sums it up nicely.
As usual, you’re always the last to know. At least this opens up new interpretations for the old excuse, “Hey, I was out of the loop!”.
Categories: AI, Semantic Web,
Tags: cognitive science; consciousness; decision making,
Related posts: • No related posts; Trackback:
link, Comments: none
Tim Finin, 10:33am 28 June 2008
Today’s New York Times has an article, With Wireless Network, City Agencies Have More Eyes in More Places, that describes a citywide wireless network that is operational and is expected to be largely completed by the end of the summer.
“Locating vehicles is one of the ways the Department of Sanitation and other city agencies are using the city’s new $500 million high-speed wireless secure data network, one of the largest of its kind in the world. The network, known as NYCWiN, was built by Northrop Grumman and by summer’s end will include about 400 cellular antennas covering 95 percent of the city.
The idea is for city agencies to use network-connected hand-held devices and tablet computers to increase efficiency and flexibility: Soon, police officers will be able to view photographs of suspects from their cars, fire chiefs will be able to watch live video of fires taken from traffic helicopters above, and housing inspectors will be capable of looking up building plans while on location.”
The article notes that other cities, including Oklahoma City, Tucson and Washington, are implementing similar wireless networks. One motivation is to provide a secure network for municipal workers who cannot rely on commercial cellular networks, which can quickly become overloaded in emergencies.
The Gotham Gazette has some information on the NYCWiN system’s specification:
“The original specifications for the network called for it to support multiple, simultaneous transmission of full-motion video or large files from and to anywhere in the city, real-time tracking of all city vehicles and control of traffic lights, continuous monitoring of air and water purity, transmission of patient vital signs from ambulances to receiving hospitals, and reliable voice communications to back up radio and cell phone signals. … NYCWiN is not technically Wi-Fi, since it will use licensed spectrum. Wi-Fi operates over a portion of the airwaves that the Federal Communications Commission has designated as unlicensed, or open to the public for use with any approved device. Nevertheless, in non-emergency conditions, NYCWiN will have a lot of unused capacity that could help civic projects keep their bandwidth costs down, as Dana Spiegel suggested.”
According to Paul Cosgrave, NYCWiN is not a Wi-Fi or a WiMAX system but uses Universal Mobile Telecommunications System technology on the 2.5 GHz band to provide a broadband data network and IP services. The similar Washington DC system uses EV-DO and a different frequency band, 700 MHz.
Wireless Blog reports that NYC is “using IPWireless technology for their city-wide safety network with each cell site providing in-building coverage up to 3 to 5 miles from the cell site in an urban setting. It operates in a single channel of 5 or 10MHz of spectrum and supports voice over IP with full QOS based on SIP.”
Categories: Mobile Computing,
Related posts: • MIT and Cambridge to build free wireless mesh network; • Software–Defined Radio Could Unify Wireless World; • Developing self-configuring, secure wireless networks; Trackback:
link, Comments: one
Tim Finin, 10:29pm 26 June 2008
Venture Beat reports that Microsoft will acquire Powerset for a price “rumored to be slightly more than $100 million”. Powerset has been developing a Web search system that uses natural language processing technology acquired from PARC to more fully understand users’ queries and the text of the documents it indexes.
“By buying Powerset, Microsoft is hoping to close the perceived quality gap with Google’s search engine. The move comes as Microsoft CEO Steve Ballmer continues to argue that improving search is Microsoft’s most important task. Microsoft’s market share in search has steadily declined, dropping further and further behind first-place Google and second place Yahoo.
Google has generally dismissed Powerset’s semantic, or “natural language” approach as being only marginally interesting, even though Google has hired some semantic specialists to work on that approach in limited fashion. Google’s search results are still based primarily on the individual words you type into its search bar, and its approach does very little to understand the possible meaning created by joining two or more words together.”
If you put the query “Where is Mount Kilimanjaro” into the beta version of Powerset, it answers “Mount Kilimanjaro: Contained by Tanzania” in addition to showing web pages extracted from Wikipedia. That’s a pretty good answer.
Its response to “what is the Serengeti” is a little less precise. It reports seven things it knows about Serengeti: that it replaced “desert, Platinum”, twilight and Caribbean Blue”, that it hosted ‘migration’, that it provided ‘draw’, that it gained ‘fame’, that it recorded ‘explorations’, that it rutted ‘season’ and that it boasted ‘Blue Wildebeests’. I’m just glad I don’t have a school report on the Serengeti due tomorrow!
Asking “Who is the president of Zimbabwe” results only in the fallback answer, which appears to be just the set of Wikipedia pages that the query words produce in an IR query. Compare this with the results of the Google query who is the president of zimbabwe site:wikipedia.org.
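For readers who want to run that comparison themselves, here is a small sketch that builds the site-restricted query as a URL. The helper name `site_restricted_search_url` is made up for this example, and the `https://www.google.com/search?q=...` form is assumed to be the usual address for a Google search.

```python
from urllib.parse import urlencode


def site_restricted_search_url(query, site):
    """Build a search URL whose query is limited to one site via `site:`."""
    full_query = f"{query} site:{site}"
    # urlencode escapes spaces and the colon so the URL is safe to paste.
    return "https://www.google.com/search?" + urlencode({"q": full_query})


url = site_restricted_search_url("who is the president of zimbabwe",
                                 "wikipedia.org")
print(url)
```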
By the way, the AskWiki system often does a better job on these kinds of questions. Asking “where is the Serengeti” produces the answer “The Serengeti ecosystem is located in north-western Tanzania and extends to south-western Kenya between latitudes 1 and 3 S and longitudes 34 and 36 E. It spans some 30,000 km.” It’s a bit of a hack, though. It seems to work by selecting the sentence or two in Wikipedia that best serves as an answer. See our post on AskWiki from last fall for more examples.
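To make the "pick the best sentence" idea concrete, here is a toy sketch of sentence selection by word overlap. It is a guess at the general flavor, not AskWiki's actual method: the function names, the stopword list, and the cue words for "where" questions are all invented for illustration.

```python
import re

STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "to",
             "and", "what", "where", "who", "which"}

# Hypothetical cue words that hint a sentence answers a given question type.
CUES = {"where": {"located", "lies", "situated"}}


def content_words(text):
    """Lowercased tokens with common function words removed."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def split_sentences(passage):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", passage)
    return [p.strip() for p in parts if p.strip()]


def best_sentence(question, passage):
    """Return the sentence sharing the most words with the question,
    with a bonus for cue words matching the question's first word."""
    sentences = split_sentences(passage)
    if not sentences:
        return None
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    cues = CUES.get(tokens[0], set()) if tokens else set()
    query = content_words(question)

    def score(sentence):
        words = content_words(sentence)
        return len(query & words) + len(cues & words)

    return max(sentences, key=score)
```

Ties go to the earliest sentence, which is one reason a scheme like this can return odd answers when several sentences mention the query terms equally often.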
Still, Powerset is an ambitious system that shows promise. What they are trying to do is important and will eventually be done. They have shown real progress in the past two years, more than I had expected. I hope Microsoft can accelerate the development and find practical ways to improve Web search even if the ultimate goal of full language understanding is many years away.
Categories: AI, NLP, Semantic Web, Web 2.0,
Tags: Google; Microsoft; powerset,
Related posts: • Barny Pell video: POWERSET - Natural Language and the Semantic Web; • Powerset outsources query result evaluation to Mechanical Turk; • The Semantic Edge at the Web 2.0 Summit; Trackback:
link, Comments: one
Tim Finin, 1:23pm 26 June 2008
Wired has an interesting article, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, that discusses the data-driven revolution that computers and the Web have unleashed. Science used to rely on developing models to explain and organize the world and make predictions. Now much of that can be done by correlating large amounts of data. This applies equally well to other disciplines (e.g., linguistics) and to businesses (think Google).
“All models are wrong, but some are useful.” So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.
Update: And then there is this counterpoint: Why the cloud cannot obscure the scientific method.
Categories: Semantic Web, Social media, Web, Web 2.0,
Tags: data,
Related posts: • Freebase’s data and knowledge models; • Models of Trust for the Web at WWW2006; • Models of trust for the Web; Trackback:
link, Comments: 2