Archive for January, 2007
January 30th, 2007, by Tim Finin, posted in Uncategorized
WYPR reporter Melody Simmons did a radio story on the use of robotics in health care, Hopkins Scientists Bestowing Sense of Touch To Medical Robots, that included a brief interview with UMBC CSEE faculty member Marie desJardins.
Marie DesJardins is a computer science professor at UMBC. She said that while robots have been around for about 30 years, they are just now beginning, to borrow a human phrase, to multi-task.
“What’s going to happen next is we’re going to have more and more intelligent robots that actually do start making decisions.”
DesJardins says some robots are already making medical decisions, such as life support machines and respirators. They are controlled by computers, yet have a human in the loop.
An MP3 is available here
January 30th, 2007, by Akshay Java, posted in Uncategorized
One of the things that became obvious from last year’s TREC Blog track’s opinion extraction task (more here) was that people frequently express sentiments, opinions and emotions in blog posts. Aggregate analysis of such opinions can provide an interesting glimpse into the collective mind of a community. For example, we collected a number of blog posts from LiveJournal and indexed all the N-grams for the content of the post. Looking at the phrases starting with words like ‘love’, ‘hate’ and ‘want’ shows that most LiveJournal users “Love life, their friends… but hate people and going to school”.
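A minimal sketch of this kind of phrase counting, assuming a simple lowercase word tokenizer and a handful of made-up posts (the function name and the sample data are ours for illustration, not the code or data used in the analysis above):

```python
import re
from collections import Counter


def phrases_starting_with(posts, seed, n=2):
    """Count n-word phrases that begin with `seed` across a list of post texts."""
    counts = Counter()
    for text in posts:
        # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", text.lower())
        for i in range(len(words) - n + 1):
            if words[i] == seed:
                counts[" ".join(words[i:i + n])] += 1
    return counts


# Made-up sample posts, for illustration only.
sample_posts = [
    "I love life and I love my friends",
    "I hate people. I really hate going to school",
    "I love life but I want to read",
]

for seed in ("love", "hate", "want"):
    print(seed, phrases_starting_with(sample_posts, seed).most_common(3))
```

Sorting the counts for each seed word is enough to surface the dominant phrases; a real collection would need better tokenization and some filtering of quoted or duplicated text.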
MoodViews is another great example of a tool that analyzes LiveJournal data and provides interesting features for trend-spotting and searching moods. It also extracts a tag cloud of top keywords associated with a given mood. Many such tools are likely to play a major role in future advertising and business intelligence. After all, wouldn’t advertising dollars be better spent pitching a book to someone who “wants to read”?
January 28th, 2007, by Tim Finin, posted in Uncategorized
TREC will have a blog track again in 2007 following up on the 2006 track on opinion extraction in blogs. During the TREC blog workshop, several proposals were made for a track on blog spam detection (including ours and one from NEC Laboratories America) as well as some other tasks. Groups that want to participate in TREC need to apply by 20 February. See the 2007 TREC CFP for details.
Since 1992, TREC has been fostering research to enable more powerful, faster and easier-to-use technologies for information retrieval. TREC 2007 will focus on seven tracks, including a blog track, to explore information seeking behavior in the blogosphere, and a legal track to develop search technology to help the legal profession find information pertinent to a case in digital document collections. For each TREC, NIST provides a test set of documents and questions. Participants run their own retrieval systems on the data and return to NIST a list of the retrieved top-ranked documents. NIST pools the individual results, judges the retrieved documents for correctness, and evaluates the results. The TREC cycle ends with a workshop in November that is a forum for participants to share their experiences.
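The pool-then-judge cycle described above can be sketched roughly as follows. This is only an illustration under our own assumptions: the run names, document ids, pool depth and relevance judgments are all made up, and precision at k stands in for whatever measures a given track actually reports.

```python
def pool(runs, depth):
    """Union of the top-`depth` documents from every submitted run."""
    pooled = set()
    for ranked_docs in runs.values():
        pooled.update(ranked_docs[:depth])
    return pooled


def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the first k retrieved documents that were judged relevant."""
    return sum(1 for doc in ranked_docs[:k] if doc in relevant) / k


# Hypothetical ranked lists returned by two participants for one question.
runs = {
    "systemA": ["d1", "d2", "d3"],
    "systemB": ["d2", "d4", "d1"],
}

# Only the pooled documents are shown to the human judges.
judging_pool = pool(runs, depth=2)

# Made-up judgments over the pooled documents.
relevant = {"d2", "d4"}

for name, ranked_docs in runs.items():
    print(name, precision_at_k(ranked_docs, relevant, k=2))
```

Pooling keeps the judging effort manageable: only documents that some system ranked highly are assessed, and anything outside the pool is treated as not relevant.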
Last year was the first time that I was involved with a TREC effort and I found it to be very worthwhile. Harrowing at the end, but well worth it — there is nothing like an impending deadline and the growing chance of looking like an idiot to your friends and colleagues to get the adrenaline flowing.
January 28th, 2007, by Tim Finin, posted in Uncategorized
Machine tags. It’s a bit awkward, but I love the name.
One problem with the name Semantic Web is that it hides the fact that RDF is content generated by machines and for machines. Or at least it should be. Ultimately, of course, most of us don’t want to know what’s going on under the hood if we don’t have to. But for now, those of us trying to work with the ideas and potential applications can’t avoid it.
Last week Flickr announced their new machine tag feature that supports a style of tagging that moves much closer to RDF. The idea is to take a tag like
dc:title="Computing Machinery and Intelligence"
and parse it as
<namespace> : <predicate> = <value>
This syntax was already recognized for at least two special cases: geo:lat=… and geo:lon=… tags were recognized by Flickr’s mapping components and upcoming:event=81334 was recognized as a reference to an event registered on upcoming.org. Now Flickr has updated its systems to recognize any tag with form symbol:symbol=value and has changed its internal databases to record the separate components. What’s more, Flickr has extended its API so that you can query on machine tags with wildcards for any of its three elements.
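To make the idea concrete, here is a small sketch of parsing a tag into its three parts and matching it against wildcard patterns. It does not use Flickr's API. The regular expression is our guess at what counts as a "symbol", and parse_machine_tag and matches are hypothetical helper names.

```python
import re
from fnmatch import fnmatchcase

# Assumed grammar: symbol:symbol=value, where a symbol starts with a letter.
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][A-Za-z0-9_]*)"
    r":(?P<predicate>[A-Za-z][A-Za-z0-9_]*)"
    r"=(?P<value>.*)$"
)


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for an ordinary tag."""
    m = MACHINE_TAG.match(tag)
    if m is None:
        return None
    value = m.group("value")
    # Drop one pair of surrounding double quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return (m.group("namespace"), m.group("predicate"), value)


def matches(tag, namespace="*", predicate="*", value="*"):
    """True if the tag is a machine tag whose parts fit the shell-style patterns."""
    parsed = parse_machine_tag(tag)
    if parsed is None:
        return False
    patterns = (namespace, predicate, value)
    return all(fnmatchcase(part, pat) for part, pat in zip(parsed, patterns))


tags = ['dc:title="Computing Machinery and Intelligence"',
        "geo:lat=39.25", "upcoming:event=81334", "sunset"]
print([t for t in tags if matches(t, namespace="geo")])
```

Storing the three components separately, as Flickr now does, is what makes this kind of query cheap: each wildcard applies to one field rather than to the whole tag string.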
Flickr’s Dan Catt calls it Not Quite RDF (NQRDF) in an interesting post. Among the things that are missing, of course, is a way to map a “prefix” like dc: into a full URI, which is normally done in an RDF header.
Here’s where Swoogle can help. When we were looking at popular RDF namespaces we noticed that there was not much ambiguity in mapping between prefixes and namespaces, except for some pathological cases such as namespace prefixes like n1, n2, …
Take the MusicBrainz ontology, for example. Swoogle knows just over twelve thousand documents that use it, either version 2.0 or 2.1. Of these, all but one use the prefix mm:. The sole oddball chose to use mbmeta: as a prefix for the MusicBrainz ontology. Of the documents that declare the prefix mm: for a namespace, every one uses it to refer to a version of MusicBrainz.
documents declaring a mm prefix
What ambiguity remains could be further reduced given a combination of a prefix and a predicate. So, if people started using mm: to refer to other vocabularies (e.g., the mickeyMouse vocabulary) chances are good that we could distinguish the desired predicate given the prefix and the predicate’s local name. For example, of the ontologies ever declared using mm:, only MusicBrainz has a releaseStatus predicate and only MickeyMouse has an appearedIn predicate. So these can be easily disambiguated:
- mm:appearedIn="How to Play Baseball"
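A rough sketch of that disambiguation step, assuming we already have an index from each prefix to the namespaces declared with it and the predicates each namespace defines. The index contents and the example.org URIs below are placeholders, not Swoogle data or the real namespace URIs.

```python
# Hypothetical index: prefix -> {namespace URI -> predicate local names}.
# The URIs are placeholders standing in for the real namespaces.
PREFIX_INDEX = {
    "mm": {
        "http://example.org/musicbrainz#": {"releaseStatus"},
        "http://example.org/mickeyMouse#": {"appearedIn"},
    },
}


def resolve(prefix, local_name, index=PREFIX_INDEX):
    """Return the namespaces declared with `prefix` that define `local_name`."""
    candidates = index.get(prefix, {})
    return sorted(ns for ns, names in candidates.items() if local_name in names)


print(resolve("mm", "appearedIn"))
print(resolve("mm", "releaseStatus"))
```

A result with exactly one namespace means the machine tag can be expanded to a full URI; an empty or multi-valued result means the prefix and local name alone are not enough.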
We’ve been working on a scheme to make it easy for people to tag photos of organisms with their scientific names and have those tags map to our Ethan ontology. I think that we’ll have to take another look to see how machine tags might help.
January 24th, 2007, by Tim Finin, posted in Uncategorized
Socialemail@example.com is a Google Groups mailing list for people who are doing or interested in research on social media.
What kinds of social media? Well, usual suspects — blogs, forums, wikis, sites for sharing and discussing photos, videos, bookmarks, etc. — as well as new forms yet to be invented.
What kind of research? Our hope is that this list will focus on current topics like the following: natural language processing, information extraction, sentiment detection, opinion mining, bias, text classification, community modeling, detecting spam in social media, tagging, folksonomies, using ontologies, user modeling, recommendations, recognizing trends and buzz, graph analysis, using semantic web technology, FOAF, data blogging, reasoning about trust, modeling influence, social network analysis, event detection, using machine learning techniques and novel applications.
Whew, so what isn’t covered? We hope that the list doesn’t suffer the fate of many, and become just a channel for announcements of conferences, workshops, positions and products. These are valuable, of course, if they are really relevant to social media research. So send them along, if you really think they are relevant to research on social media.
How do I subscribe? Join the group via the web page http://groups.google.com/group/social-media-research. When you subscribe, you can choose to view the posts online or to get them by email either as they are made, as a daily summary or in digest form. Currently, anyone can join and you need not have a Google account.
Can I just lurk? Sure. The group is set so that anyone can view the content, but you have to be a member to post. See the discussions at http://groups.google.com/group/social-media-research.
Is there a feed? Google Groups provides feeds (Atom or RSS 2.0) for both messages and topics. See http://groups.google.com/group/social-media-research/feeds for the options for our list.
Will the list have a lot of spam? We sure hope not. Some is bound to slip through. And, of course, my important announcement of a call for papers may be your spam. If someone joins to send what we consider to be egregious spam, we’ll ban them. If spam is still a problem, we may resort to moderating new members or even all messages.
Who is running this anyway? The list is currently being managed by members of the UMBC ebiquity research group (https://ebiquity.umbc.edu/). We want the character and function of the group to emerge from the members, however, and see ourselves as the current caretakers. If you want to contact us, send email to firstname.lastname@example.org.
If you think you will be interested, please give it a try. You can always unsubscribe if it turns out not to be useful to you. To start, we’d like to invite new members to introduce themselves via a short message to email@example.com. You might say who you are, what you are interested in and what you are working on now.
January 24th, 2007, by Pranam Kolari, posted in Uncategorized
I subscribe and follow keywords of interest through RSS feeds, both on blog search engines and bookmarking tools. Though splogs have always been a problem, lately I have noticed increasing spam in bookmarking tools. What do we call it — b00kmarks? (read zero, zero)
In the more popular ones (like del.icio.us) the LONG TAIL is highly compromised (1, 2), while in the less popular ones even the HEAD seems to have problems. The availability of many ready-to-use “tag and ping” tools is making things worse.
While my immediate response is to unsubscribe, being researchers we will of course investigate this further.
January 23rd, 2007, by Tim Finin, posted in Uncategorized
Many Eyes is an interesting data visualization service just released on IBM’s alphaWorks. You can upload your own data and generate and publish a visualization of it in a number of formats. As an example, see 100 most popular Semantic Web namespaces as of August 2006 according to Swoogle. The site is similar to Swivel but has more ways to visualize your data. Many Eyes is also a rudimentary social media site in that people can comment on uploaded datasets and visualizations.
January 20th, 2007, by Pranam Kolari, posted in Uncategorized
We are conducting research on the nature and seriousness of the splog problem in the non-English blogosphere. As contextual advertisements and affiliate marketing become more profitable in these other languages, splogs are bound to infiltrate and pollute them. We suspect this is already beginning, in a limited way, and are interested in studying it.
From the research community, the only work related to non-English splogs is:
Detecting Blog Spams using the Vocabulary Size of All Substrings in Their Copies, by Kazuyuki Narisawa, Yasuhiro Yamada, Daisuke Ikeda and Masayuki Takeda
Most of this work is however based on synthetic data, not actual splogs.
We have also tried to find splogs by querying blog search engines using translated spammy (profitable) advertising contexts like insurance, vacations, loans, etc. Cultural differences suggest that this might not be the way to go. Indeed, we haven’t come across many splogs this way.
This is what prompts us to seek suggestions from the blogging and research community. If you know of this problem, have seen splogs in other languages, know of spammy non-English advertising contexts, and would like to contribute or collaborate please send either of us a note or comment below.
UPDATE: For our readers, is this a splog in Japanese? http://diet.newstanding.com/hcm/vpb/
January 19th, 2007, by Tim Finin, posted in Uncategorized
Hitwise has an interesting post comparing the popularity of five web-based feed readers. Bloglines continues to dominate, with Rojo, Google, Newsgator and Netvibes following behind.
January 19th, 2007, by Tim Finin, posted in Uncategorized
Last week’s news included reports that the DoD had discovered that Canadian coins with hidden radio transmitters had been planted on DoD contractors with classified security clearances. The report in question was Technology Collection Trends in the U.S. Defense Industry issued by the Defense Security Service.
On at least three separate occasions between October 2005 and January 2006, cleared defense contractors’ employees traveling through Canada have discovered radio frequency transmitters embedded in Canadian coins placed on their persons.
It sure sounded like an implausible story. For example, the metal in the false coins would interfere with the radio transmissions, and a planted coin could easily be separated from the target by being spent.
Today the news is that the DSS has acknowledged that the story is false.
A statement in the 2006 Defense Security Service Technology Collection Trends in the U.S. Defense Industry report which claimed radio frequency transmitters were discovered embedded in Canadian coins is not true, according to DSS officials. This statement was based on a report provided to DSS. The allegations, however, were found later to be unsubstantiated following an investigation into the matter.
January 18th, 2007, by Tim Finin, posted in Uncategorized
Mechanical Engineering professor Ann Spence is coordinating the Maryland LEGO Robotics Tournament which will be held this Saturday in the UMBC University Center Ballroom.
More than 500 middle-school youth and their families are expected to participate in the FIRST LEGO League State Tournament, a competition that builds students’ ability to design and program LEGO robots, on Saturday, Jan. 20 from 9:30 a.m. until 5 p.m. (source)
The FIRST LEGO League is an international robotics program intended to encourage enthusiasm for discovery, science, and technology in kids ages 9 to 14. This year, teams will build LEGO robots to perform functions such as removing “pizza molecules” from a paper plate. Judges will evaluate the teams on their ability to program robots to achieve tasks relevant to nanotechnology, a scientific frontier focused on achieving advances in medicine and computers through research into particles 100,000 times smaller than the thickness of a single strand of hair.
The UMBC MD LEGO robotics tournament is free and open to the public with the finals beginning at 3:30pm. I think it will be fun to watch.
January 15th, 2007, by Tim Finin, posted in Uncategorized
New York Times reporter David Carr has a funny and insightful article, 24-Hour Newspaper People, on blogs and traditional newspapers. Several quotes struck me. Carr writes about how tending to his blog competes with his real work.
Sometimes I wonder whether I care to the point that I neglect other things, like, oh, my job. Tweaking the blog is seductive in a way that a print deadline never is. By the time I am done posting entries, moderating comments and making links, my, has the time flown. I probably should have made some phone calls about next week’s column, but maybe I’ll write about, ah, blogging instead.
Not that this would ever be a problem for me.
Carr has an interesting quote from Clay Shirky, one which I’ve not been able to find on the Web or Blogosphere.
“We are living through the largest expansion of expressive capability in the history of the human race,” said Clay Shirky, an adjunct professor in the graduate interactive telecommunications program at New York University. “And it wouldn’t be a revolution if there were no losers. The speed of conversation is a part of what is good about it, but then some of the reflectiveness, the ability for careful summation and expression, is lost.” Even as Mr. Shirky is saying this, I peek at the comments section of my blog, and he goes on, “There is an obsessive, dollhouse pleasure in configuring and looking at it, a constant measure of social capital.” [Emphasis added]
This seems so right. The pleasure of creating and nurturing one’s own little world underlies much of what people do with computers. An image that came to me was that of Kandor, Krypton’s capital city which was miniaturized by the evil Brainiac but rescued and lovingly kept by Superman under a bell jar in his arctic fortress. Kandor was also used as the name for a knowledge representation and reasoning system developed by Patel-Schneider and colleagues in the mid 1980s. The name was chosen because Kandor was a lightweight version of an earlier KR system, Krypton. Both systems were precursors to description logic, a family of representation formalisms that underlies the Semantic Web language OWL. Building a representation of some aspect of reality can deliver that dollhouse pleasure and also lead to obsessing over it.
Finally, Carr delivers another metaphor — blogger as day trader:
There has always been a feedback loop in journalism: letters to the editor, the phone and more recently e-mail messages. But a blog provides feedback through a fire hose. The nice thing about putting out a newspaper was that, at some point, the story was set and the writer got to go home. Now I have become a day trader, jacked in to my computer and trading by the second in my most precious commodity: me. How do they like me now? What about … now? Hmmmm … Now? [Emphasis added]
This works on many levels, but I get an immediate visual image of the lonely blogger/trader at home spending 16 hours a day staring into a computer, punctuated by bouts of frantic typing only to fall exhausted into bed at 1:00am.