Archive for January, 2005
January 23rd, 2005, by Tim Finin, posted in Funding
DARPA BAA 04-19, Biologically-Inspired Cognitive Architectures (BICA), calls for proposals to “develop, implement and evaluate psychologically-based and neurobiologically-based theories, design principles, and architectures of human cognition.” The program has the ultimate goal of “implementing computational models of human cognition that could eventually be used to simulate human behavior and approach human cognitive performance in a wide range of situations.” This BAA solicits proposals for the program’s initial 13-month design phase, which will be followed by a second implementation phase. Proposals are due March 1, 2005.
January 23rd, 2005, by Tim Finin, posted in Social, Semantic Web
Lots of interesting posts on folksonomies in the many2many group blog.
Clay Shirky offers an interesting metaphor for how new ideas and technologies, folksonomies in this case, evolve and are adapted.
To put this metaphorically, we are not driving a car, with gas, brakes, reverse and a lot of choice as to route. We are steering a kayak, pushed rapidly and monotonically down a route determined by the environment. We have a (very small) degree of control over our course in this particular stretch of river, and that control does not extend to being able to reverse, stop, or even significantly alter the direction we’re moving in.
Liz Lawley comments (“it’s the social network, stupid!”) on the need for personalized ranking.
One of the things that I’ve tried to emphasize every time I’ve talked to people involved with search engines is the growing uselessness of ranking algorithms that take the search and linking habits of the whole world into account. I don’t want to know what the average eight-year-old calls an image. I want to know what my friends and colleagues call an image. Or a link. Or a photo.
Flickr and del.icio.us work so well for me not because they aggregate the world’s tags, but because they allow me to aggregate my social network’s tags, links, and photos. I don’t want to see everybody’s links on productivity, but I do want to see Merlin Mann’s. I don’t want to see everybody’s links on blogging, but I do want to see danah’s. I don’t want to see “research” resources from a molecular biologist, but I do want to see them from a sociologist studying online social networks.
How does each of us personalize the ranking algorithms used by information retrieval systems? We can tell Flickr who’s in the group of people whose opinions we value. But do we have to do the same for del.icio.us and Technorati and the 87 other sites we visit? An obvious idea is to integrate a trust-based approach with a system to aggregate and integrate RDF information on our social network (FOAF) and the objects being searched over. One problem is that the straightforward way to define a ranking algorithm is non-incremental and expensive. Even incremental approximations will be expensive for large collections of things to be ranked. Google can afford to do it for the average web user, but not for each of us. Personalized and topic-based ranking offers many challenges (see An Analytical Comparison of Approaches to Personalizing PageRank for some discussion).
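To make the cost concrete, here is a minimal sketch of one way to personalize a link-based ranking: textbook personalized PageRank by power iteration, where the random jump lands only on people the user trusts. It is not what Google, Flickr, or del.icio.us are known to run. The graph, the node names, the `trusted` set (imagine it pulled from a FOAF file), and the damping value are all illustrative assumptions.

```python
def personalized_pagerank(links, trusted, damping=0.85, iterations=50):
    """Power-iteration PageRank whose teleport step jumps only to trusted nodes.

    links:   dict mapping each node to the list of nodes it links to
    trusted: nodes this particular user trusts (hypothetical input)
    """
    trusted = set(trusted)
    nodes = set(links) | {t for targets in links.values() for t in targets}
    teleport = {n: (1.0 / len(trusted) if n in trusted else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the things it links to.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: hand its rank back to the trusted set.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
        rank = new_rank
    return rank


# Hypothetical toy graph: people link to the items they tagged.
links = {
    "merlin": ["gtd_post"],
    "danah": ["sns_paper"],
    "stranger": ["spam_page"],
}
my_rank = personalized_pagerank(links, trusted={"merlin", "danah"})
```

Each user's trust set needs its own full run over the graph, which is why doing this per person is so much more expensive than computing one global ranking.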
RDF + trust might form the foundation for a good motor for our kayak. We’ll have to see if it’s too big.
January 21st, 2005, by Tim Finin, posted in Security
If you were brave enough to read this item, here’s another dare…
New virus masquerades as news headlines
Friday, January 21, 2005 Posted: 10:19 AM EST (1519 GMT)
(CNN) — Researchers have identified a new computer virus that masquerades as news headlines from CNN’s Web site. …
January 20th, 2005, by Tim Finin, posted in Conferences, Mobile Computing
MobiQuitous 2005, the 2nd International Conference on Mobile and Ubiquitous Systems: Networking and Services, will be held 17-21 July 2005 in San Diego CA. Submitted papers are due 2 February 2005.
Combining mobile and ubiquitous computing yields a paradigm providing people and agents with computing and communication services all the time, everywhere, transparently and invisibly using devices embedded in physical environments. In this context, the communication devices, the objects with which they interact, or both may be mobile. This requires advances in wireless network technologies and devices, development of infrastructures supporting cognitive environments, and discovery and identification of ubiquitous computing applications and services.
MobiQuitous 2005 will cover all these aspects, representing a forum where practitioners and researchers coming from the many areas involved in ubiquitous solutions design and deployment will be able to interact exchanging the cross-layer experiences needed to build the overall ubiquitous systems. Areas addressed by the conference include: applications, service-oriented computing, middleware, networking, agents, knowledge management and databases.
January 19th, 2005, by Pavan, posted in Web
Preventing Comment Spam: “From now on, when Google sees the attribute (rel="nofollow") on hyperlinks, those links won’t get any credit when we rank websites in our search results. This isn’t a negative vote for the site where the comment was posted; it’s just a way to make sure that spammers get no benefit from abusing public areas like blog comments, trackbacks, and referrer lists.
We hope the web software community will quickly adopt this attribute and we’re pleased that a number of blog software makers have already signed on:”
So we can have control semantics; I guess it depends on who makes them…
“We’ve also discussed this issue with colleagues at our fellow search engines and would like to thank MSN Search and Yahoo! for supporting this initiative. Here are a few guidelines for anyone else who wants to join the cause.”
Surprising to see Microsoft and Yahoo agree with Google’s rules.
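For blog software that wants to adopt the attribute, a rough sketch of the idea follows: add rel="nofollow" to every link in a user-submitted comment before it is displayed. The function name and the regex approach are assumptions for illustration, not how any particular blog package does it. A real implementation should use a proper HTML parser, since this pattern can be fooled, for example by a URL that itself contains "rel=".

```python
import re

A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
REL_ATTR = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(comment_html):
    """Add rel="nofollow" to each <a> tag that has no rel attribute yet."""
    def fix(match):
        attrs = match.group(1)
        if REL_ATTR.search(attrs):
            # This sketch leaves tags that already carry a rel value alone.
            return match.group(0)
        return '<a%s rel="nofollow">' % attrs
    return A_TAG.sub(fix, comment_html)


comment = 'Nice post! <a href="http://example.com/">cheap pills</a>'
safe_comment = add_nofollow(comment)
```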
January 18th, 2005, by Pavan, posted in Programming, Pervasive Computing
Sun Brings Java to RFID Tagging: Today at the National Retail Federation Convention, Sun Microsystems announced an entry-level RFID solution for retailers based on the company’s Java System. Sun also unveiled what it calls “Industry Solution Architectures” for more complex RFID management such as integration with back-end enterprise systems.
Radio frequency identification (RFID) tags are small transmitters placed on products, which are often used for tracking or inventory purposes. According to the company, Sun’s Java System RFID software “enables customers to process RFID tagged cases or pallets at the rate of approximately one to two seconds per unit and is designed to help customers meet retail mandates in approximately one week.”
January 18th, 2005, by Tim Finin, posted in Funding
Peter Harsha’s CRA Computing Policy Blog notes a Washington Times article discussing targets for federal budget trims. The story lumps “scientific research” in with “other low-priority and no-priority programs” among those to be cut. NSF and NIH are specifically mentioned.
“…Mr. Bush gave a peek into his budget plans last week when he told The Washington Times his spending blueprint was “going to be tough.” That message was underscored by White House Chief of Staff Andrew Card, who told the U.S. Chamber of Commerce that Mr. Bush will exert “very, very strong discipline” on next year’s spending. “That discipline will be there big time,” Mr. Card told business leaders. Among the budget-cutting targets: the bloated Agriculture Department, corporate welfare, scientific research, housing, state and local giveaway grants, and other low-priority and no-priority programs that will be slashed or eliminated altogether. … The National Science Foundation’s social research grants, long criticized as wasteful, will be cut and NSF’s overall spending is expected to be flatlined. So will the National Institutes of Health, which has seen its budget skyrocket over the past decade, especially in the past four years. …”
January 17th, 2005, by Tim Finin, posted in Programming, Agents
While militaristic, this might be interesting as a basis for projects in a course on multiagent systems or Java.
“Robocraft, developed for MIT’s 6.370 class, is a real-time strategy game. Two teams of robots roam the screen collecting resources and attacking each other with different kinds of weapons. However, in Robocraft each robot functions autonomously; under the hood it runs a Java virtual machine loaded up with its team’s player program. Robots in the game communicate by radio and must work together to accomplish their goals. The software and competition specifications are available for download.”
January 14th, 2005, by Harry Chen, posted in KR, Web, Semantic Web
Getting people to agree on a single ontology has always been a problem in Semantic Web research. There are two schools of thought. Some people believe that in the future all ontologies will be defined by standards bodies or special interest groups. Others believe that there will be many different ontologies floating around, and that standard ontologies will emerge from an “evolution” process: good ontologies will get used and bad ontologies will be forgotten.
I think the latter is more likely to happen than the former. The new tag service of the Technorati website is a good example. Here is a short description of the service from a Slashdot post:
“Technorati (a search engine for blogs) has a new ‘tag’ service. If your blog tool of choice uses Categories, has a RSS/Atom feed, and pings technorati, then you’re done. If not, you can add tags via a new tag markup. The twist is that Technorati is working with Del.icio.us (a social/sharing bookmark manager website) and Flickr (a social/sharing photo web site) to read their tagged content! So Flickr pictures, Del.icio.us bookmarks, and blog posts all on one page! Here’s an example result for the tag Toronto. There is some documentation as well. One current limitation is that there is no way to do tag intersection as with del.icio.us (i.e. http://del.icio.us/tag/toronto+food ) like http://www.technorati.com/tag/toronto+Food. Tagging (also known as Folksonomies) was the topic recently on Slashdot: Folksonomies In Del.icio.us and Flickr.”
January 14th, 2005, by Tim Finin, posted in KR, Web, Semantic Web
This is a good overview paper with the perspective of someone in library and information science.
Folksonomies - Cooperative Classification and Communication Through Shared Metadata, Adam Mathes, UIUC, December 2004. “This paper examines user-generated metadata as implemented and applied in two web services designed to share and organize digital media to better understand grassroots classification. … Conclusion. A folksonomy represents simultaneously some of the best and worst in the organization of information. Its uncontrolled nature is fundamentally chaotic, suffers from problems of imprecision and ambiguity that well developed controlled vocabularies and name authorities effectively ameliorate. Conversely, systems employing free-form tagging that are encouraging users to organize information in their own ways are supremely responsive to user needs and vocabularies, and involve the users of information actively in the organizational system. Overall, transforming the creation of explicit metadata for resources from an isolated, professional activity into a shared, communicative activity by users is an important development that should be explored and considered for future systems development.”
January 14th, 2005, by Tim Finin, posted in Swoogle, Semantic Web
After looking at the piece on Peter Norvig’s views on the semantic web (Semantic Web Ontologies: What Works and What Doesn’t), I realized that he’s talking about a request we made when we started developing Swoogle:
“A friend of mine just asked can I send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn’t find them all. I looked, and it turns out there’s only around 200,000 of them. That’s about 0.005% of the web. We’ve got a ways to go.”
We never did get any help from Google. What we did do was develop a workaround for Google’s restriction of returning only 1000 results for any query, which lets us use Google more effectively to find an initial set of seed URLs of semantic web documents (SWDs) to bootstrap the Swoogle crawler. Using these initial seeds, we employ a custom SWD crawler to crawl through SWDs and a custom focused crawler to dig through HTML documents and directories. Using Swoogle, we have found on the order of two million SWDs (RDF files in XML or N3) publicly accessible on the web.
The hack we employ is to use Google’s ‘site:’ qualifier to narrow the search. So we query on “filetype:owl” and get back 1000 results drawn from many different sites. After filtering out the non-OWL documents, we extract a list of the sites from which the valid ones came. For example, if http://ebiquity.umbc.edu/ontologies/event.owl is in the initial result set, we note that ‘ebiquity.umbc.edu’ is a site that had at least one OWL file. For each new site S we encounter, we give Google the narrower query ‘filetype:owl site:S’, which will often end up getting some additional results not included in the earlier query.
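The query-narrowing step can be sketched roughly as follows. This is an illustration rather than Swoogle's actual code: the function names are made up, and the sketch only builds query strings from a list of result URLs, without calling any search API.

```python
from urllib.parse import urlparse


def hosts_with_hits(result_urls):
    """Return the set of host names that contributed at least one result URL."""
    hosts = set()
    for url in result_urls:
        host = urlparse(url).hostname
        if host:
            hosts.add(host)
    return hosts


def narrowed_queries(result_urls, seen_sites, filetype="owl"):
    """Build one 'filetype:X site:S' query for each site not queried before."""
    new_sites = sorted(hosts_with_hits(result_urls) - set(seen_sites))
    return ["filetype:%s site:%s" % (filetype, site) for site in new_sites]


results = ["http://ebiquity.umbc.edu/ontologies/event.owl"]
queries = narrowed_queries(results, seen_sites=set())
```

Each narrowed query has its own 1000-result ceiling, so the union over many sites can go well past the limit on the original query.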
For Google, a site qualifier specifies a suffix of the server’s symbolic address, so a simple refinement generates other potential site specifiers, e.g., if we find an OWL file at ‘ebiquity.umbc.edu’, we can generate the other sites (‘umbc.edu’, ‘edu’) and add them to the potential site table for querying. So, an important part of Swoogle’s database is the list of sites where we’ve found at least one SWD. Swoogle maintains a list of the top 500 sites from which we’ve extracted the most SWDs.
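Generating the candidate site specifiers from a host name is a few lines of string handling. The helper name below is hypothetical, and the sketch assumes host names are plain dot-separated labels.

```python
def site_suffixes(host):
    """List every dot-separated suffix of a host name, longest first."""
    labels = host.lower().strip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


candidates = site_suffixes("ebiquity.umbc.edu")
```

For ‘ebiquity.umbc.edu’ this yields the host itself followed by ‘umbc.edu’ and ‘edu’, each of which can be added to the table of potential sites to query.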
There are many wrinkles to this process. For example, not every SWD uses a file suffix that indicates or even suggests its type. Swoogle can also produce a current analysis of the distribution of Swoogle’s documents by suffix. The second most common suffix is nothing and the fourth is ‘.xml’. And of course, some suffixes, like ‘.rss’, only imply that the file might be an RDF file.
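A suffix breakdown like that is easy to compute from a list of document URLs. The sketch below is an assumption about how one might do it, not Swoogle's own analysis, and the sample URLs are invented.

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse


def suffix_distribution(urls):
    """Count documents by the file suffix of their URL path ('' means no suffix)."""
    counts = Counter()
    for url in urls:
        path = urlparse(url).path
        _, suffix = posixpath.splitext(path)
        counts[suffix.lower()] += 1
    return counts


sample = [
    "http://a.example/people/alice.rdf",
    "http://b.example/people/bob.rdf",
    "http://c.example/people/carol/foaf",
    "http://d.example/feed.XML",
]
distribution = suffix_distribution(sample)
```

Calling `distribution.most_common()` then lists the suffixes from most to least frequent, with the empty string standing for documents that have no suffix at all.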
While Google will only give you at most 1000 results for a query, it tries to be helpful by estimating the total number of results it could return. (Or is it taunting us?) We could use this information to inform Swoogle’s focused web crawler about how much effort to spend in rooting around in a site looking for SWDs. Currently, Swoogle’s focused crawler searches to a fixed depth and does not use this information.
As of this writing, I’d guess there are at least two million SWDs accessible on the web. Most of these are FOAF or RSS documents. In order to keep Swoogle’s collection more interesting and representative, we’ve limited the number of documents we collect from any given site, so it purposely ignores many FOAF documents it discovers. We have developed specialized datasets with many of these ignored SWDs. Currently Swoogle has about 340K SWDs indexed.
Note that we have a pretty narrow definition of a semantic web document — an RDF document encoded in XML or N3. There are lots of other uses of RDF content: embedded RDF in HTML documents, in other document types (e.g., PDF, JPG), in databases, etc. I think it’s hard to predict what the most important use cases will be for semantic web technologies.
January 14th, 2005, by Tim Finin, posted in Security, GENERAL
“The President’s Information Technology Advisory Committee (PITAC) achieved consensus yesterday on the final draft of its report on the status of the federal cyber security R&D effort, finding that support for civilian-oriented, fundamental cyber security research is seriously inadequate, the pool of researchers is insufficient, and that coordination between funding agencies is lacking. … The report will note problems in all three agencies one would expect to be funding critical long-term cyber security R&D: NSF, DARPA and the Department of Homeland Security.” …MORE…