How many subscribers does my blog have?

September 16th, 2007

Google has added the ability to see how many feed subscribers your site has to its Webmaster Tools suite, as reported on Google’s Webmaster Central blog.

“First of all, subscriber stats are now available. Webmaster Tools now show feed publishers the number of aggregated subscribers you have from Google services such as Google Reader, iGoogle, and Orkut. We hope this will make it easier to track subscriber statistics across multiple feeds, as well as offer an improvement over parsing through server logs for feed information.” (link)

We’ve found the Webmaster Tools very useful in maintaining our various sites.

Like everyone, we are interested in knowing how many readers our sites have. Estimating this is not trivial, unfortunately. Using an analytics service like Sitemeter or Google Analytics shows you a lot about visitors who actually visit your site. But if you offer some of your content via feeds, as virtually all blogs do, then many of your readers visit your site infrequently even if they read your content every day. This is especially true if your feed contains the full post content.

So, you need to estimate the number of feed subscribers you have and factor these in to your actual visitors. Doing this is complicated, too, since readers can subscribe in many ways, e.g., directly via a web browser like Firefox or some other software they run on their own computer or through a service like Bloglines, My Yahoo or Google Reader.
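As a rough illustration of the server-log approach, here is a minimal sketch that pulls subscriber counts out of User-Agent strings. It assumes that aggregators report an "N subscribers" figure in the User-Agent they send when polling a feed, and that you have one feed; the function names and the sample strings are made up for this example and are not taken from any particular service's documentation.

```python
import re
from typing import Dict, Iterable

# Assumed convention: the aggregator embeds "N subscribers" in its User-Agent.
SUBSCRIBERS = re.compile(r"(\d+)\s+subscribers?", re.IGNORECASE)


def subscribers_by_agent(user_agents: Iterable[str]) -> Dict[str, int]:
    """Map each reporting aggregator to the largest count it announced."""
    counts: Dict[str, int] = {}
    for ua in user_agents:
        match = SUBSCRIBERS.search(ua)
        if match is None:
            continue  # ordinary browsers and bots report no count
        # Use the text before the first "/" or ";" as the aggregator name.
        name = ua.split("/")[0].split(";")[0].strip()
        counts[name] = max(counts.get(name, 0), int(match.group(1)))
    return counts


def total_subscribers(counts: Dict[str, int]) -> int:
    """Sum the per-aggregator counts into one estimate."""
    return sum(counts.values())
```

Readers who subscribe directly from desktop software do not announce a count, so a total computed this way is a lower bound at best.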

If you use Feedburner to proxy your feeds, these get aggregated to some degree. Many of the feed reading services share subscriber statistics with Feedburner, but not in a uniform way. You also might still have subscribers to some direct legacy feeds, as we do for many of our blogs. And, of course, some people, including David Winer, don’t want to use Feedburner because they are worried that it has an unhealthy monopoly.

So, if you obsess about tracking your readership, you may want to sign up for Google’s Webmaster services, register your blogs, and verify your ownership of them. Then you can get direct information on your Google Reader subscribers.

Semantic Eco-blogging: Spotter 1.0 Released

September 14th, 2007

The SPIRE team has been experimenting with semantic eco-blogging at the Fieldmarking site. Our motivation is the increasing popularity of eco-blogs, amongst both amateur nature lovers and working biologists. Subject matter varies, but entries typically include date, location, observed taxa, and description of behavior. These observations have the potential to be an important part of the ecological record, especially in domains (such as invasive species science) where amateur reporting plays an important role, and in the study of environmental response to climate change.

To this end, we developed Spotter, a Firefox extension that enables the easy creation of RDF data by citizen scientists. Spotter is not tied to a particular blogging platform, and can be used both to add semantic markup to one’s own blog posts, and to annotate posts or images on other websites, such as Flickr.

Spotter 1.0 is described and available for download here. This version, designed and tested by David Wang, Cyndy Sims Parr, Andrey Parafiynyk, and myself, offers substantial improvements in user interaction over an earlier prototype.

Of course generating RDF is just the beginning. Once RDF is generated, we’re able to apply all the machinery of the semantic web, including SPIRE tools such as Swoogle (our Semantic Web search engine), Tripleshop (our distributed dataset constructor), and ETHAN (our evolutionary trees and natural history ontology.) We are then able to issue queries like:

    What was the northernmost spotting of the Emerald Ash Borer last year?
    Show all sightings of invasive plants in California.

For example, when we issued the query

    Show all observations of species that are classified as being of concern by the U.S. Fish and Wildlife Service

against the 1200 observations from the recent blogger bioblitz, we got back 47 records.
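To give a feel for how such queries work over triples, here is a toy sketch in plain Python. It is not how Spotter, Swoogle or Tripleshop are implemented; the predicate names, observation identifiers and values are all invented for illustration, and a real system would use an RDF store and a query language instead of list scans.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, object]

# Hypothetical observations expressed as (subject, predicate, object) triples.
TRIPLES: List[Triple] = [
    ("obs1", "taxon", "Agrilus planipennis"), ("obs1", "latitude", 41.5), ("obs1", "year", 2006),
    ("obs2", "taxon", "Agrilus planipennis"), ("obs2", "latitude", 44.8), ("obs2", "year", 2006),
    ("obs3", "taxon", "Agrilus planipennis"), ("obs3", "latitude", 46.0), ("obs3", "year", 2005),
    ("obs4", "taxon", "Carpobrotus edulis"), ("obs4", "latitude", 36.6), ("obs4", "year", 2006),
]


def value(triples: Iterable[Triple], subject: str, predicate: str) -> Optional[object]:
    """Return the first object stored for (subject, predicate), if any."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def northernmost(triples: List[Triple], taxon: str, year: int) -> Optional[str]:
    """Find the observation of `taxon` in `year` with the highest latitude."""
    candidates = [
        s for s, p, o in triples
        if p == "taxon" and o == taxon and value(triples, s, "year") == year
    ]
    return max(candidates, key=lambda s: value(triples, s, "latitude"), default=None)


def observations_of(triples: List[Triple], listed_taxa: Set[str]) -> List[str]:
    """List observations whose taxon appears in a given set (e.g. species of concern)."""
    return sorted(s for s, p, o in triples if p == "taxon" and o in listed_taxa)
```

The point of publishing RDF is that the taxon, location and date of each sighting become separate, queryable statements, so questions like these reduce to filtering and joining.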

So … Please join the growing global human sensor-net! Give Spotter a try and don’t hesitate to share your thoughts.

The Semantic Naturalist: Ecoinformatics meets Semantic Web

September 13th, 2007

The Semantic Naturalist is a new blog that is home for “Musings on natural history, geography, and the Semantic Web.”

“This weblog grows out of the Spire project, which is a research effort to explore applications of semantic web technologies to ecoinformatics and biodiversity conservation. Contributors include Allan Hollander (Information Center for the Environment, UC Davis), Joel Sachs and Cyndy Parr (both with the eBiquity research group, UMBC).”

This is a kind of Bioinformatics that is quite different from what usually comes to mind, but one that will be increasingly important as our planet continues to shrink. Awareness and concern for problems like global warming, environmental changes, invasive species, and endangered species are rising. The public interest in the Encyclopedia of Life project is but one recent example. Ecoinformatics is largely driven by data and much of it is collected and published in a very distributed manner. Geotagging is almost always important. So technologies for data sharing, discovery and integration are of central importance. We think that this is a great use case for Semantic Web technologies and one where we might have significant impact.

(spotted on Fieldmarking)

Terrorists on the Dark Web

September 13th, 2007

The University of Arizona’s AI Lab is engaged in an NSF funded project with the goal of “collecting and analyzing terrorism information, modeling terrorist behavior and terrorist networks, and disseminating information to the terrorized”. The most interesting aspect of the Dark Web Project is a collection of Web pages that are believed to be from terrorist affiliated groups.

“We have collected 500,000 Web pages created by 94 US domestic groups, 300,000 Web pages created by 41 Arabic-speaking groups, and 100,000 Web pages created by Spanish-speaking groups. The collection process is ongoing.” (link)

The methodology used to collect some of these pages is described in

Y. Zhou, J. Qin, G. Lai, E. Reid, and H. Chen, Exploring the Dark Side of the Web: Collection and Analysis of U.S. Extremist Online Forums, in Proc. Intelligence and Security Informatics: IEEE International Conference on Intelligence and Security Informatics, San Diego, 23-25 May 2006.

It sounds like a fascinating dataset that I hope can be made available in some form. I would like to try some of our sentiment detection and trust propagation techniques on this data, for example. The group has a demo system online, but it seems to be currently down.

UMBC Semantic Web research mentioned in the NYT

September 12th, 2007

Peter Wayner’s article in today’s New York Times, Helping Computers to Search With Nuance, Like Us, mentions our research exploring how the Semantic Web can be used to help biologists share data and knowledge. This work is part of the NSF sponsored Spire project, which is a team effort involving UMBC, UMCP, UC Davis and the Rocky Mountain Biological Institute.

“When Cynthia Sims Parr steps out to a Maryland field to look for nonnative birds and butterflies, half her thoughts are back in the computer lab where she will collate the data. Dr. Parr, who has appointments in both the life sciences and computer studies at the University of Maryland, melds her biology training with the ideas of computer scientists who are building custom adaptable databases. In the field, she records information about the species she sees and adds details like their location and activity. In the lab, she creates a flexible logical structure for the data to avoid linguistic confusion.” (link)

During the first years of the project we developed OWL ontologies (e.g., ETHAN) to represent the terms and properties necessary to express natural history and evolutionary tree information, and used them to publish taxon data derived from the Animal Diversity Web and other biological sources. During the last year, we have been building tools to help biologists and ecologists publish, find and use data that has been described and annotated with such semantic information. We are currently exploring how these tools can work with social media systems and infrastructure such as blogs and photo sharing sites (e.g., see Adding Semantics to Social Websites for Citizen Science).

Unfortunately, the article doesn’t mention UMBC, but rather the “University of Maryland”.

New CS grad enrollments in US up slightly in 2005

September 10th, 2007

After 9/11 there was a big drop in foreign graduate students enrolling in US Computer Science programs. The CRA Bulletin reports on new data from NSF that shows a slight increase in 2005 in first-time grad CS enrollment of foreign students.

“According to an NSF InfoBrief, after falling for three years, enrollment of first-time, full-time foreign students in master’s and doctoral programs in the computer sciences rose 11% in 2005. First-time enrolment of US citizens was relatively unchanged. The increase among foreigners meant that first-time enrollment in CS grew 6% after declining the previous 2 years. Nevertheless, these gains in first-time enrollment were not enough to halt a 4% drop in total enrollment between 2004 and 2005, and a 13% drop since 2002.”
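The quoted percentages hang together if foreign students made up a bit more than half of the previous year's first-time enrollment. Here is a back-of-the-envelope sketch of that arithmetic. It treats "relatively unchanged" as exactly 0% growth and works from rounded figures, so the result is only a rough estimate; the function name is ours.

```python
def implied_foreign_share(foreign_growth: float,
                          domestic_growth: float,
                          total_growth: float) -> float:
    """Solve share * foreign + (1 - share) * domestic = total for share."""
    return (total_growth - domestic_growth) / (foreign_growth - domestic_growth)


# 11% foreign growth, assumed 0% domestic growth, 6% overall growth.
share = implied_foreign_share(0.11, 0.0, 0.06)
print(f"Implied foreign share of prior-year first-time enrollment: {share:.0%}")
```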

It’s a small but promising trend.

On the morality of blocking web ads

September 10th, 2007

The NYT’s Bits blog (Business, Innovation, Technology, Society) takes up the question of whether blocking web ads is a good thing or not. Web sites that many of us read and enjoy daily are supported by ads. They range from the sophisticated and professional MSM sites down to the smallest lone bloggers who hope to cover the cost of hosting their site. The Bits post, The Morality of Blocking Ads, points to a note that appeared on Daily Kos yesterday:

“If you use ad blocking software while viewing Daily Kos, you’re getting all the benefits of our site but we’re not getting any of the advertisement revenue associated with your visits. This site relies on ad revenue for daily operations: a decrease in the number of ads seen means a decrease in the funding available to run the site, to pay those that work on it, and to create improved site features. We won’t stop you from using ad blocking software, but if you do use it we ask you to support Daily Kos another way: by purchasing a site subscription. … ”

It’s hard not to be conflicted about this. On the one hand, a site like Daily Kos requires the full-time effort of a handful of people as well as some serious bandwidth. If the ads are displayed, you don’t have to read them, right? Besides, if they are selected based on their relevance to the site’s content, which you are interested in, maybe they will be of interest? On the other hand, I fear that commercial Web advertisements are in a slow race to the bottom where it will be all Cialis and Lunesta, all the time. Besides, advertisers have taken a first step out onto the slippery truthiness slope and we all know where that ends up. And if you run your Web site or blog as a business, will you be willing to bite the hand that feeds you? Maybe not if you have a mortgage and a family and aren’t getting any younger.

2007 the year of Citizen Journalism?

September 9th, 2007

Blogs and Social Media have revolutionized how people receive news and information. Traditional News outlets must evolve fast or face further loss of readership and advertising. The main reason for this trend is the shift in readership towards Web sources. Even there, social media is keeping online Main Stream Media (MSM) sites on their toes. Journalism has now morphed into a mix of professional reporting and active reader participation.

Citizen Journalism (also called participatory journalism or grassroots journalism) is when people provide news stories, photos and videos and contribute to evolving news stories. There is a lot of debate on defining or re-defining this term. From my understanding, the general idea is that amateurs reporting news in collaboration with journalists is “participatory”. This is different from opinion blogs, punditry and monologues where individual bloggers (or groups of bloggers) comment on current affairs.

Here is a timeline of some of the important developments that are shaking up the fourth estate:

  • 2007, September: Google News recently announced partnerships with Reuters and AP to host stories directly on Google.
  • 2007, September: English version of Wikinews reaches 10,000 articles according to Wikipedia.
  • 2007, August: Google News to include comments from people in news.
  • 2007, August: NY Times Select Content to be free.
  • 2007, April: News stories ranging from earthquake reports to heroic rescue efforts are breaking news on Twitter. IMHO, this is an exciting development.
  • 2007, July: NowPublic, a social media news site is named as one of the 50 best websites by Time.
  • 2007, March: Assignment Zero meets Wired to connect journalists with citizens.
  • 2006, December: Yahoo and Reuters launch YouWitnessNews.
  • 2005, November: Alive in Baghdad, a weekly video blog posts the first video.
  • 2005, July: Pictures of London Bombing from Flickr were being used on several news sites.
  • 2004, November: Wikinews launched as a project by the Wikimedia Foundation.

Based on these trends it looks like Citizen journalism is bound to become more important and will attract even greater participation (in terms of both readership and contributions). Many MSM sites like MSNBC (CJ report) and CNN (iReport) are already encouraging user supplied news content. I notice that Indian News media is also adapting fast. I think for most Indian News sites, user comments and discussions have become a standard feature now. This is the very first step in moving towards accepting citizen journalism in main stream media. US News sources have been rather slow in either implementing or promoting this.

I think there are a few things that we might see evolve as we move into the next year or so:

  • Importance of citizen journalism in reporting (hyper) local news will grow.
  • MSM sources will continue to drag their feet at accepting the challenge even as new News sites start becoming popular.
  • A Citizen journalism code of ethics will evolve. It is only a guess, but some people might go to any extent to report a news story or become “famous”. That’s dangerous!

Parsing Blonde Speak

September 5th, 2007

[Post by Jesse English and Akshay Java]

Understanding blonde speak ain’t that easy! Barney Pell, Powerset CEO, recently put Powerset’s NLP technology to the task. Human language is already quite complicated, and any NLP system trying to process unstructured, ungrammatical and noisy text needs to be robust. At UMBC, Dr. Sergei Nirenburg and his team at ILIT have been working on OntoSem, an Ontological Semantics-based NLP system. We have used OntoSem to process news data (SemNews) and export the Text Meaning Representation (TMR) into OWL. You can read more about this system in our recent publication.

We decided to use OntoSem to process Miss Carolina’s response. Here is an excerpt of the TMR it generated. The complete TMR is available here (Miss Carolina’s answer processed by OntoSem).

<concept name="MODALITY-1095" type="MODALITY">
<attribute type="textpointer" value="BELIEVE"/>
<attribute type="word-num" value="2"/>
<attribute type="TYPE" value="BELIEF"/>
<attribute type="VALUE" value="1"/>
<relation type="SCOPE" target="LARGE-GEOPOLITICAL-ENTITY-1097"/>
<attribute type="FROM-SENSE" value="BELIEVE-V2"/>
<attribute type="ILLOCUTIONARY-FORCE" value="IMPERATIVE"/>
<attribute type="TRANSFORMATION-USED" value="NP_V_NP 1"/>
<attribute type="TIME" value="(FIND-ANCHOR-TIME)"/>
<attribute type="SAME-SCORE" value="(MODALITY 0.0050000004 BELIEVE-V5)"/>
<attribute type="HEAD" value="YES"/>
<attribute type="TEXT" value=""/>
</concept>

<concept name="HELP-1373" type="EVENT">
<attribute type="textpointer" value="HELP"/>
<attribute type="word-num" value="65"/>
<relation type="BENEFICIARY" target="NATION-1376"/>
<relation type="THEME" target="EVENT-1374"/>
<attribute type="FROM-SENSE" value="HELP-V1"/>
<attribute type="TRANSFORMATION-USED" value="NP_V_NP 3"/>
<attribute type="TIME" value="(FIND-ANCHOR-TIME)"/>
<attribute type="HEAD" value="YES"/>
<attribute type="TEXT" value=""/>
</concept>

An interesting part of the text processing is that of understanding modalities. One example is the word “believe”, which expresses the speaker’s attitude to what is being said. OntoSem has the capability of processing such complicated linguistic constructs. OntoSem uses a large ontology to support its text processing capabilities. Hence, the word “Help”, for example, can be mapped to its concept “EVENT” and also to a relation which indicates that the beneficiary of the Help event is actually the U.S.
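If you want to poke at a TMR like this yourself, the relations are easy to pull out with Python's standard XML parser. The sketch below is ours, not part of OntoSem: it assumes the concepts are closed with `</concept>` and wrapped in a single root element (we call it `<tmr>` here, which is a made-up name), and it uses a shortened copy of the HELP concept.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Shortened copy of the HELP concept; the <tmr> wrapper is an assumption.
TMR_FRAGMENT = """
<tmr>
<concept name="HELP-1373" type="EVENT">
<attribute type="textpointer" value="HELP"/>
<relation type="BENEFICIARY" target="NATION-1376"/>
<relation type="THEME" target="EVENT-1374"/>
</concept>
</tmr>
"""


def relations(xml_text: str) -> List[Tuple[str, str, str]]:
    """Return (concept name, relation type, target) for every relation."""
    root = ET.fromstring(xml_text)
    found = []
    for concept in root.iter("concept"):
        for rel in concept.findall("relation"):
            found.append((concept.get("name"), rel.get("type"), rel.get("target")))
    return found


for source, rel_type, target in relations(TMR_FRAGMENT):
    print(f"{source} --{rel_type}--> {target}")
```

Each tuple reads as a small statement, for example that the beneficiary of HELP-1373 is NATION-1376.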

So … in Miss Carolina’s words I hope “education here in the U.S. help the U.S. or or“! Till then, I guess we will have to rely on machines to understand blonde speak!

Issues with Social Networking (in India and perhaps elsewhere)

September 4th, 2007

Akshay blogged recently about the growth of social networking in India, which is great and wonderful, but it has brought out interesting issues that seem country specific. Not surprisingly, given that it has the largest share of the social network “market” in India, Orkut figures prominently in most of them.

Perhaps the easiest ones to think about are those related to politics. Recently, it was big news in India that someone had created an Orkut community that claimed to “hate” the chief minister of Uttar Pradesh, Ms. Mayawati [1,2,3]. The news media reported that the chief minister was infuriated, the police were trying to figure out how to stop this, and lawyers were figuring out whether the cyber crime laws in India covered this, and how to fix them if they didn’t :-) Of course, one element of this community was a fake profile of Ms. Mayawati, which Orkut promptly removed. Politicians in India are often extremely touchy about such things. One state assembly (i.e. legislature) in India was particularly notorious for its speaker using his “privileges/contempt” power to punish those in the media that were critical (see page 35 of [4] for one of the more recent examples, and of course Wikipedia for a discussion of the concept of parliamentary contempt and privilege notions). I am not sure if there are fake profiles of, say, George Bush on Orkut, or communities dedicated to hating him (Hmmmm……, maybe DailyKos qualifies?), but I have not seen much of a discussion about this issue here. Perhaps this is governed by how thick- or thin-skinned politicians in a country are? That said, Al Gore’s recent Vanity Fair interview seemed to bring out a lot of his misgivings about the US MSM and how he felt he was mistreated during the ’00 campaign [8].

Orkut also got into trouble for groups that criticized famous historical figures that are revered by significant sections of the Indian populace. One example of this was Orkut communities that proclaimed hate for Chatrapati Shivaji, a famous king of western and middle India [5]. There was apparently even a “hate india” community.

Fake profiles are of course fairly common in social networking sites in the US. I am not sure though if they have been used to spread watercooler rumors, or present a real (non-celebrity) person in a poor or lewd light. Orkut has been so used in India, it seems. There was a case of a high schooler in New Delhi who suddenly started getting phone calls of a lewd nature. It turns out that someone had posted a fake profile on Orkut where she was presented as someone of, how to put it delicately, less than stellar virtue. Messages had been sent out from this profile to others inviting them for (you can figure out what). There is at least one other known case of something similar happening to an air hostess (aka flight attendant). These stories all made it to the MSM [6]. I haven’t really seen too much discussion of such issues in the MSM here in the US, except for the Allison Stokke story in WaPo [7]. Of course there it was not so much about an explicitly suggestive fake profile but about pictures taken at a track meet. Is this because such things have not happened? Or are people more blasé about them? Either way, is this a country/culture specific issue?

Rise of Social Networking in India

September 3rd, 2007

Indian internet users are just hooked on social network sites. What started with the popularity of Orkut in India has now become a cultural revolution. Most school-going teens have an account on some social networking site and almost all IT-savvy urban yuppies are on one. In fact, just about every internet user I know in my circle is online on some of these sites. There are already dozens of local Indian social networking sites trying to be the next Orkut. A popular portal in India launched its own version recently and claims to have 1 million subscribers already! Yaari, Minglebox, Hi5 and dozens of other sites are attracting their own fan base. Here are a few points on my perspective about this trend:

  • First mover: Clearly, Orkut has the first mover advantage in this space. There is a lot of inertia for people to move en masse to another service. What makes a social network site successful is the people that are on it. Facebook lost out on a big opportunity there due to its initial focus on American markets, but is now growing in popularity in India as well. (Right: Image of Orkut celebrating Diwali, a popular Indian festival)
  • Socio-Cultural factors: Social networking sites have been great for catching up with long lost friends from high school or your local community. While dating is quite common, in India sometimes it happens in subtle ways. Most users seem to shy away from exclusive dating sites like Yahoo Personals. There is however one exception. Sites aiming to target the matrimonial market are doing extremely well.
  • Business networking: From LinkedIn stats, it looks like it is gaining ground in the Indian market. There might also be a potential market for B2B, B2C networking sites.
  • SMS-based sites: Twitter does not have a large user base in India, primarily due to the lack of a 40404 number. India is one of the largest growing markets for SMS and cell phones and almost all carriers provide an array of SMS based services. It’s time that such services be integrated into social networking sites.
  • Interest Specific sites: Perhaps the next generation of social networking sites might aim specifically at verticals: Games, Cooking, Finance, Religion and just about any topic. Flickr is a great example for this. There are a number of Indian photography enthusiasts who are sharing their pictures and participating on Flickr.

Online video and music sites are also doing reasonably well. However, one of the major competitors there is the Indian television and cinema industry, which still has a grasp on a big share of user attention. With respect to online music, due to the popularity of BitTorrent in India, most users prefer to download their music rather than listen to it online! Besides, in India the charm of AM/FM radio is still the same. So online music/radio faces an uphill battle.

Google Phone’s Advertising Approach?

September 3rd, 2007

Some time back, NYTimes reported “Cell Phone Ads May Take Off Soon”. With the increasing speculation about the existence of the fabled Gphone or Google phone, it is perhaps only a matter of time before this becomes a reality.

I think advertisers would be gleaming at the thought of highly contextual and targeted advertising sent directly to cell phones, 24X7. Not to mention the ability to know a lot more about you and to tailor the ads based on the wealth of information that the cell phone providers collect about us (assuming Google has some partnerships with existing carriers).

Google’s advertising model is so successful because of the simplicity of text ads. So, I feel the model that would finally win in the cell phone ads business would be one that makes advertising seamless and unobtrusive.

One approach would be to provide advertisements alongside services. An idea that I have been thinking about has to do with SMS. In many places like India and Europe, SMS is a cheaper and more convenient way to communicate. One problem however is that on a cell phone one can only store a limited number of SMS messages. If Google were to have an archiving system for SMS (something like GMail for SMS), that would be a great way to serve ads while making them unobtrusive and probably even useful to people. With the recent acquisition of GrandCentral, we can see how this might easily plug in to their existing infrastructure, including voicemail, call forwarding and mapping multiple phones to a single number. Being able to access regular Gmail accounts from cell phones also means more clickthroughs for ads.

However, these are still very small pieces of the pie. The big chunk is in search. Google Phone will have search features that let you find information from the Web and perhaps even local information. This would almost certainly open up a whole new set of possible enhancements. Wouldn’t searching for information on the go be different from looking up stuff from a computer? Almost certainly so. For one, we want quick excerpts, and the results and advertisements might depend on the location (like a local bar or information on a historical monument while walking down a street).

Finally, the most recent patent by Google is on SMS payments. This is really interesting, since it could integrate with Google Checkout, and when buying stuff using your Google phone, you don’t need to re-enter your information all the time.

Whatever the advertising approach would be, in the end it’s the consumers who would be reaping the ultimate benefits. I feel the Google phone would really shake up the monopolistic hold that cell phone providers have on the US mobile market segment right now!

[Acknowledgment: image from Om Malik’s blog]