UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments

Archive for the 'Blogging' Category

Anonymous, leaderless resistance and Scientology

January 26th, 2008, by Tim Finin, posted in Blogging, Social, Social media, Web

Leaderless resistance is defined on Wikipedia as

“…a political resistance strategy in which small, independent groups (covert cells) challenge an established adversary such as a government. Leaderless resistance can encompass anything from non-violent disruption and disobedience to bombings, assassinations and other violent agitation. Leaderless cells lack bidirectional, vertical command links operating without a hierarchical command.” (link)

It’s challenging to combat a leaderless resistance because one can’t use the usual methods to discover participants by exploiting the social networks of known members.

Today’s new communication infrastructures make it easier for such distributed resistance movements to take hold and grow. Information, instructions and loose coordination can be spread via Web pages, Blogs, text messages, IRCs, mailing lists, etc.

A colleague, Chris Diehl at JHU APL, suggested that the Estonian cyberwar might be a good example for studying how the Blogosphere was used for this, by combining sentiment analysis, geotagging and temporal analysis. This cyber attack was the subject of a recent colloquium at APL. It’s a great idea, but one made more challenging by the fact that the attack is over and that it would involve dealing with content in Estonian, which, although not exactly a low-density language, is also not one that has been extensively studied by computational linguists.

But maybe there is another example of an Internet-driven leaderless resistance, going on right now, that would be good to study as it unfolds. A group that calls itself Anonymous has announced it intends to launch an online DDOS attack on Scientology as part of a campaign against the organization.


[youtube=http://youtube.com/watch?v=JCbKv9yiLiQ]

The message has been spread in part by YouTube videos, starting on 21 January. There is also the Wikipedia page on Project Chanology, which was created on 24 January 2008, an Anonymous Scientology Widget that counts down to (I suppose) when participating members should take action, and lots of mentions on forums, blogs and other forms of social media.

Linuxhaxor has instructions for what to do, which are offered only for educational purposes.

“This guide is for information purpose only, I, the site owner, do not encourage people to go about and follow these steps or Chanology in anyway to carry this attack, or any attack to any organization or any person. If you agree to follow these steps and help them carry this attack you are fully responsible for any consequences whatsoever. This act is illegal in many states and countries. ”

Wired just ran a story on this leaderless resistance effort, Anonymous Hackers Shoot For Scientologists, Hit Dutch School Kids, and there are plenty more online.

Finally, you can track the online interest through this Blogpulse trend graph comparing Blogosphere mentions of (1) “Tom Cruise” (2) Scientology and (3) anonymous+scientology and also the Google Trends graph comparing Google searches for the same three terms. Click on the graphs to see the current results.

Mentions of scientology, tom cruise and anonymous via Blogpulse

Google searches for scientology, Tom Cruise and anonymous

Tom Cruise is in there because he’s rumored to be the second most important person in the Church of Scientology and his recent Scientology indoctrination video that surfaced on YouTube may have been the tipping point for some.

Outside.in shows your neighborhood news

January 25th, 2008, by Tim Finin, posted in Blogging, Semantic Web, Social media, Web, Web 2.0

Outside.in is another site that shows news, civic information and blog posts relevant to a location. For an example, see the outside.in site for Catonsville MD, the area around UMBC.

A VC blog calls this a hyperlocal site. (They are an investor). It’s similar to everyblock but it covers nearly 12,000 cities instead of three. The coverage, however, does not seem as deep.

They recognize the location of a blog post as follows. A blogger registers her feed with outside.in and they monitor the posts. She geotags a blog post with a location using one of four methods: (1) a link to a Google map with the location, (2) a blog category or tag that looks like a Zip code, (3) an inline text tag like [where 1000 Hilltop Cir, Baltimore, MD 21250], or (4) GeoRSS.
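Methods (2) and (3) are simple enough to sketch in a few lines of Python. This is only an illustration of the idea, not outside.in's actual code: the function name `find_locations`, the two regular expressions, and the assumption that a "Zip code" means a five-digit US code (optionally ZIP+4) are all mine.

```python
import re

# Hypothetical patterns, assumed for illustration only.
WHERE_TAG = re.compile(r"\[where\s+([^\]]+)\]", re.IGNORECASE)
ZIP_CODE = re.compile(r"^\d{5}(?:-\d{4})?$")


def find_locations(post_text, tags=()):
    """Return (kind, value) location hints from a post body and its tags."""
    # Method (3): inline tags such as [where 1000 Hilltop Cir, Baltimore, MD 21250]
    hints = [("where", m.group(1).strip()) for m in WHERE_TAG.finditer(post_text)]
    # Method (2): a category or tag that looks like a Zip code
    for tag in tags:
        tag = tag.strip()
        if ZIP_CODE.match(tag):
            hints.append(("zip", tag))
    return hints


post = "Lunch on campus [where 1000 Hilltop Cir, Baltimore, MD 21250] was good."
print(find_locations(post, tags=["food", "21250"]))
```

The hints would still need to go through a geocoder to become coordinates; that step is left out here.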

everyblock shows what is new in your neighborhood

January 24th, 2008, by Tim Finin, posted in Blogging, Semantic Web, Social media, Web, Web 2.0

Everyblock launched yesterday as a site that shows you news and other items that are about or relevant to your neighborhood. You enter your address, postal code or neighborhood name and see news articles, civic documents like crime reports and building permits, blog posts, craigslist entries, Yelp reviews, and Flickr photos associated with the area around it.

Currently only three cities are covered by everyblock — New York, Chicago and San Francisco. For an example, see the everyblock page for NYC’s Chelsea neighborhood or around the University of Chicago.

I suspect that much of the work that goes into a system like everyblock is selecting the right sources. For example, how you access online local government documents will differ for each state or city. You would want to mine the local newspapers for news items, and focusing on them would make disambiguating geonames and addresses easy, since your first-order approximation would be that every geo-reference is local. There is also work in adapting to the APIs for other services, like Yelp and craigslist, each of which has its own way of sorting items into geographic regions.

There are good data sources for the GIS information and services, free or paid, that will do the geocoding and reverse geocoding of names and addresses. The Earth is a finite place and we have a lot of data about what is where, at least in the more developed parts of it.

Although everyblock claims to include relevant blog posts, I’ve not seen any yet. This is a harder problem, unless you get bloggers to add explicit geographic metadata or register their blogs with a location, like feedmap.org does.

The XKCD data died in a blogging accident

January 13th, 2008, by Tim Finin, posted in Blogging, GENERAL, Social media, Web



The popular XKCD had another Web-related comic yesterday, but it turned out to be self-negating. As was noted on Slashdot:

“As I noted yesterday (and was joined by many others)… in an offhand observation xkcd has singlehandedly changed a small section of the Internet. Changing the results from a Google search for “Died in a Blogging Accident” from 2 to (at this writing) over 7,170 in a little more than 24 hours.”

The number of results was up to 13.3K when this was first posted and had reached 66.1K by 8/10/08. I guess something like the Heisenberg uncertainty principle applies to the Internet, too.

Update 1/15: Here’s a trend graph from blogpulse for occurrences of “died in a blogging accident” in blogs as of 09:00 gmt+5 on 15 January 2008. Click graph to see current data.


mentions of ‘died in a blogging accident’ in blogs as of 15 Jan 2008 09:00 gmt+5 via blogpulse

Update 1/16: Google Trends shows a sudden interest in the dangers of blogging last week. Here’s a graph from 16 January 2008. Click on the graph to see the current trend graph.


Google searches as of 16 Jan 08 for ‘died in a blogging accident’

Hoosgot exploits the wisdom of the Blogosphere crowd

January 2nd, 2008, by Tim Finin, posted in Blogging, Semantic Web, Social media, Web 2.0

Technorati founder David Sifry launched a new service last week, when everyone was recovering from one holiday and preparing for another. Hoosgot (Who’s got …) lets you ask the collective Blogosphere by posting a question on your blog or on the Twitter microblogging system. You need to include the term hoosgot in your blog post or @hoosgot in your Twitter update to have it noticed.

Sifry’s explanation of how Hoosgot happened reinforces my belief that the greatest skill a practical computer scientist can have is being able to quickly test a new idea by turning it into running code.

“You gotta love Holiday Weekends. Friday night (the 28th) The lazyweb popped back into my mind. I missed it. I started asking myself the question, “Why hasn’t anyone reconstituted the lazyweb?, What if we could rebuild the lazyweb for the 2008 web? What if we could take advantage of all the cool tools that have arrived in the last 5 years? Would it work?” Rather than wait around, I realized I could just build it, and maybe folks like me would use it. At about 5am on Saturday morning, the first prototype was up. I made some major changes, including twitter support Saturday night. And launch is today, on Sunday morning! Ain’t working on the web fun?:-)” (link)

Of course it helped that he could tweak Technorati to collect blog posts and tweets.

Will it work? Hooknows. One problem is spam, and Sifry is well positioned to deal with this. The other is that the wisdom of crowds is not uniform. Since your Hoosgot query is going out to a very broad group, a narrow question on an obscure aspect of Java programming will be a head scratcher to most. If you ask the blogmob for a movie recommendation, they will tell you to go see Norbit, which was 2007’s 29th highest grossing movie but also so unredeemably horrible that it almost killed Eddie Murphy’s career.

A few things could address these problems. Learning to spot Hoosgot spam and automatically adjusting the model as it evolves is one. Another is to classify the Hoosgot queries by intent, topic and geography. Both of these are made more difficult if the queries are short, as they will be for Twitter-based queries. We’ve dealt with some of this in Akshay Java’s recent work on analyzing Twitter updates (Why We Twitter: Understanding Microblogging Usage and Communities).

(via ReadWriteWeb)

Interlinking your web pages to maximize their PageRank

December 16th, 2007, by Tim Finin, posted in Blogging, Semantic Web, Social media

A post on the physics arXiv blog points to an interesting open access article, Maximizing PageRank via Outlinks, on how to structure your own web pages to maximize the PageRank scores they receive. The paper does not consider tactics for getting sites to link to your pages, but instead looks at how you can organize the internal link structure of your site to maximize your PageRank.

Cristobald de Kerchove, Laure Ninove and Paul Van Dooren, Maximizing PageRank via Outlinks, submitted to Linear Algebra Applications, 19 November 2007, arXiv:0711.2867v1 [cs.IR].

We analyze linkage strategies for a set I of web pages for which the webmaster wants to maximize the sum of Google’s PageRank scores. The webmaster can only choose the hyperlinks starting from the web pages of I and has no control on the hyperlinks from other web pages. We provide an optimal linkage strategy under some reasonable assumptions.

What is being optimized is the sum of the PageRanks for the pages in your site.

The optimal structure for your site is roughly this: organize your site as a linear chain of pages, each linking to the next in the chain and also back to each of its chain ancestors. The final node in the chain should be the only one that links to any node outside of your site, and it should link to just one outside page.
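The chain structure is easy to try out on a toy graph. What follows is a minimal sketch, not the paper's method or proof: it uses a plain power-iteration PageRank with a damping factor of 0.85, and the page names (p0, p1, ..., ext), the single outside page, and the assumption that the outside page links back to p0 are all made up for illustration. It compares the chain with a "leaky" variant in which every page also links to the outside page.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank. links maps each page to the pages it links to."""
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            # A page with no outlinks spreads its rank over all pages.
            receivers = targets if targets else nodes
            share = damping * rank[node] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


def chain_site(n_pages, outside="ext"):
    """Page i links to all earlier pages and to page i+1; only the last page links outside."""
    links = {}
    for i in range(n_pages):
        targets = [f"p{j}" for j in range(i)]  # back links to chain ancestors
        targets.append(f"p{i + 1}" if i + 1 < n_pages else outside)
        links[f"p{i}"] = targets
    links[outside] = ["p0"]  # assumption: the outside page links to our first page
    return links


def leaky_site(n_pages, outside="ext"):
    """Same chain, but every page also links to the outside page."""
    links = chain_site(n_pages, outside)
    for i in range(n_pages - 1):
        links[f"p{i}"].append(outside)
    return links


def site_score(links, outside="ext"):
    """Sum of the PageRank scores of our own pages."""
    rank = pagerank(links)
    return sum(score for page, score in rank.items() if page != outside)


print("chain:", round(site_score(chain_site(4)), 3))
print("leaky:", round(site_score(leaky_site(4)), 3))
```

On this toy graph the chain keeps more of the total PageRank inside the site than the leaky variant does, which matches the intuition that every extra outside link gives rank away.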

Blogrunner: the New York Times robot in the newsroom

November 1st, 2007, by Tim Finin, posted in Blogging, Social media, Web 2.0

The New York Times has incorporated Blogrunner into its Web site. TechCrunch characterizes Blogrunner as a Techmeme Killer:

“Last night, the New York Times quietly launched Blogrunner on the technology section of its main site. Blogrunner was one of many Techmeme copycat sites, until the New York Times bought it last year. Like Techmeme, Blogrunner is a service that keeps track of the latest news and blog posts on a range of topics (Politics, Technology, Media, Business, Economy, Law, Health, Movies, Books, Religion, Iraq, Entertainment). Now those links are appearing on the New York Time’s main site, starting with the technology section, in a middle column titled “Technology Headlines from Around the Web.” (link)

Here’s the NYT Bits blog on Blogrunner:

“The biggest change is the feature in the middle column of the technology page titled “Technology Headlines From Around the Web.” It presents a constantly updated list of hot technology stories. Notice what we are not worried about. … Even more interesting to me is how this list gets generated. It is mainly created by an automated algorithm developed by Philippe Lourier, the developer of Blogrunner, a Web site The New York Times Co. bought last year. It has something in common with Digg, the site on which readers vote on what articles they find interesting. But for Blogrunner, votes are links from blogs or other Web sites. This approach, of course, is what powers the PageRank algorithm of Google, and Techmeme, an excellent technology news site. (link)

I wonder what is taught at J Schools about this these days.

Google ads help fund retirement for some

October 28th, 2007, by Tim Finin, posted in Blogging, Social media, Web

This week USA Today had an article, ‘Gray Googlers’ strike gold, on older Americans operating websites that make money on Google ads.

“Jerry Alonzy figured he’d be working into his 70s at least. As an independent handyman at the mercy of weather patterns near Hartford, Conn., he’d always made a decent income that rarely grew. Then he found Google, and his life changed. Alonzy, 57, now makes $120,000 a year from the ads Google places on his Natural Handyman website, and he couldn’t be more thrilled. “I put in two, maybe three hours a day on the site, and the checks pour in,” he says. “What’s not to like?”

Of course, this is not a guaranteed get-rich-quick scheme. You have to have the right niche that will attract good-paying ads, constantly write new quality content, and build up your PageRank. Note that Alonzy spends about 20 hours a week tending his site, which is not an insignificant amount of time. The story cites some other examples that are probably more typical of what one can expect.

While the upside of working with AdSense sounds exhilarating, it’s not that way for everybody. Scott says she posted an unsold novel on Google and earns about $5 a month from the AdSense ads on the site. Al Needham, 74, who runs a site about the care of bees (bees-online.com) from his home near Boston, reaps about $250 a month. “Forget about getting rich overnight,” says Alonzy. “It takes time to learn.”

It’s a jungle out there

October 4th, 2007, by Tim Finin, posted in Blogging, Ebiquity, Security, Web

Sigh….

At the end of last week we had a catastrophic failure that resulted in our losing most of our posts. We had a security problem where someone had managed to compromise one of our blog accounts with administrative privileges. Some of the files were modified. We noticed it right away and decided to restore the site files and database from our nightly dump.

However … it turned out that when we did a major WordPress update back in February 2006, we created a new database but failed to update our backup script. So, for the past 19 months, it’s been creating a nightly backup of the old database. Restoring the old database not only resulted in losing 19 months’ worth of posts, but also left the database out of sync with the current WordPress version.

One of our former students (thanks Filip!) wrote a script to recover the old posts from Google’s cache and reinsert them into the database. It was a tour de force demonstration of quick programming skill. There are still some problems that we’ll need to attend to: we’ve lost all of the new categories that we’ve added since 2/2006, the ‘related posts’ plugin is no longer working, I think the feed links aren’t all right, etc. But we recovered the posts.

We’ve tightened up our security but continue to see lots of malicious visitors knocking on the door and checking the locks.

It’s a jungle out there.

Sifry’s state of the blogosphere

February 6th, 2006, by Tim Finin, posted in Blogging, memeta, splog

Technorati’s David Sifry has posted another State of the Blogosphere report with lots of interesting statistics. Highlights include

  • Technorati tracks 50K posts an hour from 27M blogs.
  • The number of blogs doubles every six months.
  • Splogs and spings are increasing.
  • Tagging is increasingly popular.

coComment tracks blog conversations

February 5th, 2006, by Tim Finin, posted in Blogging, Web

coComment is a free service to help keep track of comment-based conversations on the blogosphere. After registering, you add their bookmarklet to your browser. When making a comment on a blog using any of the most common platforms (e.g., WordPress, Blogger), you first click on the bookmarklet, and then submit your comment. The bookmarklet sends a copy of your comment to coComment, which adds it to their database, along with the context. The result is that you can visit their page and see the comments you’ve made and can also add some code to your own blog(s) to show recent comments. Here’s what it should look like:



One thing that’s missing, IMHO, is the ability to register your comments with several IDs. I’d like to have my personal ID, but also define it as part of a group ebiquity ID. We could put code to link the ebiquity group ID comments on our ebiquity group blog.

Btw — to sign up you need an invitation code. To get an invitation code, just enter your email address to be notified when one is available. You may get it almost immediately in email, like I did.

Half of Swoogle’s hits are from referer log spammers

February 4th, 2006, by Tim Finin, posted in Blogging, Semantic Web, Swoogle, Web, splog

We are using bbclone to generate reports on Swoogle access. Look at today’s top 10 referers as of 3:00pm:

  www.legaladvocate.net  246     26.14%
  www.myjavaserver.com   152     16.15%
  www.google.com         125     13.28%
  dannyayers.com         44      4.68%
  lucky7.to              34      3.61%
  ebiquity.umbc.edu      25      2.66%
  www.google.de          18      1.91%
  planetrdf.com          18      1.91%
  mail.google.com        18      1.91%
  groups.google.com      14      1.49%

One and five are clearly spam sites and two is suspicious, too. The first, for example, appears to be about poker, though the site name is legaladvocate. The site’s text is obviously automatically generated nonsense. All of the links point to subpages in the same domain with a similar structure and content. I assume that once the site achieves a high PageRank, it will be repurposed or sold.

So, it seems like nearly 50% of our hits are due to referer log spamming. I’d guess Swoogle was picked by finding its URL on recent posts found on a blog search engine or a ping server.
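The "nearly 50%" figure can be checked against the report. Here is a quick sketch in Python, with the hit counts and percentages copied from the bbclone table above and the choice of suspect rows (ranks 1, 2 and 5) taken from the text; the variable names are just for illustration.

```python
# (referer, hits, share of all hits in percent), copied from the report above
top_referers = [
    ("www.legaladvocate.net", 246, 26.14),
    ("www.myjavaserver.com", 152, 16.15),
    ("www.google.com", 125, 13.28),
    ("dannyayers.com", 44, 4.68),
    ("lucky7.to", 34, 3.61),
    ("ebiquity.umbc.edu", 25, 2.66),
    ("www.google.de", 18, 1.91),
    ("planetrdf.com", 18, 1.91),
    ("mail.google.com", 18, 1.91),
    ("groups.google.com", 14, 1.49),
]

# Ranks called out in the post as spam (1 and 5) or suspicious (2).
suspect_ranks = {1, 2, 5}

spam_share = sum(
    share
    for rank, (_, _, share) in enumerate(top_referers, start=1)
    if rank in suspect_ranks
)
print(f"suspect referers account for {spam_share:.2f}% of hits")
```

That works out to about 46% of all hits coming from just those three referers, before counting any spam sites further down the list.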
