Pranam Kolari
Author Archive
December 14th, 2006, by Pranam Kolari, posted in Uncategorized
Gartner predicts that the blogosphere will peak in 2007 (Yahoo News via Steve Rubel).
The reason: Most people who would ever dabble with Web journals already have. Those who love it are committed to keeping it up, while others have gotten bored and moved on, said Daryl Plummer, chief Gartner fellow.
Duncan Riley and others don’t seem to agree. Their arguments focus on India and China, where Internet penetration is still in the low percentages but growing fast.
There are 1.3 billion people in China, and only 123 million have internet access (Internet World Stats) with various reports putting the broadband number of those at between 70 and 80 million users. Less than 10% of the population of China currently has internet access .. Let’s look at India. According to IWS, there are 40 million internet users in India, out of a population of 1.1 billion. I was unable to find a growth figure for India, but you’d guess from such a small base as a percentage of the population, that internet access would be growing.
However, I think Gartner might be right. There are two aspects to this growth:
- Growth is still limited to Internet cafes, where users are charged by the hour, rather than homes. Such users generally spend their time on e-mail and chat, not on blogs, which require a greater time investment. Look at Internet time spent (if available), not penetration.
- As Internet time spent rises to healthy levels, a new social communication medium might take over. Note that Gartner refers to blogging, not social media.
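The penetration figures quoted above are easy to check. This short Python sketch only recomputes penetration from the quoted user and population counts; time-spent data, which I argue is the better measure, is not available here, so the sketch does not attempt it.

```python
# User and population figures as quoted above (Internet World Stats via Duncan Riley).
USERS_AND_POPULATION = {
    "China": (123_000_000, 1_300_000_000),
    "India": (40_000_000, 1_100_000_000),
}


def penetration(users: int, population: int) -> float:
    """Share of the population with Internet access, as a percentage."""
    return 100.0 * users / population


for country, (users, population) in USERS_AND_POPULATION.items():
    print(f"{country}: {penetration(users, population):.1f}%")
```

Both figures stay in the single digits (China just under 10%, India under 4%), which is why penetration alone says little about how many new users will ever blog.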
December 13th, 2006, by Pranam Kolari, posted in Uncategorized
Researchers at Yahoo! Research Lab – Barcelona are hosting a collection of labeled web spam hosts, which they call WEBSPAM-UK2006. The dataset consists of around 2,725 hosts for which at least two labels agree.
The goal of our dataset activity is to make available reference collections that should be:
- Large: the collections should include many examples of spam and non-spam content.
- Clean: the collections should contain little classification errors.
- Uniform: the collections should represent a uniform random sample over a set of pages or hosts.
- Broad: the collections should include as many different Web spam aspects as possible.
- Open: the collections should be freely available for researchers.
We came across similar problems while creating a labeled dataset of spam blogs late last year. This new collection makes an important contribution toward addressing some of these issues. A paper describing the collection is also available online [PDF].
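The agreement requirement is what keeps a collection like this “clean”. As a rough illustration of the idea (not the actual WEBSPAM-UK2006 procedure, which the paper describes), here is a minimal Python sketch that keeps only hosts whose most frequent label was assigned by at least two judges. The input format, a mapping from host name to the list of labels given by individual judges, is an assumption.

```python
from collections import Counter


def agreed_labels(judgments: dict, min_agree: int = 2) -> dict:
    """Keep hosts whose most common label was given by at least `min_agree` judges.

    `judgments` maps a host name to the labels assigned by individual judges
    (e.g. "spam", "normal"). Hosts whose top label is tied are dropped as
    ambiguous, as are hosts with too few agreeing judgments.
    """
    kept = {}
    for host, labels in judgments.items():
        counts = Counter(labels).most_common()
        if not counts:
            continue  # host was never judged
        top_label, top_count = counts[0]
        tied = len(counts) > 1 and counts[1][1] == top_count
        if top_count >= min_agree and not tied:
            kept[host] = top_label
    return kept
```

Raising `min_agree` trades collection size for cleanliness, which is exactly the tension between the “Large” and “Clean” goals listed above.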
December 13th, 2006, by Pranam Kolari, posted in Uncategorized
WOMMA Research Blog reports on a talk by Howard Kaushansky, founder of Umbria, at the WOMMA Research Symposium:
Because so much time and energy has to be put into blog monitoring to thwart their efforts, splogs are a drain on resources. Instead of working to make the blog world better and more deft, time and attention has to be spent (wasted) on splog control.
This also highlights what is lacking today in the fight against splogs. While spammers continue to collaborate on obscure forums and tune their techniques, the community fighting them is not working together closely enough.
Thanks to Howard, this talk should have raised awareness of the general threat.
November 27th, 2006, by Pranam Kolari, posted in Uncategorized
Blogspot now has company.
Though myspace has had its own problems dealing with account spam, it now appears it will also have to deal with splogs. Sploggers seem to have compromised myspace’s captcha system and are using it heavily to promote affiliate sites.
I came across a link farm generated using myspace and indexed by Technorati. It’s time blog search engines exercised some caution when indexing myspace pages, possibly employing the same techniques they have been using against blogspot accounts.
November 2nd, 2006, by Pranam Kolari, posted in Uncategorized
Netcraft released its November 2006 Web Server Survey. It’s a great milestone (over 100 million sites); the news rose to the top of TechMeme and was featured on Slashdot. Looking more closely, I was struck by the nature of the comments on Slashdot.
…how many of them are ad/pr0n/phishing-laden cybersquats, how many are “my first webpage” single-page sites, how many contain the default IIS … In short, how many of them are actual, funct^M usable, ongoing websites? That’s what I want to know.
50 million more sites or 50 million more domain name squatters?
How many of these “new” domains are those horrible “parked domains” that advertise their own sale and link to other sites (presumably to lift their google ranking)?
And many more. Yes, there is reason for concern.
Just to put this in perspective, here are all the four-letter .info domains that pinged weblogs.com in October. On closer inspection, it appears that all of these domains were generated by permuting letters of the alphabet. Similar domains exist across the Web, and most of them appear to be spam.
If spam on the Web is not brought under control, who knows, 200 million sites may not take all that long.
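To get a feel for how large a letter-permuting generator’s playground is, here is a minimal Python sketch that picks out bare four-letter .info hosts from a list of pinging hosts and compares them with the full space of 26^4 possible names. The input (a plain list of host names extracted from the ping stream) and the function names are assumptions for illustration.

```python
import re
import string

# Matches a bare four-letter .info host such as "abcd.info" or "www.abcd.info".
FOUR_LETTER_INFO = re.compile(r"(?:www\.)?([a-z]{4})\.info")

# Number of distinct four-letter names a letter-permuting generator can produce.
NAME_SPACE = len(string.ascii_lowercase) ** 4  # 26**4 = 456,976


def four_letter_info_names(hosts):
    """Return the sorted, de-duplicated four-letter labels among .info hosts."""
    names = set()
    for host in hosts:
        match = FOUR_LETTER_INFO.fullmatch(host.strip().lower())
        if match:
            names.add(match.group(1))
    return sorted(names)


def coverage(names):
    """Fraction of the four-letter name space that has been observed."""
    return len(names) / NAME_SPACE
```

Nearly half a million cheap candidate names is one reason a generator can keep registering them faster than anyone can review them.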
October 25th, 2006, by Pranam Kolari, posted in Uncategorized
Spings, that is, pings from splogs and non-blogs, inundate ping servers. We analyzed this last year by characterizing splogs at ping servers. The problem makes ping servers far less attractive as a “blogosphere update manager”. Recently, while looking at the weblogs.com ping stream, I noticed something very strange: a new form of sping is now in use by comment/guestbook spammers.
The model used by these spammers has so far been –
- Spam comments on blog postings/guestbooks
- Wait for the next search crawl of compromised pages
- Bask in artificially inflated rankings
However, it’s now changing –
- Spam comments on blog postings/guestbooks
- Send proxy pings for the compromised postings/guestbooks to ping servers
- Bask in artificially inflated rankings, faster
Here’s a sampling of what we have seen in the last couple of days (changes.xml) –
<weblog name="auto car finance max" url="http://www.ctle.ngcsu.edu/prof_chuck/?p=129" />
<weblog name="best refinance mortgage" url="http://www.ctle.ngcsu.edu/prof_chuck/?p=120" />
<weblog name="cheap motor car insurance" url="http://www.ctle.ngcsu.edu/prof_chuck/?p=122" />
…
<weblog name="interest mortgage rate refinance" url="http://www.gajaweb.de/guestbook/gb/index.php" />
<weblog name="debt management plan" url="http://www.dartzwerge.de/guestbook/index.php" />
And here’s the entire list of spings for the ngcsu.edu domain over the last five days.
We will of course investigate this further.
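One signature stands out in the sample above: a real blog pings under its own, fairly stable name, while these spings attach a different keyword-stuffed name to each compromised page on the same host. A minimal Python sketch of that heuristic, assuming a changes.xml-style document whose `weblog` elements carry `name` and `url` attributes, could group pings by host and flag hosts pinged under many distinct names. The threshold of three names is an arbitrary assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlparse


def hosts_with_many_names(changes_xml: str, min_names: int = 3) -> dict:
    """Map each host to the distinct weblog names pinged for it, keeping only
    hosts that were pinged under at least `min_names` different names."""
    root = ET.fromstring(changes_xml)
    names_by_host = defaultdict(set)
    for weblog in root.iter("weblog"):
        host = urlparse(weblog.get("url", "")).hostname
        if host:  # skip entries without a parseable URL
            names_by_host[host].add(weblog.get("name", "").strip())
    return {host: names for host, names in names_by_host.items()
            if len(names) >= min_names}
```

Grouping by host is coarse: shared hosting such as blogspot legitimately carries many blog names per domain, so a practical filter would group by a finer key (for example host plus path prefix) or whitelist known multi-user hosts.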
October 23rd, 2006, by Pranam Kolari, posted in Uncategorized
Domain registration services are now jumping into the private domain registration market, offering a workaround to ICANN policies:
ICANN, the international governing body for domain names, requires every Registrar to maintain a publicly accessible “WHOIS” database displaying all contact information for all domain names registered.
Of course, protecting personal information has advantages, all of which feature in the marketing of these services:
- Stop domain-related spam
- Deter identity theft & fraud
- Prevent harassers & stalkers
- End data mining
- Protect your family’s privacy
…
The service is offered for an additional fee, and the market is huge.
What concerns me, though, is how spammers use these services. I recently stumbled across such illegitimate use when looking up a couple of domains (1, 2) associated with spam blogs (splogs). On further investigation, I noticed that a host of other splogs also use these services.
I just submitted a report to DomainsByProxy about some questionable domains. Are they responsive to such reports? Well, I don’t know. Stay tuned.
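Spotting proxy registrations in bulk is mostly a matter of fetching WHOIS records and scanning them. Below is a minimal Python sketch: `whois_query` speaks the plain WHOIS protocol (RFC 3912: send the query and CRLF to TCP port 43, read until the server closes), and `uses_privacy_service` looks for marker strings. The marker list and default server are assumptions; registrars word proxy records differently, and for thin registries such as .com the registrant details sit on the registrar’s own WHOIS server, so the referral in the first response has to be followed.

```python
import socket

# Hypothetical marker strings; real proxy records vary by registrar.
PRIVACY_MARKERS = ("domains by proxy", "whois privacy", "privacy protect", "contact privacy")


def whois_query(domain: str, server: str = "whois.internic.net", timeout: float = 10.0) -> str:
    """Plain WHOIS lookup over TCP port 43 (RFC 3912)."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def uses_privacy_service(whois_text: str) -> bool:
    """True if the WHOIS text mentions any known privacy/proxy marker."""
    text = whois_text.lower()
    return any(marker in text for marker in PRIVACY_MARKERS)
```

Keeping the network lookup and the text check separate makes it easy to re-scan cached WHOIS records as the marker list grows.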
October 20th, 2006, by Pranam Kolari, posted in Uncategorized
Just noticed this in our Akismet comment moderation queue –
I felt good about this post. It confirmed for me some of the things I’ve been thinking about.
It appears quite authentic (it is contextually broad enough to fit almost any post) unless you investigate where it links to (spam blogs, or splogs). This comment has now spread widely enough to show up in major search engines; Google lists 714 results.
I am interested in analyzing signatures of such comments. Please share similar comments by e-mailing me directly (kolari1@umbc.edu) or by leaving a comment on this post.
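One simple signature for comments like this is that the same text shows up, nearly verbatim, on many unrelated blogs. A minimal Python sketch, assuming comments arrive as (site, text) pairs, normalizes case, punctuation and whitespace, hashes the result, and reports fingerprints seen on several distinct sites. The three-site threshold is an arbitrary assumption, and exact hashing will miss comments that spammers deliberately vary word by word.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(comment: str) -> str:
    """Normalize a comment (case, punctuation, whitespace) and hash it."""
    words = re.findall(r"[a-z0-9']+", comment.lower())
    return hashlib.sha1(" ".join(words).encode("utf-8")).hexdigest()


def repeated_comments(comments, min_sites: int = 3) -> dict:
    """Given (site, text) pairs, return fingerprints seen on >= min_sites distinct sites."""
    sites_by_fp = defaultdict(set)
    for site, text in comments:
        sites_by_fp[fingerprint(text)].add(site)
    return {fp: sites for fp, sites in sites_by_fp.items() if len(sites) >= min_sites}
```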
October 20th, 2006, by Pranam Kolari, posted in Uncategorized
Just got back from CASCON 2006, the 16th in the series of what is arguably the largest technical conference in Canada. I demoed our work on IBM’s internal blogosphere in the exhibit program, gave a talk on Enterprise Blogging, and was part of a panel on Web 2.0 in the Social Computing Workshops. Yelena Yesha and Milt Halem organized an SOA workshop.
The event was very effective, thanks to a well-organized program of keynotes, paper sessions, exhibits and workshops. A couple of highlights from the keynotes I attended:
- Jerry Cuomo: WebSphere will feature a large set of Web 2.0 capabilities, which Jerry terms Web SOA. He had a nice unifying theme: Enterprise SOA + Web SOA. (Jerry stopped by our exhibit earlier and liked some of the work we have been doing.)
- John Cohn: IBM is the leader in gaming chips, and almost all consoles now feature them. As he put it … humble as IBM is, gaming consoles don’t come with an “IBM Inside”.
The workshop I participated in was quite successful, thanks to a highly receptive audience. I finally got a chance to meet Aaron Kim, Jen Nolan and Peter Finn, Web 2.0 specialists within IBM. I also spoke to Laurie Dillon-Schalk, who now works as a branding director at Great Gulf Homes; we have had great Web 2.0 “marketing” conversations in the past. Our presentation on Enterprise Blogging went well, and so did the panel. Stephen Perelgut has a comprehensive summary of the entire event over at the CASCON blog.
I also had some great conversations in general with: Ian Spence, Mark Chignell, Munindar Singh, Gabriel Mansour, Sacha Chua, Alvin Chin, Sadek Ali, Antonio Cangiano, and IBMers.
Smart people.
Great event organization by Kelly Lyons and company. CASCON 2007 is just a year away.
October 12th, 2006, by Pranam Kolari, posted in Uncategorized
As facebook’s valuation gets more publicity, here’s another perspective: a social vote on social tools. While some might argue about the demographics of their user bases, I still think this view is valuable.
There might also be some answers here about the Google/YouTube deal. The recent growth of YouTube has been astounding.
October 11th, 2006, by Pranam Kolari, posted in Uncategorized
Splogs need not always be bad.
Splogs -> Spa Blogs
So here in the splogs(spa blogs) to come, will be some tips of joy and points of ponder to help you in the continuing quest for the perfect spa or to “perfect” your spa.
Splogs -> Spatial Blogs
Splogs – or spatial blogs ..blog entries on a specific location such as a local historic building, park or other piece of real estate, would be an invaluable tool ..
As the world gets flatter, entity disambiguation will only get harder.
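Telling these senses of “splog” apart is a small word sense disambiguation problem. Purely as an illustration, here is a simplified Lesk-style sketch in Python: each sense gets a signature of cue words, and the sense whose signature overlaps the surrounding text most wins. The cue-word lists are hypothetical and hand-picked; a real system would learn them from labeled examples.

```python
import re
from typing import Optional

# Hypothetical cue words for each sense of "splog".
SENSE_SIGNATURES = {
    "spam blog": {"spam", "affiliate", "ads", "adsense", "keyword", "link", "farm", "ping"},
    "spa blog": {"spa", "massage", "sauna", "relax", "treatment", "wellness", "joy"},
    "spatial blog": {"location", "park", "building", "historic", "map", "estate", "place"},
}


def disambiguate_splog(context: str) -> Optional[str]:
    """Simplified Lesk: pick the sense whose signature overlaps the context most.

    Returns None when no cue word appears at all.
    """
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_SIGNATURES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```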
October 2nd, 2006, by Pranam Kolari, posted in Uncategorized
… as seen at Blaugh, which to a large extent is true. Just couldn’t resist re-posting this.

Though Blogger has a very large user base, a high percentage of newly created blogs are spam (splogs). Here’s a new farm (http://digital-color-cameravrnlibbe.blogspot.com/) that our splog filters noticed yesterday. Warning: potentially objectionable content.
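The subdomain of that farm already hints at automation: commercial keywords, with the last one glued to a run of junk letters (“cameravrnlibbe”) to keep the name unique. As a crude illustration only (not how our splog filters work), here is a Python sketch of a single URL feature that flags blogspot subdomains containing a token made of a known keyword plus a trailing run of extra characters. The keyword list and the four-character threshold are assumptions, and legitimate concatenations such as “camerareviews” would also trip it, so at best it is one signal among many.

```python
from urllib.parse import urlparse

# Hypothetical commercial keywords; a real filter would use a much larger, learned list.
KEYWORDS = ("camera", "digital", "color", "mortgage", "insurance", "casino")
BLOGSPOT_SUFFIX = ".blogspot.com"


def keyword_with_junk_suffix(url: str, min_suffix: int = 4) -> bool:
    """Flag blogspot subdomains where a hyphen-separated token is a keyword
    followed by at least `min_suffix` extra characters (e.g. 'cameravrnlibbe')."""
    host = urlparse(url).hostname or ""
    if not host.endswith(BLOGSPOT_SUFFIX):
        return False
    subdomain = host[: -len(BLOGSPOT_SUFFIX)]
    for token in subdomain.split("-"):
        for keyword in KEYWORDS:
            if token.startswith(keyword) and len(token) - len(keyword) >= min_suffix:
                return True
    return False
```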