 | Pranam Kolari 
Author Archive
September 25th, 2006, by Pranam Kolari, posted in Uncategorized
If Techmeme was special, Gabe’s approach to monetizing goes one step further. He announced their new business model today, impressing the blogosphere, and opening up a new advertising-based engagement (reach-out) model for businesses that use blogs. This reminds me of something Amazon came up with this year, which they call Plogs, where select Amazon editors/sellers engage customers in a marketing loop. Perhaps Gabe’s work with Techmeme will bring Plogs and many other related efforts into real-world use.
Far-reaching implications, I am sure. Smart!
Looking forward, I wonder what plans Gabe has for memeorandum. The audience catered to by memeorandum is just perfect for many political campaigns.
September 23rd, 2006, by Pranam Kolari, posted in Uncategorized
Imagine a television broadcaster generating advertisement revenues off stolen programs; that’s what Bitacle is getting at, at the scale of the entire blogosphere. This is just not acceptable.
Well we all love how search engines, aggregators and blog readers organize Web content, eventually directing users to the original source. Bitacle however creates a black hole around copied user content — once you are in, you are in. My concern (and the general debate on the blogosphere) is on their “aggregator” facility, which pulls together user posts and hosts advertisements. To make matters worse they also host new comment threads (gosh!), and this is ours, btw.
It appears that the debate started with Ivan’s post on “Are Bitacle blog thieves too?”, as early as March 2006. Interestingly, a Bitacle employee offers explanations in the comments and compares the company with Google and Yahoo, for god’s sake!
The reason it’s that we don’t be only a blog search engine we are a “archive blog search engine” that it’s different concept.
One question: why you don’t ban Goolge, Yahoo or MSN? That search engines cache all your pages.
Bloggers are outraged; the titles alone speak for themselves:
- All your blogs are belong to Bitacle
- Bitchacle
- Bitacle: thieves now open for business in the 8th circle of hell (a good overview of the issue)
- Why is bitacle stealing all our blogs??
- BITACLE DEBACLE CONTINUES — BLOGGERS OUTRAGED — NO NEWS COVERAGE BY OLD MEDIA?
- Bitacle is Heisting My Content
- Those Bastards!!(Bitacle.org)
As I write, “Bitacle” is the 7th most searched keyword on Technorati today. Bitacle: totally unethical and unprofessional.
UPDATE: I notice their sitemap lists all plagiarized blogs.
June 5th, 2006, by Pranam Kolari, posted in Uncategorized
The 7th International IEEE Workshop on Policies for Distributed Systems and Networks (Policy 2006) started today at the University of Western Ontario, London, ON, Canada.
Held every year since 1999, the Policy workshop is the primary forum for technical exchange on the research and standards related to policies for networks and distributed systems. Policy 2006 will present up-to-date approaches for policy specifications, integration with management systems and new applications of policies.
We will be presenting one of our papers on Policy Management of Enterprise Systems on Wednesday. Lalana Kagal, a UMBC alumna, has a presentation based on her work on policy delegation networks [pdf].
May 16th, 2006, by Pranam Kolari, posted in Uncategorized
The Yahoo home page is in the middle of interface changes, and so is Flickr. Noticeable ones include:
- Better use of home page real estate, and navigation.

- Group recommendations. (Great new feature!)
Batch processing on groups is still missing, which is what I would have loved to have. More..
What’s next? del.icio.us?
Elsewhere, all positive (Digital Connection, InforNation, Antonescu, Visual Impact, Niall)
March 27th, 2006, by Pranam Kolari, posted in Uncategorized
AAAI Spring Symposium started today at Stanford. We have a presentation on Wednesday, March 29th, based on a part of our work on splog detection. We are always open to discussion on any topic related to the Blogosphere or the Semantic Web. Catch any of us around — Tim Finin, Pranam Kolari or Akshay Java — anytime!
March 27th, 2006, by Pranam Kolari, posted in Uncategorized
A panel on “Technologies to Understand it Now and Gain Insight in the Future” was part of the AAAI Weblogs Symposium today.
The panel was organized around questions. We summarize what the panel thought about these questions below. Panelists are identified by their abbreviated names.
What information do you get from blogs? HK’s take is “market research”: putting a “finger on the pulse” of consumers. AB agrees with HK and also points to PR and marketing. AB discusses analyzing blogs alongside e-mails received by organizations and message-board discussions, and the task of correlating all of them, with blogs as the pivot. AB also mentions splogs and how they are affecting their analysis. CR talks about the growth of the blogosphere. It all started as “the most recent post about X”; relevancy is becoming more important now. CR talks about the AOL connection and how AOL is using their index to track conversations on the blogosphere. CR ends by saying that traditional media is now open to including unedited content on their pages.
How good is it? How much does it matter? Discussion centered on spam and related issues. The question is how robust current analysis techniques are against spam. CG says don’t worry about it: data is always dirty. Blogger has cracked down on spam blog postings. CG moves away from the problem and suggests that query disambiguation and other issues are more important. HK brings the topic back by asking about the conflict between search engine revenues and spam. MS talks about how spammers go so far as to create paid accounts on TypePad, and how Six Apart does not allow automated content generation. MS says that, contrary to CG’s view, the blogosphere does care about splogs. ML gives a great example in which hijacked content from his blog, listed on another site, had its outlinks replaced with links to porn sites.

How will consumers use these analyses? CM raises an interesting point: privacy will become an issue in the next 5 years, as organizations increasingly generate blog data. IP rights for RSS feeds are another issue. MS says we will see a bifurcation between bloggers who want to be public and bloggers who don’t. MS also talks about privacy issues in the future. Some bloggers say “I don’t want to be in Google’s index”, but still want to talk with their friends. MS says analysis engines have to worry about not having access to this data in the future. TP says privacy won’t be that important to the next generation (go look at MySpace, he says) and that tools should look at integrating blogging with social networking.
What do you need from researchers? CG gives a broad view, and talks about personalization in general. MS says there are 3 million active LJ users and many communities. MS points to recommending communities as very important from their perspective. AB talks about having better tools for relevancy. CR supports the community view. TP talks about employee group blogs and how employees can make the entire organization more competitive. HK says there isn’t anything we cannot do with language technologies, but almost none of them are sufficiently accurate. So he suggests that researchers should work on making them more accurate.
March 15th, 2006, by Pranam Kolari, posted in Uncategorized
A question occurred to me when going through Tim Berners-Lee’s blog. TimBL’s first post has 455 comments (at which point comments were turned off) and 200 citations/inlinks (make that 201!) per Technorati. I could not figure out the top posts for any of Steve Rubel’s, Scoble’s or Om Malik’s blogs, which are tech blogs I regularly follow and possible candidates for the top cited post of all time. So here’s my primary question:
What is the most discussed blog post of all time — comments, inlinks combined?
Understandably, blog search engines are still not capable of indexing and analyzing comments. With the current capabilities, listing the most cited blog posts of all time (on a single page) based on inlinks would surely be feasible, and interesting. Top posts aggregated over authors would be a great complement. Though some might argue that these features are not all that important for the blogosphere, they would still help in understanding what makes the best and most useful blog posts, similar to how we (graduate students!) in academia use top cited papers, both as an inspiration and a guide to work on topics that make a difference. BlogPulse seems to have something close (top blogs for the day) but nothing aggregating them over time or author. Any other answers?
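To make the idea concrete, here is a minimal sketch of the two listings suggested above: posts ranked by inlinks, and inlinks aggregated over authors. The link records, post identifiers and author names are all made up for illustration; a real listing would draw them from a blog search engine’s index.

```python
from collections import Counter

# Hypothetical (linking_post, cited_post, cited_author) records.
# In practice these would come from a blog search engine's link index.
links = [
    ("blog-a/1", "author-x/first-post", "author-x"),
    ("blog-b/7", "author-x/first-post", "author-x"),
    ("blog-c/3", "author-y/some-post", "author-y"),
    ("blog-a/2", "author-y/other-post", "author-y"),
    ("blog-d/9", "author-y/other-post", "author-y"),
]

def top_cited(links, n=10):
    """Rank cited posts by their total number of inlinks."""
    return Counter(cited for _, cited, _ in links).most_common(n)

def top_authors(links, n=10):
    """Aggregate inlink counts over the authors of the cited posts."""
    return Counter(author for _, _, author in links).most_common(n)

print(top_cited(links))
print(top_authors(links))
```

Comments are left out of the count here, since (as noted) they are not indexed yet; adding them would only need a second counter keyed on the same post identifier.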
February 15th, 2006, by Pranam Kolari, posted in Uncategorized
NYMetro has a report on Linkology on the Blogosphere. This is based on Technorati’s index of over 27 million blogs.

To discover how they [blogs] relate to one another, we’ve taken the most-linked-to 50 and mapped their connections. Each arrow represents a hypertext link that was made sometime in the past 90 days.
Edge colors reflect the topic of the link. A cursory examination suggests that political blogs (in blue) show a relatively higher tendency to interlink, which brings us to an interesting question: are inlinks (only) a good way of ranking A-Listers?
Over a period of time these inlink-based ranks are bound to bias the A-List (the way we define it now) towards political blogs. This is also reflected in the Technorati Top 100 viewed in the Wayback Machine (November 26, 2002, December 5, 2003, November 30, 2004, April 1, 2005, today), as linked by Sifry’s State of the Blogosphere: Part 2. It appears as though David Sifry is right on the mark when it comes to mining the “Magic Middle”.
NOTE: Niall Kennedy has made higher resolutions of Linkology accessible.
January 13th, 2006, by Pranam Kolari, posted in Blogging, Technology, Technology Impact, Web
Ping-O-Matic, a great tool and arguably the most popular update ping service, is currently down. Matt blogs about a complete revamp. Apparently their current system was accepting pings on just one box! Technorati is helping them out.
Most of us don’t even bother to check which update ping services our blog software notifies automatically. Now, is this a good enough motivation to notify additional update ping services? If yes, who is set to gain? Given the recent valuation of weblogs.com, a short downtime of Ping-O-Matic might well create another multi-million dollar asset.
Related:
Attention Wordpress users!!! from Nick Starr, Ping-o-Matic is offline from Jeff Smith, Pingomatic is gone from Alan Fraser.
December 15th, 2005, by Pranam Kolari, posted in Blogging, GENERAL, Machine Learning, Semantic Web, Technology, Web, memeta, splog
In the blogosphere, pings are notifications sent by updated blogs to PingServers. A major issue recently has been unjustified pings, also known as Spings, sent by Splogs. Splogs have been discussed a lot recently, including an interesting thread on post piracy that Steve Rubel initiated on Micropersuasion.
The problem of splogs prompted us to analyze pings from weblogs.com, which publishes hourly pings as changes.xml. We have been collecting these pings over the last 4 weeks, for a total of 40 million pings from around 14 million (so claimed) blogs. To begin with, we fetched these blogs and applied a language identification technique implemented by James Mayfield to identify the language of each. As expected, most of the pings were from blogs authored in English. But we were able to identify blogs from many other languages as well. For instance, the charts below show the distribution of pings from blogs authored in Italian, over a day and over a week. Each bar denotes the number of pings per hour.


All times are in GMT; clearly Italian authored blogs display a specific blogging pattern.
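For readers who want to reproduce this kind of chart, the following is a rough sketch of the bucketing step only: it counts pings per hour of day in GMT. The function name and the sample timestamps are our own inventions for illustration, and the sketch assumes the ping times have already been extracted from the feed.

```python
from collections import Counter
from datetime import datetime, timezone

def pings_per_hour(ping_times):
    """Count pings in each hour-of-day bucket (0-23), in GMT/UTC."""
    counts = Counter(t.astimezone(timezone.utc).hour for t in ping_times)
    return [counts.get(hour, 0) for hour in range(24)]

# Hypothetical ping timestamps; real ones would come from the ping feed.
sample = [
    datetime(2005, 12, 1, 9, 5, tzinfo=timezone.utc),
    datetime(2005, 12, 1, 9, 40, tzinfo=timezone.utc),
    datetime(2005, 12, 1, 21, 15, tzinfo=timezone.utc),
]

histogram = pings_per_hour(sample)
print(histogram)
```

Each entry of `histogram` corresponds to one bar in a per-day chart; a per-week chart would bucket on the (weekday, hour) pair instead.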
In the next step we used our work on splog detection to detect splogs (and hence spings) among the English blogs. Our detection mechanism is close to 90% accurate. As shown in the charts below, pings from blogs average around 8K per hour and those from splogs average around 25K.


Clearly almost 3 out of 4 pings are spings! Going back further to the source of these spings, we observed that more than 50% of claimed blogs pinging weblogs.com are splogs.
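The “3 out of 4” figure follows directly from the two hourly averages. A quick check, using only the approximate numbers quoted above (so the result is itself approximate, and does not account for the roughly 10% detection error):

```python
# Approximate hourly averages quoted in the post.
blog_pings_per_hour = 8_000
splog_pings_per_hour = 25_000

# Share of all pings that come from splogs, i.e. spings.
sping_fraction = splog_pings_per_hour / (blog_pings_per_hour + splog_pings_per_hour)
print(f"{sping_fraction:.1%}")  # prints 75.8%
```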
Given how interesting these preliminary statistics are, the scope for further analysis, and the interest in the resulting dataset, we decided to continuously monitor the pingosphere. So, we now do it “live” on updated blogs published by weblogs.com (delayed by an hour), and have made it publicly available at http://memeta.umbc.edu. The site lists blogging patterns for many other languages, and compares splogs with blogs. All of our work is part of a larger project, memeta, aimed at analyzing the content and structure of the blogosphere.
We hope our effort is a good complement to existing services (e.g., FightSplog, SplogReporter and SplogSpot) towards combating splogs. We currently publish only simple ping statistics on this site, but do stay tuned for fresh splog and classified blog dumps and much more!
UPDATE: Matthew Hurst from BlogPulse points us to an interesting analysis he has done on a day of weblogs.com pings.
December 9th, 2005, by Pranam Kolari, posted in Semantic Web, Web
Discussion brewing, see tech.memeorandum. Yahoo wins the game, at least for now. Flickr, and now del.icio.us! Details at ysearchblog, and from Joshua.
November 16th, 2005, by Pranam Kolari, posted in GENERAL, Semantic Web, Web
Google Base gives users the ability to do bulk uploads. To enable this, Google has defined some extensions to syndication feeds.
To facilitate the addition of more detailed information we extended RSS 1.0 by creating a module defined in a Google Base namespace. The namespace defines a list of attributes that can be used to increase the amount of information provided for an item in a bulk upload.
Information about all attributes is available here. Some interesting observations from their RSS 1.0 extension —
- Google has defined a new namespace (http://base.google.com/ns/1.0) to support these attributes. Are we seeing the first formal adoption of Semantic Web concepts (by Google) here?
- Google Base lets users create new schemas (attributes). For instance, an example from Google Base shows how a “language_skills” attribute can be added to a job opening description. I wonder how these new namespaces are ingested by Google Base?
[EDIT] More discussion on Google Base and Semantic Web by Danny, Shelly and many others (here and here)
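As a rough illustration of what a namespaced attribute on a feed item looks like, the sketch below builds one item with Python’s standard library. Only the namespace URI and the “language_skills” attribute name come from the observations above; the “g” prefix, the item structure, and the sample values are our assumptions, and this is not Google’s documented upload format.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted above; the "g" prefix is our own choice.
G_NS = "http://base.google.com/ns/1.0"
ET.register_namespace("g", G_NS)

# A bare feed item with one namespaced attribute element.
# The title and skill values are made up for illustration.
item = ET.Element("item")
ET.SubElement(item, "title").text = "Software Engineer"
skill = ET.SubElement(item, f"{{{G_NS}}}language_skills")
skill.text = "English"

xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

The point is simply that the extra attribute lives in its own namespace, so a feed reader that does not know about it can ignore it, while one that does can pick it up by its qualified name.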