UMBC ebiquity research group Building intelligent systems in open, heterogeneous, dynamic, distributed environments

Archive for the 'Blogging' Category

Google Scholar, it’s a good thing

January 25th, 2006, by Tim Finin, posted in Blogging, Ebiquity, GENERAL, Semantic Web, Web

Google Scholar, it’s a good thing, as Martha Stewart would say.

We recently added a feature to our ebiquity paper repository that ties papers to their Google Scholar entries. The main motivation was to allow us to track citations.

As I’ve worked through our papers to verify and add their Google Scholar keys, other benefits are becoming apparent. In several cases I’ve discovered errors or omissions in our own metadata. Sometimes our own entries have had the title wrong! In other cases, I’ve found several Google Scholar entries for the same paper. Sometimes this is due to an error by the author of a citing paper, which can then propagate to other citations.

I suspect that some of the errors originate with us. Here’s one scenario. When a paper is accepted for publication, the author is happy and excited and adds an entry in our database, along with a softcopy of the draft. People download and read the draft and, if it’s good, start citing it. Months later the final copy, which may have a different title and even a different author list, is finalized. Ideally, our site is edited to reflect the final metadata and final softcopy. But sometimes this doesn’t happen, or the final softcopy is not uploaded for copyright reasons. In any case, the old and possibly incorrect metadata and draft may have escaped to roam the Internet.

Lately I’ve started to add a header to draft copies of papers posted to our site that states that they are drafts and also where the final version will appear. I’ve found Acrobat’s ability to add a header to an existing PDF file to be very handy for this. I’ve also used Acrobat to extract the first page of an article for which we don’t hold the copyright, add a header pointing to its source, and post that on our site (as in this example).

Finally, one of the ideas that underlies the current Semantic Web vision is that it’s very useful for things on the web to have good identifiers. The Uniform Resource Identifier (URI) is the Semantic Web’s favorite identifier, but we all recognize that just using URIs is too simple for many objects (e.g., people). OWL’s contribution to this is the notion of an inverse functional property. If my ontology defines SSN as an inverse functional property, then two objects that share the same SSN must be the same. So, along these lines, the googleScholarKey property should be inverse-functional and have domain=publication and range=string.
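To make the inverse functional idea concrete, here is a minimal Python sketch, not the code behind our repository. It assumes publication entries are plain dictionaries with an optional googleScholarKey field; the function name merge_by_key and the sample records are made up for illustration. Entries that share a key are treated as descriptions of the same paper.

```python
from collections import defaultdict

def merge_by_key(records, key="googleScholarKey"):
    """Cluster records that share a value for an inverse functional property.

    Records with the same key value are taken to describe the same object.
    Records without the key are kept apart, since nothing can be inferred.
    """
    groups = defaultdict(list)
    unkeyed = []
    for rec in records:
        value = rec.get(key)
        if value is None:
            unkeyed.append([rec])  # no key, so no identity can be inferred
        else:
            groups[value].append(rec)
    return list(groups.values()) + unkeyed

# Hypothetical entries: a draft and a final version carrying the same key.
records = [
    {"title": "Draft title", "googleScholarKey": "abc123"},
    {"title": "Final title", "googleScholarKey": "abc123"},
    {"title": "Another paper", "googleScholarKey": "xyz789"},
    {"title": "Unkeyed report"},
]
clusters = merge_by_key(records)
for cluster in clusters:
    print([rec["title"] for rec in cluster])
```

The draft and final entries land in one cluster even though their titles differ, which is exactly the situation described above for papers whose metadata changed between acceptance and publication.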

Splogs, like spam, will be with us for a while

January 24th, 2006, by Tim Finin, posted in Blogging, GENERAL, Web, splog

Two years ago Bill Gates predicted that the spam problem would be solved by now, as this article in The Register reports.

Hey Bill, why am I still getting spam?
Junk mail outlives MS mortality prediction
By John Leyden, 24 January 2006

Two years ago today Bill Gates predicted that spam email would be eradicated as a problem within 24 months. The Microsoft chairman predicted the death of spam in a speech at the World Economic Forum on 24 January 2004.

Gates outlined a three-stage plan to eradicate spam within two years. Microsoft’s scheme calls for better filters to weed out spam messages and sender authentication via a form of challenge-response system. Secondly, Microsoft wants to see a form of tar-pitting so that emails coming from unknown senders are slowed down to a point where bulk mail runs become impractical.

Lastly, and most promisingly as far as Gates is concerned, is a digital equivalent of stamps for email, to be paid out only if the recipient considers an email to be spam. Blocking spam email would appear to be a simple problem but in practice is far trickier than Gates, or indeed the industry, first thought.

It’s tempting to think that we are close to being able to solve the splog identification problem, which would enable blog search engines to weed the splogs out of their indices. But I’ll bet that splogs will be with us for a long time, as is the case with spam. Of course, we do have to work hard to keep them under control, just as we do with spam. If we don’t, the blogosphere will be quickly overrun and its promise squandered.

UMBC blog research on splogs in Baltimore Sun

January 17th, 2006, by Tim Finin, posted in Blogging, Ebiquity, GENERAL, Web, splog

Baltimore Sun’s Troy McCullough talks about Pranam Kolari’s work on detecting splogs in his column on Sunday, 15 January 2006. The column also has an associated podcast.

Fighting spam sites - latest battle in the blog wars
On Blogs: Troy McCullough, Jan 15, 2006

It seems that everyone has a blog these days - a spot that others can visit to find out what they have to say about something or nothing in particular. Some blogs are widely valued fonts of specialized wisdom, but many are viewed as uninteresting expressions of personal ego. The difficulty of sorting the good blogs from the bad can be a frustrating challenge - one that is seen as a serious threat to what has been viewed as a vital feature of the Internet.

Now, three University of Maryland, Baltimore County researchers have made a far more disturbing conclusion about blogs. After analyzing millions of blog posts, they have determined that the blogosphere is drowning in spam, the pejorative nickname given to unsolicited Internet advertising. Using data collected by weblogs.com, a prominent blog tracking service, doctoral student Pranam Kolari and professors Tim Finin and Anupam Joshi analyzed 40 million blog updates submitted from 14 million blogs.

Ping-O-Matic temporarily down

January 13th, 2006, by Pranam Kolari, posted in Blogging, Technology, Technology Impact, Web

Ping-O-Matic, a great tool and arguably the most popular update ping service, is currently down. Matt blogs about a complete revamp. Apparently their current system was accepting pings on just one box! Technorati is helping them out.

Most of us don’t even bother to check which update ping services our blog software notifies automatically. Now, is this a good enough motivation to notify additional update ping services? If yes, who is set to gain? Given the recent valuation of weblogs.com, a short downtime of Ping-O-Matic might well create another multi-million dollar asset.

Related:
Attention Wordpress users!!! from Nick Starr, Ping-o-Matic is offline from Jeff Smith, Pingomatic is gone from Alan Fraser.

Bloggers to MSM: outta mySpace

January 10th, 2006, by Tim Finin, posted in Blogging, GENERAL, Web

The infrastructure to set up online communities is not all that complicated — the member base is the real asset. I’m not sure that MSM companies will know how to manage them, as the following article suggests.

Get out of MySpace, bloggers rage at Murdoch
Nicholas Wapshott, The Independent, 08 Jan 2006

Angry members of MySpace, the personal file-sharing website for young adults, are accusing Rupert Murdoch’s News Corporation of censoring their postings and blocking their access to rival sites. The 38 million subscribers to MySpace, which News Corp bought for $629m (£355m) last July, discovered that when they wrote to each other about rival video-swapping site YouTube, the words were automatically deleted, and attempts to download video images from YouTube led to blank screens.
…
The protests gathered pace, and when 600 MySpace customers complained and a campaign began to boycott the site and relocate to rival sites such as Friendster, Linkedin, revver.com and Facebook.com, News Corp relented and restored the links. However, MySpace managers promptly shut down the blog forum on which members had complained about the interference. An online notice said the problem was the result of “a simple misunderstanding”.

Talk Digger tracks blog conversations

December 24th, 2005, by Tim Finin, posted in Blogging, Semantic Web, Web

Talk Digger “helps users to find, follow and join conversations evolving on the Internet.” Put in a URL and it queries nine different search engines to find blog posts and other pages that reference the page. It’s a nicely done, useful web application developed by Frédérick Giasson. But here’s the most interesting part, at least from my point of view:

“What is the future of Talk Digger? It will evolve as a web service that will broadcast its results in RDF or even OWL. It would be the first step to do to enter Talk Digger into the Semantic Web age. Talk Digger will eventually have some sort of semantic analyzing capabilities and being able to use them on found conversations to clarify and optimize the returned results. It will then be able to semantically analyze conversations (I will not say more than that for the moment) and make the results of these computations available and understandable by other RDF/OWL reasonners and web applications. So, what is the future of Talk Digger? I hope it is the Semantic Web.”

Why no one goes to your web site

December 16th, 2005, by Tim Finin, posted in Blogging, Humor, Semantic Web, Web


Welcome to the Splogosphere: 75% of new pings are spings (splogs)

December 15th, 2005, by Pranam Kolari, posted in Blogging, GENERAL, Machine Learning, Semantic Web, Technology, Web, memeta, splog

In the blogosphere, pings are notifications sent by updated blogs to PingServers. A major issue recently has been unjustified pings, also known as Spings, sent by Splogs. Splogs have been discussed a lot recently, including an interesting thread on post piracy that Steve Rubel initiated on Micropersuasion.

The problem of splogs prompted us to analyze pings from weblogs.com, which publishes hourly pings as changes.xml. We have been collecting these pings over the last 4 weeks, for a total of 40 million pings from around 14 million (so claimed) blogs. To begin with, we fetched these blogs and applied a language identification technique implemented by James Mayfield. As expected, most of the pings were from blogs authored in English, but we were able to identify blogs in many other languages as well. For instance, the charts below show the distribution of pings from blogs authored in Italian, over a day and over a week. Each bar denotes the number of pings per hour.


[Chart: Pings over a day]
[Chart: Pings over 8 days]

All times are in GMT; clearly Italian authored blogs display a specific blogging pattern.
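For readers who want to build similar pings-per-hour charts, here is a rough Python sketch, not our actual pipeline. It assumes a changes.xml document shaped like the SAMPLE string below: a root element with an "updated" timestamp and one "weblog" element per ping, whose "when" attribute gives the ping's age in seconds relative to "updated". The sample data and the function name pings_per_hour are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical sample; the element and attribute names are an assumption
# about the changes.xml layout, and the blogs are invented.
SAMPLE = """<weblogUpdates version="1" updated="Thu, 15 Dec 2005 10:00:00 GMT" count="3">
  <weblog name="Blog A" url="http://a.example/" when="30"/>
  <weblog name="Blog B" url="http://b.example/" when="1800"/>
  <weblog name="Blog C" url="http://c.example/" when="4000"/>
</weblogUpdates>"""

def pings_per_hour(xml_text):
    """Count pings by hour of day (GMT)."""
    root = ET.fromstring(xml_text)
    updated = parsedate_to_datetime(root.get("updated"))
    counts = Counter()
    for weblog in root.iter("weblog"):
        # "when" is treated as seconds elapsed before the "updated" time.
        ping_time = updated - timedelta(seconds=int(weblog.get("when")))
        counts[ping_time.astimezone(timezone.utc).hour] += 1
    return counts

counts = pings_per_hour(SAMPLE)
for hour in sorted(counts):
    print(f"{hour:02d}:00 GMT  {counts[hour]} pings")
```

Restricting the input to blogs identified as a given language before counting would give one chart per language.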

In the next step we used our work on splog detection to detect splogs (and hence spings) among the English blogs. Our detection mechanism is close to 90% accurate. As shown in the charts below, pings from blogs average around 8K per hour and those from splogs average around 25K per hour.


[Chart: Blog Pings]
[Chart: Splog Pings]

Clearly almost 3 out of 4 pings are spings! Going back further to the source of these spings, we observed that more than 50% of claimed blogs pinging weblogs.com are splogs.
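The "3 out of 4" figure follows directly from the hourly averages quoted above. A quick check in Python, using the rounded 8K and 25K figures, so the result is approximate:

```python
# Rounded hourly averages reported in the post.
blog_pings_per_hour = 8_000
splog_pings_per_hour = 25_000

# Share of all pings that come from splogs.
sping_share = splog_pings_per_hour / (blog_pings_per_hour + splog_pings_per_hour)
print(f"{sping_share:.1%}")
```

Note that this share is derived from classifier output that is itself only about 90% accurate, so it should be read as an estimate.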

Given how interesting these preliminary statistics are, the scope for further analysis, and the interest in the resulting dataset, we decided to continuously monitor the pingosphere. So, we now do it “live” on updated blogs published by weblogs.com (delayed by an hour), and have made it publicly available at http://memeta.umbc.edu. The site lists blogging patterns for many other languages, and compares splogs with blogs. All of our work is part of a larger project, memeta, towards analyzing the content and structure of the blogosphere.

We hope our effort is a good complement to existing services (e.g., FightSplog, SplogReporter and SplogSpot) towards combating splogs. We currently publish only simple ping statistics on this site, but do stay tuned for fresh splog and classified blog dumps and much more!

UPDATE: Matthew Hurst from BlogPulse points us to an interesting analysis he has done on a day of weblogs.com pings.

The XKCD data died in a blogging accident

December 8th, 2005, by li ding, posted in Blogging, Google, Social media, Web, Web 2.0

The popular XKCD had another Web-related comic yesterday, but it turned out to be self-negating.

[Comic: Dangers]

As was noted on Slashdot:

“As I noted yesterday (and was joined by many others)… in an offhand observation xkcd has singlehandedly changed a small section of the Internet. Changing the results from a Google search for “Died in a Blogging Accident” from 2 to (at this writing) over 7,170 in a little more than 24 hours.”

The number of results is now up to 13.3K. I guess something like the Heisenberg uncertainty principle applies to the Internet, too.

Was this the world’s first Blog post?

December 6th, 2005, by Tim Finin, posted in Blogging, GENERAL, Humor, Semantic Web, Web

We’ve been trying to identify the first example of a blog post and gradually winding our way further into the past. Of course, it’s a bit of a judgement call, since the blog stereotype is constantly evolving. That said, here’s our current best candidate for the world’s first blog post and what metadata we could figure out.

FieldMarking: creating the global human sensor net

November 17th, 2005, by Cyndy, posted in Blogging, GENERAL, Semantic Web, Social, Web

We’ve been conducting a pilot study at http://fieldmarking.reger.com/ towards creating a Global Human Sensor Net: people all over the world collaboratively reporting, tagging, and thus exchanging information about their observations of the natural world. Such information is already piling up in casual text in blogs and discussion forums, but it is not very accessible to scientists there.

A variety of efforts are underway to address this general problem of how to share unstructured information: simple tagging, microformats, datablogging, structured blogging, and semantic web browsers.

The FieldMarking concept is to let people freely report what they see in unstructured text, but to provide them with appropriate data fields to structure or annotate their own, or somebody else’s, observations. Text scrapers and existing ontologies would provide suggestions for appropriate markup. The structured data would be published in RDF so that it can be intelligently retrieved and aggregated, and so that scientists can be alerted, for example, to invasive species or emerging diseases. Interactive graphing tools would allow both citizens and scientists to visually mine the data.

FieldMarking combines observation in the “field” with the idea of filling out data “fields” or creating semantic “markup.”

The current prototype, FieldMarking, uses the datablogging technology at Reger.com. Thus we can take advantage of RSS syndication, mobile posting, and graphable data fields from shared templates. Datablogging also does not require any special plug-ins to be installed by users. Our testing suggests that, in addition to some bugginess in the Reger.com software, this approach has some limitations. We need to be able to apply multiple data records to a text entry, because it often makes sense to report many observations or many kinds of observations in one paragraph. Also, we need to allow data records from other users who may dispute the original markup. Customized log types can be shared with other users of reger.com, but we’ll want to more broadly distribute across multiple platforms.
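The two requirements above, several data records per text entry and records contributed by users other than the original author, can be sketched as a small data model. This is an illustrative Python sketch only, not the Reger.com data model; the class names, field names, and sample observation are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataRecord:
    """One structured observation attached to a text entry."""
    author: str   # who supplied this markup
    fields: dict  # e.g. {"species": ..., "count": ...}

@dataclass
class Entry:
    """A free-text report plus any number of structured records."""
    author: str
    text: str
    records: list = field(default_factory=list)

    def annotate(self, author, **fields):
        """Attach a record; the annotator need not be the entry's author."""
        record = DataRecord(author=author, fields=fields)
        self.records.append(record)
        return record

    def records_by_others(self):
        """Records from users other than the author, e.g. disputed markup."""
        return [r for r in self.records if r.author != self.author]

# Hypothetical usage: one paragraph, two observations, one dissenting record.
entry = Entry("cyndy", "Saw two cardinals and a red fox near the creek.")
entry.annotate("cyndy", species="Northern Cardinal", count=2)
entry.annotate("cyndy", species="Red Fox", count=1)
entry.annotate("other_user", species="Gray Fox", count=1)
print(len(entry.records), len(entry.records_by_others()))
```

Keeping the annotator on each record, rather than on the entry, is what lets competing interpretations of the same text coexist.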

All the same, the potential is enormous and we will continue to gather pilot data on the kinds of biological information available in these unstructured data sources, the willingness of people to structure it, and the technologies that will make it possible.

Birds blog, bees blog, even educated dogs blog, …

November 2nd, 2005, by Tim Finin, posted in Blogging, Humor, Semantic Web, Web
