UMBC ebiquity

Archive for the 'Google' Category

Do not be a Gl***hole, use Face-Block.me!

March 27th, 2014, by Prajit Kumar Das, posted in Ebiquity, Google, Mobile Computing, Policy, Semantic Web, Social, Wearable Computing

If you are a Google Glass user, you might have been greeted with concerned looks or raised eyebrows in public places. There has been a lot of chatter on the “interweb” about the loss of privacy that results from people taking your picture with Glass without notice. Google Glass has simplified photography, but, as happens with any revolutionary technology, people are worried about its potential misuse.

FaceBlock helps protect the privacy of the people around you by letting them specify whether or not they want to appear in your pictures. This new application, developed in a collaboration between researchers from the Ebiquity Research Group at the University of Maryland, Baltimore County and the Distributed Information Systems (DIS) group at the University of Zaragoza (Spain), selectively obscures the faces of people in pictures taken with Google Glass.

Comfort at the cost of Privacy?

As the saying goes, “The best camera is the one that’s with you.” Google Glass fits this description: it is always available and can take a picture with a simple voice command (“Okay Glass, take a picture”), letting users capture spontaneous moments effortlessly. On the flip side, this raises significant privacy concerns, since pictures can be taken without one’s consent. A user who does not use the device responsibly risks being labelled a “Glasshole”. Quite recently, a Google Glass user was assaulted by bar patrons who objected to her wearing the device inside the bar. The list of establishments that have banned Google Glass from their premises grows by the day. The dos and don’ts for Glass users released by Google are a good first step, but they don’t solve the problem of privacy violations.

[Image: FaceBlock and Google Glass]

Privacy-Aware pictures to the rescue

FaceBlock takes regular pictures from your smartphone or Google Glass as input and converts them into privacy-aware pictures. The output is generated using a combination of face detection and face recognition algorithms. Using FaceBlock, a user can take a picture of herself and specify a policy or rule regarding pictures taken by others (in this case, ‘obscure my face in pictures from strangers’). The application automatically generates a face identifier for this picture; the identifier is a mathematical representation of the image. To learn more about how FaceBlock works, watch the following video.

Using Bluetooth, FaceBlock can automatically detect nearby Glass users and share this policy with them. After receiving the face identifier from a nearby user, the following post-processing steps happen on Glass, as shown in the images below.

[Images: eigenface identifier not matched, eigenface identifier matched, matched face blurred]
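For the curious, here is a minimal sketch in Python (with OpenCV) of what this kind of post-processing could look like. It is an illustration, not FaceBlock’s actual pipeline: the Bluetooth exchange is elided, `known_identifier` stands in for the identifier received from a nearby user, and the toy `face_identifier` (a normalized thumbnail) merely approximates the eigenface-style representation FaceBlock uses.

```python
# Sketch of FaceBlock-style post-processing, assuming OpenCV
# (opencv-python). Policy matching and Bluetooth exchange are elided;
# known_identifier stands in for the face identifier received from a
# nearby user who asked to be obscured.
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_identifier(gray_face, size=(100, 100)):
    """Toy 'mathematical representation' of a face: a normalized,
    flattened thumbnail (FaceBlock itself uses eigenface-style features)."""
    v = cv2.resize(gray_face, size).astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-9)

def blur_matching_faces(image, known_identifier, threshold=0.9):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
        ident = face_identifier(gray[y:y+h, x:x+w])
        # Cosine similarity against the shared identifier.
        if float(ident @ known_identifier) > threshold:
            image[y:y+h, x:x+w] = cv2.GaussianBlur(
                image[y:y+h, x:x+w], (51, 51), 0)
    return image
```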

What promises does it hold?

FaceBlock is a proof-of-concept implementation of a system that can create privacy-aware pictures using smart devices. Making privacy-aware pictures pervasive would be a step in the right direction towards balancing privacy needs with the comfort afforded by technology. We could then get the best out of wearable technology without being oblivious to the privacy of those around us.

FaceBlock is part of the efforts of Ebiquity and DIS in building systems for preserving user privacy on mobile devices. For more details, visit http://face-block.me.

Google MOOC: Making Sense of Data

February 26th, 2014, by Tim Finin, posted in Big data, Google

Google is offering a free, online MOOC-style course on ‘Making Sense of Data’ from March 18 to April 4, taught by Amit Deutsch (Google) and Joe Hellerstein (Berkeley).

Interestingly, it doesn’t require programming or database skills: “Basic familiarity with spreadsheets and comfort using a web browser is recommended. Knowledge of statistics and experience with programming are not required.” The course will use Google’s Fusion Tables service for managing and visualizing data.

Freebase knowledge maps

January 1st, 2014, by Tim Finin, posted in Google, Semantic Web

[Screenshot: Freebase knowledge map demo]

Google has a very nice demonstration of a web application that extracts information from Freebase and displays it on a Google map. It uses the Google Maps JavaScript API and the Freebase knowledge base to find entities and facts associated with places on a map. The source code is available on GitHub and consists of just a small amount of JavaScript and a few CSS (via Less) files.

“The app uses browser’s geolocation feature to find user’s location and displays a map of interesting objects that can be found nearby (within 50 000 ft). It uses the Freebase Search API to find relevant objects. When user clicks on one of the markers, the app calls the Freebase Topic API to fetch more information about that object. Once the information is retrieved, it populates a purejs template to display a knowledge card for the user.”

This sort of application has been done many times before with RDF, and the Google approach can be adapted to query an arbitrary RDF resource for custom knowledge bases.
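As a sketch of that adaptation, the Python snippet below asks DBpedia (instead of Freebase) for entities near a point. It assumes the public DBpedia SPARQL endpoint and Virtuoso’s bif:st_intersects geo function; nothing here comes from Google’s demo code.

```python
# Rough sketch, not Google's demo: find entities near a point by querying
# DBpedia's public SPARQL endpoint instead of Freebase. The geo: predicates
# and bif:st_intersects are Virtuoso/DBpedia-specific assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

def nearby_entities(lat, lon, radius_km=15, limit=25):
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT DISTINCT ?place ?name WHERE {{
          ?place geo:lat ?lat ; geo:long ?long ; rdfs:label ?name .
          FILTER (lang(?name) = "en")
          FILTER (bif:st_intersects(bif:st_point(?long, ?lat),
                                    bif:st_point({lon}, {lat}), {radius_km}))
        }} LIMIT {limit}
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [(r["place"]["value"], r["name"]["value"])
            for r in results["results"]["bindings"]]

# e.g., entities within ~15 km (about 50,000 ft) of downtown Baltimore:
# nearby_entities(39.29, -76.61)
```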

Google knowledge data releases

December 4th, 2013, by Tim Finin, posted in Google, Machine Learning, NLP

A post on Google’s research blog lists the major datasets for NLP and knowledge base processing that Google has released in the past year. They include datasets to help with entity linking, relation extraction, concept spotting and syntactic analysis. Subscribe to the Knowledge Data Releases mailing list for updates.

Google Top Charts uses the Knowledge Graph for entity recognition and disambiguation

May 23rd, 2013, by Tim Finin, posted in AI, Google, KR, NLP, OWL, Semantic Web

Top Charts is a new feature of Google Trends that identifies the most popular searches within a category, e.g., books or actors. What’s interesting about it, from a technology standpoint, is that it uses Google’s Knowledge Graph to provide a universe of things and the categories into which they fall. This is a great example of “things, not strings”, Google’s clever slogan for explaining the importance of the Knowledge Graph.

Here’s how it’s explained in the Trends Top Charts FAQ.

“Top Charts relies on technology from the Knowledge Graph to identify when search queries seem to be about particular real-world people, places and things. The Knowledge Graph enables our technology to connect searches with real-world entities and their attributes. For example, if you search for ice ice baby, you’re probably searching for information about the musician Vanilla Ice or his music. Whereas if you search for vanilla ice cream recipe, you’re probably looking for information about the tasty dessert. Top Charts builds on work we’ve done so our systems do a better job finding you the information you’re actually looking for, whether tasty desserts or musicians.”

One thing to note about the Knowledge Graph, which is said to have more than 18 billion facts about 570 million objects, is that its objects include more than the traditional named entities (e.g., people, places, things). For example, there is a top chart for Animals showing that dogs are the most popular animal in Google searches, followed by cats (no surprises here), with chickens at number three on the list (could their high rank be due to recipe searches?). The dog object, in most knowledge representation schemes, would be modeled as a concept or class rather than an object or instance. In some representation systems, the same term (e.g., dog) can be used to refer both to a class of instances (the class that includes Lassie) and to an instance (e.g., an instance of the class of animal types); which sense of the term is meant (class vs. instance) is determined by context. In the semantic web representation language OWL 2, the ability to use the same term to refer to a class or a related instance is called punning.
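To make punning concrete, here is a small illustration in Python using rdflib, with invented example IRIs: the same term ex:Dog is typed both as a class (with Lassie as an instance) and as an individual (an instance of a class of animal types).

```python
# A small, self-contained illustration of OWL 2 punning with rdflib;
# the ex: IRIs are invented for the example.
from rdflib import Graph, Namespace, RDF, OWL

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# ex:Dog used as a class: Lassie is one of its instances.
g.add((EX.Dog, RDF.type, OWL.Class))
g.add((EX.Lassie, RDF.type, EX.Dog))

# ex:Dog punned as an individual: an instance of the class of animal types.
g.add((EX.AnimalType, RDF.type, OWL.Class))
g.add((EX.Dog, RDF.type, EX.AnimalType))

print(g.serialize(format="turtle"))
```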

Of course, when doing this kind of mapping of terms to objects, we only want to consider concepts that commonly have words or short phrases used to denote them. Not all concepts do, such as animals that from a long way off look like flies.

A second observation is that once you have a nice knowledge base like the Knowledge Graph, you face a new problem: recognizing mentions of its instances in text. In the DBpedia knowledge base (derived from Wikipedia) there are nine individuals named Michael Jordan, two of whom were professional basketball players in the NBA. So, given a search query like “When did Michael Jordan play for Penn”, we have to use information in the query, its context and what we know about the possible referents (e.g., those nine Michael Jordans) to decide (1) whether this is likely to be a reference to any of the objects in our knowledge base, and (2) if so, to which one. This task, a fundamental one in language processing, is not trivial; luckily, in applications like Top Charts, we don’t have to do it with perfect accuracy.
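As a toy illustration of those two decisions, the sketch below (with entirely made-up candidates, priors and context words; not Google’s algorithm) scores each candidate by combining a popularity prior with context-word overlap, and declines to link at all when no score clears a threshold.

```python
# Toy entity linker with made-up candidates, priors and context words.
# It scores each referent by a popularity prior plus context overlap,
# and returns None when no candidate clears the threshold (decision 1).
CANDIDATES = {
    "Michael Jordan": [
        {"id": "MJ_nba_star", "prior": 0.85,
         "context": {"basketball", "bulls", "nba", "chicago"}},
        {"id": "MJ_berkeley_prof", "prior": 0.10,
         "context": {"machine", "learning", "berkeley", "professor"}},
        {"id": "MJ_penn_player", "prior": 0.05,
         "context": {"penn", "quakers", "basketball", "ivy"}},
    ],
}

def link(mention, query_words, threshold=0.15):
    best_id, best_score = None, 0.0
    for cand in CANDIDATES.get(mention, []):
        overlap = len(query_words & cand["context"]) / len(cand["context"])
        score = 0.2 * cand["prior"] + 0.8 * overlap  # blend prior + context
        if score > best_score:
            best_id, best_score = cand["id"], score
    return best_id if best_score >= threshold else None  # decision 2

print(link("Michael Jordan", {"when", "did", "play", "for", "penn"}))
# -> MJ_penn_player: context outweighs the NBA star's popularity prior.
```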

Google’s Top Charts is a simple, but effective, example that demonstrates the potential usefulness of semantic technology to make our information systems better in the near future.

Microsoft Bing updates its Satori knowledge base

March 22nd, 2013, by Tim Finin, posted in Google, Microsoft, Search, Semantic Web

A post on Microsoft’s Bing blog, Understand Your World with Bing, announced that an update to their Satori knowledge base allows Bing to do a better job of identifying queries that mention a known entity, i.e., a person, place or organization. Bing’s use of Satori parallels the efforts of Google and Facebook in developing graph-based knowledge bases to move from “strings” to “things”.

Microsoft is using data from Satori to provide “snapshots” with data about an entity when it detects a likely mention of it in a query. This is very similar to how Google is using its Knowledge Graph KB.

One interesting thing that Satori is now doing is importing data from LinkedIn — data that neither Google’s Knowledge Graph nor Facebook’s Open Graph has. Another difference is that Satori uses RDF as its native model, or at least appears to, based on this description from 2012.

See recent posts in TechCrunch and Search Engine Land for more information.

Google Reader, we hardly knew ye

March 13th, 2013, by Tim Finin, posted in Google, Social media

We felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were about to be silenced. We fear something terrible is about to happen. We went to access the blogs we follow on Google Reader and found this.

Powering Down Google Reader
3/13/2013 04:06:00 PM

Posted by Alan Green, Software Engineer

We have just announced on the Official Google Blog that we will soon retire Google Reader (the actual date is July 1, 2013). We know Reader has a devoted following who will be very sad to see it go. We’re sad too.

There are two simple reasons for this: usage of Google Reader has declined, and as a company we’re pouring all of our energy into fewer products. We think that kind of focus will make for a better user experience.

To ensure a smooth transition, we’re providing a three-month sunset period so you have sufficient time to find an alternative feed-reading solution. If you want to retain your Reader data, including subscriptions, you can do so through Google Takeout.

Thank you again for using Reader as your RSS platform.

Where is old Bloglines now that we need him again? We should not have been so disloyal.

Google Wikilinks corpus

March 8th, 2013, by Tim Finin, posted in Google, Machine Learning, NLP, Semantic Web

Google released the Wikilinks Corpus, a collection of 40M disambiguated mentions from 10M web pages to 3M Wikipedia pages. This data can be used to train systems that do entity linking and cross-document co-reference, problems that Google researchers attacked with an earlier version of this data (see Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models).

You can download the data as ten 175MB files, along with some additional tools, from UMass.
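A hedged sketch of working with the download: the exact record layout is specified in the dataset’s README, but mention records are commonly described as tab-separated lines giving the anchor text, its byte offset and the target Wikipedia URL, which the snippet below assumes.

```python
# Sketch of scanning a Wikilinks file, assuming tab-separated mention
# records of the form: MENTION \t anchor text \t offset \t Wikipedia URL.
# Check the dataset's README for the authoritative layout.
from collections import Counter

def mention_counts(path):
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if parts[0] == "MENTION" and len(parts) >= 4:
                anchor, target = parts[1], parts[3]
                counts[(anchor, target)] += 1
    return counts

# counts = mention_counts("wikilinks-part-01")  # hypothetical file name
```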

This is yet another example of the important role that Wikipedia continues to play in building a common, machine-usable semantic substrate for human conceptualizations.

Google Sets Live

March 7th, 2013, by Tim Finin, posted in AI, Google, Machine Learning, Semantic Web

Google Sets was the result of an early Google research project that ended in 2011. The idea was to recognize the similarity of a set of terms (e.g., python, lisp and fortran) and automatically identify other similar terms (e.g., c, java, php). Surprisingly (to me), the results of the project live on as an undocumented feature in Google Docs spreadsheets. Try putting a few of the seven deadly sins into a Google spreadsheet and use the feature to see what else you should not do (e.g., creating fire with alchemy, I guess).

Google, of course, continues to work on expanding its use of semantic information, currently through efforts like the Google Knowledge Graph, Freebase, Microdata and Fusion Tables. Other companies, including Microsoft, IBM and a host of startups, are also hard at work on similar projects.

Entity Disambiguation in Google Auto-complete

September 23rd, 2012, by Varish Mulwad, posted in AI, Google, KR, Ontologies, Semantic Web

Google has added an “entity disambiguation” feature to the auto-complete suggestions shown as you type a search query. For example, when I search for George Bush, I get the following additional information in auto-complete.

As you can see, Google is able to identify that there are two George Bushes, the 41st and the 43rd Presidents, and accordingly suggests that the user select the appropriate one. Similarly, if you search for Johns Hopkins, you get suggestions for Johns Hopkins the University, the Entrepreneur and the Hospital. In the case of the Hopkins query, it’s the same entity name but with different types, so Google appends the entity type to the entity name.

However, searching for Michael Jordan produces no entity disambiguation. If you are looking for Michael Jordan the UC Berkeley professor, you will have to search for “Michael I Jordan”. Other examples Google is not yet handling include queries such as apple {fruit, company} and jaguar {animal, car}. It seems that Google only includes disambiguation between popular entities in its auto-complete: while there are six different George Bushes and ten different Michael Jordans on Wikipedia, Google includes only two and none, respectively, when it disambiguates George Bush and Michael Jordan.

Google has talked about using its Knowledge Graph to produce this information. One can envision the Knowledge Graph maintaining a unique identity for each entity in its collection, which allows it to disambiguate entities with similar names (in the Semantic Web world, we call this assigning a unique URI to each unique thing or entity). From the Hopkins query, we can also see that the Knowledge Graph maintains entity type information along with each entity (e.g., Person, City, University, Sports Team, etc.). While folks at Google have tried to steer clear of the Semantic Web, one can draw parallels between the underlying principles of the Semantic Web and those used in constructing the Google Knowledge Graph.
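Here is a toy illustration of that idea, with invented URIs and a plain Python dictionary standing in for the Knowledge Graph: because each entity has its own identifier and a type, entities that share a name can still be told apart and surfaced as typed suggestions.

```python
# Invented URIs and a plain dict standing in for the Knowledge Graph:
# unique identifiers plus type info disambiguate entities sharing a name.
KG = {
    "http://kg.example/JohnsHopkins_University": {
        "name": "Johns Hopkins", "type": "University"},
    "http://kg.example/JohnsHopkins_Entrepreneur": {
        "name": "Johns Hopkins", "type": "Entrepreneur"},
    "http://kg.example/JohnsHopkins_Hospital": {
        "name": "Johns Hopkins", "type": "Hospital"},
}

def suggestions(prefix):
    """Typed auto-complete: one suggestion per URI, not per name string."""
    return [f"{e['name']} ({e['type']})"
            for e in KG.values()
            if e["name"].lower().startswith(prefix.lower())]

print(suggestions("johns ho"))
# -> ['Johns Hopkins (University)', 'Johns Hopkins (Entrepreneur)',
#     'Johns Hopkins (Hospital)']
```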

Google releases dataset linking strings and concepts

May 19th, 2012, by Tim Finin, posted in AI, Google, KR, NLP, Ontologies, Semantic Web, Wikipedia

Yesterday Google announced a very interesting resource with 175M short, unique text strings that were used to refer to one of 7.6M Wikipedia articles. This should be very useful for research on information extraction from text.

“We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia’s groupings of articles into hierarchical categories.

The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article’s canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept’s url. Our database thus includes weights that measure degrees of association.”

The details of the data and how it was constructed are in an LREC 2012 paper by Valentin Spitkovsky and Angel Chang, A Cross-Lingual Dictionary for English Wikipedia Concepts. Get the data here.
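A small sketch of how one might use the dictionary, assuming a simplified file layout of one tab-separated (text, url, count) triple per line (the released files document their own exact format): normalizing the counts for a string yields a crude P(concept | string).

```python
# Assumes a simplified layout of one "text <tab> url <tab> count" triple
# per line; consult the release notes for the files' actual format.
from collections import defaultdict

def load_dictionary(path):
    counts = defaultdict(lambda: defaultdict(int))
    with open(path, encoding="utf-8") as f:
        for line in f:
            text, url, count = line.rstrip("\n").split("\t")
            counts[text][url] += int(count)
    return counts

def concept_probabilities(counts, text):
    """Turn raw association counts into a crude P(concept | string)."""
    links = counts.get(text, {})
    total = sum(links.values())
    return {url: n / total for url, n in links.items()} if total else {}
```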

Google Knowledge Graph: first impressions

May 19th, 2012, by Tim Finin, posted in AI, Google, KR, NLP, Ontologies, Semantic Web

Google’s Knowledge Graph showed up for me this morning — it’s been slowly rolling out since the announcement on Wednesday. It builds on lots of research from human language technology (e.g., entity recognition and linking) and the semantic web (graphs of linked data). The slogan, “things not strings”, is brilliant and easily understood.

My first impression is that it’s fast, useful and a great accomplishment but leaves lots of room for improvement and expansion. That last bit is a good thing, at least for those of us in the R&D community. Here are some comments based on some initial experimentation.

GKG only works on searches that are simple entity mentions, like people, places and organizations. It doesn’t do products (Toyota Camry), events (World War II), or diseases (diabetes), but it does recognize that ‘Mercury’ could be a planet or an element.

It’s a bit aggressive about linking: when searching for “John Smith” it zeros in on the 17th-century English explorer. Poor Professor Michael Jordan never gets a chance, and providing context by adding Berkeley just suppresses the GKG sidebar. “Mitt” goes right to you know who. “George Bush” does lead to a disambiguation sidebar, though. Given that GKG doesn’t seem to allow for context information, the only disambiguating evidence it has is popularity (i.e., PageRank).

Speaking of context, the GKG results seem not to draw on user-specific information, like my location or past search history. When I search for “Columbia” from my location here in Maryland, it suggests “Columbia University” and “Columbia, South Carolina” and not “Columbia, Maryland” which is just five miles away from me.

Places include not just GPEs (geo-political entities) but also locations (Mars, the Patapsco River) and facilities (MoMA, the Empire State Building). To the GKG, the White House is just a place.

Organizations seem like a weak spot. It recognizes schools (UCLA), but company mentions seem not to be handled directly, not even “Google”. A search for “NBA” suggests three “people associated with NBA”, and “National Basketball Association” is not recognized. Forget finding out about the Cult of the Dead Cow.

Mike Bergman has some insights based on his exploration of the GKG in Deconstructing the Google Knowledge Graph.

The use of structured and semi-structured knowledge in search is an exciting area. I expect we will see much more of this showing up in search engines, including Bing.
