Initial impressions: Android M permissions

May 29th, 2015

Google I/O 2015 was a very important day for privacy researchers. For the first time, Google acknowledged a need for better privacy control. Researchers and developers who have worked with Android for some time probably know that there was a feature called AppOps. This feature was introduced in Android 4.3 and later removed in 4.4.2. The reasons stated for its inclusion and removal have been discussed extensively. However, the only conclusion we could clearly draw from all the discussion was that there was a demand for such a feature. Our friends over at Apple have repeatedly mentioned how Apple has always cared more about user privacy than Google. As a result, it was only a matter of time before such a feature returned, and it is a pleasant development for Android enthusiasts to see it in Android.

We installed the new Android M OS on a Nexus 5. The first thing we wanted to see was the permissions feature. Listed below are our impressions of this new feature from a privacy researcher’s perspective.

The feature is not easy to find
We had to weed through the settings of our phone and were not able to find it straightaway. There was no menu item for Privacy. How do you access it then? You have to open the phone’s Settings, tap “Apps” and select the app whose permission access you wish to control. Then tap “Permissions” for that app. At this point you get the menu that allows you to toggle the permissions.

The Permission control is essentially useless till your Apps upgrade
Now, Google stated yesterday that the behavior of apps which do not upgrade to the new API version will remain the same as before. Therefore, even with this feature present you cannot actually stop an app from accessing the restricted data. What you do see is a warning dialog stating the obvious.

Warning message for apps using pre Android M SDK

Not all permissions show up in the list
The granularity of permissions that will be available in this new feature is still uncertain. If you check the Facebook permission list in the Google Play Store, you will see that it requests a lot of permissions.

Permissions description

But when you check out the permission control menu, you will see just a few of these permissions here.

App permissions list

We can assume that Google is grouping the permissions into logical groups. However, that means that the primary issue that a lot of researchers have raised about granular access control is still not being addressed by Google. We have been doing research with fine-grained permission control for some time now. In our work, we have created a system that is capable of controlling the access to data on a mobile device based on the context of the user. Such an intelligent system would not only know what data to give access to but also when to do so. That goal still remains to be completely realized.
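To make the grouping assumption concrete, here is a minimal Python sketch of how a long list of requested permissions could collapse into a few toggles. The mapping table and the group names are our own illustration, not something taken from the Android M preview, and the function name `visible_toggles` is hypothetical.

```python
# Illustrative (assumed) mapping from individual permissions to user-facing
# groups. This is NOT a complete or official table.
PERMISSION_GROUPS = {
    "android.permission.READ_CONTACTS": "Contacts",
    "android.permission.WRITE_CONTACTS": "Contacts",
    "android.permission.ACCESS_FINE_LOCATION": "Location",
    "android.permission.ACCESS_COARSE_LOCATION": "Location",
    "android.permission.CAMERA": "Camera",
    "android.permission.RECORD_AUDIO": "Microphone",
    "android.permission.READ_SMS": "SMS",
    "android.permission.SEND_SMS": "SMS",
}


def visible_toggles(requested):
    """Collapse requested permissions into the groups a user could toggle."""
    groups = {}
    for perm in requested:
        group = PERMISSION_GROUPS.get(perm)
        if group is None:
            # Permissions outside the table get no toggle in this sketch.
            continue
        groups.setdefault(group, []).append(perm)
    return groups


requested = [
    "android.permission.READ_CONTACTS",
    "android.permission.WRITE_CONTACTS",
    "android.permission.CAMERA",
    "android.permission.INTERNET",
]
for group, perms in visible_toggles(requested).items():
    print(group, "<-", perms)
```

In this sketch four requested permissions turn into two toggles, and a user who wants to allow reading contacts but not writing them has no way to say so. That is the loss of granularity we are pointing at.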

Obviously, we must not forget that Something is always better than nothing! Google is taking steps to improve the means by which it protects a user’s privacy and provides security. It is an iterative process and it’s still far from the goal. It is getting closer to that goal though.

Do not be a Gl***hole, use!

March 27th, 2014

If you are a Google Glass user, you might have been greeted with concerned looks or raised eyebrows in public places. There has been a lot of chatter on the “interweb” regarding the loss of privacy that results from people taking your picture with Glass without notice. Google Glass has simplified photography but, as often happens with revolutionary technology, people are worried about the potential misuse.

FaceBlock helps to protect the privacy of people around you by allowing them to specify whether or not to be included in your pictures. This new application, developed jointly by researchers from the Ebiquity Research Group at University of Maryland, Baltimore County and Distributed Information Systems (DIS) at University of Zaragoza (Spain), selectively obscures the faces of people in pictures taken by Google Glass.

Comfort at the cost of Privacy?

As the saying goes, “The best camera is the one that’s with you”. Google Glass suits this description as it is always available and can take a picture with a simple voice command (“Okay Glass, take a picture”). This allows users to capture spontaneous life moments effortlessly. On the flip side, this raises significant privacy concerns as pictures can be taken without one’s consent. If one does not use this device responsibly, one risks being labelled a “Glasshole”. Quite recently, a Google Glass user was assaulted by patrons who objected to her wearing the device inside a bar. The list of establishments that have banned Google Glass within their premises is growing day by day. The dos and don’ts for Glass users released by Google are a good first step, but they don’t solve the problem of privacy violation.


Privacy-Aware pictures to the rescue

FaceBlock takes regular pictures taken by your smartphone or Google Glass as input and converts them into privacy-aware pictures. This output is generated by using a combination of Face Detection and Face Recognition algorithms. By using FaceBlock, a user can take a picture of herself and specify her policy/rule regarding pictures taken by others (in this case ‘obscure my face in pictures from strangers’). The application would automatically generate a face identifier for this picture. The identifier is a mathematical representation of the image. To learn more about how FaceBlock works, you should watch the following video.

Using Bluetooth, FaceBlock can automatically detect and share this policy with nearby Glass users. After Glass receives this face identifier from a nearby user, the following post-processing steps happen on the device, as shown in the images.
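The matching step can be pictured with a small Python sketch. It is only an illustration under stated assumptions: we pretend a face identifier is a short list of numbers, that two identifiers are compared by Euclidean distance, and that a fixed threshold decides a match. None of these choices (nor the names `distance`, `faces_to_obscure` and `threshold`) come from FaceBlock itself.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length face identifiers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def faces_to_obscure(detected_faces, received_policies, threshold=0.5):
    """Return indices of detected faces whose owner asked to be obscured.

    detected_faces:    identifiers computed for faces found in the picture.
    received_policies: (identifier, obscure) pairs received from nearby users.
    threshold:         assumed cut-off below which two identifiers match.
    """
    to_obscure = []
    for i, face in enumerate(detected_faces):
        for identifier, obscure in received_policies:
            if obscure and distance(face, identifier) <= threshold:
                to_obscure.append(i)
                break  # one matching policy is enough for this face
    return to_obscure


# Toy identifiers, made up for illustration.
faces_in_picture = [[0.10, 0.20], [0.90, 0.80]]
policies = [([0.12, 0.19], True)]
print(faces_to_obscure(faces_in_picture, policies))
```

Only the faces returned by such a check would then be blurred; everyone else in the picture is left untouched.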


What promises does it hold?

FaceBlock is a proof-of-concept implementation of a system that can create privacy-aware pictures using smart devices. The pervasiveness of privacy-aware pictures could be a step in the right direction towards balancing privacy needs and the comfort afforded by technology. Thus, we can get the best out of Wearable Technology without being oblivious to the privacy of those around us.

FaceBlock is part of the efforts of Ebiquity and SID in building systems for preserving user privacy on mobile devices. For more details, visit

Google MOOC: Making Sense of Data

February 26th, 2014

Google is offering a free, online MOOC style course on ‘Making Sense of Data‘ from March 18 to April 4 taught by Amit Deutsch (Google) and Joe Hellerstein (Berkeley).

Interestingly, it doesn’t require programming or database skills: “Basic familiarity with spreadsheets and comfort using a web browser is recommended. Knowledge of statistics and experience with programming are not required.” The course will use Google’s Fusion Tables service for managing and visualizing data.

Freebase knowledge maps

January 1st, 2014

freebase knowledge map demo

Google has a very nice demonstration of a web application that extracts information from Freebase and displays it on a Google map. It uses the Google Maps JavaScript API and the Freebase knowledge base to find entities and facts associated with places on a map. The source code is available on GitHub and is just a small amount of JavaScript and some CSS (via less) files.

“The app uses browser’s geolocation feature to find user’s location and displays a map of interesting objects that can be found nearby (within 50 000 ft). It uses the Freebase Search API to find relevant objects. When user clicks on one of the markers, the app calls the Freebase Topic API to fetch more information about that object. Once the information is retrieved, it populates a purejs template to display a knowledge card for the user.”

This sort of application has been done many times before with RDF and the Google approach can be adapted to query an arbitrary RDF resource for custom knowledge bases.
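The “within 50 000 ft” test in the description above is handled by the Freebase Search API in the demo. If you adapt the idea to your own knowledge base, you may have to do the distance check yourself. Here is a minimal Python sketch using the haversine formula; the function names and the toy coordinates are ours and purely illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
FEET_TO_M = 0.3048


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(user, places, radius_ft=50_000):
    """Names of places within radius_ft feet of the user's (lat, lon)."""
    limit_m = radius_ft * FEET_TO_M
    return [name for name, lat, lon in places
            if haversine_m(user[0], user[1], lat, lon) <= limit_m]


# Toy data: two made-up points on the equator.
places = [("near", 0.0, 0.1), ("far", 0.0, 1.0)]
print(nearby((0.0, 0.0), places))
```

With an RDF store, the same filter could run over the latitude and longitude values returned by a query before the markers are drawn.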

Google knowledge data releases

December 4th, 2013

A post on Google’s research blog lists the major datasets for NLP and KB processing that Google has released in the past year. They include datasets to help in entity linking, relation extraction, concept spotting and syntactic analysis. Subscribe to the Knowledge Data Releases mailing list for updates.

Google Top Charts uses the Knowledge Graph for entity recognition and disambiguation

May 23rd, 2013

Top Charts is a new feature for Google Trends that identifies the popular searches within a category, e.g., books or actors. What’s interesting about it, from a technology standpoint, is that it uses Google’s Knowledge Graph to provide a universe of things and the categories into which they belong. This is a great example of “Things, not strings”, Google’s clever slogan to explain the importance of the Knowledge Graph.

Here’s how it’s explained in the Trends Top Charts FAQ.

“Top Charts relies on technology from the Knowledge Graph to identify when search queries seem to be about particular real-world people, places and things. The Knowledge Graph enables our technology to connect searches with real-world entities and their attributes. For example, if you search for ice ice baby, you’re probably searching for information about the musician Vanilla Ice or his music. Whereas if you search for vanilla ice cream recipe, you’re probably looking for information about the tasty dessert. Top Charts builds on work we’ve done so our systems do a better job finding you the information you’re actually looking for, whether tasty desserts or musicians.”

One thing to note about the Knowledge Graph, which is said to have more than 18 billion facts about 570 million objects, is that its objects include more than the traditional named entities (e.g., people, places, things). For example, there is a top chart for Animals that shows that dogs are the most popular animal in Google searches followed by cats (no surprises here) with chickens at number three on the list (could their high rank be due to recipe searches?). The dog object, in most knowledge representation schemes, would be modeled as a concept or class as opposed to an object or instance. In some representation systems, the same term (e.g., dog) can be used to refer to both a class of instances (a class that includes Lassie) and also to an instance (e.g., an instance of the class animal types). Which sense of the term dog is meant (class vs. instance) is determined by the context. In the semantic web representation language OWL 2, the ability to use the same term to refer to a class or a related instance is called punning.

Of course, when doing this kind of mapping of terms to objects, we only want to consider concepts that commonly have words or short phrases used to denote them. Not all concepts do, such as animals that from a long way off look like flies.

A second observation is that once you have a nice knowledge base like the Knowledge Graph, you have a new problem: how can you recognize mentions of its instances in text? In the DBpedia knowledge base (derived from Wikipedia) there are nine individuals named Michael Jordan and two of them were professional basketball players in the NBA. So, when you enter a search query like “When did Michael Jordan play for Penn”, we have to use information in the query, its context and what we know about the possible referents (e.g., those nine Michael Jordans) to decide (1) if this is likely to be a reference to any of the objects in our knowledge base, and (2) if so, to which one. This task, which is a fundamental one in language processing, is not trivial, but luckily, in applications like Top Charts, we don’t have to do it with perfect accuracy.

Google’s Top Charts is a simple, but effective, example that demonstrates the potential usefulness of semantic technology to make our information systems better in the near future.

Microsoft Bing updates its Satori knowledge base

March 22nd, 2013

A post on Microsoft’s Bing blog, Understand Your World with Bing, announced that an update to their Satori knowledge base allows Bing to do a better job of identifying queries that mention a known entity, i.e., a person, place or organization. Bing’s use of Satori parallels the efforts of Google and Facebook in developing graph-based knowledge bases to move from “strings” to “things”.

Microsoft is using data from Satori to provide “snapshots” with data about an entity when it detects a likely mention of it in a query. This is very similar to how Google is using its Knowledge Graph KB.

One interesting thing that Satori is now doing is importing data from LinkedIn — data that neither Google’s Knowledge Graph nor Facebook’s Open Graph has. Another difference is that Satori uses RDF as its native model, or at least appears to, based on this description from 2012.

See recent posts in Techcrunch and Search Engine Land for more information.

Google Reader, we hardly knew ye

March 13th, 2013

We felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were about to be silenced. We fear something terrible is about to happen. We went to access the blogs we follow on Google Reader and found this.

Powering Down Google Reader
3/13/2013 04:06:00 PM

Posted by Alan Green, Software Engineer

We have just announced on the Official Google Blog that we will soon retire Google Reader (the actual date is July 1, 2013). We know Reader has a devoted following who will be very sad to see it go. We’re sad too.

There are two simple reasons for this: usage of Google Reader has declined, and as a company we’re pouring all of our energy into fewer products. We think that kind of focus will make for a better user experience.

To ensure a smooth transition, we’re providing a three-month sunset period so you have sufficient time to find an alternative feed-reading solution. If you want to retain your Reader data, including subscriptions, you can do so through Google Takeout.

Thank you again for using Reader as your RSS platform.
Labels: reader, sunset

Where is old Bloglines now that we need him again? We should not have been so disloyal.

Google Wikilinks corpus

March 8th, 2013

Google released the Wikilinks Corpus, a collection of 40M disambiguated mentions from 10M web pages to 3M Wikipedia pages. This data can be used to train systems that do entity linking and cross-document co-reference, problems that Google researchers attacked with an earlier version of this data (see Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models).

You can download the data as ten 175MB files, and some additional tools are available from UMass.

This is yet another example of the important role that Wikipedia continues to play in building a common, machine-usable semantic substrate for human conceptualizations.

Google Sets Live

March 7th, 2013

Google Sets was the result of an early Google research project that ended in 2011. The idea was to be able to recognize the similarity of a set of terms (e.g., python, lisp and fortran) and automatically identify other similar terms (e.g., c, java, php). Surprisingly (to me), the results of the project live on as an undocumented feature in Google Docs spreadsheets. Try putting a few of the seven deadly sins into a Google spreadsheet and use the feature to see what else you should not do (e.g., creating fire with alchemy, I guess).

Google, of course, continues to work on expanding their use of semantic information, currently through efforts like the Google Knowledge Graph, Freebase, Microdata and Fusion Tables. Other companies, including Microsoft, IBM and a host of startups, are also hard at work on similar projects.

Entity Disambiguation in Google Auto-complete

September 23rd, 2012

Google has added an “entity disambiguation” feature along with auto-complete when you type in your search query. For example, when I search for George Bush, I get the following additional information in auto-complete.

As you can see, Google is able to identify that there are two George Bushes, the 41st and the 43rd President, and accordingly suggests that the user select the appropriate president. Similarly, if you search for Johns Hopkins, you get suggestions for Johns Hopkins: the University, the Entrepreneur and the Hospital. In the case of the Hopkins query, it’s the same entity name but with different types, and thus Google appends different entity types along with the entity name.

However, searching for Michael Jordan produces no entity disambiguation. If you are looking for Michael Jordan, the UC Berkeley professor, you will have to search for “Michael I Jordan”. Other examples that Google is not handling right now include queries such as apple {fruit, company} and jaguar {animal, car}. It seems to me that Google is only including disambiguation between popular entities in its auto-complete. While there are six different George Bushes and ten different Michael Jordans on Wikipedia, Google includes only two and none, respectively, when it disambiguates George Bush and Michael Jordan.

Google talked about using its knowledge graph to produce this information. One can envision the knowledge graph maintaining a unique identity for each entity in its collection, which will allow it to disambiguate entities with similar names (in the Semantic Web world, we call this assigning a unique URI to each unique thing or entity). With the Hopkins query, we can also see that the knowledge graph is maintaining entity type information along with each entity (e.g., Person, City, University, Sports Team, etc.). While folks at Google have tried to steer clear of the Semantic Web, one can draw parallels between the underlying principles of the Semantic Web and the ones used in constructing the Google knowledge graph.

Google releases dataset linking strings and concepts

May 19th, 2012

Yesterday Google announced a very interesting resource with 175M short, unique text strings that were used to refer to one of 7.6M Wikipedia articles. This should be very useful for research on information extraction from text.

“We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia’s groupings of articles into hierarchical categories.

The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article’s canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept’s url. Our database thus includes weights that measure degrees of association.”

The details of the data and how it was constructed are in an LREC 2012 paper by Valentin Spitkovsky and Angel Chang, A Cross-Lingual Dictionary for English Wikipedia Concepts. Get the data here.
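To see how the (text, url, count) triples could be put to work, here is a minimal Python sketch that turns counts into a rough estimate of P(url | text) and picks the most likely concept for a string. The triples below are made-up toy values, not taken from the dataset, and the function names are ours.

```python
from collections import defaultdict


def build_dictionary(triples):
    """Turn (text, url, count) triples into per-string probability tables."""
    counts = defaultdict(dict)
    for text, url, count in triples:
        counts[text][url] = counts[text].get(url, 0) + count
    probs = {}
    for text, by_url in counts.items():
        total = sum(by_url.values())
        probs[text] = {url: c / total for url, c in by_url.items()}
    return probs


def most_likely(probs, text):
    """Most frequently linked concept for a string, or None if unseen."""
    candidates = probs.get(text)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy triples with invented counts, for illustration only.
toy_triples = [
    ("jaguar", "Jaguar", 70),
    ("jaguar", "Jaguar_Cars", 30),
    ("big cat", "Jaguar", 5),
]
probs = build_dictionary(toy_triples)
print(probs["jaguar"])
print(most_likely(probs, "jaguar"))
```

A prior like this ignores context, so it would be only a first step; an information extraction system would combine it with evidence from the surrounding text.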