Watching third party watchers with Spy Watch

December 27th, 2013

svs

At the ISWC Privon workshop in October, Neel Guha talked about his Spy Watch Google Chrome extension that keeps track of the third parties tracking the web pages you visit. Unlike Ghostery, it only collects information and cannot block tracking sites, but it logs more information about how your Web behavior is being observed and gives good insight into the nature and scope of the Web tracking phenomenon.


When you view a page like www.nytimes.com you expect it to know that you visited the site. It may even know personal information (e.g., name, address, age, sex) if you ever divulged it to the site, perhaps when setting up an account. Spy Watch reports that my recent visit to the NYT site was also observed by 24 other sites, including doubleclick.com, brightcove.com, googleapis.com and sothebysrealty.com. And this is with an ad blocker enabled; 28 third parties observed me when I disabled it.

Each of these third parties also knows the page on the NYT site I just visited. But I don’t have an account on most of them, so they don’t know who I really am, right? Well, some can easily discover my identity. Doubleclick, for example, knows I just read that Times article on how to cook a duck and, since it’s part of Google, can potentially integrate the information with all of the other information Google has about me.

Not all of the third party sites identified by Spy Watch are tracking us. Sothebysrealty.com, for example, showed up on my visit to the Times because it provided some content (an image) on the page. Checking my Spy Watch data shows that Sothebysrealty has seen me on just two pages (both on the NYT site) whereas Doubleclick has seen me on 1266 pages across 260 sites. Clearly Sothebysrealty is not a tracking company and Doubleclick is. Such third party tracking is done via an array of techniques that include cookies, free analytic services, web bugs, single-pixel images, javascript tags and web beacons.

I’ve been running Spy Watch for about two months and it reports that 1533 third party sites have (potentially) collected data about the 12,000 distinct URLs I’ve visited during this time. It also notes that, on average, every page I’ve visited has been watched by 3.7 third parties. As you might expect, the distribution follows a power law with a long tail of sites that only observed a few of my visits (about 2/3 of them saw three or fewer). Here are the top twenty third party trackers in my two months of data.

[Figure: top twenty third party trackers in two months of Spy Watch data]

Note that Google (red), Facebook (dark blue) and Twitter (green) are the three companies who potentially know the most about what you do on the Web.

Spy Watch can also show how many and which pages have been observed by a tracker. Facebook observed me viewing 2208 pages across 509 sites (via FB like and visit buttons) and now knows that I read reviews for Sharp and LG microwave ovens on toptenreviews.com earlier this month and frequently visit the cra.org site.
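
The per-tracker numbers quoted above (pages seen, sites seen, average watchers per page) are easy to reproduce from a simple log. Here is a minimal sketch in Python, assuming the log is a list of (page URL, third party domain) pairs. That format and the function name are my own assumptions for illustration, not how Spy Watch actually stores its data.

```python
from collections import defaultdict
from urllib.parse import urlparse


def summarize(observations):
    """Summarize a log of (page_url, third_party_domain) pairs.

    Returns a dict mapping each third party to (pages seen, sites seen)
    and the average number of third parties per logged page. Pages that
    no third party observed never appear in the log, so they are not
    part of the average.
    """
    pages_by_tracker = defaultdict(set)
    sites_by_tracker = defaultdict(set)
    trackers_by_page = defaultdict(set)
    for page_url, tracker in observations:
        pages_by_tracker[tracker].add(page_url)
        sites_by_tracker[tracker].add(urlparse(page_url).netloc)
        trackers_by_page[page_url].add(tracker)
    per_tracker = {
        tracker: (len(pages), len(sites_by_tracker[tracker]))
        for tracker, pages in pages_by_tracker.items()
    }
    if trackers_by_page:
        average = sum(len(t) for t in trackers_by_page.values()) / len(trackers_by_page)
    else:
        average = 0.0
    return per_tracker, average


# Made-up example log, not real Spy Watch data.
log = [
    ("http://www.nytimes.com/a", "doubleclick.com"),
    ("http://www.nytimes.com/a", "sothebysrealty.com"),
    ("http://www.nytimes.com/b", "doubleclick.com"),
    ("http://example.org/x", "doubleclick.com"),
]
per_tracker, average = summarize(log)
```

A third party that shows up on many pages across many distinct sites looks like a tracker; one that appears on a couple of pages of a single site is more likely just serving content.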

You can get and install Spy Watch from the Google Web store, which describes it like this.

Spy Watch is a privacy extension that aims to create transparency in online internet tracking by third party sites. When a user visits a page, Spy Watch lets the user see every site that knows the user visited that page. And for each of these sites, the user can find out what other information the site has gathered about the user’s browsing history. After you install the extension, continue to browse normally. After some time, click on the extension to see who’s watching you! Disclaimer: User data is stored in the browser and is not accessible by the creator of this extension.


Memoto lifelogging camera

March 9th, 2013

Memoto is a $279 lifelogging camera that takes a geotagged photo every 30 seconds, holds 6K photos, and runs for several days without recharging. The company producing Memoto is a Swedish company initially funded via Kickstarter and expects to start shipping the wearable camera in April 2013. The company will also offer “safe and secure infinite photo storage at a flat monthly fee, which will always be a lot more affordable than hard drives.”

The lifelogging idea has been around for many years but has yet to become popular. One reason is privacy concerns. DARPA’s IPTO office, for example, started a LifeLog program in 2004 which was almost immediately canceled after criticism from civil libertarians concerning the privacy implications of the system.


NIST guidelines for smart grid cybersecurity, 2/15/11 UMBC

January 24th, 2011

The North American electric power system has been called the world’s largest interconnected machine and is a key part of our national infrastructure. The power grid is evolving to better exploit modern information technology and become more integrated with our cyber infrastructure. This presents unprecedented opportunities for enhanced management and efficiency but also introduces vulnerabilities for intrusions, cascading disruptions, malicious attacks, inappropriate manipulations and other threats. Similar issues are foreseen for other cyber-physical infrastructure systems including industrial control systems, transportation, water, natural gas and waste disposal.

A one-day Smart Grid Cyber Security Conference will be held at UMBC on February 15, hosted by the UMBC Computer Science and Electrical Engineering Department and Maryland Clean Energy Technology Incubator. The conference will be a comprehensive presentation by the National Institute of Standards and Technology of its Interagency Report 7628 (NISTIR 7628), Guidelines for Smart Grid Cyber Security, a critically important document for guiding government, regulatory organizations, industry and academia on Smart Grid cybersecurity. This regional outreach conference is valuable to any organization that is planning, integrating, executing or developing cyber technology for the Smart Grid.

The conference is free, but participants are asked to register in advance to help us organize for the correct number of participants.

A full copy of the 600 page report is available here.


JASON report on the Science of Cyber-Security

December 20th, 2010

The DoD-sponsored JASON study group was asked to consider the question of whether there is a ‘science’ to cyber-security or if it is fundamentally empirical. They released an 88-page report last month, Science of Cyber-Security, with the following abstract:

“JASON was requested by the DoD to examine the theory and practice of cyber-security, and evaluate whether there are underlying fundamental principles that would make it possible to adopt a more scientific approach, identify what is needed in creating a science of cyber-security, and recommend specific ways in which scientific methods can be applied. Our study identified several sub-fields of computer science that are specifically relevant and also provides some recommendations on further developing the science of cyber-security.”

The report discusses two general technical approaches to putting cyber-security on a scientific foundation. The first is based on the standard collection of frameworks and tools grounded in logic and mathematics such as cryptography, game theory, model checking and software verification. The second grounds cyber-security in a model based on an analog to immunology in biological systems.

It concludes with some observations, recommendations and responses to nine questions that were included in their charge. One interesting observation is that cyber-security, unlike the physical sciences, involves adversaries, so its foundation will use many different tools and methods. A recommendation is that the government establish cyber-security research centers in universities and other research organizations with a “long time horizon and periodic reviews of accomplishments”.


FTC proposes a do not track privacy mechanism

December 1st, 2010

Today the FTC released a preliminary staff report that proposes a “do not track” mechanism allowing consumers to opt out of data collection on online searching and browsing activities. The FTC report says that industry self-regulation efforts on privacy have been “too slow, and up to now have failed to provide adequate and meaningful protection.”

“To reduce the burden on consumers and ensure basic privacy protections, the report first recommends that “companies should adopt a ‘privacy by design’ approach by building privacy protections into their everyday business practices.” Such protections include reasonable security for consumer data, limited collection and retention of such data, and reasonable procedures to promote data accuracy. … Second, the report states, consumers should be presented with choice about collection and sharing of their data at the time and in the context in which they are making decisions – not after having to read long, complicated disclosures that they often cannot find. … One method of simplified choice the FTC staff recommends is a “Do Not Track” mechanism governing the collection of information about consumer’s Internet activity to deliver targeted advertisements and for other purposes. Consumers and industry both support increased transparency and choice for this largely invisible practice. The Commission recommends a simple, easy to use choice mechanism for consumers to opt out of the collection of information about their Internet behavior for targeted ads. The most practical method would probably involve the placement of a persistent setting, similar to a cookie, on the consumer’s browser signaling the consumer’s choices about being tracked and receiving targeted ads.”
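
To make the “persistent setting, similar to a cookie” idea concrete, here is a minimal sketch in Python of how a site might honor such a setting before collecting data. The cookie name do_not_track and the value "1" are hypothetical choices of mine; the FTC report does not specify a format.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name and value; the FTC report does not define one.
OPT_OUT_COOKIE = "do_not_track"
OPT_OUT_VALUE = "1"


def tracking_allowed(cookie_header: str) -> bool:
    """Return False when the browser's cookies carry the opt-out setting."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get(OPT_OUT_COOKIE)
    opted_out = morsel is not None and morsel.value == OPT_OUT_VALUE
    return not opted_out
```

A server following this sketch would call tracking_allowed on the Cookie header of each request and skip any behavioral logging when it returns False. Whether a site actually honors the setting is, of course, a policy question and not a technical one.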

The full text of the 120-page report, Protecting Consumer Privacy in an Era of Rapid Change: A Proposed Framework for Businesses and Policymakers, is available online.


Tim Berners-Lee on protecting the Web in the December Scientific American

November 19th, 2010

Sir Tim Berners-Lee discusses the principles underlying the Web and the need to protect them in an article from the December issue of Scientific American, Long Live the Web.

“The Web evolved into a powerful, ubiquitous tool because it was built on egalitarian principles and because thousands of individuals, universities and companies have worked, both independently and together as part of the World Wide Web Consortium, to expand its capabilities based on those principles.

The Web as we know it, however, is being threatened in different ways. Some of its most successful inhabitants have begun to chip away at its principles. Large social-networking sites are walling off information posted by their users from the rest of the Web. Wireless Internet providers are being tempted to slow traffic to sites with which they have not made deals. Governments—totalitarian and democratic alike—are monitoring people’s online habits, endangering important human rights.

If we, the Web’s users, allow these and other trends to proceed unchecked, the Web could be broken into fragmented islands. We could lose the freedom to connect with whichever Web sites we want. The ill effects could extend to smartphones and pads, which are also portals to the extensive information that the Web provides.

Why should you care? Because the Web is yours. It is a public resource on which you, your business, your community and your government depend. The Web is also vital to democracy, a communications channel that makes possible a continuous worldwide conversation. The Web is now more critical to free speech than any other medium. It brings principles established in the U.S. Constitution, the British Magna Carta and other important documents into the network age: freedom from being snooped on, filtered, censored and disconnected.”

Near the end of the long feature article, he mentions the Semantic Web’s linked data as one of the major new technologies the Web will give birth to, provided the principles are upheld.

“A great example of future promise, which leverages the strengths of all the principles, is linked data. Today’s Web is quite effective at helping people publish and discover documents, but our computer programs cannot read or manipulate the actual data within those documents. As this problem is solved, the Web will become much more useful, because data about nearly every aspect of our lives are being created at an astonishing rate. Locked within all these data is knowledge about how to cure diseases, foster business value and govern our world more effectively.”

One of the benefits of linked data is that it makes data integration and fusion much easier. The benefit comes with a potential risk, which Berners-Lee acknowledges.

“Linked data raise certain issues that we will have to confront. For example, new data-integration capabilities could pose privacy challenges that are hardly addressed by today’s privacy laws. We should examine legal, cultural and technical options that will preserve privacy without stifling beneficial data-sharing capabilities.”

The risk is not unique to linked data, and new research is underway, in our lab and elsewhere, on how to also use Semantic Web technology to protect privacy.


How Rapleaf is eroding our privacy on the Web

October 24th, 2010

RapLeaf knows what you did last summer.

The Wall Street Journal continues its exploration of how our privacy is eroding on the Web in a new article by Emily Steel, A Web Pioneer Profiles Users by Name. The article profiles the San Francisco startup RapLeaf, which defines its vision as follows.

“We want every person to have a meaningful, personalized experience – whether online or offline. We want you see the right content at the right time, every time. We want you to get better, more personalized service. To achieve this, we help Fortune 2000 companies gain insight into their customers, engage them more meaningfully, and deliver the right message at the right time. We also help consumers understand their online footprint.”

RapLeaf ties email addresses to profiles with information about people and uses the profiles to target advertisements for clients. The article shows the information collected for one person, Linda Twombly of Nashua NH, and what some of the coded information means.

Rapleaf does allow you to see the information it has collected about you, but you have to create a RapLeaf account to see it. You might be surprised about how well it knows you. Visit this page to see if your browser has RapLeaf cookies. You can also use it to opt out your email addresses from the RapLeaf system.

To be fair, RapLeaf and other companies are not doing anything illegal and mainly collect information that people choose to make public on the Web. However, their use of cookies does allow them to aggregate and integrate information about individuals and to associate that information with email addresses, Facebook UIDs and dozens of other identifiers. The information can be used to help Web-based systems serve you better — but their idea of serving you better is likely to involve peppering you with targeted ads.

How RapLeaf collects information about Web users


WSJ: many Facebook apps transmit user IDs to advertising and tracking companies

October 17th, 2010

This Wall Street Journal article says that many of the most popular of the 550,000 Facebook apps (!) have been transmitting identifying information about users and their friends to dozens of advertising and Internet tracking companies.

“The apps reviewed by the Journal were sending Facebook ID numbers to at least 25 advertising and data firms, several of which build profiles of Internet users by tracking their online activities.

Defenders of online tracking argue that this kind of surveillance is benign because it is conducted anonymously. In this case, however, the Journal found that one data-gathering firm, RapLeaf Inc., had linked Facebook user ID information obtained from apps to its own database of Internet users, which it sells. RapLeaf also transmitted the Facebook IDs it obtained to a dozen other firms, the Journal found.

RapLeaf said that transmission was unintentional. “We didn’t do it on purpose,” said Joel Jewitt, vice president of business development for RapLeaf.”

Update: Facebook responds.


New Facebook Groups Considered Somewhat Harmful

October 7th, 2010

I always think of things I should have added in the hour after making a post. Sigh. Here goes…

The situation is perhaps not so different from mailing lists, Google groups or any number of similar systems. I can set up one of those and add people to them without their consent, even people who are not my friends. Even people whom I don’t know and who don’t know me. Such email-oriented lists can also have public membership lists. The only check on this is that most mailing list frameworks send a notice to people being added informing them of the action. But many frameworks allow the list owner to suppress such notifications.

But still, Facebook seems different, based on how the rest of it is configured and on how people use it. I believe that a common expectation would be that if you are listed as a member of an open or closed group, you are a willing member.

When you get a notification that you are now a member of the Facebook group Crazy people who smell bad, you can leave the group immediately. But we have Facebook friends, many of them in fact, who only check in once a month or even less frequently. Notifications of their being added to a group will probably be missed.

Facebook should fix this by requiring that anyone added to a group confirm that they want to be in the group before they become members. After fixing it, there’s lots more that can be done to make Facebook groups a powerful way for assured information sharing.


New Facebook Groups Considered Harmful

October 7th, 2010

Facebook has rolled out a new version of groups announced on the Facebook blog.

“Until now, Facebook has made it easy to share with all of your friends or with everyone, but there hasn’t been a simple way to create and maintain a space for sharing with the small communities of people in your life, like your roommates, classmates, co-workers and family.

Today we’re announcing a completely overhauled, brand new version of Groups. It’s a simple way to stay up to date with small groups of your friends and to share things with only them in a private space. The default setting is Closed, which means only members see what’s going on in a group.”

There are three kinds of groups: open, closed and secret. Open groups have public membership listings and public content. Closed groups have public membership but private content. For secret groups, both the membership and content are private.

A key part of the idea is that the group members collectively define who is in the group, spreading the work of setting up and maintaining the group over many people.

But a serious issue with the new Facebook group framework is that a member can unilaterally add any of their friends to a group. No confirmation is required by the person being added. This was raised as an issue by Jason Calacanis.

The constraint that one can only add Facebook friends to a group one belongs to does offer some protection against ending up in unwanted groups (e.g., by spammers). But it could still lead to problems. I could, for example, create a closed group named Crazy people who smell bad and add all of my friends without their consent. Since the group is not secret like this one, anyone can see who is in the group. Worse yet, I could then leave the group. (By the way, let me know if you want to join any of these groups.)

While this might just be an annoying prank, it could spin out of control. What might happen if one of your so-called friends adds you to the new, closed “Al-Qaeda lovers” group?

The good news is that this should be easy to fix. After all, Facebook does require confirmation for the friend relation and has a mechanism for recommending that friends like pages or try apps. Either mechanism would work for inviting others to join groups.
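
The confirmation step suggested above amounts to keeping invited people on a pending list until they accept. Here is a minimal sketch in Python of that idea. The class and method names are hypothetical and have nothing to do with Facebook's actual implementation.

```python
class Group:
    """Toy group model in which invitees must confirm before joining."""

    def __init__(self, name, founder):
        self.name = name
        self.members = {founder}   # confirmed members, visible in listings
        self.pending = set()       # invited but not yet confirmed

    def invite(self, inviter, invitee):
        """A member proposes a new member; nothing becomes visible yet."""
        if inviter not in self.members:
            raise PermissionError("only members can invite others")
        if invitee not in self.members:
            self.pending.add(invitee)

    def accept(self, user):
        """The invitee confirms and only then becomes a member."""
        if user not in self.pending:
            raise ValueError("no pending invitation for this user")
        self.pending.remove(user)
        self.members.add(user)

    def decline(self, user):
        """The invitee refuses; the invitation is simply dropped."""
        self.pending.discard(user)


group = Group("Crazy people who smell bad", founder="tim")
group.invite("tim", "alice")
group.invite("tim", "bob")
group.accept("alice")
group.decline("bob")
```

With this design a friend who checks in only once a month stays on the pending list and never appears in the public membership list without having agreed to it.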

We have started working with a new group-centric secure information sharing model being developed by Ravi Sandhu and others as a foundation for better access and privacy controls in social media systems. It seems like a great match.

See update.


Taintdroid catches Android apps that leak private user data

September 30th, 2010

Ars Technica has an article on bad Android apps, Some Android apps caught covertly sending GPS data to advertisers.

“The results of a study conducted by researchers from Duke University, Penn State University, and Intel Labs have revealed that a significant number of popular Android applications transmit private user data to advertising networks without explicitly asking or informing the user. The researchers developed a piece of software called TaintDroid that uses dynamic taint analysis to detect and report when applications are sending potentially sensitive information to remote servers.

They used TaintDroid to test 30 popular free Android applications selected at random from the Android market and found that half were sending private information to advertising servers, including the user’s location and phone number. In some cases, they found that applications were relaying GPS coordinates to remote advertising network servers as frequently as every 30 seconds, even when not displaying advertisements. These findings raise concern about the extent to which mobile platforms can insulate users from unwanted invasions of privacy.”

TaintDroid is an experimental system that “analyses how private information is obtained and released by applications ‘downloaded’ to consumer phones”. A paper on the system will be presented at the 2010 USENIX Symposium on Operating Systems Design and Implementation next month.

TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones, William Enck, Peter Gilbert, Byung-gon Chun, Landon P. Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N. Sheth, OSDI, October 2010.

The project, Realtime Privacy Monitoring on Smartphones has a good overview site with a FAQ and demo.
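
For readers unfamiliar with dynamic taint analysis, the basic idea is to label sensitive values at their source, carry the labels along as the values are combined with other data, and check for labels wherever data leaves the device. The toy Python sketch below illustrates only that idea. TaintDroid itself works inside the Android runtime, and every name here is made up for illustration.

```python
class Tainted:
    """A string value carrying a set of labels naming its sensitive sources."""

    def __init__(self, value, labels):
        self.value = str(value)
        self.labels = frozenset(labels)

    def __add__(self, other):
        # Concatenation propagates taint: the result carries both label sets.
        if isinstance(other, Tainted):
            return Tainted(self.value + other.value, self.labels | other.labels)
        return Tainted(self.value + str(other), self.labels)

    def __radd__(self, other):
        # Handles plain_string + tainted_value.
        return Tainted(str(other) + self.value, self.labels)


def send_to_network(host, payload, report):
    """Sink: record a leak whenever labeled data is about to leave.

    This sketch only appends to the report list; it sends nothing.
    """
    labels = getattr(payload, "labels", frozenset())
    if labels:
        report.append((host, sorted(labels)))


# Made-up sensitive values standing in for GPS and phone number sources.
location = Tainted("39.25,-76.71", {"location"})
phone = Tainted("555-0100", {"phone"})

request = "ad?loc=" + location + "&id=" + phone
report = []
send_to_network("ads.example.com", request, report)
send_to_network("ads.example.com", "hello", report)  # untainted, not reported
```

After running this, report holds a single entry saying that data labeled location and phone was headed for ads.example.com, which is the kind of finding the study describes.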

This is just one example of a rich and complex area full of trade-offs. We want our systems and devices to be smarter and to really understand us: our preferences, context, activities, interests, intentions, and pretty much everything short of our hopes and dreams. We then want them to use this knowledge to better serve us by selecting music, turning the ringer on and off, alerting us to relevant news, etc. Developing this technology is neither easy nor cheap and the developers have to profit from creating it. Extracting personal information that can be used or sold is one model, just as Google and others do to provide better ad placement on the Web.

Here’s a quote from the Ars Technica article that resonated with me.

“As Google says in its list of best practices that developers should adopt for data collection, providing users with easy access to a clear and unambiguous privacy policy is really important.”

We, and many others, are trying to prepare for the next step — when users can define their own privacy policies and these will be understood and enforced by their devices.


Is Twitter’s plan to log all clicks a privacy loss?

September 2nd, 2010

Twitter’s planned shortening of all links via its t.co service is about to happen. The initial motivation was security, according to Twitter:

“Twitter’s link service at http://t.co is used to better protect users from malicious sites that engage in spreading malware, phishing attacks, and other harmful activity. A link converted by Twitter’s link service is checked against a list of potentially dangerous sites. When there’s a match, users can be warned before they continue.”

Declan McCullagh reports that Twitter announced in an email message that when someone clicks “on these links from Twitter.com or a Twitter application, Twitter will log that click.” Such information is extremely valuable. Given Twitter’s tens of millions of active users, just knowing how often certain URLs are clicked by people indicates what entities and topics are of interest at the moment.

“Our link service will also be used to measure information like how many times a link has been clicked. Eventually, this information will become an important quality signal for our Resonance algorithm—the way we determine if a Tweet is relevant and interesting.”
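
The two functions Twitter describes, checking a wrapped link against a list of dangerous sites and counting clicks, fit naturally into one small redirect service. Here is a minimal sketch in Python of how such a service could work. The class, its methods and the ID scheme are my own assumptions, not Twitter's design.

```python
from collections import Counter
from urllib.parse import urlparse


class LinkService:
    """Toy link wrapper: shortens URLs, warns on blocked hosts, counts clicks."""

    def __init__(self, blocked_hosts):
        self.blocked_hosts = set(blocked_hosts)
        self.targets = {}        # short id -> original URL
        self.clicks = Counter()  # short id -> aggregate click count

    def shorten(self, url):
        """Wrap a URL and return its short id (a hex counter in this sketch)."""
        short_id = format(len(self.targets), "x")
        self.targets[short_id] = url
        return short_id

    def resolve(self, short_id):
        """Log the click and return (target URL, whether to warn the user)."""
        url = self.targets[short_id]
        self.clicks[short_id] += 1
        warn = urlparse(url).hostname in self.blocked_hosts
        return url, warn


service = LinkService(blocked_hosts={"malware.example"})
story = service.shorten("http://example.org/story")
bad = service.shorten("http://malware.example/x")
first = service.resolve(story)
second = service.resolve(story)
risky = service.resolve(bad)
```

Note that this sketch keeps only aggregate counts per link. Recording who clicked would take one more argument to resolve, which is exactly why the privacy question matters.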

Associating the clicks with a user, IP address, location or device can yield even more information — like what you are interested in right now. Moreover, Twitter now has a way to associate arbitrary annotation metadata with each tweet. Analyzing all of this data can identify, for example, communities of users with common interests and the influential members within them.

Note that Twitter has not said it will do this or even that it will record and keep any user-identifiable information along with the clicks. They might just log the aggregate number of clicks in a window of time. But going the next step and capturing the additional information would be, in my mind, irresistible, even if there was no immediate plan to use it.

Search engines like Google already link clicks to users and IP addresses and use the information to improve their ranking algorithms and probably in many other ways. But what is troubling is the seemingly inexorable erosion of our online privacy. There will be no way to opt out of having your link wrapped by the t.co service and no announced way to opt out of having your clicks logged.