Privacy continues to be an important topic surrounding social media systems. A big part of the problem is that virtually all of us have a difficult time thinking about what information about us is exposed and to whom and for how long. As UMBC colleague Zeynep Tufekci points out, our intuitions in such matters come from experiences in the physical world, a place whose physics differs considerably from the cyber world.
“Below is my taxonomy of social networking data, which I first presented at the Internet Governance Forum meeting last November, and again — revised — at an OECD workshop on the role of Internet intermediaries in June.
Service data is the data you give to a social networking site in order to use it. Such data might include your legal name, your age, and your credit-card number.
Disclosed data is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
Entrusted data is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data once you post it — another user does.
Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it’s basically the same stuff as disclosed data, but the difference is that you don’t have control over it, and you didn’t create it in the first place.
Behavioral data is data the site collects about your habits by recording what you do and who you do it with. It might include games you play, topics you write about, news articles you access (and what that says about your political leanings), and so on.
Derived data is data about you that is derived from all the other data. For example, if 80 percent of your friends self-identify as gay, you’re likely gay yourself.”
I think most of us understand the first two categories and can easily choose or specify a privacy policy to control access to information in them. The rest, however, are more difficult to think about and can lead to a lot of confusion when people are setting up their privacy preferences.
As an example, I saw some nice work at the 2010 IEEE International Symposium on Policies for Distributed Systems and Networks on “Collaborative Privacy Policy Authoring in a Social Networking Context” by Ryan Wishart et al. from Imperial College that addressed the problem of incidental data in Facebook. If I post a picture and tag others in it, for instance, each of the tagged people can contribute additional policy constraints that can narrow access to it.
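One simple way to picture constraints that can only narrow access is as set intersection: a viewer sees the picture only if the owner and every tagged person allow it. This is a minimal sketch under that assumption, not the policy language or algorithm from the Wishart et al. paper, and the function name and user names are made up for illustration.

```python
def effective_audience(owner_allows, tagged_constraints):
    """Return the viewers permitted by the owner AND by every tagged person.

    owner_allows: set of user names the poster is willing to show the item to.
    tagged_constraints: dict mapping each tagged person to the set of users
    that person is willing to show the item to (hypothetical structure).
    """
    audience = set(owner_allows)
    for allowed in tagged_constraints.values():
        # Each tagged person can only remove viewers, never add new ones.
        audience &= set(allowed)
    return audience


owner_allows = {"alice", "bob", "carol", "dave"}
tagged_constraints = {
    "bob": {"alice", "bob", "carol"},
    "carol": {"alice", "bob", "carol", "erin"},
}
audience = effective_audience(owner_allows, tagged_constraints)
print(sorted(audience))  # ['alice', 'bob', 'carol']
```

Note that erin never gains access even though carol would allow her, because the owner's policy did not include her in the first place.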
Having a simple ontology for social media data could help us move forward toward better privacy controls for online social media systems. I like Schneier’s broad categories and wonder what a more complete treatment defined using Semantic Web languages might be like.
Apple’s Safari browser has a privacy vulnerability allowing web sites you visit to extract your personal information (e.g., name, address, phone number) from your computer’s address book. The fix is to turn off Safari’s web form autofill feature, which is selected by default (Preferences > AutoFill > AutoFill web form).
It’s an interesting JavaScript exploit that does not seem to be a problem for other browsers.
9ec4c12949a4f31474f299058ce2b22a is what the md5sum function returns when applied to the string that is USCYBERCOM’s official mission statement. Here’s a demonstration of this fact done on a Mac. On Linux, use the md5sum command instead of md5.
~> echo -n "USCYBERCOM plans, coordinates, integrates, \
synchronizes and conducts activities to: direct the \
operations and defense of specified Department of \
Defense information networks and; prepare to, and when \
directed, conduct full spectrum military cyberspace \
operations in order to enable actions in all domains, \
ensure US/Allied freedom of action in cyberspace and \
deny the same to our adversaries." | md5
9ec4c12949a4f31474f299058ce2b22a
~>
md5sum is a standard Unix command that computes a 128-bit “fingerprint” of a string of any length. It is a well-designed hashing function with the property that it’s very unlikely that any two non-identical strings in the real world will have the same md5sum value. Such functions have many uses in cryptography.
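The same fingerprint can be computed from a program. Here is a small sketch using Python's standard hashlib module; the helper name md5_hex is just for illustration. Like `echo -n`, it hashes exactly the characters given, with no trailing newline.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 fingerprint of text as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Two well-known reference values from the MD5 specification's test suite.
print(md5_hex(""))     # d41d8cd98f00b204e9800998ecf8427e
print(md5_hex("abc"))  # 900150983cd24fb0d6963f7d28e17f72

# Inputs of any length map to a fingerprint of the same size.
print(len(md5_hex("a much longer string " * 1000)))  # 32
```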
Thanks to Ian Soboroff for spotting the answer on Slashdot and forwarding it.
Someone familiar with md5 would recognize that the secret string has the same length and character mix as an md5 value — 32 hexadecimal characters. Each of the possible hex characters (0123456789abcdef) represents four bits, so 32 of them is a way to represent 128 bits.
We’ll leave it as an exercise for the reader to compute the 128-bit sequence that our secret code corresponds to.
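For readers who would rather let the computer do the exercise, here is a short sketch that expands each hex character into its four bits. The function name hex_to_bits is our own; the digest is the one shown in the demonstration above.

```python
digest = "9ec4c12949a4f31474f299058ce2b22a"


def hex_to_bits(hex_string: str) -> str:
    """Expand each hexadecimal character into its 4-bit binary form."""
    return "".join(format(int(ch, 16), "04b") for ch in hex_string)


bits = hex_to_bits(digest)
print(len(bits))   # 128
print(bits[:8])    # 10011110  (9 -> 1001, e -> 1110)
print(bits)
```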
Cyber Command (USCYBERCOM) is the new unit in the US Department of Defense that is responsible for the “defense of specified Department of Defense information networks” and, when needed, to “conduct full-spectrum military cyberspace operations in order to enable actions in all domains, ensure freedom of action in cyberspace for the U.S. and its allies, and deny the same to adversaries.”
Their logo has an encrypted message in its inner gold ring:
“It is not just random numbers and does ‘decode’ to something specific,” a Cyber Command source tells Danger Room. “I believe it is specifically detailed in the official heraldry for the unit symbol.”
“While there [were] a few different proposals during the design phase, in the end the choice was obvious and something necessary for every military unit,” the source adds. “The mission.”
Here’s your chance to use those skills you learned in CMSC 443. Wired is offering a T-shirt to the first person who can crack the code. With that hint in hand, go crack this code open. E-mail us your best guess, or leave it in the comments below. Our Cyber Command source will confirm the right answer. And the first person to get it gets his/her choice of a Danger Room T-shirt. USCYBERCOM might offer you a job.
Here’s a quick trick that could significantly speed up your Web surfing. Download and run the open source namebench on your computer. It does a thorough test of your current DNS servers and some other popular global and regional alternatives, produces a good report and recommends which ones you should use.
Here is how namebench describes what it does:
“namebench looks for the fastest DNS (Domain Name System) servers accessible to your computer. You can think of a DNS server as a phone book: When you want to dial a company on the phone, you may have to flip through a phone book by name to find their phone number. On the Internet, when you want to visit “www.google.com”, a DNS server needs to look up the correct IP Address for you.
Over the course of loading a single web page, your computer may need to look up a dozen of these addresses. While your Internet provider usually automatically assigns you one of their servers to handle looking up these addresses, there may be others that are significantly faster. namebench finds them.”
Namebench also points out which DNS servers do DNS hijacking, typically by intercepting the error message produced by entering a mistyped URL (e.g., http://umbc.edo/) and redirecting you to a page full of ads and “helpful” search results. Some name servers, like OpenDNS, will also automatically correct some mistyped URLs, e.g., guessing that when you typed http://umbc.edi/ you meant to type http://umbc.edu/. (Shades of DWIM!) It’s not dangerous and is a way private DNS services, like OpenDNS, get revenue to support the service and make a profit.
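The basic test for this kind of hijacking is easy to sketch: look up a name that should not exist and see whether you get an address back anyway. The code below is only an illustration of that idea, not a description of how namebench itself does it. The function names and the two stand-in resolvers are hypothetical; the default resolver is Python's socket.gethostbyname, and the test name uses the reserved .invalid top-level domain, which should never resolve.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return the address the resolver gives for hostname, or None on failure."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def looks_hijacked(bogus_hostname, resolver=socket.gethostbyname):
    """A name that should not exist but still resolves suggests NXDOMAIN hijacking."""
    return resolves(bogus_hostname, resolver) is not None


# Stand-in resolvers so the example runs without touching the network.
def honest_resolver(hostname):
    raise socket.gaierror("name not found")


def hijacking_resolver(hostname):
    return "203.0.113.10"  # documentation-only address standing in for an ad server


print(looks_hijacked("no-such-host.invalid", honest_resolver))     # False
print(looks_hijacked("no-such-host.invalid", hijacking_resolver))  # True
```

Calling looks_hijacked("no-such-host.invalid") with no resolver argument would run the same check against whatever DNS servers your machine is actually configured to use.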
I have been using OpenDNS because it’s the fastest (for me) and I don’t mind their NXDOMAIN hijacking. But I learned from namebench that OpenDNS reroutes www.google.com to google.navigation.opendns.com. That site redirects HTTP GET requests to and then from there onto http://www.google.de/. And Google itself redirects HTTP GET requests for http://google.com/ to http://www.google.com/. I’ll admit I am a bit confused by this. I imagine they do this to capture queries sent to Google, which provide very useful information even in the aggregate. OpenDNS says that they are doing this to correct a problem with Google-specific software installed on Dell computers. They do not seem to be doing this for Microsoft’s Bing search engine, which does lend some credence to the claim. I plan on digging into this more to fully understand what is going on and why.
Namebench runs on Macs, Windows and UNIX, and has both a command line and graphical user interface. See the namebench FAQ for more information.
The June 2010 CACM has an interesting article by Jilin Chen and Joseph Konstan of the University of Minnesota on Conference Paper Selectivity and Impact. The abstract gets right to the point:
“Studying the metadata of the ACM Digital Library (http://www.acm.org/dl), we found that papers in low-acceptance-rate conferences have higher impact than those in high-acceptance-rate conferences within ACM, where impact is measured by the number of citations received. We also found that highly selective conferences — those that accept 30% or less of submissions—are cited at a rate comparable to or greater than ACM Transactions and journals.”
A key paragraph later in the paper has some more detail:
“Addressing the second question— on how much impact conference papers have compared to journal papers — in Figures 3 and 4, we found that overall, journals did not outperform conferences in terms of citation count; they were, in fact, similar to conferences with acceptance rates around 30%, far behind conferences with acceptance rates below 25% (T-test, T[7603] = 24.8, p< .001). Similarly, journals published as many papers receiving no citations in the next two years as conferences accepting 35%–40% of submissions, a much higher low-impact percentage than for highly selective conferences. The same analyses over four- and eight-year periods yielded results consistent with the two-year period; journal papers received significantly fewer citations than conferences where the acceptance rate was below 25%."
Impact of CS conferences vs. journals
We have to assume that this study is only applicable to Computer Science, for which the ACM digital library is a very good sample, and not other disciplines (e.g., EE) or even narrow sub-disciplines within CS. Different disciplines have very different publication patterns. But it does confirm our own anecdotal evidence from tracking citations to papers written in our ebiquity lab over the past ten years: those published in top conferences tend to get more citations than those in journals.
TechCrunch is reporting that Twitter is down due to an attack by someone claiming to be part of the ‘Iranian Cyber Army’. Since Twitter is now down, we can’t show a screenshot, but TechCrunch reports that a similar defacement is live at mawjcamp.org.
Iranian Cyber Army
THIS SITE HAS BEEN HACKED BY IRANIAN CYBER ARMY
iRANiAN.CYBER.ARMY@GMAIL.COM
U.S.A. Think They Controlling And Managing Internet By
Their Access, But THey Don’t, We Control And Manage
Internet By Our Power, So Do Not Try To Stimulation
Iranian Peoples To….
NOW WHICH COUNTRY IN EMBARGO LIST? IRAN? USA?
WE PUSH THEM IN EMBARGO LIST
Take Care.
The foaf:mbox property is very useful since it is ‘inverse functional’ and can thus serve as an ID for a foaf individual. This lets us infer that two foaf profiles with the same mbox refer to the same person.
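Here is a minimal sketch of that inference, with profiles represented as plain Python dictionaries rather than RDF. The function name merge_by_mbox and the example.org data are made up for illustration; a real system would do this with an RDF store and an OWL reasoner.

```python
from collections import defaultdict


def merge_by_mbox(profiles):
    """Merge profile fragments that share an mbox value.

    Because mbox is inverse functional, two fragments with the same mbox
    must describe the same person, so their properties can be combined.
    """
    merged = defaultdict(dict)
    for profile in profiles:
        merged[profile["mbox"]].update(profile)
    return dict(merged)


profiles = [
    {"mbox": "mailto:jane@example.org", "name": "Jane Doe"},
    {"mbox": "mailto:jane@example.org", "homepage": "http://example.org/~jane"},
    {"mbox": "mailto:john@example.org", "name": "John Roe"},
]
people = merge_by_mbox(profiles)
print(len(people))  # 2
print(people["mailto:jane@example.org"])
```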
Since publishing your email address invites spam, many people use the foaf:mbox_sha1sum property instead of mbox. mbox_sha1sum is also inverse functional but doesn’t reveal your private information (i.e., email address).
The idea exploits the fact that a few free email services (e.g., gmail, hotmail, yahoo, aol) account for a large fraction of email addresses and that, using a person’s full name, one can generate likely ‘username’ possibilities. Given an email hash and a person’s first and last name, one can generate hashes of likely email addresses until a match is found.
Abell was able (!) to crack 10% of the email addresses for 80,871 stackoverflow.com users in an hour with a simple Haskell program.
The same attack can be used on foaf:mbox_sha1sum properties, especially since a foaf profile will very handily provide the other useful information about the person. Given the extra information available in many foaf profiles (e.g., nick, school homepage) one might even expect better results.
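The guessing attack described above can be sketched in a few lines. This is an illustrative Python version, not Abell's Haskell program; the list of username patterns and the function names are our own assumptions, and the provider list is just the four services mentioned earlier. The one fixed detail is that foaf:mbox_sha1sum is the SHA-1 of the full mailto: URI.

```python
import hashlib
from itertools import product

# The free providers named in the text; a real attempt might try more.
PROVIDERS = ["gmail.com", "hotmail.com", "yahoo.com", "aol.com"]


def mbox_sha1sum(email):
    """SHA-1 of the mailto: URI, which is how foaf:mbox_sha1sum is defined."""
    return hashlib.sha1(f"mailto:{email}".encode("utf-8")).hexdigest()


def candidate_usernames(first, last):
    """Guess common username patterns from a name (patterns are assumptions)."""
    first, last = first.lower(), last.lower()
    return [
        first + last, f"{first}.{last}", f"{first}_{last}",
        first[0] + last, first + last[0], last + first, first, last,
    ]


def crack(target_hash, first, last):
    """Return the first guessed address whose hash matches, or None."""
    for user, domain in product(candidate_usernames(first, last), PROVIDERS):
        email = f"{user}@{domain}"
        if mbox_sha1sum(email) == target_hash:
            return email
    return None


# A made-up person whose address happens to follow a common pattern.
target = mbox_sha1sum("jane.doe@gmail.com")
print(crack(target, "Jane", "Doe"))  # jane.doe@gmail.com

# An address that follows no guessed pattern is not recovered.
print(crack(mbox_sha1sum("jd1234@example.org"), "Jane", "Doe"))  # None
```

With only a few dozen candidates per person, each attempt is cheap, which is why the extra hints in a foaf profile (such as a nick) could raise the hit rate.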
As vulnerabilities go, this doesn’t seem like a very dangerous one. The use of mbox_sha1sum is usually justified as a way to avoid having your email address harvested by spambots. I doubt that spammers would think it productive to spend an hour of computing time to get 1000 email addresses.
A paper just published in Nature, Common ecology quantifies human insurgency, describes a mathematical model that can be used to predict the sizes and timing of violent events within different insurgent conflicts.
“We propose a unified model of human insurgency that reproduces these commonalities, and explains conflict-specific variations quantitatively in terms of underlying rules of engagement. Our model treats each insurgent population as an ecology of dynamically evolving, self-organized groups following common decision-making processes. Our model is consistent with several recent hypotheses about modern insurgency, is robust to many generalizations, and establishes a quantitative connection between human insurgency, global terrorism and ecology. Its similarity to financial market models provides a surprising link between violent and non-violent forms of human behaviour.”
See also a note in Nature News, Modellers claim wars are predictable, and this TED talk by one of the authors, Sean Gourley, on the mathematics of war.
The TED blog has more information and portions of an interview with Gourley.
Mark Chu-Carroll is a Google software engineer who’s written a long, detailed and informed review of Google’s new programming language Go. It’s worth a read if you are interested in understanding what it’s like as a programming language. Here are a few points that I took note of.
“The guys who designed Go were very focused on keeping things as small and simple as possible. When you look at it in contrast to a language like C++, it’s absolutely striking. Go is very small, and very simple. There’s no cruft. No redundancy. Everything has been pared down. But for the most part, they give you what you need. If you want a C-like language with some basic object-oriented features and garbage collection, Go is about as simple as you could realistically hope to get.”
“The most innovative thing about it is its type system. … It ends up giving you something with the flavor of Python-ish duck typing, but with full type-checking from the compiler.”
“Go programs compile really astonishingly quickly. When I first tried it, I thought that I had made a mistake building the compiler. It was just too damned fast. I’d never seen anything quite like it.”
“At the end of the day, what do I think? I like Go, but I don’t love it. If it had generics, it would definitely be my favorite of the C/C++/C#/Java family. It’s got a very elegant simplicity to it which I really like. The interface type system is wonderful. The overall structure of programs and modules is excellent. But it’s got some ugliness. … It’s not going to wipe C++ off the face of the earth. But I think it will establish itself as a solid alternative.”
Go sounds like a language that will help you grow as a computer scientist if you use it. That’s a good enough recommendation for me.
“The participants in this debate, including the three guest speakers, all agree that computing is moving into the cloud. “We are experiencing a disruptive moment in the history of technology, with the expansion of the role of the internet and the advent of cloud-based computing”, says Stephen Elop, president of Microsoft’s business division, which generates about a third of the firm’s revenues ($13 billion) and more than half of its profits ($4.5 billion) in the most recent quarter. Marc Benioff, chief executive of Salesforce.com, the world’s largest SaaS provider with over $1.2 billion in sales in the past 12 months, is no less bullish: ‘Like the shift [from the mainframe to the client/server architecture] that roiled our industry in decades past, the transition to cloud computing is happening now because of major discontinuities in cost, value and function.’”
While the debate’s proposition suggests that security or privacy is its focus, it’s really a broader argument about how software services will be delivered in the future, in which security is just one aspect.
“Whether and to what extent companies and consumers elect to hand their computing over to others, of course, depends on how much they trust the cloud. And customers still have many questions. How reliable are such services? What about privacy? Don’t I lose too much control? What if Salesforce.com, for instance, changes its service in a way I do not like? Are such web-based services really cheaper than traditional software? And how easy is it to get my data if I want to change providers? Are there open technical standards that would make this easier?”
This post on the CACM Blog caught my eye and shows that we still have a long way to go before computing is taken seriously in US secondary education, let alone K-12.
“Up until September, Georgia and Texas were the (only) two states in the US that accepted a computer science course as fulfilling high school graduation requirements. In Texas, the Advanced Placement Computer Science (AP CS) course fulfilled a mathematics requirement. In Georgia, it fulfilled a fourth science course requirement. As of October, however, Georgia has rescinded that decision. … ”
I wonder how other countries treat computing and informatics in primary and secondary education.