UMBC ebiquity
Social media

Archive for the 'Social media' Category

Researchers install PAC-MAN on Sequoia voting machine w/o breaking seals

August 23rd, 2010, by Tim Finin, posted in Games, Security, Social media, Technology Impact

Here’s a new one for the DIY movement.

Security researchers J. Alex Halderman and Ariel Feldman demonstrated PAC-MAN running on a Sequoia voting machine last week at the EVT/WOTE Workshop held at the USENIX Security conference in DC.

Amazingly, they were able to install the game on a Sequoia AVC Edge touch-screen DRE (direct-recording electronic) voting machine without breaking the original tamper-evident seals.

Here’s how they describe what they did on Halderman’s web site:

What is the Sequoia AVC Edge?

It’s a touch-screen DRE (direct-recording electronic) voting machine. Like all DREs, it stores votes in a computer memory. In 2008, the AVC Edge was used in 161 jurisdictions with almost 9 million registered voters, including large parts of Louisiana, Missouri, Nevada, and Virginia, according to Verified Voting.

What’s inside the AVC Edge?

It has a 486 SLE processor and 32 MB of RAM—similar specs to a 20-year-old PC. The election software is stored on an internal CompactFlash memory card. Modifying it is as simple as removing the card and inserting it into a PC.

Wouldn’t seals expose any tampering?

We received the machine with the original tamper-evident seals intact. The software can be replaced without breaking any of these seals, simply by removing screws and opening the case.

How did you reprogram the machine?

The original election software used the pSOS+ embedded operating system. We reformatted the memory card to boot DOS instead. (Update: Yes, it can also run Linux.) Challenges included remembering how to write a config.sys file and getting software to run without logical block addressing or a math coprocessor. The entire process took three afternoons.

You can find out more from the presentation slides from the EVT workshop, Practical AVC-Edge CompactFlash Modifications can Amuse Nerds. They sum up their study with the following conclusion.

“In conclusion, we feel our work represents the future of DREs. Now that we know how bad their security is, thousands of DREs will be decommissioned and sold by states over the next several years. Filling our landfills with these machines would be a terrible waste. Fortunately, they can be recycled as arcade machines, providing countless hours of amusement in the basements of the nation’s nerds.”

Google unemployment index estimates and predicts unemployment

August 20th, 2010, by Tim Finin, posted in Google, Social media

The Google Unemployment Index is an economic indicator based on queries sent to Google’s search engine related to unemployment, social security, welfare, and unemployment benefits. Since some of these search terms are probably leading indicators, it can also be used to predict upcoming changes in the actual unemployment rate.


The index is based on queries tracked via Google Insights for Search that are tuned to different countries. You can also focus on particular regions or metropolitan areas and compare the index across several locations. Here’s an example comparing Florida (blue) and Maryland (red).

Probability-based processor might speed AI applications

August 18th, 2010, by Tim Finin, posted in GENERAL, Semantic Web, Social media

Analog computers were a hot idea, in the 1950s! But I find this intriguing because I’ve come around to the position that a lot of our human “intelligence” is the result of acquiring and using probabilistic models. So supporting this in hardware might be a big win, especially for low-cost, low-power devices. It will also support lots of other common tasks in social computing, image processing and language technology.

Technology Review has a short article, A New Kind of Microchip, on a computer chip being developed by Lyric Semiconductor that processes signals representing probabilities rather than digital bits.

“A computer chip that performs calculations using probabilities, instead of binary logic, could accelerate everything from online banking systems to the flash memory in smart phones and other gadgets. … And because that kind of math is at the core of many products, there are many potential applications. “To take one example, Amazon’s recommendations to you are based on probability,” says Vigoda. “Any time you buy [from] them, the fraud check on your credit card is also probability [based], and when they e-mail your confirmation, it passes through a spam filter that also uses probability.”

All those examples involve comparing different data to find the most likely fit. Implementing the math needed to do this is simpler with a chip that works with probabilities, says Vigoda, allowing smaller chips to do the same job at a faster rate. A processor that dramatically speeds up such probability-based calculations could find all kinds of uses.”

Lyric’s chip is called LEC and was developed with support from DARPA. It is 30 times smaller than current digital error correction technology, according to Wired. Although small, it yields “a Pentium’s worth of computation,” according to Lyric CEO Vigoda. His 2003 dissertation at MIT was on a related topic, Analog Logic: Continuous-Time Analog Circuits for Statistical Signal Processing.

You can also read about the LEC chip in a story in yesterday’s NYT, A Chip That Digests Data and Calculates the Odds.

Usability determines password policy

August 16th, 2010, by Tim Finin, posted in Policy, Privacy, Security, Social media

Some online sites let you use any old five-character string as your password for as long as you like. Others force you to pick a new password every six months and it has to match a complicated set of requirements — at least eight characters, mixed case, containing digits, letters, punctuation and at least one umlaut. Also, it better not contain any substrings that are legal Scrabble words or match any past password you’ve used since the Bush 41 administration.

A recent paper by two researchers from Microsoft concludes that an organization’s usability requirements are the main factor that determines the complexity of its password policy.

Dinei Florencio and Cormac Herley, Where Do Security Policies Come From?, Symposium on Usable Privacy and Security (SOUPS), 14–16 July 2010, Redmond.

We examine the password policies of 75 different websites. Our goal is understand the enormous diversity of requirements: some will accept simple six-character passwords, while others impose rules of great complexity on their users. We compare different features of the sites to find which characteristics are correlated with stronger policies. Our results are surprising: greater security demands do not appear to be a factor. The size of the site, the number of users, the value of the assets protected and the frequency of attacks show no correlation with strength. In fact we find the reverse: some of the largest, most attacked sites with greatest assets allow relatively weak passwords. Instead, we find that those sites that accept advertising, purchase sponsored links and where the user has a choice show strong inverse correlation with strength.

We conclude that the sites with the most restrictive password policies do not have greater security concerns, they are simply better insulated from the consequences of poor usability. Online retailers and sites that sell advertising must compete vigorously for users and traffic. In contrast to government and university sites, poor usability is a luxury they cannot afford. This in turn suggests that much of the extra strength demanded by the more restrictive policies is superfluous: it causes considerable inconvenience for negligible security improvement.

h/t Bruce Schneier

An ontology of social media data for better privacy policies

August 15th, 2010, by Tim Finin, posted in Policy, Privacy, Security, Semantic Web, Social media

Privacy continues to be an important topic surrounding social media systems. A big part of the problem is that virtually all of us have a difficult time thinking about what information about us is exposed and to whom and for how long. As UMBC colleague Zeynep Tufekci points out, our intuitions in such matters come from experiences in the physical world, a place whose physics differs considerably from the cyber world.

Bruce Schneier offered a taxonomy of social networking data in a short article in the July/August issue of IEEE Security & Privacy. A version of the article, A Taxonomy of Social Networking Data, is available on his site.

“Below is my taxonomy of social networking data, which I first presented at the Internet Governance Forum meeting last November, and again — revised — at an OECD workshop on the role of Internet intermediaries in June.

  • Service data is the data you give to a social networking site in order to use it. Such data might include your legal name, your age, and your credit-card number.
  • Disclosed data is what you post on your own pages: blog entries, photographs, messages, comments, and so on.
  • Entrusted data is what you post on other people’s pages. It’s basically the same stuff as disclosed data, but the difference is that you don’t have control over the data once you post it — another user does.
  • Incidental data is what other people post about you: a paragraph about you that someone else writes, a picture of you that someone else takes and posts. Again, it’s basically the same stuff as disclosed data, but the difference is that you don’t have control over it, and you didn’t create it in the first place.
  • Behavioral data is data the site collects about your habits by recording what you do and who you do it with. It might include games you play, topics you write about, news articles you access (and what that says about your political leanings), and so on.
  • Derived data is data about you that is derived from all the other data. For example, if 80 percent of your friends self-identify as gay, you’re likely gay yourself.”

I think most of us understand the first two categories and can easily choose or specify a privacy policy to control access to information in them. The rest, however, are more difficult to think about and can lead to a lot of confusion when people are setting up their privacy preferences.

As an example, I saw some nice work at the 2010 IEEE International Symposium on Policies for Distributed Systems and Networks on “Collaborative Privacy Policy Authoring in a Social Networking Context” by Ryan Wishart et al. from Imperial College that addressed the problem of incidental data in Facebook. If I post a picture and tag others in it, each of the tagged people can contribute additional policy constraints that can narrow access to it.
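
The narrowing effect can be sketched in a few lines of Python. This is only an illustration of the idea, not the mechanism from the Wishart et al. paper: it assumes each person's policy is simply a set of user names allowed to see the photo, and the user names and the `effective_audience` helper are hypothetical.

```python
from functools import reduce


def effective_audience(owner_policy, tagged_policies):
    """Return the users allowed by the owner AND by every tagged person.

    Each policy is modeled (as an assumption) as a set of user names.
    Intersecting the sets means extra constraints can only narrow access.
    """
    return reduce(lambda allowed, policy: allowed & policy,
                  tagged_policies, set(owner_policy))


# Hypothetical example: the poster shares with four friends,
# and two tagged people each contribute their own constraint.
owner = {"alice", "bob", "carol", "dave"}
tagged = [{"alice", "bob", "carol"}, {"bob", "carol", "erin"}]
audience = effective_audience(owner, tagged)
print(sorted(audience))  # ['bob', 'carol']
```

Because the result is an intersection, a tagged person can remove viewers but can never add one the poster did not already allow, which is the sense of "narrow" used here.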

Lorrie Cranor gave an invited talk at the workshop on Building a Better Privacy Policy and made the point that even P3P privacy policies are difficult for people to comprehend.

Having a simple ontology for social media data could help us move forward toward better privacy controls for online social media systems. I like Schneier’s broad categories and wonder what a more complete treatment defined using Semantic Web languages might be like.

Papers with more references are cited more often

August 15th, 2010, by Tim Finin, posted in Semantic Web, Social media

The number of citations a paper receives is generally thought to be a good and relatively objective measure of its significance and impact.

Researchers naturally are interested in knowing how to attract more citations to their papers. Publishing the results of good work helps of course, but everyone knows there are many other factors. Nature news reports on research by Gregory Webster that analyzed the 53,894 articles and review articles published in Science between 1901 and 2000.

The advice the study supports is “cite and you shall be cited”.

A long reference list at the end of a research paper may be the key to ensuring that it is well cited, according to an analysis of 100 years’ worth of papers published in the journal Science.
     The research suggests that scientists who reference the work of their peers are more likely to find their own work referenced in turn, and the effect is on the rise, with a single extra reference in an article now producing, on average, a whole additional citation for the referencing paper.
     “There is a ridiculously strong relationship between the number of citations a paper receives and its number of references,” Gregory Webster, the psychologist at the University of Florida in Gainesville who conducted the research, told Nature. “If you want to get more cited, the answer could be to cite more people.”

A plot of the number of references listed in each article against the number of citations it eventually received reveals that almost half of the variation in citation rates among the Science papers can be attributed to the number of references that they include. And, contrary to what people might predict, the relationship is not driven by review articles, which could be expected, on average, to be heavier on references and to garner more citations than standard papers.

Researchers prove Rubik’s Cube solvable in 20 moves or less

August 13th, 2010, by Tim Finin, posted in AI, Games, GENERAL, Google, Social media

Using a combination of mathematical tricks, good programming and 35 CPU-years on Google’s servers, a group of researchers have proved that every position of Rubik’s Cube can be solved in 20 moves or less. The group consists of Kent State mathematician Morley Davidson, Google engineer John Dethridge, math teacher Herbert Kociemba, and programmer Tomas Rokicki.

This is an amazing result and a testament to more than 30 years of work on the problem. The Cube was invented in 1974 and almost immediately became the subject of programs to solve it. In 1981, Morwen Thistlethwaite proved that any configuration could be solved in no more than 52 moves. Periodically, tighter upper bounds for the maximum solution length were found. This result ends the quest: there are some configurations (about 300M) that require 20 moves to solve and there are none that require more than 20 moves.

In their own words, here’s how the group solved all 43,252,003,274,489,856,000 Cube positions:

  • We partitioned the positions into 2,217,093,120 sets of 19,508,428,800 positions each.
  • We reduced the count of sets we needed to solve to 55,882,296 using symmetry and set covering.
  • We did not find optimal solutions to each position, but instead only solutions of length 20 or less.
  • We wrote a program that solved a single set in about 20 seconds.
  • We used about 35 CPU years to find solutions to all of the positions in each of the 55,882,296 sets.
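
The numbers in this list fit together, which a short Python check makes visible. This is just arithmetic on the figures quoted above; the counting formula for the total number of positions is the standard one for the Cube and is not taken from the post, and treating "about 20 seconds" as exactly 20 is an assumption.

```python
from math import factorial

# Total Cube positions: corner permutations * corner twists
# * edge permutations * edge flips, halved for the parity constraint.
total_positions = factorial(8) * 3**7 * factorial(12) * 2**11 // 2

# Figures quoted in the list above.
sets = 2_217_093_120
positions_per_set = 19_508_428_800
sets_to_solve = 55_882_296
seconds_per_set = 20  # assumption: "about 20 seconds" taken as exactly 20

seconds_per_year = 365.25 * 24 * 3600
cpu_years = sets_to_solve * seconds_per_set / seconds_per_year

print(total_positions)                              # 43252003274489856000
print(sets * positions_per_set == total_positions)  # True
print(round(cpu_years, 1))                          # 35.4
```

So the partition covers every position exactly once, and about 56 million sets at roughly 20 seconds each does come to about 35 CPU years.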

This reminds me of the first program I wrote for my own enjoyment, which used brute force to find all solutions to Piet Hein’s Soma Cube. In 1969 I had a summer job as the night operator for an IBM 360 and I would turn off the clock to run my program so that the management wouldn’t know how much computer time I was consuming.

See this BBC story for more information on this amazing result.

W3C EmotionML provides markup for emotions

July 31st, 2010, by Tim Finin, posted in KR, Semantic Web, Social media, Web

The W3C has published a second working draft of EmotionML, or the emotion markup language. Here’s how it’s described.

As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The present draft specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a “plug-in” language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.

Unfortunately EmotionML is not built on RDF. If it were, I would have marked up this post in RDFa using it!

The working draft identifies concrete examples where EmotionML might be useful, including as a markup or representation for systems that do opinion mining, sentiment analysis, affect monitoring, and emotion recognition. A list of 39 individual use cases for EmotionML is given in an appendix.

EmotionML markup explicitly refers to one or more separate vocabularies used for representing emotion-related states. However, the group has defined some default vocabularies that can be used. An example is the Ekman “big six” basic emotions (anger, disgust, fear, happiness, sadness, and surprise). Another is a set of appraisal terms defined by Ortony et al. (desirability, praiseworthiness, appealingness, desirability-for-other, deservingness, liking, likelihood, effort, realization, strength-of-identification, expectation-of-deviation and familiarity).

Here’s an example from the working draft where a static image is annotated with several emotion categories with different intensities.

<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata"
           category-set="http://www.example.com/custom/hall-matsumoto-emotions.xml">
   <info>
      <meta:media-type>image</meta:media-type>
      <meta:media-id>disgust</meta:media-id>
      <meta:media-set>JACFEE-database</meta:media-set>
      <meta:doc>Example adapted from (Hall and Matsumoto 2004)
          http://www.davidmatsumoto.info/Articles/2004_hall_and_matsumoto.pdf
      </meta:doc>
   </info>

   <emotion>
       <category name="Disgust"/>
       <intensity value="0.82"/>
   </emotion>
   <emotion>
       <category name="Contempt"/>
       <intensity value="0.35"/>
   </emotion>
   <emotion>
       <category name="Anger"/>
       <intensity value="0.12"/>
   </emotion>
   <emotion>
       <category name="Surprise"/>
       <intensity value="0.53"/>
   </emotion>
</emotionml>
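
To give a feel for how an application might consume such an annotation, here is a minimal Python sketch that reads the emotion categories and intensities with the standard library's `xml.etree.ElementTree`. It targets only the draft syntax shown above; the `read_emotions` helper is a hypothetical name, and the document string is a trimmed copy of the example with the `<info>` block left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by the working draft example above.
EMOTIONML_NS = {"em": "http://www.w3.org/2009/10/emotionml"}

# Trimmed copy of the annotation (the <info> block is omitted).
DOC = """
<emotionml xmlns="http://www.w3.org/2009/10/emotionml">
   <emotion><category name="Disgust"/><intensity value="0.82"/></emotion>
   <emotion><category name="Contempt"/><intensity value="0.35"/></emotion>
   <emotion><category name="Anger"/><intensity value="0.12"/></emotion>
   <emotion><category name="Surprise"/><intensity value="0.53"/></emotion>
</emotionml>
"""


def read_emotions(xml_text):
    """Return (category, intensity) pairs, strongest emotion first."""
    root = ET.fromstring(xml_text.strip())
    pairs = []
    for emotion in root.findall("em:emotion", EMOTIONML_NS):
        category = emotion.find("em:category", EMOTIONML_NS)
        intensity = emotion.find("em:intensity", EMOTIONML_NS)
        pairs.append((category.get("name"), float(intensity.get("value"))))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for name, value in read_emotions(DOC):
    print(f"{name}: {value:.2f}")
```

The elements live in the EmotionML namespace, so the lookups have to be namespace-qualified; a plain `findall("emotion")` would return nothing.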

rdfs:seeAlso the short article by InfoQ on the EmotionML working draft.

New Yorker on voting systems and fair elections

July 25th, 2010, by Tim Finin, posted in Social media

This week’s New Yorker magazine has an article by Anthony Gottlieb on different voting systems, including range voting.

WIN OR LOSE: No voting system is flawless. But some are less democratic than others. Can theorists engineer a better way to elect candidates?

The article provides an interesting introduction to some of the voting systems that have been developed and used over the centuries and their advantages and vulnerabilities. There’s no mention of Scantegrity or security or the general issue of verifiability, however.

It’s actually in the Books section, so I guess it is ostensibly a review of a new book “Numbers Rule: The Vexing Mathematics of Democracy, from Plato to the Present” by journalist and mathematician George Szpiro.

The article also mentions a book by William Poundstone, “Gaming the Vote: Why Elections Aren’t Fair (and What We Can Do About It)” which is a steal on Amazon for $5.00. Such a steal that I ordered two last week, one for me and one to share. Poundstone, btw, has written some good popular books on a wide range of topics (e.g., game theory, technical interviewing techniques, etc). I’ve read quite a few and both enjoyed them and learned things. According to Wikipedia, he is a cousin of comedian Paula Poundstone!

Google acquires Metaweb and Freebase

July 16th, 2010, by Tim Finin, posted in Database, Google, sEARCH, Semantic Web, Social media, Web

Google announced today that it has acquired Metaweb, the company behind Freebase — a free, semantic database of “over 12 million people, places, and things in the world.” This is from their announcement on the Official Google blog:

“Over time we’ve improved search by deepening our understanding of queries and web pages. The web isn’t merely words — it’s information about things in the real world, and understanding the relationships between real-world entities can help us deliver relevant information more quickly. … With efforts like rich snippets and the search answers feature, we’re just beginning to apply our understanding of the web to make search better. Type [barack obama birthday] in the search box and see the answer right at the top of the page. Or search for [events in San Jose] and see a list of specific events and dates. We can offer this kind of experience because we understand facts about real people and real events out in the world. But what about [colleges on the west coast with tuition under $30,000] or [actors over 40 who have won at least one oscar]? These are hard questions, and we’ve acquired Metaweb because we believe working together we’ll be able to provide better answers.”

In their announcement, Google promises to continue to maintain Freebase “as a free and open database for the world” and invites other web companies to use and contribute to it.

Freebase is a system very much in the linked open data spirit, even though RDF is not its native representation. Its content is available as RDF and there are many links that bind it to the LOD cloud. Moreover, Freebase has a very good wiki-like interface allowing people to upload, extend and edit both its schema and data.

Here’s a video on the concepts behind Metaweb which are, of course, also those underlying the Semantic Web. What’s the difference? I’d say a combination of representational details and centralized (Metaweb) vs. distributed (Semantic Web).

Search neutrality: Google and Danny Sullivan weigh in

July 16th, 2010, by Tim Finin, posted in Google, Semantic Web, Social media, Web

Web search guru Danny Sullivan has a great response to the NYT editorial on regulating search engine algorithms: The New York Times Algorithm and Why It Needs Government Regulation. Here’s how it starts:

“The New York Times is the number one newspaper web site. Analysts reckon it ranks first in reach among US opinion leaders. When the New York Times editorial staff tweaks its supersecret algorithm behind what to cover and exactly how to cover a story — as it does hundreds of times a day — it can break a business that is pushed down in coverage or not covered at all.”

Google published its own response to the Times piece as a Financial Times op-ed and also posted it to the Google public policy blog: regulating what is “best” in search?

“Search engines use algorithms and equations to produce order and organisation online where manual effort cannot. These algorithms embody rules that decide which information is “best”, and how to measure it. Clearly defining which of any product or service is best is subjective. Yet in our view, the notion of “search neutrality” threatens innovation, competition and, fundamentally, your ability as a user to improve how you find information.”

The penultimate paragraph gives what they say is their strongest argument against mandating “search neutrality”.

“But the strongest arguments against rules for “neutral search” is that they would make the ranking of results on each search engine similar, creating a strong disincentive for each company to find new, innovative ways to seek out the best answers on an increasingly complex web. What if a better answer for your search, say, on the World Cup or “jaguar” were to appear on the web tomorrow? Also, what if a new technology were to be developed as powerful as PageRank that transforms the way search engines work? Neutrality forcing standardised results removes the potential for innovation and turns search into a commodity.”

This assumes, of course, that there is real competition among Internet search engines. Microsoft has been putting a lot of research and development into Bing with good results and it’s been gaining market share. Yahoo is doing very interesting things as well. Consumer choice among a handful of competitors would be the best way to ensure that none abuse their customers.

New York Times editorializes about the Google search ranking algorithm

July 15th, 2010, by Tim Finin, posted in Google, Semantic Web, Social media, Web

In what may be a first, today’s New York Times has an editorial about an algorithm. No, they haven’t waded into the P=NP issue, but have commented on Google’s algorithm for ranking search results and accusations that Google unfairly biases it for its own self interest.

“In the past few months, Google has come under investigation by antitrust regulators in Europe. Rivals have accused Google of placing the Web sites of affiliates like Google Maps or YouTube at the top of Internet searches and relegating competitors to obscurity down the list. In the United States, Google said it expects antitrust regulators to scrutinize its $700 million purchase of the flight information software firm ITA, with which it plans to enter the online travel search market occupied by Expedia, Orbitz, Bing and others.”

This issue will become more important as the companies dominating Web search (Google, Microsoft and Yahoo) continue to increase their importance and also broaden their acquisition of companies offering web services.

The NYT’s position is moderate, recommending:

Google provides an incredibly valuable service, and the government must be careful not to stifle its ability to innovate. Forcing it to publish the algorithm or the method it uses to evaluate it would allow every Web site to game the rules in order to climb up the rankings — destroying its value as a search engine. Requiring each algorithm tweak to be approved by regulators could drastically slow down its improvements. Forbidding Google to favor its own services — such as when it offers a Google Map to queries about addresses — might reduce the value of its searches. With these caveats in mind, if Google is to continue to be the main map to the information highway, it concerns us all that it leads us fairly to where we want to go.
