What is up with Clearspring and malware?

July 31st, 2010

Google Chrome has been showing me a malware warning page today as I try to visit normally trusted and benign sites. I got this one just now as I tried to go to Planet RDF.

Warning: Visiting this site may harm your computer!

The website at planetrdf.com contains elements from the site bin.clearspring.com, which appears to host malware – software that can hurt your computer or otherwise operate without your consent. Just visiting a site that contains malware can infect your computer.

For detailed information about the problems with these elements, visit the Google Safe Browsing diagnostic page for bin.clearspring.com.

Learn more about how to protect yourself from harmful software online.

[ ] I understand that visiting this site may harm my computer. PROCEED

Clearspring claims it’s a technical problem, although they admit they were using a service that was compromised with files redirecting users to a certain malware domain. I’m a bit fuzzy on what Clearspring does and where it is being used on the Planet RDF site. I don’t see it in the page source, for example.

Update: Maybe the problem stems from Flash cookies in blog content syndicated by Planet RDF that includes Flash objects mediated by Clearspring.

W3C EmotionML provides markup for emotions

July 31st, 2010

The W3C has published a second working draft of EmotionML, the Emotion Markup Language. Here’s how it’s described:

As the web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The present draft specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a “plug-in” language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.

Unfortunately EmotionML is not built on RDF. If it were, I would have marked up this post in RDFa using it!

The working draft identifies concrete examples where EmotionML might be useful, including as a markup or representation for systems that do opinion mining, sentiment analysis, affect monitoring, and emotion recognition. A list of 39 individual use cases for EmotionML is given in an appendix.

EmotionML markup explicitly refers to one or more separate vocabularies used for representing emotion-related states. However, the group has defined some default vocabularies that can be used. An example is the Ekman “big six” basic emotions (anger, disgust, fear, happiness, sadness, and surprise). Another is a set of appraisal terms defined by Ortony et al. (desirability, praiseworthiness, appealingness, desirability-for-other, deservingness, liking, likelihood, effort, realization, strength-of-identification, expectation-of-deviation and familiarity).

Here’s an example from the working draft where a static image is annotated with several emotion categories with different intensities.

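<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           xmlns:meta="http://www.example.com/metadata">
  <!-- Closing tags and wrappers were lost in this copy; the emotion grouping
       and the meta namespace URI are reconstructed and may differ from the draft. -->
  <meta:doc>Example adapted from (Hall and Matsumoto 2004)</meta:doc>
  <emotion>
    <category name="Disgust"/>
    <intensity value="0.82"/>
  </emotion>
  <emotion>
    <category name="Contempt"/>
    <intensity value="0.35"/>
  </emotion>
  <emotion>
    <category name="Anger"/>
    <intensity value="0.12"/>
  </emotion>
  <emotion>
    <category name="Surprise"/>
    <intensity value="0.53"/>
  </emotion>
</emotionml>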
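To give a feel for how a program might consume such an annotation, here is a minimal Python sketch that reads category/intensity pairs and reports the strongest one. It assumes each pair is wrapped in its own emotion element, which is my reading of the draft rather than something shown intact in the listing above; the function name category_intensities is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example's root element.
NS = {"e": "http://www.w3.org/2009/10/emotionml"}

# Assumption: each category/intensity pair sits inside its own <emotion> element.
DOC = """<emotionml xmlns="http://www.w3.org/2009/10/emotionml">
  <emotion><category name="Disgust"/><intensity value="0.82"/></emotion>
  <emotion><category name="Contempt"/><intensity value="0.35"/></emotion>
  <emotion><category name="Anger"/><intensity value="0.12"/></emotion>
  <emotion><category name="Surprise"/><intensity value="0.53"/></emotion>
</emotionml>"""


def category_intensities(xml_text):
    """Return a dict mapping each category name to its intensity value."""
    root = ET.fromstring(xml_text)
    result = {}
    for emotion in root.findall("e:emotion", NS):
        category = emotion.find("e:category", NS)
        intensity = emotion.find("e:intensity", NS)
        if category is None or intensity is None:
            continue  # skip annotations that lack either element
        result[category.get("name")] = float(intensity.get("value"))
    return result


scores = category_intensities(DOC)
strongest = max(scores, key=scores.get)
print(strongest, scores[strongest])  # prints: Disgust 0.82
```

A real consumer would also need to check which vocabulary the category names come from, which this sketch ignores.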

rdfs:seeAlso the short article by InfoQ on the EmotionML working draft.

New Yorker on voting systems and fair elections

July 25th, 2010

This week’s New Yorker magazine has an article by Anthony Gottlieb on different voting systems, including range voting.

WIN OR LOSE: No voting system is flawless. But some are less democratic than others. Can theorists engineer a better way to elect candidates?

The article provides an interesting introduction to some of the voting systems that have been developed and used over the centuries, along with their advantages and vulnerabilities. There’s no mention of Scantegrity, security, or the general issue of verifiability, however.

It’s actually in the Books section, so I guess it is ostensibly a review of a new book, “Numbers Rule: The Vexing Mathematics of Democracy, from Plato to the Present,” by journalist and mathematician George Szpiro.

The article also mentions a book by William Poundstone, “Gaming the Vote: Why Elections Aren’t Fair (and What We Can Do About It),” which is a steal on Amazon for $5.00. Such a steal that I ordered two last week, one for me and one to share. Poundstone, btw, has written some good popular books on a wide range of topics (e.g., game theory and technical interviewing techniques). I’ve read quite a few, enjoyed them, and learned things. According to Wikipedia, he is a cousin of comedian Paula Poundstone!

Apple Safari can expose your private data

July 22nd, 2010

Apple’s Safari browser has a privacy vulnerability allowing web sites you visit to extract your personal information (e.g., name, address, phone number) from your computer’s address book. The fix is to turn off Safari’s web form autofill feature, which is selected by default (Preferences > AutoFill > AutoFill web form).


It’s an interesting JavaScript exploit that does not seem to be a problem for other browsers.

Google acquires Metaweb and Freebase

July 16th, 2010

Google announced today that it has acquired Metaweb, the company behind Freebase — a free, semantic database of “over 12 million people, places, and things in the world.” This is from their announcement on the Official Google blog:

“Over time we’ve improved search by deepening our understanding of queries and web pages. The web isn’t merely words — it’s information about things in the real world, and understanding the relationships between real-world entities can help us deliver relevant information more quickly. … With efforts like rich snippets and the search answers feature, we’re just beginning to apply our understanding of the web to make search better. Type [barack obama birthday] in the search box and see the answer right at the top of the page. Or search for [events in San Jose] and see a list of specific events and dates. We can offer this kind of experience because we understand facts about real people and real events out in the world. But what about [colleges on the west coast with tuition under $30,000] or [actors over 40 who have won at least one oscar]? These are hard questions, and we’ve acquired Metaweb because we believe working together we’ll be able to provide better answers.”

In their announcement, Google promises to continue to maintain Freebase “as a free and open database for the world” and invites other web companies to use and contribute to it.

Freebase is a system very much in the linked open data spirit, even though RDF is not its native representation. Its content is available as RDF and there are many links that bind it to the LOD cloud. Moreover, Freebase has a very good wiki-like interface allowing people to upload, extend and edit both its schema and data.

Here’s a video on the concepts behind Metaweb which are, of course, also those underlying the Semantic Web. What’s the difference? I’d say a combination of representational details and a centralized (Metaweb) vs. distributed (Semantic Web) approach.

Search neutrality: Google and Danny Sullivan weigh in

July 16th, 2010

Web search guru Danny Sullivan has a great response to the NYT editorial on regulating search engine algorithms: The New York Times Algorithm and Why It Needs Government Regulation. Here’s how it starts:

“The New York Times is the number one newspaper web site. Analysts reckon it ranks first in reach among US opinion leaders. When the New York Times editorial staff tweaks its supersecret algorithm behind what to cover and exactly how to cover a story — as it does hundreds of times a day — it can break a business that is pushed down in coverage or not covered at all.”

Google published its own response to the Times piece as a Financial Times op-ed and also posted it to the Google public policy blog: regulating what is “best” in search?

“Search engines use algorithms and equations to produce order and organisation online where manual effort cannot. These algorithms embody rules that decide which information is “best”, and how to measure it. Clearly defining which of any product or service is best is subjective. Yet in our view, the notion of “search neutrality” threatens innovation, competition and, fundamentally, your ability as a user to improve how you find information.”

The penultimate paragraph gives what they say is their strongest argument against mandating “search neutrality”.

“But the strongest arguments against rules for “neutral search” is that they would make the ranking of results on each search engine similar, creating a strong disincentive for each company to find new, innovative ways to seek out the best answers on an increasingly complex web. What if a better answer for your search, say, on the World Cup or “jaguar” were to appear on the web tomorrow? Also, what if a new technology were to be developed as powerful as PageRank that transforms the way search engines work? Neutrality forcing standardised results removes the potential for innovation and turns search into a commodity.”

This assumes, of course, that there is real competition among Internet search engines. Microsoft has been putting a lot of research and development into Bing with good results, and it’s been gaining market share. Yahoo is doing very interesting things as well. Consumer choice among a handful of competitors would be the best way to ensure that none abuse their customers.

Barry Smith short course online: An Introduction to ontology

July 15th, 2010

Here’s a great resource if you want to come up to speed on ontologies and their importance today.

Professor Barry Smith of the University at Buffalo held a two-day course, An Introduction to Ontology: From Aristotle to the Universal Core, in 2009, to introduce ontologies and their applications to both philosophers and computer scientists. It consisted of eight lectures for which slides and downloadable videos are available. Paul Alexander has also made the videos available in streaming form here if you want to view them without downloading.

The lectures are all either 60 or 90 minutes. Here are links to the streaming videos, thanks to Paul Alexander:

  • Ontology as a Branch of Philosophy
  • Ontology and Logic
  • The Ontology of Social Reality
  • Why I Am Not a Philosopher (or: Ontology Leaving the Mother Ship of Philosophy)
  • Why Computer Science Needs Philosophy
  • Ontology and the Semantic Web
  • Towards a Standard Upper Level Ontology
  • The Universal Core: Ontology and the US Federal Government Data Integration Initiative

    New York Times editorializes about the Google search ranking algorithm

    July 15th, 2010

    In what may be a first, today’s New York Times has an editorial about an algorithm. No, they haven’t waded into the P=NP issue; they commented on Google’s algorithm for ranking search results and on accusations that Google unfairly biases it in its own interest.

    “In the past few months, Google has come under investigation by antitrust regulators in Europe. Rivals have accused Google of placing the Web sites of affiliates like Google Maps or YouTube at the top of Internet searches and relegating competitors to obscurity down the list. In the United States, Google said it expects antitrust regulators to scrutinize its $700 million purchase of the flight information software firm ITA, with which it plans to enter the online travel search market occupied by Expedia, Orbitz, Bing and others.”

    This issue will become more important as the companies dominating Web search (Google, Microsoft and Yahoo) continue to increase their importance and also broaden their acquisition of companies offering web services.

    The NYT’s position is moderate, recommending:

    “Google provides an incredibly valuable service, and the government must be careful not to stifle its ability to innovate. Forcing it to publish the algorithm or the method it uses to evaluate it would allow every Web site to game the rules in order to climb up the rankings — destroying its value as a search engine. Requiring each algorithm tweak to be approved by regulators could drastically slow down its improvements. Forbidding Google to favor its own services — such as when it offers a Google Map to queries about addresses — might reduce the value of its searches. With these caveats in mind, if Google is to continue to be the main map to the information highway, it concerns us all that it leads us fairly to where we want to go.”

    Google Open Spot Android app finds parking

    July 9th, 2010

    Google’s Open Spot Android app lets people leaving parking spots share the information with others searching for parking nearby. Running the app shows you open parking spots within 1.5 km. A reported spot is assumed to be gone after 20 minutes and is removed from the system.
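The two rules in that description (a 1.5 km radius and a 20 minute lifetime) are easy to sketch. The Python below is only an illustration of that filtering logic and says nothing about how Google actually implements it; the Spot class, the visible_spots function and the use of the haversine great-circle distance are all my own assumptions.

```python
import math
from dataclasses import dataclass

RADIUS_KM = 1.5          # search radius mentioned in the post
TTL_SECONDS = 20 * 60    # spots are assumed gone after 20 minutes
EARTH_RADIUS_KM = 6371.0


@dataclass
class Spot:
    """Hypothetical record for a reported open parking spot."""
    lat: float
    lon: float
    reported_at: float  # seconds since some epoch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def visible_spots(spots, lat, lon, now):
    """Keep spots that are both fresh enough and close enough to the user."""
    return [
        s for s in spots
        if now - s.reported_at <= TTL_SECONDS
        and haversine_km(lat, lon, s.lat, s.lon) <= RADIUS_KM
    ]


now = 10_000.0
spots = [
    Spot(39.30, -76.61, now - 5 * 60),   # about 1.1 km away, 5 minutes old
    Spot(39.31, -76.61, now - 5 * 60),   # about 2.2 km away, too far
    Spot(39.30, -76.61, now - 25 * 60),  # close, but older than 20 minutes
]
nearby = visible_spots(spots, 39.29, -76.61, now)
print(len(nearby))  # prints: 1
```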

    People who announce open spots gain karma points, while those who report false spots, known as griefers, are on notice:

    “We’re watching for behavior that looks like a griefer spoofing parking spots. We have a couple of mechanisms available to make sure someone can’t leave a bunch of fake parking spots. If we see this happening we will take steps to fix it.”

    This is a simple example of a context-aware mobile app. It could benefit further from knowing that you are driving your car rather than riding in it, and that you are likely to want a parking spot rather than doing 70 mph on I-95 through Baltimore. Context could also tell the app that you are probably leaving a public parking spot, so it could mark the spot as open automatically. However, such a feature should be smart enough to keep Google from tagging you as a griefer, lest you find out what punishment Google has in store for you.

    USCYBERCOM secret revealed

    July 8th, 2010

    The secret message embedded in the USCYBERCOM logo


    is what the md5sum function returns when applied to the string that is USCYBERCOM’s official mission statement. Here’s a demonstration of this fact done on a Mac. On Linux, use the md5sum command instead of md5.

    # Each quoted piece is a separate argument; echo joins them with single spaces.
    ~> echo -n "USCYBERCOM plans, coordinates, integrates," \
         "synchronizes and conducts activities to: direct the" \
         "operations and defense of specified Department of" \
         "Defense information networks and; prepare to, and when" \
         "directed, conduct full spectrum military cyberspace" \
         "operations in order to enable actions in all domains," \
         "ensure US/Allied freedom of action in cyberspace and" \
         "deny the same to our adversaries." | md5

    md5sum is a standard Unix command that computes a 128-bit “fingerprint” of a string of any length. It is a hashing function with the property that it’s very unlikely that any two non-identical strings in the real world will happen to have the same md5sum value. Such functions have many uses in cryptography, although MD5 itself is no longer recommended where an attacker might construct colliding inputs on purpose.
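If you don’t have md5 or md5sum handy, the same check can be sketched in Python with the standard hashlib module. The mission statement text is copied from the shell command above; the md5_hex helper is just a name I made up. I have deliberately not printed an expected digest here, so compare the output against the logo yourself.

```python
import hashlib

# Mission statement text as quoted in the shell command above.
MISSION = (
    "USCYBERCOM plans, coordinates, integrates, synchronizes and conducts "
    "activities to: direct the operations and defense of specified "
    "Department of Defense information networks and; prepare to, and when "
    "directed, conduct full spectrum military cyberspace operations in "
    "order to enable actions in all domains, ensure US/Allied freedom of "
    "action in cyberspace and deny the same to our adversaries."
)


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


digest = md5_hex(MISSION)
altered = md5_hex(MISSION + " ")  # one extra trailing space

print(digest)
print(altered)  # differs from the line above
```

The second digest shows why the exact spacing and punctuation of the string matter: even a single added character produces an unrelated-looking value.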

    Thanks to Ian Soboroff for spotting the answer on Slashdot and forwarding it.

    Someone familiar with md5 would recognize that the secret string has the same length and character mix as an md5 value — 32 hexadecimal characters. Each of the possible hex characters (0123456789abcdef) represents four bits, so 32 of them is a way to represent 128 bits.

    We’ll leave it as an exercise for the reader to compute the 128-bit sequence that our secret code corresponds to.
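As a starting point for that exercise, here is a small Python sketch of the hex-to-bits conversion described above. It uses the MD5 of the word "hello" as a stand-in so as not to spoil the answer; hex_to_bits is a hypothetical helper name, not part of any library.

```python
import hashlib


def hex_to_bits(hex_digest: str) -> str:
    """Expand each hex character into its four-bit binary form."""
    return "".join(format(int(ch, 16), "04b") for ch in hex_digest)


# Stand-in digest so the real secret is left for the reader.
sample = hashlib.md5(b"hello").hexdigest()
bits = hex_to_bits(sample)

print(sample)
print(bits)
print(len(bits))  # prints: 128
```

Feeding the 32 characters from the logo to hex_to_bits in place of the sample would give the bit sequence in question.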

    Cyber Command embeds encrypted message in USCYBERCOM logo

    July 7th, 2010

    Cyber Command (USCYBERCOM) is the new unit in the US Department of Defense that is responsible for the “defense of specified Department of Defense information networks” and, when needed, to “conduct full-spectrum military cyberspace operations in order to enable actions in all domains, ensure freedom of action in cyberspace for the U.S. and its allies, and deny the same to adversaries.”

    Their logo has an encrypted message in its inner gold ring:


    An article in Wired quotes a USCYBERCOM source:

    “It is not just random numbers and does ‘decode’ to something specific,” a Cyber Command source tells Danger Room. “I believe it is specifically detailed in the official heraldry for the unit symbol.”

    “While there a few different proposals during the design phase, in the end the choice was obvious and something necessary for every military unit,” the source adds. “The mission.”

    Here’s your chance to use those skills you learned in CMSC 443. Wired is offering a T-shirt to the first person who can crack the code: “With that hint in hand, go crack this code open. E-mail us your best guess, or leave it in the comments below. Our Cyber Command source will confirm the right answer. And the first person to get it gets his/her choice of a Danger Room T-shirt.” USCYBERCOM might offer you a job.

    Wikipedia offline due to power outage

    July 4th, 2010

    Wikipedia was offline for nearly twelve hours today, starting about 11:00am EDT. According to Wikipedia’s Twitter feed:

    “Thanks for being patient, everyone. We’ve figured out the problem: power outage in our Florida data center. Slowly coming back online!”

    This is not the first time that Wikimedia has experienced problems caused by power outages. In March 2010, Wikipedia was also knocked offline globally:

    “Due to an overheating problem in our European data center many of our servers turned off to protect themselves. As this impacted all Wikipedia and other projects access from European users, we were forced to move all user traffic to our Florida cluster, for which we have a standard quick failover procedure in place, that changes our DNS entries. However, shortly after we did this failover switch, it turned out that this failover mechanism was now broken, causing the DNS resolution of Wikimedia sites to stop working globally. This problem was quickly resolved, but unfortunately it may take up to an hour before access is restored for everyone, due to caching effects.”

    According to a story in itnews:

    “The cluster is hosted in a co-location facility in Tampa, Florida, which has approximately 300 servers, a 350 Mbps connection, and supports up to 3,000 hits per second, or 150 million hits per day. Two other server clusters – knams in Amsterdam, Netherlands and yaseo, provided by Yahoo! in Seoul, South Korea – also provide hosting and bandwidth to serve users in various regions.”

    It looks like there are still failover problems. 🙁 We can watch the Wikimedia Technical blog for more information.