How the Srizbi botnet escaped destruction to spam again

November 30th, 2008

Just like Freddy Krueger, botnets are hard to kill.

In a series of posts on his Security Fix blog, Brian Krebs provides a good explanation of how the Srizbi botnet was able to come back to life after being killed (we thought!) earlier this month.

“The botnet Srizbi was knocked offline Nov. 11 along with Web-hosting firm McColo, which Internet security experts say hosted machines that controlled the flow of 75 percent of the world’s spam. One security firm, FireEye, thought it had found a way to prevent the botnet from coming back online by registering domain names it thought Srizbi was likely to target. But when that approach became too costly for the firm, they had to abandon their efforts.”

In an example of good distributed programming design, the botnet had a backup plan in case its control servers were taken down.

“The malware contained a mathematical algorithm that generates a random but unique Web site domain name that the bots would be instructed to check for new instructions and software updates from its authors. Shortly after McColo was taken offline, researchers at FireEye said they deciphered the instructions that told computers infected with Srizbi which domains to seek out. FireEye researchers thought this presented a unique opportunity: If they could figure out what those rescue domains would be going forward, anyone could register or otherwise set aside those domains to prevent the Srizbi authors from regaining control over their massive herd of infected machines.”
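The idea of computing rendezvous domains from a shared rule can be illustrated with a minimal sketch. This is not Srizbi's actual algorithm, which the quoted article does not describe in detail; the seed string, the SHA-256 hashing step, the `.com` suffix and the function names are all assumptions made for illustration. It shows why anyone who knows the rule can list future domains, and why registering all of them in advance gets expensive quickly.

```python
import hashlib
from datetime import date, timedelta


def candidate_domains(day, count=4, seed="example-seed", tld=".com"):
    """Derive `count` pseudo-random domain names from a calendar date.

    Hypothetical scheme: hash the seed, the date and an index, then map
    the first 8 digest bytes to lowercase letters.
    """
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode("utf-8")
        digest = hashlib.sha256(material).digest()
        label = "".join(chr(ord("a") + b % 26) for b in digest[:8])
        domains.append(label + tld)
    return domains


def domains_to_preregister(start, days, count=4):
    """List every domain a defender would need to claim for `days` days."""
    names = []
    for offset in range(days):
        names.extend(candidate_domains(start + timedelta(days=offset), count))
    return names


if __name__ == "__main__":
    # Both the bots and a researcher who knows the rule get the same list.
    for name in candidate_domains(date(2008, 11, 26)):
        print(name)
    print(len(domains_to_preregister(date(2008, 11, 26), days=30)), "domains in 30 days")
```

Under these assumed parameters, covering a single month already means claiming 120 names, and the list keeps growing for as long as the bots stay infected.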

Unfortunately, FireEye did not have the resources to carry out its plan and was forced to abandon it, but not before seeking help from other companies and organizations with deeper pockets.

“A week ago, FireEye researcher Lanstein said they were looking for someone else to register the domain names that the Srizbi bots might try to contact to revive themselves. He said they approached other companies such as VeriSign Inc. and Microsoft Corp. After FireEye abandoned its efforts, some other members of the computer security community said they reached out for help from the United States Computer Emergency Readiness Team, or US-CERT, a partnership between the Department of Homeland Security and the private sector to combat cybersecurity threats.”

File this one under opportunity, lost.

iPhone Linux

November 29th, 2008

Quoted without comment or speculation, from the Linux on the iPhone blog.

“I’m pleased to announce that the Linux 2.6 kernel has been ported to Apple’s iPhone platform, with support for the first and second generation iPhones as well as the first generation iPod touch. This is a rough first draft of the port, and many drivers are still missing, but it’s enough that a real alternative operating system is running on the iPhone.”

Jon Kleinberg named as one of 20 Best Brains Under 40 by Discover Magazine

November 28th, 2008

Discover magazine has named Jon Kleinberg as one of the 20 Best Brains Under 40 for his work on HITS and social networks.

“In the mid-1990s a Web search for, say, “DISCOVER magazine” meant wading through thousands of results presented in a very imperfect order. Then, in 1996, 24-year-old Jon Kleinberg developed an algorithm that revolutionized Web search. That is why today, that same search lists this magazine’s home page first. Kleinberg, now 37, created the Hyperlink-Induced Topic Search algorithm, which estimates a Web page’s value in both authority (quality of content and endorsement by other pages) and hub (whether it links to good pages).

Kleinberg continues to combine computer science, data analysis, and sociological research to help create better tools that link social networking sites. He envisions an increase in how we can see information move through space over time, in what he calls geographic hot spots on the Web, based on the interests of a particular region.

Our social network links and friendships depend on these geographic hot spots, Kleinberg says, which makes searching easier by “taking into account not just who and when, but where.” He is now studying how word-of-mouth phenomena like fads and rumors flow through groups of people, hoping to apply this knowledge to processes such as political mobilization.”
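The hub and authority scores mentioned in the quote reinforce each other: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. A minimal sketch of that iteration follows. The tiny link graph, the page names and the fixed iteration count are assumptions for illustration, and the query-specific subgraph selection that HITS starts from is left out.

```python
def hits(links, iterations=50):
    """Compute hub and authority scores by repeated mutual updates.

    `links` maps each page to the list of pages it links to.
    Returns two dicts: (hub_scores, authority_scores).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]

        # Hub score of a page: sum of the authority scores of pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}

        # Normalize both vectors so the scores stay bounded.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}

    return hub, auth


if __name__ == "__main__":
    # Hypothetical graph: a portal and a blog both point at a home page.
    graph = {"portal": ["home", "news"], "blog": ["home"], "home": []}
    hub_scores, auth_scores = hits(graph)
    print("best authority:", max(auth_scores, key=auth_scores.get))
    print("best hub:", max(hub_scores, key=hub_scores.get))
```

In this toy graph the home page ends up as the top authority because both other pages endorse it, while the portal is the top hub because it links to the most authoritative pages.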

Practical research results we can all use

November 28th, 2008

Here are some results of practical research from which many of us can immediately benefit.

Robin Goldstein et al., Do more expensive wines taste better? Evidence from a large sample of blind tastings, American Association of Wine Economists, Working Papers, April 2008.

Individuals who are unaware of the price do not derive more enjoyment from more expensive wine. In a sample of more than 6,000 blind tastings, we find that the correlation between price and overall rating is small and negative, suggesting that individuals on average enjoy more expensive wines slightly less. For individuals with wine training, however, we find indications of a positive relationship between price and enjoyment. Our results are robust to the inclusion of individual fixed effects, and are not driven by outliers: when omitting the top and bottom deciles of the price distribution, our qualitative results are strengthened, and the statistical significance is improved further. Our results indicate that both the prices of wines and wine recommendations by experts may be poor guides for non-expert wine consumers.

Jonah Lehrer compares this to an earlier Stanford study. One possible takeaway point: avoid wine training, since it will only diminish your ability to enjoy wine. Another possible lesson to be learned: maybe going into engineering wasn’t such a great idea, after all. (spotted on daily dish)

Neologism Web-based RDFS vocabulary editor

November 27th, 2008

Neologism is a simple web-based RDF Schema vocabulary editor and publishing system under development at DERI. It looks like a great lightweight tool for developing Semantic Web vocabularies and publishing them on the Web following current best practices. Its goal is to “dramatically reduce the time required to create, publish and modify vocabularies for the Semantic Web.” The system is not yet open for use, but there is a good online Neologism demo as well as a screencast of how to use it.

Google to lay off 10,000 workers

November 24th, 2008

We’re looking at some tough times for companies that have been supported by ad revenues, like newspapers, magazines, broadcasters, and Google. Google?!? Yes, Google.

It is being reported that Google is set to lay off about one-third of its workforce.

“Google has been quietly laying off staff and up to 10,000 jobs could be on the chopping block according to sources. Since August, hundreds of employees have been laid off and there are reports that about 500 of them were recruiters for Google.

By law, Google is required to report layoffs publicly and with the SEC however, Google has managed to get around the legal requirement. In fact, one of the ways Google was able to meet Wall Street’s Q3 earnings expectations was by trimming “operational” expenses.

Google reports to the SEC that it has 20,123 employees but in reality it has 30,000. Why the discrepancy? Google classifies 10,000 of the employees as temporary operational expenses or “workers”. Google co-founder Sergey Brin said, “There is no question that the number (of workers) is too high”.”

According to this article, the bulk of these “temporary workers” have been working at Google for years and moved “from job to job every few months so that their status remains temporary”.

Update I 11/25: This note on CNet, Google cutting contractor workforce, says that Google announced its plans to trim its force of 10,000 contractors earlier in October, as reported by the SJ Mercury News.

Update II 11/25: Vint Cerf commented on Dave Farber’s IP mailing list that:

1. Google is still hiring but at a lower rate than before
2. for the past year, Google has been reducing the number of temporary staff and shifting jobs to employees.

Update III 11/25: Tim O’Reilly reports (also on IP) that WebGuild may not be the most reliable source of information on Google.

McNamee: Textual Representations for Corpus-Based Bilingual Retrieval, 9am Mon 11/24

November 20th, 2008

Paul McNamee will defend his dissertation on Textual Representations for Corpus-Based Bilingual Retrieval at 9:00am Monday 24 November 2008 in ITE 325B. His mentor is Charles Nicholas and the dissertation committee includes Tim Finin, James Mayfield (JHU), Sergei Nirenburg and Doug Oard (UMCP). Here is the abstract.

The traditional approach to information retrieval is based on using words as the indexing and search terms for documents. One part of this research investigates alternative methods for representing text, including a method based on overlapping sequences of characters called n-gram tokenization. N-grams are studied in depth and one notable finding is that they achieve a 20% improvement in retrieval effectiveness over words in certain situations.

The other focus of this research is improving retrieval performance when foreign language documents must be searched and translation is required. In this scenario bilingual dictionaries are often used to translate user queries; however even among the most commonly spoken languages, for which large bilingual lexicons exist, dictionary-based translation suffers from several significant problems. These include: difficulty handling proper names, which are often missing; issues related to morphological variation since entries, or query terms, may not be lemmatized; and, an inability to robustly handle multiword phrases, especially non-compositional expressions. These problems can be addressed when translation is accomplished using parallel collections, sets of documents available in more than one language. Using parallel texts enables statistical translation of character n-grams rather than words or stemmed words, and with this technique highly effective bilingual retrieval performance is obtained. Translation of multiword expressions is also explored.

In this dissertation I present an overview of the field of cross-language information retrieval and then introduce the foundational concepts in n-gram tokenization and corpus-based translation. Then monolingual and bilingual experiments on test sets in 13 languages are described. Analysis of these experiments gives insight into: the relative efficacy of various tokenization methods; reasons why n-grams are effective; the utility of automated relevance feedback, in both monolingual and bilingual contexts; the interplay between tokenization and translation; and, how translation resource selection and size influence bilingual retrieval.
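The abstract describes n-gram tokenization as indexing overlapping sequences of characters instead of words. A minimal sketch of the idea follows. The function name, the default length of 4, and the lowercasing and whitespace normalization are assumptions for illustration, not the dissertation's exact preprocessing.

```python
def char_ngrams(text, n=4):
    """Return the overlapping character n-grams of `text`, in order.

    Whitespace runs are collapsed to a single space and the text is
    lowercased first; n-grams may therefore span word boundaries.
    """
    normalized = " ".join(text.lower().split())
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("retrieval"))
    # Morphological variants share most of their n-grams, so they can
    # match each other without any stemming or lemmatization step.
    shared = set(char_ngrams("retrieval")) & set(char_ngrams("retrieve"))
    print(sorted(shared))
```

Because "retrieval" and "retrieve" share most of their 4-grams, a query containing one can still match documents containing the other, which hints at why n-grams cope with the morphological variation problems the abstract lists.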

Semantic Applications at age one

November 19th, 2008

After a year, Read/Write Web has revisited their review of 10 promising Semantic Web apps, producing 10 Semantic Apps to Watch – One Year Later.

“A lot can happen in one year on the Internet, so we thought we’d check back in with each of the 10 products and see how they’re progressing. What’s changed over the past year and what are these companies working on now? The products are, in no particular order: Freebase, Powerset, Twine, AdaptiveBlue, Hakia, Talis, TrueKnowledge, TripIt, Calais (was ClearForest), Spock.”

They plan to publish a completely new list of Semantic applications to watch as the next post in the series and ask people to leave suggestions in the post comments.

Maybe Read/Write Web will do like Michael Apted’s 7up series and report back to us on how the systems are doing each year, which I guess may be like seven Web-years.

3scale provides infrastructure for the programmable web

November 19th, 2008

3scale Networks is a Barcelona-based startup that is trying to fill a critical gap in helping organizations manage web services as a business or at least in a business-like manner.

“3scale provides a new generation of infrastructure for the web – point and click contract management, monitoring and billing for Web Services. The 3scale platform makes it easy for providers to launch their APIs, manage user access and, if desired, collect usage fees. Service users can discover services they need and sign up for plans on offer.” (source)

They have been operating a private beta system for a few months and just announced that their public beta is open. Currently signing up with 3scale and registering services is free and the only costs are commissions on transaction fees your service charges. Once you’ve registered a service, you can install one of several 3scale plugins for your programming environment to get your service talking to 3scale and configure one or more usage plans. 3scale uses Amazon’s EC2, S3 and Cloud Computing services.

3scale’s co-founder and technical lead is Steve Willmott, who we worked with for many years when he was an academic doing research on multiagent systems. Several months ago he invited us to add Swoogle’s web service to 3scale’s private beta. We were pleased with how easy it was and look forward to exploring how else to use 3scale.

A story in yesterday’s Washington Post, Manage Your API Infrastructure With 3scale Networks, has some more information.

Gladwell: 10,000 hours to success

November 16th, 2008

The Guardian has an extract, A gift or hard graft?, from Malcolm Gladwell’s new book, Outliers: The Story Of Success, due out later this month. The piece introduces the idea that a key to becoming extraordinarily successful in a field is achieving early expertise and that to become an expert in a discipline requires on the order of 10,000 hours of practice. The 10K figure comes from the research of Anders Ericsson who in the early 1990s studied violinists at the Berlin Academy of Music.

“The curious thing about Ericsson’s study is that he and his colleagues couldn’t find any “naturals” – musicians who could float effortlessly to the top while practising a fraction of the time that their peers did. Nor could they find “grinds”, people who worked harder than everyone else and yet just didn’t have what it takes to break into the top ranks. Their research suggested that once you have enough ability to get into a top music school, the thing that distinguishes one performer from another is how hard he or she works. That’s it. What’s more, the people at the very top don’t just work much harder than everyone else. They work much, much harder.”

The extract focuses on some of the most successful people in the computer industry (Bill Joy, Bill Gates, Steve Jobs, and others) and argues that another part of their success was being born at the right time, in 1954 or 1955. This made them about 20 years old when the first personal computers became available.

I’m going to seize on this as yet another personal excuse — I was born a half decade too early.

Reuters Calais to support Semantic Web Linked Data in next release

November 14th, 2008

Thomson Reuters announced on their blog (Life in the Linked Data Cloud: Calais Release 4) that their next release of the Calais web-based information extraction services will support linked data.

“In that release we’ll go beyond the ability to extract semantic data from your content. We will link that extracted semantic data to datasets from dozens of other information sources, from Wikipedia to Freebase to the CIA World Fact Book. In short – instead of being limited to the contents of the document you’re processing, you’ll be able to develop solutions that leverage a large and rapidly growing information asset: the Linked Data Cloud.”

The new capabilities will be available in release 4, which is expected out on 9 January 2009.

The change is based on Calais returning de-referenceable URIs for the entities it finds. Accessing those URIs will produce RDF with links to corresponding entities in DBpedia, Freebase and other sources of “Semantic Web” data. It will be very interesting to see how well their system does at mapping document entities (e.g., “secretary Rice”) to entities in the LOD cloud, such as the corresponding DBpedia resource. Accessing such a resource URI with a request for content type application/rdf+xml returns RDF assertions extracted by DBpedia from Wikipedia.
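Dereferencing a Linked Data URI with a request for RDF comes down to sending an HTTP Accept header of application/rdf+xml. The sketch below uses only Python's standard library. The example.org URI is a placeholder rather than a real Calais or DBpedia identifier, and the function names are assumptions for illustration.

```python
from urllib.request import Request, urlopen

RDF_XML = "application/rdf+xml"


def rdf_request(uri):
    """Build a GET request that asks the server for RDF/XML."""
    return Request(uri, headers={"Accept": RDF_XML})


def fetch_rdf(uri, timeout=10):
    """Dereference `uri` and return the response body as text.

    urlopen follows redirects, which Linked Data servers commonly use
    to point from an entity URI to the document describing it.
    """
    with urlopen(rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder URI: substitute an entity URI returned by the service.
    request = rdf_request("http://example.org/resource/Some_Entity")
    print(request.get_method(), request.full_url)
    print("Accept:", request.get_header("Accept"))
```

Building the request separately from sending it keeps the content negotiation step visible and lets it be checked without any network access.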

Malcolm Gladwell (Geek Pop Star) on Outliers

November 12th, 2008

New York magazine has an article (Geek Pop Star) on Malcolm Gladwell whose new book, Outliers: The Story of Success, is due out later this fall.

“Malcolm Gladwell’s elegant and wildly popular theories about modern life have turned his name into an adjective—Gladwellian! But in his new book, he seeks to undercut the cult of success, including his own, by explaining how little control we have over it.”

His book explains why I never became a hockey star: I was born too late in the year. A disproportionate number of top Canadian hockey players are born in the first half of the year. Gladwell’s explanation is that the cut-off for joining a junior hockey league is that you must be 10 years old by January 1. So if you were born on January 2nd, you will start playing with the advantage of being older, larger and stronger than your peers. I’m not sure that my August birthday explains my own poor skating skills, though.

This quote from the article addresses why Bill Gates did so well.

“Or take the case of Bill Gates. Gladwell cites a body of research finding that the “magic number for true expertise” is 10,000 hours of practice. “Practice isn’t the thing you do once you’re good,” Gladwell writes. “It’s the thing you do that makes you good.” Gladwell shows how Gates accumulated his 10,000 hours while in middle and high school in Seattle thanks to a series of nine incredibly fortunate opportunities—ranging from the fact that his private school had a computer club with access to (and money for) a sophisticated computer, to his childhood home’s proximity to the University of Washington, where he had access to an even more sophisticated computer. “By the time Gates dropped out of Harvard after his sophomore year to try his hand at his own computer software company,” Gladwell writes, “he’d been programming practically nonstop for seven consecutive years. He was way past 10,000 hours.” Yes, Gates is obviously brilliant, Gladwell concludes, but without the lucky breaks he had as a kid, he never could have had the opportunity to fulfill the true potential of that brilliance. How many similarly brilliant people never get that opportunity?”

I guess I spent my own 10,000 hours hacking Lisp too late in life.