You are currently browsing the Markus Breitenbach weblog archives for the day March 11, 2007 11:59 pm.
Comment Spam and more Spam …
March 11, 2007 11:59 pm by Markus.
When I started my blog I was already aware of the comment spam problem, so I enabled a WordPress plugin to prevent it (“did you pass math”). The other day a friend complained that when he wanted to comment on something and forgot to fill out the captcha field, his comment got lost (and pushing the back button made his browser lose everything he had typed). Then, reading through the raw Apache logs, I saw somebody trying to post a comment and apparently not succeeding. So I turned the plugin off, and within a day I had 8 spam comments on my blog (which does not have a high PageRank and uses nofollow links; what’s the gain?)… So I turned it back on, and I’ll keep it turned on. There!
Spam is an interesting problem, because you have an “adversary” with a lot of resources who will do whatever it takes to get your attention: an email in your inbox or a comment with links on your blog. The more filters we build, even with machine learning, the more sophisticated the spammers become. Spam will probably be a driving force for classification research for some time to come. However, machine learning and content filters are very expensive in CPU time and do not scale very well, because every single message has to be scanned. Sander told me that the email server at their institute had a backlog of 40 gigabytes, i.e. 40 gigabytes of email sitting in the spool waiting to be scanned for spam and viruses. That this server was serving only about 50 users, and that 99% of the email in the spool was probably spam, illustrates the problem. Currently (in my opinion) mechanisms like greylisting are a better solution, simply because they scale better: they exploit “implementation issues” of the spam software and don’t require a CPU-intensive scan of every email. That is, until the next generation of spam bots adapts to those measures. Build a better spam filter and somebody will build better spam.
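To make the scaling argument concrete, here is a minimal sketch of the greylisting idea in Python. It is not taken from any particular mail server: the `Greylist` class, the `check` method, the 300-second `MIN_RETRY_DELAY` and the `"TEMPFAIL"`/`"ACCEPT"` return values are all made up for illustration. The assumption is the usual one behind greylisting: the first delivery attempt from an unknown (client IP, sender, recipient) triplet gets a temporary failure, a well-behaved mail server retries later, and much spam software never does.

```python
from dataclasses import dataclass, field

# Hypothetical value: how long a new triplet must wait before a retry is accepted.
MIN_RETRY_DELAY = 300  # seconds


@dataclass
class Greylist:
    # Maps (client_ip, sender, recipient) -> time of the first delivery attempt.
    first_seen: dict = field(default_factory=dict)

    def check(self, client_ip: str, sender: str, recipient: str, now: float) -> str:
        """Decide on one delivery attempt using only a dictionary lookup."""
        key = (client_ip, sender.lower(), recipient.lower())
        seen = self.first_seen.get(key)
        if seen is None:
            # Unknown triplet: remember it and ask the client to try again later.
            self.first_seen[key] = now
            return "TEMPFAIL"
        if now - seen < MIN_RETRY_DELAY:
            # Retried too quickly: keep deferring.
            return "TEMPFAIL"
        # The client came back after the delay, like a normal mail server would.
        return "ACCEPT"


greylist = Greylist()
triplet = ("192.0.2.1", "alice@example.org", "bob@example.net")
print(greylist.check(*triplet, now=0))    # first attempt is deferred
print(greylist.check(*triplet, now=600))  # retry after the delay is accepted
```

The point of the sketch is the cost per message: one lookup keyed on envelope data, with no look at the message body at all. A real setup would also expire old entries and would still want a content filter behind it for whatever gets through.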
Posted in Ramblings