You are currently browsing the Markus Breitenbach weblog archives for February, 2010.
- Advertising (1)
- Artificial Intelligence (AI) (13)
- Classification (3)
- Clustering (1)
- Coding / Programming (8)
- Cryptography (1)
- Data Mining (19)
- Economy / Investing (1)
- ewrt linux (2)
- Fixing Stuff (8)
- Machine Learning (30)
- Math (2)
- Politics (3)
- Predictive Modeling (4)
- Psychology (3)
- Ramblings (26)
- Random (9)
- Security (15)
- Society (12)
- Sociology (4)
- spam (3)
- Statistics (15)
- July 11, 2010 8:56 pm: GraphLab & Parallel Machine Learning
- June 15, 2010 8:21 pm: PHP configuration using htaccess on 1and1 shared hosting
- February 28, 2010 12:21 pm: Energy efficient data mining algorithms
- February 16, 2010 11:56 pm: Alternative measures to the AUC for rare-event prognostic models
- January 26, 2010 9:54 pm: Spam Filtering by Learning a Pattern Language
- January 10, 2010 5:37 pm: Strong profiling is not mathematically optimal for discovering rare malfeasors (on rare event detection)
- November 13, 2009 12:27 am: Starcraft AI competition
- July 25, 2009 8:34 pm: Random characters in text mode -> graphics card
- June 7, 2009 5:04 pm: Programs stealing the input focus
- May 2, 2009 4:06 pm: Famous bugs in AI game engine caught on tape
Blogroll
Uncategorized
Useful Links
- July 2010
- June 2010
- February 2010
- January 2010
- November 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
- January 2009
- December 2008
- November 2008
- October 2008
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
- January 2008
- December 2007
- November 2007
- October 2007
- September 2007
- August 2007
- July 2007
- June 2007
- May 2007
- April 2007
- March 2007
- February 2007
- January 2007
- December 2006
- November 2006
- October 2006
- September 2006
- August 2006
Archive for February 2010
Energy efficient data mining algorithms
February 28, 2010 12:21 pm by Markus.
I was a bit amused to read about this new algorithm that IBM research developed and that was sold as “energy efficient” in their press-release. This is good marketing, because the average journalist and reader might not understand the impact of the improvement. It just sounds a lot better to be green and save energy than to improve computational complexity…
Posted in Data Mining | Print | No Comments »
Alternative measures to the AUC for rare-event prognostic models
February 16, 2010 11:56 pm by Markus.
How can one evaluate the performance of prognostic models in a meaningful way? This is a very basic and yet an interesting problem especially in the context of prediction of very rare events (base-rates <10%). How reliable is the model’s forecast? This is a good question and of particular importance when it matters - think criminal psychology where models forecast the likelihood of recidivism for criminally insane people (Quinsey 1980). There are a variety of ways to evaluate a model’s predictive performance on a hold out sample, and some are more meaningful than others. For example, when using error-rates one should keep in mind that they are only meaningful when you consider the base-rate of your classes and the trivial classifier as well. Often this gets confusing when you are dealing with very imbalanced data sets or rare events. In this blog post, I’ll summarize a few techniques and alternative evaluation methods for predictive models that are particularly useful when dealing with rare events or low base-rates in general.
The Receiver Operator Characteristic is a graphical measure that plots the true versus false positive rates such that the user can decide where to cut for making the final classification decision. In order to summarize the performance of the graph in a single, reportable number, the area under the curve (AUC) is generally used.
Posted in Statistics, Classification, Data Mining, Machine Learning | Print | 2 Comments »