How can one evaluate the performance of prognostic models in a meaningful way? This is a very basic and yet an interesting problem especially in the context of prediction of very rare events (base-rates <10%). How reliable is the model’s forecast? This is a good question and of particular importance when it matters – think criminal psychology where models forecast the likelihood of recidivism for criminally insane people (Quinsey 1980). There are a variety of ways to evaluate a model’s predictive performance on a hold out sample, and some are more meaningful than others. For example, when using error-rates one should keep in mind that they are only meaningful when you consider the base-rate of your classes and the trivial classifier as well. Often this gets confusing when you are dealing with very imbalanced data sets or rare events. In this blog post, I’ll summarize a few techniques and alternative evaluation methods for predictive models that are particularly useful when dealing with rare events or low base-rates in general.
The Receiver Operator Characteristic is a graphical measure that plots the true versus false positive rates such that the user can decide where to cut for making the final classification decision. In order to summarize the performance of the graph in a single, reportable number, the area under the curve (AUC) is generally used.