You are currently browsing the Markus Breitenbach weblog archives for the day August 25, 2007 10:01 pm.
Human Intuition vs. Statistical Models
August 25, 2007 10:01 pm by Markus.
I just came across a very interesting book announcement for “Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart” by Ian Ayres, a professor at Yale Law School and an econometrician. In the book (I haven’t read it yet, but I will) the author argues that intuition is losing ground to statistical methods and data mining. According to the Amazon abstract, he gives examples from the airline industry, medical diagnostics and even online dating services in which a statistical model outperforms human intuition.
That machines can outperform human judgement has been known for quite some time. For example, in the field of psychology the diagnosis of mental disorders is more or less standardized by the DSM. There was a very interesting meta-analysis showing that mechanical predictors were, on average, more accurate than human clinicians. To be specific: Grove, W.M., Zald, D.H., Hallberg, A.M., Lebow, B., Snitz, E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19–30. To quote from the abstract: “On average, mechanical-prediction techniques were about 10% more accurate than clinical predictions. Depending on the specific analysis, mechanical prediction substantially outperformed clinical prediction in 33%-47% of studies examined. Although clinical predictions were often as accurate as mechanical predictions, in only a few studies (6%-16%) were they substantially more accurate. Superiority for mechanical-prediction techniques was consistent, regardless of the judgment task, type of judges, judges’ amounts of experience, or the types of data being combined.”
I’m a little bit skeptical about using data crunching to decide important questions (as in life-and-death questions). In general it seems like a good idea, but it always comes down to how you model the data and how you model the question to be answered. In many cases this might be obvious, in others not so much. The art is then in modeling the data, not in applying the algorithm or technique. It reminds me a bit of a class on formal program verification I took back in Darmstadt. Stefan, the TA of the class, and I had an argument about the practicability of program verification. He gave the unix find utility as an example for which you can show - more or less - easily that the program will terminate while enumerating all the files in all the directories in the system, and how find can be nicely modeled with a well-founded relation to show the termination of the algorithm. I objected that I could create a symbolic link to an upper-level directory (which is why find does not follow them by default) and make find go in circles. Stefan conceded, “Oh well, I guess then the model was wrong…”. Similar things have happened in other fields, e.g. cryptography, where a finite-state model (sorry, lost the citation somewhere; I’m not quite sure if that was the Usenix paper from the Stanford guys I read or something else) showed that the SSL protocol (Secure Socket Layer) is secure. Later the protocol was broken nonetheless (Schneier, Bruce; Wagner, David; Analysis of the SSL 3.0 protocol).
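To make the find argument a bit more concrete, here is a rough Python sketch. It is not how find is actually implemented; list_tree, build_demo_tree, SuspectedCycle and the max_depth budget are names and parameters made up for this illustration. The idea is that on a pure directory tree every recursive call descends into a strictly smaller subtree, which is the well-founded relation behind the termination argument. Once symbolic links are followed, the structure is a graph rather than a tree and that argument no longer applies, unless the walk remembers which directories it has already entered.

```python
import os
import tempfile


class SuspectedCycle(RuntimeError):
    """Raised when the walk goes deeper than the (hypothetical) depth budget."""


def list_tree(root, follow_links=False, detect_cycles=False, max_depth=25):
    """Return all paths below root, loosely in the spirit of `find root`."""
    found = []
    visited = set()  # (device, inode) pairs of directories already entered

    def visit(directory, depth):
        if depth > max_depth:
            # Stand-in for "this would run forever": stop instead of hanging.
            raise SuspectedCycle(directory)
        if detect_cycles:
            info = os.stat(directory)  # follows links, so a link matches its target
            key = (info.st_dev, info.st_ino)
            if key in visited:
                return
            visited.add(key)
        for entry in sorted(os.scandir(directory), key=lambda e: e.name):
            found.append(entry.path)
            if entry.is_dir(follow_symlinks=follow_links):
                visit(entry.path, depth + 1)

    visit(root, 0)
    return found


def build_demo_tree(root):
    """Create root/sub/file.txt and root/sub/loop, a symlink back to root."""
    sub = os.path.join(root, "sub")
    os.mkdir(sub)
    with open(os.path.join(sub, "file.txt"), "w") as handle:
        handle.write("data\n")
    os.symlink(root, os.path.join(sub, "loop"), target_is_directory=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        # Links not followed: the tree model holds and the walk terminates.
        print(list_tree(root))
        # Links followed, no bookkeeping: the model is wrong for this input.
        try:
            list_tree(root, follow_links=True)
        except SuspectedCycle as error:
            print("walk did not terminate on its own:", error)
        # Links followed, visited directories remembered: terminates again.
        print(list_tree(root, follow_links=True, detect_cycles=True))
```

The depth budget is only there so the demo stops instead of hanging. The interesting part is that the same code is "provably terminating" or not depending on whether the tree assumption in the model matches the file system it actually runs on.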
I think that with the wrong model you can show a lot of good things about anything. Once you abstract from the real world and build a model, you might just have left out the one small feature that matters most. Maybe it is time for a set of best practices in data modeling and data mining (there are already some books out there for some specific domains) …
Posted in Data Mining, Machine Learning, Ramblings