You are currently browsing the Markus Breitenbach weblog archives for August, 2007.
Archive for August 2007
Advertising and Data Mining
August 30, 2007 1:44 pm by Markus.
Lately I’ve become fascinated with the field of advertising and marketing. Honestly, I never paid much attention to ads before, as most of them were either not interesting to me or just plain horrible. I just finished reading a good book about choosing headlines and about direct marketing, which describes how advertisers actually test what gets the most responses (“Tested Advertising Methods”, John Caples).
There’s obviously a lot that could be done here with machine learning. For example, a predictor could learn from previous ads (possibly across ads from various companies) how successful they were and then predict response rates for new ads. Google and Yahoo probably do something like this already …
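To make the idea a bit more concrete, here is a minimal sketch of such a predictor. It is only a crude baseline under my own assumptions, not how any ad platform works: each headline word gets a smoothed response rate from past campaigns, and a new headline is scored by averaging the rates of the words it shares with old ads. The function names, the prior values and the toy campaigns are all made up for illustration.

```python
from collections import defaultdict


def fit_word_rates(ads, prior_rate=0.02, prior_weight=1000):
    """Estimate a smoothed response rate per headline word.

    ads: list of (headline, pieces_mailed, responses) tuples.
    prior_rate and prior_weight are assumed values that pull the
    estimate for rarely seen words toward a default rate.
    """
    mailed = defaultdict(int)
    responses = defaultdict(int)
    for headline, n_mailed, n_responses in ads:
        # Count each word once per ad, however often it appears.
        for word in set(headline.lower().split()):
            mailed[word] += n_mailed
            responses[word] += n_responses
    return {
        word: (responses[word] + prior_rate * prior_weight)
        / (mailed[word] + prior_weight)
        for word in mailed
    }


def predict_rate(headline, word_rates, prior_rate=0.02):
    """Average the rates of known words; fall back to the prior."""
    known = [
        word_rates[word]
        for word in set(headline.lower().split())
        if word in word_rates
    ]
    if not known:
        return prior_rate
    return sum(known) / len(known)


# Hypothetical past campaigns, invented for this example.
past_ads = [
    ("learn piano at home", 1000, 50),
    ("buy our product now", 1000, 5),
]
rates = fit_word_rates(past_ads)
good = predict_rate("learn guitar at home", rates)
bad = predict_rate("buy now", rates)
unseen = predict_rate("something else entirely", rates)
```

A real system would need far more than word counts (audience, placement, season), which is exactly the modeling question the next post worries about.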
Posted in Advertising, Machine Learning
Human Intuition vs. Statistical Models
August 25, 2007 10:01 pm by Markus.
I just came across a very interesting book announcement for “Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart” by Ian Ayres, a professor at Yale Law School and an econometrician. In the book (I haven’t read it yet, but I will) the author argues that intuition is losing ground to statistical methods and data mining. According to the Amazon abstract, he gives examples from the airline industry, medical diagnostics and even online dating services showing that a statistical model will outperform human intuition.
That machines can outperform human judgement has been known for quite some time. For example, in the field of psychology the diagnosis of mental disorders is more or less standardized by the DSM. There was a very interesting meta-analysis that showed that a mechanical predictor outperformed the human psychologist on average, and only rarely did worse. To be specific: Grove, W.M., Zald, D.H., Hallberg, A.M., Lebow, B., Snitz, E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12, 19–30. To quote from the abstract: “On average, mechanical-prediction techniques were about 10% more accurate than clinical predictions. Depending on the specific analysis, mechanical prediction substantially outperformed clinical prediction in 33%-47% of studies examined. Although clinical predictions were often as accurate as mechanical predictions, in only a few studies (6%-16%) were they substantially more accurate. Superiority for mechanical-prediction techniques was consistent, regardless of the judgment task, type of judges, judges’ amounts of experience, or the types of data being combined.”
I’m a little bit skeptical about using data crunching to decide important questions (as in life-and-death questions). In general it seems like a good idea, but it always comes down to how you model the data and how you model the question to be answered. In many cases this might be obvious, in others not so much. The art then lies in modeling the data, not in applying the algorithm or technique.
It reminds me a bit of a class on formal program verification I took back in Darmstadt. Stefan, the TA of the class, and I had an argument about the practicality of program verification. He gave the Unix find utility as an example: you can show, more or less easily, that the program will terminate while enumerating all the files in all the directories in the system, because find can be nicely modeled with a well-founded relation. I objected that I could set a symbolic link to an upper-level directory (which is why find does not follow symbolic links by default) and make find go in circles. Stefan conceded, “Oh well, I guess then the model was wrong…”. Similar things have happened in cryptography, for example, where a finite-state model (sorry, I lost the citation somewhere; I’m not quite sure if that was the Usenix paper from the Stanford guys I read or something else) showed that the SSL protocol (Secure Sockets Layer) is secure. Later the protocol was broken nonetheless (Schneier, Bruce; Wagner, David; Analysis of the SSL 3.0 protocol).
I think that with the wrong model you can show a lot of good things about anything. Once you abstract from the real world and build a model, you might have ignored the one small feature that matters most. Maybe it is time for a set of best practices in data modeling and data mining (there are already some books out there for some specific domains) …
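The find argument above can be sketched in a few lines. This is my own illustration, not find’s actual implementation, and walk_files is a made-up name. As long as symbolic links are not followed, every step descends strictly into a finite tree, which is the well-founded relation behind the termination proof. Once links are followed the directory structure is a graph that may contain cycles, so the model has to change: here the walk remembers the real path of every directory it has already visited.

```python
import os


def walk_files(root, follow_links=False):
    """Yield non-directory entries below root.

    Without follow_links, symlinks are reported but never entered,
    so the walk only ever descends a finite tree. With follow_links,
    the set of visited real paths breaks any symlink cycle.
    """
    seen = set()  # real paths of directories already visited
    stack = [root]
    while stack:
        directory = stack.pop()
        real = os.path.realpath(directory)
        if real in seen:
            continue  # reached again through a link: skip it
        seen.add(real)
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=follow_links):
                    stack.append(entry.path)
                else:
                    yield entry.path
```

Dropping the seen check while passing follow_links=True reproduces my objection: a link back to an upper-level directory keeps the stack from ever emptying, until the paths grow too long for the operating system to resolve.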
Posted in Data Mining, Machine Learning, Ramblings