Archive for July, 2008

The cloud obscuring the scientific method

Saturday, July 12th, 2008

“All models are wrong, and increasingly you can succeed without them” — George Box
Sometimes…” — Me

In a Wired article about the Peta-byte age of data processing the author claimed that given the enormous amounts of data and the patterns found by data mining we are less and less dependent on scientific theory. This has been strongly disputed (see Why the cloud cannot obscure the Scientific Method) as the author simply ignores the fact that all the patterns that were found are not necessarily exploitable – finding a group of genes that interact is a first step, but won’t cure cancer. However, in machine translation or placing advertising online one can succeed with little to no domain knowledge. That is, once somebody comes up with the right features to use (see Choosing the right features for Data Mining).

What would be interesting to develop, however, is a “meta-learning” algorithm that can abstract from simpler models and learn e.g. a differential equation. For example, lets take data from several hundred Physics experiments about heat-distribution conducted on different surfaces etc. We can probably learn a regression model for one particular experiment which could predict how the heat will distribute given the parameters of the experiment (material, surface etc.). The meta-learning algorithm would then look at these models and somehow come up with the heat-equation. That would be something…