You are currently browsing the Markus Breitenbach weblog archives for the day January 8, 2007 7:58 am.
- Advertising (1)
- Artificial Intelligence (AI) (9)
- Classification (1)
- Coding / Programming (6)
- Cryptography (1)
- Data Mining (11)
- ewrt linux (2)
- Fixing Stuff (5)
- Machine Learning (19)
- Math (1)
- Politics (2)
- Psychology (3)
- Ramblings (19)
- Random (6)
- Security (11)
- Society (9)
- Sociology (3)
- spam (2)
- Statistics (9)
- July 12, 2008 4:41 pm: The cloud obscuring the scientific method
- June 22, 2008 5:05 pm: Debugging and Evaluating Predictive Models
- May 21, 2008 8:08 pm: Cult of the Amateur
- April 21, 2008 1:38 am: ART OF SEDUCTION: Not Pretty, Really
- March 25, 2008 2:25 am: "Internal Server Error" when converting phpBB v2 to phpBB v3
- March 6, 2008 1:29 am: Firewire and DRM
- February 28, 2008 10:46 pm: Using Psychological Domain Knowledge for the Netflix Challenge
- February 12, 2008 1:24 am: VPN Tunnels from within VMWare (Windows XP and GRE weirdness)
- February 2, 2008 5:59 pm: License Key Copy Protection
- January 8, 2008 8:34 pm: Registering Domains with Network Solutions
Blogroll
Useful Links
Archive for January 8, 2007 7:58 am
Sequential Sampling and Machine Learning
January 8, 2007 7:58 am by Markus.
In order to estimate an unknown quantity mu a common approach is to design an experiment that results in a random variable Z distributed within the interval [0,1]. The expectation E[Z]=μ can then be estimated by running this experiment independently, averaging the outcomes, and using Monte-Carlo techniques for the estimate. In (Dagum, Karp, Luby and Ross, SIAM Computing,1995) the AA algorithm (”Approximation Algorithm”) is introduced which, given epsilon and delta and independent experiments for the random variable Z, produces an estimate of the mean (or the true expectation) that is within the factor of 1+ε of μ with probability of success of at least 1-δ. Note that there are no distributional assumptions by the algorithm. This has a couple of applications in machine learning, for example in Bayesian Inference, Bayesian Networks and Boosting (Domingo and Watanabe, PAC-KDD, 2000).
The AA algorithm works in three steps. First, the stopping rule computes an initial estimate of the mean. Then, the variance is determined and, in the third step, additional samples are taken to approximate the expectation even further. A small improvement for the stopping rule in step one can be made as follows. The algorithm assumes a non-zero expectation and keeps sampling until the sum of the elements is larger than a constant determined by epsilon and delta (read the paper to see why that works). The problem is that the closer the elements are to zero, the more elements are needed.
Observe that the following holds for the mean:

With that one can improve the stopping rule as follows:

P.S.: To type Greek letters into Wordpress use the html named entities such as ε for ε. That took me forever …
Posted in Coding / Programming, Statistics, Machine Learning | Print | No Comments »