You are currently browsing the archives for the Math category.
| M | T | W | T | F | S | S |
|---|---|---|---|---|---|---|
| « Feb | ||||||
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 22 | 23 | 24 | 25 | 26 | 27 | 28 |
| 29 | 30 | 31 | ||||
- Advertising (1)
- Artificial Intelligence (AI) (13)
- Classification (3)
- Clustering (1)
- Coding / Programming (8)
- Cryptography (1)
- Data Mining (18)
- Economy / Investing (1)
- ewrt linux (2)
- Fixing Stuff (7)
- Machine Learning (29)
- Math (2)
- Politics (3)
- Predictive Modeling (4)
- Psychology (3)
- Ramblings (26)
- Random (9)
- Security (15)
- Society (12)
- Sociology (4)
- spam (3)
- Statistics (15)
- February 28, 2010 12:21 pm: Energy efficient data mining algorithms
- February 16, 2010 11:56 pm: Alternative measures to the AUC for rare-event prognostic models
- January 26, 2010 9:54 pm: Spam Filtering by Learning a Pattern Language
- January 10, 2010 5:37 pm: Strong profiling is not mathematically optimal for discovering rare malfeasors (on rare event detection)
- November 13, 2009 12:27 am: Starcraft AI competition
- July 25, 2009 8:34 pm: Random characters in text mode -> graphics card
- June 7, 2009 5:04 pm: Programs stealing the input focus
- May 2, 2009 4:06 pm: Famous bugs in AI game engine caught on tape
- April 16, 2009 2:27 am: Vundo?
- April 8, 2009 12:13 am: Filler items for Amazon Super Saver shipping
Blogroll
Uncategorized
Useful Links
- February 2010
- January 2010
- November 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
- January 2009
- December 2008
- November 2008
- October 2008
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
- January 2008
- December 2007
- November 2007
- October 2007
- September 2007
- August 2007
- July 2007
- June 2007
- May 2007
- April 2007
- March 2007
- February 2007
- January 2007
- December 2006
- November 2006
- October 2006
- September 2006
- August 2006
Archive for the Math Category
Strong profiling is not mathematically optimal for discovering rare malfeasors (on rare event detection)
January 10, 2010 5:37 pm by Markus.
Just in time for the latest Christmas terror scare, I came across an interesting paper: “Strong profiling is not mathematically optimal for discovering rare malfeasors” (William H. Press; PNAS 106(6), p. 1716-1719 www.pnas.org/cgi/doi/10.1073/pnas.0813202106). In the paper, the author investigates whether profiling by nationality or ethnicity can be justified mathematically and tries to answer the question of how much screening must we do, on average, to catch the bad guys in the crowd. Rare events detection is hard as it is, and it’s interesting to see a look from the sampling perspective. It’s an interesting and short read. Long story short, it shows that using an indiscriminate feature like nationality or ethnicity is not optimal (as is any screening at least in proportion to a prior probability) and wastes resources.
Posted in Math, Society, Statistics | Print | 1 Comment »
EM-Clustering when double isn’t enough…
February 13, 2007 6:18 pm by Markus.
One of the more interesting developments in clustering (in my opinion) is clustering of data the is on a unit hypersphere. It sounds like some rare special case at first, but appears quite frequently in real life applications such as Spectral Embeddings in Spectral Clustering, some subdomains of Bio-Informatics data or text-data in TFIDF representation. The data can be analyzed as unit vectors on a d-dimensional hypersphere, or equivalently are directional in nature. Spectral clustering techniques generate embeddings in the normalization step that constitute an example of directional data and can result in different shapes on a hypersphere.
The first paper published that suggested a good clustering algorithm presented an Expectation Maximization (EM) algorithm for the von Mises-Fisher distribution (Banerjee et.al, JMLR (6) 2005). Avleen and myself started to work on extension for this that utilizes the Watson distribution, a distribution for directional data that has more modeling capability than the simple von Mises-Fisher distribution. We just published our results for the Watson EM clustering algorithm in the AISTATS 2007 conference to be held in March (Matlab code will be available soon).
One problem with both algorithms is that they require a high precission number representation in order to work well for high-dimensional problems such as bio-informatics data and text. Most prior work with directional data was limited to maybe 3 dimensional cases, and most Kummer-function approximations (another problem we had to address) work well only for the lower dimensional cases. In our AISTATS paper we only presented results for lower dimensional embeddings as we had some problems getting it to work for higher dimensional data (also, the root-solver that was involved is just incapable of handling larger problems). We have been working on a speedup with some success, but I have to say that it was mostly the numerical problems that gave us a hard time.
More and more Machine Learning techniques require a more careful consideration of numerical problems (Support Vector Machines, my manifold clustering algorithm etc.) and I run into numerical problems every other day. While trying to improve our Watson-EM algorithm I found out that Continued Fractions have many desirable properties such as the unique, finite representation of rational numbers. Numbers can be represented exact with no numerical error. In the EM algorithm we use them to approximate the Kummer function. Maybe a more exact number representation for fractions can be made out of this?
So I started looking and found in an tech-report that the number representation of continued fractions can be nicely implemented in Prolog. It also explains how to add up numbers in a Continued Fraction Representation and so on with an arbitrary precision.
I haven’t found any papers yet that suggest a suitable hardware implementation of Continued Fractions to replace the IEEE floating point numbers we use nowadays, but it can probably be done.
Posted in Math, Machine Learning, Artificial Intelligence (AI) | Print | No Comments »