<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Markus Breitenbach</title>
	<link>http://blog.markus-breitenbach.com</link>
	<description>AI, Data Mining, Machine Learning and other things</description>
	<pubDate>Sat, 28 Jan 2012 20:56:10 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.1</generator>
	<language>en</language>
			<item>
		<title>Will 2012 be the year of Big Data?</title>
		<link>http://blog.markus-breitenbach.com/2012/01/28/will-2012-be-the-year-of-big-data/</link>
		<comments>http://blog.markus-breitenbach.com/2012/01/28/will-2012-be-the-year-of-big-data/#comments</comments>
		<pubDate>Sat, 28 Jan 2012 20:56:10 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2012/01/28/will-2012-be-the-year-of-big-data/</guid>
		<description><![CDATA[Interesting view on that here.
]]></description>
			<content:encoded><![CDATA[<p>Interesting view on that <a href="http://www.fastcompany.com/1811441/why-big-data-won-t-make-you-smart-rich-or-pretty">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2012/01/28/will-2012-be-the-year-of-big-data/feed/</wfw:commentRss>
		</item>
		<item>
		<title>UK plans to exempt data mining from copyright laws</title>
		<link>http://blog.markus-breitenbach.com/2011/08/14/uk-plans-to-exempt-data-mining-from-copyright-laws/</link>
		<comments>http://blog.markus-breitenbach.com/2011/08/14/uk-plans-to-exempt-data-mining-from-copyright-laws/#comments</comments>
		<pubDate>Mon, 15 Aug 2011 02:41:04 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2011/08/14/uk-plans-to-exempt-data-mining-from-copyright-laws/</guid>
		<description><![CDATA[The UK is in the process of overhauling its overly stringent copyright laws. That&#8217;s an interesting development (see the Nature blog entry on the topic). One idea being discussed is to generally allow data and text mining without the copyright holder&#8217;s permission, which would usually be required for any kind of electronic processing.
]]></description>
			<content:encoded><![CDATA[<p>The UK is in the process of overhauling its overly stringent copyright laws. That&#8217;s an interesting development (see the <a href="http://blogs.nature.com/news/2011/08/data_mining_given_the_go_ahead.html" target="_blank">Nature blog entry on the topic</a>). One idea being discussed is to generally allow data and text mining without the copyright holder&#8217;s permission, which would usually be required for any kind of electronic processing.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2011/08/14/uk-plans-to-exempt-data-mining-from-copyright-laws/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Risk Assessment of Rare Events in adversarial Scenarios</title>
		<link>http://blog.markus-breitenbach.com/2011/06/21/risk-assessment-of-rare-events-in-adversarial-scenarios/</link>
		<comments>http://blog.markus-breitenbach.com/2011/06/21/risk-assessment-of-rare-events-in-adversarial-scenarios/#comments</comments>
		<pubDate>Tue, 21 Jun 2011 07:26:53 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Predictive Modeling]]></category>

		<category><![CDATA[Society]]></category>

		<category><![CDATA[Statistics]]></category>

		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2011/06/21/risk-assessment-of-rare-events-in-adversarial-scenarios/</guid>
		<description><![CDATA[The RAND Corporation just published an interesting paper exploring the benefits of using risk prediction to reduce the screening required at airports. You might have noticed various attempts to establish some kind of fast-lane or trusted traveler program. Obviously, this is a very sensitive topic and probably hard to get right. Screening certain groups of [...]]]></description>
			<content:encoded><![CDATA[<p>The RAND Corporation just published an interesting paper exploring the benefits of using risk prediction to reduce the screening required at airports. You might have noticed various attempts to establish some kind of fast-lane or trusted traveler program. Obviously, this is a very sensitive topic and probably hard to get right. Screening certain groups of the population more than others (&#8220;profiling&#8221;) is generally frowned upon and also not a good idea (see &#8220;<a href="http://blog.markus-breitenbach.com/2010/01/10/strong-profiling-is-not-mathematically-optimal-for-discovering-rare-malfeasors-on-rare-event-detection/">Strong profiling is not mathematically optimal for discovering rare malfeasors on rare event detection</a>&#8221;), but what hasn&#8217;t been examined much is identifying people who can be considered more &#8220;safe&#8221; than others. The paper explores that idea and shows that, even under the assumption that the bad guys will try to subvert this program, there can be benefits to implementing it. The paper is a bit sparse on mathematical details. Certainly an interesting idea, though.</p>
<p><strong>Paper</strong>: <a href="http://www.rand.org/content/dam/rand/pubs/working_papers/2011/RAND_WR855.pdf">Assessing the Security Benefits of a Trusted Traveler Program in the Presence of Attempted Attacker Exploitation and Compromise</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2011/06/21/risk-assessment-of-rare-events-in-adversarial-scenarios/feed/</wfw:commentRss>
		</item>
		<item>
		<title>How Kinect body tracking works and how Machine Learning helped</title>
		<link>http://blog.markus-breitenbach.com/2011/03/26/how-kinect-body-tracking-works-and-how-machine-learning-helped/</link>
		<comments>http://blog.markus-breitenbach.com/2011/03/26/how-kinect-body-tracking-works-and-how-machine-learning-helped/#comments</comments>
		<pubDate>Sat, 26 Mar 2011 23:57:17 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2011/03/26/how-kinect-body-tracking-works-and-how-machine-learning-helped/</guid>
		<description><![CDATA[Microsoft Research has published a paper explaining how the Kinect body tracking algorithm works [PDF]. This video shows how it all comes together. They trained a variation of Random Forests on pre-labeled images to identify the body parts from a normal RGB camera and a depth camera. The way they create many more [...]]]></description>
			<content:encoded><![CDATA[<p>Microsoft Research has published a <a href="http://research.microsoft.com/pubs/145347/BodyPartRecognition.pdf">paper explaining how the Kinect body tracking algorithm</a> works [PDF]. <a href="http://www.youtube.com/watch?v=HNkbG3KsY84">This video</a> shows how it all comes together. They trained a variation of Random Forests on pre-labeled images to identify the body parts from a normal RGB camera and a depth camera. The way they create many more training images from previously captured data is also interesting. The final system can run at 200 frames per second and it doesn&#8217;t need an initial calibration pose. Very interesting&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2011/03/26/how-kinect-body-tracking-works-and-how-machine-learning-helped/feed/</wfw:commentRss>
		</item>
		<item>
		<title>European Court of Justice ruling (indirectly) on what cannot be used in Insurance Risk Models</title>
		<link>http://blog.markus-breitenbach.com/2011/03/01/european-court-of-justice-ruling-on-what-can-be-used-in-risk-models/</link>
		<comments>http://blog.markus-breitenbach.com/2011/03/01/european-court-of-justice-ruling-on-what-can-be-used-in-risk-models/#comments</comments>
		<pubDate>Tue, 01 Mar 2011 15:58:56 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2011/03/01/european-court-of-justice-ruling-on-what-can-be-used-in-risk-models/</guid>
		<description><![CDATA[Insurers cannot charge different premiums to men and women because of their gender, the European Court of Justice (ECJ) has ruled.
I&#8217;m not sure what to think of it. For one, insurance is not about fairness; it&#8217;s about risk. An insurance company should be able to use any reliable information to determine the true risk [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.bbc.co.uk/news/business-12606610" target="_blank">Insurers cannot charge different premiums to men and women because of their gender, the European Court of Justice (ECJ) has ruled.</a></p>
<p>I&#8217;m not sure what to think of it. For one, insurance is not about fairness; it&#8217;s about risk. An insurance company should be able to use any reliable information to determine the true risk and price policies accordingly. From what I&#8217;ve read, it seems that young men cost ~50% more to insure than young women. This might not be true on an individual level, but it is true across the entire pool of people. On the other hand, if all reliable information could be used, then health insurance would naturally be more expensive for people with, e.g., known genetic disorders if it were purely about risk. That wouldn&#8217;t be fair either. Legislating what can and cannot be used in which circumstances will be a hard trade-off. In the intermediate term, this ruling will probably lead to models that use all sorts of other variables to work around it and still get an adequate risk score.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2011/03/01/european-court-of-justice-ruling-on-what-can-be-used-in-risk-models/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Mining of Massive Datasets</title>
		<link>http://blog.markus-breitenbach.com/2010/12/11/mining-of-massive-datasets/</link>
		<comments>http://blog.markus-breitenbach.com/2010/12/11/mining-of-massive-datasets/#comments</comments>
		<pubDate>Sun, 12 Dec 2010 00:35:17 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/12/11/mining-of-massive-datasets/</guid>
		<description><![CDATA[Anand Rajaraman and Jeff Ullman wrote a book called Mining of Massive Datasets that can be downloaded for free (PDF, 340 pages, 2 MB). It focuses, from an algorithmic point of view, on mining very large amounts of data that do not fit in main memory, as is frequently found on the web.
Edit: Fixed URL
]]></description>
			<content:encoded><![CDATA[<p>Anand Rajaraman and Jeff Ullman wrote a book called <a href="http://infolab.stanford.edu/%7Eullman/pub/book.pdf" target="_blank">Mining of Massive Datasets</a> that can be downloaded for free (PDF, 340 pages, 2 MB). It focuses, from an algorithmic point of view, on mining very large amounts of data that do not fit in main memory, as is frequently found on the web.</p>
<p><strong>Edit:</strong> <a href="http://infolab.stanford.edu/~ullman/mmds/book.pdf">Fixed URL</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/12/11/mining-of-massive-datasets/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Ideas on communicating risks and probabilities to the general public</title>
		<link>http://blog.markus-breitenbach.com/2010/12/04/ideas-on-communicating-risks-and-probabilities-to-the-general-public/</link>
		<comments>http://blog.markus-breitenbach.com/2010/12/04/ideas-on-communicating-risks-and-probabilities-to-the-general-public/#comments</comments>
		<pubDate>Sat, 04 Dec 2010 18:28:58 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/12/04/ideas-on-communicating-risks-and-probabilities-to-the-general-public/</guid>
		<description><![CDATA[I found an interesting article on how to communicate risks and probabilities to the public.
]]></description>
			<content:encoded><![CDATA[<p>I found an interesting article on <a href="http://www.decisionsciencenews.com/2010/12/03/some-ideas-on-communicating-risks-to-the-general-public/" target="_blank">how to communicate risks and probabilities to the public</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/12/04/ideas-on-communicating-risks-and-probabilities-to-the-general-public/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Birthday Paradox</title>
		<link>http://blog.markus-breitenbach.com/2010/10/17/birthday-paradox/</link>
		<comments>http://blog.markus-breitenbach.com/2010/10/17/birthday-paradox/#comments</comments>
		<pubDate>Sun, 17 Oct 2010 21:48:04 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/10/17/birthday-paradox/</guid>
		<description><![CDATA[Here&#8217;s an interesting real-world example of the Birthday Paradox: Lottery number combination repeats itself. Obligatory XKCD link.
]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s an interesting real-world example of the <a href="http://en.wikipedia.org/wiki/Birthday_problem" target="_blank">Birthday Paradox</a>: <a href="http://www.ynetnews.com/articles/0,7340,L-3970484,00.html" target="_blank">Lottery number combination repeats itself</a>. Obligatory <a href="http://xkcd.com/221/" target="_blank">XKCD</a> link.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/10/17/birthday-paradox/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Elo Scores and Rating Contestants</title>
		<link>http://blog.markus-breitenbach.com/2010/08/05/elo-scores-and-rating-contestants/</link>
		<comments>http://blog.markus-breitenbach.com/2010/08/05/elo-scores-and-rating-contestants/#comments</comments>
		<pubDate>Thu, 05 Aug 2010 05:06:57 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Statistics]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/08/05/elo-scores-and-rating-contestants/</guid>
		<description><![CDATA[Kaggle has a new and interesting competition on building a chess rating algorithm that performs better than the official Elo rating system currently in use. Entrants build their own rating systems based on the results of more than 65,000 historical chess games and then test their algorithms by predicting the results on a holdout set [...]]]></description>
			<content:encoded><![CDATA[<p>Kaggle has a new and interesting competition on building a <a href="http://kaggle.com/chess?viewtype=leaderboard" target="_blank">chess rating algorithm</a> that performs better than the official <a href="http://en.wikipedia.org/wiki/Elo_rating_system">Elo rating system</a> currently in use. Entrants build their own rating systems based on the results of more than 65,000 historical chess games and then test their algorithms by predicting the results on a holdout set of 7,800 games.</p>
<p>Looks like an interesting problem. The only other thing that comes to mind literature-wise is that Microsoft built and published their <a href="http://research.microsoft.com/en-us/projects/trueskill/" target="_blank">TrueSkill™ Ranking System</a> for the Xbox in order to match players with similar skills in online games. In the original paper at NIPS, the authors showed that TrueSkill outperformed Elo.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/08/05/elo-scores-and-rating-contestants/feed/</wfw:commentRss>
		</item>
		<item>
		<title>GraphLab &#038; Parallel Machine Learning</title>
		<link>http://blog.markus-breitenbach.com/2010/07/11/graphlab-parallel-machine-learning/</link>
		<comments>http://blog.markus-breitenbach.com/2010/07/11/graphlab-parallel-machine-learning/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 00:56:46 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Data Mining]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/07/11/graphlab-parallel-machine-learning/</guid>
		<description><![CDATA[Interesting article: GraphLab: A New Framework for Parallel Machine Learning

From the abstract:
Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, [...]]]></description>
			<content:encoded><![CDATA[<p>Interesting article: <a href="http://arxiv.org/abs/1006.4990" target="_blank">GraphLab: A New Framework for Parallel Machine Learning</a></p>
<p>From the abstract:</p>
<blockquote><p>Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large scale real-world problems.</p></blockquote>
<p>Given all the talk about MapReduce, Hadoop, etc., this seems like a logical next step toward making it a lot easier to scale data mining to large data sets.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/07/11/graphlab-parallel-machine-learning/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>

