<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.2.1" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>Markus Breitenbach</title>
	<link>http://blog.markus-breitenbach.com</link>
	<description>AI, Data Mining, Machine Learning and other things</description>
	<pubDate>Mon, 12 Jul 2010 00:56:46 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.1</generator>
	<language>en</language>
			<item>
		<title>GraphLab &#038; Parallel Machine Learning</title>
		<link>http://blog.markus-breitenbach.com/2010/07/11/graphlab-parallel-machine-learning/</link>
		<comments>http://blog.markus-breitenbach.com/2010/07/11/graphlab-parallel-machine-learning/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 00:56:46 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Data Mining]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/07/11/graphlab-parallel-machine-learning/</guid>
		<description><![CDATA[Interesting article:  GraphLab: A New Framework for Parallel Machine Learning

From the abstract:
Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, [...]]]></description>
			<content:encoded><![CDATA[<p>Interesting article: <a href="http://arxiv.org/abs/1006.4990" target="_blank">GraphLab: A New Framework for Parallel Machine Learning</a></p>
<p>From the abstract:</p>
<blockquote><p>Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large scale real-world problems.</p></blockquote>
<p>Given all the talk about MapReduce, Hadoop, etc., this seems like a logical next step toward making it a lot easier to scale data mining to large data sets.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/07/11/graphlab-parallel-machine-learning/feed/</wfw:commentRss>
		</item>
		<item>
		<title>PHP configuration using htaccess on 1and1 shared hosting</title>
		<link>http://blog.markus-breitenbach.com/2010/06/15/php-configuration-in-htaccess-on-1and1-shared-hosting/</link>
		<comments>http://blog.markus-breitenbach.com/2010/06/15/php-configuration-in-htaccess-on-1and1-shared-hosting/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 00:21:04 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Fixing Stuff]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/06/15/php-configuration-in-htaccess-on-1and1-shared-hosting/</guid>
		<description><![CDATA[I had some problems setting PHP values for shared hosting on 1and1 and the suggested way from their FAQ using php.ini didn&#8217;t work for me. Here are the settings in .htaccess that worked for me:
AddType x-mapp-php5 .php
# PHP 4, Apache 1
&#60;IfModule mod_php4.c&#62;
php_value magic_quotes_gpc             [...]]]></description>
			<content:encoded><![CDATA[<p>I had some problems setting PHP values for shared hosting on 1and1 and the suggested way from their FAQ using php.ini didn&#8217;t work for me. Here are the settings in .htaccess that worked for me:</p>
<blockquote><p>AddType x-mapp-php5 .php</p>
<p># PHP 4, Apache 1<br />
&lt;IfModule mod_php4.c&gt;<br />
php_value magic_quotes_gpc                0<br />
php_value register_globals                0<br />
php_value session.auto_start              0<br />
&lt;/IfModule&gt;</p>
<p># PHP 4, Apache 2<br />
&lt;IfModule sapi_apache2.c&gt;<br />
php_value magic_quotes_gpc                0<br />
php_value register_globals                0<br />
php_value session.auto_start              0<br />
&lt;/IfModule&gt;</p>
<p># PHP 5, Apache 1 and 2<br />
&lt;IfModule mod_php5.c&gt;<br />
php_value magic_quotes_gpc                0<br />
php_value register_globals                0<br />
php_value session.auto_start              0<br />
&lt;/IfModule&gt;</p></blockquote>
<p>The AddType line instructs the server to use PHP5, and the php_value lines in each IfModule block turn off the magic quotes, register globals and session auto-start features.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/06/15/php-configuration-in-htaccess-on-1and1-shared-hosting/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Energy efficient data mining algorithms</title>
		<link>http://blog.markus-breitenbach.com/2010/02/28/energy-efficient-data-mining-algorithms/</link>
		<comments>http://blog.markus-breitenbach.com/2010/02/28/energy-efficient-data-mining-algorithms/#comments</comments>
		<pubDate>Sun, 28 Feb 2010 16:21:20 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/02/28/energy-efficient-data-mining-algorithms/</guid>
		<description><![CDATA[I was a bit amused to read about this new algorithm that IBM research developed and that was sold as &#8220;energy efficient&#8221; in their press-release. This is good marketing, because the average journalist and reader might not understand the impact of the improvement. It just sounds a lot better to be green and save energy [...]]]></description>
			<content:encoded><![CDATA[<p>I was a bit amused to read about this <a href="http://science.slashdot.org/story/10/02/26/1343222/IBM-Claims-Breakthrough-Energy-Efficient-Algorithm?art_pos=3" target="_blank">new algorithm that IBM Research developed and that was sold as &#8220;energy efficient&#8221;</a> in their press release. This is good marketing, because the average journalist and reader might not understand the impact of the improvement. It just sounds a lot better to be green and save energy than to improve computational complexity&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/02/28/energy-efficient-data-mining-algorithms/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Alternative measures to the AUC for rare-event prognostic models</title>
		<link>http://blog.markus-breitenbach.com/2010/02/16/alternative-measures-to-the-auc-for-rare-event-prognostic-models/</link>
		<comments>http://blog.markus-breitenbach.com/2010/02/16/alternative-measures-to-the-auc-for-rare-event-prognostic-models/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 03:56:41 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Statistics]]></category>

		<category><![CDATA[Classification]]></category>

		<category><![CDATA[Data Mining]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/02/16/alternative-measures-to-the-auc-for-rare-event-prognostic-models/</guid>
		<description><![CDATA[How can one evaluate the performance of prognostic models in a meaningful way? This is a very basic and yet an interesting problem especially in the context of prediction of very rare events (base-rates &#60;10%). How reliable is the model&#8217;s forecast? This is a good question and of particular importance when it matters - think [...]]]></description>
			<content:encoded><![CDATA[<p>How can one evaluate the performance of prognostic models in a meaningful way? This is a very basic yet interesting problem, especially when predicting very rare events (base rates &lt;10%). How reliable is the model&#8217;s forecast? This question is of particular importance when the stakes are high - think criminal psychology, where models forecast the likelihood of recidivism for criminally insane people (Quinsey 1980). There are a variety of ways to evaluate a model&#8217;s predictive performance on a hold-out sample, and some are more meaningful than others. For example, error rates are only meaningful when you also consider the base rate of your classes and the performance of the trivial classifier. This often gets confusing when you are dealing with very imbalanced data sets or rare events. In this blog post, I&#8217;ll summarize a few techniques and alternative evaluation methods for predictive models that are particularly useful when dealing with rare events or low base rates in general.</p>
<p>The <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic" target="_blank">Receiver Operating Characteristic</a> (ROC) is a graph that plots the true positive rate against the false positive rate, so that the user can decide where to set the cutoff for the final classification decision. To summarize the curve in a single, reportable number, the area under the curve (AUC) is generally used.</p>
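<p>As a rough illustration of what the AUC summarizes, here is a minimal Python sketch. It relies on the standard equivalence between the area under the ROC curve and the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half). The function name roc_auc and the toy labels and scores are made up for this example; this is not code from any particular package.</p>

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the (rare) event, 0 otherwise.
    scores: model output, higher means "more likely an event".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0   # positive scored above negative
            elif p == n:
                correctly_ranked += 0.5   # a tie counts as half
    return correctly_ranked / (len(positives) * len(negatives))


# Toy data (hypothetical): two events among five cases.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(roc_auc(labels, scores))  # 5 of the 6 positive/negative pairs are ordered correctly
```

<p>The pairwise loop is quadratic in the sample size, which is fine for a sketch but not for large hold-out samples; a rank-based computation gives the same number faster.</p>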
<p> <a href="http://blog.markus-breitenbach.com/2010/02/16/alternative-measures-to-the-auc-for-rare-event-prognostic-models/#more-95" class="more-link">(more&#8230;)</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/02/16/alternative-measures-to-the-auc-for-rare-event-prognostic-models/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Spam Filtering by Learning a Pattern Language</title>
		<link>http://blog.markus-breitenbach.com/2010/01/26/spam-filtering-by-learning-a-pattern-language/</link>
		<comments>http://blog.markus-breitenbach.com/2010/01/26/spam-filtering-by-learning-a-pattern-language/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 01:54:12 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[spam]]></category>

		<category><![CDATA[Machine Learning]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/01/26/spam-filtering-by-learning-a-pattern-language/</guid>
		<description><![CDATA[The New Scientist describes a new method for spam detection by learning patterns. This new method exploits the spammers most powerful weapon - the automatic generation of many, similar messages by automated means (i.e., some grammar in a formal language) - and turns it against them. The article reports that a pattern can reliably be [...]]]></description>
			<content:encoded><![CDATA[<p>The New Scientist describes a new <a href="http://www.newscientist.com/article/mg20527446.000-to-beat-spam-turn-its-own-weapons-against-it.html" target="_blank">method for spam detection by learning patterns</a>. This new method exploits the spammers&#8217; most powerful weapon - the automatic generation of many similar messages (i.e., from some grammar in a <a href="http://en.wikipedia.org/wiki/Formal_languages" target="_blank">formal language</a>) - and turns it against them. The article reports that a pattern can reliably be learned from about 1000 examples captured from a bot, allowing the method to classify new messages accurately and with zero false positives. This sounds really exciting given my full spam folder.</p>
<p>However, I&#8217;m a bit cautious. The article is sparse on technical details, so I might be making some wrong assumptions here. First, as I understand it, the reported zero false positives refer to telling spam <em>from that particular spam-grammar</em> apart from other messages. Second, it seems from the article that they only learn from positive examples. Overall, the technique sounds to me like they are learning a pattern language. Pattern languages are a class of grammars that overlap with linear and context-sensitive grammars (<a href="http://en.wikipedia.org/wiki/Chomsky_hierarchy" target="_blank">Chomsky hierarchy</a>). Unfortunately, they don&#8217;t have a real <a href="http://en.wikipedia.org/wiki/Pattern_language_(disambiguation)" target="_blank">Wikipedia page</a>, so I&#8217;ll try to give a bit of background. The closest example I can give right now would be regular expressions with back-references. I&#8217;m not sure if this is an accurate description for all possible patterns, but it&#8217;s close enough for an example.</p>
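<p>To make the back-reference analogy concrete, here is a small Python sketch. It treats a pattern as a sequence of constants and variables, where every variable stands for one non-empty string and repeated occurrences of the same variable must be replaced by the same string, and compiles it into a regular expression with named back-references. The helper name pattern_to_regex and the spam template are invented for illustration; this is only about matching a known pattern, not the learning method from the article.</p>

```python
import re


def pattern_to_regex(tokens, variables):
    """Compile a pattern of constants and variables into a regex.

    tokens:    list of strings, e.g. ["Buy ", "x", " now"]
    variables: set of token values that are variables; they must be valid
               Python identifiers because they become regex group names.
    """
    parts = []
    seen = set()
    for token in tokens:
        if token in variables:
            if token in seen:
                parts.append("(?P=" + token + ")")      # same string as before
            else:
                seen.add(token)
                parts.append("(?P<" + token + ">.+)")   # any non-empty string
        else:
            parts.append(re.escape(token))              # literal constant
    return re.compile("".join(parts), re.DOTALL)


# Invented spam template: the product name "x" appears twice.
template = pattern_to_regex(["Buy ", "x", " now, ", "x", " is cheap"], {"x"})

print(bool(template.fullmatch("Buy pills now, pills is cheap")))    # True
print(bool(template.fullmatch("Buy pills now, watches is cheap")))  # False
```

<p>The second message is rejected only because the two occurrences of the variable differ, which is exactly the kind of constraint a back-reference can express.</p>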
<p>I don&#8217;t know how the specific technique mentioned in the article works in detail, but I&#8217;ve learned two things about learning grammars from text: (a) we can&#8217;t learn all linear or context-sensitive languages, only all pattern language grammars; (b) learning patterns without negative examples leads to over-generalization very quickly.</p>
<p>While I haven&#8217;t worked with learning grammars in a long while, the only algorithm of which I&#8217;m aware is the Lange-Wiehagen algorithm (Steffen Lange and Rolf Wiehagen; Polynomial-time inference of arbitrary pattern languages. New Generation Computing, 8(4):361-370, 1991). This algorithm is not a consistent learner, but can learn all pattern languages in polynomial time. There might be better ones available by now, but learning grammars is not that popular in the machine learning community right now. I&#8217;m sure there are some other interesting applications besides spam filtering. Maybe it&#8217;s time for a revival.</p>
<p>Overall, it sounds like a promising new anti-spam technique, but I&#8217;d like to see some more realistic testing done. There are some obvious ways for spammers to make learning these patterns harder, but either way I&#8217;m curious - maybe the inventors of this technique discovered a better way to learn patterns? Maybe by using some problem-specific domain knowledge?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/01/26/spam-filtering-by-learning-a-pattern-language/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Strong profiling is not mathematically optimal for discovering rare malfeasors (on rare event detection)</title>
		<link>http://blog.markus-breitenbach.com/2010/01/10/strong-profiling-is-not-mathematically-optimal-for-discovering-rare-malfeasors-on-rare-event-detection/</link>
		<comments>http://blog.markus-breitenbach.com/2010/01/10/strong-profiling-is-not-mathematically-optimal-for-discovering-rare-malfeasors-on-rare-event-detection/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 21:37:45 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Math]]></category>

		<category><![CDATA[Society]]></category>

		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2010/01/10/strong-profiling-is-not-mathematically-optimal-for-discovering-rare-malfeasors-on-rare-event-detection/</guid>
		<description><![CDATA[Just in time for the latest Christmas terror scare, I came across an interesting paper: &#8220;Strong profiling is not mathematically optimal for discovering rare malfeasors&#8221; (William H. Press; PNAS 106(6), p. 1716-1719 www.pnas.org/cgi/doi/10.1073/pnas.0813202106). In the paper, the author investigates whether profiling by nationality or ethnicity can be justified mathematically and tries to answer the question [...]]]></description>
			<content:encoded><![CDATA[<p>Just in time for the latest Christmas terror scare, I came across an interesting paper: &#8220;<a href="http://www.pnas.org/content/106/6/1716.full.pdf" target="_blank">Strong profiling is not mathematically optimal for discovering rare malfeasors</a>&#8221; (William H. Press; PNAS 106(6), p. 1716-1719 <a href="http://www.pnas.org/cgi/doi/10.1073/pnas.0813202106" target="_blank">www.pnas.org/cgi/doi/10.1073/pnas.0813202106</a>). In the paper, the author investigates whether profiling by nationality or ethnicity can be justified mathematically and tries to answer the question of how much screening we must do, on average, to catch the bad guys in the crowd. Rare-event detection is hard as it is, and it&#8217;s interesting to see it examined from the sampling perspective. It&#8217;s an interesting and short read. Long story short, it shows that screening on a feature like nationality or ethnicity in proportion to the prior probability, or more strongly than that, is not optimal and wastes resources.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2010/01/10/strong-profiling-is-not-mathematically-optimal-for-discovering-rare-malfeasors-on-rare-event-detection/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Starcraft AI competition</title>
		<link>http://blog.markus-breitenbach.com/2009/11/13/starcraft-ai-competition/</link>
		<comments>http://blog.markus-breitenbach.com/2009/11/13/starcraft-ai-competition/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 04:27:55 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Artificial Intelligence (AI)]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2009/11/13/starcraft-ai-competition/</guid>
		<description><![CDATA[UCSC is holding a Starcraft AI competition. I wish I had the time to participate&#8230; Starcraft is one of my all-time favorite games, and writing a better AI for a real-time strategy game is certainly interesting and challenging.
]]></description>
			<content:encoded><![CDATA[<p>UCSC is holding a <a href="http://eis.ucsc.edu/StarCraftAICompetition" target="_blank">Starcraft AI competition</a>. I wish I had the time to participate&#8230; Starcraft is one of my all-time favorite games, and writing a better AI for a real-time strategy game is certainly interesting and challenging.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2009/11/13/starcraft-ai-competition/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Random characters in text mode -&gt; graphics card</title>
		<link>http://blog.markus-breitenbach.com/2009/07/25/random-characters-in-text-mode-graphics-card/</link>
		<comments>http://blog.markus-breitenbach.com/2009/07/25/random-characters-in-text-mode-graphics-card/#comments</comments>
		<pubDate>Sun, 26 Jul 2009 00:34:00 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Fixing Stuff]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2009/07/25/random-characters-in-text-mode-graphics-card/</guid>
		<description><![CDATA[Quick note: One of the strangest things I&#8217;ve seen in a while was during my desktop&#8217;s boot-up today. There were random lines across the manufacturer&#8217;s BIOS logo, then all sorts of weird and random characters during BIOS messages and boot-manager. The monitor was fine, the power-on self test didn&#8217;t indicate anything fishy and even Linux [...]]]></description>
			<content:encoded><![CDATA[<p>Quick note: One of the strangest things I&#8217;ve seen in a while happened during my desktop&#8217;s boot-up today. There were random lines across the manufacturer&#8217;s BIOS logo, then all sorts of weird and random characters in the BIOS messages and the boot manager. The monitor was fine, the power-on self test didn&#8217;t indicate anything fishy, and even Linux would boot fine (but only in 640 x 480 resolution). If it had been the RAM or something similar, chances are the OS would have crashed or complained. Obviously it wasn&#8217;t a driver or OS issue, as the computer hadn&#8217;t even booted yet. It turned out to be the graphics card (an old 7xxx nVidia), and replacing it with a newer one did the trick. I&#8217;m a bit puzzled about how the graphics card could have caused all those weird characters to show up, but I&#8217;m guessing the graphics RAM might have died or something like that.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2009/07/25/random-characters-in-text-mode-graphics-card/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Programs stealing the input focus</title>
		<link>http://blog.markus-breitenbach.com/2009/06/07/programs-stealing-the-input-focus/</link>
		<comments>http://blog.markus-breitenbach.com/2009/06/07/programs-stealing-the-input-focus/#comments</comments>
		<pubDate>Sun, 07 Jun 2009 21:04:22 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Coding / Programming]]></category>

		<category><![CDATA[Ramblings]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2009/06/07/programs-stealing-the-input-focus/</guid>
		<description><![CDATA[Ok, this post is more of a rant. I&#8217;m one of those people that are a bit impatient when starting a program on my desktop. When I start up my Windows machine I click on several buttons in the &#8220;quicklaunch&#8221; bar to fire up what I&#8217;ll need to use - Outlook, R / SPSS /SAS, [...]]]></description>
			<content:encoded><![CDATA[<p>Ok, this post is more of a rant. I&#8217;m one of those people who are a bit impatient when starting a program on their desktop. When I start up my Windows machine, I click on several buttons in the &#8220;quicklaunch&#8221; bar to fire up what I&#8217;ll need to use - Outlook, R / SPSS / SAS, Winamp etc. So why do all sorts of dialogs pop up in my face while I am typing? Why does Winamp have to pop up while I&#8217;m typing my email password? And why do they have to switch the input focus so that whatever I happen to be typing ends up in the wrong window? This is so annoying. <a href="http://www.codinghorror.com/blog/archives/001011.html" target="_blank">Stealing the input focus is a known problem</a> that has been written about countless times. It&#8217;s even against the <a href="http://msdn.microsoft.com/en-us/library/ms971323.aspx" target="_blank">GUI programming guidelines</a>. &#8220;Do not steal the input focus&#8221; - what&#8217;s so difficult about that?</p>
<p>As a first consequence, Norton Internet Security is now gone from my machine forever, after it kept reminding me constantly - with uncanny accuracy whenever I was busy playing computer games - that I need to renew my anti-virus subscription or bad things will happen to my computer. And bad things did happen to my video game. But not anymore&#8230;</p>
<p>On the upside, there&#8217;s a carefully hidden option in the <a href="http://www.microsoft.com/windowsxp/Downloads/powertoys/Xppowertoys.mspx" target="_blank">Windows XP Powertoys (TweakUI)</a> that is supposed to prevent programs from stealing the input focus. It made things better, but doesn&#8217;t seem to work all the time.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2009/06/07/programs-stealing-the-input-focus/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Famous bugs in AI game engine caught on tape</title>
		<link>http://blog.markus-breitenbach.com/2009/05/02/famous-bugs-in-ai-game-engine-caught-on-tape/</link>
		<comments>http://blog.markus-breitenbach.com/2009/05/02/famous-bugs-in-ai-game-engine-caught-on-tape/#comments</comments>
		<pubDate>Sat, 02 May 2009 20:06:58 +0000</pubDate>
		<dc:creator>Markus</dc:creator>
		
		<category><![CDATA[Artificial Intelligence (AI)]]></category>

		<category><![CDATA[Ramblings]]></category>

		<category><![CDATA[Random]]></category>

		<guid isPermaLink="false">http://blog.markus-breitenbach.com/2009/05/02/famous-bugs-in-ai-game-engine-caught-on-tape/</guid>
		<description><![CDATA[Found this on aigamedev and some of them are really hilarious: AI game bugs caught on tape
]]></description>
			<content:encoded><![CDATA[<p>Found this on aigamedev and some of them are really hilarious: <a href="http://aigamedev.com/open/article/bugs-caught-on-tape/" target="_blank">AI game bugs caught on tape</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markus-breitenbach.com/2009/05/02/famous-bugs-in-ai-game-engine-caught-on-tape/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
