<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments for Markus Breitenbach</title>
	<link>http://blog.markus-breitenbach.com</link>
	<description>AI, Data Mining, Machine Learning and other things</description>
	<pubDate>Fri, 09 May 2008 20:47:38 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.1</generator>

	<item>
		<title>Comment on Safe Strings in PHP (2) by David Spector</title>
		<link>http://blog.markus-breitenbach.com/2007/07/01/safe-strings-in-php-2/#comment-6660</link>
		<author>David Spector</author>
		<pubDate>Mon, 21 Jan 2008 01:18:15 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2007/07/01/safe-strings-in-php-2/#comment-6660</guid>
		<description>I was struck by the complexity of your solution to the safety ("injection") issue for strings in PHP that come from a user.

I, too, have produced programmed solutions to other PHP problems.

I now believe that the proper solution is to fix PHP and/or the other components of Web programming. Something that you always want done should be done in the language or computer system, not in explicit coding.

Since the environment (the server programming tools) knows which strings come from users and which do not, it can implement safety automatically.

If the maintainers of PHP are reluctant to improve it in areas such as string safety, that is a political or psychological problem only, and could be solved by proper justification.

After all, PHP has had some large changes in its history already, and people accepted them. A great example of this is the unsafe automatic importation of all GET and POST data into global variables, which is no longer done.

As a former professional compiler writer and language designer, I am well aware of the tradeoff between keeping a language fixed (standard) and improving it. But there are well-known ways to manage this process, such as the standards development policies of Ada and Fortran. In PHP, a simple directive could enable or disable a new feature like automatic safe strings. To gain the advantages, webmasters or programmers would only have to add one line to each PHP page or to the site's .htaccess file or the equivalent to enable the feature, and remove any existing safety code.

I can't see why the maintainers of PHP should object.

David Spector
Springtime Software</description>
		<content:encoded><![CDATA[<p>I was struck by the complexity of your solution to the safety (&#8220;injection&#8221;) issue for strings in PHP that come from a user.</p>
<p>I, too, have produced programmed solutions to other PHP problems.</p>
<p>I now believe that the proper solution is to fix PHP and/or the other components of Web programming. Something that you always want done should be done in the language or computer system, not in explicit coding.</p>
<p>Since the environment (the server programming tools) knows which strings come from users and which do not, it can implement safety automatically.</p>
<p>If the maintainers of PHP are reluctant to improve it in areas such as string safety, that is a political or psychological problem only, and could be solved by proper justification.</p>
<p>After all, PHP has had some large changes in its history already, and people accepted them. A great example of this is the unsafe automatic importation of all GET and POST data into global variables, which is no longer done.</p>
<p>As a former professional compiler writer and language designer, I am well aware of the tradeoff between keeping a language fixed (standard) and improving it. But there are well-known ways to manage this process, such as the standards development policies of Ada and Fortran. In PHP, a simple directive could enable or disable a new feature like automatic safe strings. To gain the advantages, webmasters or programmers would only have to add one line to each PHP page or to the site&#8217;s .htaccess file or the equivalent to enable the feature, and remove any existing safety code.</p>
<p>I can&#8217;t see why the maintainers of PHP should object.</p>
<p>David Spector<br />
Springtime Software</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Making the Cisco VPN Client work (Error 51) by Bumika</title>
		<link>http://blog.markus-breitenbach.com/2006/11/22/making-ciscos-vpn-client-work-error-51/#comment-5113</link>
		<author>Bumika</author>
		<pubDate>Mon, 01 Oct 2007 16:01:03 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2006/11/22/making-ciscos-vpn-client-work-error-51/#comment-5113</guid>
		<description>I had the same problem with other security software. It blocked cvpnd.exe and logged a BOEP (buffer overflow exploit prevention) event. McAfee AV has a similar feature, so I think its BOEP blocked your instance of the Cisco VPN client.</description>
		<content:encoded><![CDATA[<p>I had the same problem with other security software. It blocked cvpnd.exe and logged a BOEP (buffer overflow exploit prevention) event. McAfee AV has a similar feature, so I think its BOEP blocked your instance of the Cisco VPN client.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on I passed my PhD defense by Christian Spannagel</title>
		<link>http://blog.markus-breitenbach.com/2007/09/19/i-passed-my-phd-defense/#comment-4927</link>
		<author>Christian Spannagel</author>
		<pubDate>Thu, 20 Sep 2007 13:40:01 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2007/09/19/i-passed-my-phd-defense/#comment-4927</guid>
		<description>Congratulations! :-)</description>
		<content:encoded><![CDATA[<p>Congratulations! <img src='http://blog.markus-breitenbach.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on I passed my PhD defense by Thomas Strohmann</title>
		<link>http://blog.markus-breitenbach.com/2007/09/19/i-passed-my-phd-defense/#comment-4924</link>
		<author>Thomas Strohmann</author>
		<pubDate>Thu, 20 Sep 2007 06:07:36 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2007/09/19/i-passed-my-phd-defense/#comment-4924</guid>
		<description>Congratulations, Markus!!!</description>
		<content:encoded><![CDATA[<p>Congratulations, Markus!!!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Choosing the right features for Data Mining by Will Dwinnell</title>
		<link>http://blog.markus-breitenbach.com/2007/06/01/choosing-the-right-features-for-data-mining/#comment-2931</link>
		<author>Will Dwinnell</author>
		<pubDate>Mon, 11 Jun 2007 20:12:11 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2007/06/01/choosing-the-right-features-for-data-mining/#comment-2931</guid>
		<description>This is indeed an interesting subject, from both an intellectual and a practical perspective. I've been using two heuristics borrowed from Weiss and Indurkhya's "Predictive Data Mining".

The first step excludes variables that seem obviously useless. Sometimes this step removes few variables, but sometimes it removes many, which helps reduce the computational workload downstream.

The second step uses a simple Mahalanobis-like distance between classes to gauge the predictive power of any given set of predictors. I hooked this heuristic up to a genetic algorithm to search for optimal sets of predictors.</description>
		<content:encoded><![CDATA[<p>This is indeed an interesting subject, from both an intellectual and a practical perspective. I&#8217;ve been using two heuristics borrowed from Weiss and Indurkhya&#8217;s &#8220;Predictive Data Mining&#8221;.</p>
<p>The first step excludes variables that seem obviously useless. Sometimes this step removes few variables, but sometimes it removes many, which helps reduce the computational workload downstream.</p>
<p>The second step uses a simple Mahalanobis-like distance between classes to gauge the predictive power of any given set of predictors. I hooked this heuristic up to a genetic algorithm to search for optimal sets of predictors.</p>
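<p>A minimal Python sketch of a Mahalanobis-like class-separation score, for readers who want to experiment. It assumes a two-class problem with labels 0 and 1 and scores a feature subset by the squared distance between the class means, measured through the inverse of the summed class covariance matrices; this exact formula, the hypothetical helper names <code>class_separation</code> and <code>best_subset</code>, and the exhaustive search over fixed-size subsets (standing in for the genetic algorithm) are assumptions for illustration. The subset size is fixed because, with a positive definite covariance, this kind of distance does not decrease when a feature is added.</p>
<pre>
```python
import itertools

import numpy as np


def class_separation(X, y, features):
    """Mahalanobis-like distance between the two class means, restricted to
    the given feature indices (assumes class labels 0 and 1)."""
    cols = list(features)
    a = X[y == 0][:, cols]
    b = X[y == 1][:, cols]
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Sum of the per-class covariance matrices, kept 2-D even for one feature.
    cov = np.atleast_2d(np.cov(a, rowvar=False)) + np.atleast_2d(np.cov(b, rowvar=False))
    # Solve cov @ w = diff instead of inverting; lstsq tolerates near-singular cov.
    w = np.linalg.lstsq(cov, diff, rcond=None)[0]
    return float(diff @ w)


def best_subset(X, y, k):
    """Exhaustive search over all size-k feature subsets (a stand-in for a GA)."""
    return max(itertools.combinations(range(X.shape[1]), k),
               key=lambda feats: class_separation(X, y, feats))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 100)
    X = rng.normal(size=(200, 4))
    X[:, 2] += 3.0 * y  # only column 2 differs between the classes
    print(best_subset(X, y, 1))
```
</pre>
<p>On real data with many candidate predictors, enumerating every subset quickly becomes infeasible, which is where a genetic or greedy search over the same score earns its keep.</p>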
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Choosing the right features for Data Mining by Olivier Bousquet</title>
		<link>http://blog.markus-breitenbach.com/2007/06/01/choosing-the-right-features-for-data-mining/#comment-2770</link>
		<author>Olivier Bousquet</author>
		<pubDate>Sat, 02 Jun 2007 21:12:12 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2007/06/01/choosing-the-right-features-for-data-mining/#comment-2770</guid>
		<description>Interesting post.
However, I tend to disagree with the last paragraph, for at least three reasons:
1) You can see a recommender system as a prediction problem with features: what you are trying to predict is whether someone will like a certain kind of music, and your features are the ID of the person and the style of music. So you can consider your data to consist of 2 features (which take their values in large but finite sets). The distinctive part is that you use some sort of transitive relation to build your model (e.g. if you have (user=123, music=jazz, +), (user=123, music=classical, +) and (user=245, music=jazz, +) in your database, you can predict a + for (user=245, music=classical)).
2) You can probably make better recommendations if you use features. When there is no exact match, the recommender system may not do a good job, while with some additional features or information you might solve the problem. An example: suppose that instead of music style you try to predict whether someone likes a specific artist/band. You may find that some people who like Mozart also like Duke Ellington, and you have a user who likes Charlie Parker. The question is whether this person likes Mozart. With a purely link-based recommender system you cannot really answer, but if you have the additional feature "music style" you might recognize that Charlie Parker and Duke Ellington are both jazz artists and thus draw a connection (of course, with much more data you might find this connection automatically, but this is just an example of how data can substitute for prior knowledge). So the additional features (which introduce a structure or distance on your initial features [user, artist]) may provide useful information that helps you make better decisions with little data.
3) Any prediction problem with features can be converted into a generalized recommender problem: if you have a problem with features X,Y,Z,T you can simply consider, for example, the pair (X,Y) as the 'user' and the pair (Z,T) as the 'product'. You could just as well use any other split of your features into two groups. You can also consider splits into 3 or more groups (thus generalizing the concept), and it is relatively easy to generalize some of the algorithms used in recommender systems to more than 2 'matrix dimensions'. Yet another way would be to split into user X with feature Y and 'product' Z with feature T...

In conclusion, I would not really distinguish recommender systems from vector-based prediction problems, as they can be converted into one another, so I do not think there is any way to 'escape' the need to find the right features!</description>
		<content:encoded><![CDATA[<p>Interesting post.<br />
However, I tend to disagree with the last paragraph, for at least three reasons:<br />
1) You can see a recommender system as a prediction problem with features: what you are trying to predict is whether someone will like a certain kind of music, and your features are the ID of the person and the style of music. So you can consider your data to consist of 2 features (which take their values in large but finite sets). The distinctive part is that you use some sort of transitive relation to build your model (e.g. if you have (user=123, music=jazz, +), (user=123, music=classical, +) and (user=245, music=jazz, +) in your database, you can predict a + for (user=245, music=classical)).<br />
2) You can probably make better recommendations if you use features. When there is no exact match, the recommender system may not do a good job, while with some additional features or information you might solve the problem. An example: suppose that instead of music style you try to predict whether someone likes a specific artist/band. You may find that some people who like Mozart also like Duke Ellington, and you have a user who likes Charlie Parker. The question is whether this person likes Mozart. With a purely link-based recommender system you cannot really answer, but if you have the additional feature &#8220;music style&#8221; you might recognize that Charlie Parker and Duke Ellington are both jazz artists and thus draw a connection (of course, with much more data you might find this connection automatically, but this is just an example of how data can substitute for prior knowledge). So the additional features (which introduce a structure or distance on your initial features [user, artist]) may provide useful information that helps you make better decisions with little data.<br />
3) Any prediction problem with features can be converted into a generalized recommender problem: if you have a problem with features X,Y,Z,T you can simply consider, for example, the pair (X,Y) as the &#8216;user&#8217; and the pair (Z,T) as the &#8216;product&#8217;. You could just as well use any other split of your features into two groups. You can also consider splits into 3 or more groups (thus generalizing the concept), and it is relatively easy to generalize some of the algorithms used in recommender systems to more than 2 &#8216;matrix dimensions&#8217;. Yet another way would be to split into user X with feature Y and &#8216;product&#8217; Z with feature T&#8230;</p>
<p>In conclusion, I would not really distinguish recommender systems from vector-based prediction problems, as they can be converted into one another, so I do not think there is any way to &#8216;escape&#8217; the need to find the right features!</p>
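<p>The transitive inference in point 1 can be sketched in a few lines of Python. This is a deliberately simple user-neighborhood vote chosen for illustration, not a method taken from the post or from any particular recommender library: ratings are assumed to be (user, item, +1/-1) triples, every other user who rated the target item votes with that rating, weighted by their agreements minus disagreements with the target user on co-rated items, and the function name <code>predict</code> is hypothetical.</p>
<pre>
```python
from collections import defaultdict


def predict(ratings, user, item):
    """Guess user's rating (+1 or -1) for item from other users' ratings of it,
    weighted by agreement with user; returns None if the vote is empty or tied."""
    by_user = defaultdict(dict)
    for u, i, r in ratings:
        by_user[u][i] = r
    mine = by_user[user]
    total = 0
    for other, theirs in by_user.items():
        if other == user or item not in theirs:
            continue
        # Co-rated items, ignoring the target item itself.
        shared = set(mine).intersection(theirs) - {item}
        # Similarity: +1 per item rated the same way, -1 per item rated differently.
        similarity = sum(1 if mine[i] == theirs[i] else -1 for i in shared)
        total += similarity * theirs[item]
    if total == 0:
        return None
    return 1 if total > 0 else -1


if __name__ == "__main__":
    ratings = [(123, "jazz", 1), (123, "classical", 1), (245, "jazz", 1)]
    print(predict(ratings, 245, "classical"))
```
</pre>
<p>With the three triples from point 1, <code>predict(ratings, 245, "classical")</code> returns 1, i.e. a + for (user=245, music=classical). Point 2 is where such a purely link-based vote breaks down: a user with no co-rated items gets no prediction at all, whereas an extra feature such as music style could still connect them.</p>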
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Google Co-Op by Markus</title>
		<link>http://blog.markus-breitenbach.com/2006/10/26/google-co-op/#comment-1392</link>
		<author>Markus</author>
		<pubDate>Tue, 27 Mar 2007 04:35:11 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2006/10/26/google-co-op/#comment-1392</guid>
		<description>John Ricard fixed a bug in my module. If you have problems getting it to work, the following might work for you.

http://www.johnricard.org/modules.php?name=Content&#038;pa=showpage&#038;pid=2</description>
		<content:encoded><![CDATA[<p>John Ricard fixed a bug in my module. If you have problems getting it to work, the following might work for you.</p>
<p><a href="http://www.johnricard.org/modules.php?name=Content&#038;pa=showpage&#038;pid=2" rel="nofollow">http://www.johnricard.org/modules.php?name=Content&#038;pa=showpage&#038;pid=2</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Making the Cisco VPN Client work (Error 51) by raytube</title>
		<link>http://blog.markus-breitenbach.com/2006/11/22/making-ciscos-vpn-client-work-error-51/#comment-1270</link>
		<author>raytube</author>
		<pubDate>Wed, 14 Mar 2007 15:51:37 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2006/11/22/making-ciscos-vpn-client-work-error-51/#comment-1270</guid>
		<description>That sounds exactly like the problem I have on a laptop. It had way too many protocols and tunneling hooks installed, and it wouldn't get on the net. I removed a bunch of those, but it still has Windows Firewall and McAfee AV/sysprotect running. I'm going to try running ZoneAlarm on it. Then I'll try putting a router between the cable modem and the laptop.</description>
		<content:encoded><![CDATA[<p>That sounds exactly like the problem I have on a laptop. It had way too many protocols and tunneling hooks installed, and it wouldn&#8217;t get on the net. I removed a bunch of those, but it still has Windows Firewall and McAfee AV/sysprotect running. I&#8217;m going to try running ZoneAlarm on it. Then I&#8217;ll try putting a router between the cable modem and the laptop.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Validating patterns found by Data Mining techniques by Dan</title>
		<link>http://blog.markus-breitenbach.com/2007/02/22/validating-patterns-found-by-data-mining-techniques/#comment-146</link>
		<author>Dan</author>
		<pubDate>Tue, 27 Feb 2007 18:19:27 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2007/02/22/validating-patterns-found-by-data-mining-techniques/#comment-146</guid>
		<description>Check out this paper: Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124

http://dx.doi.org/10.1371/journal.pmed.0020124

To quote from the Abstract: "There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. [...] Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. [...]"

Also interesting: Djulbegovic B, Hozo I (2007) When Should Potentially False Research Findings Be Considered Acceptable? PLoS Med 4(2): e26 doi:10.1371/journal.pmed.0040026

Also: Moonesinghe R, Khoury MJ, Janssens ACJW (2007) Most Published Research Findings Are False-But a Little Replication Goes a Long Way. PLoS Med 4(2): e28 doi:10.1371/journal.pmed.0040028</description>
		<content:encoded><![CDATA[<p>Check out this paper: Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124</p>
<p><a href="http://dx.doi.org/10.1371/journal.pmed.0020124" rel="nofollow">http://dx.doi.org/10.1371/journal.pmed.0020124</a></p>
<p>To quote from the Abstract: &#8220;There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. [&#8230;] Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. [&#8230;]&#8221;</p>
<p>Also interesting: Djulbegovic B, Hozo I (2007) When Should Potentially False Research Findings Be Considered Acceptable? PLoS Med 4(2): e26 doi:10.1371/journal.pmed.0040026</p>
<p>Also: Moonesinghe R, Khoury MJ, Janssens ACJW (2007) Most Published Research Findings Are False-But a Little Replication Goes a Long Way. PLoS Med 4(2): e28 doi:10.1371/journal.pmed.0040028</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Online Dating by Markus</title>
		<link>http://blog.markus-breitenbach.com/2007/01/03/online-dating/#comment-125</link>
		<author>Markus</author>
		<pubDate>Tue, 27 Feb 2007 01:44:26 +0000</pubDate>
		<guid>http://blog.markus-breitenbach.com/2007/01/03/online-dating/#comment-125</guid>
		<description>@Dave: Does it actually work on "real" profiles? Do women actually email you back in response to that stuff? I played around with it a bit, and I must say it's rather simple (from an AI standpoint; try negating statements), but it does work well enough.

@DaveR: True, but my point was that even if a site gets 10,000 new members per month, you will end up with only a couple of them near your city and an even smaller number that you might be interested in. Online dating is something I would try if I were single, but in my opinion the odds are no better or worse than meeting somebody at a coffee shop. It's certainly worth trying, though, as you never know... Good luck with your site!</description>
		<content:encoded><![CDATA[<p>@Dave: Does it actually work on &#8220;real&#8221; profiles? Do women actually email you back in response to that stuff? I played around with it a bit, and I must say it&#8217;s rather simple (from an AI standpoint; try negating statements), but it does work well enough.</p>
<p>@DaveR: True, but my point was that even if a site gets 10,000 new members per month, you will end up with only a couple of them near your city and an even smaller number that you might be interested in. Online dating is something I would try if I were single, but in my opinion the odds are no better or worse than meeting somebody at a coffee shop. It&#8217;s certainly worth trying, though, as you never know&#8230; Good luck with your site!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
