Archive for the Society Category
Strong profiling is not mathematically optimal for discovering rare malfeasors (on rare event detection)
January 10, 2010 5:37 pm by Markus.
Just in time for the latest Christmas terror scare, I came across an interesting paper: “Strong profiling is not mathematically optimal for discovering rare malfeasors” (William H. Press; PNAS 106(6), pp. 1716-1719, www.pnas.org/cgi/doi/10.1073/pnas.0813202106). In it, the author investigates whether profiling by nationality or ethnicity can be justified mathematically, and tries to answer the question of how much screening we must do, on average, to catch the bad guys in a crowd. Rare-event detection is hard as it is, and it’s interesting to see the problem from the sampling perspective. It’s a short and interesting read. Long story short, it shows that screening based on an indiscriminate feature like nationality or ethnicity is not optimal and wastes resources; the same holds for any screening that samples people at least in proportion to their prior probability.
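A minimal sketch of the sampling argument, based on my reading of the paper’s simplest model (screening with replacement): person i is the malfeasor with prior probability p_i, and each screening picks person i with probability q_i. If the malfeasor is person i, the number of screenings until he is picked is geometric with mean 1/q_i, so the expected number of screenings is the sum of p_i / q_i. Strong profiling (q_i = p_i) makes every term 1, so the total is exactly N, no better than uniform sampling. Sampling in proportion to the square root of the prior minimizes the sum. The toy population (10 people with a 100 times higher prior among 1,000) is invented for illustration.

```python
import math

def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def expected_screenings(prior, sampling):
    """Expected number of screenings (with replacement) until the malfeasor
    is drawn: sum over i of prior[i] / sampling[i]."""
    return sum(p / q for p, q in zip(prior, sampling))

# Hypothetical toy population: 10 people with weight 100, 990 with weight 1.
prior = normalize([100.0] * 10 + [1.0] * 990)
n = len(prior)

strategies = {
    "uniform": [1.0 / n] * n,
    "strong profiling": prior,                                  # q_i = p_i
    "square-root": normalize([math.sqrt(p) for p in prior]),   # q_i ~ sqrt(p_i)
}

for name, q in strategies.items():
    print(f"{name:20s} {expected_screenings(prior, q):8.1f}")
```

Uniform sampling and strong profiling both need about 1,000 screenings on average here, while square-root biased sampling needs about 597, i.e. (sum of sqrt(p_i))^2.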
Posted in Math, Society, Statistics
Adversarial Scenarios in Risk Mismanagement
January 11, 2009 4:31 pm by Markus.
I just read another article discussing whether risk management tools had an impact on the current financial crisis. One of the most commonly used risk management measures is Value-at-Risk (VaR), a measure that can be compared across portfolios and that specifies a worst-case loss that will not be exceeded at a given confidence level (e.g. 95%) over a given time horizon. One of the major criticisms (voiced e.g. by Nassim Nicholas Taleb, the author of The Black Swan) is that the measure can be gamed: risk can be hidden “in the rare-event part” of the distribution beyond the VaR threshold, and, not surprisingly, this seems to have happened.
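A minimal sketch of why risk parked in the tail does not show up in VaR, using a simple historical (empirical) VaR computed from loss scenarios. The two portfolios and their numbers are invented; real VaR systems use longer histories, specific horizons and interpolation conventions that may differ from the one assumed here.

```python
import math

def historical_var(losses, confidence=0.95):
    """Empirical VaR: the smallest loss L such that at least a `confidence`
    fraction of the scenarios lose no more than L."""
    ordered = sorted(losses)
    k = math.ceil(confidence * len(ordered))  # number of scenarios covered
    return ordered[k - 1]

def tail_mean(losses, confidence=0.95):
    """Average loss over the scenarios beyond the VaR cut-off
    (a simple discrete expected-shortfall estimate)."""
    ordered = sorted(losses)
    k = math.ceil(confidence * len(ordered))
    tail = ordered[k:] or ordered[-1:]  # guard against an empty tail
    return sum(tail) / len(tail)

# Hypothetical loss scenarios, 100 each.
portfolio_a = list(range(100))              # losses 0..99
portfolio_b = list(range(95)) + [1000] * 5  # same body, huge hidden tail

for name, losses in [("A", portfolio_a), ("B", portfolio_b)]:
    print(name, historical_var(losses), tail_mean(losses))
```

Both portfolios report the same 95% VaR of 94, even though the average loss in B’s worst 5% of scenarios is 1000 instead of 97.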
Given that a common question during training on risk assessment software is “what do I have to do to get outcome/prediction x from the software?”, it would be worth exploring how to safeguard the software itself against users gaming the system. Think of detecting multiple model evaluations in a row with slightly changed numbers…
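One way such a safeguard could look, sketched under my own assumptions: each user’s evaluations are logged as dicts of numeric inputs, and a session is flagged when several consecutive evaluations differ from the previous one in only one field by a small relative amount. The function names, the 10% tolerance and the threshold of three are hypothetical choices, not taken from any existing risk assessment product.

```python
def is_small_tweak(prev, cur, max_changed=1, rel_tol=0.10):
    """True if `cur` differs from `prev` in 1..max_changed numeric fields,
    each by at most rel_tol relative to its previous value."""
    changed = [k for k in cur if cur[k] != prev.get(k)]
    if not 1 <= len(changed) <= max_changed:
        return False
    for k in changed:
        old = prev.get(k)
        if old is None or old == 0:  # new field or no baseline: not a tweak
            return False
        if abs(cur[k] - old) / abs(old) > rel_tol:
            return False
    return True

def looks_like_gaming(evaluations, threshold=3, **kwargs):
    """Flag a session containing at least `threshold` small tweaks in a row."""
    run = best = 0
    for prev, cur in zip(evaluations, evaluations[1:]):
        run = run + 1 if is_small_tweak(prev, cur, **kwargs) else 0
        best = max(best, run)
    return best >= threshold

# Hypothetical session: the user nudges one input at a time.
session = [
    {"income": 50000, "debt": 20000},
    {"income": 51000, "debt": 20000},
    {"income": 52000, "debt": 20000},
    {"income": 52000, "debt": 19000},
]
print(looks_like_gaming(session))  # True: three small tweaks in a row
```

A flag like this would not prove anything by itself, but it could trigger a review or at least a log entry.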
Edit: I just found a risk assessment instrument implemented as an Excel spreadsheet. That’s fine for prototyping, but using it in practice is just asking people to fiddle with the numbers until they get the desired result. You couldn’t make it more user-friendly if you tried…
Posted in Predictive Modeling, Society, Statistics, Data Mining, Machine Learning
Can statistical models be intellectual property?
September 1, 2008 8:19 pm by Markus.
Recently I had a fun discussion with Bill over lunch about intellectual property and how it might apply to statistical modeling work. Given that more and more companies make a living from predictions produced by models they have built (churn prediction, credit scores and other risk models), we were wondering whether there is any means of protecting these models as intellectual property. For example, the ZETA model for predicting corporate bankruptcies is a closely guarded secret; only the variables it uses have been published (Altman E. I. (2000); Predicting financial distress for companies: revisiting the Z-Score and ZETA models). Obviously this model is useful for lending and can make serious money for its user. Making decisions guided by a formula is becoming more popular, and this might be something over which legal battles will be fought in the future.
Copyrighted works and patents often count towards what a company would be worth should somebody acquire it, so start-up companies have a motivation to protect their models. A mathematical formula (e.g. a regression equation) cannot be patented, and copyright probably won’t apply either; even if it did, it’s trivial to build a formula that does essentially the same thing (e.g. multiply all the weights in the formula, and the decision cut-off, by 10). This leaves only trade secret protection, which means there is no recourse once the cat is out of the bag. Often it’s also the data-collection method that is kept secret: a company called Epagogix developed a method to judge the success of movies from their scripts by scoring them against scales that it keeps secret.
Currently, I don’t see any legal protection for this other than trade secrets. And given that there are infinitely many ways to express the same scoring rules, it would be fairly hard for lawyers and politicians to formulate sensible rules for protecting this kind of intellectual property.
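To make the “infinitely many ways” point concrete: for a linear scoring rule with a cut-off, multiplying the weights, the intercept and the cut-off by any positive constant (and shifting intercept and cut-off by the same amount) gives a formula that looks different but makes exactly the same decisions. The weights below are made-up integers, not any real credit or bankruptcy model.

```python
from itertools import product

def decide(weights, intercept, cutoff, x):
    """Approve (True) when the linear score reaches the cut-off."""
    score = intercept + sum(w * xi for w, xi in zip(weights, x))
    return score >= cutoff

# Hypothetical "secret" model: (weights, intercept, cut-off).
original = ((3, -2, 5), 1, 4)

# Disguised copy: everything times 10, intercept and cut-off shifted by 7.
scale, shift = 10, 7
w, b, c = original
disguised = (tuple(scale * wi for wi in w), scale * b + shift, scale * c + shift)

# Compare decisions on every input in a small integer grid.
inputs = list(product(range(-3, 4), repeat=3))
same = all(decide(*original, x) == decide(*disguised, x) for x in inputs)
print(disguised, same)  # ((30, -20, 50), 17, 47) True
```

Any strictly increasing transformation applied to both score and cut-off works the same way, so the space of disguises is unbounded.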
Posted in Society, Politics, Data Mining, Machine Learning
ART OF SEDUCTION: Not Pretty, Really
April 21, 2008 1:38 am by Markus.
Pretty interesting short film: http://www.youtube.com/watch?v=bd4Gpi9ksXw
Posted in Society
The GPL and Machine Learning Software - Should the GPL cover training data?
October 1, 2007 10:19 pm by Markus.
I’ve followed the discussion around the introduction of GPL v3 for a while. One major change in the license is meant to close the loophole commonly referred to as the “tivoization” of GPL software, i.e. mechanisms that prevent people from tinkering with a product they bought that includes GPL software. TiVo, in particular, accomplishes this by requiring a valid cryptographic signature for the code to run: the user has access to the code, but it’s of no use, since modified versions won’t run on the device. One of the main ideas of the GPL was to give people the freedom to tinker with, improve and understand how something works. This got me thinking about software that uses machine learning techniques.
For the sake of argument, let’s assume that somebody releases a GPL version of a speech recognition system, or an improved version of an existing GPL speech recognition system. While the algorithms would be in the open for everyone to see, two major components of a speech recognition system, the Acoustic Model and the Language Model, do not have to be. The Acoustic Model is created by taking a very large number of audio recordings of speech along with their transcriptions (a Speech Corpus) and ‘compiling’ them into statistical representations of the sounds that make up each word. The Language Model is a very large file containing the probabilities of certain sequences of words, which is used to narrow down the search for what was said.
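A toy version of the language-model idea, to show that the “very large file of probabilities” is nothing but statistics computed from training text: an unsmoothed bigram model estimated by counting word pairs. The three-sentence corpus is made up; real recognizers use far larger corpora, longer word sequences and smoothing for word pairs never seen in training.

```python
from collections import Counter, defaultdict

def tokenize(sentence):
    """Lower-case, split on whitespace, add sentence start/end markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train_bigram_model(corpus):
    """Unsmoothed maximum-likelihood estimates of P(next word | previous word)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {nxt: c / total for nxt, c in followers.items()}
    return model

def sentence_probability(model, sentence):
    """Product of bigram probabilities; 0.0 if any word pair was never seen."""
    tokens = tokenize(sentence)
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= model.get(prev, {}).get(nxt, 0.0)
    return prob

# Hypothetical toy corpus standing in for a real (huge) training set.
corpus = ["recognize speech", "recognize speech", "wreck a nice beach"]
model = train_bigram_model(corpus)

for candidate in ["recognize speech", "wreck a nice beach", "recognize beach"]:
    print(candidate, sentence_probability(model, candidate))
```

With this corpus the model prefers “recognize speech” (2/3) over the acoustically similar “wreck a nice beach” (1/3). Every one of those numbers comes from the corpus, so without the corpus nobody can regenerate or sensibly adjust them.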
How well a speech recognition system works depends to a large extent on its training. The author who improved upon the software should publish the training set as well; otherwise people won’t be able to tinker with the system or understand why the software works well.
The same would hold for something like a handwriting recognition system. One could publish it along with a model (a bunch of numbers) that makes the recognition work. It would be pretty hard for somebody to reverse engineer what the training examples were and how the training was conducted. Getting the training data is the expensive part of building speech recognition and handwriting recognition systems.
Think SpamAssassin: what if the authors suddenly decided not to make their training corpus available anymore? How would users be able to determine the weights for the different tests?
I don’t think this case is covered by the GPL (v3 or older), though I’m not a lawyer. Somebody could include the model in C code (i.e. define the weights of a neural net as a bunch of consts) and then argue that everything needed to compile the program from scratch is included, as the license requires. However, the source by itself wouldn’t allow anybody to understand or change (in a meaningful way) what the program is doing. With the growing importance of machine learning methods, just being able to recompile something won’t be enough. I think the open source community should take this into consideration for GPL v3.01.
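To illustrate the “weights as a bunch of consts” scenario with a toy (Python standing in for C): a perceptron is trained on a tiny made-up data set (logical AND), and the learned parameters are then pasted into the source as constants. The shipped predictor is complete source that runs as-is, yet nothing in it says what data or procedure produced the numbers; only whoever holds the training data can regenerate or meaningfully change them. The data set and constant names are hypothetical.

```python
def train_perceptron(data, epochs=100):
    """Classic perceptron rule (learning rate 1), stopping once an epoch
    classifies every training example correctly."""
    n_features = len(data[0][0])
    w, b = [0] * n_features, 0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != target:
                delta = target - pred
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                errors += 1
        if errors == 0:
            break
    return tuple(w), b

# Hypothetical training data; in a real product this is the expensive part.
TRAINING_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# What a distributor might ship instead: the learned numbers as constants.
SHIPPED_WEIGHTS = (2, 1)
SHIPPED_BIAS = -2

def shipped_predict(x):
    """Runs fine, but the constants carry no trace of how they were obtained."""
    return 1 if sum(w * xi for w, xi in zip(SHIPPED_WEIGHTS, x)) + SHIPPED_BIAS > 0 else 0

print(train_perceptron(TRAINING_DATA))  # ((2, 1), -2) for this data and order
```

Note that the constants are one of many weight vectors that would solve this task; a different training order or data set would produce other numbers, which is exactly why the source alone does not explain them.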
Posted in Society, Machine Learning
Text Mining for Tax Evaders on eBay
June 12, 2007 3:32 pm by Markus.
A (long) while ago there was a lot of talk about the IRS using text mining to find people who did not report income from, e.g., eBay auctions. The German version of this effort, called XPIDER, was just shown to be totally ineffective. After years of trying and spending millions, they did not find conclusive evidence to go after a single tax evader. The German GAO (Bundesrechnungshof) is trying to find out what went wrong, why, and who misspent all that money. I wonder whether the more sophisticated web-crawling program used by the Canadian tax authorities (it’s called XENON, as reported by Wired) will turn out to be similarly effective …
Posted in Society, Data Mining
Copyright law
April 9, 2007 11:20 pm by Markus.
I found a most insightful piece on copyright law on YouTube that uses the Amen break as an example.
Posted in Society, Ramblings
Artificial Intelligence Cited for Unlicensed Practice of Law
March 8, 2007 5:52 pm by Markus.
I just read an article on the Wired blog titled “AI Cited for Unlicensed Practice of Law”, reporting on a court ruling that upheld the decision that the owner had, through the expert system he developed, given unlicensed legal advice. While an expert system is a clear-cut case (the system always does exactly what it was told, minus errors in the rules; it just follows the given rules and draws logical conclusions), things become more interesting when the machine learns or otherwise modifies its behavior over time. For example, let’s say I put an AI program online that interacts with people and learns over time. Should I be held responsible if the program does something bad? What if I was not the person who taught it that particular behavior? This will probably be a question the courts will have to work out in the future. On the one hand, people should not be able to hide behind actions their computer has taken. But what if it is reasonably beyond the capability of the individual to foresee what the AI would do?
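The point about expert systems can be made concrete with a minimal forward-chaining sketch: the system only ever derives conclusions that follow from the rules its author wrote, so the same facts always lead to the same answer. The rules below are a textbook-style toy, not anything from the case in the article.

```python
def forward_chain(facts, rules):
    """Fire every rule whose premises are all known until no new fact appears.

    rules: list of (premises, conclusion) pairs, where premises is a set.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

# Hypothetical toy rule base.
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "cannot_fly"}, "is_flightless_bird"),
]

print(sorted(forward_chain({"has_feathers", "cannot_fly"}, RULES)))
# ['cannot_fly', 'has_feathers', 'is_bird', 'is_flightless_bird']
```

A learning system differs exactly here: its rules (or weights) change with the data it sees, so its author can no longer read off in advance what it will conclude.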
This will probably end up being the next big challenge for courts, just like the internet has been. It is interesting how the internet has created legal problems simply by letting people communicate more easily with each other: think trademark issues, advertising restrictions for tobacco, or copyright violations (fair use differs from country to country; what is legal in one might be illegal in another) …
Update: And it just started. Check out this article: Colorado Woman Sues To Hold Web Crawlers To Contracts
Posted in Society, Artificial Intelligence (AI), Ramblings
“I’m sorry, but I’m married …”
March 5, 2007 8:52 pm by Markus.
As I enjoy going out with friends and mingling a lot, I have noticed a very interesting trend lately. Some of my single friends will chat up a woman they find attractive (sometimes with success), and if she is not interested, she will tend to show them a ring on her finger and tell them that she is unfortunately taken. So far, so good. I was out with some friends of mine; we were just talking and watched a gentleman ask a woman out right off the bat. She showed her ring and politely turned him down. Later, Chris ended up talking to this lovely lady (we had all met up as part of a larger group going out), and the two of them had a long conversation. After a lot of talking and laughter, she excused herself for a minute. When she came back from the bathroom, her ring was gone. He didn’t notice at first, but as she suddenly became a lot more flirty, he asked her about it straight up. Her answer: “It’s a fake ring. Just to keep guys in bars from hitting on me.” They talked a bit more about this and that, and she started hinting more and more that she would be very interested in going on a date with Chris. He thought about it and walked away. Chris told me later that he just does not like to date women who lie; trust and honesty are important to him. His reasoning was that if she is willing to lie (or tell little “white lies”) right from the start, how can one expect that it won’t get worse over time? What if she uses little white lies to get out of every situation she does not want to be in?
What do we learn from this? A fake wedding ring might keep all the drunk losers away, but it may also confuse or scare off Mr. Right when he comes along. I’ve heard similar stories from a couple of other guys; it just does not go over well.
Posted in Society, Ramblings
Online Dating
January 3, 2007 8:20 am by Markus.
As I’m currently visiting Germany over the winter break, I couldn’t help but notice the advertising for an online dating website here. They spend a lot of money to get this stuff into people’s heads. I’ve seen similar sites advertised in the US (such as match.com and TRUE), so for the hell of it I went and checked out the website. The first thing I noticed is that you have to create an account to see people’s pictures or to browse more than a couple of pages of search results. That, of course, leads to many, many stale profiles from people who just want to window-shop and are not really interested in giving it a serious try. To interested parties (i.e. people who pay), this of course might make it look as if the website has so many members that it is worth paying for.
This adds to the impression I got after reading about Bad Experiences with canceling accounts, which gives a not-so-honorable mention to certain US-based dating websites. Apparently you can’t just cancel your membership on the website, but have to go through an exit interview by phone. Otherwise, your profile will be kept and your credit card will continue to be charged. It seems that dating websites feel compelled to keep people active for as long as possible (or at least to keep up the illusion of activity). The reason for this might be less mean-spirited than one would at first assume. For example, just to have a couple of thousand people in each major city of the US, a dating website would need roughly 100,000 active members. That is tough to accomplish, especially given that without the illusion of activity nobody else would join.
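The 100,000 figure works out as a back-of-the-envelope product if one assumes, purely as an illustrative assumption not stated in the post, that “each major city” means roughly 50 metropolitan areas with about 2,000 active members each:

```latex
2{,}000 \ \tfrac{\text{members}}{\text{city}} \times 50 \ \text{cities} = 100{,}000 \ \text{active members}
```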
With all that said, a friend of mine found his girlfriend through the Denver Personals on Craigslist. It can work.
Posted in Society, Statistics, Ramblings, Random