Archive for the ‘Artificial Intelligence (AI)’ Category

Deferring Decisions to AI

Wednesday, February 14th, 2018

Would you let an AI make decisions for you? With all the buzz about AI and calls for its regulation (apparently AI will kill us all some day), I’ve been thinking about under what circumstances we might delegate decision making in certain areas to machines (call it AI). That’s not as crazy as it sounds, because mankind has been doing that for years already.


Machine Learning Newsticker

Saturday, April 2nd, 2016

Shameless plug: I’ve fixed my scripts and my Machine Learning Newsticker (that reaches back to February 2005) is back in business. RSS Link Expect periodic updates on ML and AI related news items.


Starcraft AI competition

Friday, November 13th, 2009

UCSC is holding a Starcraft AI competition. I wish I had the time to participate… Starcraft is one of my all time favorite games, and writing a better AI for a real-time strategy game is certainly interesting and challenging.

Famous bugs in AI game engine caught on tape

Saturday, May 2nd, 2009

Found this on aigamedev and some of them are really hilarious: AI game bugs caught on tape

Automation of Science

Sunday, April 5th, 2009

Two interesting articles in Science: The Automation of Science is about a robotic system that autonomously generated functional genomic hypotheses about a yeast. The second article, Distilling Free-Form Natural Laws from Experimental Data, is about a system learning from physics experiments and deriving a hypothesis from the data (this is along the lines of the general idea I’ve written about in the past). Cool stuff.

ISC on the Future of Anti-Virus Protection

Friday, August 1st, 2008

An article on the Internet Storm Center discusses wether Anti-Virus software in the current state is a dead end. In my opinion it has been dead for quite a while now. Apart from the absolutely un-usable state that anti-virus software is in, I think it’s protecting the wrong things. Most attacks (trojans, spyware) nowadays come through web-browser exploits and maybe instant-messenger (see reports on ISC). So instead of scanning incoming emails, how about a behavior blocker for the web-browser and the instant messenger? There are a couple of freeware programs (e.g. IEController [German]) out there that successfully put Internet Explorer, etc. into a sandbox; whatever Javascript exploit – known or unknown – the browser won’t be able to execute arbitrary files or write outside its cache-directory. Why is there nothing like that in the commercial AV packages?

However, a few possibilities suggested in the article might be worth exploring. For example, they suggest Bayesian heuristics to identify threats. Using machine learning techniques might be a direction worth exploring. IBM AntiVirus (maybe not the current version anymore) has been using Neural Networks with 4Byte sequences (n-grams) for bootsector virus detection.

A couple things to keep in mind, though:

  • Quality of the classifier (detection rate) should be measured with Area-under-ROC-Curve (AUC), not error-rate like most people tend to do in Spam-Filter comparisons. The base-rate of the “non-virus” class is pretty high; I have over 10.000 executables/libraries on my windows machine. All (most?) of them non-malicious.
  • The tricky part with that is the feature extraction. While sequences of bytes or strings extracted from a binary might be a good start, advanced features like call-graphs or imported API-calls should be used as well. This is pretty tricky and time-consuming, especially when it has to be done for different types of executables (Windows scripts, x86-EXE files, .Net files etc.). De-obfuscation techniques, just like in the signature based scanners, will probably be necessary before the features can be extracted.
  • Behavior blocking and sandboxes are probably easier, a better short-term fix, and more pro-active. This has been my experience with email-based attacks as well back in the Mydoom days when a special mime-type auto-executed an attachment in Outlook. Interestingly there are only two programs out there that sanitize emails (check mime-types, headers, rename executable attachments etc.) at the gateway-level – a much better pro-active approach than simply detecting known threats. The first is Mimedefang, a sendmail plugin. The other is impsec, based on procmail. CU Boulder was using impsec to help keep student’s machines clean (there were scalability issues with the procmail solution, though).

The cloud obscuring the scientific method

Saturday, July 12th, 2008

“All models are wrong, and increasingly you can succeed without them” — George Box
Sometimes…” — Me

In a Wired article about the Peta-byte age of data processing the author claimed that given the enormous amounts of data and the patterns found by data mining we are less and less dependent on scientific theory. This has been strongly disputed (see Why the cloud cannot obscure the Scientific Method) as the author simply ignores the fact that all the patterns that were found are not necessarily exploitable – finding a group of genes that interact is a first step, but won’t cure cancer. However, in machine translation or placing advertising online one can succeed with little to no domain knowledge. That is, once somebody comes up with the right features to use (see Choosing the right features for Data Mining).

What would be interesting to develop, however, is a “meta-learning” algorithm that can abstract from simpler models and learn e.g. a differential equation. For example, lets take data from several hundred Physics experiments about heat-distribution conducted on different surfaces etc. We can probably learn a regression model for one particular experiment which could predict how the heat will distribute given the parameters of the experiment (material, surface etc.). The meta-learning algorithm would then look at these models and somehow come up with the heat-equation. That would be something…

Artificial Addition (Overcoming BIAS)

Friday, November 23rd, 2007

I found the following article interesting:

What Machine Learning Papers to read …

Friday, July 13th, 2007

Laura just pointed me to this system, best described as:

I have a routine problem that sometimes paper titles are not enough to tell me what papers to read in recent conferences, and I often do not have time to read abstracts fully. This collection of scripts is designed to help alleviate the problem. Essentially, what it will do is compare what papers you like to cite with what new papers are citing. High overlap means the paper is probably relevant to you. Sure there are counter-examples, but overall I have found it useful (eg., it has suggested papers to me that are interesting that I would otherwise have missed). Of course, you should also read through titles since that is a somewhat orthogonal source of information.

I have the same problem. And wow… I will have a lot to read this weekend.

Interesting Experimental Captchas

Monday, June 11th, 2007

Captchas are these little word-puzzles in images that web-sites use to keep spammers and bots out. They are everywhere and even the New York Times had an article about Captchas recently. It turns out it’s a nice exercise in applying some machine learning to break these things (with lots of image manipulation to clean up the images). Since spam-bots are becoming smarter, people are switching to new kinds of Captchas. My favorites (using images) so far are Kittenauth and a 3D-rendered word-captcha.