You are currently browsing the Markus Breitenbach weblog archives for July, 2007.
- Advertising (1)
- Artificial Intelligence (AI) (13)
- Classification (3)
- Clustering (1)
- Coding / Programming (8)
- Cryptography (1)
- Data Mining (19)
- Economy / Investing (1)
- ewrt linux (2)
- Fixing Stuff (8)
- Machine Learning (31)
- Math (2)
- Politics (3)
- Predictive Modeling (4)
- Psychology (3)
- Ramblings (26)
- Random (9)
- Security (15)
- Society (12)
- Sociology (4)
- spam (3)
- Statistics (16)
- August 5, 2010 1:06 am: Elo Scores and Rating Contestants
- July 11, 2010 8:56 pm: GraphLab & Parallel Machine Learning
- June 15, 2010 8:21 pm: PHP configuration using htaccess on 1and1 shared hosting
- February 28, 2010 12:21 pm: Energy efficient data mining algorithms
- February 16, 2010 11:56 pm: Alternative measures to the AUC for rare-event prognostic models
- January 26, 2010 9:54 pm: Spam Filtering by Learning a Pattern Language
- January 10, 2010 5:37 pm: Strong profiling is not mathematically optimal for discovering rare malfeasors (on rare event detection)
- November 13, 2009 12:27 am: Starcraft AI competition
- July 25, 2009 8:34 pm: Random characters in text mode -> graphics card
- June 7, 2009 5:04 pm: Programs stealing the input focus
Blogroll
Uncategorized
Useful Links
- August 2010
- July 2010
- June 2010
- February 2010
- January 2010
- November 2009
- July 2009
- June 2009
- May 2009
- April 2009
- March 2009
- February 2009
- January 2009
- December 2008
- November 2008
- October 2008
- September 2008
- August 2008
- July 2008
- June 2008
- May 2008
- April 2008
- March 2008
- February 2008
- January 2008
- December 2007
- November 2007
- October 2007
- September 2007
- August 2007
- July 2007
- June 2007
- May 2007
- April 2007
- March 2007
- February 2007
- January 2007
- December 2006
- November 2006
- October 2006
- September 2006
- August 2006
Archive for July 2007
A strange end for a night…
July 29, 2007 3:16 pm by Markus.
Last night I was out with a couple of friends at our favorite club. I had the time of my life and met some interesting new people. When we left the club, a girl got hit by a car right in front of our eyes. I’ve seen quite a bit of “shit going down” in my life now (3 fires, one guy getting beat up in an alley, 2 times one or more people getting hit by a car, once getting hit by a car myself) and what I find interesting is the different reactions by people. Last night some guys freaked, some were calm, some tried to help, some girls just started crying. Most people just stared in some kind of apathy and did nothing. Some “tough guys”, who I wouldn’t have expected that from, freaked out. In all the instances I’ve been involved with I’ve stayed calm and just did what was necessary (including the one where I got hit). That said, 911 operators are kinda slow - why do they ask if the person hit by the car is OK when I already told them that the person is seriously injured? What a weird end for a good night - I’m still shaking a bit…
Posted in Ramblings | Print | No Comments »
Modeling Tree Structures and Hierarchies in a Data Warehouse
July 20, 2007 9:25 pm by Markus.
I’m currently reading a book about Data Warehouse design (”Mastering Data Warehouse Design”, Claudia Imhoff, Nicholas Galemmo and Jonathan Geiger, Wiley). One thing I noticed is the incredibly inefficient way the authors encode trees in relational databases. Their suggestion is to model it with “pointers” to child-nodes which is incredibly inefficient to deal with in SQL, leads to recursive queries (unless proprietary SQL extensions are used) and you’ll have to write loads of self-joins. A much better way of encoding trees in SQL is based on nested sets. However, it always depends on what kind of queries you will run later. According to the book, those would be listing elements on the same level of the tree as well as retrieving sub-trees. This is something that I think is still kind of painful when using nested sets.
Here is my favorite solution for encoding trees in SQL:
CREATE TABLE TreeMagic (Mykey CHAR(10) PRIMARY KEY, FatherNode ChAR(10) NOT NULL, length INTEGER NOT NULL);
| MyKey | FatherNode | Length |
|---|---|---|
| A | 1 | |
| AA | A | 2 |
| AB | A | 2 |
| AAA | AA | 3 |
| AAB | AA | 3 |
| ABA | AB | 3 |
| ABB | AB | 3 |
So key A is the root, AA and AB are child-nodes of A, AAA and AAB are child-nodes of AA, and so on. The cool part is traversing the tree on one level is easy due to the length-field and selecting subtrees is easy as well using the like-operator, which moves all the hard work into the B-Tree index. Inserting a new node is simpler than with set-based trees for which in the worst case you might have to increase the left/right numbers at a couple of other nodes. The path to each node is always fully known. Traversing the tree in DFS comes for free by lexicographical ordering of the keys with an “order by” clause. BFS is available when ordered by “length.FatherNode”. One operation the nested sets can do faster is determining the number child-nodes just with a bit of math. Either way I think this idea is way more efficient than the one proposed in the book
Edit: I just found a book that is full of patterns for modeling the most common structures in relational databases: “SQL for Smarties - Advanced SQL Programming” by Joe Celko (who also wrote the article about the nested-set method for trees). It contains various graph problems and how to manage them in relational databases. Looks interesting…
Posted in Coding / Programming, Data Mining | Print | No Comments »
What Machine Learning Papers to read …
July 13, 2007 1:08 pm by Markus.
Laura just pointed me to this system, best described as:
I have a routine problem that sometimes paper titles are not enough to tell me what papers to read in recent conferences, and I often do not have time to read abstracts fully. This collection of scripts is designed to help alleviate the problem. Essentially, what it will do is compare what papers you like to cite with what new papers are citing. High overlap means the paper is probably relevant to you. Sure there are counter-examples, but overall I have found it useful (eg., it has suggested papers to me that are interesting that I would otherwise have missed). Of course, you should also read through titles since that is a somewhat orthogonal source of information.
http://www.cs.utah.edu/~hal/WhatToSee/
I have the same problem. And wow… I will have a lot to read this weekend.
Posted in Statistics, Machine Learning, Artificial Intelligence (AI) | Print | No Comments »
Safe Strings in PHP (2)
July 1, 2007 4:38 pm by Markus.
I wrote about the problems with PHP strings here and the possible solution I liked using a class encapsulating strings in PHP. I now worked out some details to make every string function in PHP work with the new “SafeString”-class. You can find the details and source here. This is still more a proof-of-concept and for all practical purposes would require the re-writing of a couple of things like database abstraction layers and such to return SafeStrings as well.
Posted in Coding / Programming, Security | Print | 1 Comment »