Wednesday, July 15, 2009

Queries for Nested Set Model

So this is just a follow up post to my Nested Set Model post to document the scheme/queries that I'm using to define my tree. I created a node with id=0 called ROOT (since it's basically just a place holder that I use to do the dirty work and won't ever show the user) and create a UNIQUE constraint on (lft, rgt) to satisfy my obsession with data integrity (and I imagine that it should help speed up the queries as well).

So here's the queries I use to grab the information in useful manners.

Grab the Whole Tree:
SELECT *
FROM bills
ORDER BY lft;
Grabs the Ancestors of a Node:
SELECT parent.*
FROM bills AS node, bills AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.id = ? AND parent.id <> ? (NOTE: Both ?'s are the same value)
ORDER BY node.lft;
Grabs the Descendents of a Node:
SELECT node.*
FROM bills AS node, bills AS parent
WHERE node.lft > parent.lft AND node.rgt < id =" ?">
Grab the Parent of a Node:
SELECT parent.*
FROM bills AS node, bills AS parent
WHERE node.lft BETWEEN parent.lft AND parent.rgt
AND node.id = ?
ORDER BY parent.lft DESC LIMIT 1,1
Obviously, the '?' is the ID of the node that you're interested in and you can cater the selected columns to your needs.

Tuesday, July 14, 2009

So maybe databases aren't half bad afterall

So I've been working on a budgeting program for the last few years (none of the available ones had the features that I needed/wanted) and when I first started I put all of the data in XML. It was easy/quick to get it up and running and handled the hierarchical nature of the data very well, but as is the case with XML is was very space heavy and took a while to save, load, and query the information (I later changed the program to just load the XML to mirrored data structures in memory to speed things up, but querying transactions by date or bill assignment still meant that I had to touch every single transaction in memory).

So I eventually made the switch to SQLite and since the hierarchy was fixed and I didn't have any trees in my data structures, the weaknesses of hierarchical data in databases weren't a concern. Currently there's only 2 levels in my bills structure (categories and then bills), but a few weeks ago I decided that I wanted to get rid of categories and change the way I did bills to allow infinite depth/sub-bills to my bill organization.

This then had me wondering if I should go back to a more hierarchical friendly data format. After the compactness and speed of SQLite, I didn't want to go back to XML and a little searching turned up Protocol Buffers. They are actually really cool and I was amazed by them that I was about to redo my whole program to make use of them (I'm a sucker for strict typing), but then I started thinking about the whole querying thing again. Right now, I can query in so many ways in a very easy and efficient manner that it's almost impossible to match with my own data structures. I thought about reorganizing the data so that querying was more efficient, but then I was just recreating a database by hand in my code and I just couldn't justify all of the programming time that that would require just to get some potential speed improvements (and then all of the data would have to be resident in memory all the time (or I'd have to write my own read/write/cache system) and the cons definitely started outweighing the potential pros).

So I was then left with getting the hierarchical data working in SQLite. The standard parent-child tree structure was the simple approach that popped into my head first and apparently some new SQL to handle such a thing in some DBs (but not SQLite). I figured that someone smarter than myself must have already solved this problem, and some digging revealed that I was right about that. I stumbled on the nested set model, and it's one of those beautifully elegant solutions that I love so much about programming (but that's another post). It's pretty cool, because you actually encode all of the links in a separate table (or you can do the same table if you like) and you do it by doing a pre-order tree traversal and numbering as you go. So insert, moving, and deleting nodes is just a matter of updating the numbering in linkage. The one disadvantage is that when you add/delete/change a node in the tree, you are going to have to touch half the entries in the table on average, but in my program adding/moving bills is infrequent and there are FAR fewer bills than transactions, so paying that small hit in performance on the bills is more than worth it for the big gains with transactions.