« Code Snippet of the Day: Recursive File Finder
Why free software shouldn't depend on Cocoa or GNUStep »


Trees in relational databases

I recently posted an on how to use a CTE to traverse a tree in a relational database on Reddit.

This started a lot of conversation, and people were wondering on the performance aspect of using recursive queries.

Well, I decided to take an extreme look. I'll pretend that Reddit had a very popular thread that had 5,000 comments on it. And, to really tax this query, I'll make it so that it is an extremely unbalanced tree 1000 comments deep.

For those following along, I'm going to use T-SQL, and everything in this post can be dropped directly into SQLServer 2005+ and ran.

I'm going to build a tree that looks like this:

  Root
      \
      Comment 1
      Comment 2
      Comment 3
      Comment 4
      Comment 5
               \
               Comment 6
               Comment 7
               Comment 8
               Comment 9
               Comment 10
                         \
                         Comment 11
                         Comment 12
                         Comment 13
                         Comment 14
                         Comment 15 (This repeats to depth of 1,000)

This is the absolute worst case scenario, because I have a horrible unbalanced tree, and will have to recursively walk the entire thing.

So, to create the schema:

  USE [Tree]
  CREATE TABLE [dbo].[Tree]
  (
    [Id] [int] IDENTITY(1,1) NOT NULL,
        [ParentID] [int] NULL,
        CONSTRAINT [PK_Tree] PRIMARY KEY CLUSTERED 
        (
           [Id] ASC
        )WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
  ) ON [PRIMARY]

  GO
  ALTER TABLE [dbo].[Tree]  WITH CHECK ADD  CONSTRAINT [FK_Tree_Tree] FOREIGN KEY([ParentID])
         REFERENCES [dbo].[Tree] ([Id])
  GO
  ALTER TABLE [dbo].[Tree] CHECK CONSTRAINT [FK_Tree_Tree]

This is a simple table with the following attributes:

  • Two Columns (Id, ParentId)
  • Id is an identity column and the primary key
  • ParentId is Nullable, and is a Foreign Key back to Id

Now, I'm going to insert my comment thread of doom into this table, I'm using some pretty tacky T-SQL to do this, but I'm lazy:

USE Tree;

DECLARE @depthCounter int;
DECLARE @lastNode int;
DECLARE @i int;

SET @DepthCounter = 0;

INSERT INTO Tree (ParentID)
        VALUES
        (
                NULL
        );

SET @LastNode = SCOPE_IDENTITY();

WHILE @DepthCounter < 1000
BEGIN   
        SET @i = 0;
        WHILE @i < 5
                BEGIN
                        INSERT INTO Tree (ParentID)
                                VALUES
                                (
                                        @LastNode
                                );      
                        SET @i = @i + 1;
                END
        SET @LastNode = SCOPE_IDENTITY();       
        SET @DepthCounter = @DepthCounter + 1;
END

With my tree of doom inserted, I'll run the query starting from the root:

Use Tree;

WITH CommentTree (ParentID, ID, Level)
 AS
 (
        SELECT ParentID, ID, 0 as [Level]
                FROM Tree
                WHERE ParentID IS NULL
        UNION ALL
                SELECT c.ParentID, c.ID, ct.Level + 1
                        FROM Tree c
                        JOIN CommentTree ct ON ct.ID = c.ParentID
  )

  SELECT ParentID, ID, Level FROM CommentTree 
         ORDER BY ParentID, Level, ID OPTION (MAXRECURSION 5000)

On my home laptop (A modest 2ghz) running Microsoft SQLServer 2005 EXPRESS I get the following result to generate this dataset:

  SQL Server Execution Times:
  CPU time = 10998 ms,  elapsed time = 11209 ms.

11 Seconds is an eternity, but this is a the worlds worst case. Now to try the same test data, but instead walking only 100 levels deep (500 comments):

Use Tree;

WITH CommentTree (ParentID, ID, Level)
 AS
 (
        SELECT ParentID, ID, 0 as [Level]
                FROM Tree
                WHERE ParentID = 44339
        UNION ALL
                SELECT c.ParentID, c.ID, ct.Level + 1
                        FROM Tree c
                        JOIN CommentTree ct ON ct.ID = c.ParentID
  )

  SELECT ParentID, ID, Level FROM CommentTree 
         ORDER BY ParentID, Level, ID OPTION (MAXRECURSION 5000)

This returned 500 rows and the following speed stats:

  SQL Server Execution Times:
    CPU time = 1217 ms,  elapsed time = 1223 ms.

This is still a unusual dataset, because it is still a very off balance tree.

So, my final benchmark will be an approximation of a regular reddit thread. 300 comments total, 100 root comments, with 2 replies each.

The setup script:

USE Tree;

DECLARE @depthCounter int;
DECLARE @lastNode int;
DECLARE @firstNode int;

SET @DepthCounter = 0;

INSERT INTO Tree (ParentID)
        VALUES
        (
                NULL
        );

SET @FirstNode = SCOPE_IDENTITY();

WHILE @DepthCounter < 100
BEGIN   
        INSERT INTO Tree (ParentID)
                VALUES
                (
                        @FirstNode
                );
        SET @LastNode = SCOPE_IDENTITY();

        INSERT INTO Tree (ParentID)
                VALUES
                (
                        @LastNode
                );
        INSERT INTO Tree (ParentID)
                VALUES
                (
                        @LastNode
                );
        SET @DepthCounter = @DepthCounter + 1;
END

Now, when ran:

(301 row(s) affected)
SQL Server Execution Times:
    CPU time = 47 ms,  elapsed time = 51 ms.

That's right, in the real world, this query took 51ms.

I think this shows that CTE (even running on commodity hardware) are perfectly adequate to handle nested trees on social forums.

Posted by Jonathan Holland on 6/6/2009.

Tags: SQLServer   Trees   Common-Table-Expressions   Recursion

Comments:

51 ms might not sound like much, but it's a throttle neck that limits to 20 requests/sec

plus how much time to keep the tree balanced?

plus how does multiple threaded requests impact performance?

nicely done and documented tho

Gravatar Posted by OldAndGrey on 6/6/2009.

> plus how much time to keep the tree balanced?

This article discusses the use of this method with respect to threaded conversations; the trees aren't balanced, they're whatever shape they naturally acquire.

Gravatar Posted by JB on 6/6/2009.

There are better ways to store trees. are a few of those ways in action.

Gravatar Posted by Jason C on 6/6/2009.

>There are better ways to store trees. Here are a few of those ways in action.

Jason, that is simply Joe Celko's nested sets methods, and the insert/update cost of nested sets is huge.

Gravatar Posted by Jonathan Holland on 6/6/2009.

Yes the insert cost is huge, but far more people will read a thread without contributing than those who contribute.

Also some of the methods in the link I posted are a bit friendly insert-wise than Joe's MPTT.

Gravatar Posted by Jason C on 6/6/2009.

Jonathan, the benchmarks in django-treebeard include the 3 tree data models that it supports: nested sets, materialized paths and adjacency lists. I wrote some support for Tropashko's fraction encoding approach (my personal favorite), but still needs some cleanup.

You can read about Tropashko's encoding in http://arxiv.org/pdf/cs/0402051

Gravatar Posted by tabo on 6/6/2009.

@Tabo and @Jason:

I scrolled up and read a bit about treebeard and now better see what it offers.

What I find interesting is that we know that the method I explain here is by far the fastest when it comes to inserts / deletes and reorganization, so the only place it can lose is in descendant retrieval.

However, looking at the treebeard stats, retrieval appears to be around 5000ms on average for their 1,000 node tree.

Considering my benchmark was 5,000 nodes in essentially a linked list (Worse case scenario), and it took 11000ms, I expected CTE's to do better than treebird, so I thought I'd try it.

I went ahead and built a new tree of 1,000 nodes that vary and have a max depth of up to 4, this was inserted in under 25ms.

To actually fetch the descendants returned this:

   SQL Server Execution Times:
      CPU time = 530 ms,  elapsed time = 538 ms.

This is compared to treebeards best time of 5197ms.

I don't see how you can say that MPTT and AL methods are better with these results.

Gravatar Posted by Jonathan Holland on 6/6/2009.

Oh, and just to point out, this was ran on a laptop far inferior to the Lenovo T61 the Treebeard benchmarks used.

I only have 2gb of ram here, and a slower processor.

Gravatar Posted by Jonathan Holland on 6/6/2009.

Jonathan:

All your testing is being done in SQL. Treebeard benchmarks have tcp+django(orm)+python overhead, that's very taxing with lots of small queries.

Gravatar Posted by tabo on 6/6/2009.

@Tabo:

Since I was curious, I created a C# app that creates 50 threads, with each thread opening 5 DB connections and running the Tree query each time.

The response is this then loaded into a .NET datatable, which is pretty similar to what TreeBeard is doing.

This is a pretty good stress test.

My average after running it was 1679.1ms. So about 3 times as slow, but still significantly faster than TreeBeard's fastest.

Of course, there is the advantage of being C# vrs Python, but at least this isn't a pure SQL enviroment anymore.

The C# stress code is here:

(http://pastebin.com/m2ccc96c3 )

Gravatar Posted by Jonathan Holland on 6/6/2009.

Jonathan, that's very interesting! But note that even those results are really not comparable to the treebeard benchmark (if what you want to measure is the tree algorithm) because:

  1. You're using a faster language (C# vs Python)
  2. You're just discarding the data retrieved from the DB. Each treebeard query in the benchmark: a. parses the db output in a db library (written in python) b. goes to django's db handler c. is turned into an object by the Django ORM (possible the most expensive operation here) d. goes to treebeard
  3. You're using a pool of 50 threads. The treebeard benchmark runs on a single thread.

( I apologize if this list isn't correctly formatted, but I suck with Markdown)

So as you can see there is really no point in comparing the results :) My intention with publishing those benchmark results was to have numbers that made sense to the intended audience of the library: Django programmers. That's why those benchmarks don't even bypass the (expensive but convenient) Django ORM.

If what you want to measure is the raw performance of the tree implementation in SQL, the best way to do that is with your original approach: pure SQL and stored procedures.

Now about the different tree approaches used, your tests are using the "Adjacency List" approach (also implemented in treebeard). The difference is that the recursion in your approach is done by SQL Server. In treebeard it's done on the app layer (since the point of treebeard is to be a db-neutral library).

Also note that, as always, there is no silver bullet here. Every tree implementation has strengths and weaknesses. And things are VERY different when things grow, i.e. millions of nodes in very deep trees instead of 1000 nodes in very shallow (< 10 depth) trees. So, as usual, we must measure and compare depending on our needs :)

Gravatar Posted by tabo on 6/6/2009.

@Tabo:

I didn't realize you were the author of TreeBeard. I want you to know that I didn't intend to start comparing my stuff here to you until Jason brought it up as superior :)

Regarding your list,

  1. I do admit that.
  2. I'm actually not discarding the data, it is being used to populate a .NET datatable object. While this is not has heavy as a full fledged ORM object, it is still a heavy object that offers lots of introspection on the data and as such is expensive to fill.
  3. The point of the threadpool was to simulate real world use of the CTE in a multi user environment. It would actually be far far faster without the thread pool, in fact taking it out leads to an average time of 560ms (just 20ms slower than directly hitting the DB).

In all honesty, you can't really call your benchmarks accurate until you test them in a situation where the DB engine has to handle concurrency.

Kudos for a cool library and page.

Gravatar Posted by Jonathan Holland on 6/6/2009.

I've not followed all the db technicalities here but have plenty of programming experience.

A tree with 5000 nodes, and a depth of 1000 is a zero challenge to the average desktop pc? why use a db if it can't handle something so simple?

Gravatar Posted by Andrew on 6/7/2009.

Maybe you should add some payload to your data. The performance will of course be much better if the database can just jump around on one or two pages. I'd try myself, but I'm not really a friend of Microsoft products, so I neither have a current windows version nor SQL server (express) around.

Gravatar Posted by fforw on 6/7/2009.

Hi again :)

I was curious again this morning about this, so I came back and read your post again. Then I noticed something interesting:

In your first test, the 11s one, you're retrieving the tree once. I couldn't believe it myself so I installed SQL Server Express 2008 in the same thinkpad I used for the treebeard benchmarks (running Vista Enterprise), and I got 8 seconds. This is with a 100% SQL approach.

I reproduced your exact same test procedure as a treebeard test program on a macbook pro with 4gb of ram running postgresql 8.3, and even with all the overhead I mentioned (tcp, sql generation, the heavy django orm, treebeard itself, and the fact that to retrieve the same data I'm sending A THOUSAND QUERIES through all that overhead instead of one, it completed in 3.7 seconds.

3.7 seconds. Instead of 8 or 11, even with all that overhead.

So there must be something REALLY wrong with CTE, or with that example.

Now this is with the adjacency list tree model. How much time did it took for the Materialized Path model? 0.18s. And for the Nested Sets model? 0.22s. Again, these are not raw SQL numbers. We must consider all that ORM+python overhead.

Now, moving on to your 2nd example, 100 comments and 2 replies to each comment, retrieving that with treebeard using the Adjacency List model takes 0.22s, with the Nested Sets model it takes 0.0066s, and with the Materialized Path model it takes 0.0067s. In SQL Server, using pure SQL with NO overhead at all, it took 0.08s.

Also, I didn't read your comment regarding your C# example correctly! Having 0.5s (raw sql) or 1.5s (with VERY light app overhead) to retrieve 1000 nodes that have a max depth of 4 is SLOW. You were comparing those results to the time it took treebeard to make 5270 tree and subtree requests. So basically you were comparing the time it took SQL Server to retrieve a full tree of 1000 nodes ONCE vs the time it took treebeard to retrieve trees and subtrees 5270 times ;) (the treebeard benchmarks are very complex, predictable and undocumented).

I put the benchmarks source and the full results in a pastie: http://www.pastie.org/503879

(and note how awfully expensive write operations are with nested sets!)

Gravatar Posted by tabo on 6/7/2009.

Just wrote this while waiting for WWDC ;)

It's a postgres recursive solution against your original 5k nodes tree.

CREATE OR REPLACE FUNCTION tbreddit_worstalnode_subtree (integer)
RETURNS SETOF tbreddit_worstalnode AS
$BODY$
DECLARE
    node tbreddit_worstalnode;
    query text;
BEGIN
    FOR node IN
        SELECT * FROM tbreddit_worstalnode
        WHERE ($1 IS NULL AND parent_id IS NULL) OR
              ($1 IS NOT NULL AND parent_id = $1)
    LOOP
        RETURN NEXT node;
        RETURN QUERY SELECT *
                       FROM tbreddit_worstalnode_subtree(node.id);
    END LOOP;
END;
$BODY$
LANGUAGE 'plpgsql';

It runs in postgres 8.3 in 1.7s. SQL Server's CTE solution runs in 8-11s. I wouldn't call that fast :) (or maybe it's a postgres vs sql server thing?)

Gravatar Posted by tabo on 6/8/2009.

Comments are closed on this post.