db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Oyvind.Bakk...@Sun.COM
Subject Re: Recursive Select for Discussion Forum
Date Thu, 01 Dec 2005 13:01:27 GMT
Michael Segel wrote:
> On Wednesday 30 November 2005 12:42, Rick Hillegas wrote:
> 
>>Hi Michael,
>>
>>You could streamline your recursive walk by using a temporary table and
>>a database procedure. The temporary table would hold the ids you
>>recursively harvest. It would be populated by your database procedure,
>>which would walk up the levels of your hierarchy. When the procedure
>>returned, you could then join the temporary table to your
>>discussion_item table to get the details you needed. It's not an elegant
>>solution, but by running the procedure inside the database, you would
>>eliminate a lot of network handshaking.
>>
>>Derby does not support hierarchical queries. You're welcome to log an
>>enhancement request for this feature.
>>
> 
> Well if you're going to use Temp tables, why not go outside of just SQL 
> statements and then use cursors?
> 
> By using prepared statements you could do this and maybe use a little bit of 
> recursion. Since this is SELECT, you'd want to make sure that you do this out 
> side of a Transaction. (Transactions and recursive SQL statements can get 
> messy.)
> 
> Pretty straight forward from there.
> You only need to prepare two statements, and each recursion requires a unique 
> resultSet to be declared.
> 
> Your table would look like this:
> Table foo:
> 	msg_id int --  (you can autogenerate or use long or something else depending 
> on # of messages and retension rules.)
> 	parent_id int (set default to 0)
> 	msg_author ...
> 	msg_header ...
> 	msg_txt ... 
> 
> Then create an index on msg_id and also on parent_id;
> 
> This can be done a couple of ways or permutations.
> Here's the simplest.
> 
> Make the following CLASS variables;
> A database connection con;
> Create a vector to store the result sets.
> int rec_lvl = 0 (this is the level of recursion)
> Create and prepare the following select statement:
> SELECT msg_id, msg_author, msg_header, msg_txt
> FROM   foo
> WHERE parent_id = ?
> ORDER BY msg_id;
> 
> You then have two methods to write:
> 1) A Method to pull the data out of the result set, and pop it on to a vector, 
> including the rec_lvl variable. (This will be used to build your tree)
> 
> 2) The recursive method that finds all the children to the input value of 
> parent_id.
> 
> So the first time you call the method you pass in the value of 0 for the 
> parent id.
> Then for each record you get back, you pass the result set to the popOnVector 
> and the current rec_lvl value.
> You then call the recursive routine, passing in the current msg_id and 
> (rec_lvl + 1);
> 
> Thats all it takes.

When presenting a discussion thread, there are typically two things you 
want to take into account:
* Ordering of messages
* Indentation

Ordering:
* All replies to a specific message (messages having a 'parent') should 
be displayed directly under that parent message.
* Within a sub-group of messages, sharing the same parent, messages 
should be ordered on its timestamp.

Indentation:
* Every message which has a parent should be more indented than the parent.
* Every message having the same distance to the "root" message (the one 
without a parent) should have the same indentation.

With the current table definitions I have seen suggested here, neither 
of these can be computed dynamically with a single, non-recursive SQL query.

The indentation part is easy: Don't compute it dynamically, store it in 
the table. Whenever you add a new record to the table, you know the id 
of the parent message. The indentation level for the record to be 
inserted is the parent's indentation level plus one.

The ordering part is more tricky. What information do we have in each 
record that can be used for ordering.

message id - most likely monotonically increasing with each message 
added (chronologically)
parent id
indentation level
timestamp - this one is, like the message id, monotonically increasing, 
so it does not add any information for us to use when sorting

So the table is (wrt sorting) a set of triplets: (message_id, parent_id, 
indentation). For a database to sort this set, it must be able to look 
at two random triplets and determine which of those comes first, without 
considering other triplets that may be in the set. Is this possible?

As an example, look at these two triplets:

(8,3,2)
(7,6,3)

Which comes first?

It's impossible to tell, since it depends on the values of other rows. 
Consider these two message trees:

(1,null,0) <- root message
  (2,1,1) <- 2nd message received, 1 is parent, indentation is 1
   (4,2,2) < - Received after message 3 below, but is reply to 2
  (3,1,1)
   (8,3,2)
  (5,1,1)
   (6,5,2)
    (7,6,3)

Here, (8,3,2) comes before (7,6,3), because of the subgroups they belong to.

(1,null,0)
  (2,1,1)
   (6,2,2)
    (7,6,3)
  (3,1,1)
   (8,3,2)
  (4,1,1)
  (5,1,1)

Here, (7,6,3) comes before (8,3,2).

So I think you don't get the complete ordering for free from SQL, you'll 
have to do some of the sorting in your java code. It should be possible 
to do with a 1-pass algorithm, I think.

--
Oyvind Bakksjo
Sun Microsystems, Database Technology Group
Trondheim, Norway
http://weblogs.java.net/blog/bakksjo/

Mime
View raw message