From derby-user-return-2872-apmail-db-derby-user-archive=db.apache.org@db.apache.org Thu Dec 01 17:24:20 2005 Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 49437 invoked from network); 1 Dec 2005 17:24:20 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 1 Dec 2005 17:24:20 -0000 Received: (qmail 75402 invoked by uid 500); 1 Dec 2005 17:24:17 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 75380 invoked by uid 500); 1 Dec 2005 17:24:17 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 75369 invoked by uid 99); 1 Dec 2005 17:24:17 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Dec 2005 09:24:17 -0800 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received-SPF: pass (asf.osuosl.org: local policy) Received: from [65.195.181.50] (HELO webRack01.Segel.com) (65.195.181.50) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Dec 2005 09:25:43 -0800 Received: from dbrack01.segel.com (ns2.segel.com [65.195.181.55]) by webRack01.Segel.com (Postfix) with ESMTP id 5F34217EAC; Thu, 1 Dec 2005 12:29:23 -0600 (CST) From: Michael Segel Reply-To: msegel@segel.com Organization: MSCC To: derby-user@db.apache.org Subject: Re: Recursive Select for Discussion Forum Date: Thu, 1 Dec 2005 11:24:03 -0600 User-Agent: KMail/1.8.2 Cc: Oyvind.Bakksjo@sun.com References: <438344B6.2000003@Sun.COM> <200511301351.29961.msegel@segel.com> <438EF427.9080405@sun.com> In-Reply-To: <438EF427.9080405@sun.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Message-Id: <200512011124.03708.msegel@segel.com> X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N On Thursday 01 December 2005 7:01 am, Oyvind.Bakksjo@sun.com wrote: > Michael Segel wrote: > > On Wednesday 30 November 2005 12:42, Rick Hillegas wrote: > >>Hi Michael, > >> > >>You could streamline your recursive walk by using a temporary table and > >>a database procedure. The temporary table would hold the ids you > >>recursively harvest. It would be populated by your database procedure, > >>which would walk up the levels of your hierarchy. When the procedure > >>returned, you could then join the temporary table to your > >>discussion_item table to get the details you needed. It's not an elegant > >>solution, but by running the procedure inside the database, you would > >>eliminate a lot of network handshaking. > >> > >>Derby does not support hierarchical queries. You're welcome to log an > >>enhancement request for this feature. > > > > Well if you're going to use Temp tables, why not go outside of just SQL > > statements and then use cursors? > > > > By using prepared statements you could do this and maybe use a little b= it > > of recursion. Since this is SELECT, you'd want to make sure that you do > > this out side of a Transaction. (Transactions and recursive SQL > > statements can get messy.) > > > > Pretty straight forward from there. > > You only need to prepare two statements, and each recursion requires a > > unique resultSet to be declared. > > > > Your table would look like this: > > Table foo: > > msg_id int -- (you can autogenerate or use long or something else > > depending on # of messages and retension rules.) > > parent_id int (set default to 0) > > msg_author ... > > msg_header ... > > msg_txt ... > > > > Then create an index on msg_id and also on parent_id; > > > > This can be done a couple of ways or permutations. > > Here's the simplest. > > > > Make the following CLASS variables; > > A database connection con; > > Create a vector to store the result sets. > > int rec_lvl =3D 0 (this is the level of recursion) > > Create and prepare the following select statement: > > SELECT msg_id, msg_author, msg_header, msg_txt > > FROM foo > > WHERE parent_id =3D ? > > ORDER BY msg_id; > > > > You then have two methods to write: > > 1) A Method to pull the data out of the result set, and pop it on to a > > vector, including the rec_lvl variable. (This will be used to build your > > tree) > > > > 2) The recursive method that finds all the children to the input value = of > > parent_id. > > > > So the first time you call the method you pass in the value of 0 for the > > parent id. > > Then for each record you get back, you pass the result set to the > > popOnVector and the current rec_lvl value. > > You then call the recursive routine, passing in the current msg_id and > > (rec_lvl + 1); > > > > Thats all it takes. > > When presenting a discussion thread, there are typically two things you > want to take into account: > * Ordering of messages > * Indentation > > Ordering: > * All replies to a specific message (messages having a 'parent') should > be displayed directly under that parent message. > * Within a sub-group of messages, sharing the same parent, messages > should be ordered on its timestamp. > > Indentation: > * Every message which has a parent should be more indented than the paren= t. > * Every message having the same distance to the "root" message (the one > without a parent) should have the same indentation. > > With the current table definitions I have seen suggested here, neither > of these can be computed dynamically with a single, non-recursive SQL > query. > > The indentation part is easy: Don't compute it dynamically, store it in > the table. Whenever you add a new record to the table, you know the id > of the parent message. The indentation level for the record to be > inserted is the parent's indentation level plus one. > > The ordering part is more tricky. What information do we have in each > record that can be used for ordering. > > message id - most likely monotonically increasing with each message > added (chronologically) > parent id > indentation level > timestamp - this one is, like the message id, monotonically increasing, > so it does not add any information for us to use when sorting > > So the table is (wrt sorting) a set of triplets: (message_id, parent_id, > indentation). For a database to sort this set, it must be able to look > at two random triplets and determine which of those comes first, without > considering other triplets that may be in the set. Is this possible? > > As an example, look at these two triplets: > > (8,3,2) > (7,6,3) > > Which comes first? > > It's impossible to tell, since it depends on the values of other rows. > Consider these two message trees: > > (1,null,0) <- root message > (2,1,1) <- 2nd message received, 1 is parent, indentation is 1 > (4,2,2) < - Received after message 3 below, but is reply to 2 > (3,1,1) > (8,3,2) > (5,1,1) > (6,5,2) > (7,6,3) > > Here, (8,3,2) comes before (7,6,3), because of the subgroups they belong > to. > > (1,null,0) > (2,1,1) > (6,2,2) > (7,6,3) > (3,1,1) > (8,3,2) > (4,1,1) > (5,1,1) > > Here, (7,6,3) comes before (8,3,2). > > So I think you don't get the complete ordering for free from SQL, you'll > have to do some of the sorting in your java code. It should be possible > to do with a 1-pass algorithm, I think. > Sigh. Since this really sounds more like a trivial class exercise on recursive=20 programming, I didn't give away anything more than the basic structure. But since you've asked... ;-) 1) Recursive SQL =20 Maybe I'm a tad slow since I've never attempted to write an SQL statement t= hat=20 called itself. And while you could write a recursive stored procedure, I=20 wouldn't recommend it, unless you had to... So, does the initial poster mean that he doesn't want to write a recursive= =20 java class or a complex SQL statement? If you're going to use temp tables, then you're not writing a single SQL=20 statement, are you? So my solution is really a recursive java class that builds a vector that c= an=20 be used to build the tree structure. 2) Assumptions: a) That all root discussions have a value of 0 for their parent_id.=20 b) How you want to order discussion threads is up to the student. Note: message_id will increment in FIFO fashion therefore they will most likely appear in time ordered fashion. ;-) c) The popOnVector method will add to the end of the vector To refresh your memory: msg_id int parent_id int (set default to 0) msg_author ... msg_header ... msg_txt ... Since this example deals with how to build the tree and not the details of= =20 timestamps, author, header, actual text... I leave that up to the student.= =20 The point is that with the tupple msg_id and parent_id I can build a vector= =20 using a single SQL statement and a recursive java app.=20 =46or example, our message data set has the following: {(1,0),(2,0),(3,1),(4,0),(5,3),(6,2)}=20 Note: the actual tuple would look like (1,0,'Fred','How about those White=20 Sox!', 'Blah blah blah') but since we're not really looking at the whole=20 statement but how to create an ordered tree, we can forget about everything= =20 past the first two values. The select statement is as follows: SELECT msg_id, parent_id, ... =46ROM foo WHERE parent_id =3D ? ORDER BY msg_id; Since this is a simple class, lets do the base stuff in the main: (This is psuedo code) class retx{ // class variables here Connection con; Vector victor; =09 String stmt =3D " SELECT msg_id, parent_id, ... "+ " FROM foo " + " WHERE parent_id =3D ? " + " ORDER BY msg_id; " PreparedStatement pstmt; main(){ // Get database connection con =3D getConnection(); // prepare the statement pstmt =3D con.prepareStatement(stmt); // now fetch the data and put it on the vector fetchTree(0,0);=20 =09 // the vector is build, so go do whatever -- } fetchTree(int rec_lvl, int parentID){ =09 // create current result set=20 pstmt.setInt(1,parentID); ResultSet rs =3D pstmt.executeQuery() while(rs.next()) { // Put the result set on the vector along with the recursion=20 PopOnVector(rs,rec_lvl); // check to see if this message has any children fetchTree(rec_lvl +1, rs.getInt(1)); // we know that msg_id is first } } Ok so you should be able to figure out how to pop the resultset and the=20 recursion level on to the vector... So here's what you get: The first select will return: {(1,0),(2,0),(4,0)} Since we're being recursive: After we fetch (1,0) we then see if there are any children. {(3,1)}=20 and so on...=20 Once we hit a node with no child, we return to the previous resultSet and w= e=20 then get the next value and start the recursion all over. So when you pop it on to the vector you will see the following pattern: (note v[n] is the vector where n is the integer number of the node. each no= de=20 contains the result set and the recursion level) {(1,0),(2,0),(3,1),(4,0),(5,3),(6,2)}=20 v[1]:(1,0),0 v[2]:(3,1),1 v[3]:(5,3),2 v[4]:(2,0),0 v[5]:(6,2),1 v[6]:(4,0),0 Note: I didn't need to select the parent_id. Just makes it easier to see an= d=20 match visually. =46rom this vector you can build your tree in a single pass: Topic: 1 + 3 + 5 2 + 6 4 Having said all of this, why not just do a select, ordering by parent_id an= d=20 then msg_id and build your tree using a tree model? Since you're ordering t= he=20 select by parent, and then message id, you'll always know that the parent=20 node has already been created, so then you're just changing it from a leaf = to=20 a branch. You can always find the node because you know its id in the tree. But does this defeat the purpose of the assignment? Again, in both of these examples, you don't need to sort within your code. Or did I miss something?=20 ;-)