db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Segel <mse...@segel.com>
Subject Re: Recursive Select for Discussion Forum
Date Thu, 01 Dec 2005 17:24:03 GMT
On Thursday 01 December 2005 7:01 am, Oyvind.Bakksjo@sun.com wrote:
> Michael Segel wrote:
> > On Wednesday 30 November 2005 12:42, Rick Hillegas wrote:
> >>Hi Michael,
> >>
> >>You could streamline your recursive walk by using a temporary table and
> >>a database procedure. The temporary table would hold the ids you
> >>recursively harvest. It would be populated by your database procedure,
> >>which would walk up the levels of your hierarchy. When the procedure
> >>returned, you could then join the temporary table to your
> >>discussion_item table to get the details you needed. It's not an elegant
> >>solution, but by running the procedure inside the database, you would
> >>eliminate a lot of network handshaking.
> >>
> >>Derby does not support hierarchical queries. You're welcome to log an
> >>enhancement request for this feature.
> >
> > Well if you're going to use Temp tables, why not go outside of just SQL
> > statements and then use cursors?
> >
> > By using prepared statements you could do this and maybe use a little bit
> > of recursion. Since this is SELECT, you'd want to make sure that you do
> > this out side of a Transaction. (Transactions and recursive SQL
> > statements can get messy.)
> >
> > Pretty straight forward from there.
> > You only need to prepare two statements, and each recursion requires a
> > unique resultSet to be declared.
> >
> > Your table would look like this:
> > Table foo:
> > 	msg_id int --  (you can autogenerate or use long or something else
> > depending on # of messages and retension rules.)
> > 	parent_id int (set default to 0)
> > 	msg_author ...
> > 	msg_header ...
> > 	msg_txt ...
> >
> > Then create an index on msg_id and also on parent_id;
> >
> > This can be done a couple of ways or permutations.
> > Here's the simplest.
> >
> > Make the following CLASS variables;
> > A database connection con;
> > Create a vector to store the result sets.
> > int rec_lvl = 0 (this is the level of recursion)
> > Create and prepare the following select statement:
> > SELECT msg_id, msg_author, msg_header, msg_txt
> > FROM   foo
> > WHERE parent_id = ?
> > ORDER BY msg_id;
> >
> > You then have two methods to write:
> > 1) A Method to pull the data out of the result set, and pop it on to a
> > vector, including the rec_lvl variable. (This will be used to build your
> > tree)
> >
> > 2) The recursive method that finds all the children to the input value of
> > parent_id.
> >
> > So the first time you call the method you pass in the value of 0 for the
> > parent id.
> > Then for each record you get back, you pass the result set to the
> > popOnVector and the current rec_lvl value.
> > You then call the recursive routine, passing in the current msg_id and
> > (rec_lvl + 1);
> >
> > Thats all it takes.
>
> When presenting a discussion thread, there are typically two things you
> want to take into account:
> * Ordering of messages
> * Indentation
>
> Ordering:
> * All replies to a specific message (messages having a 'parent') should
> be displayed directly under that parent message.
> * Within a sub-group of messages, sharing the same parent, messages
> should be ordered on its timestamp.
>
> Indentation:
> * Every message which has a parent should be more indented than the parent.
> * Every message having the same distance to the "root" message (the one
> without a parent) should have the same indentation.
>
> With the current table definitions I have seen suggested here, neither
> of these can be computed dynamically with a single, non-recursive SQL
> query.
>
> The indentation part is easy: Don't compute it dynamically, store it in
> the table. Whenever you add a new record to the table, you know the id
> of the parent message. The indentation level for the record to be
> inserted is the parent's indentation level plus one.
>
> The ordering part is more tricky. What information do we have in each
> record that can be used for ordering.
>
> message id - most likely monotonically increasing with each message
> added (chronologically)
> parent id
> indentation level
> timestamp - this one is, like the message id, monotonically increasing,
> so it does not add any information for us to use when sorting
>
> So the table is (wrt sorting) a set of triplets: (message_id, parent_id,
> indentation). For a database to sort this set, it must be able to look
> at two random triplets and determine which of those comes first, without
> considering other triplets that may be in the set. Is this possible?
>
> As an example, look at these two triplets:
>
> (8,3,2)
> (7,6,3)
>
> Which comes first?
>
> It's impossible to tell, since it depends on the values of other rows.
> Consider these two message trees:
>
> (1,null,0) <- root message
>   (2,1,1) <- 2nd message received, 1 is parent, indentation is 1
>    (4,2,2) < - Received after message 3 below, but is reply to 2
>   (3,1,1)
>    (8,3,2)
>   (5,1,1)
>    (6,5,2)
>     (7,6,3)
>
> Here, (8,3,2) comes before (7,6,3), because of the subgroups they belong
> to.
>
> (1,null,0)
>   (2,1,1)
>    (6,2,2)
>     (7,6,3)
>   (3,1,1)
>    (8,3,2)
>   (4,1,1)
>   (5,1,1)
>
> Here, (7,6,3) comes before (8,3,2).
>
> So I think you don't get the complete ordering for free from SQL, you'll
> have to do some of the sorting in your java code. It should be possible
> to do with a 1-pass algorithm, I think.
>
Sigh.

Since this really sounds more like a trivial class exercise on recursive 
programming, I didn't give away anything more than the basic structure.

But since you've asked... ;-)

1) Recursive SQL  
Maybe I'm a tad slow since I've never attempted to write an SQL statement that 
called itself. And while you could write a recursive stored procedure, I 
wouldn't recommend it, unless you had to...

So, does the initial poster mean that he doesn't want to write a recursive 
java class or a complex SQL statement?

If you're going to use temp tables, then you're not writing a single SQL 
statement, are you?

So my solution is really a recursive java class that builds a vector that can 
be used to build the tree structure.

2) Assumptions:
	a) That all root discussions have a value of 0 for their parent_id. 
	b) How you want to order discussion threads is up to the student.
		Note: message_id will increment in FIFO fashion therefore
		they will most likely appear in time ordered fashion. ;-)
	c) The popOnVector method will add to the end of the vector

To refresh your memory:
	msg_id int
	parent_id int (set default to 0)
 	msg_author ...
  	msg_header ...
  	msg_txt ...

Since this example deals with how to build the tree and not the details of 
timestamps, author, header, actual text... I leave that up to the student. 
The point is that with the tupple msg_id and parent_id I can build a vector 
using a single SQL statement and a recursive java app. 

For example, our message data set has the following:
{(1,0),(2,0),(3,1),(4,0),(5,3),(6,2)} 
Note: the actual tuple would look like (1,0,'Fred','How about those White 
Sox!', 'Blah blah blah') but since we're not really looking at the whole 
statement but how to create an ordered tree, we can forget about everything 
past the first two values.

The select statement is as follows:
SELECT msg_id, parent_id, ...
FROM foo
WHERE parent_id = ?
ORDER BY msg_id;

Since this is a simple class, lets do the base stuff in the main:
(This is psuedo code)
class retx{

	// class variables here
	Connection con;
	Vector victor;
	
	String stmt = 	" SELECT msg_id, parent_id, ... "+
				" FROM foo " +
				" WHERE parent_id = ? " +
				" ORDER BY msg_id; "
	PreparedStatement pstmt;
main(){
	// Get database connection
	con = getConnection();
	// prepare the statement
	pstmt = con.prepareStatement(stmt);
	// now fetch the data and put it on the vector
	fetchTree(0,0); 
	
	// the vector is build, so go do whatever --
}

fetchTree(int rec_lvl, int parentID){
	
	// create current result set 
	pstmt.setInt(1,parentID);
	ResultSet rs = pstmt.executeQuery()
	while(rs.next()) {
		// Put the result set on the vector along with the recursion 
		PopOnVector(rs,rec_lvl);
		// check to see if this message has any children
		fetchTree(rec_lvl +1, rs.getInt(1)); // we know that msg_id is first
        }
}


Ok so you should be able to figure out how to pop the resultset and the 
recursion level on to the vector...

So here's what you get:
The first select will return:
{(1,0),(2,0),(4,0)}
Since we're being recursive:
After we fetch (1,0) we then see if there are any children.
{(3,1)} 
and so on... 
Once we hit a node with no child, we return to the previous resultSet and we 
then get the next value and start the recursion all over.
So when you pop it on to the vector you will see the following pattern:
(note v[n] is the vector where n is the integer number of the node. each node 
contains the result set and the recursion level)
{(1,0),(2,0),(3,1),(4,0),(5,3),(6,2)} 
v[1]:(1,0),0
v[2]:(3,1),1
v[3]:(5,3),2
v[4]:(2,0),0
v[5]:(6,2),1
v[6]:(4,0),0

Note: I didn't need to select the parent_id. Just makes it easier to see and 
match visually.

From this vector you can build your tree in a single pass:
Topic:
	1
	+	3
		+	5
	2
	+	6
	4

Having said all of this, why not just do a select, ordering by parent_id and 
then msg_id and build your tree using a tree model? Since you're ordering the 
select by parent, and then message id, you'll always know that the parent 
node has already been created, so then you're just changing it from a leaf to 
a branch. You can always find the node because you know its id in the tree.

But does this defeat the purpose of the assignment?


Again, in both of these examples, you don't need to sort within your code.

Or did I miss something? 

;-)

Mime
View raw message