db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel John Debrunner <...@apache.org>
Subject The perils of complexity - CURRENT TIME handling
Date Thu, 31 Aug 2006 22:46:58 GMT
In starting to address DERBY-1700 I discovered some interesting code
where possibly an early decision lead to complex code that had the
opposite effect to that intended. Some lessons to be learnt here about
going for the simple approach. Maybe a long story, but worth reading I
think.

The background to DERBY-1700 is that the generated code to create many
of Derby's language ResultSets includes a parameter called
'closeCleanup'. This is a reference that can be used to execute a
generated method that will be called at the close of the ResultSet. I
was looking into simplifying it because the passed in parameter is
always null (don't do anything) unless the ResultSet is the top of the
tree. Thus in the large queries I was looking at with, say 10,000 Union
nodes, we generate code to pass null 9,999 times and to pass the real
method "pointer" once. Since the passing of the argument is one byte,
that's ~10k generated code for "no-added-value", that's significant when
the size of a Java method is limited to ~65k.

So I thought, there has to be some better way.

Based on my recent experience with the system catalogs I first
investigated how this is used, rather than doing work and finding it's
never called (as was the case with some code in the system catalog area).

The cleanup method only ever has one use, to make the statement foget
about the current time. This is based upon the requirement that a SQL
statement has the same value for CURRENT TIMESTAMP (CURRENT TIME etc.)
for the lifetime of the statement, no matter how long it takes to
execute. Thus once the ResultSet is closed the activation forgets about
its current time value, so it's ready for the next use. The current time
is defined by the class org.apache.derby.impl.sql.execute.CurrentDatetime.

Sounds ok, right, well this is where the complexity comes in.

First the code tries to be as generic as possible, it's a "close cleanup
facility", not just current time resetting. I saw Tim Bray present once
and he brought up this notion of "You ain't going to need it!",
basically don't overdesign something based upon the thought that one day
you might have to expand it, just implement what you need. Making this
generic immediately makes any casual reader of the code assume it is
widely used and must be important, as well as hiding what it is actually
doing, making the code harder to understand.
[To be fair, maybe this was once used for more than the current time
reset, but I doubt it from the comments in the code]

The second problem is where the original intention adds to the actual
complexity and has the opposite effect. Only SQL statements that
reference CURRENT TIMESTAMP (TIME, DATE etc.) need to care have having a
fixed value for the lifetime of the statement. Other statements
obviously do not need to track this or reset anything.

Based upon that, I'm assuming that the decision was made to save space
at runtime by only generating a field to store the CurrentDatetime
object. Sounds great, no overhead for current time handling if the
statement doesn't use current time. So one field reference is saved in
the activation. So there's a little complexity in generating a field
(cdt) and using that field for statements that do use current time, but
we generate code anyway and it's just a field, so no big deal.
Now just generate a method that calls cdt.forget(). Add a utility method
in ResultSetNode (closeMethodArgument) that generates code that either
passes the reference to the generated method for the top-level
ResultSet, or null for all others.

All works, great!

Hang on, though, to save one field in every activation, we've added
closeCleanup fields in 17 different ResultSet implementations
(duplicated code), and those are real space at runtime. So if SQL
statement uses 50 ResultSets upto 50 fields are always allocated
regardless of if the statement uses current time variables or not. And
in my case of 10,000 unions, at least 10,000 fields are allocated at
runtime, all to save one field being allocated! :-(

Hence all this complexity lead to significantly increased memory usage
in the ResultSet tree and saving of one field in the activation. Exactly
the opposite effect that was intended!!

So what would the simple approach be? I think it's as simple as a new
field and methods in BaseActivation

// Field in BaseActivation.java
private CurrentDatetime cdt;

CurrentDatetime getCurrentDatetime()
{
   if (cdt == null)
       cdt = new CurrentDatetime();
    return cdt;
}

void clearCurrentDatetime()
{
   if (cdt != null)
       cdt.forget();
}

and calling the clearCurrentDatetime() method when the top-level
ResultSet is closed.

Yes, a field is used (but no CurrentDatetime object would be allocated)
in activation even for statements that don't use current time, but I
think the simplicity and clarity far outweigh an extra four bytes at
runtime.

Once I have the changes I'll attach the patch to DERBY-1700 and we can
see how much complex code was removed.

Dan.




Mime
View raw message