Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Received-SPF: pass (asf.osuosl.org: local policy)
Message-ID: <43B46F57.7090707@amberpoint.com>
Date: Thu, 29 Dec 2005 15:20:55 -0800
From: Bryan Pendleton <bpendleton@amberpoint.com>
User-Agent: Mozilla Thunderbird 1.0 (Windows/20041206)
MIME-Version: 1.0
To: derby-dev@db.apache.org
Subject: Re: [jira] Updated: (DERBY-212) Optimize some specific methods in
 Network Server to improve performance
References: <70685031.1135272692381.JavaMail.jira@ajax.apache.org>
In-Reply-To: <70685031.1135272692381.JavaMail.jira@ajax.apache.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Knut Anders Hatlen (JIRA) wrote:
> I have uploaded a patch (DERBY-212-parsePKGNAMCSN.diff) which changes
> the following files:

Hi Knut,

I thought I'd take a look at your changes, since I've recently
been studying a lot of this same code. I hope these comments
are useful to you.

In general, I think the changes are very good and will be a
definite improvement. DEATH TO STRING TOKENIZERS!!!

My comments below are very much "nits" and
mostly have to do with some readability and consistency issues.

thanks,

bryan


1) Why is StmtKey an inner class? It feels like it is as much of
a first-class object as Pkgnamcsn or DRDAString. I think you should
consider elevating it to the same level as those other classes.

2) One of my general feelings is that performance improvements
have to be careful to balance the tradeoff between the
increased complexity that they add to the code. I think you've
done a very good job of this, but I'm somewhat troubled by the
new class DRDAString. It seems to me that there are a few ways
in which it's slightly awkward, and I wonder whether a slightly
simpler version of this class would accomplish most of your
goals but be a little less awkward. In particular:

   - Having to maintain a separate 'length' field seems to add
     a fair amount of "bulk" to DRDAString.java, all to handle
     the one case of calling setBytes() with a new string which
     is shorter in length than the current string. How often
     does this actually happen, in practice? How bad do you think
     it would be to just always allocate a new buffer in setBytes(),
     even when it's a new shorter buffer. That would mean we
     didn't have to have a separate "length" field.

   - The "autotrim" feature feels, frankly, like a wart. It
     doesn't feel like it belongs with this class. I think that
     this functionality belongs either in the DDMReader.readString()
     method, or in the DRDAConnThread.parsePKGNAMCSN() method.

   - I also found the "cachedManager" and "cachedBuffer" feature
     to be pretty complicated. I thought that "cachedString"
     was OK, but the other two caching fields seem like quite a
     bit of complexity. How does this functionality actually
     get used in practice? That is, under what circumstances is
     getBuffer(CcsidManager) called such that the passed-in
     codeset manager is different than the original codeset
     manager for this string?

3) I like the Pkgnamcsn.java class very much. My only comment is
that I think all the "getter" methods should use the standard
DRDA acronyms. So I think getRdbnam(), getRdbcolid() and
getPkgid() are fine, but I wish it was:
   - getPkgcnstkn(), not getPkgcnstknStr(), and
   - I wish the javadoc comment for getSecnumber() said
     "get section number", not "Get PKGSN", or maybe I wish
     the method was named getPkgsn(). I'm a little conflicted here,
     because PKGSN is definitely an acronym used in the DRDA specs,
     but DRDAConnThread.java seems to generally refer to this field
     as "secnumber". I just think there's a little bit of consistency
     cleanup we could do here.

4) Similar to the previous comment, I think that the field in
DRDAStatement.java should be named 'pkgcnstkn', not 'pkgcnstknStr'.
I think the 'Str' on the end is a bit awkward.

5) Similarly, in DRDAConThread, having 'pkgcnstknStr' results in the
awkard construction 'writePKGNAMCSN(pkgcnstknStr.toString())', and
I thought to myself: "why are we calling 'toString' on a String'?
This might be less awkward if the variable was named 'pkgcnstkn' instead.

6) I see that we always initialize the DRDAStrings to 0-sized
buffers in the DRDAConnThread initializer, then re-allocate them
to the "right" size later. Is it possible to predict the "right" size when
initially allocating them? For example, does it make sense to use
values like CodePoint.RDBNAM_LEN rather than 0 here?

7) I think that the use of the 'changedStrings' variable in
DRDAConnThread.parsePKGNAMCSN could use with a more substantial
comment, as this is one of the more subtle parts of your change,
particularly since it ends up percolating down into DDMReader.readString
and DRDAString.setBytes. Perhaps you could say something along the
lines of the following:

	// It is common for a subsequent DRDA message to reference
	// the same database, collection, package, consistency
	// token, and section number. In this case,
	// efficiency can be achieved by recognizing that we're
	// using the same values that we did before, and so we
	// don't have to allocate and initialize a new Pkgnamcsn
	// object, but can just re-use the last one.

8) Lastly, I must admit that I'm surprised that the caching and
re-use of the Pkgnamcsn object via the 'prevPkgnamcsn' field in
DRDAConnThread is actually worth it. The Pkgnamcsn object seems
very lightweight to me, and in order to detect that we can cache
and re-use it, we have to have all the overhead of calling
getTrimmedSize() and equalTo() on each of the Pkgnamcsn components,
as well as the awkwardness of having a boolean return value in
DDMReader.readString and DRDAString.setBytes, both of which end up
surprising the reader since they would "naturally" be 'void'
methods, not 'boolean' methods.

Obviously I'm having an intuitive reaction here, but it "feels"
like the code path that was added to support the detection of
cases where we have the same Pkgnamcsn value is larger than the
code path that we save by allocating a new Pgknamcsn object on
every call to DRDAConnThread.parsePKGNAMCSN(). I can readily
see how all the other performance optimizations in your patch were
wins; I'm just surprised that this particular optimization was a win.