db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bryan Pendleton <bpendle...@amberpoint.com>
Subject Re: [jira] Updated: (DERBY-326) Improve streaming of large objects for network server and client
Date Sat, 04 Mar 2006 23:20:56 GMT
Tomohito Nakayama (JIRA) wrote:
>  I upload DERBY-326_6.patch. 

I spent a while working through this patch. It was very interesting
and I have learned a lot. Thank you for all the hard work on this patch.

The patch applied cleanly to the trunk and I had no troubles building it.
I created a number of test situations by hand and stepped through them
carefully in the debugger, and I was not able to construct any tests
which failed.

Also, while I would not call this a thorough performance test, I did
some experimentation with BLOB and CLOB objects in the 20 megabyte
range, and the results were dramatic:
  - before your patch, my test program was taking about 75 seconds
  - after your patch, my test program was taking 6 seconds
So that is a wonderful result! Hopefully your other performance tests
are as successful.

I do have some comments, but they are more along the lines of questions,
since as I said above I could not find any scenarios which actually broke
with your changes. Hopefully you can take a few minutes and look though
them, to help me learn some more about the code.

The questions are included below.



1) In DDMWriter.writeScalarStream(), at about line 702, there is the

     if (length == 0)

    I tried for a while to cause that code to be executed, but I could not.
    Do you know how to make that code be triggered? It seemed to me that
    the test for isEmptyStream() in DRDAConnThread.writeFdocaVal() might be
    short-circuiting this case and if that is true, perhaps we could remove
    that "if" statement from DDMWriter.

2) In DRDAConnThread.writeEXTDTA(), at about line 7487, there is the

      else if (o instanceof  byte[]) {

    This code is very interesting, because it seems to be the only place
    that might be calling DDMWriter.writeScalarStream() with a length
    value other than -1. But once again, I tried for a while to cause that
    code to be executed, and could not. I thought that it would fire if
    I used LONG VARCHAR FOR BIT DATA, but even when I used that data type,
    I could not cause the code at line 7487 to be executed.

    From what I can tell, LONG VARCHAR FOR BIT DATA is limited to 32700
    bytes max, and values of that type are always transmitted in-line, not
    as externalized data.

    Do you know how to make writeEXTDTA() be called with an object other
    than a EXTDTAInputStream?

    If not, then do you know how to cause DDMWriter.writeScalarStream() to
    be called with a length other than -1?

    I guess my basic point here is that we're doing some major surgery on
    DDMWriter.writeScalarStream(), and if the only path that we ever take
    through the code is the (length == -1) case, then perhaps we should
    consider removing the code which tries to use the known-in-advance length.

3) I had two thoughts about the code which wraps the various stream objects
    in BufferedInputStreams in order to use the mark() and reset() methods:

    a) It seems that there are two different places where we perform the
       wrapping of the streams: once in DDMWriter.writeScalarStream, and
       once in EXTDTAInputStream.openInputStreamAgain(). That seemed a bit
       unfortunate, and I was wondering if you thought we could arrange
       things so that there was exactly one place where we did this stream
    b) It seemed that the streams that were being wrapped were all streams
       of our own implementation, such as the ReEncodedInputStream, and the
       streams which are returned by Blob.getBinaryStream and by
       Clob.getAsciiStream(). Rather than wrapping all of these streams in
       BufferedInputStreams, could we not just implement the mark() and
       reset() methods on our streams, and avoid the extra overhead of the
       stream wrapping?

4) I didn't really understand the method openInputStreamAgain() in
    EXTDTAInputStream. It seems that, for a Blob or Clob, we open the
    stream to the Blob/Clob, then read a single byte from the stream just
    to see if we will get end-of-file or not, and then if we didn't get
    end-of-file, we close and re-open the stream.

    Am I understanding this code properly? If so, it seems rather awkward.
    Three ideas presented themselves to me:

    a) Perhaps there is a method that we can call on the Blob/Clob object
       to figure out if it is null or not, other than reading the first
       byte from the stream?
    b) Or, if we have to read the first byte, maybe we could just hang on
       to that first byte in a local buffer, rather than having to close
       and re-open the stream in order to be able to re-read that byte.
    c) Or, since we are going to need to have a markSupported() stream
       anyway, perhaps we could wait until we have constructed the markSupported
       stream, and then use the mark/reset support to peek at the first byte
       to see if the Blob/Clob is empty or not.

5) It seems that the streaming implementation in DDMWriter always does an
    actual I/O (that is, calls sendBytes()) for each 32K segment. While that
    is certainly correct, I found myself wondering how much of a performance
    impact it was having for *very* large data transfers. If users are using
    Derby to manage blobs which are 10's or 100's of megabytes in size, then
    I wonder if there are still more performance benefits that could be
    realized by batching up multiple 32K DSS Layer B segments and then sending
    them with a single call to sendBytes(). For example, maybe we could
    only do a hard I/O call on each 4th or 8th segment, and give the network
    a chance to work with 128K or 256K worth of bytes at a time.

    I don't know if this would make a real difference or not, and it would
    obviously remove some of the memory reduction benefit of the rest of your
    changes, but it occurred to me and I wanted to mention it.

6) Lastly, it would be nice to add a comment to the method
    EXTDTAInputStream.initInputStream(), specifically to explain the
    meaning of the return value from that method, since it's a bit subtle.

View raw message