db-derby-dev mailing list archives

From TomohitoNakayama <tomon...@basil.ocn.ne.jp>
Subject Re: [jira] Commented: (DERBY-326) Improve streaming of large objects for network server and client
Date Tue, 17 Jan 2006 14:04:07 GMT
Hello Kathey.

Here are my answers to your questions.

>My previous comment was regarding the places where new code uses reverse
>indentation or no indentation, not reformatting existing code. We don't
>have formatting guidelines but I think code blocks should be indented.
>  
>
I see.
I will try to indent them, at least partially.
However, I am worried about the readability of the patch itself.

Please evaluate the readability of the next patch as well.


>- Can you add a comment to explain the calculation for prepScalarStream return value,
>especially what the 6 and 4 are?
>return DssConstants.MAX_DSS_LENGTH - 6 - 4 - nullIndicatorSize  - extendedLengthByteCount
>
They are hard-coded numbers inherited from the original code.
I read them as follows:
6 is the length of the Layer A header, and
4 is the length of the Layer B header.

Please see DRDA, Version 3, Volume 3: Distributed Data Management (DDM)
Architecture, page 315.
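To illustrate my reading (this is a sketch, not the Derby source; the value of DssConstants.MAX_DSS_LENGTH, 32767, comes from the DRDA spec):

```java
// Illustrative sketch of how I read the prepScalarStream return value.
// The constants 6 and 4 are the header lengths noted above.
public class DssRoomSketch {
    static final int MAX_DSS_LENGTH = 32767; // largest DSS segment (DRDA spec)

    /** Bytes of user data that fit in the first DSS segment. */
    static int roomForData(int nullIndicatorSize, int extendedLengthByteCount) {
        return MAX_DSS_LENGTH
                - 6                        // Layer A (DSS) header
                - 4                        // Layer B (DDM object) header
                - nullIndicatorSize        // null indicator byte(s), if nullable
                - extendedLengthByteCount; // extended length bytes, if any
    }

    public static void main(String[] args) {
        // e.g. 1 null indicator byte, no extended length bytes
        System.out.println(roomForData(1, 0)); // prints 32756
    }
}
```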


>Could you verify that my analysis of this code  is correct?
>
I think it is not correct.
"flushScalarStreamSegment" is called *only* when the DSS segment is full (or
no more data remains in the stream).


>- Why do we need this ensureLength call for layerBstreaming in prepScalarStream? 
>      ensureLength( layerBstreaming ? 
>		    DEFAULT_BUFFER_SIZE - offset : 
>		    length );
>  
>
As written above, "flushScalarStreamSegment" is called *only* when the DSS
segment is full (or no more data remains in the stream).
Therefore the buffer must be able to hold a whole DSS segment, so that the
buffer can run out only after the DSS segment runs out.

Then flushScalarStreamSegment will be called correctly before the buffer
is exhausted.

The call to ensureLength guarantees that the buffer is large enough to hold a
whole DSS segment, which realizes the invariant above.

// To tell the truth, I read the original flushScalarStreamSegment as assuming
that the buffer is always large enough to hold a whole DSS segment.
// When reading the mail from Bryan, I thought adding ensureLength was not
necessary, just an improvement to keep the buffer at an appropriate size.
// However, I found that the implementation of DDMWriter could cause an
insufficient buffer size *just as* I was answering you ....
// Well ... thank you ....
// // To say it more verbosely, I think an insufficient buffer size may already
be avoided outside DDMWriter.
// // Otherwise the original flushScalarStreamSegment would never have worked.
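As a self-contained toy model of this invariant (not the real DDMWriter code; the real code also reserves 2 bytes per continuation segment for DSS continuation flags, which I omit here):

```java
// Toy model: the buffer always has room for a whole segment, so a flush
// happens only at a segment boundary or at end of stream.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SegmentLoopSketch {

    /** Counts how many segments would be flushed for the given stream. */
    static int streamInSegments(InputStream in, int maxSegment) throws IOException {
        byte[] buffer = new byte[maxSegment]; // ensureLength guarantees at least this
        int segments = 0;
        int offset = 0;
        int b;
        while ((b = in.read()) != -1) {
            buffer[offset++] = (byte) b;
            if (offset == maxSegment) {  // segment full: the only mid-stream flush point
                segments++;              // analogue of flushScalarStreamSegment
                offset = 0;
            }
        }
        if (offset > 0) {
            segments++;                  // final, possibly short, segment
        }
        return segments;
    }

    public static void main(String[] args) throws IOException {
        // 25 bytes in segments of at most 10: 10 + 10 + 5 -> 3 flushes
        System.out.println(streamInSegments(new ByteArrayInputStream(new byte[25]), 10));
    }
}
```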


>Sorry, one more question.
>Could you explain the error handling in the new scheme?  The old code would call padScalarStreamForError
>and that is now gone, but I am not sure I understand how the error handling is handled now.
>
The answer is that streaming is discontinued and an exception is sent using
agent.markCommunicationFailure, the same as in other places.
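A hedged sketch of the pattern (agent.markCommunicationFailure is a Derby internal; the code below only imitates the idea of discontinuing the stream and reporting the failure instead of padding with zeros):

```java
// Sketch only: the new scheme stops streaming on error and reports the
// failure, instead of padding the stream out to full length with zeros
// (the old padScalarStreamForError behaviour).
import java.io.IOException;
import java.io.InputStream;

public class ErrorHandlingSketch {

    /** Consumes the stream; on failure, streaming is discontinued and the
     *  error is surfaced. */
    static String streamOrFail(InputStream in) {
        try {
            while (in.read() != -1) {
                // here the real code would forward the bytes to the client
            }
            return "complete";
        } catch (IOException e) {
            // analogue of agent.markCommunicationFailure(...): stop and report
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        InputStream broken = new InputStream() {
            @Override
            public int read() throws IOException {
                throw new IOException("disk error");
            }
        };
        System.out.println(streamOrFail(broken)); // prints "failed: disk error"
    }
}
```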


I will answer your "Possible future improvements:" in my next mail ...
It is almost midnight in Japan now and I need to sleep :)


Best regards.



Kathey Marsden (JIRA) wrote:

>    [ http://issues.apache.org/jira/browse/DERBY-326?page=comments#action_12362861 ] 
>
>Kathey Marsden commented on DERBY-326:
>--------------------------------------
>
>I think this patch is a big improvement over reading the lobs into memory.
>A few comments on future improvement and some questions on the code below:
>
>
>Possible future improvements:
>
>- for VARCHAR FOR BIT DATA (byte[] values)  we still call writeScalarStream with length
>and do not do layerBStreaming.  Would it make sense to stream this as well and get rid of
>all the code for handling extended length etc?
>
>reencode() performance 
>This method is called every 1024 characters of a character stream and 
>-- creates a localBuffer char[1024] to read the character stream data
>-- creates a new String(localBuffer) 
>-- creates a new byte[] with the getBytes() call
>-- creates a new ByteArrayInputStream
>
>A simple optimization using your  current scheme might be to have global field  byte[BUFFERED_CHAR_LEN]
>instead of making a new one  each time.  Perhaps also a single  OutputStreamWriter field wrapped
>around a ByteArrayOutputStream  which is reset on each reencode call could eliminate the String
>creation.
>
>Long term we should consider the fact that Derby stores the data in UTF8 format, getCharacterStream
>decodes it then network server reencodes it to UTF8.
>DERBY-760 is meant to address this issue, plus hopefully eliminate the additional copy
>into the buffer in  DDMWriter.  I will add more comments to DERBY-760.
>
>
>
>QUESTIONS on code:
>
>DDMWriter - I always had a hard time understanding the stream code in this file , but
>found it much easier with your single loop vs the triple loop of the old code.  But I have
>some questions.
>
>- Why do we need this ensureLength call for layerBstreaming in prepScalarStream? 
>      ensureLength( layerBstreaming ? 
>		    DEFAULT_BUFFER_SIZE - offset : 
>		    length );
>
>- Can you add a comment to explain the calculation for prepScalarStream return value,
>especially what the 6 and 4 are?
>return DssConstants.MAX_DSS_LENGTH - 6 - 4 - nullIndicatorSize  - extendedLengthByteCount
>
>
>writeScalarStream
>
>Could you verify that my analysis of this code  is correct?
>	while( !isLastSegment ){
>		/*
>		 * This loop writes and flushes the data. If there is more data than will
>		 * fit in the remaining buffer space or will fit in the current DSS, the 
>		 * data will be continued in the next DSS segment via flushScalarStreamSegment.
>		 * 
>		 */
>		    int spareBufferLength = bytes.length - offset;
>		[snip more of writeScalarStream ]
>	    
>			
>			
>
>			// Adjust spareDssLength for 2 byte continuation flags
>			//written by flushScalarStreamSegment
>			if( ! isLastSegment )
>			    spareDssLength = DssConstants.MAX_DSS_LENGTH - 2;
>
>			}
>		    
>		}		
>
>  
>
>>Improve streaming of large objects for network server and client
>>----------------------------------------------------------------
>>
>>         Key: DERBY-326
>>         URL: http://issues.apache.org/jira/browse/DERBY-326
>>     Project: Derby
>>        Type: Improvement
>>  Components: Network Client, Performance, Network Server
>>    Reporter: Kathey Marsden
>>    Assignee: Tomohito Nakayama
>> Attachments: DERBY-326.patch, DERBY-326_2.patch, DERBY-326_3.patch, DERBY-326_4.patch,
>>DERBY-326_5.patch
>>
>>Currently the stream writing  methods in network server and client require a  length
>>parameter. This means that we have to get the length of the stream before sending it. For
>>example in network server in EXTDTAInputStream we have to use getString and getbytes() instead
>>of getCharacterStream and getBinaryStream so that we can get the  length.
>>SQLAM Level 7 provides for the enhanced LOB processing to allow streaming without
>>indicating the length, so, the writeScalarStream methods in
>>network server DDMWriter.java and network client Request.java can be changed to not
>>require a length.
>>Code inspection of these methods seems to indicate that while the length is never
>>written it is used heavily in generating the DSS. One strange thing is that it appears on
>>error, the stream is padded out to full length with zeros, but an actual exception is never
>>sent.  Basically I think perhaps these methods need to be rewritten from scratch based on
>>the spec requirements for lobs.
>>After the writeScalarStream methods have been changed, then EXTDAInputStream can be
>>changed to properly stream LOBS. See TODO tags in this file for more info.  I am guessing
>>similar optimizations available in the client as well, but am not sure where that code is.
>>    
>>
>
>  
>

-- 
/*

        Tomohito Nakayama
        tomonaka@basil.ocn.ne.jp
        tomohito@rose.zero.ad.jp
        tmnk@apache.org

        Naka
        http://www5.ocn.ne.jp/~tomohito/TopPage.html

*/ 

