db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: [jira] Updated: (DERBY-721) State of InputStream retrieved from resultset is not clean , if there exists previous InputStream .
Date Wed, 07 Dec 2005 17:21:21 GMT
I agree with your conclusion; if in the future someone wants to fix
the network server case then it should not be difficult to make the
embedded case match.  Going forward I think our best shot at having
reasonable stream implementations is to assume they
can only be read once, both on the way into the server and on the
way out (at least the normal, optimized code path should assume so).
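
As a rough sketch, the read-once pattern that implies for an application
would look something like this (the database name, table, and column names
below are just placeholders):

import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ReadOnceSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder embedded database; older jvms may also need
        // Class.forName("org.apache.derby.jdbc.EmbeddedDriver") first.
        Connection conn =
            DriverManager.getConnection("jdbc:derby:testdb;create=true");
        PreparedStatement ps =
            conn.prepareStatement("SELECT id, data FROM lobtest");
        ResultSet rs = ps.executeQuery();
        byte[] buf = new byte[8192];
        while (rs.next()) {
            int id = rs.getInt(1);                   // read columns left to right ...
            InputStream in = rs.getBinaryStream(2);  // ... and each column only once
            long total = 0;
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;                          // drain the stream in one pass
            }
            in.close();
            System.out.println("row " + id + ": " + total + " bytes");
        }
        rs.close();
        ps.close();
        conn.close();
    }
}

Anything beyond that pattern -- a second getBinaryStream on the same column,
or going back to an earlier column -- is exactly the territory this issue is
about.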

1 meg will exercise the large blob/clob code path.  As a rule of thumb,
stream-based tests probably should have 3 instances:
1) blob/clob less than 1k
2) blob/clob more than 250k
3) blob/clob larger than the allowed heap size in the jvm you are testing;
    these tests can follow the existing ones in the large test area.

The difference between
1 meg and 1 gig is that with 1 meg it is hard to tell whether the underlying
code is materializing the entire stream into memory.  There is some size
between the two (with an appropriate setting of the max jvm heap) that will
make it obvious; something like the following would show it:
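
(The table name, the ZeroStream helper, and the exact sizes here are just
placeholders; the point is that the value is produced and read as a stream,
so the only way the third insert can fail with a small -Xmx is if something
materializes it.)

import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Arrays;

public class LobSizeTiers {

    // Synthetic stream of the requested length; never holds the value in memory.
    static final class ZeroStream extends InputStream {
        private long remaining;
        ZeroStream(long length) { this.remaining = length; }
        public int read() {
            if (remaining <= 0) return -1;
            remaining--;
            return 0;
        }
        public int read(byte[] b, int off, int len) {
            if (remaining <= 0) return -1;
            int n = (int) Math.min(len, remaining);
            Arrays.fill(b, off, off + n, (byte) 0);
            remaining -= n;
            return n;
        }
    }

    static void insertLob(Connection conn, int id, int length) throws Exception {
        PreparedStatement ps =
            conn.prepareStatement("INSERT INTO lobtest(id, data) VALUES (?, ?)");
        ps.setInt(1, id);
        // A driver that materializes the value will run out of heap long before
        // the third tier is written, if the test jvm is started with a small -Xmx.
        ps.setBinaryStream(2, new ZeroStream(length), length);
        ps.executeUpdate();
        ps.close();
    }

    public static void main(String[] args) throws Exception {
        Connection conn =
            DriverManager.getConnection("jdbc:derby:lobdb;create=true");
        conn.createStatement().execute(
            "CREATE TABLE lobtest(id INT PRIMARY KEY, data BLOB)");
        insertLob(conn, 1, 1024);               // 1) less than 1k
        insertLob(conn, 2, 300 * 1024);         // 2) more than 250k
        insertLob(conn, 3, 256 * 1024 * 1024);  // 3) bigger than a -Xmx64m heap
        conn.close();
    }
}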

TomohitoNakayama wrote:
> Hello.
> 
> Mike Matrigali wrote:
> 
>> For embedded I was worried about your description of changes, which made
>> it sound like you somehow were going to buffer the blob in memory.  I
>> see from your changes you basically added reset calls if the underlying
>> stream was resettable.  What I don't know is what happens in the 2 gig
>> blob/clob cases; either you will have to investigate, or maybe someone
>> on the list knows?
>>
> As you saw, the patch does not add a new cache; it just resets the stream.
> I tested only the 1 meg lob case and confirmed that the lob was streamed
> from the beginning by the 2nd InputStream.
> I don't think there is a qualitative difference in behavior between 1M
> and 1G, though I have not completely confirmed it.
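
For reference, a reset like that relies on the usual java.io mark/reset
contract; the sketch below is just that general idiom, not the Derby-internal
stream interface:

import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class ResetIdiom {
    public static void main(String[] args) throws Exception {
        InputStream in = new BufferedInputStream(
                new ByteArrayInputStream("lob bytes ...".getBytes("US-ASCII")));
        if (in.markSupported()) {
            in.mark(Integer.MAX_VALUE);   // remember the start of the value
        }
        while (in.read() != -1) { }       // the first reader drains the stream
        if (in.markSupported()) {
            in.reset();                   // the second reader starts from the beginning
        }
        System.out.println("first byte again: " + (char) in.read());
    }
}

Whether the store-level streams can do the equivalent for a 2 gig value is
the open question above.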
> 
> 
> However, I understand your point that this patch would implicitly
> restrict the implementation of the network client:
> the entire value streamed from the server would have to be stored, because
> streaming between server and client is performed only once.
> 
> Now I think it is preferable to throw an Exception
> when a 2nd Reader/InputStream for the same value in the result is retrieved,
> or when a Reader/InputStream is retrieved in a different order than in the sql.
> // I would like to hear others' opinions on the restriction of not allowing
> the user to retrieve a Reader/InputStream for result columns in a different
> order than in the sql.
> 
> At first I thought the restriction might be too hard on the user,
> but I have concluded that the restriction is reasonable because a ResultSet
> is not a cache for a set of results and
> has the characteristics of a Stream (especially when a lob is used).
> If the user needs a cache, the cache should be built separately from
> the ResultSet.
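
If an application really does need to look at a lob more than once, it can
make its own copy the one time it reads the stream; a small sketch (the
helper name and buffer size are arbitrary, and for a really large value a
temporary file would be a better target than a byte array):

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.sql.ResultSet;

public class CopyOnce {
    // Read the column's stream exactly once and return a caller-owned copy.
    static byte[] copyColumn(ResultSet rs, int column) throws Exception {
        InputStream in = rs.getBinaryStream(column);   // retrieved a single time
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        in.close();
        return out.toByteArray();   // re-read this as often as needed
    }
}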
> 
> Thank you for your suggestion.
> I hadn't realized this Stream-like characteristic of ResultSet.
> 
> Best regards.
> 
> 
> Mike Matrigali wrote:
> 
>> I don't have enough information to completely answer, but will
>> try to state my opinion on the issue.
>>
>> I think the goals should be:
>> 1) provide same behavior in embedded and network server mode.
>> 2) provide same behavior whether the blob is "small" or "large".
>> 3) optimize the standard case of getting the column once in jdbc,
>>   as the spec allows.
>> 4) If at all possible when selecting a blob/clob as a stream it should
>>   not be necessary to materialize the entire stream in memory.
>>
>> For embedded I was worried about your description of changes, which made
>> it sound like you somehow were going to buffer the blob in memory.  I
>> see from your changes you basically added reset calls if the underlying
>> stream was resettable.  What I don't know is what happens in the 2 gig
>> blob/clob cases; either you will have to investigate, or maybe someone
>> on the list knows?
>>
>> For embedded it is theoretically possible for the reset of the stream
>> to go all the way back to store and read it again from the beginning.
>> For the network client this seems even more complicated to do in an
>> optimized way (I believe you are looking at improving the streaming
>> behavior of large objects to the network client, so I defer to you on
>> how hard this may be).
>>
>> My opinion would be to make the second reference throw an error, to
>> make that behavior consistent in network server, embedded, long and
>> short blob/clob streams.  And to document that behavior.
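
Purely to illustrate what that consistent behavior could look like
(hypothetical code, not anything in Derby today): a per-column holder hands
out its stream once and any later request fails, the same way in embedded
and in the network client.

import java.io.InputStream;
import java.sql.SQLException;

// Hypothetical sketch only, not Derby code: a per-column holder that gives
// out its stream a single time and raises an error on any later request.
final class OneShotColumn {
    private InputStream stream;

    OneShotColumn(InputStream stream) {
        this.stream = stream;
    }

    synchronized InputStream getBinaryStream() throws SQLException {
        if (stream == null) {
            throw new SQLException(
                "The stream for this column has already been retrieved");
        }
        InputStream result = stream;
        stream = null;   // a second call now fails instead of silently re-reading
        return result;
    }
}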
>>
>> Having said that, I am not against the behavior you are moving toward, as
>> long as it does not cause a memory/runtime performance issue
>> for the normal single get-stream case.
>>
>> TomohitoNakayama wrote:
>>
>>> Hello Daniel and Mike.
>>>
>>> Do you think it is preferable not to allow the user to call getXXXXStream
>>> twice on one row,
>>> in order to make room to release the memory of the cache in the ResultSet
>>> as soon as possible?
>>>
>>> Best regards.
>>>
>>>
>>> Daniel John Debrunner wrote:
>>>
>>>> Mike Matrigali wrote:
>>>>
>>>>> Is there anything in the standard that says what the second call to
>>>>> the get the stream has to do?  Imagine the case where the first
>>>>> stream reads 1 gig of a 2 gig blob, does the second call to
>>>>> getBinaryStream() have to return the 1st gig again?
>>>>
>>>> Yes & no.
>>>>
>>>> Nothing in the JDBC spec doc, but the javadoc for java.sql.ResultSet 
>>>> has
>>>> always had:
>>>>
>>>> " For maximum portability, result set columns within each row should be
>>>> read in left-to-right order, and each column should be read only once."
>>>>
>>>> Thus, Derby could throw an exception if there was a second getXXXStream
>>>> call on the same column.
>>>>
>>>>> Any change that tries to cache the bytes returned by the first
>>>>> getBinaryStream either in local client or network client code is
>>>>> going to be a performance/memory drain.
>>>>
>>>> Agreed, we need to be careful here; we need to optimise the frequent
>>>> case, getting the column's value once as per JDBC.
>>>>
>>>> Dan.
>>>>
>>
> 

