db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel John Debrunner <...@apache.org>
Subject Re: ArrayInputStream and performance
Date Wed, 29 Nov 2006 16:16:48 GMT
Dyre.Tjeldvoll@Sun.COM wrote:
> Mike Matrigali <mikem_app@sbcglobal.net> writes:
>> It would be fine to use an unchecked and/or an ASSERT based check for
>>  readFieldLengthAndSetStreamPosition.  The "store" module owns this
>> access and is not counting on limit checks to catch anything here.
> Another question:
> All observed calls to setLimit (single tuple select load) come from the same
> method: StoredPage.readRecordFromArray(...) 
> (setLimit is also called from readRecordFromStream() but this does not
> seem to happen with this type of load)
> And the argument to setLimit()is always the local variable fieldDataLength
> which is the return value from
> StoredFieldHeader.readLengthAndSetStreamPosition().
> So if readLengthAndSetStreamPosition() can update the position without
> checking, presumably the return value from this method can be trusted
> as well? Is it then necessary to check this value again in setLimit(),
> or could we have used an unchecked version of setLimit here?

I'm worried by this approach of removing checking of the limit or the 
position, it's much like saying we don't needs bounds checking on arrays 
because I know my code is correct.

The current code provides some protection from a software bug, corrupted 
page or hacked page. Removing those checks may lead to hard to detect 
bugs where a position and/or limit is calculated incorrectly and 
subsequently leads to corrupted data or random exceptions being thrown.

My feeling is that the integrity of the store and the code is better 
served by keeping these checks.

I also think we need more performance numbers to justify such a change, 
a single set of runs from a single vm does not justify it. I will run 
numbers on linux with a number of vm's when I get the chance.

Also often in these cases it is better to try and optimize at a higher 
level, rather than try to optimize at the lowest level (especially when 
removing such checks). In this case see if the number of calls to 
setLimit() or setPosition() could be reduced rather than 
micro-optimizing these methods by changing their core functionality.

As an example the setLimit() call around the readExternalFromArray 
method. Maybe this responsibility could be pushed into the data type 
itself, and for some builtin types we trust their read mechanism to read 
the correct amount of data. E.g. reading a INTEGER will always read four 
bytes, so no need to set a limit around it. The limit is pushed there to 
support types that do not know their length on disk (e.g. some character 
types, some binary types, user defined types) and was to support 
arbitrary user types when the engine cannot trust or require that the 
de-serialization will read the complete stream and only its data.


View raw message