db-derby-dev mailing list archives

From Dyre.Tjeldv...@Sun.COM
Subject Re: ArrayInputStream and performance
Date Wed, 29 Nov 2006 19:48:24 GMT
Daniel John Debrunner <djd@apache.org> writes:

> I'm worried by this approach of removing checking of the limit or the
> position; it's much like saying we don't need bounds checking on
> arrays because I know my code is correct.
> The current code provides some protection from a software bug,
> corrupted page or hacked page. Removing those checks may lead to hard
> to detect bugs where a position and/or limit is calculated incorrectly
> and subsequently leads to corrupted data or random exceptions being
> thrown.
> My feeling is that the integrity of the store and the code is better
> served by keeping these checks.

Thank you for clarifying your position on this. 

> I also think we need more performance numbers to justify such a
> change, a single set of runs from a single vm does not justify it. I
> will run numbers on linux with a number of vm's when I get the chance.
> Also often in these cases it is better to try and optimize at a higher
> level, rather than try to optimize at the lowest level (especially
> when removing such checks). In this case see if the number of calls to
> setLimit() or setPosition() could be reduced rather than
> micro-optimizing these methods by changing their core functionality.
> As an example the setLimit() call around the readExternalFromArray
> method. Maybe this responsibility could be pushed into the data type
> itself, and for some builtin types we trust their read mechanism to
> read the correct amount of data. E.g. reading an INTEGER will always
> read four bytes, so no need to set a limit around it. The limit is
> pushed there to support types that do not know their length on disk
> (e.g. some character types, some binary types, user defined types) and
> was to support arbitrary user types when the engine cannot trust or
> require that the de-serialization will read the complete stream and
> only its data.
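
To make the suggestion above concrete, here is a minimal sketch of the
idea. It is not Derby's ArrayInputStream: the class BoundedArrayStream
and the method readVariableField are hypothetical names invented for
illustration, and the unchecked exceptions are an assumption made to
keep the example short. A fixed-width read checks its own four bytes
and needs no surrounding setLimit(), while a field whose reader cannot
be trusted to stop on its own is still fenced in by a checked limit.

```java
// Hypothetical sketch only; names and behaviour are assumptions, not Derby code.
final class BoundedArrayStream {
    private final byte[] data;
    private int position;
    private int end; // one past the last byte that may currently be read

    BoundedArrayStream(byte[] data) {
        this.data = data;
        this.end = data.length;
    }

    // Restrict reads to the next 'length' bytes, validating the request
    // so that a bad length is caught here rather than corrupting later reads.
    void setLimit(int length) {
        if (length < 0 || length > data.length - position) {
            throw new IndexOutOfBoundsException("bad limit: " + length);
        }
        end = position + length;
    }

    void clearLimit() {
        end = data.length;
    }

    int getPosition() {
        return position;
    }

    // Returns the next byte, or -1 once the current limit is reached.
    int read() {
        return position < end ? (data[position++] & 0xff) : -1;
    }

    // Fixed-width type: consumes exactly four bytes or fails,
    // so the caller does not need to set a limit around it.
    int readInt() {
        if (end - position < 4) {
            throw new IndexOutOfBoundsException("need 4 bytes");
        }
        int value = ((data[position] & 0xff) << 24)
                  | ((data[position + 1] & 0xff) << 16)
                  | ((data[position + 2] & 0xff) << 8)
                  | (data[position + 3] & 0xff);
        position += 4;
        return value;
    }

    // Variable-length field: the reader below simply reads until the
    // stream reports end of data, so the limit is what stops it.
    byte[] readVariableField(int storedLength) {
        setLimit(storedLength);
        try {
            byte[] out = new byte[storedLength];
            int n = 0;
            int b;
            while ((b = read()) != -1) {
                out[n++] = (byte) b;
            }
            return out;
        } finally {
            clearLimit();
        }
    }
}
```

In this sketch the limit check is kept but only paid for by the fields
that need it, which is the higher-level reduction in setLimit() calls
described above rather than a removal of the check itself.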

Thank you for that suggestion. 

I'll freely admit that my understanding of this part of the code (and
of others) is very limited, and that you and others with an intimate
knowledge of the code can probably get much more relevant information
out of a profiler dump than I can.

So I would like to encourage you and Mike and anyone else to look at
Derby's performance. Do your own comparison with other databases using
the vm you prefer. Let us know what the results were. If performance
isn't a problem, let me, and everyone else who seems to think it is,
know what we're doing wrong. And if it turns out that performance could
be better, do your own profiling. Propose some intelligent changes based
on what you see. Maybe I or someone else can implement them based on
your guidelines?

