db-derby-dev mailing list archives

From Kristian Waagan <Kristian.Waa...@Sun.COM>
Subject Re: CLOB performance
Date Fri, 06 Feb 2009 11:48:37 GMT
Knut Anders Hatlen wrote:
> Kristian Waagan <Kristian.Waagan@Sun.COM> writes:
>> Dag H. Wanvik wrote:
>>> Thanks for the good work!
>>> Kristian Waagan <Kristian.Waagan@Sun.COM> writes:
>>>> testFetchLargeClobPieceByPiece               673707     624639      3370
>>>> testFetchLargeClobPieceByPieceBackwards     1138559    1059045      2863
>>> Interesting; fetching backwards is faster than forwards? :) Test
>>> artifact, or?
>> Hi Dag,
>> I certainly hope I haven't optimized for fetching LOBs backwards!
> It looks like there are some differences between those tests. They fetch
> chunks of different sizes, and one of them performs sanity checking
> against a LoopingAlphabetReader whereas the other one doesn't. So I
> don't think we can say that fetching backwards is faster just by looking
> at those numbers.


As I mentioned before, the results from the tests are really only 
comparable for runs made on the same machine, and only test by test. 
Some of the tests fetch all ten Clobs (repeated five times), whereas 
some tests only fetch a single Clob (repeated as well).

Looking at the tests again, I see the backwards test does something 
completely different. It just fetches a small part of the Clob!
Fetching the whole Clob backwards would be a lot slower, because Derby 
would have to skip parts of the data many times.

What the test really tests is the ability of UTF8Reader to go backwards 
in its internal buffer, which it couldn't do before. If going backwards 
in this buffer weren't possible, the test would cause Derby to skip 
around 7.5 MB (size=15MB, pieceSize=10, pos=size/2-pieceSize, 
intBuf=8192) over 800 times.
If you read backwards with chunks equal to or larger than the internal 
buffer, Derby must reposition size / chunk times. With a 15 MB Clob and 
a 32 KB chunk size, this would give 480 repositions. If my formula from 
DERBY-3766 is correct, Derby has to skip approx 3.5 GB of data in this case!
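
To make the arithmetic above easy to check, here is a rough sketch. It 
does not reproduce the formula from DERBY-3766; it simply assumes that 
every reposition restarts at the beginning of the stream and skips up to 
the start of the requested chunk. The class and method names are made up 
for illustration.

```java
public class BackwardReadCost {

    /** Number of chunk requests (and thus repositions) for a full backward read. */
    static long repositions(long size, long chunk) {
        return (size + chunk - 1) / chunk; // round up for a partial last chunk
    }

    /**
     * Total amount skipped, ASSUMING each reposition starts from the
     * beginning of the stream and skips to the start of the chunk.
     * The chunk that starts at offset 0 needs no skipping.
     */
    static long totalSkipped(long size, long chunk) {
        long total = 0;
        for (long start = size - chunk; start > 0; start -= chunk) {
            total += start;
        }
        return total;
    }

    public static void main(String[] args) {
        long size = 15L * 1024 * 1024; // 15 MB Clob
        long chunk = 32L * 1024;       // 32 KB chunks
        System.out.println("repositions: " + repositions(size, chunk));
        System.out.printf("skipped: %.2f GB%n",
                totalSkipped(size, chunk) / (1024.0 * 1024 * 1024));
    }
}
```

Under that assumption the sketch gives 480 repositions and roughly 
3.5 GB skipped, in line with the figures above.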

One situation where the backwards repositioning could be a great 
time-saver is when you are searching for a pattern in the Clob (using 
Clob.position). If the pattern is relatively short compared to the 
internal buffer, the Clob contains many partial matches against the 
pattern, and the internal buffer boundary isn't crossed when restarting 
the search, the win will be big.

Back to the results I posted originally, 
'testFetchLargeClobPieceByPiece' fetches all 15 MB five times in 3370 
ms, and 'testFetchLargeClobPieceByPieceBackwards' fetches around 8 K in 
10 character chunks five times. A quick investigation revealed that 
Derby had to reposition the stream twice: once for the first request 
(skipping to read position 7864310) and once for the last request 
(skipping to read position 7856120). The first position is close to the 
end of the internal buffer, and the last position is on the "wrong side" 
of the lower buffer boundary. For all other requests Derby was able to 
go 20 characters backwards in the internal buffer in UTF8Reader, which 
consists of only changing a position variable.
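
The two repositions can be reproduced with a small simulation. This is 
only a simplified model, not the actual UTF8Reader code: it assumes the 
internal buffer covers one window aligned to a multiple of the buffer 
size (8192), and that only a request starting before the window forces a 
reposition. The alignment is an assumption chosen because it matches the 
positions quoted above; the class and method names are hypothetical.

```java
public class BackwardPieceSimulation {

    /**
     * Counts repositions for requests at firstPos, firstPos - pieceSize,
     * ... down to lastPos (1-based Clob positions).
     * ASSUMPTION: the buffer holds one window aligned to bufSize, and a
     * request starting before the window start forces a reposition.
     * Refilling forward past the window end is not modelled.
     */
    static int countRepositions(long firstPos, long lastPos,
                                int pieceSize, int bufSize) {
        int repositions = 0;
        long windowStart = -1; // 0-based index of first buffered char, -1 = empty
        for (long pos = firstPos; pos >= lastPos; pos -= pieceSize) {
            long index = pos - 1; // Clob positions are 1-based
            if (windowStart < 0 || index < windowStart) {
                // Restart the stream and skip forward to the requested position.
                repositions++;
                windowStart = (index / bufSize) * bufSize;
            }
            // Otherwise the reader only moves its position variable
            // backwards inside the buffer.
        }
        return repositions;
    }

    public static void main(String[] args) {
        // Positions taken from the message above.
        System.out.println(countRepositions(7864310L, 7856120L, 10, 8192));
    }
}
```

With the positions from the test, this model counts a reposition for the 
first request and for the last one, and none for the requests in between.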

Hope this made things a little clearer, and please, don't optimize your 
application by reading Clobs backwards ;)

