db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3934) Improve performance of reading modified Clobs
Date Thu, 04 Dec 2008 16:08:44 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-3934:

    Attachment: derby-3934-3a-clobupdreader_utf8reader.stat

'derby-3934-3a-clobupdreader_utf8reader.diff' makes the handling of
StoreStreamClob and TemporaryClob consistent.

The following files are touched (all in derby.impl.jdbc):
*** EmbedClob
 Updated call to ClobUpdatableReader. The change of the position argument is

*** TemporaryClob
 Replaced the ClobUpdatableReader returned by getReader with a UTF8Reader.
 Internal handling of TemporaryClob should deal with changing contents
 specifically, or create a ClobUpdatableReader where required.
 Note also the use of the new CharacterStreamDescriptor class. This piece of
 code will probably be changed later on, when there is more information about
 the stream available. For instance, caching byte/char positions allows us to
 skip directly to the byte position through the underlying file API. This way,
 we don't have to decode all the raw bytes to skip the correct number of chars.
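To illustrate the byte/char position caching idea, here is a minimal sketch. It is not code from the patch: the class BytePosCache, its single-entry cache and the charsStepped counter are hypothetical, and a byte array stands in for the underlying file. It assumes an encoding where every Java char takes one to three bytes and the lead byte gives the length (so no four-byte sequences).

```java
/** Hypothetical single-entry cache mapping a char position to a byte position. */
class BytePosCache {
    private final byte[] data;       // stands in for the underlying file
    private long cachedCharPos = 0;  // 0-based char index of the checkpoint
    private long cachedBytePos = 0;  // byte offset where that char starts
    long charsStepped = 0;           // demo counter: chars walked over so far

    BytePosCache(byte[] encoded) {
        this.data = encoded;
    }

    /** Returns the byte offset at which the char with the given 0-based index starts. */
    long bytePositionOf(long charPos) {
        long c = 0;
        long b = 0;
        // Start from the checkpoint when the target is at or beyond it.
        if (charPos >= cachedCharPos) {
            c = cachedCharPos;
            b = cachedBytePos;
        }
        while (c < charPos) {
            if (b >= data.length) {
                throw new IndexOutOfBoundsException("char position " + charPos);
            }
            // Only the lead byte is inspected; no chars are actually decoded.
            int lead = data[(int) b] & 0xFF;
            b += lead < 0x80 ? 1 : (lead < 0xE0 ? 2 : 3);
            c++;
            charsStepped++;
        }
        // Remember this position for subsequent forward seeks.
        cachedCharPos = charPos;
        cachedBytePos = b;
        return b;
    }
}
```

With the byte offset in hand, the caller could reposition the file directly instead of decoding from the start. As noted above, this only helps access patterns that mostly move forward from a known position; a backward seek falls back to scanning from the start.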

*** ClobUpdatableReader
 More or less rewritten. It now uses the new methods exposed by InternalClob to
 detect changes in the underlying Clob content. Note that this class doesn't
 handle repositioning, only detection of changes and forwarding of read/skip
 calls. Note the lazy initialization of the underlying reader.
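The structure described above (change detection, forwarding, lazy initialization) can be sketched roughly as follows. This is not the code in the patch: VersionedCharSource and its two methods are hypothetical stand-ins for whatever InternalClob exposes, and InMemorySource exists only to make the sketch runnable.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stand-in for the change-detection methods on InternalClob. */
interface VersionedCharSource {
    long updateCount();                                 // changes on every modification
    Reader readerAt(long charPos) throws IOException;   // the source does the positioning
}

/** Forwards read/skip and re-fetches the underlying reader when the content changes. */
class ChangeDetectingReader extends Reader {
    private final VersionedCharSource source;
    private Reader delegate;          // created lazily on first read/skip
    private long lastUpdateCount;
    private long pos;                 // 0-based position of the next char

    ChangeDetectingReader(VersionedCharSource source) {
        this.source = source;
    }

    private Reader current() throws IOException {
        long count = source.updateCount();
        if (delegate == null || count != lastUpdateCount) {
            if (delegate != null) {
                delegate.close();
            }
            delegate = source.readerAt(pos);
            lastUpdateCount = count;
        }
        return delegate;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = current().read(buf, off, len);
        if (n > 0) {
            pos += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = current().skip(n);
        pos += skipped;
        return skipped;
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }
}

/** In-memory source used only to exercise the sketch. */
class InMemorySource implements VersionedCharSource {
    private final StringBuilder content;
    private long updates;
    int readersOpened;

    InMemorySource(String s) {
        content = new StringBuilder(s);
    }

    void setCharAt(int index, char c) {
        content.setCharAt(index, c);
        updates++;
    }

    public long updateCount() {
        return updates;
    }

    public Reader readerAt(long charPos) {
        readersOpened++;
        return new StringReader(content.substring((int) charPos));
    }
}
```

Like the class described above, this sketch is not thread safe.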

 WARNING: There is one thing missing, which is proper synchronization. Access to
 store will be synchronized in other locations, but this class is not thread
 safe. I haven't decided yet whether to synchronize on the reader object or the
 root connection. I think the latter is the best choice. Does anyone know
 anything about the cost of taking locks on the same object multiple times?
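On taking a lock on the same object multiple times: Java's intrinsic locks are reentrant, so a thread that already holds the monitor can enter another synchronized block on the same object without blocking. The sketch below only demonstrates that behavior and says nothing about the cost. The class LockedReader and the plain Object standing in for the root connection are hypothetical.

```java
/** Hypothetical reader that synchronizes every access on a shared lock object. */
class LockedReader {
    private final Object connectionLock;  // stands in for the root connection
    private final String data;
    private int pos;

    LockedReader(Object connectionLock, String data) {
        this.connectionLock = connectionLock;
        this.data = data;
    }

    /** Reads one char, or returns -1 at the end of the data. */
    int read() {
        synchronized (connectionLock) {
            return pos < data.length() ? data.charAt(pos++) : -1;
        }
    }

    /** Holds the lock for the whole operation; each read() re-acquires it. */
    String readAll() {
        synchronized (connectionLock) {
            StringBuilder sb = new StringBuilder();
            for (int c = read(); c != -1; c = read()) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }
}
```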

*** StoreStreamClob
 Replaced old UTF8Reader constructor with the new one. Again, this code needs
 to be updated when more information about the stream is available. This is to
 allow UTF8Reader to perform better.

*** UTF8Reader
 Added a new constructor, using the new CharacterStreamDescriptor class.
 Removed one constructor.
 Retrofitted the second old constructor to use CharacterStreamDescriptor. This
 will be removed when the calling code has been updated.
 The old method calculating the buffer size will also be removed.
 Stopped referencing PositionedStoreStream, using PositionedStream interface
 instead. This allows the positioning logic to be used for both store streams
 and LOBInputStream streams.
 The reader has been prepared to be able to deal with multiple data offsets,
 i.e. handling several store stream formats. For instance, the current
 implementation has an offset of two bytes, whereas the planned new one will
 have an offset of at least five bytes. LOBInputStream has an offset of zero
 bytes (no header information).
 From now on, position-aware streams are not closed as early as before, because
 we might have to go backwards in the stream. Streams that can only move
 forwards are closed as soon as possible (as before).
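As an illustration of the data offset idea, the sketch below skips a configurable number of header bytes before decoding. It is not the UTF8Reader code: HeaderSkippingOpener and its methods are hypothetical, and it decodes standard UTF-8 through InputStreamReader as a simplifying assumption.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: opens a reader that starts after dataOffset header bytes. */
final class HeaderSkippingOpener {
    private HeaderSkippingOpener() {}

    static Reader open(InputStream in, int dataOffset) throws IOException {
        long remaining = dataOffset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF; fall back to read().
                if (in.read() < 0) {
                    throw new EOFException("stream is shorter than its header");
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        return new InputStreamReader(in, StandardCharsets.UTF_8);
    }

    /** Drains the reader into a String (for demonstration only). */
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        for (int n = r.read(buf); n != -1; n = r.read(buf)) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

A stream with a two-byte header would be opened with an offset of 2, and a stream without header information with an offset of 0, using the same reader code in both cases.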

 Tests are running, and about 3/4 finished. No errors so far. I will post final
 results later.
 Patch ready for review.

 The plan going forward
 After patch 3a is in, I plan to do the following:
  1) Implement TemporaryClob.getInternalReader().
     This will dramatically improve the Clob.getSubString performance for
     modified Clobs.
  2) I will consider adding a simple byte/char position cache.
     The point of this is to be able to skip to a given byte position without
     having to decode bytes into chars. This is a mechanism that will only help
     certain access patterns, but it should come with a very low overhead.
  3) Continue working with the new Clob format.
     When it is in place, care must be taken to utilize the new stream
     information where possible. The primary one is returning the length through
     Clob.length(). A second opportunity is using the length information to make
     decisions in the byte/char position cache.
     This work is mostly related to DERBY-3907.

I'm using the simple Clob regression tests in my work, and they have already
revealed a bug :) I had forgotten to include the known byte length in the
CharacterStreamDescriptor, which caused UTF8Reader to allocate a buffer that
was way too big (8K instead of 100 bytes).
The last step in my LOB work will be to write a simple report documenting the
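The buffer sizing bug mentioned above can be illustrated with a small sketch. This is not the UTF8Reader logic: the class BufferSizing, the 8K default constant and the negative "unknown length" convention are assumptions made for the example.

```java
/** Hypothetical buffer sizing rule based on the known byte length of a stream. */
final class BufferSizing {
    static final int DEFAULT_BUFFER_SIZE = 8 * 1024;

    private BufferSizing() {}

    /**
     * Returns the buffer size to allocate. A negative length means the byte
     * length is unknown (an assumption of this sketch), in which case the
     * default is used; otherwise the buffer is no bigger than the data.
     */
    static int bufferSizeFor(long knownByteLength) {
        if (knownByteLength < 0) {
            return DEFAULT_BUFFER_SIZE;
        }
        return (int) Math.max(1, Math.min(DEFAULT_BUFFER_SIZE, knownByteLength));
    }
}
```

Leaving the known length out of the descriptor corresponds to the negative case here: a 100 byte value ends up with the full 8K buffer.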

> Improve performance of reading modified Clobs
> ---------------------------------------------
>                 Key: DERBY-3934
>                 URL: https://issues.apache.org/jira/browse/DERBY-3934
>             Project: Derby
>          Issue Type: Improvement
>          Components: JDBC
>    Affects Versions:
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>         Attachments: derby-3934-1a-clob_replace_test.diff, derby-3934-2a-intclob_new_methods.diff, derby-3934-3a-clobupdreader_utf8reader.diff, derby-3934-3a-clobupdreader_utf8reader.stat
> The performance of reading modified Clobs is poor, which is demonstrated by running a test program selecting a 10 MB Clob and then getting the contents using getSubString:
>  - unmodified Clob (StoreStreamClob) : ~1 300 ms
>  - modified Clob (TemporaryClob): ~156 000 ms
> In this case, the Clob was modified by changing the first character.
> A number of subtasks will be created to handle the various issues, which will be related to both performance and code cleanup.
> For a brief overview, see http://www.nabble.com/Suggestion-for-improving-ClobUpdatableReader-and-related-code-to20308303.html

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
