db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3907) Save useful length information for Clobs in store
Date Mon, 13 Oct 2008 15:46:44 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639082#action_12639082 ]

Kristian Waagan commented on DERBY-3907:

[Header format]

Mike wrote:
What does the following mean? Will the changes apply to all sql which inserts clobs, or to
only particular jdbc interfaces?
1) Clob modifications are done on a copy (i.e. TemporaryClob).
By "Clob modifications" I mean updates to parts of an existing Clob. To get into this state,
one must first do a select to get a Clob that has already been stored in the database. I
think updating parts of a Clob can only be done through the Clob interface. Is that correct?

The ResultSet.updateXXX methods can be seen as inserting a new Clob.
My current hope is that all insertions will go through ReaderToUTF8Stream, which seems like
a good place to count characters (and bytes) and obtain bytes-per-char statistics.
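
To illustrate the kind of counting meant here, below is a minimal sketch. It is not
ReaderToUTF8Stream itself: the class ClobStatsSketch and all of its members are hypothetical
names made up for this example, and it assumes the modified UTF-8 encoding in which every
Java char takes one, two or three bytes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration only; not a Derby class. */
final class ClobStatsSketch {
    long charCount;            // number of Java chars read
    long byteCount;            // encoded size in bytes (any header excluded)
    int minBytesPerChar = 3;   // smallest per-char size seen so far
    int maxBytesPerChar = 1;   // largest per-char size seen so far

    /** Encoded size of one char, assuming modified UTF-8. */
    static int encodedSize(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        return (c > 0x07FF) ? 3 : 2;   // note: 0x0000 takes two bytes
    }

    /** Drains the reader and gathers the statistics in a single pass. */
    static ClobStatsSketch scan(Reader in) {
        ClobStatsSketch stats = new ClobStatsSketch();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    int size = encodedSize(buf[i]);
                    stats.charCount++;
                    stats.byteCount += size;
                    stats.minBytesPerChar = Math.min(stats.minBytesPerChar, size);
                    stats.maxBytesPerChar = Math.max(stats.maxBytesPerChar, size);
                }
            }
        } catch (IOException e) {
            // Wrapped only to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return stats;
    }

    /** True if every char seen was encoded with the same number of bytes. */
    boolean isFixedWidth() {
        return charCount > 0 && minBytesPerChar == maxBytesPerChar;
    }
}
```

In a real stream the counting would happen while the bytes are being produced, so the
totals are only known once the last char has been read.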

There might be a slight complication as we allow using setString on Clob columns.

What is the expected call sequence to store, and what is the goal performance characteristic?
The expected call sequence is exactly as you describe it (see Mike's comment from 10/Oct/08
10:10 AM).
Depending on the information we need to obtain, the header can be written either up front or
as the last step of insertion. Even if we only store length information, we need to support
the latter because of the lengthless JDBC methods.

The goal performance characteristic for the length operation is that getting the length for
the largest storable Clob should be as fast as for the shortest one (read first page and decode
stream header bytes). This is not the case today, because the Clob data must be decoded to
find the length. Besides Clob.getLength, this also hurts other methods that validate their
arguments against the Clob length.

Positioning can be expressed with costs like this:
[reset stream] + decode_chars + skip_bytes
In certain cases, we can remove the decoding cost by knowing that all chars are represented
by the same number of bytes (one, two or three). In these cases, the positioning cost should
be the same as for Blob. This is the motivation for the bytes-per-char information.
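
A minimal sketch of the difference between the two cases follows. The class and method names
are hypothetical, offsets are relative to the first data byte after any header, positions
are 1-based as in the JDBC Clob interface, and modified UTF-8 is assumed.

```java
/** Hypothetical helper for illustration only; not a Derby class. */
final class ClobPositionSketch {

    /** Fixed-width case: the byte offset is pure arithmetic, as for Blob. */
    static long byteOffsetFixed(long charPos, int bytesPerChar) {
        if (charPos < 1 || bytesPerChar < 1 || bytesPerChar > 3) {
            throw new IllegalArgumentException("bad position or width");
        }
        return (charPos - 1) * bytesPerChar;
    }

    /** General case: every char before charPos must be decoded. */
    static long byteOffsetByDecoding(byte[] data, long charPos) {
        if (charPos < 1) {
            throw new IllegalArgumentException("position must be >= 1");
        }
        int offset = 0;
        for (long c = 1; c < charPos; c++) {
            if (offset >= data.length) {
                throw new IllegalArgumentException("position past end of data");
            }
            int lead = data[offset] & 0xFF;
            if ((lead & 0x80) == 0x00) {
                offset += 1;            // 0xxxxxxx: one-byte char
            } else if ((lead & 0xE0) == 0xC0) {
                offset += 2;            // 110xxxxx: two-byte char
            } else if ((lead & 0xF0) == 0xE0) {
                offset += 3;            // 1110xxxx: three-byte char
            } else {
                throw new IllegalArgumentException("invalid lead byte");
            }
        }
        return offset;
    }
}
```

The first method costs the same for any position, while the second grows linearly with
the position. That gap is the decode_chars term in the cost expression above.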

> Save useful length information for Clobs in store
> -------------------------------------------------
>                 Key: DERBY-3907
>                 URL: https://issues.apache.org/jira/browse/DERBY-3907
>             Project: Derby
>          Issue Type: Improvement
>          Components: JDBC, Store
>    Affects Versions:
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
> The store should save useful length information for Clobs. This allows the length to be found without decoding the whole data stream.
> The following thread raised the issue of what information to store, and also contains some background information: http://www.nabble.com/Storing-length-information-for-CLOB-on-disk-tp19197535p19197535.html
> The information to store, and the exact format of it, is still to be discussed/determined.
> Currently two bytes are set aside for length information, which is inadequate.

