db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3907) Save useful length information for Clobs in store
Date Wed, 26 Nov 2008 13:06:44 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-3907:
-----------------------------------

    Attachment: derby-3907-alternative_approach.diff

I got stuck trying to implement the original solution, so I tried an alternative approach.

It is a lot simpler, but people might not like it. Note, however, that it follows roughly the
same pattern as Blob.
The patch is a quick mash-up, and I would like some feedback from the community.

The alternative approach is to make all classes that write data to and read data from store able
to peek at the data and determine which format to use when reading or writing it.
Including my new format, we have these two byte formats:
 - current: D1_D2_DATA
 - new: D4_D3_M_D2_D1_DATA
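To make the two layouts concrete, here is a minimal sketch of building each header. It is not the code in the patch: the class and method names are hypothetical, and it assumes the length is written most significant byte first (as java.io.DataOutput.writeUTF does), with D4 the most significant and D1 the least significant byte of a four-byte length.

```java
import java.util.Arrays;

/** Hypothetical helper that builds the two Clob header layouts (not the patch's code). */
public class ClobHeaderWriter {

    /** Magic byte M for the new format: F0 (11110000). */
    static final int MAGIC = 0xF0;

    /** Current format, D1_D2_DATA: a two-byte length in front of the data. */
    static byte[] currentHeader(int length) {
        if (length < 0 || length > 0xFFFF) {
            throw new IllegalArgumentException("length does not fit in two bytes: " + length);
        }
        // Assumption: most significant byte first, like DataOutput.writeUTF.
        return new byte[] { (byte) (length >>> 8), (byte) length };
    }

    /** New format, D4_D3_M_D2_D1_DATA: a four-byte length with M after the two high bytes. */
    static byte[] newHeader(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        return new byte[] {
            (byte) (length >>> 24), // D4 (assumed most significant)
            (byte) (length >>> 16), // D3
            (byte) MAGIC,           // M
            (byte) (length >>> 8),  // D2
            (byte) length           // D1 (assumed least significant)
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(currentHeader(300)));  // [1, 44]
        System.out.println(Arrays.toString(newHeader(70000)));    // [0, 1, -16, 17, 112]
    }
}
```

Keeping the two high bytes in front of M means the magic byte always sits at the same offset (the third byte), which is what a reader peeks at.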

M is a magic byte, and is used to detect the new format. It is an illegal UTF-8 encoding, so
it should not be possible to misinterpret it as the first format followed by data.
I have set M to F0 (11110000), but I'm masking out the last four bits when looking for the
magic byte. This leaves room for up to 16 format variants (F0 to FF), should that be necessary;
the main point is to keep the four highest bits set.
With respect to data corruption (i.e. one bit getting flipped), is this approach safe enough?
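The read-side check can be sketched as follows: peek at the third byte without consuming it and test only its four highest bits. This is a hypothetical illustration, not Derby's reader code. It assumes the data is stored in the modified UTF-8 used by java.io.DataOutput (at most three bytes per character), so in the current layout the third byte is a character's lead byte (0xxxxxxx, 110xxxxx or 1110xxxx) and never has all four high bits set.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical format check: peek at the third byte without consuming it. */
public class ClobFormatDetector {

    /** Only the four highest bits identify M; the low four are free for format variants. */
    static final int MAGIC_MASK = 0xF0;

    static boolean isMagic(int b) {
        return b != -1 && (b & MAGIC_MASK) == MAGIC_MASK;
    }

    /** Returns true if the stream starts with a D4_D3_M_... header. */
    static boolean isNewFormat(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(3);
        try {
            in.read();                          // D1 (current) or D4 (new)
            in.read();                          // D2 (current) or D3 (new)
            boolean magic = isMagic(in.read()); // first data byte or M
            in.reset();                         // leave the stream where the caller found it
            return magic;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] current = { 0x00, 0x03, 'a', 'b', 'c' };
        byte[] newer = { 0x00, 0x00, (byte) 0xF0, 0x00, 0x03, 'a', 'b', 'c' };
        System.out.println(isNewFormat(new ByteArrayInputStream(current))); // false
        System.out.println(isNewFormat(new ByteArrayInputStream(newer)));   // true
    }
}
```

Note that this only distinguishes valid headers; a single flipped high bit in the first data byte of an old-format value could still look like M, which is the corruption concern raised above.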

So, if we need to store huge Clobs in the future, we could change M and use another
format:
 - future: D6_D5_M_D4_D3_D2_D1_DATA
The same approach could be used to store other meta information.
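A reader could dispatch on the low four bits of M to support such a future layout. The sketch below is hypothetical: it assumes M = F0 selects the four-byte layout and M = F1 the six-byte one, and that length bytes are most significant first. It returns the decoded length and the offset where the character data starts, and expects a complete header in the array.

```java
/** Hypothetical reader for the current, new and a possible future Clob header. */
public class ClobHeaderReader {

    /** Decoded length and the offset at which the character data starts. */
    static final class Header {
        final long length;
        final int dataOffset;

        Header(long length, int dataOffset) {
            this.length = length;
            this.dataOffset = dataOffset;
        }
    }

    /** Reads n bytes starting at off as an unsigned big-endian number. */
    static long bigEndian(byte[] b, int off, int n) {
        long v = 0;
        for (int i = 0; i < n; i++) {
            v = (v << 8) | (b[off + i] & 0xFF);
        }
        return v;
    }

    static Header read(byte[] b) {
        boolean magic = b.length > 2 && (b[2] & 0xF0) == 0xF0;
        if (!magic) {
            // Current format: D1_D2_DATA
            return new Header(bigEndian(b, 0, 2), 2);
        }
        switch (b[2] & 0x0F) {
            case 0x0: // M = F0: D4_D3_M_D2_D1_DATA
                return new Header((bigEndian(b, 0, 2) << 16) | bigEndian(b, 3, 2), 5);
            case 0x1: // assumed M = F1: D6_D5_M_D4_D3_D2_D1_DATA
                return new Header((bigEndian(b, 0, 2) << 32) | bigEndian(b, 3, 4), 7);
            default:
                throw new IllegalStateException(
                        "unknown Clob header, M = 0x" + Integer.toHexString(b[2] & 0xFF));
        }
    }

    public static void main(String[] args) {
        byte[] future = { 0x00, 0x01, (byte) 0xF1, 0x00, 0x00, 0x00, 0x02 };
        Header h = read(future);
        System.out.println(h.length + " @ " + h.dataOffset); // 4294967298 @ 7
    }
}
```

Because M stays at the third byte in every layout, the reader needs only one peek to decide how many length bytes follow.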

The patch 'derby-3907-alternative_approach.diff' only changes behavior for small Clobs. To
enable a new format for larger Clobs, the streaming classes (ReaderToUTF8Stream, UTF8Reader)
have to be changed.
It should be noted that these classes are also used to write other character types (CHAR,
VARCHAR), and I do not intend to change how those are represented. This means that the
streaming classes must be given enough information to choose the correct format for each type.

While the format can be detected on read, an informed decision must be made on write.
Currently I consult the data dictionary to check the database version, and if it is lower than
10.5 I use the old format. Is there a better way?
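The write-side rule described above combines two inputs: whether the value is a Clob (CHAR and VARCHAR keep the current layout) and the database version from the data dictionary. A minimal hypothetical sketch of that decision, with plain integers standing in for the dictionary version:

```java
/** Hypothetical write-side decision between the two Clob header formats. */
public class ClobFormatChooser {

    enum Format { CURRENT, NEW }

    /** Only Clobs in a database at version 10.5 or later get the new header. */
    static Format choose(boolean isClob, int dictMajor, int dictMinor) {
        boolean atLeast105 = dictMajor > 10 || (dictMajor == 10 && dictMinor >= 5);
        return (isClob && atLeast105) ? Format.NEW : Format.CURRENT;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, 10, 4));  // CURRENT (dictionary still at 10.4)
        System.out.println(choose(true, 10, 5));  // NEW
        System.out.println(choose(false, 10, 5)); // CURRENT (e.g. VARCHAR)
    }
}
```

Basing the choice on the dictionary version rather than the engine version is what keeps a database readable by the older release until it is hard-upgraded.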


Regarding the original approach, I got stuck because the upper layers of Derby send NULL values
of the data types down into store. The upper layers don't have any context information, and are
unable to choose the correct implementation. The system doesn't seem to be set up for
having multiple implementations of a single data type at this level.
I ended up with a series of hacks, for instance having store override the Clob implementation
type, but it just didn't work very well. At one point I had normal, soft-upgraded and hard-upgraded
databases working, but compress table failed. I'm sure this isn't the only path that would fail.

I might pick up the work again later, but right now I want to wait for a while and work on
other issues.

> Save useful length information for Clobs in store
> -------------------------------------------------
>
>                 Key: DERBY-3907
>                 URL: https://issues.apache.org/jira/browse/DERBY-3907
>             Project: Derby
>          Issue Type: Improvement
>          Components: JDBC, Store
>    Affects Versions: 10.5.0.0
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>         Attachments: derby-3907-alternative_approach.diff
>
>
> The store should save useful length information for Clobs. This allows the length to be found without decoding the whole data stream.
> The following thread raised the issue on what information to store, and also contains some background information: http://www.nabble.com/Storing-length-information-for-CLOB-on-disk-tp19197535p19197535.html
> The information to store, and the exact format of it, is still to be discussed/determined.
> Currently two bytes are set aside for length information, which is inadequate.

-- 
This message is automatically generated by JIRA.

