db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4477) Selecting / projecting a column whose value is represented by a stream more than once fails
Date Tue, 15 Dec 2009 16:58:18 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-4477:

    Attachment: derby-4477-0a-prototype.diff

Attached a prototype patch 0a.

The logic is isolated in ProjectRestrictResultSet; the rest of the patch is code from the
patch attached to DERBY-3650.
The prototype currently tries to implement something along the lines of phase 2.
I'm running the regression tests, and tomorrow I will post a performance test and some results.

I'd like some feedback on how we want Derby to behave:
 - what should the clone stream threshold be?
 - is it okay to always materialize [[LONG] VAR]CHAR [FOR BIT DATA]?
 - DataValueDescriptor.getLengthIfKnow is something I added just before posting the patch,
to optimize where possible. Keep it or ditch it? Is it useful in other scenarios?
   (as a side note, getCharLengthIfKnown was added to InternalClob)
 - the second check could be done in the constructor if there were a way to reliably find the
types of the relevant columns (I was only able to find the descriptors for the top-level result
set, but I don't know the available structures very well)
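To make the threshold question concrete, here is a minimal Java sketch of one possible decision rule between a materializing clone and a stream clone. Everything in it is hypothetical: the class name, the UNKNOWN_LENGTH (-1) convention standing in for what a getLengthIfKnow-style method might report, and the threshold parameter are assumptions for illustration, not the prototype's code.

```java
// Hypothetical sketch: choosing between a materializing clone (A) and a
// stream clone (B) for a duplicated column value. The threshold and the
// "-1 means unknown length" convention are assumptions, not Derby APIs.
final class CloneStrategySketch {

    enum Kind { MATERIALIZE, STREAM_CLONE }

    /** Assumed sentinel: the length is not known without reading the stream. */
    static final long UNKNOWN_LENGTH = -1L;

    /**
     * @param alwaysMaterialize true for [[LONG] VAR]CHAR [FOR BIT DATA] columns,
     *                          modelling the "always materialize" option
     * @param knownLength       value length in bytes, or UNKNOWN_LENGTH
     * @param threshold         hypothetical clone stream threshold in bytes
     */
    static Kind choose(boolean alwaysMaterialize, long knownLength, long threshold) {
        if (alwaysMaterialize) {
            return Kind.MATERIALIZE;
        }
        if (knownLength != UNKNOWN_LENGTH && knownLength <= threshold) {
            // Small value of known size: decoding it once and copying is cheap.
            return Kind.MATERIALIZE;
        }
        // Large or unknown-size value (BLOB/CLOB): avoid pulling it into memory.
        return Kind.STREAM_CLONE;
    }
}
```

Treating an unknown length as "large" is the conservative choice for the out-of-memory goal, at the cost of sometimes stream-cloning values that would have been cheap to materialize.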

Finally, with the exception of the code in the constructor, the added code should only be
activated if the user selects a "store streamable" [1] column more than once. I don't know
if that is very common, and I guess the most important issue is that Derby is able to handle
it without crashing.
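A minimal sketch of the "selected more than once" check, assuming the projection can be described as an array mapping each output column to its source column, and that a per-source-column "store streamable" flag is available (which is exactly what the constructor cannot currently determine reliably). The class and method names are hypothetical; this is not the ProjectRestrictResultSet code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: flag the output positions whose value must be cloned
// because the same store-streamable source column was already projected.
final class DuplicateProjectionSketch {

    /**
     * @param sourceColumn    sourceColumn[i] = source column feeding output column i
     * @param storeStreamable storeStreamable[c] = true if source column c may be a store stream
     * @return needsClone[i] = true if output column i repeats an earlier
     *         streamable source column (the first occurrence keeps the original)
     */
    static boolean[] columnsNeedingClone(int[] sourceColumn, boolean[] storeStreamable) {
        boolean[] needsClone = new boolean[sourceColumn.length];
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < sourceColumn.length; i++) {
            int src = sourceColumn[i];
            // Set.add returns false when src has already been projected.
            if (!seen.add(src) && storeStreamable[src]) {
                needsClone[i] = true;
            }
        }
        return needsClone;
    }
}
```

If no entry comes back true, nothing extra happens per row, which matches the intent that the added code only kicks in for duplicate selections of streamable columns.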

[1] The store streamable types are the various CHAR and CHAR FOR BIT DATA types, BLOB, and CLOB.
     (Hmm, what about XML?)
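The footnote's classification can be sketched with JDBC type codes. Mapping Derby's SQL types onto java.sql.Types this way is an assumption made for illustration (Derby reports CHAR FOR BIT DATA as BINARY, and so on); XML is deliberately left out, matching the open question above.

```java
import java.sql.Types;

// Illustrative sketch: which JDBC type codes correspond to "store streamable"
// column types as listed in footnote [1]. XML is intentionally not included.
final class StoreStreamableSketch {
    static boolean isStoreStreamable(int jdbcType) {
        switch (jdbcType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR:
            case Types.BINARY:        // CHAR FOR BIT DATA
            case Types.VARBINARY:     // VARCHAR FOR BIT DATA
            case Types.LONGVARBINARY: // LONG VARCHAR FOR BIT DATA
            case Types.BLOB:
            case Types.CLOB:
                return true;
            default:
                return false;
        }
    }
}
```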

> Selecting / projecting a column whose value is represented by a stream more than once fails
> -------------------------------------------------------------------------------------------
>                 Key: DERBY-4477
>                 URL: https://issues.apache.org/jira/browse/DERBY-4477
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions:,,
>            Reporter: Kristian Waagan
>            Assignee: Kristian Waagan
>         Attachments: derby-4477-0a-prototype.diff
> Selecting / projecting a column whose value is represented as a stream more than once crashes Derby, i.e.:
> ResultSet rs = stmt.executeQuery("SELECT clobValue AS clobOne, clobValue AS clobTwo FROM
> rs.getString(1);
> rs.getString(2);
> After having looked at the class of bugs having to do with reuse of stream data types, I now have a possible fix. It fixes DERBY-3645, DERBY-3646 and DERBY-2349 (there may be more
> The core of the fix is cloning certain DVDs being selected/projected in multiple columns. There are two types of cloning:
>  A) materializing clone
>  B) stream clone
> (A) can be implemented already; (B) requires code to clone a stream without materializing it. Note that the streams I'm talking about are streams originating from the store.
> Testing revealed the following:
>  - the cost of the checks performed to figure out if cloning is required seems acceptable
>  - in some cases (A) has better performance than (B) because the raw data only has to be decoded once
>  - stream clones are preferred when the data value is above a certain size for several reasons:
>     * avoids potential out-of-memory errors (and, in a server environment, lowers the memory pressure)
>     * avoids decoding the whole value if the JDBC streaming APIs are used to access only parts of the value
>     * avoids decoding overall in cases where the value isn't accessed by the client /
>        (this statement conflicts with the performance observation above)
> We don't always know the size of a value, and since the fix code deals with all kinds of data types, it is slightly more costly to try to obtain the size.
> What do people think about the following goal statement?
> Goals:
> ----- Phase 1
>  1) No crashes or wrong results due to stream reuse when executing duplicate column selections (minus goal 4)
>  2) Minimal performance degradation for non-duplicate column selections
>  3) Only a minor performance degradation for duplicate [[LONG] VAR]CHAR [FOR BIT DATA] column selections
> ----- Phase 2
>  4) No out-of-memory exceptions during execution of duplicate column selections of BLOB/CLOB
>  5) Optimize BLOB/CLOB cloning
> I think phase 1 can proceed by reviewing and discussing the prototype patch. Phase 2 requires more discussion and work (see DERBY-3650).
> A note about the bug behavior facts:
> Since this issue is the underlying cause of several other reported issues, I have decided to be liberal when setting the bug behavior facts. Depending on where the duplicate column selection is used, it can cause crashes, wrong results, or data corruption.
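Why stream reuse breaks, and what a materializing clone (A) buys, can be illustrated with plain java.io streams standing in for a store stream. This is an analogy only: Derby's store streams differ in detail, and the class and method names below are hypothetical. Two "columns" sharing one InputStream interfere with each other, while a materialized value gives each column its own independent stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustration only: ByteArrayInputStream plays the role of a store stream.
final class StreamReuseSketch {

    /** Drains the stream fully; IOException is rethrown unchecked for brevity. */
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Both columns read the same stream: the second finds it already drained. */
    static String[] sharedStream(byte[] value) {
        InputStream shared = new ByteArrayInputStream(value);
        String first = new String(readAll(shared), StandardCharsets.UTF_8);
        String second = new String(readAll(shared), StandardCharsets.UTF_8);
        return new String[] { first, second };
    }

    /** Materializing clone: decode once, then give each column its own stream. */
    static String[] materializedClones(byte[] value) {
        byte[] materialized = readAll(new ByteArrayInputStream(value));
        InputStream one = new ByteArrayInputStream(materialized);
        InputStream two = new ByteArrayInputStream(materialized);
        return new String[] {
            new String(readAll(one), StandardCharsets.UTF_8),
            new String(readAll(two), StandardCharsets.UTF_8)
        };
    }
}
```

The materializing variant decodes the value only once, which fits the observation that (A) can outperform (B), but it holds the whole value in memory; that trade-off is why phase 2 targets stream clones for large BLOB/CLOB values.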

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
