db-derby-dev mailing list archives

From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5367) Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
Date Thu, 08 Sep 2011 22:36:09 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100756#comment-13100756 ]

Mike Matrigali commented on DERBY-5367:

For benchmarks it seems like we would like to know results for two different types of benchmarks.
The worst case for this code would be a primary key insert and delete repeated using the same
key, which I think is what you are doing. Then also just a normal mix of insert/delete.
The question I would be trying to understand is whether it is worth adding overhead to the
"normal" path in order to optimize the worst case path. Basically, a 3.6% degradation for that
worst case, which I don't think is a normal usage pattern, may be ok. What I don't want to
see is a slowdown in the "normal" case, which I think we might have seen with patch 1 since
it did work in the normal path to optimize the worst case path.

I think option 2 is a reasonable approach for a bug fix to the existing codelines. It seems
safe to me, and a good first step. And because it is not special cased, it gets full coverage
from the testing that caught the previous problem with the patch.

With more thought, I don't think we should do option 1. It might make sense to log a future
project for the trunk to look into just going ahead and purging the deleted row and then
retrying the insert. While we have the latch we may just want to purge all committed deleted
rows, in a similar fashion to what we do prior to splitting the page.
The problem is that purge is tricky: it has to be committed while holding the latch on the
page, so you can't just call it in-line in the current code. And you have to make sure that
if you fork a transaction to do the purge, the parent is not conflicting with the child,
otherwise it might loop forever. The purge would also address the RESOLVE comment in the code
right now (since that comment was written, HeapRowLocations have become variable in size based
on CompressedNumber). And with the specific issue that is causing this bug, it is also possible
that updating "equal" fields might need more space, given that different characters and
character sequences can be "equal". So the problem is possibly even worse for the collation
case.
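To make the last point concrete, here is a small sketch using the JDK's java.text.Collator at PRIMARY strength (the mechanism underlying collation=TERRITORY_BASED:PRIMARY): distinct character sequences, including sequences of different lengths, can compare as equal, even though their stored bytes differ.

```java
import java.text.Collator;
import java.util.Locale;

public class PrimaryStrengthDemo {
    public static void main(String[] args) {
        Collator c = Collator.getInstance(Locale.US);
        c.setStrength(Collator.PRIMARY);

        // case differences are ignored at PRIMARY strength
        System.out.println(c.compare("Test", "test"));     // 0: equal

        // accent differences are ignored as well
        System.out.println(c.compare("resume", "résumé")); // 0: equal

        // yet the strings (and hence any stored form) differ byte-for-byte
        System.out.println("Test".equals("test"));         // false
    }
}
```

So replacing an index key with an "equal" one is not guaranteed to fit in the space the old key occupied.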
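The loop-forever hazard with the forked purge transaction can be sketched with a toy lock table (hypothetical names, not Derby's actual lock manager): if the parent transaction already holds a lock the child's purge needs, retrying the purge in the child can never succeed.

```java
import java.util.HashMap;
import java.util.Map;

public class PurgeRetrySketch {
    // toy lock table: row id -> owning transaction id
    static final Map<Integer, String> locks = new HashMap<>();

    static boolean tryLock(int row, String txn) {
        // a request succeeds if the row is free or already owned by
        // the SAME transaction; a different owner means a conflict
        String owner = locks.putIfAbsent(row, txn);
        return owner == null || owner.equals(txn);
    }

    static boolean purgeInChild(int row, String childTxn, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            if (tryLock(row, childTxn)) {
                return true;  // the purge could proceed and commit here
            }
            // the conflicting lock is held by our own parent, so
            // retrying cannot help; an unbounded loop would spin forever
        }
        return false;
    }

    public static void main(String[] args) {
        tryLock(42, "parent");                         // parent locks row 42
        boolean ok = purgeInChild(42, "child", 1000);  // child purge conflicts
        System.out.println("child purge succeeded: " + ok); // prints false
    }
}
```

This is why the child doing the purge must either share lock compatibility with its parent or detect the conflict and give up, rather than retry.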

> Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
> -----------------------------------------------------------------------------
>                 Key: DERBY-5367
>                 URL: https://issues.apache.org/jira/browse/DERBY-5367
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions:,
>         Environment: Mac OS X, Windows
>            Reporter: Brett Wooldridge
>            Assignee: Kristian Waagan
>            Priority: Critical
>         Attachments: derby-5367-1a-update_row_fully.diff, derby-5367-1b-update_row_fully.diff,
derby-5367-1b-update_row_fully.stat, derby-5367-2a-minimal_fix.diff
> Our product recently upgraded to version in order to take advantage of the new
'case-insensitive' mode offered by Derby in the form of the "collation=TERRITORY_BASED:PRIMARY"
connection parameter.
> Unfortunately, we have run into an issue whereby stale data appears to be retrieved from
an index, even though the data in the table itself has changed.
> You can see this issue in the IJ session below.  The database in question was created
using this Java parameter when invoking IJ:
> -Dij.database=jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY
> Here is the IJ session:
> CONNECTION0* - 	jdbc:derby:test
> * = current connection
> ij> CREATE TABLE tag (
>     tag_id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
>     tag VARCHAR(255) NOT NULL,
>     CONSTRAINT tag_pk PRIMARY KEY (tag_id),
>     CONSTRAINT tag_tag_unique UNIQUE (tag)
> );
> 0 rows inserted/updated/deleted
> ij> -- first insert a value 'Test', note the upper-case 'T' in 'Test'
> ij> INSERT INTO tag (tag) VALUES ('Test');
> 1 row inserted/updated/deleted
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                     
> --------------------------------------------------------------------------------------------------------------------------------------------
> 1          |Test                                                                    
> 1 row selected
> ij> -- Now delete the row
> ij> DELETE FROM tag WHERE tag='Test';
> 1 row inserted/updated/deleted
> ij> -- You could run another SELECT here to verify the row is gone (it is).
> ij> -- Now insert a new value 'test', note the lower-case 't' in 'test'
> ij> INSERT INTO tag (tag) VALUES ('test');
> 1 row inserted/updated/deleted
> ij> -- Now verify that the table contains only the lower-case version: 'test'
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                     
> --------------------------------------------------------------------------------------------------------------------------------------------
> 2          |test                                                                    
> 1 row selected
> ij> -- Now, here is the bug.
> ij> SELECT tag FROM tag;
> TAG                                                                                 
> --------------------------------------------------------------------------------------------------------------------------------
> Test                                                                                
> 1 row selected
> ij> 
> Note that in the last SELECT we request the 'tag' column specifically. When we 'SELECT *', Derby performs a table scan and the result is correct. However, when we 'SELECT tag', Derby appears to use the index created for the 'tag_tag_unique' unique constraint. As an optimization, Derby, like many databases, will read values directly from the index when the index covers all requested columns.
> The bigger question is: why doesn't the DELETE cause the entry in the tag_tag_unique index to be deleted? Is this a further optimization? If so, it is imperative that the index at least be updated when the new value is inserted.
> This is rather a severe bug for us that causes stale data to be returned.
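The suspected mechanism behind the stale read can be illustrated with a toy model (hypothetical structure, not Derby's actual B-tree code): a unique index ordered by PRIMARY-strength collation marks the entry for 'Test' deleted instead of removing it; a later insert of the "equal" value 'test' finds that deleted slot and resurrects it without rewriting the stored string, so an index-only ("covering") read returns the stale 'Test'.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class StaleIndexSlotDemo {
    static class Slot {
        String value;       // the string bytes stored in the index entry
        boolean deleted;    // deleted rows are marked, not purged
        Slot(String v) { value = v; }
    }

    public static void main(String[] args) {
        Collator c = Collator.getInstance(Locale.US);
        c.setStrength(Collator.PRIMARY);
        // Collator implements Comparator, so keys equal under the
        // collation map to the same index entry
        Map<String, Slot> index = new TreeMap<>(c);

        index.put("Test", new Slot("Test"));   // INSERT 'Test'
        index.get("Test").deleted = true;      // DELETE: mark only

        // INSERT 'test': 'test' compares equal to 'Test' under PRIMARY
        // strength, so lookup finds the deleted slot
        Slot s = index.get("test");
        s.deleted = false;  // buggy path: slot reused, value NOT updated

        // a covering index read now returns the stale string
        System.out.println(s.value);  // prints Test, not test
    }
}
```

In the toy, the fix would be one line (`s.value = "test";` when reusing the slot), which mirrors the idea of fully updating the row rather than only clearing the deleted flag.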

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

