db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5367) Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
Date Mon, 29 Aug 2011 17:22:38 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092984#comment-13092984

Mike Matrigali commented on DERBY-5367:

I have not looked at patch yet.  First some answers to previous questions.  

As to the following:
a) Deoptimize insert - always insert a new index row. 

This would be a bad idea.  Algorithms in the btree depend on every leaf row in the tree being
unique, whether the rows are deleted or not.   There are 2 cases
to consider: unique and non-unique btrees.  In the case of non-unique btrees each leaf row
is made unique by including the trailing row location as part of the whole key.  In the case
of unique btrees the keys should be unique not including the row location, and thus the search
algo's don't include it as part of the key.

The kinds of things that depend on this may include the following:
o the various binary searches on the leafs assume a single unique final destination, if the
code allows for multiple duplicate leafs then I think you might randomly end up in the middle
of a list of keys depending on binary search, and the code does not expect to get to a destination
and "backup" to beginning of
a list. 
o the check for inserting a duplicate error assumes the binary search will get to exactly
the ONE row that matches depending on unique vs non-unique
parameters.  I think bugs will arise if there are more than one matching row.  

As to following question:
>The block above the one where I applied the fix deals with the case where also the row
location is the same.
>When does this case happen, and is it guaranteed that the row values are correct even
when a (non-default) collation is being used? 
Logically the code should handle this as Derby theoretically allows for reusable record ids.
 In practice heap pages are marked to not allow them, so 
i don;t think this is a normal event.  It might happen as part of particular paths through
crash recovery where we do logical undo vs. physical undo on
btree pages. 

I am not sure about the question about non-default collation.  At high level I believe it
should work and the store should just count on the result 
of the collation code doing the compare.  If it says 2 different things are the same then
it should just treat them as the same.

Next I will look at patch.  Sorry about delay - have been out and away from list for awhile.

> Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
> -----------------------------------------------------------------------------
>                 Key: DERBY-5367
>                 URL: https://issues.apache.org/jira/browse/DERBY-5367
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions:,
>         Environment: Mac OS X, Windows
>            Reporter: Brett Wooldridge
>            Assignee: Kristian Waagan
>            Priority: Critical
>         Attachments: derby-5367-1a-update_row_fully.diff, derby-5367-1b-update_row_fully.diff,
> Our product recently upgraded to version in order to take advantage of the new
'case-insensitive' mode offered by Derby in the form of the "collation=TERRITORY_BASED:PRIMARY"
connection parameter.
> Unfortunately, we have run into an issue whereby stale data appears to be retrieved from
an index, even though the data in the table itself has changed.
> You can see this issue in the IJ session below.  The database in question was created
using this Java parameter when invoking IJ:
> -Dij.database=jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY
> Here is the IJ session:
> CONNECTION0* - 	jdbc:derby:test
> * = current connection
> ij> CREATE TABLE tag (
>     tag VARCHAR(255) NOT NULL,
>     CONSTRAINT tag_pk PRIMARY KEY (tag_id),
>     CONSTRAINT tag_tag_unique UNIQUE (tag)
> );
> 0 rows inserted/updated/deleted
> ij> -- first insert a value 'Test', note the upper-case 'T' in 'Test'
> ij> INSERT INTO tag (tag) VALUES ('Test');
> 1 row inserted/updated/deleted
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                     
> --------------------------------------------------------------------------------------------------------------------------------------------
> 1          |Test                                                                    
> 1 row selected
> ij> -- Now delete the row
> ij> DELETE FROM tag WHERE tag='Test';
> 1 row inserted/updated/deleted
> ij> -- You could run another SELECT here to verify it is gone, but it is.
> ij> -- Now insert a new value 'test', note the lower-case 't' in 'test'
> ij> INSERT INTO tag (tag) VALUES ('test');
> 1 row inserted/updated/deleted
> ij> -- Now verify that the table contains only the lower-case version: 'test'
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                     
> --------------------------------------------------------------------------------------------------------------------------------------------
> 2          |test                                                                    
> 1 row selected
> ij> -- Now, here is the bug.
> ij> SELECT tag FROM tag;
> TAG                                                                                 
> --------------------------------------------------------------------------------------------------------------------------------
> Test                                                                                
> 1 row selected
> ij> 
> Note in the last SELECT we specify the 'tag' column specifically.  When we 'SELECT *',
Derby performs a table-scan and the result is correct.  However, when we 'SELECT tag', Derby
appears to use the index created for the 'tag_tag_unique' unique constraint.  As an optimization
Derby, like many databases, will use values directly from the index in the case where the
index covers all requested columns.
> The bigger question is, why doesn't the DELETE action cause the entry in the tag_tag_unique
index to be deleted?  Is this a further optimization?  If so, it is imperative that the index
at least be updated when the new value is inserted.
> This is rather a severe bug for us that causes stale data to be returned.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message