db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5367) Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
Date Wed, 21 Sep 2011 22:09:27 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112159#comment-13112159
] 

Mike Matrigali commented on DERBY-5367:
---------------------------------------

I have reviewed the most recent patch.  Here are comments, none of which I would say should
hold up a commit:

o I much prefer the current optimization as now it is a compile time decision.  I still would
prefer if non-collated db's did not have to march through
   all the collumns and calculate if the row has collated columns on every compile.  I might
have pushed the work up to language by adding an
   argument to OpenConglomerateScratchSpace to pass in hasCollatedTypes.  That would make
most sense if the calling level has a better
   understanding of the collation state of the DB and/or table and could avoid the calculation.

o I believe you said it, but just want to verify that you ran at least one set of tests where
all tests went through the proposed collation code path just
   to make sure it was tested once.  My guess is that the new codepath you are adding is probably
not tested at all, without adding some new tests.
   Seems like you should at least add some version of the test case from the bug report.

o I think the following is also a bug for same reason as you are working on - but not sure
how to force the code path, but happy to have it as a separate JIRA - let me know if I should
report it:
 if (this.getConglomerate().nKeyFields ==
     this.getConglomerate().nUniqueColumns)
 {
     // The row that we found deleted is exactly the new ro
     targetleaf.page.deleteAtSlot(
         insert_slot, false, this.btree_undo);

     break;
 }

o longer term I think that we should not be doing the update at all, and instead doing purges.
 I think that will solve some of the possible out of space issues.  But I think there are
self deadlocking issues there so I am in favor of your approach especially for back porting
to existing releases.  

> Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
> -----------------------------------------------------------------------------
>
>                 Key: DERBY-5367
>                 URL: https://issues.apache.org/jira/browse/DERBY-5367
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.8.1.2, 10.9.0.0
>         Environment: Mac OS X, Windows
>            Reporter: Brett Wooldridge
>            Assignee: Kristian Waagan
>            Priority: Critical
>         Attachments: derby-5367-1a-update_row_fully.diff, derby-5367-1b-update_row_fully.diff,
derby-5367-1b-update_row_fully.stat, derby-5367-2a-minimal_fix.diff, derby-5367-3a-update_field_by_field_preview.diff,
derby-5367-4a-fix_with_optimization_improved.diff
>
>
> Our product recently upgraded to version 10.8.1.2 in order to take advantage of the new
'case-insensitive' mode offered by Derby in the form of the "collation=TERRITORY_BASED:PRIMARY"
connection parameter.
> Unfortunately, we have run into an issue whereby stale data appears to be retrieved from
an index, even though the data in the table itself has changed.
> You can see this issue in the IJ session below.  The database in question was created
using this Java parameter when invoking IJ:
> -Dij.database=jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY
> Here is the IJ session:
> CONNECTION0* - 	jdbc:derby:test
> * = current connection
> ij> CREATE TABLE tag (
>     tag_id INTEGER GENERATED BY DEFAULT AS IDENTITY NOT NULL,
>     tag VARCHAR(255) NOT NULL,
>     CONSTRAINT tag_pk PRIMARY KEY (tag_id),
>     CONSTRAINT tag_tag_unique UNIQUE (tag)
> );
> 0 rows inserted/updated/deleted
> ij> -- first insert a value 'Test', note the upper-case 'T' in 'Test'
> ij> INSERT INTO tag (tag) VALUES ('Test');
> 1 row inserted/updated/deleted
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                     
                                                       
> --------------------------------------------------------------------------------------------------------------------------------------------
> 1          |Test                                                                    
                                                       
> 1 row selected
> ij> -- Now delete the row
> ij> DELETE FROM tag WHERE tag='Test';
> 1 row inserted/updated/deleted
> ij> -- You could run another SELECT here to verify it is gone, but it is.
> ij> -- Now insert a new value 'test', note the lower-case 't' in 'test'
> ij> INSERT INTO tag (tag) VALUES ('test');
> 1 row inserted/updated/deleted
> ij> -- Now verify that the table contains only the lower-case version: 'test'
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                     
                                                       
> --------------------------------------------------------------------------------------------------------------------------------------------
> 2          |test                                                                    
                                                       
> 1 row selected
> ij> -- Now, here is the bug.
> ij> SELECT tag FROM tag;
> TAG                                                                                 
                                           
> --------------------------------------------------------------------------------------------------------------------------------
> Test                                                                                
                                           
> 1 row selected
> ij> 
> Note in the last SELECT we specify the 'tag' column specifically.  When we 'SELECT *',
Derby performs a table-scan and the result is correct.  However, when we 'SELECT tag', Derby
appears to use the index created for the 'tag_tag_unique' unique constraint.  As an optimization
Derby, like many databases, will use values directly from the index in the case where the
index covers all requested columns.
> The bigger question is, why doesn't the DELETE action cause the entry in the tag_tag_unique
index to be deleted?  Is this a further optimization?  If so, it is imperative that the index
at least be updated when the new value is inserted.
> This is rather a severe bug for us that causes stale data to be returned.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message