Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Date: Mon, 15 Aug 2011 13:15:27 +0000 (UTC)
From: "Kristian Waagan (JIRA)" <jira@apache.org>
To: derby-dev@db.apache.org
Message-ID: 
 <1590989390.38337.1313414127857.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: 
 <1080959284.600.1312263927387.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (DERBY-5367) Stale data retrieved when using new
 collation=TERRITORY_BASED:PRIMARY feature
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/DERBY-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristian Waagan updated DERBY-5367:
-----------------------------------

    Attachment: derby-5367-1b-update_row_fully.stat
                derby-5367-1b-update_row_fully.diff

Thanks for testing and reviewing the patch, Brett.
I will work hard to have a fix committed in time for the upcoming 10.8.2-release, but I'd still like to get a comment from one of the store experts.
The store is tricky, and maybe there are other places in the code that isn't prepared for collations.

Attaching patch 1b, with the following changes:
 o moved the check for collated types inside the sp.resultExact block. A further change may be to remove the instance variables altogether.
 o added a regression test in CollationTest2.

There might be room for improvement in the wording (both comments and method-names), since my vocabulary related to collations is rather thin :) Also, I believe we are applying a collation order even when there is no territory specified, so DVD.isCollatedType is maybe confusing?

Another question:
The block above the one where I applied the fix deals with the case where also the row location is the same.
When does this case happen, and is it guaranteed that the row values are correct even when a (non-default) collation is being used?

> Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
> -----------------------------------------------------------------------------
>
>                 Key: DERBY-5367
>                 URL: https://issues.apache.org/jira/browse/DERBY-5367
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.8.1.2, 10.9.0.0
>         Environment: Mac OS X, Windows
>            Reporter: Brett Wooldridge
>            Priority: Critical
>         Attachments: derby-5367-1a-update_row_fully.diff, derby-5367-1b-update_row_fully.diff, derby-5367-1b-update_row_fully.stat
>
>
> Our product recently upgraded to version 10.8.1.2 in order to take advantage of the new 'case-insensitive' mode offered by Derby in the form of the "collation=TERRITORY_BASED:PRIMARY" connection parameter.
> Unfortunately, we have run into an issue whereby stale data appears to be retrieved from an index, even though the data in the table itself has changed.
> You can see this issue in the IJ session below.  The database in question was created using this Java parameter when invoking IJ:
> -Dij.database=jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY
> Here is the IJ session:
> CONNECTION0* - 	jdbc:derby:test
> * = current connection
> ij> CREATE TABLE tag (
>     tag_id INTEGER GENERATED BY DEFAULT AS IDENTITY NOT NULL,
>     tag VARCHAR(255) NOT NULL,
>     CONSTRAINT tag_pk PRIMARY KEY (tag_id),
>     CONSTRAINT tag_tag_unique UNIQUE (tag)
> );
> 0 rows inserted/updated/deleted
> ij> -- first insert a value 'Test', note the upper-case 'T' in 'Test'
> ij> INSERT INTO tag (tag) VALUES ('Test');
> 1 row inserted/updated/deleted
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                                                                             
> --------------------------------------------------------------------------------------------------------------------------------------------
> 1          |Test                                                                                                                            
> 1 row selected
> ij> -- Now delete the row
> ij> DELETE FROM tag WHERE tag='Test';
> 1 row inserted/updated/deleted
> ij> -- You could run another SELECT here to verify it is gone, but it is.
> ij> -- Now insert a new value 'test', note the lower-case 't' in 'test'
> ij> INSERT INTO tag (tag) VALUES ('test');
> 1 row inserted/updated/deleted
> ij> -- Now verify that the table contains only the lower-case version: 'test'
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                                                                             
> --------------------------------------------------------------------------------------------------------------------------------------------
> 2          |test                                                                                                                            
> 1 row selected
> ij> -- Now, here is the bug.
> ij> SELECT tag FROM tag;
> TAG                                                                                                                             
> --------------------------------------------------------------------------------------------------------------------------------
> Test                                                                                                                            
> 1 row selected
> ij> 
> Note in the last SELECT we specify the 'tag' column specifically.  When we 'SELECT *', Derby performs a table-scan and the result is correct.  However, when we 'SELECT tag', Derby appears to use the index created for the 'tag_tag_unique' unique constraint.  As an optimization Derby, like many databases, will use values directly from the index in the case where the index covers all requested columns.
> The bigger question is, why doesn't the DELETE action cause the entry in the tag_tag_unique index to be deleted?  Is this a further optimization?  If so, it is imperative that the index at least be updated when the new value is inserted.
> This is rather a severe bug for us that causes stale data to be returned.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira