db-derby-dev mailing list archives

From "Kristian Waagan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5367) Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
Date Mon, 12 Sep 2011 11:41:15 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102566#comment-13102566 ]

Kristian Waagan commented on DERBY-5367:

I have run two simple benchmarks. For the first one I cannot post the source code, but it
populates the database and then starts deleting and inserting records. Here I saw a performance
decrease of around 7% with the proposed fix.

I'll attach the source for the other benchmark. It is very simple, and only inserts and deletes
the same record over and over again. Each insert and delete pair is done as one transaction.
I ran with a 60 s warmup and 300 s of steady state, ten runs per configuration. Unless stated
otherwise, all settings are at their defaults, and the final numbers are rounded.
An explanation of the "bad" runs follows further down.
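The core loop can be sketched roughly as follows. This is illustrative only, not the attached source: it assumes the Derby embedded driver (derby.jar) on the classpath, and the table name, column name, and database URL are made up for the sketch.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Rough sketch of the benchmark loop: insert and delete the same record
// over and over, one insert/delete pair per transaction. Names, URL, and
// durations are illustrative, not the attached benchmark's actual values.
public class InsertDeleteBench {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:derby:benchdb;create=true;collation=TERRITORY_BASED:PRIMARY");
        con.setAutoCommit(false); // the "auto on/off" runs toggle this setting
        PreparedStatement ins =
                con.prepareStatement("INSERT INTO t (k) VALUES (?)");
        PreparedStatement del =
                con.prepareStatement("DELETE FROM t WHERE k = ?");
        long end = System.currentTimeMillis() + 300_000L; // 300 s steady state
        long pairs = 0;
        while (System.currentTimeMillis() < end) {
            ins.setString(1, "Test");
            ins.executeUpdate();
            del.setString(1, "Test");
            del.executeUpdate();
            con.commit(); // one insert/delete pair per transaction; the 5k
            pairs++;      // commit-interval runs commit every 5000 pairs instead
        }
        System.out.println(pairs / 300 + " tps");
        con.close();
    }
}
```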

--- single column key
clean, auto on: 637 tps
d5367, auto on: 557 tps
==> -12.5%

clean, auto off: 3560 tps (5 "bad" runs)
d5367, auto off: 2874 tps (3 "bad" runs)
==> -19.5%
(commit interval 5k)

--- composite key, three columns
clean, auto on: 551 tps
d5367, auto on: 382 tps
==> -30%

clean, auto off: 2799 tps (1 "bad" run)
d5367, auto off: 1671 tps (0 "bad" runs)
==> -40%

--- max checkpoint interval and log switch interval
clean, auto off: 6272 tps (3 "bad" runs)
d5367, auto off: 4908 tps (0 "bad" runs)
==> -21.5%

I tested the checkpoint interval and log switch settings because I saw runs with huge performance
drops under the default settings. I wasn't able to get rid of these entirely. What I saw was
runs with auto-commit off whose performance dropped roughly to that of the runs with
auto-commit on. I haven't investigated further, but I'd like to know whether the drops are
events of limited duration, or whether the performance drop is permanent.
The numbers above are for the worst-case scenarios.

I'm currently testing an alternative patch to see how performance is affected when we optimize
to avoid updating all fields when there are no collated types in the conglomerate. This has
the drawback that concerned Mike: there will be multiple code paths, and the new code, executed
only for collated types, will be a lot less tested.
This can be optimized further for composite keys, but that part is not a high priority for me
right now.
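To make the alternative concrete, the gating check could look roughly like this. This is a self-contained sketch, not the actual patch: the UCS_BASIC collation id is inlined as a local constant (mirroring Derby's StringDataValue constants) so the snippet compiles on its own, and the method name is made up.

```java
// Sketch of the "skip the full-field update" check: only when the key
// contains a collated column can a logically equal key differ in its
// stored bytes, so only then must every field be rewritten.
public class CollationCheck {
    // Derby's default collation id (UCS_BASIC) inlined here as an
    // assumption so the sketch is self-contained.
    static final int COLLATION_TYPE_UCS_BASIC = 0;

    /** Return true if any column in the key uses a non-default collation. */
    static boolean needsFullRowUpdate(int[] collationIds) {
        for (int id : collationIds) {
            if (id != COLLATION_TYPE_UCS_BASIC) {
                return true; // collated column: stored bytes may differ
            }
        }
        return false; // UCS_BASIC only: equal keys are byte-identical
    }

    public static void main(String[] args) {
        System.out.println(needsFullRowUpdate(new int[] {0, 0, 0}));
        System.out.println(needsFullRowUpdate(new int[] {0, 1, 0}));
    }
}
```

With a composite key this could be refined further to update only the collated columns, which is the part deferred above.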

> Stale data retrieved when using new collation=TERRITORY_BASED:PRIMARY feature
> -----------------------------------------------------------------------------
>                 Key: DERBY-5367
>                 URL: https://issues.apache.org/jira/browse/DERBY-5367
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions:,
>         Environment: Mac OS X, Windows
>            Reporter: Brett Wooldridge
>            Assignee: Kristian Waagan
>            Priority: Critical
>         Attachments: derby-5367-1a-update_row_fully.diff, derby-5367-1b-update_row_fully.diff,
derby-5367-1b-update_row_fully.stat, derby-5367-2a-minimal_fix.diff, derby-5367-3a-update_field_by_field_preview.diff
> Our product recently upgraded to version in order to take advantage of the new
'case-insensitive' mode offered by Derby in the form of the "collation=TERRITORY_BASED:PRIMARY"
connection parameter.
> Unfortunately, we have run into an issue whereby stale data appears to be retrieved from
an index, even though the data in the table itself has changed.
> You can see this issue in the IJ session below.  The database in question was created
using this Java parameter when invoking IJ:
> -Dij.database=jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY
> Here is the IJ session:
> CONNECTION0* - 	jdbc:derby:test
> * = current connection
> ij> CREATE TABLE tag (
>     tag_id INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
>     tag VARCHAR(255) NOT NULL,
>     CONSTRAINT tag_pk PRIMARY KEY (tag_id),
>     CONSTRAINT tag_tag_unique UNIQUE (tag)
> );
> 0 rows inserted/updated/deleted
> ij> -- first insert a value 'Test', note the upper-case 'T' in 'Test'
> ij> INSERT INTO tag (tag) VALUES ('Test');
> 1 row inserted/updated/deleted
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                     
> --------------------------------------------------------------------------------------------------------------------------------------------
> 1          |Test                                                                    
> 1 row selected
> ij> -- Now delete the row
> ij> DELETE FROM tag WHERE tag='Test';
> 1 row inserted/updated/deleted
> ij> -- You could run another SELECT here to verify the row is gone; it is.
> ij> -- Now insert a new value 'test', note the lower-case 't' in 'test'
> ij> INSERT INTO tag (tag) VALUES ('test');
> 1 row inserted/updated/deleted
> ij> -- Now verify that the table contains only the lower-case version: 'test'
> ij> SELECT * FROM tag;
> TAG_ID     |TAG                                                                     
> --------------------------------------------------------------------------------------------------------------------------------------------
> 2          |test                                                                    
> 1 row selected
> ij> -- Now, here is the bug.
> ij> SELECT tag FROM tag;
> TAG                                                                                 
> --------------------------------------------------------------------------------------------------------------------------------
> Test                                                                                
> 1 row selected
> ij> 
> Note that in the last SELECT we specify the 'tag' column explicitly.  When we 'SELECT *',
Derby performs a table scan and the result is correct.  However, when we 'SELECT tag', Derby
appears to use the index created for the 'tag_tag_unique' unique constraint.  As an optimization,
Derby, like many databases, will read values directly from the index when the index covers
all requested columns.
> The bigger question is: why doesn't the DELETE cause the entry in the tag_tag_unique index
to be deleted?  Is this a further optimization?  If so, it is imperative that the index at
least be updated when the new value is inserted.
> This is a rather severe bug for us, as it causes stale data to be returned.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

