db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-4348) Copy table content with "INSERT INTO table SELECT FROM (...)" statement leads to corrupt data
Date Tue, 25 Aug 2009 11:25:59 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-4348:
--------------------------------------

    Attachment: derby-4348-1a.stat
                derby-4348-1a.diff

Here's a patch (derby-4348-1a.diff) that adds a regression test case
and fixes the problem.

It turns out that there is in fact a problem with the special case for
LONG VARCHAR and LONG VARBINARY when normalizing the values. Normally,
DataTypeDescriptor.normalize() normalizes a DataValueDescriptor (DVD)
by copying it into another DataValueDescriptor and returning the copy.
This destination DVD is cached and reused so that it doesn't have to
be reallocated for every value that is normalized.

The special case for LONG VARCHAR and LONG VARBINARY changes this
slightly by returning the source DVD instead of the destination DVD,
apparently to avoid problems with shared streams.
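
As a rough sketch of that pattern (the copy-then-return ordering in
the special case is inferred from the corruption described below, and
the type-check helper is hypothetical; this is not the actual Derby
source):

    // Sketch of DataTypeDescriptor.normalize(); simplified and not
    // the actual Derby source.
    DataValueDescriptor normalize(DataValueDescriptor source,
                                  DataValueDescriptor cachedDest)
            throws StandardException {
        // Copy the value into the reusable destination DVD.
        cachedDest.normalize(this, source);
        if (isLongVarcharOrLongVarbinary()) {  // hypothetical helper
            // Special case: hand the source back to the caller,
            // apparently to avoid problems with shared streams.
            return source;
        }
        return cachedDest;  // normal case: return the copy
    }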

Now, NormalizeResultSet has an ExecRow field, called normalizedRow, in
which the cached destination DVDs are stored. It is reused so that
NormalizeResultSet.getNextRowCore() returns the exact same instance
for every row. But since DataTypeDescriptor.normalize() returns the
source DVD instead of the copy for LONG VARCHAR, the cached ExecRow
will contain the original DVD and not the copy. When the next row is
requested from the NormalizeResultSet, it will therefore use the
source DVD for the previous row as the destination DVD for the call to
normalize().
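
In outline, the interaction looks something like this (names are
simplified and desiredType(i) is a hypothetical accessor; this is not
the actual Derby source):

    // Per-row normalization, as described above (sketch only).
    for (int i = 1; i <= numColumns; i++) {
        DataValueDescriptor source = sourceRow.getColumn(i);
        // The reusable destination is looked up in the cached row...
        DataValueDescriptor dest = normalizedRow.getColumn(i);
        // ...but for LONG VARCHAR/LONG VARBINARY, normalize() returns
        // 'source', so normalizedRow ends up holding the current
        // row's source DVD, which then becomes the destination for
        // the next row's call to normalize().
        normalizedRow.setColumn(i, desiredType(i).normalize(source, dest));
    }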

Copying a column from the current row to the previous row is not a
problem for most of the rows, as the previous row has already been
processed. However, when processing the first row in a new chunk
returned from BulkTableScanResultSet, the DVDs in the previous row
have also been reused in the fetch buffer to hold the last row in the
chunk. Since that row has not yet been processed, copying into it from
the current row will affect what we see when we get to it later.
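
The effect can be illustrated with a self-contained toy program, using
StringBuilder in place of a DVD and a two-slot array in place of the
bulk fetch buffer (everything below is illustrative only; it also
assumes, as the corruption implies, that the copy into the destination
happens even when the source is returned):

    public class BulkAliasingDemo {
        // Mimics the LONG VARCHAR special case: copy into the
        // destination, but hand the source back to the caller.
        static StringBuilder normalize(StringBuilder source,
                                       StringBuilder dest) {
            dest.setLength(0);
            dest.append(source);
            return source;
        }

        public static void main(String[] args) {
            // Two-slot "fetch buffer", refilled in place per chunk.
            StringBuilder[] buf =
                { new StringBuilder("A"), new StringBuilder("B") };
            // Plays the role of the DVD cached in normalizedRow.
            StringBuilder cached = new StringBuilder();

            // Chunk 1: after each call, 'cached' aliases the source.
            cached = normalize(buf[0], cached);  // cached == buf[0]
            // Copies B into buf[0]: harmless, that row is processed.
            cached = normalize(buf[1], cached);  // cached == buf[1]

            // Chunk 2: the scan reuses the slots for the new rows.
            buf[0].setLength(0); buf[0].append("C");
            buf[1].setLength(0); buf[1].append("D");

            // Processing the first row of the new chunk copies "C"
            // into the destination, which aliases buf[1]: the unread
            // last row of the chunk is clobbered.
            cached = normalize(buf[0], cached);

            System.out.println(buf[1]);  // prints "C", not "D"
        }
    }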

The problem here is that NormalizeResultSet.normalizedRow serves two
purposes: (1) Hold an ExecRow object that can be reused, and (2) hold
one DataValueDescriptor per column that can be reused. This works fine
as long as the actual DVD references in the ExecRow are not changed,
but when one of the values is a LONG VARCHAR/LONG VARBINARY the
references are changed.

The patch addresses the problem by having a separate data structure
for each of the two purposes. NormalizeResultSet.normalizedRow
continues to cache the ExecRow object for reuse. A new field
(cachedDestinations[]) is added to hold each individual
DataValueDescriptor that should be reused. This way, changing the DVD
references in normalizedRow does not change which destination DVD is
used when processing the next row, and we don't end up modifying a DVD
which is also present later in the fetch buffer of the bulk scan.
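
In outline, the fix looks something like this (getCachedDestination()
and getDesiredType() are named in the change description below; the
bodies are my guess at their shape, not the actual patch):

    // One reusable destination DVD per column, independent of the
    // (possibly re-pointed) references in normalizedRow.
    private DataValueDescriptor[] cachedDestinations;

    private DataValueDescriptor getCachedDestination(int col)
            throws StandardException {
        if (cachedDestinations[col - 1] == null) {
            // Allocate a destination of the desired type on first use.
            cachedDestinations[col - 1] = getDesiredType(col).getNull();
        }
        return cachedDestinations[col - 1];
    }

    // In the normalization loop, the destination now comes from the
    // dedicated cache instead of from normalizedRow:
    //     dest = getCachedDestination(i);
    //     normalizedRow.setColumn(i, desiredType.normalize(source, dest));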

Description of changes:

* NormalizeResultSet.java:

- new field cachedDestinations which takes over some of the
  responsibility from normalizedRow

- new helper methods getCachedDestination() and getDesiredType() to
  reduce the complexity of normalizeRow()

- removed unneeded throws clause from fetchResultTypes() to prevent
  getDesiredType() from having to inherit the unneeded clause

* DataTypeDescriptor.java:

- removed code in normalize() that initializes the cached destination
  if it is null, since this is now handled by
  NormalizeResultSet.getCachedDestination()

* InsertTest.java:

- new JUnit test which exposes the bug


The regression tests ran cleanly with this patch.
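
The actual test lives in InsertTest.java; the following standalone
JDBC program is only a rough sketch of the kind of scenario it has to
cover, mirroring the reporter's setup (the table names, the row count,
and the assumption that 1000 rows force the bulk scan across more than
one chunk are mine):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class Derby4348Repro {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            Connection c = DriverManager.getConnection(
                    "jdbc:derby:d4348db;create=true");
            Statement s = c.createStatement();
            s.execute("CREATE TABLE src (id INT, txt LONG VARCHAR)");
            // Like the reporter's STRINGRANGENEW_, the target table
            // has an extra column that the INSERT does not mention.
            s.execute("CREATE TABLE dst (id INT, extra INT, " +
                      "txt LONG VARCHAR)");

            PreparedStatement ins =
                    c.prepareStatement("INSERT INTO src VALUES (?, ?)");
            for (int i = 0; i < 1000; i++) {
                ins.setInt(1, i);
                ins.setString(2, "row-" + i);
                ins.executeUpdate();
            }

            // The statement shape that triggered the corruption.
            s.execute("INSERT INTO dst (id, txt) " +
                      "SELECT id, txt FROM src");

            // Every copied value should match its row id.
            ResultSet rs = s.executeQuery("SELECT id, txt FROM dst");
            while (rs.next()) {
                String expected = "row-" + rs.getInt(1);
                if (!expected.equals(rs.getString(2))) {
                    System.out.println("CORRUPT: id=" + rs.getInt(1)
                            + ", txt=" + rs.getString(2));
                }
            }
            rs.close();
            c.close();
        }
    }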

> Copy table content with "INSERT INTO table SELECT FROM (...)" statement leads to corrupt data
> ---------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4348
>                 URL: https://issues.apache.org/jira/browse/DERBY-4348
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.0.2.1, 10.1.1.0, 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.4.2.0, 10.5.1.1, 10.6.0.0
>         Environment: Derby: embedded driver 10.5.1.1 - (764942); tested with 10.4 and with the client driver as well
> OS: Windows XP
> SQL statements executed using SQuirreL SQL Client, but the behavior is the same with ij
>            Reporter: Stefan Huber
>            Assignee: Knut Anders Hatlen
>            Priority: Critical
>         Attachments: d4348-import.sql, D4348.java, d4348.sql, derby-4348-1a.diff, derby-4348-1a.stat, out.dat, rmdb.zip
>
>
> I had to change the primary key of a table and used ddlutils to do so. Ddlutils recreated the table to perform this task.
> After the schema conversion, the row data of the changed table was corrupted.
> The values of the last table column were filled with values from other rows!
> After performing a few tests, I narrowed the problem down to the SQL statement "INSERT INTO table SELECT FROM (...)".
> To reproduce the effect, do the following:
> 1. unpack attached database 'rmdb.zip'
> 2. connect to the database with embedded driver 
>     User: IGEL
>     Password: test 
> 3. read data of an example row from database
>     select * from stringrangenew where classname = 'x.xserver%.colordepth';
> result is 
>     x.xserver%.colordepth	2	[16] [24] [32]
> 4. now copy the data to the second table (column INSTANCENR has been added to this table)
>     INSERT INTO STRINGRANGENEW_ (CLASSNAME,FIRMWAREID,RVALUE) SELECT CLASSNAME,FIRMWAREID,RVALUE FROM STRINGRANGENEW;
> 5. select data of example row from second table
>     select * from stringrangenew_ where classname = 'x.xserver%.colordepth';
> result is 
>     x.xserver%.colordepth	2	-1	[CCW][CW][XX]
> -> value of the last column is not the same as in the original table!
> Here is some additional information I worked out during my tests:
> * if you change the copy statement to include the additional column INSTANCENR, the copied data are correct:
>     delete from STRINGRANGENEW_;
>     INSERT INTO STRINGRANGENEW_ (CLASSNAME,FIRMWAREID, INSTANCENR, RVALUE) SELECT CLASSNAME,FIRMWAREID, -1, RVALUE FROM STRINGRANGENEW;
> * if you select the rows only ('SELECT CLASSNAME,FIRMWAREID,RVALUE FROM STRINGRANGENEW'), the result shows correct data
> Note: 
> The effect is not restricted to this row but also applies to other rows. However, it is always the same rows that get corrupted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

