db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta Satoor" <msat...@gmail.com>
Subject [Derby-655] : getImportedKeys returns duplicate rows in some cases
Date Wed, 24 May 2006 05:47:33 GMT
Hi,

I have researched quite a bit on Derby-655 : getImportedKeys returning
duplicate rows in some cases. Here are my findings and 3 possible solutions
to the problem. After reading the mail, if someone can think of any other
way of solving this problem, then please share it.

First some background information
Derby internally creates backing indexes to enforce unique, primary and
foreign key constraints. Unique and primary key constraints generate unique
indexes whereas foreign key constraint generates a non-unique index. The
information about these indexes are stored in following system tables,
namely, SYS.SYSCONSTRAINTS, SYS.SYSKEYS, SYS.SYSFOREIGNKEYS, and
SYS.SYSCONGLOMERATES.

SYS.SYSCONSTRAINTS keep information that is common among all kinds of
constraints. CONGLOMERATEID is the primary key of SYS.SYSCONSTRAINTS.

Derby tables and indexes are stored in storage units called conglomerates.
And information about them is stored in SYS.SYSCONGLOMERATES. Each
conglomerate has an id and that is kept in column CONGLOMERATEID in
SYS.SYSCONGLOMERATES. So, each of the constraints in SYSCONSTRAINTS will
have a corresponding conglomerate and information about that conglomerate
will be saved in SYSCONGLOMERATES.

The information about constraints in the 2 different tables, SYSCOSTRAINTS
and SYSCONGLOMERATES need to be tied together and that is done using SYSKEYS
or SYSFOREIGNKEYS depending on the kind of constraint.

If the constraint type is primary key or unique key, then the information
specific to primary key or unique key constraint is saved in SYS.SYSKEYS.
SYS.SYSKEYS has only 2 columns, CONSTRAINTID and CONGLOMERATEID. This table
is used to tie SYSCONSTRAINTS and SYSCONGLOMERATES tables together for
primary key and unique key constraints.

If the constraint type is foreign key, then the information specific to
foreign key constraints is kept in SYS.SYSFOREIGNKEYS. Among other columns,
this table has 2 columns, CONSTRAINTID and CONGLOMERATEID. This table is
used to tie SYSCONSTRAINTS and SYSCONGLOMERATES tables together for foreign
key constraints.

When a table with no constraint is created, Derby creates a conglomerate for
that table's heap and an entry is made into SYSCONGLOMERATES for that heap .

When the first constraint is created on the table, say primary key, Derby
creates a conglomerate for the table's primary key backing index and an
entry is made into SYSCONGLOMERATES for that index.

For every conglomerate that gets created for a table (heap or index), Derby
has to do the job of maintaining them as the table data changes. So, higher
the number of conglomerates per table, slower would be Derby's performance.

In order to improve Derby's performance in conglomerate maintenance area, in
Derby 10.0, few changes were made in
org.apache.derby.impl.sql.execute.CreateIndexConstantAction.executeConstantAction,
so that if a new (backing)index was detected to be a duplicate of an
existing index, then rather than creating a new conglomerate for it, the new
index would share the conglomerate that was already created for the existing
duplicate index. So, an entry will be made into SYSCONGLOMERATES for the new
constraint and an entry will be made into SYSCONGLOMERATES for the new
constraint but the CONGLOMERATEID of the new constraint will be same as the
CONGLOMERATEID of the existing duplicate index. Following is what qualifies
an index to be a duplicate index(Note that a user defined index will throw
an error if it is going to cause a duplicate index. Only backing indexes for
constraints are allowed to be duplicates, eg. a user can define a primary
key and a foreign key constraint on the same column of the table and that
will succeed.)
   /* For an index to be considered a duplicate of already existing index,
the
    * following conditions have to be satisfied:
    * 1. the set of columns and their order in the index is the same as that
of an existing index AND
    * 2. the ordering attributes are the same AND
    * 3. both the previously existing index and the one being created are
non-unique OR the previously existing index is unique
    */
These rules for duplicate indexes for constraints mean that for a given
CONGLOMERATEID, there can be duplicate rows in SYSCONGLOMERATES. So, when a
query tries to join SYSCONSTRAINTS, SYSKEYS/SYSFOREIGNKEYS, and
SYSCONGLOMERATES, it should be aware of the duplicate rows in
SYSCONGLOMERATES and should have mechanisms to filter out duplicate rows in
SYSCONGLOMERATES. The metadata query for getImportedKeys in
org.apache.derby.impl.jdbc.metadata.properties in the outermost select makes
a join on SYSCONSTRAINTS, SYSKEYS AND SYSCONGLOMERATES and it was not fixed
to handle the duplicate rows in SYSCONGLOMERATES when the duplicate index
work went into
org.apache.derby.impl.sql.execute.CreateIndexConstantAction.executeConstantActionand
that is causing Derby-655 to retunr duplicate rows in some cases.

To make this more clear with an example, I am going to run through subset of
the sql included in Derby-655
CREATE TABLE t2 (c21_ID BIGINT NOT NULL primary key);
CREATE TABLE t3(c31_ID BIGINT NOT NULL primary key);
ALTER TABLE t2 ADD CONSTRAINT F_443 Foreign Key (c21_ID)
   REFERENCES t3(c31_ID) ON DELETE CASCADE ON UPDATE NO ACTION;

In this example, when primary key constraint is defined on t2(c21_ID), a
conglomerate is created for backing index for primary key on column c21_ID.
A row is inserted into SYSCONSTRAINTS, SYSKEYS, and SYSCONGLOMERATES for
this constraint and it's newly created conglomerate. So far, so good.

When foreign key constraint is defined of t2(c21_ID), notice that it is on
the same column as the existing primary key constraint. Hence,
org.apache.derby.impl.sql.execute.CreateIndexConstantAction.executeConstantActiondecides
that it is a duplicate index. But since it is a backing index, no
error is thrown. For duplicate backing indexes, we decide not to create a
new conglomerate since foreign key backing index in this case can share the
conglomerate for primary key backing index on t2. So, a row is inserted into
SYSCONSTRAINTS, SYSKEYS, and SYSCONGLOMERATES but the row in
SYSCONGLOMERATES has the same CONGLOMERATEID as the row for primary key
backing index.

The query for getImportedKeys has following where clause for the outermost
select statement
   ) AS FKINFO(FK_ID, \
      FKTABLE_SCHEM, \
      FKTABLE_NAME, \
      FKCOLUMN_NAME, \
      KEY_SEQ, \
      FK_NAME, \
      FK_UPDATERULE, \
      FK_DELETERULE), \
   SYS.SYSCONSTRAINTS c PROPERTIES joinStrategy=NESTEDLOOP, index =
'SYSCONSTRAINTS_INDEX1', \
   SYS.SYSTABLES T PROPERTIES joinStrategy=NESTEDLOOP, index =
'SYSTABLES_INDEX2', \
   SYS.SYSSCHEMAS S PROPERTIES joinStrategy=NESTEDLOOP, index =
'SYSSCHEMAS_INDEX2', \
   SYS.SYSKEYS K PROPERTIES joinStrategy=NESTEDLOOP, index =
'SYSKEYS_INDEX1', \
   SYS.SYSCONGLOMERATES CONGLOMS PROPERTIES joinStrategy = NESTEDLOOP, index
= 'SYSCONGLOMERATES_INDEX1', \
   SYS.SYSCOLUMNS COLS PROPERTIES joinStrategy=NESTEDLOOP, index =
'SYSCOLUMNS_INDEX1' \
 WHERE T.TABLEID = C.TABLEID  \
 AND C.CONSTRAINTID = FKINFO.FK_ID  \
 AND FKINFO.KEY_SEQ = CONGLOMS.DESCRIPTOR.getKeyColumnPosition(  \
               COLS.COLUMNNUMBER) \
 AND S.SCHEMAID = T.SCHEMAID \
 AND K.CONSTRAINTID = C.CONSTRAINTID \
 AND (CASE WHEN CONGLOMS.DESCRIPTOR IS NOT NULL THEN \
  CONGLOMS.DESCRIPTOR.getKeyColumnPosition(COLS.COLUMNNUMBER) ELSE 0 END) <>
0  \
 AND K.CONGLOMERATEID = CONGLOMS.CONGLOMERATEID  \
 AND C.TABLEID = COLS.REFERENCEID  \

In the above sql snippet, when the join is made on SYSCONSTRAINTS, SYSKEYS,
and SYSCONGLOMERATES for primary key on table t2, the query finds 2 rows in
SYSCONGLOMERATES with the conglomerateid and it picks both those rows. This
is wrong and we need to somehow return only one row from SYSCONGLOMERATES,
eventhough there might be duplicate rows in there for a given
conglomerateid.

I have thought of following 3 solutions
1)Even if a foreign key's backing index qualifies as a duplicate index, have
foreign key backing index create its own conglomerate rather than share an
existing conglomerate. But this would mean that system would have to
maintain 2 identical conglomerates. This can impact Derby's performance
negatively, although I don't know how badly because it might be rare that a
user creates a primary key and foreign key on identical columns.
2)Write a system function which will return just one row for all the
duplicate conglomerateids for a given constraintid and use that row in the
outermost select statement of metadata query for getImportedKeys. Not sure
how easy this would be to implement but I am leaning towards this solution.
3)Do not create duplicate rows in sysconglomerates for duplicate indexes.
Instead have just one row in sysconglomerate to represent all the duplicate
indexes in sysconstraints. But that seems like more of an enhancement rather
than a bug fix to me. Current Derby code heavily relies on having a row in
SYSCONGLOMERATE for each of the indexes.

Thanks if you have made it this far :) I hope this mail is clear in
explaining the problem and the possible solutions. If anything is not clear,
then let me know and I can provide more information.

If anyone has any ideas about some other possible fix for the problem,
please let me know that too.

thanks,
Mamta

Mime
View raw message