db-derby-dev mailing list archives

From Daniel John Debrunner <...@debrunners.com>
Subject Re: [jira] Commented: (DERBY-25) INSERT INTO SELECT DISTINCT ... skips some values for autoincrement column
Date Wed, 01 Dec 2004 14:33:20 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Shreyas Kaushik wrote:

> Hi Dan,
>
>    Thanks for all the info and suggestions you gave about going around a
> thing like this. Definitely going through the code is the best way to
> understand Derby internals.
>
> I did more digging on the activation plan and figured out the ResultSet
> tree that is formed, which propagates the results through it. This is
> how the tree looks.
>
> InsertResultSet -> NormalizeResultSet -> SortResultSet
> -> ProjectRestrictResultSet.
>
> The actual projection of the values happens in the doProjection method
> in the ProjectRestrictResultSet, and control then returns to the
> SortResultSet. This in turn calls the MergeInserter (which implements
> the SortController interface) to actually eliminate duplicate rows.
>   So for a table that has a generated identity column, a value
> is projected for all the rows and duplicate rows are eliminated in the
> SortResultSet, hence the gap in the identity column keys.
>
> So a solution to this bug will involve determining the timing of the
> projection. When *select distinct* is used, a source ResultSet with no
> repetitions needs to be built before the projection is done.
>
> Some suggestions and open questions:
>
> i)   whether doing the above would cause the projection to happen
>      elsewhere.
> ii)  making doProjection a public method.
> iii) is there a better approach than the above two?
>
> My suggestion would be to do the projection after the sorter has
> finished eliminating the duplicate rows. (I still need to work on this
> from an implementation point of view.)

You are heading in the right direction I think, but keep in mind I don't
think you want to change how the basic building blocks (ResultSet
implementations) work, just how they are put together. Thus your option
ii) is not the correct approach.

It would be interesting to also look at the query plan/tree for a simple
insert into the table (insert into t values (?,?)) and for the select
distinct by itself (without the insert). Looking at those may give you a
better idea of how the problem could be solved. It may turn out the
correct tree should be something like

InsertResultSet -> NormalizeResultSet -> ProjectRestrictResultSet(1)
-> SortResultSet -> ProjectRestrictResultSet(2).

Where the identity column is handled in ProjectRestrictResultSet(1)
above the SortResultSet, while the lower ProjectRestrictResultSet(2)
handles any projection/restriction from the SELECT statement.
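
The effect of moving identity generation above the sort can be sketched
with a small stand-alone Java model. This is not Derby code: the class
and method names are hypothetical, a HashSet stands in for the sorter's
duplicate elimination, and the model assumes the first row of each
duplicate group is the one that survives.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Derby internals: rows are plain strings and
// the "identity column" is a counter starting at 1.
public class IdentityGapSketch {

    // Models the current tree: the identity value is generated for every
    // source row, and duplicates are eliminated afterwards.
    static List<Integer> identityBelowSort(List<String> rows) {
        List<Integer> insertedIds = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int nextIdentity = 1;
        for (String row : rows) {
            int id = nextIdentity++;   // consumed even if the row is dropped
            if (seen.add(row)) {       // assumption: first duplicate survives
                insertedIds.add(id);
            }
        }
        return insertedIds;
    }

    // Models the suggested tree: duplicates are eliminated first, and the
    // identity value is generated only for rows that survive.
    static List<Integer> identityAboveSort(List<String> rows) {
        List<Integer> insertedIds = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        int nextIdentity = 1;
        for (String row : rows) {
            if (seen.add(row)) {
                insertedIds.add(nextIdentity++);
            }
        }
        return insertedIds;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "a", "b", "b", "c");
        System.out.println(identityBelowSort(source)); // [1, 3, 5]
        System.out.println(identityAboveSort(source)); // [1, 2, 3]
    }
}
```

With the input above, the first ordering leaves gaps in the generated
keys, while the second produces consecutive values.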

Dan.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFBrdYwIv0S4qsbfuQRAm3lAKDIjStY4fMlLkgXHWP8ox893TkxZgCcCS8X
LppzOTUzM2fLROupa6hLRNg=
=Lqoz
-----END PGP SIGNATURE-----

