db-derby-dev mailing list archives

From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-532) Support deferrable constraints
Date Wed, 06 Nov 2013 18:07:18 GMT

    [ https://issues.apache.org/jira/browse/DERBY-532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815082#comment-13815082 ]

Mike Matrigali commented on DERBY-532:
--------------------------------------

Thanks, Dag, for the reply.  Below is, I hope, a clearer explanation
of how I think it should work.

I think we should implement all deferrable unique constraints using the
existing, unmodified non-unique btree implementation.  At least as a
first incremental approach, I believe this should function correctly and
work with your design.

Architecturally, constraint checking seems like something that should
happen at the SQL layer.  I know the system does the existing uniqueness
checks in store code, but as this constraint checking gets more
convoluted, it seems best managed closer to the SQL level, which
understands the datatypes, the nullability rules, and, in this case, the
timing of when the checking is meant to be done.  It looks like Oracle
also implements its deferred unique checking using a non-unique index.

Using a pure duplicate index matches the reality of a deferred unique
constraint.  At any time while the constraint is deferred, the inserting
transaction may see duplicate keys during query execution, so the
optimizer and execution should not assume the index is unique.  Having no
special casing in the btree should lead to fewer bugs overall, as it
forces all code to treat the index as what it really is: a non-unique
index.

o constraint definition time:
  o create a pure duplicate-allowing index matching the constraint.  It is
    important that the catalog entry used by the optimizer and execution
    knows that the index is non-unique, even though it will be used to
    implement a unique constraint (see the sketch below).
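
  As a concrete illustration (the class and field names here are made up
  for this sketch, not Derby's actual catalog classes), the descriptor
  that the optimizer and execution see is plainly non-unique:

    // Hypothetical sketch only -- not Derby's real catalog API.  The
    // point: the index backing a deferrable UNIQUE constraint is
    // recorded as non-unique, so the optimizer never relies on it.
    final class BackingIndexDescriptor {
        final String indexName;
        final boolean unique = false;        // pure duplicate-allowing btree
        final boolean backsUniqueConstraint; // uniqueness enforced at SQL layer

        BackingIndexDescriptor(String indexName, boolean backsUniqueConstraint) {
            this.indexName = indexName;
            this.backsUniqueConstraint = backsUniqueConstraint;
        }
    }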

o sql layer insert/update time deferrable constraint (constraint deferred):

  o just go ahead and insert the row into the duplicate-allowing index.
    There should never be a unique error thrown.  Do not have any code at
    this point that tries to determine anything about the constraint.

    It sounded like you were initially seeing unique errors at this point,
    which I don't understand.

    At the SQL layer, record every key inserted into the btree in a
    backing-store hash table (see the sketch after this section).

    As you mentioned, this approach avoids having any sort of shadow
    table; the index is always up to date.  Locking prevents other
    transactions from seeing the possible duplicates, and I assume the SQL
    standard says it is ok to see these duplicates from the same
    transaction.
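
  A minimal sketch of that bookkeeping (illustrative only; in Derby the
  natural backing structure would be something like BackingStoreHashtable,
  which can spill to disk, rather than this in-memory set):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Hypothetical helper, not Derby code: while the constraint is
    // deferred, every inserted key is remembered instead of checked.
    final class DeferredDuplicateTracker {
        // One entry per distinct key; a key inserted twice still costs
        // only one probe at commit time.
        private final Set<List<Object>> pendingKeys = new HashSet<>();

        void rememberInsert(Object... key) {
            pendingKeys.add(Arrays.asList(key));
        }

        Set<List<Object>> keysToCheckAtCommit() {
            return pendingKeys;
        }
    }
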
o sql layer insert/update time deferrable constraint (constraint not deferred):

  o just after the insert, run the same duplicate check you would have run
    at commit time had the constraint been deferred.  If it fails, throw
    the error and back out from the SQL layer.  A sketch of the shared
    check follows.
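
  The same probe can serve both this immediate check and the commit-time
  sweep.  A sketch, against an illustrative scan interface (Derby's real
  GenericScanController signatures differ):

    // Illustrative scan abstraction -- not Derby's GenericScanController.
    interface KeyScan {
        boolean next();       // advance to the next row matching the key
        boolean isDeleted();  // deleted rows must not count as duplicates
    }

    final class DuplicateKeyException extends Exception {}

    final class DuplicateCheck {
        // Probe the non-unique backing index for one key: at most one
        // live row may carry it.  Shared by the immediate (not-deferred)
        // path and the commit-time sweep.
        static void checkKey(KeyScan scan) throws DuplicateKeyException {
            int live = 0;
            while (scan.next()) {
                if (!scan.isDeleted() && ++live > 1) {
                    throw new DuplicateKeyException();
                }
            }
        }
    }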

o sql layer commit time with outstanding constraint checking:

  o now simply run through the entire list doing constraint checking.
    This is actually less complicated from a locking perspective, since a
    probe into the btree using just the key will always go to the
    left-most matching key, and the locking is guaranteed to proceed left
    to right, which avoids deadlocks between concurrent checkers.

  o by using a backing-store list you won't run out of memory for the
    to-be-processed constraint list.  This was apparently a problem in one
    posting about PostgreSQL that a Google search turned up.

  o I think this step is one area where, once the system is running, I
    would suggest a follow-on patch to add some sort of
    "reopenScanAndCheckForDuplicate()"
    interface to GenericScanController, if performance looks critical.
    It could package positioning the scan and checking for duplicates in
    one call and minimize latch contention.  A sketch of the commit-time
    sweep itself follows.
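
  Tying the sketches above together (hypothetical glue code reusing the
  types from the earlier sketches; openScanAt is a made-up factory
  standing in for positioning an index scan on a key):

    // Illustrative glue, not Derby code.
    interface IndexProber {
        KeyScan openScanAt(java.util.List<Object> key);
    }

    final class CommitTimeValidator {
        // One probe per distinct deferred key; each probe starts at the
        // left-most matching entry, so lock order is left to right.
        static void validate(DeferredDuplicateTracker tracker,
                             IndexProber prober)
                throws DuplicateKeyException {
            for (java.util.List<Object> key : tracker.keysToCheckAtCommit()) {
                DuplicateCheck.checkKey(prober.openScanAt(key));
            }
        }
    }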


For me the benefits of this approach are:
1) It moves any "extra" locking necessary to verify duplicates in the
   deferred-checking case to the end of the transaction, so there is less
   chance of an unexpected concurrency conflict in the middle of a
   long-running transaction.

2) It seems simpler to prove correct.  In the currently proposed solution
   the constraint becomes corrupt if, for any reason, the first check on
   inserting into the duplicate btree misses a possible duplicate.
   There are a number of subtle problems with
   checking "left and right" for duplicates during insert into a
   non-unique btree.  The code may be correct, but testing is very hard,
   and the cases include combinations of all the following:
   o committed deleted rows, both of other transactions and of the
     inserting transaction itself
   o non-committed deleted rows, both of other transactions and of the
     inserting transaction itself
   o when going left, the need to lose the latch and "restart" all the work
   o other transactions that abort after the check
   o concurrent transactions doing the same insert at the same time
   o does the existing locking guarantee that the 2nd inserter always does
     the final deferred check, so that the 1st one does not have to check?
     Should we force that order of checking, or is it more concurrent to
     just let whichever transaction ends first do the check?

3) I think this same architecture should allow one shared implementation
   to be used for deferred foreign key constraints as well.  I would like
   some feedback on whether this makes sense.  I don't know much about the
   existing foreign key constraints or how much work, if any, is needed to
   make them deferrable.


> Support deferrable constraints
> ------------------------------
>
>                 Key: DERBY-532
>                 URL: https://issues.apache.org/jira/browse/DERBY-532
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Jörg von Frantzius
>            Assignee: Dag H. Wanvik
>              Labels: derby_triage10_11
>         Attachments: deferredConstraints.html, deferredConstraints.html, deferredConstraints.html,
deferredConstraints.html, derby-532-import-1.diff, derby-532-import-1.status, derby-532-import-2.diff,
derby-532-import-3.diff, derby-532-import-3.status, derby-532-more-tests-1.diff, derby-532-more-tests-1.stat,
derby-532-syntax-binding-dict-1.diff, derby-532-syntax-binding-dict-1.status, derby-532-syntax-binding-dict-2.diff,
derby-532-syntax-binding-dict-2.status, derby-532-syntax-binding-dict-all-1.diff, derby-532-testAlterConstraintInvalidation.diff,
derby-532-testAlterConstraintInvalidation.status, derby-532-unique-pk-1.diff, derby-532-unique-pk-1.status,
derby-532-unique-pk-2.diff, derby-532-unique-pk-3.diff, derby-532-unique-pk-3.status, derby-532-xa-1.diff,
derby-532-xa-2.diff, derby-532-xa-3.diff, derby-532-xa-3.status
>
>
> In many situations it is desirable to have constraint checking take place only at
transaction commit time, and not before.  If e.g. there is a chain of foreign key constraints
between tables, insert statements have to be ordered to avoid constraint violations.  If foreign
key references are circular, the user has to split the DML into insert statements and subsequent
update statements.
> In other words, with deferred constraint checking, life is much easier for the user.
The lack of it can also create problems with software such as object-relational mapping tools
that are not prepared for statement ordering and thus depend on deferred constraint checking.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
