db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Transaction contexts and log flushing
Date Tue, 03 Aug 2010 17:28:54 GMT

Kristian Waagan wrote:
> Hi,
> When working on an experiment for automatic index statistics 
> (re)generation, I was exposed to the Derby transaction API.
> Dan filed an issue [1] suggesting to clean up this API, and I can give 
> my +1 to that :) In places the comments and actual usage aren't in sync, 
> and missing functionality of the lcc (LanguageConnectionContext) can be 
> obtained by working directly on the tc (TransactionController). One such 
> example is nested read-write user transactions, which don't seem to be 
> supported through the lcc (although the lcc API suggests so), but are 
> used in some places by working with the tc.
> I tried to use a nested read-write user transaction to write index 
> statistics to the data dictionary, and was surprised to find that the 
> changes were lost even though I committed the transaction (they survive 
> if I do a proper shutdown). Turns out Derby uses the concept of 
> transaction contexts, and the following are defined in XactFactory:
> Now, the XactFactory also has this method:
>     /**
>         Decide if a transaction of this contextId needs to flush the log 
> when
>         it commits
>     */
>     public boolean flushLogOnCommit(String contextName)
>     {
>         //
>         // if this is a user transaction, flush the log
>         // if this is an internal or nested top transaction, do not
>         // flush, let it age out.
>         //
>         return (contextName == USER_CONTEXT_ID ||
>                 contextName.equals(USER_CONTEXT_ID));
>     }
> Most of this code is rather old, so I haven't found much history. My 
> questions:
>  1) Is using a nested read-write user transaction simply wrong in this 
> case?
>     (nested because I want to release the locks on the data dictionary 
> as soon as possible)
There is the problem with locks not being compatible - see below.  I
think this is what Mamta kept running into in what she tried.  Usually
it is not likely to be a problem, but if the user happens to have
accumulated some system catalog locks then there are issues (and I think
this case comes up in a bunch of our tests).  I also see an issue with a
user with a big table complaining when his simple query takes a long
time waiting on the stats during compilation.
Ultimately, I think the solution that would work best is some generic
way to queue background work from the language layer, similar to what
the storage layer can do with the daemon thread.  This would avoid
making a query wait while a full scan of the entire table is done
to rebuild statistics.  A separate thread also avoids a lot of the
deadlock issues of a nested user transaction.

The issues to solve would be:
o how to make sure the correct permissions are enforced on the
   background work.  I think it would be best, at least in a first
   implementation, if only internal systems could queue work to this
   background runner.
o should the work survive a shutdown?  It would be simple enough to
   track the work in a table, but is it worth it?
o I don't think we should always add another thread to handle this
   background work, as we already get complaints about the existing
   one thread per db.  Best would be some system that could add and
   delete the thread as needed - maybe even add more than one thread
   if it could determine that was appropriate on the hardware, or
   maybe based on some configuration param.
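The "add and delete the thread as needed" idea above could be sketched with a standard `ThreadPoolExecutor` whose core size is zero, so no thread exists while the queue is idle and the worker is reaped after a timeout.  All names here are invented for illustration; this is not Derby code, just a minimal model of the proposed background runner:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of a background work runner whose worker thread comes and
 * goes with demand: no thread exists while the queue is empty, and
 * the single worker dies after being idle for IDLE_SECONDS.
 * (With an unbounded queue, a ThreadPoolExecutor never grows past
 * one worker; allowing more would need a different queue policy.)
 */
public class BackgroundWorkRunner {
    private static final long IDLE_SECONDS = 60;  // could be a config param

    private final ThreadPoolExecutor executor;

    public BackgroundWorkRunner() {
        // corePoolSize 0: no thread is kept alive when idle; a worker
        // is created on demand and reaped after IDLE_SECONDS.
        executor = new ThreadPoolExecutor(
                0, 1, IDLE_SECONDS, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    /** In Derby, only internal callers would be allowed to queue work. */
    public void queue(Runnable work) {
        executor.execute(work);
    }

    public void shutdown() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

This does not address the permissions or survive-shutdown questions; it only shows that the thread lifecycle part is cheap to get from the standard library.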

>  2) Is the method flushLogOnCommit doing the right thing?
I believe flushLogOnCommit is doing the right thing for the current
usage of nested user transactions.  The issue is one of performance.  In
the current usage these transactions are only used for internal things
where it is ok for the transaction work to be either backed out or
committed, even if the internal transaction commits.  All the work
based on the internal transaction is logged after the internal
transaction, so if this transaction is lost then all subsequent work
must also be lost.

What it is doing is avoiding a synchronous write on the log for each
internal transaction.  This not only avoids the write itself, but also
likely increases the "group" of log records that will be flushed by the
subsequent real user transaction commit.  I believe the usual usage for
this read/write transaction is the update of the "block" of numbers used
for generated keys, so the expectation is that the parent user
transaction will usually commit soon.
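The durability argument above can be made concrete with a toy model (not Derby's actual log) in which a nested commit just appends its commit record to the in-memory buffer, and the parent user commit performs the single synchronous flush that makes everything buffered so far durable at once:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of why a nested transaction's commit can skip the
 * synchronous log flush: its records sit in the buffer until the
 * parent user transaction commits, and that one flush makes all
 * records up to the parent's commit durable in a single sync.
 */
public class LogBufferSketch {
    private final List<String> buffer = new ArrayList<>();   // not yet on disk
    private final List<String> durable = new ArrayList<>();  // flushed to disk
    private int syncWrites = 0;

    public void append(String record) { buffer.add(record); }

    /** Nested/internal commit: append the commit record, no flush. */
    public void commitNoFlush(String txn) { append(txn + ":commit"); }

    /** User commit: append the commit record, then one synchronous flush. */
    public void commitWithFlush(String txn) {
        append(txn + ":commit");
        durable.addAll(buffer);  // everything buffered becomes durable
        buffer.clear();
        syncWrites++;            // one disk sync covers all buffered records
    }

    public int syncWrites() { return syncWrites; }
    public List<String> durable() { return durable; }
}
```

In this model a nested commit followed by a parent commit costs one sync write instead of two, and the nested records become durable strictly before (or together with) any later work that depends on them, which matches the ordering argument above.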
> I haven't checked yet, but it is also important to know if the update 
> locks of the nested user transaction are incompatible with the parent 
> user transaction (to avoid deadlock when using NO_WAIT).
The locks are not compatible.  See the following documentation:
    * <p>
    * The locks in the child transaction of a readOnly nested user
    * transaction will be compatible with the locks of the parent
    * transaction.  The locks in the child transaction of a non-readOnly
    * nested user transaction will NOT be compatible with those of the
    * parent transaction - this is necessary for correct recovery behavior.
    * <p>
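This is why the NO_WAIT style matters: since the non-readOnly child's locks conflict with the parent's, the child must request locks without waiting, and a failed request means the parent (or someone else) holds the lock, so the child backs off rather than deadlocking against its own parent.  A generic sketch of that idiom (not Derby's lock manager; all names invented):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Generic sketch of the NO_WAIT locking idiom for nested read-write
 * transactions: a lock request either succeeds immediately or fails
 * immediately, so a child transaction whose request fails can give up
 * (and fall back to doing the work in the parent) instead of blocking
 * forever on a lock its own parent holds.
 */
public class NoWaitLockSketch {
    // resource name -> owning transaction id
    private final Map<String, String> owners = new HashMap<>();

    /** Returns true if the lock was granted; false means back off now. */
    public synchronized boolean lockNoWait(String txn, String resource) {
        String owner = owners.get(resource);
        if (owner == null) {
            owners.put(resource, txn);
            return true;
        }
        return owner.equals(txn);  // a re-request by the same txn is fine
    }

    /** Release everything this transaction holds (commit or abort). */
    public synchronized void unlockAll(String txn) {
        owners.values().removeIf(o -> o.equals(txn));
    }
}
```

The point of the sketch is the failure path: if the parent has accumulated system catalog locks (as in the tests mentioned earlier), the child's no-wait request fails instantly and the caller can retry the update in the parent transaction instead.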
> And thanks to Mamta for the writeup regarding the index stats issue :)
> At the moment I'm trying to implement a prototype for a first step of a 
> hybrid solution, where the statistics generation is done in a separate 
> thread. The generation is initialized from the user thread when 
> compiling a statement, and writing new stats back is also done in a user 
> thread. There are several issues to resolve, but I'll see how far I get 
> before abandoning the approach (will attach code/comments to the 
> appropriate JIRA later).
> Thanks,
