db-derby-dev mailing list archives

From Kristian Waagan <kristian.waa...@oracle.com>
Subject Re: Transaction contexts and log flushing
Date Fri, 06 Aug 2010 10:24:03 GMT
On 03.08.10 23:05, Mike Matrigali wrote:
> Kristian Waagan wrote:
>>  On 03.08.2010 19:28, Mike Matrigali wrote:
>> Hi Mike,
>> Thank you for the feedback. See my comments below, especially the one 
>> regarding flushLogOnCommit.
>>> Kristian Waagan wrote:
>>>> Hi,
>>>> When working on an experiment for automatic index statistics 
>>>> (re)generation, I was exposed to the Derby transaction API.
>>>> Dan filed an issue [1] suggesting to clean up this API, and I can 
>>>> give my +1 to that :) In places the comments and actual usage 
>>>> aren't in sync, and missing functionality of the lcc 
>>>> (LanguageConnectionContext) can be obtained by working directly on 
>>>> the tc (TransactionController). One such example is nested 
>>>> read-write user transactions, which don't seem to be supported 
>>>> through the lcc (although the lcc API suggests so), but are used in 
>>>> some places by working with the tc.
>>>> I tried to use a nested read-write user transaction to write index 
>>>> statistics to the data dictionary, and was surprised to find that 
>>>> the changes were lost even though I committed the transaction (they 
>>>> survive if I do a proper shutdown). Turns out Derby uses the 
>>>> concept of transaction contexts, and the following are defined in 
>>>> XactFactory:
>>>> Now, the XactFactory also has this method:
>>>>     /**
>>>>         Decide if a transaction of this contextId needs to flush
>>>>         the log when it commits
>>>>     */
>>>>     public boolean flushLogOnCommit(String contextName)
>>>>     {
>>>>         //
>>>>         // if this is a user transaction, flush the log
>>>>         // if this is an internal or nested top transaction, do not
>>>>         // flush, let it age out.
>>>>         //
>>>>         return (contextName == USER_CONTEXT_ID ||
>>>>                 contextName.equals(USER_CONTEXT_ID));
>>>>     }
>>>> Most of this code is rather old, so I haven't found much history. 
>>>> My questions:
>>>>  1) Is using a nested read-write user transaction simply wrong in 
>>>> this case?
>>>>     (nested because I want to release the locks on the data 
>>>> dictionary as soon as possible)
>>> There is the problem with locks not being compatible - see below.
>>> I think this is what Mamta kept running into in what she tried.
>>> Usually it is not likely to be a problem, but if the user happens to
>>> have accumulated some system catalog locks then there are issues (and
>>> this case, I think, comes up in a bunch of our tests).  I also see an
>>> issue with some user with a big table complaining when his simple
>>> query takes a long time waiting on the stats during compile.
>> Although the current prototype code is very crude, here's a brief 
>> description:
>>  o work is queued by a thread compiling a statement. After queuing  
>> the thread continues its normal work; compiling the statement and 
>> potentially executing it.
> need to catch the case and somehow stop multiple threads from all 
> queuing the same statistics work.

This is currently done by setting a (non-persisted) flag in the 
descriptor. If/how this is done in the final implementation depends on 
whether we look at a single index as a unit of work, or a table (doing 
all indexes).
In addition, the daemon looks for duplicate units of work (a short queue 
length is enforced).
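
A rough sketch of the duplicate check combined with the on-demand worker thread could look like the following. It is illustrative only and not the prototype's code: the class `StatsWorkQueue`, its method names, and the use of a table name as the unit of work are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Illustrative sketch only; these names do not come from the prototype. */
class StatsWorkQueue {
    private final int maxLength;
    private final Queue<String> queue = new ArrayDeque<>();
    private Thread worker;      // null when no worker thread is alive
    private int processed;      // number of completed units of work

    StatsWorkQueue(int maxLength) {
        this.maxLength = maxLength;
    }

    /** Adds a table unless it is already queued or the queue is full. */
    synchronized boolean enqueue(String table) {
        if (queue.contains(table) || queue.size() >= maxLength) {
            return false;
        }
        queue.add(table);
        return true;
    }

    /** Called by a compiling thread: queue the work, then carry on. */
    synchronized boolean schedule(String table) {
        boolean added = enqueue(table);
        if (added) {
            startWorkerIfNeeded();
        }
        return added;
    }

    /** Creates the worker thread only if there is none. */
    synchronized void startWorkerIfNeeded() {
        if (worker == null) {
            worker = new Thread(this::drain, "stats-worker");
            worker.setDaemon(true);
            worker.start();
        }
    }

    private void drain() {
        while (true) {
            String table;
            synchronized (this) {
                table = queue.poll();
                if (table == null) {    // no more work: the thread dies
                    worker = null;
                    notifyAll();
                    return;
                }
            }
            generateStats(table);       // runs outside the lock
            synchronized (this) {
                processed++;
            }
        }
    }

    /** Placeholder for the real work (scanning the table's indexes). */
    private void generateStats(String table) {
    }

    /** Blocks until the worker has exited; handy for tests. */
    synchronized void awaitIdle() {
        while (worker != null) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    synchronized int processedCount() {
        return processed;
    }
}
```

Note that this sketch only rejects work that is still waiting in the queue; a table that is currently being processed can be queued again, which is where a flag like the one in the descriptor would come in.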

>>  o generating the statistics is done by a separate thread created 
>> on-demand (in its own transaction);
> Was creating the context for this hard?  I believe Mamta was having
> problems with things like user password, encryption values, roles, ...

I may have simplified this too much, at least it didn't cause any 
problems for me so far. Note that I'm not creating a lcc, which is why I 
chose to write the results back to the data dictionary using the user 
We can touch on this again when the code is posted.

>>     - if there is no thread, one is created
>>     - if there is more work when the  thread finishes the current 
>> unit of work, it will continue with the next item in the queue
>>     - if there is no more work, the thread dies
>>  o work is scheduled based on tables; all indexes are regenerated, 
>> not individual ones
>>  o writing the stats to the system tables is done by a user thread 
>> compiling a statement
> How does this last part work (the writing to the system tables), if 
> the stat generation work is async?

The new stats are stored in the daemon. There is logic in 
GenericStatement (at the very end of prepMinion) to go to the daemon to 
fetch any new stats if they exist. Although all new stats will be 
written at once, the decision on whether to go to the daemon to ask for 
new stats is taken based on the statistics needed by the query being 
compiled.

Another issue here is whether doing these checks only at compile time is 
good enough. May the system reach a steady state where the stats are 
becoming outdated and no queries are (re)compiled?

>> In the current prototype, there has to be a mechanism for a user 
>> thread to detect that there are new statistics to be written. One 
>> issue right now, is that this happens at a time where the statement 
>> is already optimized (thus losing out on the new statistics). I'm 
>> not sure if this is a problem or not, the new stats will be picked up 
>> the next time the statement is compiled.
>>> Ultimately the solution I think would work best is some generic way
>>> to queue background work from the language layer, similar to what can
>>> be done from the storage layer to the daemon thread.  This would avoid
>>> making a query wait while a full scan of the entire table is done
>>> to rebuild statistics.  A separate thread avoids a lot of the deadlock
>>> issues of a nested user thread.
>> Yes, this is what I have tried to do, although the "background 
>> daemon" is very specific.
>>> The issues to solve would be:
>>> o how to make sure only correct permissions are enforced on the 
>>> background work.  I think it would be best at least in first 
>>> implementation if it was only possible for internal systems to queue
>>> to this background runner.
>> Ignored for now, the worker can only generate statistics and the work 
>> is queued from within GenericStatement.
>>> o should the work survive a shutdown?  It would be simple enough to
>>>   track the work in a table, but is it worth it.
>> Do you mean the work queue?
>> If so, I feel that it isn't necessary, as work is queued as 
>> determined by the logic "detecting" stale/missing stats (lots of work 
>> to do here I think).
>> Maybe saving intermediate results can be useful for huge tables, but 
>> then we have to handle the issue of stale intermediate results too...
>>> o I don't think we should add another thread always to handle this
>>>   background work as we already get complaints about the existing
>>>   1 thread per db.  Best would be some system that could add
>>>   and delete the thread as needed - maybe even add more than one 
>>> thread if it could determine it was appropriate on the hardware - or 
>>> maybe based on some configuration param.
>> This is what is done in the prototype, but I haven't looked into 
>> configuration. One concern is that index regeneration will poison the 
>> page cache, but avoiding this is probably a big thing. I think we may 
>> also have to tune how much resources (CPU, IO) are used for this 
>> background activity.
> yes, this is the challenging part, but I believe it can be broken down
> into manageable chunks.  If you can get the queueing stuff to work, I 
> think that is a big step.
> A reasonable bit of work might be to look at the stat generation part 
> again.  Logically it is not really necessary to rebuild the indexes
> to get the new stats, it just works that way now.  I would imagine 
> code to do this would be about a week's work at most, if the existing 
> logic is used.  The existing logic counts on piping all the rows 
> through the sorter and then getting called back for each row in order, 
> new code could just scan the index in order instead.  All the stats 
> per index can be generated by code doing a single scan of an existing 
> index.

I can't verify this right now but I think we are only scanning the 
indexes, not rebuilding them. The code being run is the code currently 
living in AlterTableConstantAction (I duplicated it and split it up in 
the prototype).

> Rebuilding the index does have the added benefit of reclaiming all 
> non-used space in the index.

Yes, I think this problem must be tackled as well (possibly as a 
separate task). It is not clear to me when we hit this issue, but I 
guess it involves a load where you insert and delete a lot of rows? Does 
the order of the keys matter?

>>>>  2) Is the method flushLogOnCommit doing the right thing?
>>> I believe flushLogOnCommit is doing the right thing for the current 
>>> usage of nested user transactions.  The issue is one of performance. In
>>> the current usage these transactions are only used for internal things
>>> where it is ok for either the transaction work to be backed out or 
>>> committed, even if the internal transaction commits.  All the work
>>> based on the internal transaction is logged after the internal 
>>> transaction.  So if this transaction is lost then it must be that all
>>> subsequent work is also lost.
>> Well, the updates to the data dictionary are lost even though I have 
>> committed the parent transaction. They survive a shutdown if I do a 
>> proper shutdown. I cannot explain this, maybe I have a severe bug in 
>> the prototype (I did pretty much copy the code from InsertResultSet 
>> though).
> This seems wrong.  If you commit the nested user transaction and do a 
> subsequent real commit of the parent transaction there should be a 
> force to the log at user commit time.  Is it likely your user 
> transaction never does any writes separate from what is done in the 
> nested user transaction - this may be an edge case that never happens 
> today, maybe
> the parent transaction is read only and does not know that it needs to
> force log on commit in this case - I am not sure.  The fact that it 
> sounds easy for you to repro this leads me to think there is a bug there
> somewhere, because it works when

Parts of your comment got lost, but I think you are spot on with your 
theory. If I add a write operation to the user transaction, the work of 
the nested read-write user transaction is also flushed to the log.
This may be a problem if the final implementation uses the same approach 
as the current prototype.
Do you know if this is a real bug in the transaction handling code, or 
just a new way of using the transaction API(s)?
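
To make the theory concrete, here is a toy model of it. It is not Derby code: `ToyLog`, `ToyTransaction` and the "skip commit when nothing was written" rule are invented for illustration, and whether the real commit path treats a read-only parent this way is exactly the open question.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these names exist in Derby. */
class ToyLog {
    private final List<String> buffered = new ArrayList<>(); // in memory, lost on crash
    private final List<String> durable = new ArrayList<>();  // forced to disk

    void append(String record) {
        buffered.add(record);
    }

    /** Forces everything buffered so far, as a flushing commit would. */
    void flush() {
        durable.addAll(buffered);
        buffered.clear();
    }

    /** Simulates a crash: buffered records are lost, durable ones survive. */
    List<String> crashAndRecover() {
        buffered.clear();
        return new ArrayList<>(durable);
    }
}

class ToyTransaction {
    private final ToyLog log;
    private final boolean flushOnCommit; // true for user, false for nested
    private boolean wrote;

    ToyTransaction(ToyLog log, boolean flushOnCommit) {
        this.log = log;
        this.flushOnCommit = flushOnCommit;
    }

    void write(String what) {
        wrote = true;
        log.append(what);
    }

    void commit() {
        if (!wrote) {
            return;                      // assumed: read-only commit logs and forces nothing
        }
        log.append("commit");
        if (flushOnCommit) {
            log.flush();                 // also forces earlier nested records
        }
    }
}
```

In this model, a nested write followed by a commit of a read-only parent leaves nothing durable after a crash, while adding a single write to the parent makes the nested records durable as well, which matches what I see.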

> If I were debugging this I would dump the transaction log
> in the 2 cases and see if anything jumps out different.  If you need
> the properties for this let me know.

That would be great :)

>>> What it is doing is avoiding a synchronous write on the log for each
>>> internal transaction.  This not only benefits by avoiding the write,
>>> but it is likely that it will increase the "group" of log records that
>>> will be served by the subsequent real user transaction commit.  I
>>> believe the usual usage for this read/write transaction is the update
>>> of the "block" of numbers used for generated keys.  So the expectation
>>> is that usually the parent user transaction will commit soon.
>> Okay, so the nested transaction implementation is pretty much tailored 
>> to fit work related to identity columns?
>> I guess all contexts except USER_CONTEXT_ID are considered internal.
> Historically the "real" internal transactions came first - and these are
> used heavily by the raw store for things like btree splits and various
> reclaim space operations.  These really would make the system suffer if
> we did hard sync on commit, and they are all generated by user 
> transactions that are doing real updates so a subsequent commit in the 
> log is very likely.
> Next came read only user nested transactions and they are used for 
> compile time read only locking so that we can give up locks mid user 
> transaction.  Since they are read-only log commit behaviour does not 
> matter.
> And then read-write nested user transactions came and are used for
> identity column system cat update.  their locks are not compatible so
> they do cause problems, so if you can avoid them it is best.  The code
> expects them to be used to nest work, commit it, and then return to
> main user thread.  Using it in other ways may not work - I don't know.
> If your new code really needs nested read/write transactions it may be
> reasonable to do real force on these types of internal transactions, but
> we should not do it on ntt and raw store internal transactions.

Okay, we'll have to revisit this. Right now things work okay, but I'm 
positive I haven't tested the more difficult cases yet...
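
One way to read the suggestion is a variant of flushLogOnCommit that also forces the log for nested read-write user transactions, while leaving nested top transactions and raw store internal transactions alone. The standalone sketch below only illustrates the idea; the constant names and string values are placeholders and would have to be checked against XactFactory.

```java
/** Standalone sketch; constants are placeholders, not verified against Derby. */
class FlushPolicySketch {
    static final String USER_CONTEXT_ID = "UserTransaction";
    static final String NESTED_UPDATE_USER_CONTEXT_ID = "NestedUpdateUserTransaction";
    static final String NESTED_READONLY_USER_CONTEXT_ID = "NestedReadOnlyUserTransaction";
    static final String INTERNAL_CONTEXT_ID = "InternalTransaction";
    static final String NTT_CONTEXT_ID = "NestedTransaction";

    /**
     * Flush on commit for user transactions and for nested read-write
     * user transactions; everything else is left to age out.
     */
    static boolean flushLogOnCommit(String contextName) {
        return USER_CONTEXT_ID.equals(contextName)
                || NESTED_UPDATE_USER_CONTEXT_ID.equals(contextName);
    }
}
```

The cost is one extra synchronous log write per nested read-write commit, so this would also affect the identity column updates that use the same kind of transaction.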

The prototype runs suites.All with 4 failures, all asserts in 
OrderByAndSortAvoidanceTest. Looks like the plan has changed, but I have 
not yet verified if the change is valid or not.
I'll continue with derbyall, then do a little more work on the code 
before I commit (for instance I started changing some methods to allow 
using NO_WAIT in the data dictionary, can't use setNoLockWait on nested 
tx, don't know if this is something I want to do or not).


>> The prototype uses a nested transaction slightly differently - it would 
>> be best if the work done by the nested transaction would be synced. 
>> No real harm is done by losing the updates though, Derby just has 
>> to do the work again (may affect performance).
>>>> I haven't checked yet, but it is also important to know if the 
>>>> update locks of the nested user transaction are incompatible with 
>>>> the parent user transaction (to avoid deadlock when using NO_WAIT).
>>> The locks are not compatible.  See following documentation in
>>> TransactionController.java!getNestedUserTransaction()
>>>    * <p>
>>>    * The locks in the child transaction of a readOnly nested user
>>>    * transaction will be compatible with the locks of the parent
>>>    * transaction.  The locks in the child transaction of a non-readOnly
>>>    * nested user transaction will NOT be compatible with those of the
>>>    * parent transaction - this is necessary for correct recovery
>>>    * behavior.
>>>    * <p>
>> Thanks.
