db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Brett Wooldridge (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-4279) Statement cache deadlock
Date Wed, 27 Jun 2012 07:35:44 GMT

     [ https://issues.apache.org/jira/browse/DERBY-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Brett Wooldridge updated DERBY-4279:
------------------------------------

    Attachment: patch4279_2.txt

Okay, here is my second cut a fix...

First of all, the original report surmised that any DML, not just SELECT, could cause this
deadlock.  I myself thought this as well, but it turns out not to be the case.

There are two places in the statement compilation code where openConglomerate() is called.
  All compiled statement nodes ultimately inherit from StatementNode.  In StatementNode.lockTableForCompilation()
you will find this logic:

if (dataDictionary.getCacheMode() == DataDictionary.DDL_MODE)   <=== Important
{
    ...
    heapCC = tc.openConglomerate(td.getHeapConglomerateId(),
        false,
        TransactionController.OPENMODE_FORUPDATE |
        TransactionController.OPENMODE_FOR_LOCK_ONLY,
        TransactionController.MODE_RECORD,
        TransactionController.ISOLATION_SERIALIZABLE);   <=== Important
    ...
}

The method is preceded with this comment:

	/* We need to get some kind of table lock (IX here) at the beginning of
	 * compilation of DMLModStatementNode and DDLStatementNode, to prevent the
	 * interference of insert/update/delete/DDL compilation and DDL execution,
	 * see beetle 3976, 4343, and $WS/language/SolutionsToConcurrencyIssues.txt
	 */

However, ultimately, according to the logic the lock is only obtained if the DataDictionary
is in DDL_MODE.  This is all fine.

Now to the heart of the issue.  The second place where openConglomerate() is called is in
ResultColumnList.generateHolderMethod().  This is used for SELECT statements for columns that
will ultimately appear in the ResultSet (or sub-SELECTs in nested queries).  In ResultColumnList.generateHolderMethod()
you will find this logic:

cc = getLanguageConnectionContext().getTransactionCompile().openConglomerate(
        conglomerateId,
        false,
        0,
        TransactionController.MODE_RECORD,
        TransactionController.ISOLATION_READ_COMMITTED);   <=== Important

Notice no conditional logic.  In other words, a lock on the conglomerate is unconditionally
obtained.  It is this lock that contends with a table lock that may have been obtained elsewhere
(possibly DDL, possibly an explicit lock).  The attached patch changes this code to the following:

LanguageConnectionContext lcc = getLanguageConnectionContext();
DataDictionary dd = lcc.getDataDictionary();
int isolationLevel = (dd.getCacheMode() == DataDictionary.DDL_MODE) ?
        TransactionController.ISOLATION_READ_COMMITTED : TransactionController.ISOLATION_NOLOCK;
cc = lcc.getTransactionCompile().openConglomerate(
        conglomerateId,
        false,
        0,
        TransactionController.MODE_RECORD,
        isolationLevel);

Basically, it too checks the DataDictionary to see if it is in DDL_MODE.  If it is, it maintains
the same locking level as the existing code.  However, if the DataDictionary is not being
modified the code will not obtain a lock.

What is the impact?  The original code has the effect of blocking DDL from execution by locking
the table(s) during this compile phase.  And in turn, if DDL has locked the table(s), the
compilation will be blocked.  The first I take to be an unintended side-effect (as you'll
see).

Let's take the second case first.  If DDL has occurred (and locked tables), the DataDictionary
will be in DDL_MODE, and the new code behaves the same as the old code; the conglomerate is
opened with READ_COMMITTED isolation and the compilation will be blocked just as before.

Now the first case.  If compilation of a statement (that involves ResultColumnList) has started
and no DDL is in-progress at that instant, no locks are obtained (because the cache mode is
not DDL_MODE).  This is the same as other statement types [that extend from StatementNode].
 If DDL occurs after compilation has started, it will not be blocked (but it was never blocked
in all other cases).  Because it was never blocked in other cases (due to the conditional
logic), I take this to be a side-effect.  However...

If DDL does happen to alter the table(s) involved there is already code to handle it.  All
roads to ResultColumnList.generateHolderMethod() lead back to GenericStatement.prepare().
 GenericStatement.prepare() has this logic fragment:

try {
    return prepMinion(lcc, true, (Object[]) null, (SchemaDescriptor) null, forMetaData);
} catch (StandardException se) {
    // There is a chance that we didn't see the invalidation
    // request from a DDL operation in another thread because
    // the statement wasn't registered as a dependent until
    // after the invalidation had been completed. Assume that's
    // what has happened if we see a conglomerate does not exist
    // error, and force a retry even if the statement hasn't been
    // invalidated.
    if (SQLState.STORE_CONGLOMERATE_DOES_NOT_EXIST.equals(se.getMessageId())) {
        ... recompile ...
    }
}

So, even though the compile of a SELECT will no longer block DDL from occurring (remember
UPDATE, DELETE, and all other DML never blocked it anyway) there is code in-place to handle
the possibility that openConglomerate() in StatementNode or ResultColumnList might fail. 
In that case, the compile is given one more shot before failing (recompile = true).

It is debatable now that the logic in ResultColumnList is the same as StatementNode, whether
the isolation in ResultColumnList should also be ISOLATION_SERIALIZABLE like StatementNode
given that it is now also conditional based on DataDictionary.getCacheMode() == DDL_MODE.
 However, my approach was to be least impacting in terms of deviation from current known working
code.

On a final note, this patch obviously fixed the test case attached to this issue, however
past attempts to fix this bug did the same but ultimately failed in some of the long running
stress tests.  It would be helpful if this patch could be applied and let run through the
usual full barrage of tests.  That said, the previous patch attempted to fix the issue at
the statement cache level with all of it's hairy synchronization.  This patch is considerably
simpler both conceptually and implementation-wise.

                
> Statement cache deadlock
> ------------------------
>
>                 Key: DERBY-4279
>                 URL: https://issues.apache.org/jira/browse/DERBY-4279
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.0.2.1, 10.1.3.1, 10.2.2.0, 10.3.3.0, 10.4.2.0, 10.5.1.1, 10.8.1.2
>         Environment: Windows Vista, OS X 10.5+
>            Reporter: Jeff Stuckman
>              Labels: derby_triage10_5_2
>         Attachments: Derby4279.java, client_stacktrace_activation_closed.txt, patch4279.txt,
patch4279_2.txt, stacktrace.txt
>
>
> Due to a design flaw in the statement cache, a deadlock can occur if a prepared statement
becomes out-of-date.
> I will illustrate this with the following example:
> The application is using the embedded Derby driver. The application has two threads,
and each thread uses its own connection.
> There is a table named MYTABLE with column MYCOLUMN.
> 1. A thread prepares and executes the query SELECT MYCOLUMN FROM MYTABLE. The prepared
statement is stored in the statement cache (see org.apache.derby.impl.sql.GenericStatement
for this logic)
> 2. After some time, the prepared statement becomes invalid or out-of-date for some reason
(see org.apache.derby.impl.sql.GenericPreparedStatement)
> 3. Thread 1 begins a transaction and executes LOCK TABLE MYTABLE IN EXCLUSIVE MODE
> 4. Thread 2 begins a transaction and executes SELECT MYCOLUMN FROM MYTABLE. The statement
is in the statement cache but it is out-of-date. The thread begins to recompile the statement.
To compile the statement, the thread needs a shared lock on MYTABLE. Thread 1 already has
an exclusive lock on MYTABLE. Thread 2 waits.
> 5. Thread 1 executes SELECT MYCOLUMN FROM MYTABLE. The statement is in the statement
cache but it is being compiled. Thread 1 waits on the statement's monitor.
> 6. We have a deadlock. Derby eventually detects a lock timeout, but the error message
is not descriptive. The stacks at the time of the deadlock are:
> This deadlock is unique because it can still occur in a properly designed database. You
are only safe if all of your transactions are very simple and cannot be interleaved in a sequence
that causes the deadlock, or if your particular statements do not require a table lock to
compile. (For the sake of simplicity, I used LOCK TABLE in my example, but any UPDATE statement
would fit.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message