db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3786) Assert failure in CacheEntry.unkeepForRemove when running stress.multi
Date Tue, 22 Jul 2008 15:19:31 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Knut Anders Hatlen updated DERBY-3786:

    Attachment: remove.diff

The code that calls removeStatement() uses a synchronized flag and
wait()+notifyAll() to prevent multiple threads from recompiling the
same statement concurrently. The code synchronizes on the
GenericPreparedStatement instance that is being compiled. What
happens, is basically:

  1. fetch the prepared statement from the statement cache, possibly
  creating a new one if it isn't in the cache

  2. release the statement (which means others can removed it from the

  3. if someone else is recompiling the prepared statement, wait until
  they have finished

  4. recompile the statement and block others until finished

  5. if an error happens during recompile

    a) check if it exists in the cache

    b) if it is in the cache, remove it

If two threads do this concurrently, they'll both get a reference to
the same prepared statement in the cache in step (1). Then one of the
threads is allowed to recompile the statement, whereas the other
thread is blocked in (3). In the repro, the thread that is recompiling
the statement fails with a lock timeout, so it removes the statement
from the cache. The other thread is allowed to continue, and also
fails while recompiling. The statement is no longer in the cache, so
it doesn't try to remove it, and everything is fine.

Now, consider the possibility that a third thread tries to prepare the
same SQL text right after the first thread has removed the prepared
statement from the cache. This means that thread 3 doesn't find the
statement in the cache and therefore creates a new prepared
statement. Since this is another instance of GenericPreparedStatement
than the one thread 1 and thread 2 are working on, thread 3 won't
detect that others are recompiling the same statement.

If the compilation succeeds, it is no problem that two different
GenericPreparedStatement instances are used for the same SQL
text. However, since also the third thread fails while compiling the
statement, and it is not synchronized with the second thread since
they don't have the same prepared statement, thread 2 and thread 3 may
end up calling removeStatement() concurrently. Although their prepared
statements are different, they'll have the same key in the statement
cache, so now thread 2 will find that the statement in fact is in the
cache and try to remove it. Thread 3 also finds it in the cache and
invokes CacheManager.remove() at the same time. This breaks the
contract of CacheManager.remove(), and bad things start happening
(assert failures and hangs).

I think there are two ways to fix this:

  1) In GenericLanguageConnectionContext.removeStatement(), only call
  remove() if the prepared statement in the cache is exactly the same
  as the prepared statement we tried to recompile. The patch
  remove.diff shows this approach. I haven't run the regression tests,
  but it seems to fix this problem (I don't see assert failures or
  hangs with the repro anymore).

  2) It may also be possible to fix it by not releasing the statement
  in step (2). Then it's not possible to remove the statement from the
  cache until it has been recompiled, and all the threads that try to
  recompile it are guaranteed to work on the same
  GenericPreparedStatement instance. Some extra logic would be needed
  to prevent deadlocks, as the call to remove() would block until no
  one holds that statement, and every other thread that is waiting to
  recompile the same statement will be holding it.

> Assert failure in CacheEntry.unkeepForRemove when running stress.multi
> ----------------------------------------------------------------------
>                 Key: DERBY-3786
>                 URL: https://issues.apache.org/jira/browse/DERBY-3786
>             Project: Derby
>          Issue Type: Bug
>          Components: Services
>    Affects Versions:
>         Environment: 32 hardware execution threads
>            Reporter: Kristian Waagan
>            Priority: Minor
>         Attachments: d3786.java, remove.diff
> When running stress.multi on a machine with 32 hardware execution threads, I observed
the following assert failure in two independent runs:
> -----
> 2008-07-18 02:15:14.415 GMT Thread[Thread-8,5,workers] (XID = 94699), (SESSIONID = 16923),
(DATABASE = mydb), (DRDAID = null), Failed Statement is: insert into a values (1)
> org.apache.derby.shared.common.sanity.AssertFailure: ASSERT FAILED
>         at org.apache.derby.shared.common.sanity.SanityManager.ASSERT(SanityManager.java:98)
>         at org.apache.derby.impl.services.cache.CacheEntry.unkeepForRemove(CacheEntry.java:217)
>         at org.apache.derby.impl.services.cache.ConcurrentCache.remove(ConcurrentCache.java:446)
>         at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.removeStatement(GenericLanguageConnectionContext.java:898)
>         at org.apache.derby.impl.sql.GenericStatement.prepMinion(GenericStatement.java:516)
>         at org.apache.derby.impl.sql.GenericStatement.prepare(GenericStatement.java:88)
>         at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(GenericLanguageConnectionContext.java:794)
>         at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:606)
>         at org.apache.derby.impl.jdbc.EmbedStatement.execute(EmbedStatement.java:555)
>         at org.apache.derby.impl.tools.ij.ij.executeImmediate(ij.java:329)
>         at org.apache.derby.impl.tools.ij.utilMain.doCatch(utilMain.java:508)
>         at org.apache.derby.impl.tools.ij.utilMain.runScriptGuts(utilMain.java:350)
>         at org.apache.derby.impl.tools.ij.utilMain.go(utilMain.java:248)
>         at org.apache.derby.impl.tools.ij.mtTestCase.runMe(mtTestCase.java:246)
>         at org.apache.derby.impl.tools.ij.mtTester.run(mtTester.java:91)
>         at java.lang.Thread.run(Thread.java:619)
> Cleanup action completed
> -----
> In stress.log:
> -----
> Tester8: insert2 Fri Jul 18 04:15:12 CEST 2008
> Tester6: TERMINATING due to unexpected error:
> FatalException: XJ001: Java exception: 'ASSERT FAILED: org.apache.derby.shared.common.sanity.AssertFailure'.
> Tester1: SELECT2 Fri Jul 18 04:15:13 CEST 2008
> Tester8: stopping on request after 820 iterations
> Tester10: stopping on request after 859 iterations
> Tester1: stopping on request after 847 iterations
> Tester7: TERMINATING due to unexpected error:
> FatalException: XJ001: Java exception: 'ASSERT FAILED: org.apache.derby.shared.common.sanity.AssertFailure'.
> Tester9: stopping on request after 722 iterations
> Tester3: stopping on request after 880 iterations
> Tester5: stopping on request after 858 iterations
> Tester4: stopping on request after 839 iterations
> WARNING: testers didn't die willingly, so I'm going to kill 'em.
>         This may result in connection resources that aren't cleaned up
>         (e.g. you may see problems in the final script run with deadlocks).
> -----
> A few runs on a similar but slightly slower machine didn't experience the same failure,
so the bug is likely timing dependent.
> I'll perform some more runs and see how hard it is to reproduce.
> I haven't investigated what will happen in an insane build.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message