db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-5632) Logical deadlock happened when freezing/unfreezing the database
Date Thu, 13 Dec 2012 12:32:14 GMT

     [ https://issues.apache.org/jira/browse/DERBY-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Knut Anders Hatlen updated DERBY-5632:

    Attachment: experimental-v1.diff

I think there are two reasons why RAMAccessManager synchronizes on the conglomerate cache
instance whenever it accesses it:

1) Because it manually faults in missing items in the cache, and it needs to ensure that no
other thread faults them in between its calls to findCached() and create().

2) Because conglomCacheUpdateEntry() implements a create-or-replace operation, which is not
provided by the CacheManager interface, and it needs to ensure that no other thread adds an
item with the same key between findCached() and create().
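The check-then-act race behind (1) can be sketched like this. The findCached()/create() names come from Derby's CacheManager interface; the cache class, key and value types, and faultIn() helper are hypothetical stand-ins, not Derby code:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the conglomerate cache (illustrative, not Derby's code).
class ConglomCache {
    private final Map<Long, String> entries = new HashMap<>();

    String findCached(long id) { return entries.get(id); }

    void create(long id, String conglom) {
        if (entries.containsKey(id)) {
            throw new IllegalStateException("already cached: " + id);
        }
        entries.put(id, conglom);
    }
}

public class FaultInRace {
    static final ConglomCache cache = new ConglomCache();

    // Mirrors why RAMAccessManager holds the cache monitor across both calls:
    // without the synchronized block, two threads could both see null from
    // findCached() and both call create(), and the second create() would fail.
    static String faultIn(long id) {
        synchronized (cache) {
            String c = cache.findCached(id);
            if (c == null) {
                c = "conglomerate-" + id;   // simulate reading from disk
                cache.create(id, c);
            }
            return c;
        }
    }

    public static void main(String[] args) {
        System.out.println(faultIn(42));
        System.out.println(faultIn(42)); // second call hits the cache
    }
}
```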

As mentioned in an earlier comment, I think (1) should be solved by implementing CacheableConglomerate.setIdentity(),
so that the cache manager takes care of faulting in the conglomerate.
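A rough model of that contract, where the cache manager rather than the caller faults in a missing entry via setIdentity(). The interface shape and all class names here are simplified assumptions, not Derby's actual Cacheable API:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical miniature of the Cacheable/CacheManager contract: the cache
// manager, not the caller, faults in missing items by calling setIdentity().
interface Cacheable {
    Cacheable setIdentity(Object key); // load the object for this key
}

class CacheableConglom implements Cacheable {
    String data;
    public Cacheable setIdentity(Object key) {
        // In Derby this would read the conglomerate from disk; faked here.
        data = "conglomerate-" + key;
        return this;
    }
}

public class CacheManagerSketch {
    private final ConcurrentHashMap<Object, Cacheable> cache = new ConcurrentHashMap<>();

    // find() faults in on a miss; computeIfAbsent guarantees that only one
    // thread runs setIdentity() per key, so callers no longer need to hold a
    // lock on the whole cache around findCached()/create().
    Cacheable find(Object key) {
        return cache.computeIfAbsent(key, k -> new CacheableConglom().setIdentity(k));
    }

    public static void main(String[] args) {
        CacheManagerSketch mgr = new CacheManagerSketch();
        System.out.println(((CacheableConglom) mgr.find(9L)).data);
    }
}
```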

(2) might be solved by adding a create-or-replace operation to the CacheManager interface. However,
I'm not sure it is needed. The conglomCacheUpdateEntry() method is called from only one place:
RAMTransaction.addColumnToConglomerate(). That method fetches a Conglomerate instance from the
cache, modifies it, and reinserts it into the cache. The instance that's reinserted is the exact
same instance that was fetched, so the call to conglomCacheUpdateEntry() doesn't really update
the conglomerate cache; it just replaces an existing entry with itself.

It looks to me as if conglomCacheUpdateEntry() can be removed, and removing it will take care
of (2).
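The fetch-modify-reinsert pattern described above, and why the reinsert is a no-op, can be shown in miniature (types and names are illustrative stand-ins for the Conglomerate and its cache):

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of the fetch-modify-reinsert pattern in
// RAMTransaction.addColumnToConglomerate() (names and types are stand-ins).
public class ReplaceIsNoop {
    static final ConcurrentHashMap<Long, StringBuilder> cache = new ConcurrentHashMap<>();

    public static void main(String[] args) {
        cache.put(1L, new StringBuilder("col_a"));

        // Fetch the cached conglomerate and mutate the same instance...
        StringBuilder conglom = cache.get(1L);
        conglom.append(",col_b");            // addColumnToConglomerate's edit

        // ...then "replace" it. Since it is the exact same object reference,
        // the put changes nothing the cache did not already see.
        Object before = cache.get(1L);
        cache.put(1L, conglom);
        System.out.println(before == cache.get(1L)); // true: same instance
    }
}
```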

I created an experimental patch, attached as experimental-v1.diff. It removes conglomCacheUpdateEntry()
as suggested. It also makes CacheableConglomerate implement setIdentity() so that conglomCacheFind()
doesn't need to fault in conglomerates manually.

The patch is not ready for commit, as it doesn't pass all regression tests. But it could be
used for testing, if someone has a test environment where the deadlock can be reliably reproduced.

There was only one failure in the regression tests. store/xaOffline1.sql had a diff in one
of the transaction table listings, where a transaction showed up in the ACTIVE state whereas
IDLE was expected.

This probably happens because the transaction used in the CacheableConglomerate.setIdentity()
method is not necessarily the same as the one previously used by RAMAccessManager.conglomCacheFind().

The current implementation of setIdentity() in the patch just fetches the first transaction
it finds on the context stack. That seems to do the trick in most cases, but it doesn't know
whether conglomCacheFind() was called with a top-level transaction or a nested transaction,
as setIdentity() cannot access conglomCacheFind()'s parameters. Maybe this could be solved by
pushing some other context type (with a reference to the correct transaction) onto the context
stack before accessing the conglomerate cache, and letting setIdentity() check that instead?
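That hand-off could look roughly like the following, modeling the per-thread context stack with a ThreadLocal deque. Derby's real ContextManager works differently; every name here is a hypothetical sketch of the idea, not the proposed patch:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class TxContextSketch {
    // Model of a per-thread context stack; Derby's ContextManager differs,
    // this only shows how the transaction could be handed to setIdentity().
    static final ThreadLocal<Deque<String>> CONTEXTS =
            ThreadLocal.withInitial(ArrayDeque::new);

    // setIdentity() has no access to conglomCacheFind()'s parameters, so it
    // reads the transaction from the context pushed just before the call.
    static String setIdentity(long conglomId) {
        String tx = CONTEXTS.get().peek();
        return "conglomerate-" + conglomId + "@" + tx;
    }

    // conglomCacheFind() pushes a context naming the correct transaction
    // (top-level or nested) before faulting in the conglomerate, and pops
    // it afterwards so the stack stays balanced.
    static String conglomCacheFind(String tx, long conglomId) {
        CONTEXTS.get().push(tx);
        try {
            return setIdentity(conglomId); // the cache manager would call this
        } finally {
            CONTEXTS.get().pop();
        }
    }

    public static void main(String[] args) {
        System.out.println(conglomCacheFind("nestedTx", 7));
    }
}
```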
> Logical deadlock happened when freezing/unfreezing the database
> ---------------------------------------------------------------
>                 Key: DERBY-5632
>                 URL: https://issues.apache.org/jira/browse/DERBY-5632
>             Project: Derby
>          Issue Type: Bug
>          Components: Documentation, Services
>    Affects Versions:
>         Environment: Oracle M3000/Solaris 10
>            Reporter: Brett Bergquist
>              Labels: derby_triage10_10
>         Attachments: experimental-v1.diff, stack.txt
> Tried to make a quick database backup by freezing the database, performing a ZFS snapshot,
and then unfreezing the database.   The database was frozen but then a connection to the database
could not be established to unfreeze the database.
> Looking at the stack trace of the network server, I see 3 threads that are trying to
process a connection request. Each of these is waiting on:
>                 at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown
>                 - waiting to lock <0xfffffffd3a7fcc68> (a org.apache.derby.impl.services.cache.ConcurrentCache)
> That object is owned by:
>                 - locked <0xfffffffd3a7fcc68> (a org.apache.derby.impl.services.cache.ConcurrentCache)
>                 at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown
>                 at org.apache.derby.impl.store.access.RAMTransaction.openGroupFetchScan(Unknown
>                 at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateIndexStatsMinion(Unknown
>                 at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.runExplicitly(Unknown
>                 at org.apache.derby.impl.sql.execute.AlterTableConstantAction.updateStatistics(Unknown
> which itself is waiting for the object:
>                 at java.lang.Object.wait(Native Method)
>                 - waiting on <0xfffffffd3ac1d608> (a org.apache.derby.impl.store.raw.log.LogToFile)
>                 at java.lang.Object.wait(Object.java:485)
>                 at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>                 - locked <0xfffffffd3ac1d608> (a org.apache.derby.impl.store.raw.log.LogToFile)
>                 at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>                 at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.flush(Unknown
> So basically what I think is happening is: the database is frozen; the statistics are being
updated on another thread, which holds the "org.apache.derby.impl.services.cache.ConcurrentCache"
lock and then waits for the LogToFile lock; and the connecting threads are waiting to lock
"org.apache.derby.impl.services.cache.ConcurrentCache" in order to connect, and those connections
are where the database would be unfrozen. Not a deadlock as far as the JVM is concerned, but it
will never leave this state either.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
