db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5632) Logical deadlock happened when freezing/unfreezing the database
Date Tue, 11 Dec 2012 14:49:21 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529019#comment-13529019 ]

Knut Anders Hatlen commented on DERBY-5632:
-------------------------------------------

I think the way the conglomerate cache is accessed goes against how our generic cache
implementation is intended to be used. It should not be necessary to synchronize on the
cache instance, as RAMAccessManager.conglomCacheFind() and some other callers do.

To take conglomCacheFind() as an example, I think it ideally should have been implemented
like this:

Conglomerate conglom = null;
CacheableConglomerate entry = (CacheableConglomerate) conglom_cache.find(new Long(conglomid));
if (entry != null) {
    conglom = entry.getConglom();
    conglom_cache.release(entry);
}
return conglom;

That is, there is no explicit synchronization, and the cache implementation takes care of
faulting in the conglomerate if it's not in the cache.
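
To illustrate the general idea of a cache that faults in missing entries by itself, here
is a minimal sketch. It is not Derby's CacheManager or ConcurrentCache; the class
SelfLoadingCache, its find() method and the loader function are hypothetical names made up
for this example, and it is built on java.util.concurrent.ConcurrentHashMap rather than on
Derby's Cacheable protocol.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, NOT Derby's CacheManager/ConcurrentCache API.
public class FaultInCacheSketch {

    /** A cache that loads missing entries itself, so callers never lock it. */
    static final class SelfLoadingCache<K, V> {
        private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        SelfLoadingCache(Function<K, V> loader) {
            this.loader = loader;
        }

        V find(K key) {
            // computeIfAbsent applies the loader only when the key is absent,
            // and does so atomically; the caller needs no synchronized block.
            return entries.computeIfAbsent(key, loader);
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        SelfLoadingCache<Long, String> cache = new SelfLoadingCache<>(id -> {
            loads.incrementAndGet();          // count how often we fault in
            return "conglomerate-" + id;      // stand-in for reading from disk
        });

        System.out.println(cache.find(42L));          // first call loads
        System.out.println(cache.find(42L));          // second call is a hit
        System.out.println("loads = " + loads.get()); // loads = 1
    }
}
```

One caveat with this particular sketch: ConcurrentHashMap may block other updates to the
same part of the map while the loader runs, so a loader that can itself block for a long
time would need a different design.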

However, CacheableConglomerate.setIdentity(), which is where the code that faults in the conglomerate
is supposed to be, is just an empty shell:

    public Cacheable setIdentity(Object key) throws StandardException
    {
        if (SanityManager.DEBUG) {
            SanityManager.THROWASSERT("not supported.");
        }

        return(null);
    }

I'll have a look and see if it's possible to rewrite the code so that we can remove the
explicit synchronization on the conglomerate cache instance. Hopefully, that would be
enough to break the deadlock.
                
> Logical deadlock happened when freezing/unfreezing the database
> ---------------------------------------------------------------
>
>                 Key: DERBY-5632
>                 URL: https://issues.apache.org/jira/browse/DERBY-5632
>             Project: Derby
>          Issue Type: Bug
>          Components: Documentation, Services
>    Affects Versions: 10.8.2.2
>         Environment: Oracle M3000/Solaris 10
>            Reporter: Brett Bergquist
>              Labels: derby_triage10_10
>         Attachments: stack.txt
>
>
> Tried to make a quick database backup by freezing the database, performing a ZFS snapshot,
> and then unfreezing the database. The database was frozen, but then a connection to the
> database could not be established to unfreeze it.
> Looking at the stack trace of the network server, I see 3 threads that are trying to
> process a connection request. Each of these is waiting on:
>                 at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown Source)
>                 - waiting to lock <0xfffffffd3a7fcc68> (a org.apache.derby.impl.services.cache.ConcurrentCache)
> That object is owned by:
>                 - locked <0xfffffffd3a7fcc68> (a org.apache.derby.impl.services.cache.ConcurrentCache)
>                 at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown Source)
>                 at org.apache.derby.impl.store.access.RAMTransaction.openGroupFetchScan(Unknown Source)
>                 at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateIndexStatsMinion(Unknown Source)
>                 at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.runExplicitly(Unknown Source)
>                 at org.apache.derby.impl.sql.execute.AlterTableConstantAction.updateStatistics(Unknown Source)
> which itself is waiting for the object:
>                 at java.lang.Object.wait(Native Method)
>                 - waiting on <0xfffffffd3ac1d608> (a org.apache.derby.impl.store.raw.log.LogToFile)
>                 at java.lang.Object.wait(Object.java:485)
>                 at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>                 - locked <0xfffffffd3ac1d608> (a org.apache.derby.impl.store.raw.log.LogToFile)
>                 at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>                 at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.flush(Unknown Source)
> So basically, what I think is happening is this: the database is frozen; the statistics
> are being updated on another thread, which has the
> "org.apache.derby.impl.services.cache.ConcurrentCache" locked and then waits for the
> LogToFile lock; and the connecting threads, which are where the database is going to be
> unfrozen, are waiting to lock the same ConcurrentCache in order to connect. This is not a
> deadlock as far as the JVM is concerned, but it will never leave this state either.
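
The shape of the hang described above can be reproduced in miniature. The following is
only a sketch under stated assumptions, not Derby code: a ReentrantLock stands in for the
monitor of the ConcurrentCache, a CountDownLatch stands in for "the database has been
unfrozen", and all names (FreezeHangSketch, connectorGetsCacheLock, cacheLock, unfrozen)
are hypothetical. A timed tryLock is used so that the example terminates instead of
hanging like the real system.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical miniature of the hang; none of these names exist in Derby.
public class FreezeHangSketch {

    /** Returns true if the "connecting" thread could get the cache lock in time. */
    static boolean connectorGetsCacheLock(long timeoutMillis) {
        ReentrantLock cacheLock = new ReentrantLock();          // stands in for the cache monitor
        CountDownLatch unfrozen = new CountDownLatch(1);        // stands in for the unfreeze
        CountDownLatch statsHoldsLock = new CountDownLatch(1);  // test coordination only

        // "Statistics" thread: takes the cache lock, then blocks until unfrozen.
        Thread stats = new Thread(() -> {
            cacheLock.lock();
            try {
                statsHoldsLock.countDown();
                unfrozen.await();   // waits for the unfreeze while holding the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cacheLock.unlock();
            }
        });
        stats.start();

        try {
            statsHoldsLock.await();
            // "Connecting" thread: needs the cache lock before it can unfreeze.
            boolean acquired = cacheLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                cacheLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            unfrozen.countDown();   // let the sketch terminate; the real system has no such exit
        }
    }

    public static void main(String[] args) {
        System.out.println("connector acquired cache lock: " + connectorGetsCacheLock(200));
    }
}
```

In this sketch the connecting thread times out because the only event that would release
the cache lock (the unfreeze) can happen only after the connecting thread has acquired that
same lock. No lock cycle exists between the two threads, which matches the observation that
the JVM does not report a deadlock.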

