db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Brett Bergquist (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5632) Logical deadlock happened when freezing/unfreezing the database
Date Thu, 14 Mar 2013 14:18:13 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602303#comment-13602303
] 

Brett Bergquist commented on DERBY-5632:
----------------------------------------

Knut, originally this was all done using a script and IJ.   The script invoked IJ to freeze
the database, then did a file system backup (using bash and ZFS commands) and then invoked
IJ to unfreeze the database.  Mike Matrigali indicated that the expectation was that the freeze/unfreeze
would be done using the same connection, so I wrote a utility in Java that would create a
connection, invoke the freeze, issue a system level call to perform the file system backup,
and then issue the unfreeze all in one connection.   The problem still exists even doing it
this way but the stack traces tare probably from the original.

I really do have an issue with the expectation that this all be done in one connection.  Even
with the utility that there is a chance that the utility could fail between the time the freeze
is done and the unfreeze is done.   At that point Derby is done and will require to be forcefully
killed.

It would seem to me that it should always be possible to connect to the database engine. 
It may be that specific requests on the connection will block when the database is frozen,
but the one that should not block is the unfreeze.

I think not allowing the freeze/unfreeze from different connections is a bug.
                
> Logical deadlock happened when freezing/unfreezing the database
> ---------------------------------------------------------------
>
>                 Key: DERBY-5632
>                 URL: https://issues.apache.org/jira/browse/DERBY-5632
>             Project: Derby
>          Issue Type: Bug
>          Components: Documentation, Services
>    Affects Versions: 10.8.2.2
>         Environment: Oracle M3000/Solaris 10
>            Reporter: Brett Bergquist
>            Assignee: Knut Anders Hatlen
>              Labels: derby_triage10_10
>             Fix For: 10.10.0.0
>
>         Attachments: experimental-v1.diff, experimental-v2.diff, stack.txt
>
>
> Tried to make a quick database backup by freezing the database, performing a ZFS snapshot,
and then unfreezing the database.   The database was frozen but then a connection to the database
could not be established to unfreeze the database.
> Looking at the stack trace of the network server, , I see 3 threads that are trying to
process a connection request.   Each of these is waiting on:
>                 at org.apache.derby.impl.store.access.RAMAccessManager.conglomCacheFind(Unknown
Source)
>                 - waiting to lock <0xfffffffd3a7fcc68> (a org.apache.derby.impl.services.cache.ConcurrentCache)
> That object is owned by:
>                 - locked <0xfffffffd3a7fcc68> (a org.apache.derby.impl.services.cache.ConcurrentCache)
>                 at org.apache.derby.impl.store.access.RAMTransaction.findExistingConglomerate(Unknown
Source)
>                 at org.apache.derby.impl.store.access.RAMTransaction.openGroupFetchScan(Unknown
Source)
>                 at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.updateIndexStatsMinion(Unknown
Source)
>                 at org.apache.derby.impl.services.daemon.IndexStatisticsDaemonImpl.runExplicitly(Unknown
Source)
>                 at org.apache.derby.impl.sql.execute.AlterTableConstantAction.updateStatistics(Unknown
Source)
> which itself is waiting for the object:
>                 at java.lang.Object.wait(Native Method)
>                 - waiting on <0xfffffffd3ac1d608> (a org.apache.derby.impl.store.raw.log.LogToFile)
>                 at java.lang.Object.wait(Object.java:485)
>                 at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>                 - locked <0xfffffffd3ac1d608> (a org.apache.derby.impl.store.raw.log.LogToFile)
>                 at org.apache.derby.impl.store.raw.log.LogToFile.flush(Unknown Source)
>                 at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.flush(Unknown
Source)
> So basically what I think is happening is that the database is frozen, the statistics
are being updated on another thread which has the "org.apache.derby.impl.services.cache.ConcurrentCache"
locked and then waits for the LogToFile lock and the connecting threads are waiting to lock
"org.apache.derby.impl.services.cache.ConcurrentCache" to connect and these are where the
database is going to be unfrozen.    Not a deadlock as far as the JVM is concerned but it
will never leave this state either.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message