jackrabbit-dev mailing list archives

From: Ian Boston <...@tfd.co.uk>
Subject: Re: Server Hang in a Cluster, might be a deadlock
Date: Thu, 17 May 2007 23:28:53 GMT
I think I have found where the problem is...

The HTTP threads appear to block waiting in AbstractJournal.lockAndSync(), 
while the ClusterNode thread waits in LockManagerImpl.acquire().


Since both HTTP threads are trying to acquire the journal lock, and this 
doesn't happen in a non-clustered deployment, my guess is that the thread 
spinning in LockManagerImpl.acquire() already holds the WriterLock, and 
that is what blocks the HTTP threads.

I can't quite see why the acquire spins forever; I'll put some more debug in.
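
To make that guess concrete, here is a minimal sketch of the kind of 
lock-ordering hang this would be. It is an assumption, not the real code 
paths: it uses plain java.util.concurrent locks instead of the oswego 
WriterPreferenceReadWriteLock and ReentrantLock that Jackrabbit 1.3 
actually uses, and the thread roles are only my reading of the dump below.

import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model: journalLock stands in for the journal's writer lock
// (taken in AbstractJournal.lockAndSync), lockManagerLock for the internal
// lock in LockManagerImpl.acquire. Names and ordering are assumptions.
public class ClusterDeadlockSketch {

    private static final ReentrantReadWriteLock journalLock = new ReentrantReadWriteLock();
    private static final ReentrantLock lockManagerLock = new ReentrantLock();

    public static void main(String[] args) {
        // "ClusterNode" thread: holds the journal writer lock while syncing,
        // then tries to enter the lock manager (nodeAdded -> acquire).
        Thread clusterNode = new Thread(() -> {
            journalLock.writeLock().lock();
            try {
                pause(100);                 // simulate journal sync work
                lockManagerLock.lock();     // blocks: the HTTP thread holds it
                lockManagerLock.unlock();
            } finally {
                journalLock.writeLock().unlock();
            }
        }, "ClusterNode-localhost2");

        // "HTTP" thread: already inside the lock manager, then needs to append
        // to the journal (DefaultRecordProducer.append -> lockAndSync).
        Thread http = new Thread(() -> {
            lockManagerLock.lock();
            try {
                pause(100);                     // simulate request work
                journalLock.writeLock().lock(); // blocks: ClusterNode holds it
                journalLock.writeLock().unlock();
            } finally {
                lockManagerLock.unlock();
            }
        }, "http-8580-Processor24");

        clusterNode.start();
        http.start();
        // Both threads now wait on each other forever: a classic
        // lock-ordering deadlock.
    }

    private static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }
}

If that is what is happening, it would also explain why a single node 
never shows it: without a ClusterNode there is no second thread holding 
the journal lock while calling into the lock manager.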

Ian





Thread http-8580-Processor24 waiting by EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock@8065c9 ::WAITING at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
      at java.lang.Object.wait(Object.java:-2)
      at java.lang.Object.wait(Object.java:474)
      at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
      at org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync(AbstractJournal.java:228)
      at org.apache.jackrabbit.core.journal.DefaultRecordProducer.append(DefaultRecordProducer.java:51)

Thread http-8580-Processor23 waiting by EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock@8065c9 ::WAITING at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
      at java.lang.Object.wait(Object.java:-2)
      at java.lang.Object.wait(Object.java:474)
      at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
      at org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync(AbstractJournal.java:228)
      at org.apache.jackrabbit.core.journal.DefaultRecordProducer.append(DefaultRecordProducer.java:51)

Thread ClusterNode-localhost2 waiting by EDU.oswego.cs.dl.util.concurrent.ReentrantLock@6c6ff1 ::WAITING at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
      at java.lang.Object.wait(Object.java:-2)
      at java.lang.Object.wait(Object.java:474)
      at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
      at org.apache.jackrabbit.core.lock.LockManagerImpl.acquire(LockManagerImpl.java:599)
      at org.apache.jackrabbit.core.lock.LockManagerImpl.nodeAdded(LockManagerImpl.java:838)


Dominique Pfister wrote:
> Hi Ian,
> 
> Have you been able to generate a thread dump of the stalled node at the
> moment it no longer appears to respond? That might help...
> 
> Kind regards
> Dominique
> 
> On 5/15/07, Ian Boston <ieb@tfd.co.uk> wrote:
>> Hi,
>>
>> I've been doing some testing of a 2-node Jackrabbit cluster using 1.3
>> (with the JCR-915 patch), but I am getting some strange behaviour.
>>
>> I use OSX Finder to mount a DAV service from each node and then upload
>> lots of files to each DAV mount at the same time. All goes OK for the
>> first few thousand files, and then one of the nodes stops responding to
>> that session. The other node continues and finishes.
>>
>> Eventually OSX disconnects the stalled node.
>>
>> When I try the port of the apparently stalled cluster node, it still
>> responds, though with some strange behaviour.
>>
>> A remount attempt responds with a 401 and forces a basic login, but stalls
>> after that point (the URL is to the base of a workspace).
>>
>> If I open Firefox and access the DAV servlet, I can navigate down the
>> directory tree, but if I try to refresh any JCR folder or JCR file that I
>> have already visited (since the cluster node has been up), Firefox spins
>> forever.
>>
>> I have put a deadlock detector class into both nodes (a Java class that
>> looks for deadlocks through JMX) but it doesn't detect anything.
>>
>> I have also used JProfiler connected to one node, but it never detects a
>> deadlock.
>>
>> I have tried all of this in single-node mode, with no Journal or
>> ClusterNode, and have not been able to re-create the problem (yet).
>>
>> The one thing I have seen in JProfiler is threads blocked waiting on an
>> ItemState(?) monitor inside Jackrabbit, but never for more than 500ms.
>>
>> I am using the standard DatabaseJournal and the SimpleDbPersistenceManager;
>> however, I see the same happening with the FileJournal.
>>
>> Any ideas? Perhaps I should put some very simple debug in near that
>> monitor that was blocking for 500ms?
>>
>> I did search JIRA but couldn't find anything that was a close match.
>>
>>
>> Ian
>>
>>
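
A side note on the JMX deadlock detector mentioned in the quoted message: a 
detector built on ThreadMXBean.findMonitorDeadlockedThreads() only reports 
cycles of threads blocked on object monitors. The oswego locks in the dump 
park their waiters in Object.wait(), so those threads show up as WAITING 
rather than monitor-deadlocked, which would explain why neither such a 
detector nor JProfiler flags anything even if this really is a deadlock. A 
rough sketch of that kind of detector (not the actual class used here) 
looks like this:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch of a JMX-based deadlock detector. It polls
// findMonitorDeadlockedThreads(), which only sees threads BLOCKED on
// object monitors, not threads parked in Object.wait() inside
// wait/notify-based locks such as the oswego classes.
public class DeadlockDetector implements Runnable {

    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    if (info == null) {
                        continue; // thread died between the two calls
                    }
                    System.err.println("Deadlocked: " + info.getThreadName()
                            + " waiting on " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
            }
            try {
                Thread.sleep(10000); // poll every 10 seconds
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        new Thread(new DeadlockDetector(), "deadlock-detector").start();
    }
}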

