jackrabbit-dev mailing list archives

From: Ian Boston <...@tfd.co.uk>
Subject: Re: Server Hang in a Cluster, might be a deadlock
Date: Fri, 18 May 2007 07:51:45 GMT
Now tracking this issue as
https://issues.apache.org/jira/browse/JCR-929



Ian Boston wrote:
> I think I have found where the problem is...
> 
> The HTTP threads appear to block waiting in 
> AbstractJournal.lockAndSync(), while the ClusterNode thread waits in 
> LockManagerImpl.acquire().
> 
> Since both of the HTTP threads are trying to acquire, and this doesn't 
> happen in a non-clustered deployment, I am going to guess that the 
> spinlock in LockManagerImpl.acquire() has already got a WriterLock, 
> hence it blocks the HTTP threads.
> 
> I can't quite see why the acquire spins forever; I'll put some more debug in.
> 
> Ian
> 
> Thread http-8580-Processor24 waiting by
> EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock@8065c9
> ::WAITING at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
>      at java.lang.Object.wait(Object.java:-2)
>      at java.lang.Object.wait(Object.java:474)
>      at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
>      at org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync(AbstractJournal.java:228)
>      at org.apache.jackrabbit.core.journal.DefaultRecordProducer.append(DefaultRecordProducer.java:51)
> 
> Thread http-8580-Processor23 waiting by
> EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock@8065c9
> ::WAITING at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
>      at java.lang.Object.wait(Object.java:-2)
>      at java.lang.Object.wait(Object.java:474)
>      at EDU.oswego.cs.dl.util.concurrent.WriterPreferenceReadWriteLock$WriterLock.acquire(null:-1)
>      at org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync(AbstractJournal.java:228)
>      at org.apache.jackrabbit.core.journal.DefaultRecordProducer.append(DefaultRecordProducer.java:51)
> 
> Thread ClusterNode-localhost2 waiting by
> EDU.oswego.cs.dl.util.concurrent.ReentrantLock@6c6ff1
> ::WAITING at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
>      at java.lang.Object.wait(Object.java:-2)
>      at java.lang.Object.wait(Object.java:474)
>      at EDU.oswego.cs.dl.util.concurrent.ReentrantLock.acquire(null:-1)
>      at org.apache.jackrabbit.core.lock.LockManagerImpl.acquire(LockManagerImpl.java:599)
>      at org.apache.jackrabbit.core.lock.LockManagerImpl.nodeAdded(LockManagerImpl.java:838)
> 
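Reading the dumps together, one consistent explanation is a classic
lock-ordering deadlock: the HTTP save path enters the journal while still
holding the lock manager's internal lock, and the cluster sync path holds
the journal lock while calling back into the lock manager. This is a guess
from the traces, not confirmed Jackrabbit behaviour; a minimal sketch of
the pattern, using java.util.concurrent equivalents of the oswego locks
(the names below are illustrative, not Jackrabbit's):

import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderDeadlock {

    // Stands in for AbstractJournal's read-write lock.
    private final ReentrantReadWriteLock journalLock = new ReentrantReadWriteLock();
    // Stands in for LockManagerImpl's internal ReentrantLock.
    private final ReentrantLock lockManagerLock = new ReentrantLock();

    // HTTP save path: lock manager first, then the journal's writer lock.
    void httpSave() {
        lockManagerLock.lock();             // held across the save
        try {
            journalLock.writeLock().lock(); // lockAndSync(): parks here
            try {
                // append the change record to the journal ...
            } finally {
                journalLock.writeLock().unlock();
            }
        } finally {
            lockManagerLock.unlock();
        }
    }

    // Cluster sync path: journal's read lock first, then the lock manager.
    void clusterSync() {
        journalLock.readLock().lock();      // held while replaying records
        try {
            lockManagerLock.lock();         // nodeAdded(): parks here
            try {
                // update lock state for the replayed node ...
            } finally {
                lockManagerLock.unlock();
            }
        } finally {
            journalLock.readLock().unlock();
        }
    }
}

Run httpSave() and clusterSync() on two threads at the wrong moment and
each grabs its first lock, then waits forever for the other's: the writer
lock cannot be granted while the sync thread holds a read lock, and the
sync thread cannot reach its unlock because the save thread holds the lock
manager. That matches the dumps, with both HTTP threads parked on the
WriterLock and ClusterNode-localhost2 parked on the ReentrantLock.
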
> Dominique Pfister wrote:
>> Hi Ian,
>>
>> have you been able to generate a thread dump of the stalled node at the
>> moment it stops responding? That might help...
>>
>> Kind regards
>> Dominique
>>
>> On 5/15/07, Ian Boston <ieb@tfd.co.uk> wrote:
>>> Hi,
>>>
>>> I've been doing some testing of a two-node Jackrabbit cluster using 1.3
>>> (with the JCR-915 patch), but I am getting some strange behaviour.
>>>
>>> I use OS X Finder to mount a DAV service from each node and then upload
>>> lots of files to each DAV mount at the same time. All goes OK for the
>>> first few thousand files, and then one of the nodes stops responding to
>>> that session. The other node continues and finishes.
>>>
>>> Eventually OS X disconnects the stalled node.
>>>
>>> When I try the port of the apparently stalled cluster node, it still
>>> responds, though with some strange behaviour.
>>>
>>> A remount attempt responds with a 401 and forces basic login, but stalls
>>> after that point. (The URL is the base of a workspace.)
>>>
>>> If I open Firefox and access the DAV servlet, I can navigate down the
>>> directory tree, but if I try to refresh any JCR folder or JCR file that
>>> I have already visited (since the cluster node has been up), Firefox
>>> spins forever.
>>>
>>> I have put a deadlock detector class into both nodes (a Java class that
>>> looks for deadlocks through JMX) but it doesn't detect anything.
>>>
>>> I have also used JProfiler connected to one node, but it never detects
>>> a deadlock.
>>>
>>> I have tried all of this in single-node mode, with no Journal or
>>> ClusterNode, and have not been able to re-create the problem (yet).
>>>
>>> The one thing that I have seen in JProfiler is threads blocked waiting
>>> on an ItemState monitor inside Jackrabbit, but never for more than
>>> 500ms.
>>>
>>> I am using the standard DatabaseJournal and the
>>> SimpleDbPersistenceManager; however, I see the same happening with the
>>> FileJournal.
>>>
>>> Any ideas? I might put some very simple debug in near that monitor that
>>> was blocking for 500ms.
>>>
>>> I did search JIRA but couldn't find anything that was a close match.
>>>
>>> Ian
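A note on the thread-dump suggestion above and on the JMX deadlock detector
that reports nothing: ThreadMXBean.findMonitorDeadlockedThreads() only sees
cycles of threads blocked entering synchronized monitors. The oswego locks
in the traces above park their waiters in Object.wait() on an internal
object, so a deadlock built from them never forms a monitor cycle, and a
detector along these lines (a reconstruction of the kind of class
described, not the actual detector code) stays silent even while the
server hangs:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockProbe implements Runnable {

    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    public void run() {
        while (true) {
            // Finds cycles of threads blocked on object monitors only;
            // waiters inside the oswego locks are in Object.wait() and
            // therefore never show up here.
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    System.err.println("Deadlocked: " + info.getThreadName()
                            + " on " + info.getLockName()
                            + " held by " + info.getLockOwnerName());
                }
            }
            try {
                Thread.sleep(5000); // poll every five seconds
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}

A full thread dump (kill -3 on Unix, or jstack where available) is the
more useful tool here, since it shows the WAITING frames even when no
monitor cycle exists. Java 6's findDeadlockedThreads() also covers
java.util.concurrent locks, but the oswego library predates those and
would still be missed.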


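On the "very simple debug" idea above: the oswego Sync interface, which
both the journal's WriterLock and the lock manager's ReentrantLock
implement, offers a timed attempt(msecs) alongside acquire(). One cheap
option is a wrapper that logs whenever an acquisition outlives a deadline;
a sketch under that assumption (the helper below is illustrative, not
Jackrabbit code):

import EDU.oswego.cs.dl.util.concurrent.Sync;

public class LoggingSync {

    // Try a timed acquire first, complain if the deadline passes, then
    // fall back to the normal blocking acquire.
    public static void acquireWithWarning(Sync sync, String name, long warnAfterMs)
            throws InterruptedException {
        if (sync.attempt(warnAfterMs)) {
            return; // acquired within the deadline
        }
        System.err.println(Thread.currentThread().getName()
                + " still waiting on " + name + " after " + warnAfterMs + "ms");
        sync.acquire(); // the unbounded wait, as before
    }
}

Placed in front of the acquire() calls in lockAndSync() and
LockManagerImpl, this would name the contended lock and the stuck thread
in the log the moment the hang starts, without a profiler attached.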