jackrabbit-dev mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Server Hang in a Cluster, might be a deadlock
Date Tue, 15 May 2007 08:42:26 GMT
Dominique Pfister wrote:
> Another question, that just crossed my mind: after having restarted
> your stalled cluster node, does it again behave normally?

Yes, I only need to restart that node, not the whole cluster.

Ian

> 
> Dominique
> 
> 
> On 5/15/07, Ian Boston <ieb@tfd.co.uk> wrote:
>> Hi,
>>
>> I've been doing some testing of a 2 node jackrabbit cluster using 1.3
>> (with the JCR-915 patch), but I am getting some strange behavior.
>>
>> I use OSX Finder to mount a DAV service from each node and then upload
>> lots of files to each DAV mount at the same time. All goes OK for the
>> first few thousand files, and then one of the nodes stops responding to
>> that session. The other node continues and finishes.
>>
>> Eventually OSX disconnects the stalled node.
>>
>> When I try the port of the apparently stalled cluster node, it still
>> responds, but with some strange behaviour.
>>
>> A remount attempt responds with a 401 and forces basic login, but stalls
>> after that point. (The URL points to the base of a workspace.)
>>
>> If I open Firefox and access the DAV servlet, I can navigate down the
>> directory tree, but if I try to refresh any JCR folder or JCR file that
>> I have already visited (since the cluster node has been up), Firefox
>> spins forever.
>>
>> I have put a deadlock detector class into both nodes (a Java class that
>> looks for deadlocks through JMX), but it doesn't detect anything.
>>
>> I have also used JProfiler connected to one node, but it never detects
>> a deadlock.
>>
>> I have tried all of this in single node mode, with no Journal or
>> ClusterNode, and have not been able to re-create the problem (yet).
>>
>> The one thing that I have seen in JProfiler is threads blocked waiting
>> for an ItemState monitor inside Jackrabbit, but never for more than
>> 500ms.
>>
>> I am using the standard DatabaseJournal and the
>> SimpleDbPersistenceManager; however, I see the same thing happening
>> with the FileJournal.
>>
>> Any ideas? I might put some very simple debug logging in near the
>> monitor that was blocking for 500ms.
>>
>> I did search JIRA but couldn't find anything that was a close match.
>>
>>
>> Ian
>>
>>
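
The JMX-based deadlock check described in the quoted message can be sketched roughly as below. This is only an illustration, not the detector class actually used in the thread: the class name BlockedThreadProbe and both method names are hypothetical. It relies on the standard java.lang.management API. Note that findMonitorDeadlockedThreads() only reports threads that wait on each other's monitors in a cycle, so a thread that holds a monitor while stuck on something else would not be flagged. For that reason the sketch also lists every BLOCKED thread together with the owner of the lock it is waiting for.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Jackrabbit.
public class BlockedThreadProbe {

    // Ids of threads caught in a cycle of monitor waits; empty if there is no cycle.
    public static long[] findDeadlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findMonitorDeadlockedThreads(); // null when nothing is found
        return ids == null ? new long[0] : ids;
    }

    // One line per thread currently BLOCKED on a monitor, naming the lock and its owner.
    public static List<String> describeBlocked() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<String>();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds(), 0)) {
            // A thread may have exited between the two calls, leaving a null entry.
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            report.add(info.getThreadName() + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + findDeadlocked().length);
        for (String line : describeBlocked()) {
            System.out.println(line);
        }
    }
}
```

Calling describeBlocked() periodically inside the stalled node (for example from a timer thread) might show whether the same lock owner keeps appearing, which would be a starting point for the extra debug output mentioned above.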

