jackrabbit-dev mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Server Hang in a Cluster, might be a deadlock
Date Tue, 15 May 2007 08:41:46 GMT


Dominique,
I have tried Ctrl+\, kill -3, and a JMX deadlock monitor class that looks 
for deadlocked threads every 500ms, but none of them finds any, which makes 
me think that it's not a deadlock. JProfiler also has a deadlock detector 
(which might be more reliable than my code :) ) and so far it hasn't found 
any either, so perhaps it's not a deadlock. I'm not certain what else it 
could be; there is no CPU activity.
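
A JMX deadlock monitor of the kind described above might look roughly like the sketch below. It is not the class used in this thread: the name `DeadlockMonitor` is hypothetical, and it assumes Java 8+ (for lambdas) and `ThreadMXBean.findDeadlockedThreads()`, which exists since Java 6. The method only reports threads that form a cycle on object monitors or ownable synchronizers, so a thread that waits forever in `Object.wait()`, or on a socket read, without being part of such a cycle is invisible to it. That could be consistent with seeing no deadlock and no CPU activity.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical JMX deadlock monitor (a sketch, not the class from this thread). */
class DeadlockMonitor {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    /** Describes each deadlocked thread; returns an empty array when there is no cycle. */
    String[] check() {
        // findDeadlockedThreads() looks for cycles of threads waiting on object
        // monitors or ownable synchronizers; it returns null if none exist.
        long[] ids = threads.findDeadlockedThreads();
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = threads.getThreadInfo(ids);
        String[] out = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            ThreadInfo ti = infos[i];
            out[i] = (ti == null)
                    ? "thread " + ids[i] + " (no longer alive)"
                    : ti.getThreadName() + " waits on " + ti.getLockName()
                      + " held by " + ti.getLockOwnerName();
        }
        return out;
    }

    /** Polls every 500 ms on a daemon thread so the monitor never keeps the JVM alive. */
    ScheduledExecutorService start() {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "deadlock-monitor");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> {
            for (String line : check()) {
                System.err.println("DEADLOCK: " + line);
            }
        }, 0, 500, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Calling `new DeadlockMonitor().start()` early in the webapp's startup would log any cycle to stderr; an empty result, as reported here, rules out a classic lock-ordering deadlock but not other kinds of indefinite waits.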

I'm going to go back to basics, instrument the code and produce some big 
log files. (Just hoping that doesn't generate a workaround!)
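
One low-overhead way to instrument a `synchronized` block, such as the ItemState monitor mentioned below, is to take a timestamp just before entering the monitor and another just after, and log only waits above a threshold. The sketch assumes Java 8+; `MonitorTimer` and `runLocked` are hypothetical names, not Jackrabbit API, and using it would mean replacing an existing `synchronized (x) { ... }` block with a call that passes `x` as the monitor.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/**
 * Hypothetical instrumentation helper: measures how long the calling thread waits
 * to enter a monitor and logs any wait longer than a threshold.
 */
final class MonitorTimer {
    private final long thresholdNanos;
    private final AtomicLong maxWaitNanos = new AtomicLong(); // longest wait seen so far

    MonitorTimer(long thresholdMillis) {
        this.thresholdNanos = thresholdMillis * 1_000_000L;
    }

    /** Runs body while holding monitor, timing how long it took to acquire it. */
    <T> T runLocked(Object monitor, String label, Supplier<T> body) {
        long before = System.nanoTime();
        long waited;
        T result;
        synchronized (monitor) {
            waited = System.nanoTime() - before; // time spent BLOCKED on entry
            result = body.get();
        }
        // Record and log outside the monitor so the instrumentation does not
        // lengthen the time the lock is held.
        maxWaitNanos.accumulateAndGet(waited, Math::max);
        if (waited > thresholdNanos) {
            System.err.printf("[%s] waited %d ms to lock %s%n",
                    Thread.currentThread().getName(), waited / 1_000_000L, label);
        }
        return result;
    }

    long maxWaitMillis() {
        return maxWaitNanos.get() / 1_000_000L;
    }
}
```

Logging only above a threshold (say 100 ms) keeps the log files manageable and adds little timing disturbance, which matters when the bug may be timing-sensitive.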

Ian



Dominique Pfister wrote:
> Hi Ian,
> 
> have you been able to generate a thread dump of the stalled node at
> the moment it stops responding? That might help...
> 
> Kind regards
> Dominique
> 
> On 5/15/07, Ian Boston <ieb@tfd.co.uk> wrote:
>> Hi,
>>
>> I've been doing some testing of a two-node Jackrabbit cluster using 1.3
>> (with the JCR-915 patch), but I am getting some strange behavior.
>>
>> I use OSX Finder to mount a DAV service from each node and then upload
>> lots of files to each DAV mount at the same time. All goes OK for the
>> first few thousand files, and then one of the nodes stops responding to
>> that session. The other node continues and finishes.
>>
>> Eventually OSX disconnects the stalled node.
>>
>> When I connect to the port of the apparently stalled cluster node, it
>> still responds, but with some strange behaviour.
>>
>> A remount attempt responds with a 401 and forces basic login, but stalls
>> after that point. (The URL points to the base of a workspace.)
>>
>> If I open Firefox and access the DAV servlet, I can navigate down the
>> directory tree, but if I try to refresh any JCR folder or JCR file that
>> I have already visited (since the cluster node has been up), Firefox
>> spins forever.
>>
>> I have put a deadlock detector class into both nodes (a Java class that
>> looks for deadlocks through JMX), but it doesn't detect anything.
>>
>> I have also used JProfiler connected to one node, but it never detects a
>> deadlock.
>>
>> I have tried all of this in single-node mode, with no Journal or
>> ClusterNode, and have not been able to re-create the problem (yet).
>>
>> The one thing that I have seen in JProfiler is threads blocked waiting
>> for an ItemState monitor inside Jackrabbit, but never for more than
>> 500ms.
>>
>> I am using the standard DatabaseJournal and the
>> SimpleDbPersistenceManager; however, I see the same thing happening with
>> the FileJournal.
>>
>> Any ideas? I might put some very simple debug code near the monitor that
>> was blocking for 500ms?
>>
>> I did search JIRA but couldn't find anything that was a close match.
>>
>>
>> Ian
>>
>>
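
Dominique's thread-dump suggestion and the ItemState monitor waits seen in JProfiler could be combined into a small report, written to the log whenever a request stalls, listing each blocked or waiting thread with the lock it is waiting on and that lock's owner. This is a sketch assuming Java 8+; `BlockedThreadReport` is a hypothetical name, while `ThreadMXBean.dumpAllThreads(boolean, boolean)` and the `ThreadInfo` getters are standard `java.lang.management` API (Java 6+).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one log line per thread that is not runnable. */
final class BlockedThreadReport {
    static List<String> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) captures state,
        // stack and lock information for every live thread.
        for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
            Thread.State s = ti.getThreadState();
            if (s == Thread.State.BLOCKED || s == Thread.State.WAITING
                    || s == Thread.State.TIMED_WAITING) {
                StackTraceElement[] st = ti.getStackTrace();
                String top = st.length > 0 ? st[0].toString() : "?";
                // getLockName()/getLockOwnerName() return null when not applicable.
                lines.add(ti.getThreadName() + " " + s + " on " + ti.getLockName()
                        + " owned by " + ti.getLockOwnerName() + " at " + top);
            }
        }
        return lines;
    }
}
```

Comparing two snapshots taken a few seconds apart would show whether the same request thread stays parked on the same lock. A thread that stays WAITING with no owner reported could be waiting for a notification rather than for a lock held by another thread, which a deadlock detector would not flag.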

