Mailing-List: contact dev-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@jackrabbit.apache.org
Message-ID: <16059162.1180717518734.JavaMail.jira@brutus>
Date: Fri, 1 Jun 2007 10:05:18 -0700 (PDT)
From: "Xiaohua Lu (JIRA)" <jira@apache.org>
To: dev@jackrabbit.apache.org
Subject: [jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP
 Threads Block and stall requests
In-Reply-To: <8561188.1179474076849.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500761 ] 

Xiaohua Lu commented on JCR-929:
--------------------------------

I had a similar problem, but the stack trace is slightly different.
The setup is a 4-node cluster. Under heavy load (mainly updates), all nodes hang. On the database side, three update transactions are waiting for a select lock, and that select lock seems to be held by one of the threads below.

thread 1 
Thread 25141: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
 - java.lang.Object.wait() @bci=2, line=474 (Compiled frame)
 - org.apache.jackrabbit.core.journal.AbstractJournal.sync() @bci=9, line=160 (Compiled frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode.sync() @bci=27, line=283 (Interpreted frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode.run() @bci=38, line=254 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)


thread 2 
Thread 25137: (state = BLOCKED)
 - org.apache.commons.collections.map.AbstractHashedMap.get(java.lang.Object) @bci=62, line=182 (Compiled frame; information may be imprecise)
 - org.apache.jackrabbit.core.state.NodeState.getReorderedChildNodeEntries() @bci=57, line=671 (Compiled frame)
 - org.apache.jackrabbit.core.CachingHierarchyManager.nodesReplaced(org.apache.jackrabbit.core.state.NodeState) @bci=1, line=385 (Interpreted frame)
 - org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyNodesReplaced(org.apache.jackrabbit.core.state.NodeState) @bci=29, line=132 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SessionItemStateManager.nodesReplaced(org.apache.jackrabbit.core.state.NodeState) @bci=29, line=874 (Interpreted frame)
 - org.apache.jackrabbit.core.state.NodeState.notifyNodesReplaced() @bci=12, line=793 (Interpreted frame)
 - org.apache.jackrabbit.core.state.NodeState.setChildNodeEntries(java.util.List) @bci=73, line=473 (Interpreted frame)
 - org.apache.jackrabbit.core.state.NodeStateMerger.merge(org.apache.jackrabbit.core.state.NodeState, org.apache.jackrabbit.core.state.NodeStateMerger$MergeContext) @bci=291, line=139 (Compiled frame)
 - org.apache.jackrabbit.core.state.SessionItemStateManager.stateModified(org.apache.jackrabbit.core.state.ItemState) @bci=58, line=802 (Interpreted frame)
 - org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyStateModified(org.apache.jackrabbit.core.state.ItemState) @bci=29, line=85 (Interpreted frame)
 - org.apache.jackrabbit.core.state.LocalItemStateManager.stateModified(org.apache.jackrabbit.core.state.ItemState) @bci=49, line=427 (Interpreted frame)
 - org.apache.jackrabbit.core.state.StateChangeDispatcher.notifyStateModified(org.apache.jackrabbit.core.state.ItemState) @bci=29, line=85 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.stateModified(org.apache.jackrabbit.core.state.ItemState) @bci=5, line=390 (Interpreted frame)
 - org.apache.jackrabbit.core.state.ItemState.notifyStateUpdated() @bci=12, line=241 (Interpreted frame)
 - org.apache.jackrabbit.core.state.ChangeLog.persisted() @bci=30, line=271 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.doExternalUpdate(org.apache.jackrabbit.core.state.ChangeLog) @bci=264, line=945 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.externalUpdate(org.apache.jackrabbit.core.state.ChangeLog, org.apache.jackrabbit.core.observation.EventStateCollection) @bci=10, line=871 (Interpreted frame)
 - org.apache.jackrabbit.core.RepositoryImpl$WorkspaceInfo.externalUpdate(org.apache.jackrabbit.core.state.ChangeLog, java.util.List) @bci=25, line=1957 (Interpreted frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode.end() @bci=182, line=834 (Interpreted frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode.consume(org.apache.jackrabbit.core.journal.Record) @bci=469, line=929 (Compiled frame)
 - org.apache.jackrabbit.core.journal.AbstractJournal.doSync(long) @bci=108, line=191 (Compiled frame)
 - org.apache.jackrabbit.core.journal.AbstractJournal.lockAndSync() @bci=42, line=241 (Interpreted frame)
 - org.apache.jackrabbit.core.journal.DefaultRecordProducer.append() @bci=6, line=51 (Interpreted frame)
 - org.apache.jackrabbit.core.cluster.ClusterNode$WorkspaceUpdateChannel.updateCreated(org.apache.jackrabbit.core.cluster.Update) @bci=36, line=466 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager$Update.begin() @bci=44, line=530 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.beginUpdate(org.apache.jackrabbit.core.state.ChangeLog, org.apache.jackrabbit.core.observation.EventStateCollectionFactory, org.apache.jackrabbit.core.virtual.VirtualItemStateProvider) @bci=15, line=825 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SharedItemStateManager.update(org.apache.jackrabbit.core.state.ChangeLog, org.apache.jackrabbit.core.observation.EventStateCollectionFactory) @bci=4, line=855 (Interpreted frame)
 - org.apache.jackrabbit.core.state.LocalItemStateManager.update(org.apache.jackrabbit.core.state.ChangeLog) @bci=9, line=326 (Interpreted frame)
 - org.apache.jackrabbit.core.state.XAItemStateManager.update(org.apache.jackrabbit.core.state.ChangeLog) @bci=20, line=313 (Interpreted frame)
 - org.apache.jackrabbit.core.state.LocalItemStateManager.update() @bci=22, line=302 (Interpreted frame)
 - org.apache.jackrabbit.core.state.SessionItemStateManager.update() @bci=4, line=306 (Interpreted frame)
 - org.apache.jackrabbit.core.ItemImpl.save() @bci=594, line=1214 (Interpreted frame)
 - net.maven.mcr.event.AssetCompleteEventListener.markAssetComplete(javax.jcr.Node, boolean) @bci=137, line=185 (Interpreted frame)
 - net.maven.mcr.event.AssetCompleteEventListener.handleAssetCompleteCheck(java.lang.String) @bci=241, line=169 (Interpreted frame)
 - net.maven.mcr.event.AssetCompleteEventListener.onEvent(javax.jcr.observation.EventIterator) @bci=112, line=82 (Interpreted frame)
 - org.apache.jackrabbit.core.observation.EventConsumer.consumeEvents(org.apache.jackrabbit.core.observation.EventStateCollection) @bci=165, line=231 (Compiled frame)
 - org.apache.jackrabbit.core.observation.ObservationDispatcher.run() @bci=104, line=145 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)


Thread 2 is blocked on a JVM lock while it is still holding the select lock taken in doSync.getRecords. That would explain the deadlock at the database level.
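
The suspected pattern can be sketched in isolation. This is only an illustration under stated assumptions, not Jackrabbit code: `dbSelectLock` is a hypothetical in-process stand-in for the database select lock, `jvmLock` is a hypothetical stand-in for whatever JVM lock Thread 2 is waiting on, and the class and method names are made up. The sketch shows that a thread that takes the "database" lock first and then blocks on a JVM lock keeps every other party waiting for the "database" lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderSketch {

    /**
     * Returns true if a second party cannot get the (simulated) select lock
     * while the sync thread is blocked on the (simulated) JVM lock.
     */
    static boolean otherNodeBlocked() throws InterruptedException {
        ReentrantLock dbSelectLock = new ReentrantLock(); // hypothetical stand-in for the DB select lock
        ReentrantLock jvmLock = new ReentrantLock();      // hypothetical stand-in for the JVM lock
        CountDownLatch jvmLockHeld = new CountDownLatch(1);
        CountDownLatch dbLockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Some other thread owns the JVM lock until it is told to release it.
        Thread holder = new Thread(() -> {
            jvmLock.lock();
            try {
                jvmLockHeld.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                jvmLock.unlock();
            }
        });

        // Plays the role of Thread 2: takes the select lock, then blocks on the JVM lock.
        Thread sync = new Thread(() -> {
            dbSelectLock.lock();
            try {
                dbLockHeld.countDown();
                jvmLock.lock(); // blocks here while still holding dbSelectLock
                jvmLock.unlock();
            } finally {
                dbSelectLock.unlock();
            }
        });

        holder.start();
        jvmLockHeld.await();
        sync.start();
        dbLockHeld.await();

        // Plays the role of another cluster node trying to take the select lock.
        boolean acquired = dbSelectLock.tryLock();
        if (acquired) {
            dbSelectLock.unlock();
        }

        // Let everything finish so the sketch terminates.
        release.countDown();
        sync.join();
        holder.join();
        return !acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("other node blocked: " + otherNodeBlocked()); // prints "other node blocked: true"
    }
}
```

In the sketch the holder eventually releases the JVM lock, so the program terminates. In the hang described above, nothing appears to release it, so the waiters on the select lock never proceed.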

I am not sure whether these two problems are exactly the same. If not, I can file a separate bug. Thanks.


> Under Heavy load in a Cluster HTTP Threads Block and stall requests
> -------------------------------------------------------------------
>
>                 Key: JCR-929
>                 URL: https://issues.apache.org/jira/browse/JCR-929
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.3
>         Environment: 2 Node Cluster, OSX, JDK 1.5 with DatabaseJournal, DatabasePersistanceManager, all content in DB, using WebDAV to load
>            Reporter: Ian Boston
>            Assignee: Dominique Pfister
>         Attachments: catalina.out.node1.txt, catalina.out.node2.txt
>
>
> Under heavy load, created by mounting both nodes of the cluster in OSX Finder and then uploading large numbers of files (a few 1000) to each node at the same time, one of the nodes eventually stops responding, and the Finder mount times out and disconnects.
> Once that happens, that node becomes unusable.
> More mount attempts will prompt for a password, indicating HTTP is still running, but will time out once the connection is authenticated.
> Access by the Web Browser will prompt for a password, connect, and provide a once-only listing of any collection in the workspace. If you try to refresh that collection, the HTTP request hangs forever.
