jackrabbit-dev mailing list archives

From "Ian Boston (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-929) Under Heavy load in a Cluster HTTP Threads Block and stall requests
Date Fri, 18 May 2007 08:20:16 GMT

    [ https://issues.apache.org/jira/browse/JCR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496813 ]

Ian Boston commented on JCR-929:
--------------------------------

The reverse pattern also appears:

the ClusterNode thread waiting in AbstractJournal.sync,

and the HTTP threads waiting in LockManagerImpl.acquire.

-----

The previous case was the ClusterNode thread waiting in LockManagerImpl.acquire

and an HTTP thread waiting in AbstractJournal.lockAndSync.

This indicates that both sets of threads take both locks, from both places, which raises the
potential for an interlock (deadlock) when each thread holds one lock and waits for the other.
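The two thread dumps describe a lock-order inversion: one path holds the journal lock and waits for the lock-manager lock, while the other path does the reverse. The following is a minimal, self-contained Java sketch of that pattern, assuming plain `java.util.concurrent.locks.ReentrantLock`s as stand-ins for the two locks. The names `journalLock`, `lockMapLock`, `cluster-sync` and `http-request` are hypothetical and are not Jackrabbit code. The sketch uses `ThreadMXBean.findDeadlockedThreads()` to confirm that the JVM itself sees the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversion {

    /** Takes `first`, waits until the other thread holds its first lock, then tries `second`. */
    static void lockBoth(ReentrantLock first, ReentrantLock second, CountDownLatch bothHoldFirst) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            second.lock();   // blocks forever: the other thread owns `second` and wants `first`
            second.unlock();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    /** Starts two threads that take two locks in opposite orders; returns true if the JVM reports a deadlock. */
    static boolean reproduce() {
        // Hypothetical stand-ins for the journal lock and the lock-manager lock.
        ReentrantLock journalLock = new ReentrantLock();
        ReentrantLock lockMapLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread a = new Thread(() -> lockBoth(journalLock, lockMapLock, bothHoldFirst), "cluster-sync");
        Thread b = new Thread(() -> lockBoth(lockMapLock, journalLock, bothHoldFirst), "http-request");
        a.setDaemon(true);  // daemon threads, so the stuck pair does not keep the JVM alive
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 50; i++) {               // poll for up to about 5 seconds
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                return true;
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

Running `main` prints `deadlock detected: true`. Neither thread ever leaves its second `lock()` call, which mirrors the stalled HTTP and ClusterNode threads in the dumps.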

Since AbstractJournal is the newer code, perhaps it should perform the LockManagerImpl
acquire earlier than it does?
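One way to read this suggestion is as imposing a single global lock order: every path takes the lock-manager lock first and the journal lock second, so the wait-for cycle cannot form. The sketch below assumes that interpretation. It routes two hypothetical call paths, `clusterSync` and `httpWrite`, through one helper that always uses the same order. The class, field and method names are illustrative only and are not the actual Jackrabbit change.

```java
import java.util.concurrent.locks.ReentrantLock;

public class OrderedLocking {
    // Hypothetical stand-ins, not the real Jackrabbit classes.
    private final ReentrantLock lockMapLock = new ReentrantLock(); // always taken FIRST
    private final ReentrantLock journalLock = new ReentrantLock(); // always taken SECOND
    private int updates = 0;                                       // guarded by both locks

    /** The single place that takes both locks, so every caller uses the same order. */
    private void withBothLocks(Runnable work) {
        lockMapLock.lock();
        try {
            journalLock.lock();
            try {
                work.run();
            } finally {
                journalLock.unlock();
            }
        } finally {
            lockMapLock.unlock();
        }
    }

    // Two call paths that, without the shared helper, might take the locks in opposite orders.
    void clusterSync() { withBothLocks(() -> updates++); }
    void httpWrite()   { withBothLocks(() -> updates++); }

    /** Runs both paths concurrently; returns the update count, or -1 if either thread fails to finish in time. */
    static int runConcurrently(int iterations) {
        OrderedLocking o = new OrderedLocking();
        Thread cluster = new Thread(() -> { for (int i = 0; i < iterations; i++) o.clusterSync(); });
        Thread http = new Thread(() -> { for (int i = 0; i < iterations; i++) o.httpWrite(); });
        cluster.setDaemon(true);
        http.setDaemon(true);
        cluster.start();
        http.start();
        try {
            cluster.join(10_000);
            http.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return (cluster.isAlive() || http.isAlive()) ? -1 : o.updates;
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(100_000)); // 200000
    }
}
```

This approach has the cost the comment is concerned about: the lock-manager lock is now held for the whole journal operation, which serializes more work than taking each lock separately.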

-----

There is no indication that the spin lock inside LockManagerImpl.acquire ever leaves the
wait condition, except on an interrupt to the JVM, at which point it goes straight back into
acquire.

This prevents the JCR from shutting down, since the shutdown operation also needs to acquire
a lock.
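The retry-on-interrupt loop described above can be sketched as follows, together with one possible alternative. The sketch assumes a `ReentrantLock`, although the lock class that Jackrabbit 1.3 actually uses may differ. The names `acquireIgnoringInterrupts` and `tryAcquire` are hypothetical. The first method swallows `InterruptedException` and goes straight back into the wait, so its caller can never leave. The second gives up after a timeout, or on interrupt after restoring the interrupt flag, which would let a caller such as a shutdown path report the problem instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class InterruptibleAcquire {

    /** The pattern described above: an interrupt is caught and the loop simply retries. */
    static void acquireIgnoringInterrupts(ReentrantLock lock) {
        for (;;) {
            try {
                lock.lockInterruptibly();
                return;
            } catch (InterruptedException e) {
                // swallowed: the interrupt flag is now cleared and we go straight back into the wait
            }
        }
    }

    /** Hypothetical alternative: give up after a timeout or on interrupt, restoring the interrupt flag. */
    static boolean tryAcquire(ReentrantLock lock, long timeoutMillis) {
        try {
            return lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    static void joinQuietly(Thread t, long ms) {
        try { t.join(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    /** Returns {spinnerStillBlockedAfterInterrupt, timedAcquireSucceeded}. */
    static boolean[] demo() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // this thread holds the lock for the whole demo, like a stuck lock owner
        try {
            Thread spinner = new Thread(() -> acquireIgnoringInterrupts(lock));
            spinner.setDaemon(true);
            spinner.start();
            sleepQuietly(200);
            spinner.interrupt();               // the interrupt does not get it out of the wait
            joinQuietly(spinner, 500);
            boolean stillBlocked = spinner.isAlive();

            boolean[] timed = new boolean[1];
            Thread waiter = new Thread(() -> timed[0] = tryAcquire(lock, 200));
            waiter.start();
            joinQuietly(waiter, 2000);         // the timed attempt returns false after ~200 ms
            return new boolean[] { stillBlocked, timed[0] };
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("spin acquire still blocked after interrupt: " + r[0]);
        System.out.println("timed acquire obtained the lock: " + r[1]);
    }
}
```

`main` reports that the spinning thread is still blocked after being interrupted (`true`) and that the timed attempt did not obtain the lock (`false`). In both cases the lock is held elsewhere, but only the timed version hands control back to its caller.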

I will investigate the call tree to see if it is possible to change the locking order to prevent
the interlock without hurting performance.

> Under Heavy load in a Cluster HTTP Threads Block and stall requests
> -------------------------------------------------------------------
>
>                 Key: JCR-929
>                 URL: https://issues.apache.org/jira/browse/JCR-929
>             Project: Jackrabbit
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.3
>         Environment: 2 Node Cluster, OSX, JDK 1.5 with DatabaseJournal, DatabasePersistenceManager, all content in DB, using WebDAV to load
>            Reporter: Ian Boston
>
> Under heavy load, created by mounting both nodes of the cluster in the OSX Finder and then uploading large numbers of files (a few thousand) to each node at the same time, one of the nodes eventually stops responding and the Finder mount times out and disconnects.
> Once that happens, that node becomes unusable.
> Further mount attempts prompt for a password, indicating that HTTP is still running, but time out once the connection is authenticated.
> Access from a web browser prompts for a password, connects, and provides a one-time listing of any collection in the workspace. If you try to refresh that collection, the HTTP request hangs forever.

-- 
This message is automatically generated by JIRA.