From: Marcel Reutegger
Date: Thu, 16 Oct 2008 11:05:02 +0200
To: users@jackrabbit.apache.org
Subject: Re: Liveness failures in DefaultISMLocking
Message-ID: <48F703BE.5080907@gmx.net>
In-Reply-To: <23fce8e60810051624uf6a9a7fka8176fc76946eeab@mail.gmail.com>

Hi,

It seems that your application shares a session instance among multiple
threads. While read-only access on a session instance from multiple
threads is considered safe, write access is not.

Can you please check that your application uses a dedicated session when
it writes to the repository?

regards
 marcel

James Abley wrote:
> Hi,
>
> I've seen some liveness failures in DefaultISMLocking, where our
> webapp becomes unresponsive; thread dumps will follow tomorrow or
> later today, depending on your timezone. The list of suspect causes
> for this problem currently stands at this:
>
> 1. JRockit JVM does not honour finally blocks.
> 2. Bug in concurrent-utils.
> 3. Bug in Jackrabbit code.
> 4. Bug in our code calling Jackrabbit.
> 5. Door number 3.
>
> 1. is obviously a frightening thought and cannot be the problem - just
> listing the obvious.
> 2. is highly unlikely. It's a very widely used library written and
> reviewed by some very smart people.
> 3. is possible, but fairly unlikely. A problem would presumably have
> been reported by someone else and a reasonable number of people are
> using Jackrabbit without ever seeing this problem.
> 4. Fewer people are using our code than the Jackrabbit code, so this is
> most likely where the problem lies. Further analysis of the thread
> dumps is required to see what's going on.
> 5. Or something I've not thought of yet.
>
> I've not yet done sufficient analysis to determine whether it is a
> deadlock, missed notification or some other reason for the application
> becoming unresponsive. From my reading of the Jackrabbit code, it
> looks fine in terms of locks being acquired and then released in a
> finally block.
> One question I do have, though, is that the lock acquisition code
> always uses the blocking form of acquiring the lock; i.e. in
> DefaultISMLocking:
>
>     rwLock.writeLock().acquire();
>
> and
>
>     rwLock.readLock().acquire();
>
> These methods can potentially wait forever (and that is what they
> appear to be doing, since the thread dumps we have seem to indicate
> that no thread is making progress over a 5 minute timeframe). Is there
> any particular reason why the timeout version isn't used? i.e.
>
>     rwLock.writeLock().attempt(10000);
>
> and
>
>     rwLock.readLock().attempt(10000);
>
> Again, from my static analysis of the code, this should allow an
> exception to safely propagate and my application would fail / display
> an error message to the customer, but would not require the servlet
> container to be restarted. To my mind, that would be a safer
> implementation?
>
> I plan on trying to write a test to recreate the problem (which to
> date I think we've only seen on JRockit JVMs, hence my listing of that
> as a possible issue), and then putting in an implementation of
> ISMLocking using the Java 5 java.util.concurrent primitives with the
> timeout versions of the methods being used. But I was just curious as
> to what the list might think about this issue?
>
> Cheers,
>
> James
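
For illustration only, the timeout-based acquisition James describes could
look roughly like the sketch below when built on java.util.concurrent. It
is not Jackrabbit's ISMLocking interface and not the DefaultISMLocking
code: the class name TimedLockSketch, its method names and the choice of
TimeoutException are assumptions made for this example, and it uses lambda
syntax that is newer than Java 5.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a read/write lock whose acquisition gives up after a timeout.
public class TimedLockSketch {
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final long timeoutMillis;

    public TimedLockSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    public Lock acquireReadLock() throws InterruptedException, TimeoutException {
        return acquire(rwLock.readLock(), "read");
    }

    public Lock acquireWriteLock() throws InterruptedException, TimeoutException {
        return acquire(rwLock.writeLock(), "write");
    }

    // tryLock returns false on timeout; turn that into an exception the caller can handle.
    private Lock acquire(Lock lock, String kind) throws InterruptedException, TimeoutException {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException(
                    "could not acquire " + kind + " lock within " + timeoutMillis + " ms");
        }
        return lock;
    }

    public static void main(String[] args) throws Exception {
        TimedLockSketch locking = new TimedLockSketch(200);
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // A second thread takes the write lock and keeps it until told to release.
        Thread holder = new Thread(() -> {
            try {
                Lock write = locking.acquireWriteLock();
                try {
                    held.countDown();
                    release.await();
                } finally {
                    write.unlock();
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        holder.start();
        held.await();

        // The reader fails after the timeout instead of blocking indefinitely.
        try {
            Lock read = locking.acquireReadLock();
            read.unlock();
            System.out.println("unexpected: read lock acquired");
        } catch (TimeoutException e) {
            System.out.println("timed out: " + e.getMessage());
        }

        // Once the writer releases, the same call succeeds.
        release.countDown();
        holder.join();
        Lock read = locking.acquireReadLock();
        try {
            System.out.println("read lock acquired after release");
        } finally {
            read.unlock();
        }
    }
}
```

A timeout of this kind turns a hang into a visible error, but it does not
remove the underlying cause; if a session is shared between writing
threads, as suggested in the reply above, that still needs to be fixed.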