Message-ID: <23fce8e60810051624uf6a9a7fka8176fc76946eeab@mail.gmail.com>
Date: Mon, 6 Oct 2008 00:24:48 +0100
From: "James Abley"
To: users@jackrabbit.apache.org
Subject: Liveness failures in DefaultISMLocking

Hi,

I've seen some liveness failures in DefaultISMLocking, where our webapp becomes unresponsive. Thread dumps will follow tomorrow / later today, depending on your timezone.

The list of suspect causes for this problem currently stands at this:

1. JRockit JVM does not honour finally blocks.
2. Bug in concurrent-utils.
3. Bug in Jackrabbit code.
4. Bug in our code calling Jackrabbit.
5. Door number 3.

1. is obviously a frightening thought and cannot be the problem - just listing the obvious.

2. is highly unlikely. It's a very widely used library written and reviewed by some very smart people.

3. is possible, but fairly unlikely. A problem would presumably have been reported by someone else, and a reasonable number of people are using Jackrabbit without ever seeing this problem.

4. Fewer people are using our code than the Jackrabbit code, so this is most likely where the problem lies. Further analysis of the thread dumps is required to see what's going on.

5. Or something I've not thought of yet.

I've not yet done sufficient analysis to determine whether it is a deadlock, a missed notification or some other reason for the application becoming unresponsive. From my reading of the Jackrabbit code, it looks fine in terms of locks being acquired and then released in a finally block.

One question I do have, though, is that the lock acquisition code always uses the blocking form of acquiring the lock; i.e. in DefaultISMLocking:

    rwLock.writeLock().acquire();

and

    rwLock.readLock().acquire();

These methods can potentially wait forever (and that is what they appear to be doing, since the thread dumps we have seem to indicate that no thread is making progress over a 5-minute timeframe). Is there any particular reason why the timeout version isn't used? i.e.

    rwLock.writeLock().attempt(10000);

and

    rwLock.readLock().attempt(10000);

Again, from my static analysis of the code, this should allow an exception to safely propagate, and my application would fail / display an error message to the customer, but would not require the servlet container to be restarted. To my mind, that would be a safer implementation?

I plan on trying to write a test to recreate the problem (which to date I think we've only seen on JRockit JVMs, hence my listing of that as a possible issue), and then putting in an implementation of ISMLocking using the Java 5 java.util.concurrent primitives, with the timeout versions of the methods being used. But I was just curious as to what the list might think about this issue?

Cheers,

James
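
For illustration, a minimal sketch of the timed-acquisition idea using the Java 5 java.util.concurrent primitives. The class name TimedLocking, its method names and LockTimeoutException are hypothetical and are not part of Jackrabbit; the sketch does not implement the real ISMLocking interface, and ReentrantReadWriteLock does not necessarily have the same fairness or reentrancy behaviour as the concurrent-utils lock used by DefaultISMLocking. Note that tryLock reports a timeout by returning false rather than by throwing, so the caller has to turn that into an exception itself.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical helper (not Jackrabbit code): read/write locking with a bounded wait. */
class TimedLocking {

    /** Hypothetical unchecked exception, so a timeout can propagate up to an error page. */
    static class LockTimeoutException extends RuntimeException {
        LockTimeoutException(String message) {
            super(message);
        }
    }

    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final long timeoutMillis;

    TimedLocking(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void acquireRead() {
        acquire(rwLock.readLock(), "read");
    }

    void acquireWrite() {
        acquire(rwLock.writeLock(), "write");
    }

    // Callers are expected to release in a finally block, on the acquiring thread.
    void releaseRead() {
        rwLock.readLock().unlock();
    }

    void releaseWrite() {
        rwLock.writeLock().unlock();
    }

    private void acquire(Lock lock, String kind) {
        try {
            // tryLock returns false if the lock was not obtained within the timeout.
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new LockTimeoutException(
                        "Timed out after " + timeoutMillis + " ms waiting for " + kind + " lock");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and fail rather than keep waiting.
            Thread.currentThread().interrupt();
            throw new LockTimeoutException("Interrupted while waiting for " + kind + " lock");
        }
    }
}
```

With something along these lines, a request thread that cannot obtain the lock within the timeout fails with an exception instead of blocking indefinitely, which is the behaviour asked about above. Whether a timeout is safe at every call site inside Jackrabbit is a separate question that this sketch does not answer.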