jackrabbit-users mailing list archives

From "James Abley" <james.ab...@gmail.com>
Subject Liveness failures in DefaultISMLocking
Date Sun, 05 Oct 2008 23:24:48 GMT
Hi,

I've seen some liveness failures in DefaultISMLocking, where our
webapp becomes unresponsive. Thread dumps will follow tomorrow /
later today, depending on your timezone. The list of suspected causes
for this problem currently stands at:

1. JRockit JVM does not honour finally blocks.
2. Bug in concurrent-utils.
3. Bug in Jackrabbit code.
4. Bug in our code calling Jackrabbit.
5. Door number 3.

1. is obviously a frightening thought and cannot be the problem - just
listing the obvious.
2. is highly unlikely. It's a very widely used library written and
reviewed by some very smart people.
3. is possible, but fairly unlikely. A problem would presumably have
been reported by someone else and a reasonable number of people are
using Jackrabbit without ever seeing this problem.
4. Fewer people are using our code than the Jackrabbit code, so this is
most likely where the problem lies. Further analysis of the thread
dumps is required to see what's going on.
5. Or something I've not thought of yet.

I've not yet done sufficient analysis to determine whether it is a
deadlock, missed notification or some other reason for the application
becoming unresponsive. From my reading of the Jackrabbit code, it
looks fine in terms of locks being acquired and then released in a
finally block. One question I do have, though: the lock acquisition
code always uses the blocking form of acquiring the lock, i.e. in
DefaultISMLocking:

rwLock.writeLock().acquire();

and

rwLock.readLock().acquire();

These methods can potentially wait forever (and that appears to be
what they are doing, since the thread dumps we have seem to indicate
that no thread is making progress over a 5 minute timeframe). Is there
any particular reason why the timeout version isn't used? i.e.

rwLock.writeLock().attempt(10000);

and

rwLock.readLock().attempt(10000);

Again, from my static analysis of the code, this should allow an
exception to safely propagate and my application would fail / display
an error message to the customer, but would not require the servlet
container to be restarted. To my mind, that would be a safer
implementation?

I plan on trying to write a test to recreate the problem (which to
date I think we've only seen on JRockit JVMs, hence my listing of that
as a possible issue), and then putting in an implementation of
ISMLocking using the Java 5 java.util.concurrent primitives with the
timeout versions of the methods being used. But I was just curious as
to what the list might think about this issue?
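
A minimal sketch of the timeout idea using java.util.concurrent,
for illustration only. It is not an ISMLocking implementation: the
class name TimedRwLock, its method names and the timeout parameter are
hypothetical. Since tryLock returns false on timeout rather than
throwing (as attempt(msecs) does in concurrent-utils), the sketch
checks the result and throws a TimeoutException itself.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical helper for illustration; not part of Jackrabbit. */
class TimedRwLock {

    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final long timeoutMillis;

    TimedRwLock(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the held read lock, or throws if it cannot be taken in time. */
    Lock acquireRead() throws InterruptedException, TimeoutException {
        return acquire(rwLock.readLock(), "read");
    }

    /** Returns the held write lock, or throws if it cannot be taken in time. */
    Lock acquireWrite() throws InterruptedException, TimeoutException {
        return acquire(rwLock.writeLock(), "write");
    }

    private Lock acquire(Lock lock, String kind)
            throws InterruptedException, TimeoutException {
        // tryLock signals a timeout by returning false, so turn that
        // into an exception the caller cannot silently ignore.
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("Timed out after " + timeoutMillis
                    + " ms waiting for " + kind + " lock");
        }
        return lock;
    }
}
```

A caller would still release in a finally block, for example:
Lock lock = locking.acquireWrite(); try { ... } finally { lock.unlock(); }
With this shape a stuck lock surfaces as an error for the one request
instead of blocking the thread indefinitely.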

Cheers,

James
