lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject help on Lock.obtain(lockWaitTimeout)
Date Thu, 21 Sep 2006 19:47:25 GMT
I'm working on a LockFactory that uses java.nio.* (OS native locks)
for its locks.

This should be a big help for people who keep finding their lock files
left on disk due to abnormal shutdown, etc. (because the OS will free the
locks, no matter what, "in theory").

I thought I was nearly done but... in testing the new LockFactory on
an NFS server that didn't have locks properly configured (I think
possibly a common situation) I found a problem with how
Lock.obtain(lockWaitTimeout) works.

That function precomputes how many times to try to obtain the lock
(it just divides the lockWaitTimeout parameter by LOCK_POLL_INTERVAL)
and then tries Lock.obtain() followed by a sleep of LOCK_POLL_INTERVAL,
that many times, before timing out.
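
A minimal sketch of the "try N times" loop described above, to make the
problem concrete. This is a paraphrase of the description, not the actual
Lucene source: the class name PollingObtain, the Attempt interface, and
returning false on timeout are all assumptions made for illustration.

```java
public class PollingObtain {

    /** Hypothetical stand-in for one non-blocking Lock.obtain() attempt. */
    public interface Attempt {
        boolean obtain();
    }

    /**
     * Tries the lock, then retries after each sleep. The number of sleeps
     * is fixed up front as lockWaitTimeout / pollInterval, so any time
     * spent inside attempt.obtain() is not counted against the timeout.
     */
    public static boolean obtain(Attempt attempt, long lockWaitTimeout, long pollInterval) {
        long maxSleepCount = lockWaitTimeout / pollInterval;
        long sleepCount = 0;
        boolean locked = attempt.obtain();
        while (!locked) {
            if (sleepCount >= maxSleepCount) {
                return false; // timed out, measured only in sleeps
            }
            sleepCount++;
            try {
                Thread.sleep(pollInterval);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            locked = attempt.obtain();
        }
        return true;
    }
}
```

With a 50 ms timeout and a 10 ms poll interval this makes one initial
attempt plus five retries, regardless of how long each attempt takes.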

The problem is, in the above test case: the call to Lock.obtain() can
apparently take a looooong time (35 seconds, I assume some kind of
underlying timeout contacting "lockd" from the NFS client) only to
finally return "false".  But the "try N times" approach makes the
assumption that this call will take zero time.  (In fact, as things
stand now, when Lock.obtain() takes non-zero time, it causes the
timeout to be longer than what was asked for; but likely this is
typically a small amount?).

Anyway, my first reaction was to change this to use
System.currentTimeMillis() to measure elapsed time, but then I
remembered that this is a dangerous approach because whenever the clock
on the machine is updated (eg by a time-sync NTP client) it would mess up
this function, causing it to either take longer than was asked for (if
clock is moved backwards) or, to timeout in [much] less time than was
asked for (if clock was moved forwards).  I've hit such issues in the
past and it's devilish.  Timezone and daylight savings time don't
matter because it's measuring GMT.

So then what to do?  What's the best way to change the function to
"really" measure time?  In Java 1.5 there is now a "nanoTime()" which
is closer to what I need, but it's 1.5 (and we're still on 1.4), and
apparently it can "fall back" to currentTimeMillis() on some platforms.
In the past I've used a separate "clock" thread that just
sleeps & increments a counter, but I don't really like the idea of
spawning a whole new thread (Lucene doesn't launch its own threads
now, except for ParallelMultiSearcher).
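
For comparison, here is a rough sketch of what an elapsed-time version
could look like if Java 1.5's System.nanoTime() were available (it is not
on 1.4, so this is not a drop-in fix). The class name DeadlineObtain and
the Attempt interface are hypothetical. Note that it still cannot cut
short a single slow obtain() call; it only stops retrying once the
requested time has passed.

```java
public class DeadlineObtain {

    /** Hypothetical stand-in for one non-blocking Lock.obtain() attempt. */
    public interface Attempt {
        boolean obtain();
    }

    /**
     * Retries until the lock is obtained or the timeout has elapsed,
     * measuring elapsed time with System.nanoTime() so that time spent
     * inside attempt.obtain() counts against the timeout.
     */
    public static boolean obtain(Attempt attempt, long lockWaitTimeoutMs, long pollIntervalMs) {
        final long start = System.nanoTime();
        final long timeoutNanos = lockWaitTimeoutMs * 1000000L;
        while (true) {
            if (attempt.obtain()) {
                return true;
            }
            long remainingNanos = timeoutNanos - (System.nanoTime() - start);
            if (remainingNanos <= 0) {
                return false; // requested wait time is used up
            }
            // Sleep for the poll interval, but never past the deadline.
            long sleepMs = Math.min(pollIntervalMs, Math.max(1L, remainingNanos / 1000000L));
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

If a single attempt takes longer than the whole timeout, this version
gives up after that one attempt instead of retrying N more times.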

Does anyone know of a good solution?

Alternatively, since this is really a "misconfiguration" (ie the
Lock.obtain() is never going to succeed), maybe we could try to obtain
a random "test" lock on creation of the LockFactory, just to confirm
that locking even "works" at all in the current environment, and then
leave the current implementation of Lock.obtain() unchanged (when NFS
locking is properly configured it seems to be fairly fast)?
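
The "test lock" idea could be sketched roughly as follows, using
java.nio's FileChannel.tryLock(). The class name LockProbe, the method
lockingWorks, and the probe file naming are all made up for illustration,
and the sketch uses newer Java syntax than 1.4 allows. On a misconfigured
NFS mount the probe itself may block for a long time before failing, so
this only moves the slow failure to creation time.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class LockProbe {

    /**
     * Tries to take and release an OS native lock on a throwaway file in
     * lockDir. Returns true only if the lock was actually acquired.
     */
    public static boolean lockingWorks(File lockDir) {
        // Hypothetical naming scheme; any name unlikely to collide would do.
        File probe = new File(lockDir, "lucene-probe-" + System.nanoTime() + ".lock");
        try (RandomAccessFile raf = new RandomAccessFile(probe, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false; // some other process holds a lock on the file
            }
            lock.release();
            return true;
        } catch (IOException e) {
            return false; // directory missing, or locking not supported here
        } finally {
            probe.delete();
        }
    }
}
```

A LockFactory could call something like this once in its constructor and
fail early with a clear message when it returns false.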

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
