lucene-dev mailing list archives

From "Uwe Schindler" <>
Subject RE: LuceneTestCase.threadCleanup incorrectly reports left running threads
Date Sat, 25 Dec 2010 22:41:16 GMT
Hi Shai,

the MD5 hash that is generated no longer has anything to do with concurrency
(the concurrency part was the NativeFSLock test method, which has already
been removed). The situation is the following:

In early Lucene versions, the lock files were put into the TEMP directory.
Later, the lock factories allowed the lock files to be put into arbitrary
folders. In both of these cases, the lock file name got an MD5 hash of the
index directory appended/prepended. In later Lucene versions the default
location for lock files was changed to the index folder. For backwards
compatibility reasons, with 2.9 and 3.0 you still had the possibility to
instantiate a LockFactory using a non-null path (using the ctor with a
directory name). FSLockFactory was programmed to support both cases (null
directory or explicit directory). When the lock directory is the same as the
index directory, the lock file gets no hash appended. For the rare case that
somebody uses a different folder (e.g. a temp directory), FSLockFactory
falls back to the "old" behavior of adding the hash to the lock file name.
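
To make the naming scheme concrete, here is a minimal sketch of deriving a
lock file name from an MD5 hash of the index directory path. It is not the
actual Lucene code: the class name LockNameSketch, the "lucene-" prefix and
the "-write.lock" suffix are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only (not a Lucene class).
class LockNameSketch {

    // Returns an illustrative prefix followed by the hex-encoded MD5 of the path.
    static String lockPrefix(String indexPath) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(indexPath.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder("lucene-"); // assumed prefix
            for (byte b : digest) {
                // Mask to avoid sign extension, then print two hex digits per byte.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // This is the failure mode discussed in the thread: MD5 not installed.
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    // Lock file name for a lock directory that differs from the index directory.
    static String hashedLockFileName(String indexPath) {
        return lockPrefix(indexPath) + "-write.lock"; // assumed suffix
    }
}
```

The same index path always yields the same name, so two JVMs that share a
lock folder but open different indexes do not collide on one lock file.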

The magic for the MD5 lock prefix is done in
FSDirectory#setLockFactory(). It checks whether the lockFactory extends
FSLockFactory and, if so, checks whether the LockFactory's path name is
null or the same as the FSDir's. In that case it sets the lock prefix to
null. Otherwise the lock prefix is generated by calling the magic MD5
creating method (Directory#getLockId()).
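
The decision described above can be sketched roughly as follows. This is a
simplified stand-in, not the real FSDirectory code: the class
LockPrefixDecision, the method choosePrefix and the Supplier parameter
(which stands in for the MD5 creating method) are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical stand-in for the prefix decision; not a Lucene class.
class LockPrefixDecision {

    // Returns null when no hash prefix is needed, otherwise the supplied lock ID.
    static String choosePrefix(File indexDir, File lockDir, Supplier<String> lockId) {
        if (lockDir == null) {
            return null; // default case: lock file lives in the index directory
        }
        try {
            // Compare canonical paths so that "idx" and "./idx" count as the same folder.
            if (lockDir.getCanonicalPath().equals(indexDir.getCanonicalPath())) {
                return null; // explicit lock dir, but identical to the index dir
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lockId.get(); // separate lock folder: fall back to the hashed prefix
    }
}
```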

In my opinion, in 3.x we should deprecate the separate path for the lock
file (Directory#getLockId()) and enforce that the lock file is always placed
in the index dir. LockFactory should not get a directory at all, but instead
should get the index dir on locking. For FS locks it would place the
write.lock file in the supplied folder, and for other locks (like per-JVM
locks for RAMDirs) it could e.g. look up the index dir in some map or
whatever. To place the lock file somewhere else, you should be able to use
FileSwitchDirectory (currently not possible).
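
As a rough sketch of what I mean (purely hypothetical, no such API exists in
Lucene; ProposedLocking, its LockFactory interface and the method names are
made up for illustration), the factory would hold no directory itself and
would receive the index dir with each call:

```java
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical design sketch of the proposal; not an existing Lucene API.
class ProposedLocking {

    // The factory is not constructed with a directory; it gets the index dir on locking.
    interface LockFactory {
        boolean tryLock(Path indexDir);
        void release(Path indexDir);
    }

    // FS flavour: the lock file is always inside the supplied index directory.
    static Path lockFileFor(Path indexDir) {
        return indexDir.resolve("write.lock");
    }

    // Per-JVM flavour: the index dir is looked up in a shared set.
    static final class InJvmLockFactory implements LockFactory {
        private final Set<Path> held = ConcurrentHashMap.newKeySet();

        @Override
        public boolean tryLock(Path indexDir) {
            // add() returns false if this index dir is already locked in this JVM.
            return held.add(indexDir.normalize());
        }

        @Override
        public void release(Path indexDir) {
            held.remove(indexDir.normalize());
        }
    }
}
```

With such a design no lock prefix and therefore no digest is needed, because
the lock is always identified by the index directory itself.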

Most tests in Lucene use the default (null lock dir in LockFactory), but
some tests for SimpleFSLockFactory & Co use the explicit directory names and
therefore generate MD5 hashes to test the special behavior.

For compatibility reasons we still have to use MD5 (to prevent different
lock file names after a Lucene upgrade when the FSDir is locked by another
JVM with an older Lucene version). For 4.0 I would remove this stupidity and
only allow lock files in the index directory.

I hope I explained this stuff so that everybody understands it. It's really
a little bit confusing (how it's implemented), but it's "sophisticated
backwards" (haha). I would like to get rid of it, and then we have no digest
code anymore.


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Shai Erera []
> Sent: Saturday, December 25, 2010 3:04 PM
> To:
> Subject: Re: LuceneTestCase.threadCleanup incorrectly reports left running
> threads
> Actually, the MD5 thingy is an attempt to generate a unique temp lock ID,
> IIRC. so this piece of code can disappear entirely now that the tests
> concurrency is better.
> As for the other threads that are left running, I couldn't track down yet
> warning from the benchmark tests, but I'd love to get rid of those false
> warnings. I thought the stack trace could at least tell us who spawned the
> thread, but obviously it's not always clear.
> Shai
> On Saturday, December 25, 2010, Robert Muir <> wrote:
> > On Sat, Dec 25, 2010 at 4:04 AM, Uwe Schindler <> wrote:
> >> Md5 is guaranteed to be there (like utf8 as charset). This is
> >> documented in crypto Api, which algorithms are available for digest.
> >>
> >
> > where is this documented? its not in the javadocs.
> >
> > anyway, we shouldn't be doing this:
> > * this algorithm might not exist on J2ME etc (still java), you need to
> > install an extra crypto add-on.
> > * we shouldnt start up an expensive PKI infrastructure on mac os X,
> > including spawning a new thread, just to hash a string. thats absurd.
> > * we pay all these costs ... for md5! its not even a good hash!
> >