lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shai Erera <ser...@gmail.com>
Subject Re: LuceneTestCase.threadCleanup incorrectly reports left running threads
Date Sun, 26 Dec 2010 11:55:04 GMT
Opening an issue sounds great. It kinda hijacked that thread already :)

We can argue there whether FSLockFactory should or shouldn't allow locking
outside the index directory. Perhaps we can put on the application the
burden of generating a unique lock name, if it intends to use it outside the
index directory.

Shai

On Sun, Dec 26, 2010 at 12:17 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi Shai,
>
>
>
> Thanks for the insight. I still disagree about the lock file placement. If
> your application needs such functionality you can always program your own
> lock factory. But even if we add support for external dirs, the creation of
> the lock file name and prefix should be done inside the lock factory
> (currently, the lock id – the MD5 – is created in the directory instance).
> This has pros for e.g. RAMDirectory, but is just unclean at the moment.
> Especially the special handling of the lock id when the directory path for
> the lock file differs from the index directory.
>
>
>
> About MD5: At least for Lucene 3.x we must stay with MD5, as changing the
> hash algorithm would change the lock file name, so an index locked with 3.0
> would have a different lock file than an index locked with 3.x (for external
> lock directory). This would lead to problems for users upgrading several
> servers writing to shared indexes during the upgrades. You cannot be sure
> that the same index is always locked when you have 2 different lucene
> versions. This is just a warning, maybe it’s not important, but its
> important.
>
>
>
> I would propose the following (for 3.x): Remove the static MessageDigest
> from FSDirectory and replace it by a lazy-init one. Because for the common
> case, the message digest is never used and should not be initialized. On
> first use, create one (which only happens if you specify a separate lock
> directory). Normal Lucene users and Solr will never hit this. We can stay
> with MD5. About availability: As nitpicked yesterday with Robert, we can
> still assume MD5 is always available for any JVM, because without MD5, tons
> of software applications will break (just think of sessions in web app
> servers). Harmony has it of course, Android. So we should simply use it (and
> please don’t copy an MD5 impl to util!) and throw exception when not
> available. For this seldom case (normal user have their locks in index dir),
> this is fine. Everything else would break safe upgrade pathes.
>
>
>
> I think we should open an issue and discuss about the LockFactory
> simplyfications (the code is currently very broken as it relies on strange
> sophisticated backwards assumptions, just my opinion).
>
>
>
> Uwe
>
> -----
>
> Uwe Schindler
>
> H.-H.-Meier-Allee 63, D-28213 Bremen
>
> http://www.thetaphi.de
>
> eMail: uwe@thetaphi.de
>
>
>
> *From:* Shai Erera [mailto:serera@gmail.com]
> *Sent:* Sunday, December 26, 2010 7:43 AM
>
> *To:* dev@lucene.apache.org
> *Subject:* Re: LuceneTestCase.threadCleanup incorrectly reports left
> running threads
>
>
>
> Thanks for the explanation Uwe. I got confused with that and the process
> name we add to the temp lock file.
>
> About what you say, we have a scenario where we use a FSLockFactory outside
> an IndexWriter scope, but still within a search application scope. We lock a
> directory so that several processes can be synced on it. Another use case is
> when you manage several indexes, that should be locked at once -- the lock
> file may reside external to all of them.
>
> So I would not want to see LockFactory suddenly assuming and enforcing the
> lock file to be created in the index directory, or rely on IndexWriter
> passing it something etc. I think the way LF works today, allowing you to
> place the lock file wherever you want gives more freedom to people and opens
> up the door for more use cases.
>
> I don't think though that we should rely on MD5. A simple hash function
> should be enough IMO.
>
> Shai
>
> On Sun, Dec 26, 2010 at 12:41 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
> Hi Shai,
>
> the md5 hash generated has nothing to do with concurrency anymore (the
> concurrency thing was this NativeFSLock test method already removed). The
> thing is the following:
>
> In early lucene versions, the lock files were put into TEMP directory.
> Later
> the lock factories allowed, to put the lock files into arbitrary folders.
> For these both cases, the lock file name got an MD5 hash of the index
> directory appended/prepended. In later Lucene versions the default for lock
> files was changed to be the index folder. For backwards compatibility
> reasons, with 2.9 and 3.0 you still had the possibility to instantiate a
> LockFactory using a non-null path (using the ctor with a directory name).
> FSLockFactory was programmed to support both cases (null directory or
> explicit directory). When the lock directory is the same like the index
> directory, the lock file got no hash appended. For the rare case that
> somebody used a different folder (e.g. a temp directory), FSLockFactory was
> falling back to the "old" behavior of adding the hash to the lock file
> name.
>
> The magic for the md5 magic lock prefix is done if
> FSDirectory#setLockFactory(). It checks for lockFactory extends
> FSLockFactory and if yes then checks, that the LockFactories path name is
> the same like the FSDir's or null. In that case it sets the lock prefix to
> null. Otherwise the lock prefix is generated by calling the magic MD5
> creating method (Directory#getLockId()).
>
> In my opinion, in 3.x we should deprecate the separate path for the lock
> file (Directory#getLockId()) and enforce the lockfile always to be placed
> in
> the index dir. LockFactory should not get a directory at all, but instead
> should get the index dir on locking. For FS locks it would place the
> write.lock file in the supplied folder and for other locks (like per-JVM
> locks for RAMDirs) it could e.g. lookup the index dir in some map or
> whatever. To place the lockfile somewhere else, you should be able to use
> FileSwitchDirectory (currently not possible).
>
> Most tests in Lucene use the default (null lock dir in LockFactory), but
> some tests for SimpleFSLockFactory & Co use the explicit directory names
> and
> therefore generate MD5 hashes to test the special behavior.
>
> For compatibility reasons we have to still use MD5 (to prevent different
> lock file names after Lucene upgrade when FSDir is locked by another JVM
> with older Lucene version). For 4.0 I would remove this stupidity and only
> allow lock files in index directory.
>
> I hope I explained this stuff so everybody understand it, its really a
> little bit confusing (how its implemented), but its "sophisticated
> backwards" (haha). I would like to get rid of it and then we have no digest
> code anymore.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Shai Erera [mailto:serera@gmail.com]
> > Sent: Saturday, December 25, 2010 3:04 PM
> > To: dev@lucene.apache.org
> > Subject: Re: LuceneTestCase.threadCleanup incorrectly reports left
> running
> > threads
> >
> > Actually, the MD5 thingy is an attempt to generate a unique temp lock ID,
> > IIRC. so this piece of code can disappear entirely now that the tests
> > concurrency is better.
> >
> > As for the other threads that are left running, I couldn't track down yet
> the
> > warning from the benchmark tests, but I'd love to get rid of those false
> > warnings. I thought the stack trace could at least tell us who spawned
> the
> > thread, but obviously it's not always clear.
> >
> > Shai
> >
> > On Saturday, December 25, 2010, Robert Muir <rcmuir@gmail.com> wrote:
> > > On Sat, Dec 25, 2010 at 4:04 AM, Uwe Schindler <uwe@thetaphi.de>
> > wrote:
> > >> Md5 is guaranteed to be there (like utf8 as charset). This is
> documented in
> > crypto Api, which algorithms are available for digest.
> > >>
> > >
> > > where is this documented? its not in the javadocs.
> > >
> > > anyway, we shouldn't be doing this:
> > > * this algorithm might not exist on J2ME etc (still java), you need to
> > > install an extra crypto add-on.
> > > * we shouldnt start up an expensive PKI infrastructure on mac os X,
> > > including spawning a new thread, just to hash a string. thats absurd.
> > > * we pay all these costs ... for md5! its not even a good hash!
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For
> > > additional commands, e-mail: dev-help@lucene.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional
> > commands, e-mail: dev-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
>

Mime
View raw message