lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: Lock handling and Lucene 1.9 / 2.0
Date Tue, 14 Sep 2004 07:57:03 GMT
Pete Lewis wrote:
> Hi Christoph
> 
> The directory caching is applied *across* class instances (the directory
> is instanced once) - this cache exists singularly and is updated if the
> FSDirectory is called against a different index.

Yes. That's how we guarantee that every process has at most one FSDirectory
instance for every index.

> 
> Multiple indexes will *always* cause directory caching upon calls to
> FSDirectory - our searches are made sequentially against all libraries
> (or a selection of libraries) and this sequential call to FSDirectory
> causes the cache to be updated - it's very, very rare that the cache will
> remain the same between two calls to get FSDirectories. This caching
> *is* synchronized using the commit.lock (see the code) and two processes
> will attain two different caches (completely separate) *but* are tied
> together by the commit lock. This is what causes the spin.

As I said, synchronization on directory instances is the in-process
synchronization mechanism. The commit.lock mechanism is the inter-process
synchronization mechanism. If you use one process (independent JVM) for
more than one search, you will get hits in the directories cache.

The in-process mechanism is the first synchronization made when
opening an IndexReader. Obviously you need a directory instance to
synchronize on, and this instance has to be unique for your process.
In order to get the directory instance we synchronize on the static
directory cache, and this may be a bottleneck, since all opening
threads, independent of the index they are trying to open, have to
synchronize on the static cache. However, accessing the hashtable should
be fast, shouldn't it? In order to get the directory instance you do not
need a commit.lock. That's what I meant by:

>>FSDirectory.getDirectory has nothing to do with a commit.lock!
> 
> 
> Err, wrong. The directory.makeLock(IndexWriter.COMMIT_LOCK_NAME) call
> from within the IndexReader.open routine ties the commit.lock to the
> FSDirectory by synchronising the code around a *static* instance of the
> directory object (see the code!!).

After synchronizing all threads of one process that try to open the same
index (on the directory), the inter-process mechanism with
the commit.lock is applied in order to synchronize with other processes
that might try to open the same index.
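
To make the order of the locks concrete, here is a minimal sketch of the
sequence described above. It is not the actual Lucene code: the class
SketchDirectory, the method withCommitLock, the 50 ms polling interval and
the placement of the lock file inside the index directory are all
assumptions made for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a per-index directory object; NOT Lucene's FSDirectory. */
final class SketchDirectory {
    // Place 1: static cache shared by all threads of this process (JVM).
    private static final Map<String, SketchDirectory> CACHE = new HashMap<>();

    private final File path;

    private SketchDirectory(File path) {
        this.path = path;
    }

    /** Returns the single instance for this index path. No commit.lock is involved. */
    static SketchDirectory getDirectory(File path) {
        String key;
        try {
            key = path.getCanonicalPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        synchronized (CACHE) { // short critical section: a single map lookup
            return CACHE.computeIfAbsent(key, k -> new SketchDirectory(new File(k)));
        }
    }

    /** Runs body under the in-process lock first, then under the inter-process lock file. */
    <T> T withCommitLock(long timeoutMillis, Supplier<T> body) {
        synchronized (this) { // Place 2: threads of this process opening the same index
            // Assumption: the lock file lives in the index directory itself.
            File lock = new File(path, "commit.lock");
            long deadline = System.currentTimeMillis() + timeoutMillis;
            try {
                // Place 3: other processes. createNewFile creates the file only if absent.
                while (!lock.createNewFile()) {
                    if (System.currentTimeMillis() >= deadline) {
                        throw new IllegalStateException("Lock obtain timed out: " + lock);
                    }
                    Thread.sleep(50); // polling; this is where waiting happens under contention
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for " + lock, e);
            }
            try {
                return body.get();
            } finally {
                lock.delete(); // release the lock for other processes
            }
        }
    }
}
```

In this sketch the static cache is held only for the map lookup, so threads
opening different indexes block each other only briefly. Any longer wait
comes from the polling loop on the lock file.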

So there are 3 places where synchronization is done: the static directory
cache, the directory instance, and the commit.lock. Could you please tell me
again which of them is, in your opinion, the most critical, and whether you
have any ideas how we could improve synchronization?

One of your ideas was to turn off the commit.lock mechanism. However, I think
we cannot give up inter-process synchronization....

Furthermore, what are the implications of these synchronization problems for
your application? Do they just make application start-up slow, or do they
slow down every search? This question of course assumes that processes and
searcher instances are reused for more than one query/search. Everything else
is simply using Lucene in the wrong way.

regards,
Christoph

