lucene-dev mailing list archives

From "Pete Lewis" <p...@uptima.co.uk>
Subject Re: Lock handling and Lucene 1.9 / 2.0
Date Mon, 13 Sep 2004 09:24:07 GMT
Hi Christoph

We are running a cluster of 4 multi-processor Sun servers with Bea Weblogic.  We are using
Lucene for the search component and have multiple indexes on a SAN, where all indexes are
accessible from all of the servers in the cluster.

During performance testing we found that Lucene seemed to be taking a lot of resources. When
the system was "stressed" we took a number of thread dumps; most of the threads doing work
appeared to be tied up within Lucene. I've included a couple of examples from the dumps for
you to look at.


"ExecuteThread: '20' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0xc
bee60 nid=0x22 waiting for monitor entry [8c2fd000..8c2ffc24]
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:127
)
        - waiting to lock <be2641c8> (a org.apache.lucene.store.FSDirectory)
        at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:101
)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
        at com.uptima.usi.searchAPI.Search.performSearch(Unknown Source)



"ExecuteThread: '15' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x7
d1b58 nid=0x1d waiting on condition [8c7fd000..8c7ffc24]
        at java.lang.Thread.sleep(Native Method)
        at org.apache.lucene.store.Lock$With.run(Lock.java:109)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:103)
        - locked <c895c5a8> (a org.apache.lucene.store.FSDirectory)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:91)
        at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
        at com.uptima.usi.searchAPI.Search.performSearch(Unknown Source)

These are just examples; each thread dump typically has 5-10 threads tied up in this way.
Obviously code which is doing a Thread.sleep on the server side is a bit worrying!

Therefore we dug in a bit more......

Long answer - there's a heap of horrible, horrible code in FSDirectory that tries to be
clever, and I think it's not quite working correctly.

Two types of lock - write.lock and commit.lock. The write.lock is used exclusively for synchronising
the indexing of documents and has *no* impact on searching whatsoever.

Commit.lock is another little story. Commit.lock is used for two things - stopping indexing
processes from overwriting segments that another one is currently using, and stopping IndexReaders
from overwriting each other when they delete entries (don't even start asking me why a bloody
IndexReader can delete documents).

*However*, there's another naughty little usage that isn't listed in any of the documentation,
and here it is....

Doug Cutting wrote FSDirectory in such a way that it caches a directory. Hence, if FSDirectory
is called more than once with the same directory, the FSDirectory class uses a static Hashtable
to return the current values. However, if FSDirectory is called with a *different* directory,
it engages a commit.lock while it updates the values. It *also* makes that Hashtable synchronised.


Creating an IndexSearcher creates (within itself) an IndexReader to read the index. The first
thing the IndexReader does is grab an FSDirectory for the index directory - if you are using
Lucene with a single index, there is never a problem: it is read once, then cached.
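To make the caching scheme described above concrete, here's a minimal, hypothetical sketch (the class and method names are illustrative, not Lucene's actual code): a class-synchronised static factory backed by a Hashtable keyed on the directory path, so every caller asking for the same directory gets the same cached instance.

```java
import java.util.Hashtable;

public class DirectoryCache {
    // Static, synchronised Hashtable - one cache shared by all callers.
    private static final Hashtable<String, DirectoryCache> CACHE = new Hashtable<>();

    private final String path;

    private DirectoryCache(String path) {
        this.path = path;
    }

    // Class-synchronised, like the static factory in FSDirectory: a cache
    // miss for a *new* path holds this lock while the entry is built, so
    // every other caller - even ones asking for an already-cached path -
    // waits behind it.
    public static synchronized DirectoryCache getDirectory(String path) {
        DirectoryCache dir = CACHE.get(path);
        if (dir == null) {
            dir = new DirectoryCache(path);   // expensive setup happens here
            CACHE.put(path, dir);
        }
        return dir;
    }

    public String getPath() {
        return path;
    }
}
```

Asking for the same path twice returns the identical instance; asking for a different path per search is a guaranteed miss that goes through the slow, locked path every time.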

Our search process works by searching across all the libraries selected sequentially, building
a results list and then culling the results it doesn't need. To search it loops through each
library and creates an IndexSearcher to get at the data.

Starting to see the issue yet? Because each library is in a different directory, the internal
call to the IndexReader, which then gets an FSDirectory, causes the FSDirectory to update its
singular cache - which forces a commit.lock to appear.
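This is a self-contained illustration (not Lucene code) of why the thread dumps above show workers "waiting for monitor entry": when every search touches a different index directory, each call funnels through one class-level lock, and concurrent searchers end up serialized behind each other's slow opens.

```java
public class LockFunnel {
    // Stand-in for a class-synchronised FSDirectory.getDirectory that is
    // slow on a cache miss (here, always slow, to make the effect visible).
    static synchronized void openDirectory(String path) throws InterruptedException {
        Thread.sleep(50);   // simulate lock polling / segment reads
    }

    // Start one thread per "library", each opening a *different* directory,
    // and return the total elapsed wall time in milliseconds.
    public static long timeConcurrentOpens(int nThreads) {
        long start = System.nanoTime();
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final String path = "/indexes/library-" + i;
            workers[i] = new Thread(() -> {
                try {
                    openDirectory(path);
                } catch (InterruptedException ignored) {
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Four "searchers", four different directories: the synchronized
        // method serializes them, so total time is roughly 4 x 50 ms rather
        // than 50 ms of truly parallel work.
        System.out.println("elapsed ms: " + timeConcurrentOpens(4));
    }
}
```

Dump the threads of this toy program mid-run and three of the four workers show up "waiting for monitor entry" on the class lock - the same picture as the WebLogic dumps.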

Doug Cutting's little bit of 'neat' code for singularly caching the data within an FSDirectory
is causing us immense headaches. The code is horrible:

/** Returns an IndexReader reading the index in the given Directory. */
public static IndexReader open(final Directory directory) throws IOException {
  synchronized (directory) {                       // in- & inter-process sync
    return (IndexReader) new Lock.With(
        directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
        IndexWriter.COMMIT_LOCK_TIMEOUT) {
      public Object doBody() throws IOException {
        SegmentInfos infos = new SegmentInfos();
        infos.read(directory);
        if (infos.size() == 1) {                   // index is optimized
          return new SegmentReader(infos, infos.info(0), true);
        } else {
          SegmentReader[] readers = new SegmentReader[infos.size()];
          for (int i = 0; i < infos.size(); i++)
            readers[i] = new SegmentReader(infos, infos.info(i), i == infos.size() - 1);
          return new SegmentsReader(infos, directory, readers);
        }
      }
    }.run();
  }
}

Where directory is passed in from the constructor to IndexReader thus:

  return open( FSDirectory.getDirectory( path, false ) );

I don't know what the reasoning was behind using the IndexWriter timeouts when creating the
FSDirectory, *AND* the fact that it synchronises around the whole thing as well - but it
hurts when you have multiple indexes.

All of this would go away if the FSDirectory didn't maintain a cache. But it does.

Disabling the locks is both a good and a fundamentally bad idea. Good - it would wipe out this
problem. Bad - it would suppress ALL locks on the system. I *think* we could get around this
by using another System property such as 'disableLocksSoTheFSDirectoryCacheWorks'. Or something
cleaner.
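A minimal sketch of the extra-system-property idea floated above (the property name and class are illustrative, not an existing Lucene flag): consult a JVM system property and skip taking the commit.lock around cache population when it is set, leaving all other locking untouched.

```java
public class LockPolicy {
    // Hypothetical flag, e.g. -Dorg.apache.lucene.cacheLockDisable=true
    // on the JVM command line.
    public static final String PROP = "org.apache.lucene.cacheLockDisable";

    // Boolean.getBoolean reads a system property and returns true only if
    // it is set to the string "true".
    public static boolean cacheLockDisabled() {
        return Boolean.getBoolean(PROP);
    }

    // Callers would branch on the policy instead of always locking:
    public static String describe() {
        return cacheLockDisabled()
            ? "populating directory cache without commit.lock"
            : "taking commit.lock around directory cache population";
    }
}
```

The point of scoping the property to this one code path is that write.lock, and the legitimate commit.lock uses during segment commits and deletes, keep working exactly as before.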

For our system I was thinking of having a system property that allows us to turn on/off the
commit.lock around FSDirectory cache creation - but would obviously like it included in the
core Lucene and hence thought that it was a worthwhile candidate for Lucene 2.

Sorry to get verbose......

Cheers

Pete Lewis

----- Original Message ----- 
From: "Christoph Goller" <goller@detego-software.de>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Monday, September 13, 2004 9:26 AM
Subject: Re: Lock handling and Lucene 1.9 / 2.0


> Pete Lewis wrote:
> > Hi all
> > 
> > IndexReader has to obtain a transitory exclusive readlock on a library. This is
> > fine, and results in the short lived commit.lock file. However, if multiple instantiations
> > of LUCENE IndexReaders are used over a *single* shared library source (multiple libraries,
> > single root) a spin can occur where multiple IndexReaders sit in 1 second waits. This can
> > be addressed by removing the need for an exclusive readlock in the IndexReader - is this to
> > be addressed for 1.4/1.9?
> 
> Hi Pete,
> 
> I do not understand the problem you are describing.
> What do you mean by a spin?
> 
> The only problem I currently see is that if you open multiple
> readers at the same time and if opening takes a long time you
> could get a timeout IOException for some of the readers.
> 
> Note that the short living commit lock is further used to
> commit changes to an index with either an IndexReader or
> an IndexWriter. Therefore I think it has to be exclusive.
> 
> Christoph
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 