From: Christoph Goller
Date: Mon, 13 Sep 2004 12:34:03 +0200
To: Lucene Developers List
Subject: Re: Lock handling and Lucene 1.9 / 2.0
Message-ID: <4145779B.3040608@detego-software.de>
References: <033801c49662$d14073c0$34c2e550@joseph> <414559A8.6060202@detego-software.de>
 <0c0901c49973$6bc4b9c0$34c2e550@joseph>
In-Reply-To: <0c0901c49973$6bc4b9c0$34c2e550@joseph>

Pete Lewis wrote:
> Hi Christoph
> Long answer - there's a heap of horrible, horrible code in the FSDirectory that tries to be clever, and I think it's not quite working correctly.
>
> Two types of lock - write.lock and commit.lock. The write.lock is used exclusively for synchronising the indexing of documents and has *no* impact on searching whatsoever.
>
> Commit.lock is another little story. Commit.lock is used for two things - stopping indexing processes from overwriting segments that another one is currently using, and stopping IndexReaders from overwriting each other when they delete entries (don't even start asking me why a bloody IndexReader can delete documents).

Commit.lock is used to synchronize the committing of changes to an index with the process of opening an IndexReader. These changes may come from an IndexWriter or an IndexReader. There are good reasons for having the delete functionality in IndexReader (see the developer mailing list around July 16). Write.lock is used to guarantee that there is only ever one writer.

>
> *However*, there's another naughty little usage that isn't listed in any of the documentation, and here it is....
>
> Doug Cutting wrote FSDirectory in such a way that it caches a directory. Hence, if FSDirectory is called more than once with the same directory, the FSDirectory class uses a static Hashtable to return the current values. However, if FSDirectory is called with a *different* directory, it engages a commit.lock while it updates the values. It *also* makes that Hashtable (synchronised).
FSDirectory.getDirectory has nothing to do with a commit.lock! Lucene currently uses two locking mechanisms: an interprocess mechanism based on the commit.lock file, and an intraprocess mechanism based on synchronizing on Directory instances. The second mechanism needs exactly one Directory instance per index directory, and this is achieved by caching Directory instances in FSDirectory.

>
> Creating an IndexSearcher creates (within itself) an IndexReader to read the index. The first thing the IndexReader does is grab an FSDirectory for the index directory - if you are using Lucene with a single index, there is never a problem - it is read once, then cached.
>
> Our search process works by searching sequentially across all the selected libraries, building a results list and then culling the results it doesn't need. To search, it loops through each library and creates an IndexSearcher to get at the data.
>
> Starting to see the issue yet? Because each library is in a different directory, the internal call to the IndexReader, which then gets an FSDirectory, causes the FSDirectory to update its singular cache. Which forces a commit.lock to appear.
>
> Doug Cutting's little bit of 'neat' code for caching the data singularly within an FSDirectory is causing us immense headaches. The code is horrible:
>
> /** Returns an IndexReader reading the index in the given Directory.
>  */
> public static IndexReader open(final Directory directory) throws IOException {
>   synchronized (directory) {  // in- & inter-process sync
>     return (IndexReader) new Lock.With(
>         directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
>         IndexWriter.COMMIT_LOCK_TIMEOUT) {
>       public Object doBody() throws IOException {
>         SegmentInfos infos = new SegmentInfos();
>         infos.read(directory);
>         if (infos.size() == 1) {  // index is optimized
>           return new SegmentReader(infos, infos.info(0), true);
>         } else {
>           SegmentReader[] readers = new SegmentReader[infos.size()];
>           for (int i = 0; i < infos.size(); i++)
>             readers[i] = new SegmentReader(infos, infos.info(i), i == infos.size() - 1);
>           return new SegmentsReader(infos, directory, readers);
>         }
>       }
>     }.run();
>   }
> }
>
> Where directory is passed in from the constructor to IndexReader thus:
>
> return open(FSDirectory.getDirectory(path, false));

All threads that open an IndexReader by path, rather than passing in a Directory instance themselves, have to compete for the FSDirectory.getDirectory synchronization, independent of the index they are trying to open. So you are right: this is a bottleneck. After that, threads opening an IndexReader only compete with each other if they try to read the same index, and that case is handled by the two locking mechanisms mentioned above.

Here are two ideas that could help:

The bottleneck only occurs if you always start a new process for every search, doesn't it? If you make a second search within the same process, won't the directory instances already be cached, so that the bottleneck is no longer a problem?

Furthermore, you do not have to open new searchers for every search. Can't you reuse your Searcher instances for multiple searches?

A question for Lucene 1.9/2.0 is whether we really need both intraprocess and interprocess synchronization. Maybe these two mechanisms exist for purely historical reasons, and the interprocess mechanism alone would be enough?
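The two locking levels described above (an intraprocess lock that relies on one shared Directory instance per index, plus an interprocess commit.lock) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Lucene's code: the class `IndexDir` and its methods `get` and `withCommitLock` are hypothetical, and for the interprocess part it swaps in `java.nio` `FileLock` on a `commit.lock` file instead of Lucene's own lock-file implementation. That swap happens to show why both levels are needed in Java: a `FileLock` is held on behalf of the whole JVM, so it cannot keep threads of the same process apart, which is what `synchronized` on the shared instance is for.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a cached directory object; not Lucene's FSDirectory. */
final class IndexDir {
    // One instance per canonical path, so synchronizing on the instance
    // serializes every thread in this JVM that works on the same index.
    private static final Map<String, IndexDir> CACHE = new HashMap<>();
    final File path;

    private IndexDir(File path) { this.path = path; }

    /**
     * Returns the single shared instance for a path. The method is synchronized
     * on the class, so every caller briefly contends here, whichever index it wants.
     */
    static synchronized IndexDir get(String path) throws IOException {
        File canonical = new File(path).getCanonicalFile();
        IndexDir dir = CACHE.get(canonical.getPath());
        if (dir == null) {
            dir = new IndexDir(canonical);
            CACHE.put(canonical.getPath(), dir);
        }
        return dir;
    }

    /** Runs body under both locks: one for threads in this JVM, one for other processes. */
    <T> T withCommitLock(Callable<T> body) throws Exception {
        // Intraprocess level: only works because get() hands out one instance per path.
        synchronized (this) {
            File lockFile = new File(path, "commit.lock");
            // Interprocess level: FileChannel.lock() blocks until other processes release it.
            // Without the synchronized block, a second thread in this JVM would get
            // an OverlappingFileLockException here instead of waiting.
            try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
                 FileChannel channel = raf.getChannel();
                 FileLock lock = channel.lock()) {
                return body.call();
            }
        }
    }
}
```

Because `get` is synchronized on the class, every opener passes through the same monitor regardless of which index it asks for, which is the global contention point discussed above; once the instance has been obtained, only callers working on the same index wait on each other.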
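The second suggestion, reusing searchers instead of opening one per search, can be sketched as a small per-path cache. This is a hypothetical helper (`SearcherCache` is not a Lucene class); the type parameter `S` stands in for whatever searcher type the application uses, and `opener` for the code that actually opens it, so the sketch runs without Lucene on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical cache that opens one searcher per index path and hands the same
 * instance to every later search, instead of opening a new searcher per query.
 */
final class SearcherCache<S> {
    private final Map<String, S> searchers = new HashMap<>();
    private final Function<String, S> opener;  // assumed: opens a searcher for a path

    SearcherCache(Function<String, S> opener) { this.opener = opener; }

    /** Opens the searcher for a path on first use; later calls return the cached one. */
    synchronized S get(String path) {
        return searchers.computeIfAbsent(path, opener);
    }
}
```

With Lucene, `opener` might wrap `new IndexSearcher(path)` (turning its checked `IOException` into an unchecked one), so each library's index is opened once per process rather than once per query. One trade-off to keep in mind: a searcher keeps seeing the index as it was when it was opened, so an application whose libraries change would need to replace the cached instance after an update.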
Christoph