To: dev@jackrabbit.apache.org
From: Christoph Kiehl
Subject: Re: Optimize search performance
Date: Tue, 19 Jun 2007 18:18:50 +0200
In-Reply-To: <466D04F8.8030304@day.com>

Marcel Reutegger wrote:
>> 2. Lucene uses the FieldCaches to speed up sorting and range queries,
>> which is exactly what we are after. Those FieldCaches are per
>> IndexReader.
>> Jackrabbit uses an IndexSearcher which searches on a single
>> IndexReader, which is most likely an instance of
>> CachingMultiReader. So on every search which builds up a FieldCache,
>> this FieldCache instance is associated with this instance of
>> CachingMultiReader. On successive queries which operate on this
>> CachingMultiReader you will get a tremendous speedup for queries which
>> can reuse those associated FieldCache instances.
>> The problem is that Jackrabbit creates a new CachingMultiReader
>> _every time_ one of the underlying indexes is modified. This means if
>> you just change _one_ item in the repository you will need to rebuild
>> all those FieldCaches, because the existing FieldCaches are associated
>> with the old instance of CachingMultiReader.
>> This not only leads to slow search response times for queries
>> which contain range queries or are sorted by a field, but also leads
>> to massive memory consumption (depending on the size of your indexes),
>> because there might be multiple instances of CachingMultiReader in
>> use if you have a scenario where a lot of queries and item
>> modifications are executed concurrently.
>> As far as I understand, the solution is to use a MultiSearcher which
>> uses multiple IndexReaders. Since most of the indexes are stable due
>> to the merging strategy, the FieldCaches can be used for a
>> much longer time.
>
> Using a multi searcher means that you must be able to execute a query on
> each of the index segments independently. This is not possible because
> hierarchy information is always spread across multiple segments, e.g. a
> node in one segment may reference a parent in another segment.

I just created an issue [1] to which I attached an initial patch which
works quite well for us. It doesn't use MultiSearcher but extends
SharedFieldSortComparator to be aware of the underlying index segments.

Could you please review the patch?

Cheers,
Christoph

[1] http://issues.apache.org/jira/browse/JCR-974
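
For readers following the thread, the idea of a segment-aware cache can be sketched roughly as below. This is not the patch attached to JCR-974 and it does not use the Lucene or Jackrabbit APIs; Segment, MultiReader and SegmentFieldCache are hypothetical stand-ins invented for illustration. The sketch only assumes that field values are cached per segment, so that building a new multi reader over mostly unchanged segments does not force a full rebuild.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SegmentAwareCacheSketch {

    /** Hypothetical stand-in for the reader of one index segment. */
    static final class Segment {
        final String[] values; // sort field value of each document in this segment
        Segment(String... values) { this.values = values; }
    }

    /** Hypothetical stand-in for a multi reader spanning several segments. */
    static final class MultiReader {
        final List<Segment> segments;
        final int[] starts; // first global document number of each segment

        MultiReader(List<Segment> segments) {
            this.segments = segments;
            this.starts = new int[segments.size()];
            int offset = 0;
            for (int i = 0; i < segments.size(); i++) {
                starts[i] = offset;
                offset += segments.get(i).values.length;
            }
        }

        /** Returns the index of the segment that contains the global document number. */
        int segmentOf(int doc) {
            int i = starts.length - 1;
            while (i > 0 && starts[i] > doc) {
                i--;
            }
            return i;
        }
    }

    /** Cache keyed by segment identity instead of by the multi reader. */
    static final class SegmentFieldCache {
        // Assumption: a real implementation would use weak keys so that
        // entries of discarded segments can be garbage collected.
        private final Map<Segment, String[]> cache = new IdentityHashMap<>();
        int loads = 0; // counts how many segments had to be loaded

        String[] get(Segment segment) {
            String[] cached = cache.get(segment);
            if (cached == null) {
                loads++;
                cached = segment.values.clone(); // simulates the expensive load
                cache.put(segment, cached);
            }
            return cached;
        }
    }

    /** Resolves a global document number to its value via the per-segment cache. */
    static String valueOf(MultiReader reader, SegmentFieldCache cache, int doc) {
        int i = reader.segmentOf(doc);
        return cache.get(reader.segments.get(i))[doc - reader.starts[i]];
    }

    public static void main(String[] args) {
        Segment a = new Segment("x", "y");
        Segment b = new Segment("z");
        SegmentFieldCache cache = new SegmentFieldCache();

        MultiReader before = new MultiReader(Arrays.asList(a, b));
        valueOf(before, cache, 0);
        valueOf(before, cache, 2);
        System.out.println("loads after first reader: " + cache.loads);

        // One item changes: a new segment appears and a new multi reader is built,
        // but the unchanged segments a and b are still found in the cache.
        Segment c = new Segment("w");
        MultiReader after = new MultiReader(Arrays.asList(a, b, c));
        for (int doc = 0; doc < 4; doc++) {
            valueOf(after, cache, doc);
        }
        System.out.println("loads after second reader: " + cache.loads);
    }
}
```

In this sketch the second multi reader only triggers a load for the new segment. A cache keyed by the multi reader instance itself would have reloaded every segment, which is the behaviour described in the quoted text.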