From: Marcel Reutegger
Date: Tue, 21 Aug 2007 14:36:49 +0200
To: dev@jackrabbit.apache.org
Subject: Re: improving the scalability in searching
Message-ID: <46CADC61.4030700@gmx.net>

Ard Schrijvers wrote:
> So, WDOT about indexing properties in separate lucene Fields, and about
> possibly indexing more information of one property.

Because the number of distinct property names in jackrabbit is unlimited (think of nt:unstructured nodes), this would lead to a great number of files created by lucene. For each field, lucene creates a separate file (this actually changed with version 2.1). That's basically the reason why I put them all into one field. See [1] and [2].

We should probably re-consider using a 1:1 mapping between jcr property names and lucene fields, since we also got rid of the norms with [3].

> My experience with
> lucene, is that indexing tactically, eases querying a lot, and gains you lots
> of performance. So, if you do agree on these changes, which I can try to
> build in Jackrabbit, then I think these changes might validate a new
> QueryHandler class to be built aside the old one. WDOT?

I'm all for making the index better, however I'm a bit skeptical when it comes to virtual fields. This is not just an optimization but a new jackrabbit specific feature that we would introduce.

regards
 marcel

[1] http://lucene.apache.org/java/docs/fileformats.html#Normalization%20Factors
[2] http://issues.apache.org/jira/browse/JCR-106
[3] http://issues.apache.org/jira/browse/JCR-1042
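
To make the trade-off between the two layouts discussed above concrete, here is a minimal sketch in plain Java. It does not use the Lucene API and is not Jackrabbit's actual index code. The class name, the shared field name "properties" and the ':' separator are all hypothetical, chosen only for illustration. It shows that folding every property into one shared field keeps the number of distinct fields at one, while a 1:1 mapping creates one field per distinct property name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class FieldLayoutSketch {

    // Hypothetical names, for illustration only (not Jackrabbit's real constants).
    static final String SHARED_FIELD = "properties";
    static final char SEPARATOR = ':';

    /** Layout A: every property goes into one shared field; the term carries the property name. */
    public static List<String[]> sharedField(Map<String, String> props) {
        List<String[]> entries = new ArrayList<>();
        for (Map.Entry<String, String> p : props.entrySet()) {
            // entry = { field name, term text }
            entries.add(new String[] { SHARED_FIELD, p.getKey() + SEPARATOR + p.getValue() });
        }
        return entries;
    }

    /** Layout B: 1:1 mapping, each property name becomes its own field. */
    public static List<String[]> fieldPerProperty(Map<String, String> props) {
        List<String[]> entries = new ArrayList<>();
        for (Map.Entry<String, String> p : props.entrySet()) {
            entries.add(new String[] { p.getKey(), p.getValue() });
        }
        return entries;
    }

    /** Collects the distinct field names used by a list of { field, term } entries. */
    public static Set<String> distinctFields(List<String[]> entries) {
        Set<String> fields = new TreeSet<>();
        for (String[] e : entries) {
            fields.add(e[0]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // An nt:unstructured node may carry arbitrary property names.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("title", "hello");
        props.put("author", "ard");
        props.put("my:custom", "x");

        System.out.println("shared field:       " + distinctFields(sharedField(props)).size() + " distinct field(s)");
        System.out.println("field per property: " + distinctFields(fieldPerProperty(props)).size() + " distinct field(s)");
    }
}
```

With the shared layout, a lookup for a given property has to match on the combined name-plus-value term; with the 1:1 layout it can address the field directly, at the cost of a field count that grows with the number of distinct property names.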