From: Marcel Reutegger
Date: Fri, 30 Nov 2007 11:54:45 +0100
To: dev@jackrabbit.apache.org
Subject: Re: DescendantSelfAxisWeight ChildAxisQuery performance
Message-ID: <474FEBF5.9070209@gmx.net>

Ard Schrijvers wrote:
>> Ard Schrijvers wrote:
>>
>>> Query q = qm.createQuery("stuff//*[@count]", Query.XPATH);
>>> if (q instanceof QueryImpl) {
>>>     // limit the result set
>>>     ((QueryImpl) q).setLimit(1);
>>> }
>>>
>>> Since my "stuff//*[@count]" gives me 1.200.000, it makes perfect sense
>>> to users I think, that even with our patches and a working cache, that
>>> retaining them all would be slow. But if I set the limit to 1 or 10, I
>>> would expect to have performance (certainly when you have not
>>> implemented any AccessManager).
>>>
>>> But, if I set limit to 1, why would we have to check all 1.200.000
>>> parents whether the path is correct?
>>
>> I'm not quite sure if this is a valid/common use case. I
>> can't imagine doing a query like this without using an "order
>> by" clause. Because without an "order by" you will just get a
>> random node. But if you use an "order by" you need to get all
>> nodes first anyway. see my comments below.
>
> This is not my point. Whether you have an order by or not, lucene will
> compute the score of all hits anyway. So, no order by, does not mean
> that lucene does not order: it orders on score (but of course you already
> know that :-) )
> So, my thing holds with and without order by.

WRT Lucene this is correct, but the same is not true for JCR. If there is no
order by, the implementation is free to return the nodes in any order.

I did a quick test and wrote a custom IndexSearcher (see below), which collects
only the first n matching documents. The test query then executed much faster
because the number of DescendantSelfAxisScorer.isValid() calls dropped
drastically.

There is one drawback though: you don't know the total number of results. In
this case it might be OK to return -1 for RangeIterator.getSize().

The order by is more difficult to solve.
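What we could try is to order the result of the sub query first and then run
the descendant axis test against the context nodes. Because
DescendantSelfAxisQuery does not add nodes to the sub query but only limits the
set, subsequent ordering can be skipped. This requires that we pass ordering
information along with the scorer, e.g. index-order, relevance, property.

In any case we should create a Jira issue for it.

regards
 marcel

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TopDocCollector;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.Weight;

public class JackrabbitIndexSearcher extends IndexSearcher {

    private final IndexReader reader;

    public JackrabbitIndexSearcher(IndexReader r) {
        super(r);
        this.reader = r;
    }

    // inherit javadoc
    // Quick test only: the filter argument is ignored.
    public TopDocs search(Weight weight, Filter filter, int nDocs)
            throws IOException {
        TopDocCollector collector = new TopDocCollector(nDocs);
        Scorer scorer = weight.scorer(reader);
        if (scorer != null) {
            // Check the remaining budget first so that the scorer is not
            // advanced once more after the n-th document was collected.
            while (nDocs-- > 0 && scorer.next()) {
                collector.collect(scorer.doc(), scorer.score());
            }
        }
        return collector.topDocs();
    }
}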
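
For readers without a Lucene setup, the two ideas in this message (stop collecting after the first n valid documents, and order the sub query result before running the expensive test) can be sketched in plain Java. Everything in the sketch is hypothetical and not Jackrabbit or Lucene API: `LimitedCollectSketch`, `collect`, `collectOrdered` and the `isValid` predicate, which merely stands in for a costly check such as DescendantSelfAxisScorer.isValid(). It assumes the validity test only removes candidates and never adds any, so a pre-sorted candidate list stays sorted after filtering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

public class LimitedCollectSketch {

    /** Collected hits, the number of validity checks spent, and the reported size. */
    static final class Result {
        final List<Integer> hits;
        final int checks;
        final long size; // -1 when collection stopped early and the total is unknown

        Result(List<Integer> hits, int checks, long size) {
            this.hits = hits;
            this.checks = checks;
            this.size = size;
        }
    }

    /** Walks the candidates in the given order and stops once 'limit' hits are found. */
    static Result collect(List<Integer> candidates, IntPredicate isValid, int limit) {
        List<Integer> hits = new ArrayList<>();
        int checks = 0;
        boolean exhausted = true;
        for (int doc : candidates) {
            if (hits.size() >= limit) {
                exhausted = false; // candidates remain, so the total is unknown
                break;
            }
            checks++; // stand-in for one expensive validity test
            if (isValid.test(doc)) {
                hits.add(doc);
            }
        }
        return new Result(hits, checks, exhausted ? hits.size() : -1);
    }

    /** Orders the candidates first, then filters lazily; filtering keeps the order. */
    static Result collectOrdered(List<Integer> candidates, Comparator<Integer> order,
                                 IntPredicate isValid, int limit) {
        List<Integer> sorted = new ArrayList<>(candidates);
        sorted.sort(order);
        return collect(sorted, isValid, limit);
    }

    public static void main(String[] args) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            candidates.add(i);
        }
        IntPredicate isValid = doc -> doc % 3 == 0; // arbitrary stand-in condition

        Result all = collect(candidates, isValid, Integer.MAX_VALUE);
        Result first = collect(candidates, isValid, 1);
        Result top2 = collectOrdered(candidates, Comparator.reverseOrder(), isValid, 2);

        System.out.println("no limit: checks=" + all.checks + " size=" + all.size);
        System.out.println("limit 1:  checks=" + first.checks + " size=" + first.size);
        System.out.println("ordered, limit 2: hits=" + top2.hits + " checks=" + top2.checks);
    }
}
```

The sketch reports -1 as the size only when it stopped before exhausting the candidates, which mirrors the drawback mentioned above. Whether sorting the sub query result up front is cheaper than validating every hit depends on how the sort is done, and the sketch does not try to answer that.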