From users-return-18707-apmail-jackrabbit-users-archive=jackrabbit.apache.org@jackrabbit.apache.org Tue Mar 27 11:31:05 2012 Return-Path: X-Original-To: apmail-jackrabbit-users-archive@minotaur.apache.org Delivered-To: apmail-jackrabbit-users-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 07EEE9344 for ; Tue, 27 Mar 2012 11:31:05 +0000 (UTC) Received: (qmail 56813 invoked by uid 500); 27 Mar 2012 11:31:04 -0000 Delivered-To: apmail-jackrabbit-users-archive@jackrabbit.apache.org Received: (qmail 56779 invoked by uid 500); 27 Mar 2012 11:31:04 -0000 Mailing-List: contact users-help@jackrabbit.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: users@jackrabbit.apache.org Delivered-To: mailing list users@jackrabbit.apache.org Received: (qmail 56770 invoked by uid 99); 27 Mar 2012 11:31:04 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 27 Mar 2012 11:31:04 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of alex.parvulescu@gmail.com designates 209.85.216.170 as permitted sender) Received: from [209.85.216.170] (HELO mail-qc0-f170.google.com) (209.85.216.170) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 27 Mar 2012 11:30:59 +0000 Received: by qcmt36 with SMTP id t36so6825361qcm.1 for ; Tue, 27 Mar 2012 04:30:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=/5YG9NbLnQN4m2aP5JRLYKnXFtcPFwhSkvHhyd35g7I=; b=X3RBSLV6i2tJtMTUp139Sa8gaIzTv8f07jP6fFIVVVnDXdQzPndkudLqSdDuvTui7N X34/IbMZo84TdnCtcRw1iSr/+uCjG7NpijJNWLd0ZbK16CsL+B7yg96/8B8s4LGIzRdd FSTmlfkzbd0+TtqLwZCcoWVhmeF0Vrb4mMyEfMCOPftkgR8T9hL0nmVbudpxNqr6yfhi cSK2EsIplBFt7vCW1z3GS1+MungkenHzVVuMdd+73CjnAuKAdmUCvM/4GEHlnipqo1KD sebnrVsLaqzjhRY1+x/Mg8OAkLG0O7gXPW7sc/zSZCY6oqVGqnzzvGO57G9DDwjn4F4W Ipew== MIME-Version: 1.0 Received: by 10.224.33.134 with SMTP id h6mr31920453qad.15.1332847838352; Tue, 27 Mar 2012 04:30:38 -0700 (PDT) Received: by 10.229.242.83 with HTTP; Tue, 27 Mar 2012 04:30:38 -0700 (PDT) In-Reply-To: <4F717278.7000705@liip.ch> References: <4F70D16E.6090608@liip.ch> <4F716800.7090205@liip.ch> <4F7170F5.8000903@liip.ch> <4F717278.7000705@liip.ch> Date: Tue, 27 Mar 2012 13:30:38 +0200 Message-ID: Subject: Re: Strange Search Performance problem with OR From: Alex Parvulescu To: users@jackrabbit.apache.org Cc: Christian Stocker , David Buchmann Content-Type: multipart/alternative; boundary=20cf306f779ef1d4dc04bc37d1cc X-Virus-Checked: Checked by ClamAV on apache.org --20cf306f779ef1d4dc04bc37d1cc Content-Type: text/plain; charset=ISO-8859-1 Hi Christian, can you enable debug logs on org.apache.jackrabbit.core.query.lucene.join.QueryEngine? I'm curious to see what the constraits look like in the big query vs the 2 small ones. This also goes for the join you've mentioned later in the thread, but I just wanted to start with the first query ;) alex On Tue, Mar 27, 2012 at 9:55 AM, Christian Stocker < christian.stocker@liip.ch> wrote: > Hi > > On 27.03.12 09:49, David Buchmann wrote: > > sorry, my bad. did not read correctly. > > you do have the paranthesis so you did what you wanted to do. > > > > looks like lucene/jackrabbit combine the 2 datasets first and filter > > later... > > > > what if you try > > > > > > SELECT * FROM [own:unstructured] AS data > > WHERE > > data.guid = 'J7B1X' AND ISDESCENDANTNODE(data, '/article') > > OR > > data.guid = 'J7B1X' AND ISDESCENDANTNODE(data, '/import/article') > > ORDER BY firstImportDate DESC > > I tried that and I tried it again now. Same response time as the > original query. > > Any hints from someone who knows the internal workings of > jackrabbit/lucene? > > chregu > > > > > if this is fast, then the jackrabbit query engine is not very clever... > > > > cheers,david > > > > > > Am 27.03.2012 09:10, schrieb David Buchmann: > >> i think the 2 queries are not equivalent. the first one is equivalent to > > > >> ... > >> WHERE data.guid = 'J7B1X' > >> AND (ISDESCENDANTNODE(data, '/article') > > > >> plus > > > >> WHERE > >> ISDESCENDANTNODE(data, '/import/article') > > > >> (if you want the data.guid = ... to apply to both, you need paranthesis) > > > >> but if /import/article is almost empty, i still don't see why the > >> combined query should take so long unless jackrabbit/lucene are doing > >> something stupid. > > > >> cheers,david > > > >> Am 26.03.2012 22:28, schrieb Christian Stocker: > >>> Hi > > > >>> We have the following search query > > > > > >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X' > >>> AND (ISDESCENDANTNODE(data, '/article') > >>> OR ISDESCENDANTNODE(data, '/import/article') > >>> ) > >>> ORDER BY firstImportDate DESC > > > > > >>> This query can take quite some time (up to 3 seconds, but it gets more > >>> and more hte more data we have). In /article there's potentially a lot > >>> of nodes, in /import/article usually almost nil. > > > > > >>> If we now separate the query into 2: > > > >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X' > >>> AND ISDESCENDANTNODE(data, '/article') > >>> ORDER BY firstImportDate DESC > > > >>> and > > > >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X' > >>> AND ISDESCENDANTNODE(data, '/import/article') > >>> ORDER BY firstImportDate DESC > > > >>> Both queries take approx. 10ms (and return 0 or 1 resultset, more is > not > >>> possible). So quite fast. > > > >>> Can anyone explain to me, why that is and how we could rewrite the > query > >>> to make it fast with a single one as well? > > > >>> Thanks > > > >>> chregu > > > > > --20cf306f779ef1d4dc04bc37d1cc--