jackrabbit-users mailing list archives

From Alex Parvulescu <alex.parvule...@gmail.com>
Subject Re: Strange Search Performance problem with OR
Date Tue, 27 Mar 2012 11:30:38 GMT
Hi Christian,

can you enable debug logs
on org.apache.jackrabbit.core.query.lucene.join.QueryEngine?
I'm curious to see what the constraints look like in the big query vs the 2
small ones.
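
For reference, a minimal sketch of such a logger entry, assuming the
deployment uses Logback as its SLF4J backend (that is an assumption; a
Log4j setup would need the equivalent logger entry in its own config file):

```xml
<!-- Assumed to go inside the existing <configuration> element of logback.xml -->
<logger name="org.apache.jackrabbit.core.query.lucene.join.QueryEngine" level="DEBUG"/>
```

The debug output should then show up wherever the existing appenders write.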

This also goes for the join you've mentioned later in the thread, but I
just wanted to start with the first query ;)

alex

On Tue, Mar 27, 2012 at 9:55 AM, Christian Stocker <
christian.stocker@liip.ch> wrote:

> Hi
>
> On 27.03.12 09:49, David Buchmann wrote:
> > sorry, my bad. did not read correctly.
> > you do have the parentheses so you did what you wanted to do.
> >
> > looks like lucene/jackrabbit combine the 2 datasets first and filter
> > later...
> >
> > what if you try
> >
> >
> > SELECT * FROM [own:unstructured] AS data
> > WHERE
> >     data.guid = 'J7B1X' AND ISDESCENDANTNODE(data, '/article')
> >   OR
> >     data.guid = 'J7B1X' AND ISDESCENDANTNODE(data, '/import/article')
> > ORDER BY firstImportDate DESC
>
> I tried that and I tried it again now. Same response time as the
> original query.
>
> Any hints from someone who knows the internal workings of
> jackrabbit/lucene?
>
> chregu
>
> >
> > if this is fast, then the jackrabbit query engine is not very clever...
> >
> > cheers,david
> >
> >
> > Am 27.03.2012 09:10, schrieb David Buchmann:
> >> i think the 2 queries are not equivalent. the first one is equivalent to
> >
> >> ...
> >> WHERE data.guid = 'J7B1X'
> >>   AND (ISDESCENDANTNODE(data, '/article')
> >
> >> plus
> >
> >> WHERE
> >>  ISDESCENDANTNODE(data, '/import/article')
> >
> >> (if you want the data.guid = ... to apply to both, you need parentheses)
> >
> >> but if /import/article is almost empty, i still don't see why the
> >> combined query should take so long unless jackrabbit/lucene are doing
> >> something stupid.
> >
> >> cheers,david
> >
> >> Am 26.03.2012 22:28, schrieb Christian Stocker:
> >>> Hi
> >
> >>> We have the following search query
> >
> >
> >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X'
> >>>             AND (ISDESCENDANTNODE(data, '/article')
> >>>             OR ISDESCENDANTNODE(data, '/import/article')
> >>>             )
> >>>             ORDER BY firstImportDate DESC
> >
> >
> >>> This query can take quite some time (up to 3 seconds, and it gets
> >>> slower the more data we have). In /article there are potentially a lot
> >>> of nodes, in /import/article usually almost none.
> >
> >
> >>> If we now separate the query into 2:
> >
> >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X'
> >>>             AND ISDESCENDANTNODE(data, '/article')
> >>>             ORDER BY firstImportDate DESC
> >
> >>> and
> >
> >>> SELECT * FROM [own:unstructured] AS data WHERE data.guid = 'J7B1X'
> >>>             AND ISDESCENDANTNODE(data, '/import/article')
> >>>             ORDER BY firstImportDate DESC
> >
> >>> Both queries take approx. 10ms (and return 0 or 1 results; more is not
> >>> possible). So quite fast.
> >
> >>> Can anyone explain to me why that is and how we could rewrite the query
> >>> to make it fast with a single one as well?
> >
> >>> Thanks
> >
> >>> chregu
> >
> >
>
