jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Help with query performance required
Date Wed, 01 Apr 2009 12:35:56 GMT
Hi,

On Wed, Apr 1, 2009 at 09:10, daveg0 <bagel10002000@googlemail.com> wrote:
> I am trying to do "wildcard" queries that should return multiple nodes such
> as:
>
> /jcr:root/portal/wap/images//element(*,
> atom:Entry)[jcr:like(@atom:titletext,'soccer%')]
>
> The performance has degraded over time as more entries were added, and the
> query now takes nearly 8 seconds, which is unacceptable. I am aware that
> wildcard queries take longer, but shouldn't this type of query create a
> Lucene PrefixQuery, which is much quicker? Most of our "wildcard" queries
> will be "prefix" queries, as they will typically be searches for matching
> entries that start with a specific value, e.g. "st*".
>
> I tried looking through the source code and I can't see any use of Lucene
> PrefixQuery only WildcardQuery, is this a design decision?

Yes, it is. There are two reasons:

- PrefixQuery is effectively a BooleanQuery that consists of optional
TermQueries (one for each term that matches the prefix). This design
has an inherent limit: as soon as more than 1024 distinct terms (the
default maximum clause count) match the prefix, the BooleanQuery
throws a TooManyClauses exception.
- Jackrabbit supports prefix queries in combination with lower- and
upper-casing. This is not possible with the Lucene PrefixQuery.
In any case, the cost of a prefix query grows linearly with the number
of distinct terms in the index that match the prefix. Is it possible
that your prefix matches lots of distinct terms? That happens when the
prefix is very short or very common.
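
To make the two points above concrete, here is a minimal sketch in plain Java. It is a simulation only and uses neither Lucene nor Jackrabbit code: the class PrefixExpansionSketch, its methods and the sample terms are all hypothetical, and a TreeSet stands in for the sorted term dictionary of an index. It shows a prefix being expanded into one clause per distinct matching term (so the work grows with the number of matches and can exceed a clause limit), and why a case-insensitive prefix match cannot rely on the same contiguous range of terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical simulation; not Lucene or Jackrabbit code.
class PrefixExpansionSketch {

    // Mirrors the default clause limit mentioned above.
    static final int MAX_CLAUSES = 1024;

    // Expands a prefix into one clause per distinct matching term.
    // In a sorted dictionary all terms sharing a prefix are contiguous,
    // so the scan starts at the prefix and stops at the first mismatch.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (clauses.size() == MAX_CLAUSES) {
                throw new IllegalStateException(
                        "too many clauses for prefix '" + prefix + "'");
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Naive case-insensitive variant (an assumption for illustration):
    // "Soccer" and "soccer" are not adjacent in the sorted dictionary,
    // so every term has to be inspected.
    static List<String> expandPrefixIgnoreCase(SortedSet<String> terms, String prefix) {
        String lowered = prefix.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (term.toLowerCase(Locale.ROOT).startsWith(lowered)) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            terms.add("soccer" + i);
        }
        // A longer prefix matches fewer distinct terms.
        System.out.println(expandPrefix(terms, "soccer19").size());
        try {
            expandPrefix(terms, "soccer");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A longer, more selective prefix keeps the number of expanded terms small, which is why a short or very common prefix is the first thing to check.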

> Am I missing something or is it possible for Jackrabbit to perform a
> PrefixQuery for queries like this.
>
> I also tried to use "jcr:contains" e.g:
>
> /jcr:root/portal/wap/images//element(*,
> atom:Entry)[jcr:contains(@atom:titletext,'soccer')]

That's not exactly the same, because it matches only terms that were
indexed as "soccer". You could use:

/jcr:root/portal/wap/images//element(*,
atom:Entry)[jcr:contains(@atom:titletext,'soccer*')]

but I'd say the performance is about the same.

> but this only returns the first matching entry. Am I
> misunderstanding/misusing "jcr:contains" in this way or would you expect it
> to return the same as the query with "jcr:like"

jcr:contains and jcr:like behave differently. See the JCR specification for details.
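
As a rough illustration of that difference, the following plain Java sketch models the two functions on a single property value. It is not Jackrabbit code: the class LikeVsContainsSketch and its methods are hypothetical, and the tokenizer (lower-case, split on anything that is not a letter or digit) is only an assumption standing in for whatever analyzer the index actually uses. The idea it shows is that jcr:like applies its pattern to the whole property value, while jcr:contains matches individual indexed terms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the two functions; not Jackrabbit code.
class LikeVsContainsSketch {

    // Models jcr:like(@p, 'soccer%'): the whole value must start with the prefix.
    static boolean likePrefix(String value, String prefix) {
        return value.startsWith(prefix);
    }

    // Assumed analyzer: lower-case the value and split it into terms.
    static List<String> tokenize(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"));
    }

    // Models jcr:contains(@p, 'soccer'): some indexed term equals the word.
    static boolean containsTerm(String value, String word) {
        return tokenize(value).contains(word.toLowerCase(Locale.ROOT));
    }

    // Models jcr:contains(@p, 'soccer*'): some indexed term starts with the prefix.
    static boolean containsPrefix(String value, String prefix) {
        String lowered = prefix.toLowerCase(Locale.ROOT);
        for (String term : tokenize(value)) {
            if (term.startsWith(lowered)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String title : new String[] {"soccer news", "World soccer", "soccerball scores"}) {
            System.out.println(title
                    + " like=" + likePrefix(title, "soccer")
                    + " contains=" + containsTerm(title, "soccer")
                    + " containsPrefix=" + containsPrefix(title, "soccer"));
        }
    }
}
```

Under these assumptions a title such as "World soccer" is found by the contains variants but not by the like pattern, and "soccerball scores" is found by the like pattern and the wildcard contains but not by the plain contains.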

regards
 marcel
