jackrabbit-users mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Query that sorts a large result set.
Date Wed, 17 Jun 2009 08:13:45 GMT
Hi,

The sorting is pretty well optimized: it basically uses the underlying
Lucene sort functionality. There are two other important points
that will influence performance:

1) workspace configuration

With the default workspace configuration, the entire result set is
fetched up front. You can change this behavior by setting the
resultFetchSize parameter of the search index. See [0].

2) Ian wrote: "I only want to see a small number of items eg 100 after
a particular date."

That might actually become a problem. It will result in a range query
that potentially selects lots (millions?) of nodes with distinct date
properties, and this case is not optimized. There's a new indexing
technique in Lucene called TrieRange queries [1], which was
built specifically to perform such queries efficiently, but it is
not yet integrated with Jackrabbit.
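
For reference, the kind of query under discussion might look like the
sketch below. It only builds the XPath statement; the property name
"lastModified" is taken from Ian's mail, while the node type nt:base,
the ascending order and the class/method names are my assumptions for
illustration, not something from the original thread.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a JCR XPath statement with a date range
// predicate and an order-by clause on the same property.
final class DateRangeQuery {

    // ISO 8601 with milliseconds in UTC, e.g. 2009-06-01T00:00:00.000Z
    private static final DateTimeFormatter ISO_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
                             .withZone(ZoneOffset.UTC);

    // dateProperty: name of the date property (assumed "lastModified" here)
    // after:        lower bound (exclusive) for the range predicate
    static String build(String dateProperty, Instant after) {
        String ts = ISO_MILLIS.format(after);
        return "//element(*, nt:base)[@" + dateProperty
                + " > xs:dateTime('" + ts + "')]"
                + " order by @" + dateProperty + " ascending";
    }

    public static void main(String[] args) {
        // The statement would then be passed to the JCR API, e.g.
        //   session.getWorkspace().getQueryManager()
        //          .createQuery(statement, Query.XPATH).execute()
        // and only the first 100 nodes read from the result iterator.
        System.out.println(build("lastModified",
                Instant.parse("2009-06-01T00:00:00Z")));
    }
}
```

The range predicate in the square brackets is the part that can match
millions of distinct date values; the order-by clause is the part that
is handled by Lucene's sorting.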

I've created a JIRA issue to discuss and keep track of such an
enhancement in Jackrabbit: [2]

regards
 marcel

[0] http://issues.apache.org/jira/browse/JCR-651
[1] http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
[2] https://issues.apache.org/jira/browse/JCR-2151

On Wed, Jun 17, 2009 at 01:50, Ian Boston <ieb@tfd.co.uk> wrote:
> Hi,
>
> I want to perform a query where the full result set could be millions of
> items. That set needs to be sorted by the lastModified attribute on the
> node, and I only want to see a small number of items eg 100 after a
> particular date.
>
> If I do this, will there be scalability issues, or is the sorting of a date
> field optimized in the query engine ?
>
> Thanks
> Ian
>
