jackrabbit-users mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Query that sorts a large result set.
Date Wed, 17 Jun 2009 15:57:59 GMT

On 17 Jun 2009, at 09:13, Marcel Reutegger wrote:

> Hi,
>
> The sorting is pretty well optimized; it basically uses the underlying
> Lucene functionality for that. There are two other important points
> that will influence performance:
>
> 1) workspace configuration
>
> The default workspace configuration causes the entire result set to be
> fetched up front. You can change this behavior by setting the
> resultFetchSize parameter. See [0].

Yes, we already have this in place; it has made a huge difference,
several orders of magnitude.
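To make the effect of resultFetchSize concrete, here is a minimal Java sketch of lazy, batched fetching. It is not Jackrabbit's implementation: `HitSource`, `LazyHits`, `rangeSource` and `countFetches` are hypothetical names, and the policy of fetching a fixed-size batch per round trip is an assumption made for illustration. The point is only that reading the first 100 hits of a million-hit result needs a couple of small fetches, while a huge fetch size loads everything in one go.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch of lazy, batched result fetching; not Jackrabbit code.
public class LazyResultSketch {

    /** Hypothetical source of already-sorted hits (e.g. backed by a search index). */
    interface HitSource {
        List<Long> fetch(long offset, int count); // up to `count` hits from `offset`
        long totalHits();
    }

    /** Iterator that pulls hits from the source `fetchSize` at a time, on demand. */
    static final class LazyHits implements Iterator<Long> {
        private final HitSource source;
        private final int fetchSize;
        private final List<Long> buffer = new ArrayList<>();
        private int bufferPos = 0;   // next position inside the current batch
        private long nextOffset = 0; // offset of the next batch to fetch
        int fetchCalls = 0;          // round trips made so far

        LazyHits(HitSource source, int fetchSize) {
            this.source = source;
            this.fetchSize = fetchSize;
        }

        @Override
        public boolean hasNext() {
            if (bufferPos < buffer.size()) {
                return true;
            }
            if (nextOffset >= source.totalHits()) {
                return false;
            }
            // Current batch is used up: fetch the next one only now.
            buffer.clear();
            buffer.addAll(source.fetch(nextOffset, fetchSize));
            bufferPos = 0;
            nextOffset += buffer.size();
            fetchCalls++;
            return !buffer.isEmpty();
        }

        @Override
        public Long next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return buffer.get(bufferPos++);
        }
    }

    /** A fake source whose hits are simply 0, 1, ..., total - 1. */
    static HitSource rangeSource(long total) {
        return new HitSource() {
            public List<Long> fetch(long offset, int count) {
                List<Long> batch = new ArrayList<>();
                for (long i = offset; i < Math.min(total, offset + count); i++) {
                    batch.add(i);
                }
                return batch;
            }

            public long totalHits() {
                return total;
            }
        };
    }

    /** Reads at most `toRead` hits and reports how many fetches that took. */
    static int countFetches(long total, int fetchSize, int toRead) {
        LazyHits hits = new LazyHits(rangeSource(total), fetchSize);
        for (int read = 0; read < toRead && hits.hasNext(); read++) {
            hits.next();
        }
        return hits.fetchCalls;
    }

    public static void main(String[] args) {
        // Pretend the query matched a million nodes but only 100 are shown.
        System.out.println("fetchSize 50: " + countFetches(1_000_000L, 50, 100) + " fetches");
    }
}
```

With a very large fetch size, the first `hasNext()` call pulls the whole result into memory, which mirrors the "fetch everything up front" behaviour described above.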

>
> 2) Ian wrote: "I only want to see a small number of items eg 100 after
> a particular date."
>
> That might actually become a problem. It will result in a range query
> that potentially selects lots (millions?) of nodes with distinct date
> properties, and this case is not optimized. There is a new indexing
> technique in Lucene called TrieRange queries [1], which was
> specifically built to perform such queries efficiently, but it is
> not yet integrated with Jackrabbit.
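For readers who want the details, below is a much-simplified Java sketch of the trie idea behind TrieRange queries [1]. Each value (here a lastModified timestamp in milliseconds) is indexed once per precision level, and a range is rewritten into a few runs of coarse terms in the middle plus fine terms at the edges, so the number of terms visited no longer grows with the number of distinct dates. This is not Lucene's API: `TermRun`, `indexTerms` and `splitRange` are hypothetical names, and the sketch assumes non-negative values far below `Long.MAX_VALUE`.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of trie-encoded range queries; not Lucene's API.
// Assumes 0 <= lo <= hi and values far below Long.MAX_VALUE (e.g. epoch millis).
public class TrieRangeSketch {

    /** A run of terms: every prefix in [from, to] at precision `shift`. */
    static final class TermRun {
        final int shift;
        final long from;
        final long to;

        TermRun(int shift, long from, long to) {
            this.shift = shift;
            this.from = from;
            this.to = to;
        }

        long termCount() {
            return to - from + 1;
        }

        /** True if a document with this value was indexed under one of the run's terms. */
        boolean matches(long value) {
            long prefix = value >> shift;
            return prefix >= from && prefix <= to;
        }
    }

    /** Terms written at index time: the value at full, then coarser, precision. */
    static List<String> indexTerms(long value, int step) {
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += step) {
            terms.add(shift + ":" + (value >> shift)); // precision is part of the term
        }
        return terms;
    }

    /** Rewrites [lo, hi] into runs: fine terms at the edges, coarse ones in the middle. */
    static List<TermRun> splitRange(long lo, long hi, int step) {
        List<TermRun> runs = new ArrayList<>();
        long mask = (1L << step) - 1;
        int shift = 0;
        while (lo <= hi) {
            long loUp = (lo + mask) >> step;    // first coarser block fully inside the range
            long hiUp = ((hi + 1) >> step) - 1; // last coarser block fully inside the range
            if (loUp > hiUp) {
                runs.add(new TermRun(shift, lo, hi)); // no whole coarser block fits
                break;
            }
            if (lo < (loUp << step)) {
                runs.add(new TermRun(shift, lo, (loUp << step) - 1)); // lower edge
            }
            if (((hiUp + 1) << step) <= hi) {
                runs.add(new TermRun(shift, (hiUp + 1) << step, hi)); // upper edge
            }
            lo = loUp; // continue with the middle part at the next, coarser level
            hi = hiUp;
            shift += step;
        }
        return runs;
    }

    public static void main(String[] args) {
        long from = 1_230_768_000_000L; // 2009-01-01T00:00Z in ms
        long to = 1_245_196_800_000L;   // 2009-06-17T00:00Z in ms
        List<TermRun> runs = splitRange(from, to, 4);
        long terms = 0;
        for (TermRun r : runs) {
            terms += r.termCount();
        }
        System.out.println(runs.size() + " runs, " + terms + " terms for "
                + (to - from + 1) + " possible millisecond values");
    }
}
```

The trade-off is a larger index (one term per precision level per value) in exchange for range queries whose term count is bounded by roughly 2 × (2^step − 1) per level.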

So if I don't query for all items after a certain date, but just ask
for a sort and page through the sorted result set, will that be
optimized by Lucene?
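Roughly speaking, Lucene's sorted search does not sort every hit; it scans all matches while keeping a bounded priority queue of the best offset + limit hits. The Java sketch below illustrates that idea on a plain array of lastModified values. The ordering (newest first, ties broken by lower document id) is an assumption, and `TopKSortSketch.page` is a hypothetical name, not a Jackrabbit or Lucene API. Every hit is still visited, so the cost is about O(n log k) rather than O(n log n), and deeper pages need a larger queue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical sketch of paging through a sorted result with a bounded heap.
public class TopKSortSketch {

    /**
     * Returns doc ids for positions [offset, offset + limit) when hits are ordered
     * by lastModified descending (ties: lower doc id first), while holding at
     * most offset + limit candidates in memory.
     */
    static List<Integer> page(long[] lastModified, int offset, int limit) {
        int k = offset + limit;
        if (k <= 0) {
            return Collections.emptyList();
        }
        // "Worst first" order: oldest date, then highest doc id, sits at the heap head.
        Comparator<Integer> worstFirst = (a, b) -> {
            int c = Long.compare(lastModified[a], lastModified[b]);
            return c != 0 ? c : Integer.compare(b, a);
        };
        PriorityQueue<Integer> heap = new PriorityQueue<>(worstFirst);
        for (int doc = 0; doc < lastModified.length; doc++) {
            if (heap.size() < k) {
                heap.add(doc);
            } else if (worstFirst.compare(doc, heap.peek()) > 0) {
                heap.poll(); // evict the weakest candidate
                heap.add(doc);
            }
        }
        List<Integer> best = new ArrayList<>(heap);
        best.sort(worstFirst.reversed()); // best first
        return new ArrayList<>(best.subList(Math.min(offset, best.size()), best.size()));
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        long[] lastModified = new long[1_000_000]; // one entry per matching node
        for (int i = 0; i < lastModified.length; i++) {
            lastModified[i] = 1_230_768_000_000L + (long) rnd.nextInt(1_000_000_000);
        }
        List<Integer> firstPage = page(lastModified, 0, 100);
        System.out.println("newest doc: " + firstPage.get(0) + ", page size: " + firstPage.size());
    }
}
```

For the first page of 100 the heap never holds more than 100 entries, however many nodes match; a request for page 10,000 would need a heap of about a million.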

>
> I've created a JIRA issue to discuss and keep track of such an
> enhancement in jackrabbit: [2]

Thank you, I will go and do some reading. We use Lucene in so many
places outside Jackrabbit that knowing the details of things like this
is always valuable.
Thanks
Ian


>
> regards
> marcel
>
> [0] http://issues.apache.org/jira/browse/JCR-651
> [1] http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
> [2] https://issues.apache.org/jira/browse/JCR-2151
>
> On Wed, Jun 17, 2009 at 01:50, Ian Boston <ieb@tfd.co.uk> wrote:
>> Hi,
>>
>> I want to perform a query where the full result set could be millions
>> of items. That set needs to be sorted by the lastModified attribute on
>> the node, and I only want to see a small number of items, e.g. 100,
>> after a particular date.
>>
>> If I do this, will there be scalability issues, or is the sorting of a
>> date field optimized in the query engine?
>>
>> Thanks
>> Ian
>>

