jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Query performance for large query results
Date Mon, 27 Nov 2006 11:26:22 GMT
Jukka Zitting wrote:
> On 11/27/06, Christoph Kiehl <kiehl@subshell.com> wrote:
>> 1. Use a lazy QueryResultImpl that keeps a reference to the result and only
>> fetches the UUIDs for requested nodes.
> I much prefer this approach over adding yet another cache. :-)

I would prefer a solution that fetches a configurable number of result nodes and,
if more results are requested, re-executes the query to get the remaining nodes.
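
A minimal sketch of that re-execution idea, not Jackrabbit code. The names LimitedQuery and ReexecutingResult are hypothetical, and the sketch assumes the query returns its hits in a stable order, so that a repeated execution with a larger limit begins with the hits already delivered. Doubling the limit on each re-execution is also just an assumption; any growth policy would do.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the query handler: returns at most 'limit' hits in a stable order. */
interface LimitedQuery {
    List<String> execute(int limit);
}

/** Iterates over hits and re-executes the query with a larger limit when the fetched hits run out. */
class ReexecutingResult implements Iterator<String> {
    private final LimitedQuery query;
    private List<String> hits;
    private int limit;
    private int position = 0;
    private int executions = 0;

    ReexecutingResult(LimitedQuery query, int initialLimit) {
        if (initialLimit < 1) {
            throw new IllegalArgumentException("initialLimit must be positive");
        }
        this.query = query;
        this.limit = initialLimit;
        this.hits = fetch();
    }

    private List<String> fetch() {
        executions++;
        return new ArrayList<>(query.execute(limit));
    }

    @Override
    public boolean hasNext() {
        // A full batch (hits.size() == limit) means the query may match more nodes
        // than were fetched, so run it again with a doubled limit.
        while (position >= hits.size() && hits.size() == limit) {
            limit *= 2;
            hits = fetch();
        }
        return position < hits.size();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Hits before 'position' were already handed out by earlier batches.
        return hits.get(position++);
    }

    /** Number of times the query has been executed so far. */
    int executions() {
        return executions;
    }
}
```

Nothing stays open between executions, so there is no result to close; the cost is that the query runs again whenever a client reads past the current limit.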

>> This implies that the access check is done in the QueryResultImpl and the
>> result size returned by size() may vary if you don't have access to some nodes
>> (which it already does if a node in the result gets deleted).
> We could postpone the size calculation; if getSize() is never called,
> there is no need to calculate the result in advance. Additionally or
> instead of making getSize() lazy, we could add a configuration
> variable that governs the accuracy of the return value:
> 1) Return -1, this is allowed by the spec, but not very useful
> 2) Return the (almost) correct size like now, but with the latency issue
> 3) Return the unfiltered size, reducing latency but compromising security
> 4) Return the correct size for result sets of up to N nodes, otherwise return -1

I would go for 3) combined with 4):
Return the correct size up to N, otherwise N + remaining-unfiltered-size()
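
One possible reading of that rule, as a rough sketch: access-check hits until N readable nodes have been found, then count whatever is left without filtering. ResultSizeEstimator, estimateSize and the canRead predicate are hypothetical names for illustration, not existing Jackrabbit API.

```java
import java.util.List;
import java.util.function.Predicate;

class ResultSizeEstimator {
    /**
     * Returns the exact number of readable hits if fewer than n are readable
     * (or the n-th readable hit is the last one); otherwise returns n plus the
     * number of hits that were never access-checked.
     */
    static long estimateSize(List<String> hits, Predicate<String> canRead, int n) {
        int accessible = 0;
        int checked = 0;
        // Run the (expensive) access check only until n readable hits are found.
        while (checked < hits.size() && accessible < n) {
            if (canRead.test(hits.get(checked))) {
                accessible++;
            }
            checked++;
        }
        // If every hit was checked, the count is exact. Otherwise accessible == n
        // and the remaining hits are counted unfiltered.
        return accessible + (hits.size() - checked);
    }
}
```

With six hits of which two are unreadable, a large N gives the exact size 4, while a small N returns an upper bound that still counts unreadable hits in the unchecked tail.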

>> The real problem is how to trigger result.close() which closes the index.
>> I'm not even sure if it causes problems if indexes are not closed as fast as
>> possible.
> Any Lucene experts around with more insight on this?

Well, it doesn't have an effect on other queries or the internals of the query
handler; it will simply hold resources for an unknown time. But anyway, there is
still the question of when exactly the result would be closed.

I would prefer the already mentioned approach where the query is
re-executed to get more results.

