jackrabbit-oak-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())
Date Tue, 11 Sep 2012 14:18:23 GMT
Hi,

I'm worried about queries that return a huge number of rows, for example 1
million nodes. If getSize() is supposed to return the correct result, it
could potentially take hours (about 2.8 hours when reading 100 nodes per
second). I'm more in favour of returning -1 if there are more than just a
few rows (for example 20, or 100), where 'few' is configurable (see OAK-300
for details). It should be configurable so that a client can decide how
long it is willing to wait for the answer.
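
To make the idea concrete, here is a minimal sketch of a size check that
gives up after a configurable number of rows. It is not Oak code: the class
BoundedSize, the method boundedSize and the parameter maxRows are
hypothetical names, and a real implementation would also have to keep the
rows it has already read so the client can still iterate over them.

```java
import java.util.Iterator;
import java.util.List;

public final class BoundedSize {

    private BoundedSize() {
    }

    /**
     * Counts rows until the iterator is exhausted or more than maxRows
     * rows have been seen. Returns the exact count if it is at most
     * maxRows, otherwise -1 ("unknown"). Hypothetical helper, not Oak API.
     */
    public static long boundedSize(Iterator<?> rows, long maxRows) {
        long count = 0;
        while (rows.hasNext()) {
            rows.next();
            count++;
            if (count > maxRows) {
                // Too many rows: stop reading and report "unknown".
                return -1;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 3 rows, limit 20: the exact size is cheap to compute.
        System.out.println(boundedSize(List.of("a", "b", "c").iterator(), 20));
        // 3 rows, limit 2: the limit is exceeded, so the result is -1.
        System.out.println(boundedSize(List.of("a", "b", "c").iterator(), 2));
    }
}
```

With this approach the worst case cost of the size check is bounded by
maxRows + 1 row reads, regardless of how large the result is.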

In particular, I'm worried about existing clients that call getSize()
every time.

>2. The client does need to know the size, so it calls getSize() and

I currently can't come up with a convincing use case - what is your use
case?


>has to iterate through all results if getSize() returns -1.

I would add: clients that call getSize() even though they don't really
need the result, or don't care much about it, and only use it to display
a 'next page' button or adjust the scrollbar.
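
As an illustration of why such clients don't need the size: a 'next page'
button only requires knowing whether at least one more row exists. The
sketch below assumes the client reads rows through a plain
java.util.Iterator; the class PageProbe and its members are hypothetical
names, not part of the JCR or Oak API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public final class PageProbe {

    /** One page of rows plus a flag telling whether more rows follow. */
    public static final class Page<T> {
        public final List<T> rows;
        public final boolean hasNext;

        Page(List<T> rows, boolean hasNext) {
            this.rows = rows;
            this.hasNext = hasNext;
        }
    }

    private PageProbe() {
    }

    /**
     * Reads at most pageSize rows, then asks the iterator whether another
     * row exists. The total result size is never computed.
     */
    public static <T> Page<T> readPage(Iterator<T> rows, int pageSize) {
        List<T> page = new ArrayList<>();
        while (page.size() < pageSize && rows.hasNext()) {
            page.add(rows.next());
        }
        return new Page<>(page, rows.hasNext());
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = List.of(1, 2, 3, 4, 5).iterator();
        Page<Integer> first = readPage(rows, 2);
        // The first page holds 2 rows and more rows follow.
        System.out.println(first.rows + " hasNext=" + first.hasNext);
    }
}
```

The cost is proportional to the page size, not to the size of the result.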

>The main problem I have with the -1 return value is that case 2
>becomes really annoying to handle.
>
>Instead I'd propose the following design:
>
>* The getSize() method always returns the size, by buffering all
>results in memory if necessary.

Buffering will only work up to some point. Past that point, the query has
to be re-executed.

>* A separate hasSize() method can be used to check if the size is
>quickly available (i.e. if getSize() will complete in O(1) time).

O(1), without reading all rows, is hard to achieve: first because of
access rights checks, and second because regular indexes cannot provide
this information in O(1) time (a counted b-tree can do it, but we don't
use one of those).

>PS. Note that implementing an "estimated size" feature like seen in
>many public search engines ("results 1-10 of thousands") is really
>difficult to implement in a manner that's both efficient and secure.

I agree.

Regards,
Thomas

