jackrabbit-oak-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject The infamous getSize() == -1 (Was: [jira] [Created] (OAK-300) Query: QueryResult.getRows().getSize())
Date Tue, 11 Sep 2012 10:08:11 GMT
Hi,

[moving this to oak-dev@ for a broader discussion]

On Tue, Sep 11, 2012 at 9:55 AM, Thomas Mueller (JIRA) <jira@apache.org> wrote:
> [...] For compatibility with Jackrabbit 2.0, and for ease of use, it would be good to
> have a clearly defined way to get the size of the result. [...]

I've always found the -1 return value from getSize() incredibly
annoying as it forces client code to use extra conditionals and jump
through extra hoops if the size turns out not to be available. There
are basically three potential scenarios:

1. The client doesn't need to know the size, so it never calls getSize().
2. The client does need to know the size, so it calls getSize() and
has to iterate through all results if getSize() returns -1.
3. The client could use the size (for UI, optimization, etc.), so it
calls getSize() and ignores the result if it's -1.

The main problem I have with the -1 return value is that case 2
becomes really annoying to handle.
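
To illustrate why case 2 is awkward, here is a minimal sketch of what
a client ends up writing under the -1 contract. SizedIterator is a
hypothetical stand-in for a JCR-style range iterator and wrap() is
only a test helper; neither is actual Jackrabbit or Oak code.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SizeFallback {

    /** Hypothetical stand-in for a result iterator whose getSize() may return -1. */
    public interface SizedIterator<T> extends Iterator<T> {
        long getSize();
    }

    /** Case 2: the client needs the size, so it counts by hand when getSize() is -1. */
    public static <T> long sizeOf(SizedIterator<T> it) {
        long size = it.getSize();
        if (size != -1) {
            return size;
        }
        // Size not available: iterate through all remaining results.
        long count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    /** Helper for the example: wraps a list and reports the given size (-1 means unknown). */
    public static <T> SizedIterator<T> wrap(List<T> items, long reportedSize) {
        Iterator<T> delegate = items.iterator();
        return new SizedIterator<T>() {
            @Override public boolean hasNext() { return delegate.hasNext(); }
            @Override public T next() { return delegate.next(); }
            @Override public long getSize() { return reportedSize; }
        };
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(wrap(Arrays.asList("a", "b", "c"), -1)));
        System.out.println(sizeOf(wrap(Arrays.asList("a", "b", "c"), 3)));
    }
}
```

Note that the fallback loop consumes the iterator, so a client that
also needs the results themselves has to buffer them on its own.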

Instead I'd propose the following design:

* The getSize() method always returns the size, by buffering all
results in memory if necessary.
* A separate hasSize() method can be used to check if the size is
quickly available (i.e. if getSize() will complete in O(1) time).

With such a design the above cases become easier to handle:

1. The client doesn't need to know the size, so it never calls getSize().
2. The client does need to know the size, so it calls getSize().
3. The client could use the size (for UI, optimization, etc.), so it
calls hasSize() and possibly follows up with getSize().
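
A rough sketch of how the proposed hasSize()/getSize() pair might
look, assuming the implementation wraps a plain iterator and buffers
the remaining results in memory on demand. BufferingResult and its
constructor are hypothetical names for illustration, not an existing
API, and reporting the total size (including results already
consumed) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class BufferingResult<T> implements Iterator<T> {

    private Iterator<T> source;
    private long knownSize;      // -1 while the size is not yet known
    private long consumed = 0;   // results already handed out via next()

    /** Pass knownSize = -1 if the underlying source cannot report its size cheaply. */
    public BufferingResult(Iterator<T> source, long knownSize) {
        this.source = source;
        this.knownSize = knownSize;
    }

    /** True if getSize() will complete in O(1) time. */
    public boolean hasSize() {
        return knownSize != -1;
    }

    /** Always returns the total size, buffering the remaining results if necessary. */
    public long getSize() {
        if (knownSize == -1) {
            List<T> buffer = new ArrayList<>();
            while (source.hasNext()) {
                buffer.add(source.next());
            }
            knownSize = consumed + buffer.size();
            source = buffer.iterator(); // keep serving results from the buffer
        }
        return knownSize;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T item = source.next();
        consumed++;
        return item;
    }

    public static void main(String[] args) {
        BufferingResult<String> result =
                new BufferingResult<>(Arrays.asList("a", "b", "c").iterator(), -1);
        if (result.hasSize()) {                    // case 3: use the size only if it is cheap
            System.out.println("cheap size: " + result.getSize());
        }
        System.out.println(result.getSize());      // case 2: just ask for the size
        System.out.println(result.next());         // results are still available afterwards
    }
}
```

In this sketch the client code for case 2 is a single getSize() call,
and the results remain iterable after the size has been computed.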

PS. Note that an "estimated size" feature like the one seen in many
public search engines ("results 1-10 of thousands") is really
difficult to implement in a manner that's both efficient and secure.
Public search engines can make such estimates efficiently since all
their content is public and they thus don't need to worry about
accidentally leaking sensitive information.

BR,

Jukka Zitting
