lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: CursorMarks and 'end of results'
Date Fri, 29 Jun 2018 15:42:46 GMT
bq. It basically cuts down the search time in half in the usual case
for us, so it's an important 'feature'.

Wait. You mean that the "extra" call to get back 0 rows doubles your
query time? That's surprising; tell us more.

How many requests does your "usual" use case make with CursorMark? My
off-the-cuff explanation would be that you usually get all the rows in
the first call, so the extra empty call is the second of only two.

CursorMark is intended to help with the "deep paging" problem, i.e.
where start=some_big_number, by allowing large result sets to be
returned in chunks, say through tens of thousands of rows. Part of our
puzzlement is that in that case the overhead of the last call is
minuscule compared to the rest.

There's no reason it can't be used for small result sets; those are
just usually handled by setting the start parameter. Up through a start
of, say, 1,000 or so, the extra overhead of start-based paging is
pretty unnoticeable. So my head was in the "what's the problem with
1 extra call after making the first 50?" frame of mind.

OTOH, if you make 100 successive calls to search with the CursorMark
and call 101 takes as long as the previous 100 combined, something's
horribly wrong.
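
To make the two stopping rules concrete, here is a minimal Python
sketch. It does not talk to a real Solr: make_fake_solr is a
hypothetical in-memory stand-in that mimics only one documented
behavior, namely that nextCursorMark equals the cursorMark you sent
once there is nothing left to return. The names fetch_all and
stop_on_short_page are made up for illustration, and stopping on a
short page is only safe under David's assumption that the index does
not change during the search.

```python
from typing import Callable, Dict, List, Tuple

Page = Tuple[List[Dict], str]  # (docs, nextCursorMark)


def make_fake_solr(num_docs: int) -> Callable[[str, int], Page]:
    """Hypothetical stand-in for a Solr query sorted on the uniqueKey."""
    docs = [{"id": i} for i in range(num_docs)]

    def fetch(cursor_mark: str, rows: int) -> Page:
        # Real cursor marks are opaque strings; here the mark is just an
        # offset so the example stays self-contained.
        start = 0 if cursor_mark == "*" else int(cursor_mark)
        page = docs[start:start + rows]
        # Mimic Solr: with no docs left, the same mark comes back.
        next_mark = str(start + len(page)) if page else cursor_mark
        return page, next_mark

    return fetch


def fetch_all(fetch: Callable[[str, int], Page], rows: int,
              stop_on_short_page: bool) -> Tuple[List[Dict], int]:
    """Page through all results; return (docs, number of requests made)."""
    cursor = "*"
    docs: List[Dict] = []
    requests = 0
    while True:
        page, next_cursor = fetch(cursor, rows)
        requests += 1
        docs.extend(page)
        # General rule from the docs: stop when the mark stops changing.
        if next_cursor == cursor:
            break
        # Optional shortcut: a short page means nothing more will come,
        # assuming no documents are added while paging.
        if stop_on_short_page and len(page) < rows:
            break
        cursor = next_cursor
    return docs, requests
```

With 5 matching docs and rows=10, the general rule makes 2 requests
and the short-page rule makes 1, which would explain a halved search
time if most of your queries fit in one page. With 20 docs and rows=10
both rules make 3 requests, since the last full page cannot be told
apart from a middle one.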

Best,
Erick


On Fri, Jun 29, 2018 at 4:01 AM, David Frese
<david.frese@active-group.de> wrote:
> On 22.06.18 at 02:37, Chris Hostetter wrote:
>>
>>
>> : the documentation of 'cursorMarks' recommends fetching until a query
>> : returns the cursorMark that was passed in to the request.
>> :
>> : But that always requires an additional request at the end, so I wonder
>> : if I can stop already if a request returns fewer results than requested
>> : (num rows). There won't be new documents added during the search in my
>> : use case, so could there ever be a non-empty 'page' after a non-full
>> : 'page'?
>>
>> You could stop then -- if that fits your use case -- but the documentation
>> (in particular the sentence you are referring to) is trying to be as
>> straightforward and general as possible ... which includes the use case
>> where someone is "tailing" an index and documents may be continually
>> added.
>>
>> When originally writing those docs, I did have a bit in there about
>> *either* getting back fewer than "rows" docs *or* getting back the same
>> cursor you passed in (to try to cover both use cases as efficiently as
>> possible) but it seemed more confusing -- and I was worried people might
>> be surprised/confused when the number of docs was perfectly divisible by
>> "rows", so the "fewer than rows" case could still wind up in a final
>> request that returned "0" docs.
>>
>> the current docs seemed like a good balance between brevity & clarity,
>> with the added bonus of being correct :)
>>
>> But as Anshum said: if you have suggested improvements for rewording,
>> patches/PRs are certainly welcome.  It's hard to have a good perspective
>> on what docs are helpful to new users when you have been working with the
>> software for 14 years and wrote the code in question.
>
>
> Thank you very much for the clarification.
>
> It basically cuts down the search time in half in the usual case for us, so
> it's an important 'feature'.
>
>
> --
> David Frese
> +49 7071 70896 75
>
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Court of registration: Amtsgericht Stuttgart, HRB 224404
> Managing director: Dr. Michael Sperber
