lucene-solr-user mailing list archives

From S G <sg.online.em...@gmail.com>
Subject Re: Why are cursor mark queries recommended over regular start, rows combination?
Date Wed, 14 Mar 2018 16:20:39 GMT
Thanks everybody. This is a lot of good information.
And we should try to update this in the documentation too to help users
make the right choice.
I can take a stab at this if someone can point me to how to update the
documentation.

Thanks
SG


On Tue, Mar 13, 2018 at 2:04 PM, Chris Hostetter <hossman_lucene@fucit.org>
wrote:

>
> : > 3) Lastly, the role of the export handler is not clear. It seems that
> : > the export handler would also have to do exactly the same kind of
> : > thing as start=0 and rows=1,000,000. And that again means bad
> : > performance.
>
> : <3> First, streaming requests can only return docValues="true"
> : fields. Second, most streaming operations require sorting on something
> : besides score. Within those constraints, streaming will be _much_
> : faster and more efficient than cursorMark. Without tuning I saw 200K
> : rows/second returned for streaming; the bottleneck will be the speed
> : that the client can read from the network. First of all you only
> : execute one query rather than one query per N rows. Second, in the
> : cursorMark case, to return a document you and assuming that any field
> : you return is docValues=false
>
> Just to clarify, there is a big difference between the /export handler
> and "streaming expressions".
>
> Unless something has changed drastically in the past few releases, the
> /export handler does *NOT* support exporting a full *collection* in solr
> cloud -- it only operates on an individual core (aka: shard/replica).
>
> Streaming expressions is a feature that does work in Cloud mode, and can
> make calls to the /export handler on a replica of each shard in order to
> process the data of an entire collection -- but when doing so it has to
> aggregate *ALL* the results from every shard in memory on the
> coordinating node -- meaning that (in addition to the docvalues caveat)
> streaming expressions requires you to "spend" a lot of RAM on one
> node as a trade off for spending more time & multiple requests to get the
> same data from cursorMark...
>
> https://lucene.apache.org/solr/guide/exporting-result-sets.html
> https://lucene.apache.org/solr/guide/streaming-expressions.html
>
> An additional perk of cursorMark that may be relevant to the OP is that
> you can "stop" tailing a cursor at any time (ie: if you're post processing
> the results client side and decide you have "enough" results) but a similar
> feature isn't available (AFAICT) from streaming expressions...
>
> https://lucene.apache.org/solr/guide/pagination-of-results.html#tailing-a-cursor
>
>
> -Hoss
> http://www.lucidworks.com/
>
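
For anyone comparing the two request shapes Hoss contrasts above, here is a rough sketch that only builds the URLs: one for the /export handler on a single core, and one for a search() streaming expression sent to a collection's /stream handler. The base URL, core name, collection name, and the `id` field are all assumptions for illustration, and any field used in `fl` or `sort` would need docValues enabled.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # hypothetical base URL


def export_url(core, q="*:*", sort="id asc", fl="id"):
    """URL for the /export handler on one core (a single shard replica)."""
    query = urlencode({"q": q, "sort": sort, "fl": fl})
    return f"{SOLR}/{core}/export?{query}"


def stream_url(collection, q="*:*", sort="id asc", fl="id"):
    """URL for a search() streaming expression that reads each shard via /export."""
    expr = f'search({collection}, q="{q}", fl="{fl}", sort="{sort}", qt="/export")'
    query = urlencode({"expr": expr})
    return f"{SOLR}/{collection}/stream?{query}"


if __name__ == "__main__":
    # Hypothetical core and collection names.
    print(export_url("techproducts_shard1_replica_n1"))
    print(stream_url("techproducts"))
```

The first URL addresses one core directly; the second addresses the collection and leaves it to the streaming expression to contact a replica of each shard.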

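The "stop at any time" point about cursorMark can be sketched as a small client-side loop. This is a minimal illustration, not an official client: the base URL and collection are hypothetical, the uniqueKey field is assumed to be `id` (the sort has to include the uniqueKey), and `fetch` is injectable so the loop can be tried without a running Solr.

```python
import json
from itertools import islice
from urllib.parse import urlencode
from urllib.request import urlopen


def solr_fetch(params, base="http://localhost:8983/solr/techproducts/select"):
    """Run one /select request and return the parsed JSON (base URL is hypothetical)."""
    with urlopen(f"{base}?{urlencode(params)}") as resp:
        return json.load(resp)


def iter_cursor(fetch=solr_fetch, q="*:*", sort="id asc", rows=100):
    """Yield documents one page at a time using cursorMark."""
    cursor = "*"  # "*" starts a new cursor
    while True:
        page = fetch({"q": q, "sort": sort, "rows": rows,
                      "cursorMark": cursor, "wt": "json"})
        yield from page["response"]["docs"]
        next_cursor = page["nextCursorMark"]
        if next_cursor == cursor:  # unchanged mark means no more results
            return
        cursor = next_cursor


def first_n(n, **kwargs):
    """Take at most n documents, then simply stop asking for more pages."""
    return list(islice(iter_cursor(**kwargs), n))
```

Because each page is a separate request, abandoning the generator (here via `islice`, or a plain `break` in a `for` loop) just means no further requests are sent; there is nothing to cancel on the server.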