lucene-solr-user mailing list archives

From Nawab Zada Asad Iqbal <khi...@gmail.com>
Subject Re: Request Highlighting only for the final set of rows
Date Fri, 18 Aug 2017 17:57:27 GMT
Actually, I realize that it is an incorrect use on my part to pass only
id+score in fl and specify more fields in hl.fl. This was
somehow supported in older versions, but the new behavior is actually a
performance improvement for the scenario when the user is asking for only ids.


Nawab

On Fri, Aug 18, 2017 at 8:33 AM, Nawab Zada Asad Iqbal <khichi@gmail.com>
wrote:

> Thanks, Erick, for pointing me to a better option. I will explore that. After
> your email, I found that if I specify 'fl=*' in the query, then it
> does the right thing (a two-pass process). However, my queries had
> 'fl=id+score' (or sometimes fl=id&fl=score), and in both of these cases I found
> that the shards are asked to highlight all the results on the first
> request (and there is no second request).
>
> The fl=* query (in my sample case) finishes in 100 msec, while the same
> query with fl=id+score finishes in 1200 msec.
>
> Here are the two queries:
>
> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=*&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>
>
> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=id&fl=score&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>
>
> Thanks
> Nawab
>
>
>
>
> On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> I don't think you're reading it correctly. First of all, if you're
>> going to be doing deep paging you should be using cursorMark, see:
>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>>
>> Second, it's a two-pass process if you don't use cursorMark. The first
>> pass gets the candidate docs from each shard. But all it returns is
>> the ID and sort criteria. Then the aggregator node gets the _true_ top
>> N after sorting all the lists from each shard and issues a second
>> request for _only_ those docs that have made the top N from each sub
>> shard, and those should be the only ones highlighted.
>>
>> Do you have any evidence to the contrary that they're all being
>> highlighted? Or are you misinterpreting the log message for the first
>> pass?
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal <khichi@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > In a multi-node Solr installation (without SolrCloud), during a paging
>> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200 rows
>> > from each shard. If highlighting is ON, then the primary node asks each
>> > shard to highlight all 1200 results, which doesn't scale
>> > well. Is there a way to break the shard query into two steps, e.g. ask for the
>> > 1200 rows and, after sorting the 1200 responses from each shard and finding
>> > the final rows to return (1001 to 1200), issue another query to the shards
>> > asking for highlighted responses for only the relevant docs?
>> >
>> >
>> >
>> > Thanks
>> > Nawab
>>
>
>
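
The two-pass process Erick describes in the thread can be sketched roughly as
below. This is a toy in-memory simulation, not Solr code: the shards are plain
dicts of doc id to score, and the function names (first_pass, merge_window,
second_pass) are hypothetical names chosen for illustration. It only shows why
the second request touches far fewer documents than the first.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical stand-in for a shard: doc id -> score. Not a Solr data structure.
Shard = Dict[str, float]


def first_pass(shard: Shard, start: int, rows: int) -> List[Tuple[float, str]]:
    """Pass 1: each shard returns only (score, id) for its top start+rows docs."""
    candidates = ((score, doc_id) for doc_id, score in shard.items())
    return heapq.nlargest(start + rows, candidates)


def merge_window(per_shard: List[List[Tuple[float, str]]],
                 start: int, rows: int) -> List[str]:
    """Aggregator: merge every shard's list and keep the true global window."""
    merged = sorted((hit for hits in per_shard for hit in hits), reverse=True)
    return [doc_id for _, doc_id in merged[start:start + rows]]


def second_pass(shards: List[Shard], wanted: List[str]) -> Dict[int, List[str]]:
    """Pass 2: each shard is asked to highlight only the ids it owns."""
    return {i: sorted(d for d in wanted if d in shard)
            for i, shard in enumerate(shards)}


if __name__ == "__main__":
    # Three toy shards with ten docs each and distinct scores.
    shards = [{f"s{i}-d{j}": float(j * 3 + i) for j in range(10)}
              for i in range(3)]
    start, rows = 4, 2
    candidates = [first_pass(s, start, rows) for s in shards]
    window = merge_window(candidates, start, rows)
    print("candidates considered:", sum(len(c) for c in candidates))
    print("docs to highlight:", second_pass(shards, window))
```

With start=1000 and rows=200 as in the original question, the first pass would
carry 1200 id/score pairs per shard, while the second pass would ask for
highlighting on at most 200 documents in total.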
