lucene-solr-user mailing list archives

From Nawab Zada Asad Iqbal <khi...@gmail.com>
Subject Re: Request Highlighting only for the final set of rows
Date Fri, 18 Aug 2017 18:07:04 GMT
Actually, part of me is thinking that there are valid use cases for having
fl and hl.fl with different values, e.g., receiving name etc. in “clean” form
via fl and receiving both name and address in HTML-formatted form (by
specifying them in hl.fl).
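
A minimal sketch of what such a request could look like, assuming the
hypothetical field names "name" and "address" and reusing the dev host from
the queries quoted below. It only builds the URL; whether a given Solr
version highlights fields that are missing from fl is exactly the open
question in this thread, so treat it as an illustration, not a recipe.

```python
from urllib.parse import urlencode


def build_select_url(base, query, fl, hl_fl, start=0, rows=10):
    """Build a /select URL whose fl and hl.fl lists differ."""
    params = [
        ("q", query),
        ("fl", ",".join(fl)),        # fields returned in "clean" form
        ("hl", "on"),
        ("hl.fl", ",".join(hl_fl)),  # fields returned as highlighted snippets
        ("start", start),
        ("rows", rows),
        ("wt", "json"),
    ]
    return base + "/select?" + urlencode(params)


# Field names are assumptions made for illustration only.
url = build_select_url(
    "http://solrdev.test.net:8984/solr/filesearch",
    "nawab",
    fl=["id", "score", "name"],
    hl_fl=["name", "address"],
)
print(url)
```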


On Fri, Aug 18, 2017 at 10:57 AM, Nawab Zada Asad Iqbal <khichi@gmail.com>
wrote:

> Actually, I realize that it is incorrect use on my part to pass only
> id+score in fl and specify more fields in hl.fl. This was somehow
> supported in older versions, but the new behavior is actually a
> performance improvement for the scenario when the user is asking for only
> ids.
>
>
> Nawab
>
> On Fri, Aug 18, 2017 at 8:33 AM, Nawab Zada Asad Iqbal <khichi@gmail.com>
> wrote:
>
>> Thanks, Erick, for pointing to the better option. I will explore that.
>> After your email, I found that if I specify 'fl=*' in the query then
>> it does the right thing (a two-pass process). However, my queries had
>> 'fl=id+score' (or sometimes fl=id&fl=score); in both of these cases I found
>> that the shards are asked to highlight all the results on the first
>> request (and there is no second request).
>>
>> The fl=* query finishes (in my sample case) in 100 msec, while the same
>> query with fl=id+score finishes in 1200 msec.
>>
>> Here are the two queries:
>>
>> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=*&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>>
>>
>> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=id&fl=score&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>>
>>
>> Thanks
>> Nawab
>>
>>
>>
>>
>> On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson <erickerickson@gmail.com>
>> wrote:
>>
>>> I don't think you're reading it correctly. First of all, if you're
>>> going to be doing deep paging you should be using cursorMark, see:
>>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>>>
>>> Second, it's a two-pass process if you don't use cursorMark. The first
>>> pass gets the candidate docs from each shard. But all it returns is
>>> the ID and sort criteria. Then the aggregator node gets the _true_ top
>>> N after sorting all the lists from each shard and issues a second
>>> request for _only_ those docs that have made the top N from each sub
>>> shard, and those should be the only ones highlighted.
>>>
>>> Do you have any evidence to the contrary that they're all being
>>> highlighted? Or are you misinterpreting the log message for the first
>>> pass?
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal <khichi@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > In a multi-node solr installation (without SolrCloud), during a paging
>>> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
>>> > rows from each shard. If highlighting is ON, then the primary node is
>>> > asking for highlighting all the 1200 results from each shard, which
>>> > doesn't scale well. Is there a way to break the shard query in two steps
>>> > e.g. ask for the 1200 rows and after sorting the 1200 responses from
>>> > each shard and finding final rows to return (1001 to 1200), issue
>>> > another query to shards for asking highlighted response for the
>>> > relevant docs?
>>> >
>>> >
>>> >
>>> > Thanks
>>> > Nawab
>>>
>>
>>
>
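
The two-pass flow Erick describes in the quoted thread can be sketched as a
small in-memory simulation. This is not Solr code: the shard data, the
highlight() stand-in, and all function names are hypothetical, and it only
shows why the second pass touches just the final page of documents.

```python
def highlight(doc, term):
    """Stand-in for real highlighting: wrap the term in <em> tags."""
    return doc["text"].replace(term, "<em>" + term + "</em>")


def first_pass(shard_docs, start, rows):
    """Each shard returns only (score, id) for its top start+rows docs."""
    top = sorted(shard_docs, key=lambda d: d["score"], reverse=True)
    return [(d["score"], d["id"]) for d in top[: start + rows]]


def merge_page(per_shard_lists, start, rows):
    """Aggregator: merge all shard lists and keep the requested page."""
    merged = sorted(
        (item for lst in per_shard_lists for item in lst),
        key=lambda t: (-t[0], t[1]),  # score descending, id as tie-breaker
    )
    return [doc_id for _, doc_id in merged[start : start + rows]]


def second_pass(shards, page_ids, term):
    """Fetch and highlight only the docs that made the final page."""
    wanted = set(page_ids)
    by_id = {}
    for shard_docs in shards:
        for d in shard_docs:
            if d["id"] in wanted:
                by_id[d["id"]] = highlight(d, term)
    return [(i, by_id[i]) for i in page_ids]


# Two tiny hypothetical shards.
shards = [
    [{"id": "a", "score": 3.0, "text": "nawab one"},
     {"id": "b", "score": 1.0, "text": "two nawab"}],
    [{"id": "c", "score": 2.0, "text": "nawab three"},
     {"id": "d", "score": 0.5, "text": "four"}],
]
start, rows = 1, 2
candidates = [first_pass(s, start, rows) for s in shards]
page_ids = merge_page(candidates, start, rows)
highlighted = second_pass(shards, page_ids, "nawab")
print(page_ids, highlighted)
```

With start=1000 and rows=200 the first pass would still collect 1200
(score, id) pairs per shard, but only the 200 documents on the final page
would be highlighted.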
