lucene-solr-user mailing list archives

From Modassar Ather <modather1...@gmail.com>
Subject Re: Search results differs with sorting on pagination.
Date Thu, 10 Sep 2015 11:48:50 GMT
> If two documents come back from different
> shards with the same score, the order would not be predictable

This is fine.

What I am not able to understand is this: when I do not give a secondary
sort field, all the results in a response come from a single shard, and
that shard changes from one hit to the next. E.g. in the first hit all
the results are from shard1, and in the next hit all the results are from
shard2.

But when I add the secondary sort field, I see results from multiple
shards in the same response, e.g. from both shard1 and shard2. This does
not change across multiple hits.

So please help me understand why the same result merging and aggregation
is not happening when a single sort field is given.

Regards,
Modassar
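
For reference, here is a minimal Python sketch of the tie behaviour
described above. It is not Solr's merge code; it only assumes, as
Upayavira guessed, that documents tied on the sort key keep the order in
which their shard's response arrived. The names SHARDS, merged_page,
primary_only and with_tiebreak, and the sample documents, are made up
for illustration.

```python
from itertools import chain

# Hypothetical sample data: every document has the same f_sort value,
# so the primary sort key alone cannot order them.
SHARDS = {
    "shard1": [{"id": "a1", "f_sort": "x", "shard": "shard1"},
               {"id": "a3", "f_sort": "x", "shard": "shard1"}],
    "shard2": [{"id": "a2", "f_sort": "x", "shard": "shard2"},
               {"id": "a4", "f_sort": "x", "shard": "shard2"}],
}


def merged_page(arrival_order, sort_key, rows=2):
    """Concatenate shard responses in arrival order, sort, keep one page.

    Python's sorted() is stable, so documents that tie on sort_key stay
    in arrival order. That stands in for the assumed "whichever shard
    responds first" effect; it is an illustration, not Solr's algorithm.
    """
    responses = chain.from_iterable(SHARDS[name] for name in arrival_order)
    return sorted(responses, key=sort_key)[:rows]


def primary_only(doc):
    # Like sort=f_sort asc: ties are left unresolved.
    return doc["f_sort"]


def with_tiebreak(doc):
    # Like sort=f_sort asc,id asc: the unique id resolves every tie.
    return (doc["f_sort"], doc["id"])


if __name__ == "__main__":
    for order in (["shard1", "shard2"], ["shard2", "shard1"]):
        for key in (primary_only, with_tiebreak):
            ids = [d["id"] for d in merged_page(order, key)]
            print(order, key.__name__, ids)
```

With primary_only, the page follows whichever shard is listed first in
arrival_order, so it comes entirely from one shard. With with_tiebreak,
the page is the same for either order and mixes both shards.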



On Thu, Sep 10, 2015 at 5:03 PM, Upayavira <uv@odoko.co.uk> wrote:

> What scores are you getting? If two documents come back from different
> shards with the same score, the order would not be predictable -
> probably down to which shard responds first.
>
> Fix it with something like sort=score,timestamp or some other time
> related field.
>
> Upayavira
>
> On Thu, Sep 10, 2015, at 11:01 AM, Modassar Ather wrote:
> > To add to my previous observation: when the secondary sort field is
> > added, the response has results from multiple shards, and they remain
> > the same across hits.
> > Kindly help me understand this behavior. Why are the results changing?
> > As I understand it, the results should first be combined from all
> > shards and then sorted by score.
> > But here I see that every time I hit the sort query I get results from
> > a different shard, with different scores.
> >
> > Thanks,
> > Modassar
> >
> > On Thu, Sep 10, 2015 at 2:59 PM, Modassar Ather <modather1981@gmail.com>
> > wrote:
> >
> > > Upayavira! I added fl=id,score,[shard] and saw the shard changing in
> > > the response every time. The response differs between shards, but for
> > > the same shard the result is the same across multiple hits.
> > > When I add a secondary sort field, e.g. score, the shard remains the
> > > same across hits.
> > >
> > > On Thu, Sep 10, 2015 at 12:52 PM, Upayavira <uv@odoko.co.uk> wrote:
> > >
> > >> Add fl=id,score,[shard] to your query, and show us the results of two
> > >> differing executions.
> > >>
> > >> Perhaps we will be able to see the cause of the difference.
> > >>
> > >> Upayavira
> > >>
> > >> On Thu, Sep 10, 2015, at 05:35 AM, Modassar Ather wrote:
> > >> > Thanks Erick. There are no replicas on my cluster and the indexing
> > >> > is done only once. No updates or additions are made to the index,
> > >> > and the segments are optimized at the end of indexing.
> > >> > So is adding a secondary sort criterion the only solution for such
> > >> > an issue in sort?
> > >> >
> > >> > Regards,
> > >> > Modassar
> > >> >
> > >> > On Wed, Sep 9, 2015 at 8:21 PM, Erick Erickson
> > >> > <erickerickson@gmail.com> wrote:
> > >> >
> > >> > > When the primary sort criteria is identical for two documents,
> > >> > > then the _internal_ Lucene document ID is used to break the
> > >> > > tie. The internal ID for two docs can be not only different, but
> > >> > > in different _order_ on two separate shards. I'm assuming here
> > >> > > that each of your shards has multiple replicas and/or you're
> > >> > > continuing to index to your cluster.
> > >> > >
> > >> > > The internal doc IDs may change, even relative to each other,
> > >> > > when segments get merged.
> > >> > >
> > >> > > So yes, if you are sorting by something that can be identical
> > >> > > in documents, it's always best to specify a secondary sort
> > >> > > criteria. It's not referenced unless there's a tie so it's
> > >> > > not that expensive. People often use whatever field
> > >> > > is defined for <uniqueKey> since that's _guaranteed_ to
> > >> > > never be the same for two docs.
> > >> > >
> > >> > > Best,
> > >> > > Erick
> > >> > >
> > >> > > On Wed, Sep 9, 2015 at 1:45 AM, Modassar Ather
> > >> > > <modather1981@gmail.com> wrote:
> > >> > > > Hi,
> > >> > > >
> > >> > > > Search results change every time the following query is hit.
> > >> > > > Please note that it is a 7-shard cluster of Solr-5.2.1.
> > >> > > >
> > >> > > > Query: q=network&start=50&rows=50&sort=f_sort asc&group=true&group.field=id
> > >> > > >
> > >> > > > Following are the fields and their types in my schema.xml.
> > >> > > >
> > >> > > > <fieldType name="string" class="solr.StrField" sortMissingLast="true"
> > >> > > >   stored="false" omitNorms="true"/>
> > >> > > > <fieldType name="string_dv" class="solr.StrField" sortMissingLast="true"
> > >> > > >   stored="false" indexed="true" docValues="true"/>
> > >> > > >
> > >> > > > <field name="id" type="string" stored="true"/>
> > >> > > > <dynamicField name="*_sort" type="string_dv"/>
> > >> > > >
> > >> > > > As per my understanding this seems to be an issue of ties among
> > >> > > > the documents, because when I added a new sort field as below,
> > >> > > > the result never changed across multiple hits.
> > >> > > > q=network&start=50&rows=50&sort=f_sort asc,score asc&group=true&group.field=id
> > >> > > >
> > >> > > > Kindly let me know if this is an issue or how this can be fixed.
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Modassar
> > >> > >
> > >>
> > >
> > >
>
