lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: inconsistent result count when doing paging
Date Thu, 09 Feb 2017 15:35:12 GMT
On 2/8/2017 9:35 PM, cmti95035 wrote:
> I noticed in our production environment that the returned result count is
> inconsistent when doing paging.
>
> For example, for a certain query, for the first page (start = 0, rows = 30),
> the corresponding "numFound" is 3402; and then it returned 3378, 3361 for
> the 2nd and 3rd page, respectively (start = 30, 60 respectively). A sample
> query looks like the following:
> q:TMCN:(美丽 OR ?美丽 OR 美丽? OR 丽美)
> raw query parameters:
> fl=*&start=60&rows=30&shards=172.10.10.3:9080/solr/tm01,172.10.10.3:9080
<snip>
> /solr/tm44,172.10.10.3:9080/solr/tm45&facet=true&facet.missing=false&facet.field=intCls&facet.field=appDate&facet.field=TMStatus
>
> The query was against multiple shards at a time. With limited tries I
> noticed that the returned count is consistent if the number of shards is
> fewer than 5.

When a distributed search returns different numFound values on different
requests for the same query, it almost always means that your uniqueKey
field is not unique between the different shards -- you have documents
using the same uniqueKey value in more than one shard.

The counts differ because Solr removes duplicates from the combined
results before calculating the total, and the number of duplicates it
sees can vary from one request to the next, depending on which shards
get their results back to the coordinating node first.  When you reduce
the number of shards, you are probably removing the shards that contain
the duplicate documents from the list, so the problem doesn't happen.
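To make the effect concrete, here is a toy model in Python.  It is not
Solr's actual merge code; ShardResponse and merged_num_found are
hypothetical names, and the model simply assumes that the coordinator
sums the per-shard counts and subtracts one for every duplicate
uniqueKey it happens to see among the documents the shards returned.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShardResponse:
    """Hypothetical stand-in for what one shard sends the coordinator."""
    num_found: int   # hits the shard counted locally
    ids: List[str]   # uniqueKey values of the documents it returned


def merged_num_found(responses: List[ShardResponse]) -> int:
    """Sum the per-shard counts, minus duplicates visible in the returned ids."""
    total = sum(r.num_found for r in responses)
    seen = set()
    for r in responses:
        for doc_id in r.ids:
            if doc_id in seen:
                total -= 1   # uniqueKey already seen in an earlier response
            else:
                seen.add(doc_id)
    return total


# Same two shards, same local counts; only the returned documents differ.
overlapping = [ShardResponse(100, ["d1", "d2"]), ShardResponse(50, ["d2", "d3"])]
disjoint = [ShardResponse(100, ["d1", "d2"]), ShardResponse(50, ["d3", "d4"])]
```

Under this assumption the total depends on which documents end up in
the merged set, so the same query can report different totals on
different requests even though no shard's own count changed.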

It is *critical* that the uniqueKey field remains unique across the
entire distributed index.  Using SolrCloud with *fully* automatic
document routing will typically ensure that everything is unique across
the entire collection, but in other situations, making sure this happens
will be up to you.
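If you want to check for this yourself, a rough sketch in Python
follows.  It assumes you have already collected the uniqueKey values
from each shard separately (for instance by querying each core directly
with distrib=false); that fetching step is left out, and
find_cross_shard_duplicates is a hypothetical helper, not part of Solr.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_cross_shard_duplicates(
    ids_by_shard: Dict[str, Iterable[str]]
) -> Dict[str, List[str]]:
    """Map each uniqueKey found on more than one shard to those shards."""
    shards_by_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_by_id[doc_id].add(shard)
    # Keep only the keys that live on two or more shards.
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_by_id.items()
        if len(shards) > 1
    }


# Made-up ids; the shard names are borrowed from the query above.
example = find_cross_shard_duplicates({
    "tm01": ["a", "b"],
    "tm44": ["b", "c"],
    "tm45": ["c", "d"],
})
```

Any key that shows up in the result exists on more than one shard and
needs to be removed from all but one of them.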

Thanks,
Shawn

