lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Solr v3.5.0 - numFound changes when paging through results on 8-shard cluster
Date Tue, 19 Jun 2012 21:40:43 GMT
: Confirming that there are no active records being written, the "numFound"
: value is decreasing as we page through the results.

1) check that the "clones" of each shard are in fact identical (just look 
at the index files on each machine and make sure they are the same).

2) distributed searching relies heavily on using a uniqueKey, and can 
behave oddly if documents with identical keys exist in multiple shards.

http://wiki.apache.org/solr/DistributedSearch?#Distributed_Searching_Limitations
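
For point 1, the comparison of index files can be scripted. The following is a minimal sketch, assuming each clone's index directory has been copied or mounted somewhere the script can read it. The function names are hypothetical, and a file that legitimately differs between machines (a lock file, for instance) would show up as a false positive.

```python
import hashlib
from pathlib import Path


def index_fingerprint(index_dir):
    """Map each file name in index_dir to (size in bytes, SHA-256 hex digest)."""
    result = {}
    for path in sorted(Path(index_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as fh:
                # Read in 1 MiB chunks so large segment files are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            result[path.name] = (path.stat().st_size, digest.hexdigest())
    return result


def diff_indexes(dir_a, dir_b):
    """Return the names of files that are missing from one copy or differ."""
    a = index_fingerprint(dir_a)
    b = index_fingerprint(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from diff_indexes means the two copies hold the same files with the same contents.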

If I remember correctly, what you are describing sounds like one of the 
things that can happen if you violate the uniqueKey rule across different 
shards when indexing.
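
One way to test for a uniqueKey violation is to collect the key values held by each shard and look for any key that shows up in more than one of them. The sketch below assumes the keys have already been exported per shard (for example by querying each shard on its own and returning only the key field). The function name and the shard labels are hypothetical.

```python
from collections import defaultdict


def find_cross_shard_duplicates(ids_by_shard):
    """Return {key: sorted shard names} for every key found in more than one shard.

    ids_by_shard maps a shard label to an iterable of uniqueKey values.
    """
    shards_by_key = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for key in ids:
            shards_by_key[key].add(shard)
    return {key: sorted(shards) for key, shards in shards_by_key.items() if len(shards) > 1}
```

A non-empty result lists the offending keys together with the shards that each one lives in.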

I *think* what you are seeing is that in the distributed request for 
page #1 the coordinator sums up the numFound from all shards, and merges 
results 1-$rows according to the sort. Likewise for pages 2 & 3. When you 
get to page #4, it suddenly sees that doc#9876543 is included in the 
responses from 3 different shards, and it subtracts 2 from the numFound, 
and so on as you page farther through the results. The more documents with 
duplicate uniqueKeys it finds in the results as it pages through, the lower 
the cumulative numFound gets.
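
The behaviour described above can be reproduced with a toy model. This is a simplified sketch and not Solr's actual merge code. It assumes every shard is asked for its top start+rows documents, that the coordinator sums the per-shard counts, and that it subtracts 1 each time it meets a key it has already seen. The function name and the sample data are hypothetical.

```python
def distributed_page(shards, start, rows):
    """Simulate one page of a distributed request.

    shards maps a shard label to a list of (sort_value, unique_key) tuples.
    Returns (num_found, keys_on_this_page).
    """
    num_found = 0
    seen = set()
    merged = []
    for docs in shards.values():
        # Each shard reports its full hit count...
        num_found += len(docs)
        # ...but only returns its top start+rows documents.
        for sort_value, key in sorted(docs)[: start + rows]:
            if key in seen:
                # Duplicate uniqueKey from another shard: drop it and adjust the total.
                num_found -= 1
                continue
            seen.add(key)
            merged.append((sort_value, key))
    merged.sort()
    return num_found, [key for _, key in merged[start : start + rows]]


# "dup" is indexed in all three shards and sorts last.
EXAMPLE = {
    "shard1": [(1, "a1"), (2, "a2"), (9, "dup")],
    "shard2": [(3, "b1"), (4, "b2"), (9, "dup")],
    "shard3": [(5, "c1"), (6, "c2"), (9, "dup")],
}

print(distributed_page(EXAMPLE, 0, 2))  # (9, ['a1', 'a2'])
print(distributed_page(EXAMPLE, 2, 2))  # (7, ['b1', 'b2'])
```

On the first page no shard returns "dup", so the total is the plain sum of 9. On the second page all three shards return it, two copies are discarded, and the total drops to 7.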

: For example,
: Page1 - numFound = 3683
: Page2 - numFound = 3683
: Page3 - numFound = 3683
: Page4 - numFound = 2866
: Page5 - numFound = 2419
: Page5 - numFound = 1898
: Page6 - numFound = 1898
: ...
: PageN - numFound = 1898

-Hoss
