lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: numFound is changing when querying across distributed search with the same query.
Date Sun, 03 Jan 2010 01:32:55 GMT
The current distributed search design assumes that all document ids
are unique across the set of cores. If you have duplicates, you're on
your own.
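
To illustrate why numFound can vary, here is a minimal sketch in Python.
It is not Solr's actual merge code: the function name
merge_shard_results and the response shape (a dict with "numFound" and
"docs") are hypothetical, chosen only for illustration. It assumes the
merge step sums the per-shard counts and subtracts one for each
duplicate id it happens to see among the returned documents.

```python
def merge_shard_results(shard_responses, rows):
    """Merge per-shard results, keeping one document per unique id.

    shard_responses: list of (shard_name, response) pairs, where each
    response is a dict {"numFound": int, "docs": [{"id", "score"}, ...]}.
    This shape is an assumption made for the sketch.
    """
    # Start from the sum of what every shard claims to have matched.
    num_found = sum(resp["numFound"] for _, resp in shard_responses)
    seen_ids = set()
    merged = []
    for shard, resp in shard_responses:
        for doc in resp["docs"]:
            if doc["id"] in seen_ids:
                # Duplicate id: keep the first copy, adjust the count.
                num_found -= 1
                continue
            seen_ids.add(doc["id"])
            merged.append({**doc, "shard": shard})
    # Highest score first, then trim to the requested page size.
    merged.sort(key=lambda d: d["score"], reverse=True)
    return {"numFound": num_found, "docs": merged[:rows]}


# Both shards hold a document with id "2".
full = [
    ("shard_a", {"numFound": 2, "docs": [{"id": "1", "score": 0.9},
                                         {"id": "2", "score": 0.5}]}),
    ("shard_b", {"numFound": 2, "docs": [{"id": "2", "score": 0.7},
                                         {"id": "3", "score": 0.4}]}),
]
# Same index contents, but each shard returns only its top document.
top_one = [
    ("shard_a", {"numFound": 2, "docs": [{"id": "1", "score": 0.9}]}),
    ("shard_b", {"numFound": 2, "docs": [{"id": "2", "score": 0.7}]}),
]

print(merge_shard_results(full, rows=10)["numFound"])    # 3
print(merge_shard_results(top_one, rows=1)["numFound"])  # 4
```

In this sketch a duplicate is only noticed when both copies appear in
the documents the shards return, so the reported count depends on how
many rows were fetched, not just on the query.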

On Fri, Jan 1, 2010 at 7:10 AM, Yonik Seeley <yonik@lucidimagination.com> wrote:
> On Thu, Dec 31, 2009 at 10:26 PM, Chris Hostetter
> <hossman_lucene@fucit.org> wrote:
>> why do we bother detecting/removing the duplicates?
>>
>> strictly speaking, docs with duplicate IDs on multiple shards are a "garbage
>> in" situation. I can understand Solr taking a little extra effort to
>> not fail hard if this situation is encountered, but why update the
>> numFound at all, or remove the duplicates from the list? ... why not leave
>> them in as is?  (then numFound would never change)
>
> Distrib search keys some things off of the unique id, so when we
> encountered duplicates in the past it failed hard.  IIRC only keeping
> one doc with the same id was actually the easiest way to not fail
> hard.
>
> -Yonik
> http://www.lucidimagination.com
>



-- 
Lance Norskog
goksron@gmail.com
