lucene-solr-user mailing list archives

From "Iker Mtnz. Apellaniz" <mitxin...@gmail.com>
Subject Re: Duplicated Documents Across shards
Date Mon, 06 May 2013 08:44:40 GMT
Thanks Erick,
  I think we found the problem. When defining the cores for the two
shards, we pointed both of them at the same instanceDir, like this:
<core schema="schema.xml" shard="shard2" instanceDir="1_collection/"
name="1_collection" config="solrconfig.xml" collection="1_collection"/>
<core schema="schema.xml" shard="shard4" instanceDir="1_collection/"
name="1_collection" config="solrconfig.xml" collection="1_collection"/>

  Each shard should have its own folder, so the final configuration should
be like this:
<core schema="schema.xml" shard="shard2" instanceDir="1_collection/shard2/"
name="1_collection" config="solrconfig.xml" collection="1_collection"/>
<core schema="schema.xml" shard="shard4" instanceDir="1_collection/shard4/"
name="1_collection" config="solrconfig.xml" collection="1_collection"/>
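
For reference, a minimal sketch of how the relevant part of a legacy
solr.xml might look with per-shard directories (the paths and the
distinct core names here are illustrative assumptions; legacy solr.xml
requires core names to be unique within a container, so giving each
core its collection name alone, as above, may itself cause trouble):

```xml
<!-- sketch: two cores of the same collection on one node, each with
     its own instanceDir so their indexes cannot collide -->
<cores adminPath="/admin/cores">
  <core name="1_collection_shard2" collection="1_collection" shard="shard2"
        instanceDir="1_collection/shard2/"
        schema="schema.xml" config="solrconfig.xml"/>
  <core name="1_collection_shard4" collection="1_collection" shard="shard4"
        instanceDir="1_collection/shard4/"
        schema="schema.xml" config="solrconfig.xml"/>
</cores>
```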

Can anyone confirm this?
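
Related to the dataDir question further down the thread: if two cores
resolve ${solr.data.dir:} to the same path, they will write into the
same index directory. One hedged way to keep them apart without
maintaining separate solrconfig.xml files is to key the data directory
off the core name (solr.core.name is a substitutable core property in
Solr 4.x; the base path here is an illustrative assumption):

```xml
<!-- sketch, in the shared solrconfig.xml: each core gets its own
     data directory derived from its core name -->
<dataDir>${solr.data.dir:data}/${solr.core.name}</dataDir>
```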

Thanks,
  Iker


2013/5/4 Erick Erickson <erickerickson@gmail.com>

> Sounds like you've explicitly routed the same document to two
> different shards. Document replacement only happens locally to a
> shard, so the fact that you have documents with the same ID on two
> different shards is why you're getting duplicate documents.
>
> Best
> Erick
>
> On Fri, May 3, 2013 at 3:44 PM, Iker Mtnz. Apellaniz
> <mitxino77@gmail.com> wrote:
> > We are currently using version 4.2.
> > We have made tests with a single document and it gives us a count of 2.
> > But if we force it to shard onto the first machine, the one with a
> > single shard, the count gives us 1 document.
> > I've tried using the distrib=false parameter: it gives us no duplicate
> > documents, but the same document appears to be in two different shards.
> >
> > Finally, about the separate directories, we have only one directory
> > for the data in each physical machine and collection, and I don't see
> > any subfolder for the different shards.
> >
> > Is it possible that we have something wrong with the dataDir
> > configuration to use multiple shards in one machine?
> >
> > <dataDir>${solr.data.dir:}</dataDir>
> > <directoryFactory name="DirectoryFactory"
> > class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> >
> >
> >
> > 2013/5/3 Erick Erickson <erickerickson@gmail.com>
> >
> >> What version of Solr? The custom routing stuff is quite new so
> >> I'm guessing 4x?
> >>
> >> But this shouldn't be happening. The actual index data for the
> >> shards should be in separate directories, they just happen to
> >> be on the same physical machine.
> >>
> >> Try querying each one with &distrib=false to see the counts
> >> from single shards, that may shed some light on this. It vaguely
> >> sounds like you have indexed the same document to both shards
> >> somehow...
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, May 3, 2013 at 5:28 AM, Iker Mtnz. Apellaniz
> >> <mitxino77@gmail.com> wrote:
> >> > Hi,
> >> >   We currently have a SolrCloud implementation running 5 shards on 3
> >> > physical machines, so the first machine has shard number 1, the
> >> > second machine shards 2 & 4, and the third shards 3 & 5. We noticed
> >> > that while querying, numFound decreased when we increased the start
> >> > param.
> >> >   After some investigation we found that the documents in shards 2
> >> > to 5 were being counted twice. Querying shard 2 will give you back
> >> > the results for shards 2 & 4, and the same for shards 3 & 5. Our
> >> > guess is that the physical index for shards 2 & 4 is shared, so the
> >> > shards don't know which part of it belongs to each one.
> >> >   The uniqueKey is correctly defined, and we have tried using a
> >> > shard prefix (shard1!docID).
> >> >
> >> >   Is there any way to solve this problem when a single physical
> >> > machine hosts several shards?
> >> >   Is it a "real" problem, or does it just affect facet counts &
> >> > numResults?
> >> >
> >> > Thanks
> >> >    Iker
> >> >
> >>
> >
> >
> >
>



-- 
/** @author imartinez*/
Person me = *new* Developer();
me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
me.setTwit("@mitxino77 <https://twitter.com/mitxino77>");
me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"});
me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
*return* me;
