Subject: Re: Duplicated Documents Across shards
From: "Iker Mtnz. Apellaniz" <mitxino77@gmail.com>
To: solr-user@lucene.apache.org
Date: Mon, 6 May 2013 10:44:40 +0200

Thanks Erick,
  I think we found the problem. When defining the cores for the two shards
that live on the same machine, we defined both of them with the same
instanceDir. Each shard should instead have its own folder, so each core
should point to a separate instanceDir.
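(The XML snippets from the original message were stripped by the list
archive. Below is a minimal sketch of the two layouts in the solr.xml
format used by Solr 4.2; core names and paths are illustrative, not the
exact configuration from this thread.)

What we had - both cores share one instanceDir, so with no explicit
dataDir they both resolve to the same <instanceDir>/data index:

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="shard2" collection="collection1" shardId="shard2"
            instanceDir="core_shared/"/>
      <core name="shard4" collection="collection1" shardId="shard4"
            instanceDir="core_shared/"/>
    </cores>
  </solr>

What it should be - one instanceDir, and therefore one index, per core:

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="shard2" collection="collection1" shardId="shard2"
            instanceDir="core_shard2/"/>
      <core name="shard4" collection="collection1" shardId="shard4"
            instanceDir="core_shard4/"/>
    </cores>
  </solr>

With the shared layout both cores open the same physical index, so a
distributed query that fans out to shard2 and shard4 sees every document
in that index twice.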
Can anyone confirm this?

Thanks,
  Iker


2013/5/4 Erick Erickson:

> Sounds like you've explicitly routed the same document to two
> different shards. Document replacement only happens locally to a
> shard, so the fact that you have documents with the same ID on two
> different shards is why you're getting duplicate documents.
>
> Best,
> Erick
>
> On Fri, May 3, 2013 at 3:44 PM, Iker Mtnz. Apellaniz wrote:
> > We are currently using version 4.2.
> > We have tested with a single document, and it gives us a count of 2.
> > But if we force it onto the first machine, the one with a single
> > shard, the count is 1.
> > I've tried the distrib=false parameter: it gives us no duplicate
> > documents, but the same document appears to be in two different
> > shards.
> >
> > Finally, about the separate directories: we have only one data
> > directory per physical machine and collection, and I don't see any
> > subfolder for the different shards.
> >
> > Is it possible that we have something wrong in the dataDir
> > configuration for running multiple shards on one machine?
> >
> > <dataDir>${solr.data.dir:}</dataDir>
> > <directoryFactory name="DirectoryFactory"
> >     class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> >
> > 2013/5/3 Erick Erickson:
> >
> >> What version of Solr? The custom routing stuff is quite new, so
> >> I'm guessing 4.x?
> >>
> >> But this shouldn't be happening. The actual index data for the
> >> shards should be in separate directories; they just happen to
> >> be on the same physical machine.
> >>
> >> Try querying each one with &distrib=false to see the counts
> >> from single shards; that may shed some light on this. It vaguely
> >> sounds like you have indexed the same document to both shards
> >> somehow...
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, May 3, 2013 at 5:28 AM, Iker Mtnz. Apellaniz wrote:
> >> > Hi,
> >> > We currently have a SolrCloud deployment running 5 shards on 3
> >> > physical machines: the first machine has shard 1, the second
> >> > machine shards 2 & 4, and the third shards 3 & 5. We noticed
> >> > that numFound decreased as we increased the start parameter.
> >> > After some investigation we found that the documents in shards
> >> > 2 to 5 were being counted twice. Querying shard 2 gives you
> >> > back the results for shards 2 & 4, and the same happens for
> >> > shards 3 & 5. Our guess is that the physical index for shards
> >> > 2 & 4 is shared, so the shards don't know which part of it
> >> > belongs to each one.
> >> > The uniqueKey is correctly defined, and we have tried using a
> >> > shard prefix (shard1!docID).
> >> >
> >> > Is there any way to solve this problem when a single physical
> >> > machine hosts several shards? Is it a "real" problem, or does
> >> > it just affect facets & numResults?
> >> >
> >> > Thanks,
> >> > Iker

--
/** @author imartinez */
Person me = new Developer();
me.setName("Iker Mtz de Apellaniz Anzuola");
me.setTwit("@mitxino77");
me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"});
me.setSkills({SoftwareDeveloper, Curious, AmateurCook});
return me;
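(PS, for anyone hitting the same symptom: the per-core check Erick
suggests can be done with plain queries; host, port, and core names below
are illustrative, not the actual ones from this cluster:

  http://machine2:8983/solr/shard2/select?q=*:*&rows=0&distrib=false
  http://machine2:8983/solr/shard4/select?q=*:*&rows=0&distrib=false

With distrib=false the request is answered only by the core it is sent
to, so if both cores report the same numFound, they are almost certainly
reading the same underlying index directory.)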