From: shamik
To: solr-user@lucene.apache.org
Date: Thu, 26 Mar 2015 11:14:39 -0700 (MST)
Subject: Re: Uneven index distribution using composite router

Thanks for your reply, Eric.

In my case, I have 14 languages, and about 50% of the documents are in English.
German and CHS will probably constitute another 25%. I'm not using copyField; rather, each language has its own dedicated fields, such as title_enu, text_enu, title_ger, text_ger, etc. Since I know the language before index time, this works for me.

I've added one more sample key to the example:

ENU!12345!www.testurl.com/enu/doc1
ENU!12345!www.testurl.com/enu/doc10
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

As you can see, there are two English documents with the same topic id (12345). I added the topic id to the key so that documents sharing a topic reside on the same shard, which is needed for field collapsing on topic id to work. I could perhaps drop the topic id from the composite key and keep only the language and URL, something like:

ENU!www.testurl.com/enu/doc1

But that probably won't solve the distribution issue.

You mentioned "when you take over routing, making sure the distribution is even is now your responsibility." I'm wondering what the best practice is to make that happen. I could move away from the composite router and manually assign a group of languages to a dedicated shard, at both index and query time. But I'm not sure keeping a map is an efficient way of dealing with it.
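
To make the map idea concrete, here is a minimal Python sketch of what I have in mind. It only builds the key in the language!topicid!url layout shown above and looks up a shard per language. The shard names, the language grouping, and the helper names (build_route_key, shard_for) are assumptions for illustration, not anything Solr provides, and how the chosen shard is actually passed to Solr at index or query time is not shown.

```python
from collections import Counter

# Hypothetical language-to-shard map. Shard names and groupings are
# assumptions for illustration, not values taken from a real cluster.
LANGUAGE_TO_SHARD = {
    "ENU": "shard1",  # English is about half the corpus, so it gets its own shard
    "GER": "shard2",
    "CHS": "shard2",
}
DEFAULT_SHARD = "shard3"  # assumed catch-all for the remaining languages


def build_route_key(language: str, topic_id: str, url: str) -> str:
    """Join the parts with '!' in the same layout as the sample keys."""
    return f"{language}!{topic_id}!{url}"


def shard_for(language: str) -> str:
    """Look up the shard assigned to a language, falling back to the default."""
    return LANGUAGE_TO_SHARD.get(language, DEFAULT_SHARD)


# The four sample documents from the message: (language, topic id, url).
docs = [
    ("ENU", "12345", "www.testurl.com/enu/doc1"),
    ("ENU", "12345", "www.testurl.com/enu/doc10"),
    ("GER", "12345", "www.testurl.com/ger/doc2"),
    ("CHS", "67890", "www.testurl.com/chs/doc3"),
]

for language, topic_id, url in docs:
    print(build_route_key(language, topic_id, url), "->", shard_for(language))

# Count documents per shard to see how even the assignment is.
counts = Counter(shard_for(language) for language, _, _ in docs)
print(dict(counts))
```

The per-shard count at the end is the part I would have to watch by hand: whenever the language mix changes, the map has to be rebalanced. Note also that this map sends GER and CHS documents with the same topic id to the same shard only because I grouped them together; ENU documents for that topic end up elsewhere.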