Subject: Re: Decision on Number of shards and collection
To: solr-user@lucene.apache.org
From: Shawn Heisey
Date: Wed, 11 Apr 2018 09:37:08 -0600

On 4/11/2018 4:15 AM, neotorand wrote:
> I believe heterogeneous data can be indexed to same collection and i can
> have multiple shards for the index to be partitioned.So whats the need of a
> second collection?. yes when collection size grows i should look for more
> collection.what exactly that size is? what KPI drives the decision of having
> more collection?Any pointers or links for best practice.

There are no hard rules. Many factors affect these decisions.

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Creating multiple collections should be done when there is a logical or
business reason for keeping different sets of data separate from each
other. If there's never any need for people to query all the data at
once, then it might make sense to use separate collections. Or you
might want to put them together just for convenience, and use data in
the index to filter the results to only the information that the user is
allowed to access.

> when should i go for multiple shards?
> yes when shard size grows.Right? whats the size and how do i benchmark.

Some indexes function really well with 300 million documents or more per
shard. Other indexes struggle with less than a million per shard. It's
impossible to give you any specific number. It depends on a bunch of
factors.

If query rate is very high, then you want to keep the shard count low.
Using one shard might not be possible due to index size, but it should
be as low as you can make it. You're also going to want to have a lot
of replicas to handle the load.

If query rate is extremely low, then sharding the index can actually
*improve* performance, because there will be idle CPU capacity that can
be used for the subqueries.

Thanks,
Shawn
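
P.S. To illustrate the "put them together and filter" option mentioned above, here is a minimal sketch, not a definitive implementation. It only builds a request URL for Solr's /select handler, using the standard `q` and `fq` (filter query) parameters. The collection name `combined`, the field name `tenant_id`, the host, and the helper function itself are all hypothetical; your schema will have its own field for access control.

```python
from urllib.parse import urlencode


def build_filtered_query_url(base_url, collection, user_query,
                             access_field, allowed_values):
    """Build a /select URL whose results are restricted by a filter query.

    access_field and allowed_values are assumptions for illustration:
    a field in the index that records which group may see each document.
    """
    # Quote each value as a phrase so spaces or special characters
    # stay inside a single term; escape backslashes and double quotes.
    quoted = " OR ".join(
        '"{}"'.format(v.replace("\\", "\\\\").replace('"', '\\"'))
        for v in allowed_values
    )
    params = [
        ("q", user_query),                               # what the user asked for
        ("fq", "{}:({})".format(access_field, quoted)),  # what the user may see
    ]
    return "{}/{}/select?{}".format(
        base_url.rstrip("/"), collection, urlencode(params)
    )


if __name__ == "__main__":
    print(build_filtered_query_url(
        "http://localhost:8983/solr", "combined",
        "title:solr", "tenant_id", ["acme", "globex"],
    ))
```

Keeping the restriction in `fq` rather than appending it to `q` leaves the user's query untouched, and Solr can cache the filter separately from the main query.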