lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Abhi Basu <>
Subject Re: Decision on Number of shards and collection
Date Wed, 11 Apr 2018 15:55:51 GMT
 *The BKM I have read so far (trying to find source) says 50 million
docs/shard performs well. I have found this in my recent tests as well. But
of course it depends on index structure, etc.*

On Wed, Apr 11, 2018 at 10:37 AM, Shawn Heisey <> wrote:

> On 4/11/2018 4:15 AM, neotorand wrote:
> > I believe heterogeneous data can be indexed to same collection and i can
> > have multiple shards for the index to be partitioned.So whats the need
> of a
> > second collection?. yes when collection size grows i should look for more
> > collection.what exactly that size is? what KPI drives the decision of
> having
> > more collection?Any pointers or links for best practice.
> There are no hard rules.  Many factors affect these decisions.
> abstract-why-we-dont-have-a-definitive-answer/
> Creating multiple collections should be done when there is a logical or
> business reason for keeping different sets of data separate from each
> other.  If there's never any need for people to query all the data at
> once, then it might make sense to use separate collections.  Or you
> might want to put them together just for convenience, and use data in
> the index to filter the results to only the information that the user is
> allowed to access.
> > when should i go for multiple shards?
> > yes when shard size grows.Right? whats the size and how do i benchmark.
> Some indexes function really well with 300 million documents or more per
> shard.  Other indexes struggle with less than a million per shard.  It's
> impossible to give you any specific number.  It depends on a bunch of
> factors.
> If query rate is very high, then you want to keep the shard count low.
> Using one shard might not be possible due to index size, but it should
> be as low as you can make it.  You're also going to want to have a lot
> of replicas to handle the load.
> If query rate is extremely low, then sharding the index can actually
> *improve* performance, because there will be idle CPU capacity that can
> be used for the subqueries.
> Thanks,
> Shawn

Abhi Basu

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message