lucene-solr-user mailing list archives

From Erick Erickson
Subject Re: Solr Cloud sharding strategy
Date Tue, 08 Mar 2016 03:59:46 GMT
What do you mean "the rest of the cluster"? The routing is based on
the key provided. All of the "enu" prefixes will go to one of your
shards. All the "deu" docs will appear on one shard. All the "esp"
will be on one shard. All the "chs" docs will be on one shard.
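
To make the prefix behaviour concrete, here is a rough Python model of composite-key routing. It is a simplified sketch, not Solr's code: the stand-in hash is CRC32 (Solr's compositeId router uses MurmurHash3), the function names are made up for illustration, and the shard ranges are assumed to be equal slices of the 32-bit hash space. The point it shows is that the top 16 bits of the hash come from the prefix, so with 2 shards the prefix alone decides the shard.

```python
import zlib


def hash32(text: str) -> int:
    # Stand-in hash (assumption): CRC32, NOT the MurmurHash3 that Solr uses.
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id: str) -> int:
    # Split "enu!doc1" into the prefix "enu" and the rest "doc1".
    prefix, _, rest = doc_id.partition("!")
    # Upper 16 bits come from the prefix, lower 16 bits from the rest of the ID.
    return (hash32(prefix) & 0xFFFF0000) | (hash32(rest) & 0x0000FFFF)


def shard_for(doc_id: str, num_shards: int) -> int:
    # Model the shards as equal, contiguous slices of the 32-bit hash space.
    return (composite_hash(doc_id) * num_shards) >> 32


if __name__ == "__main__":
    for doc_id in ["enu!doc1", "enu!doc2", "deu!doc3", "deu!doc4", "esp!doc5", "chs!doc6"]:
        print(doc_id, "-> shard index", shard_for(doc_id, 2))
```

Which shard index each prefix lands on depends entirely on the hash, so nothing stops all four prefixes from landing in the same half of the hash space.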

Which shard will each go to? Good question. With a small number of
keys, and/or one key that covers the majority of your corpus, you can
end up with very uneven distributions. If you require
individual control, what I'd do is create separate _collections_ for
each language, then use collection aliasing to have a single URL to
query. Of course that requires that you index to the correct
collection.... You could also create a collection for the language
with the most docs and one for "everything else". Or....

The advantage here is that the collection can be tailored to the
number of docs. That is, the Spanish collection may be a single shard
whereas the English one may be 4 shards....
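
A minimal sketch of the per-language collection idea, in Python. The collection names, the alias name and the base URL are all hypothetical; the only Solr feature relied on is the Collections API CREATEALIAS action with its name and collections parameters. The sketch only builds the URL and picks the target collection for a document; it does not talk to a cluster.

```python
from urllib.parse import urlencode

# Assumptions for illustration only: base URL and collection names.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION_BY_LANG = {"enu": "docs_enu", "deu": "docs_deu"}
FALLBACK_COLLECTION = "docs_other"  # the "everything else" collection


def collection_for(doc_id: str) -> str:
    # The indexing client has to choose the collection itself,
    # here from the language prefix of an ID such as "enu!doc1".
    prefix = doc_id.split("!", 1)[0]
    return COLLECTION_BY_LANG.get(prefix, FALLBACK_COLLECTION)


def create_alias_url(alias: str, collections: list[str]) -> str:
    # Build a Collections API CREATEALIAS request so queries can use one name.
    params = urlencode({
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    })
    return f"{SOLR_BASE}/admin/collections?{params}"


if __name__ == "__main__":
    all_collections = sorted(set(COLLECTION_BY_LANG.values()) | {FALLBACK_COLLECTION})
    print(create_alias_url("docs_all", all_collections))
    print(collection_for("esp!doc5"))
```

Queries would then go to the alias (docs_all here), while each document is indexed to the collection returned by collection_for.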

But really, with a corpus this size I wouldn't worry about it. I
suspect you're over-thinking the problem.

And one addendum to Walter's comment... If I can't mine logs for, say,
100K queries, I often turn caching off (or waaaay down) when doing perf
testing to negate the effects of caching. The weakness is that this
doesn't force swapping.

I worked with one client that was thrilled at getting < 5ms response
times for their stress tests with many threads simultaneously
executing queries.... except they were firing the exact same query
over and over and over.....
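
One cheap sanity check before trusting stress-test numbers is to measure how repetitive the query set is. The helper below is a hypothetical sketch (the function name and the one-query-per-entry input are assumptions, not part of any Solr tooling): it counts total versus distinct queries, and a very low unique ratio suggests the test is mostly exercising the caches.

```python
from collections import Counter


def repeat_stats(queries):
    # Count how often each non-empty query string occurs in the test set.
    counts = Counter(q.strip() for q in queries if q.strip())
    total = sum(counts.values())
    unique = len(counts)
    return {
        "total": total,
        "unique": unique,
        # Fraction of the replayed queries that are distinct.
        "unique_ratio": unique / total if total else 0.0,
    }


if __name__ == "__main__":
    # The same query fired over and over: 1 distinct query out of 1000.
    print(repeat_stats(["q=title:solr"] * 1000))
```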


On Mon, Mar 7, 2016 at 7:36 PM, shamik wrote:
> Thanks Erick and Walter, this is extremely insightful. One last follow-up
> question on composite routing. I'm trying to get a better understanding of
> index distribution. If I use language as a prefix, SolrCloud guarantees that
> content in the same language will be routed to the same shard. What I'm
> curious to know is how the rest of the data is distributed across the
> remaining shards. For example, I have the following composite keys:
> enu!doc1
> enu!doc2
> deu!doc3
> deu!doc4
> esp!doc5
> chs!doc6
> If I have 2 shards in the cluster, will SolrCloud try to distribute the above
> data evenly? Is it possible that enu will be routed to shard1 while deu goes
> to shard2, and esp and chs get indexed in either of them? Or can all of them
> potentially end up indexed in the same shard, either 1 or 2,
> leaving one shard under-utilized?
