lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: Finding out optimal hash ranges for shard split
Date Wed, 06 May 2015 11:58:18 GMT
Hi Anand,

The nature of the hash function (murmur3) should lead to an approximately
uniform distribution of documents across sub-shards. Have you investigated
why, if at all, the sub-shards are not balanced? Do you use composite keys,
e.g. abc!id1, which could cause the imbalance?
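
To illustrate why composite keys can skew a split, here is a minimal
sketch. It assumes the default compositeId behaviour as I understand it:
the upper 16 bits of the hash come from the prefix before the "!" and the
lower 16 bits from the rest of the id. zlib.crc32 is only a stand-in for
murmur3, the helper names are hypothetical, and hashes are treated as
unsigned integers for simplicity.

```python
import zlib

def stand_in_hash(s: str) -> int:
    # Stand-in for murmur3 (assumption: any well-mixed 32-bit hash
    # shows the same clustering effect).
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF

def composite_hash(doc_id: str) -> int:
    # "prefix!id": upper 16 bits from the prefix, lower 16 bits from the id.
    if "!" not in doc_id:
        return stand_in_hash(doc_id)
    prefix, rest = doc_id.split("!", 1)
    return (stand_in_hash(prefix) & 0xFFFF0000) | (stand_in_hash(rest) & 0x0000FFFF)

# Every document sharing the prefix "abc" lands in one narrow window.
hashes = [composite_hash(f"abc!id{i}") for i in range(1000)]
span = max(hashes) - min(hashes)
print(f"{min(hashes):08x}-{max(hashes):08x} (span {span})")
```

All 1000 ids fall inside a window of at most 65536 hash values, so a
split at the midpoint of the parent range puts every one of them into
the same sub-shard.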

I don't think there is a (cheap) way to implement what you are asking for
in the current scheme of things: unless we go through each id and
calculate its hash, we have no way of knowing the optimal distribution.
However, if we were to store the hash of the key as a separate field in the
index, then it should be possible to binary search for the hash ranges
which lead to an approx. equal distribution of docs in sub-shards. We can
implement something like that inside Solr.
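
A rough sketch of that binary search, not an existing Solr feature: it
assumes each document's hash is available through some range-count
operation, simulated here by count_docs over a sorted list. All function
names are hypothetical, hashes are unsigned for simplicity, and it
assumes there are far more distinct hash values than sub-shards.

```python
import bisect
import random

def count_docs(sorted_hashes, lo, hi):
    # Hypothetical stand-in for a range-count query on a stored hash field.
    return bisect.bisect_right(sorted_hashes, hi) - bisect.bisect_left(sorted_hashes, lo)

def find_boundary(sorted_hashes, lo, hi, target):
    # Binary search for the smallest b in [lo, hi] with count_docs(lo, b) >= target.
    left, right = lo, hi
    while left < right:
        mid = (left + right) // 2
        if count_docs(sorted_hashes, lo, mid) >= target:
            right = mid
        else:
            left = mid + 1
    return left

def balanced_ranges(sorted_hashes, lo, hi, n):
    # Split [lo, hi] into n contiguous ranges with roughly equal doc counts.
    total = count_docs(sorted_hashes, lo, hi)
    ranges, start = [], lo
    for i in range(1, n):
        target = round(total * i / n)  # cumulative doc count up to this boundary
        end = find_boundary(sorted_hashes, lo, hi, target)
        ranges.append((start, end))
        start = end + 1
    ranges.append((start, hi))
    return ranges

# Skewed test data: half uniform, half packed into one 16-bit window
# (as a single composite-key prefix would produce).
random.seed(0)
MAX_HASH = 0xFFFFFFFF
hashes = [random.randint(0, MAX_HASH) for _ in range(5000)]
hashes += [random.randint(0x12340000, 0x1234FFFF) for _ in range(5000)]
hashes.sort()

ranges = balanced_ranges(hashes, 0, MAX_HASH, 2)
counts = [count_docs(hashes, lo, hi) for lo, hi in ranges]
for (lo, hi), c in zip(ranges, counts):
    print(f"{lo:08x}-{hi:08x}: {c} docs")
```

Each boundary costs about 32 count queries over a 32-bit hash space,
which is what would make this cheap compared to re-hashing every id.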

On Wed, May 6, 2015 at 4:42 PM, anand.mahajan <anand@zerebral.co.in> wrote:

> Okay - Thanks for the confirmation Shalin. Could this be a feature request
> in the Collections API - that we have a Split shard dry run API that
> accepts the sub-shard count as a request param and returns the optimal
> shard ranges for the number of sub-shards requested, along with the
> respective document counts for each of the sub-shards? The users can then
> use these shard ranges for the actual split?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204100.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,
Shalin Shekhar Mangar.
