lucene-dev mailing list archives

From "Dan Rosher (JIRA)" <>
Subject [jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud
Date Fri, 14 Sep 2012 09:06:07 GMT


Dan Rosher commented on SOLR-2592:

I think I should reiterate that the default is the HashShardPartitioner. The NamedShardPartitioner
was supplied as an example, and it does what we needed.

HashShardPartitioner partitions by hash(id) % num_shards, much like the existing implementation.
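
A minimal sketch of hash(id) % num_shards placement, for illustration only. The patch's Java API is not shown in this thread, so the `shard_for` method, the dict-shaped doc and the use of `zlib.crc32` as the hash are all assumptions. In particular, crc32 is a stand-in and not the hash function Solr uses.

```python
import zlib


class HashShardPartitioner:
    """Hypothetical sketch: place a doc on shard number hash(id) % num_shards."""

    def __init__(self, num_shards: int):
        if num_shards <= 0:
            raise ValueError("num_shards must be positive")
        self.num_shards = num_shards

    def shard_for(self, doc: dict) -> int:
        # Stand-in hash: crc32 is deterministic across processes, whereas
        # Python's built-in hash() of a str is salted per process.
        h = zlib.crc32(doc["id"].encode("utf-8"))
        return h % self.num_shards
```

For example, `HashShardPartitioner(4).shard_for({"id": "1234"})` always yields the same shard number in the range 0..3, but nothing about the doc other than its id influences which one.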

NamedShardPartitioner sends each doc to a particular shard, so that e.g. shard Sep2012 contains ONLY
docs with doc.shard=Sep2012; docs with doc.shard=Oct2012 live in another shard.
I think this works much the way Lance described on 07/Jun/12 04:49.
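
A minimal sketch of field-named placement, assuming (hypothetically) that each doc is a dict carrying its target shard in a `shard` field and that the partitioner exposes a `shard_for` method. The `index` helper is likewise illustrative; it only groups doc ids by the shard they would be sent to.

```python
class NamedShardPartitioner:
    """Hypothetical sketch: the doc names its own shard via a field."""

    def __init__(self, field: str = "shard"):
        self.field = field

    def shard_for(self, doc: dict) -> str:
        # Refuse docs that do not say where they belong, rather than guessing.
        if self.field not in doc:
            raise ValueError(f"doc {doc.get('id')!r} has no {self.field!r} field")
        return doc[self.field]


def index(docs, partitioner):
    """Group doc ids by the shard the partitioner assigns them to."""
    shards = {}
    for doc in docs:
        shards.setdefault(partitioner.shard_for(doc), []).append(doc["id"])
    return shards


docs = [
    {"id": "1", "shard": "Sep2012"},
    {"id": "2", "shard": "Oct2012"},
    {"id": "3", "shard": "Sep2012"},
]
print(index(docs, NamedShardPartitioner()))
# {'Sep2012': ['1', '3'], 'Oct2012': ['2']}
```

Every Sep2012 doc lands on the Sep2012 shard and nothing else does, which is the property the comment relies on.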

Michael - The problem for us with the patch you've submitted for the composite id is that
it still uses hashing to determine which shard a doc resides on.
On the indexing side, hashing the composite ids means that e.g. doc=1234_Sep2012 and doc=4567_Oct2012
might fall in the same hash range and hence end up on the same shard; in the extreme, one might even
end up with ALL docs on the same shard.
On the searching side, again because hashing is used, it's not a simple task to determine which
shard holds the docs for Sep2012, so a query would need to be sent to every shard. That would
be less efficient, perhaps by a large margin, than sending the query directly to the relevant shard.
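
A simplified sketch of the two concerns above, assuming a hypothetical setup of 4 hash shards. `hash_placement` hashes the full composite id with `zlib.crc32`; that is a deliberate simplification and not necessarily how the composite-id patch computes its hash. `shards_to_query` is a hypothetical helper showing which shards a month-restricted query must reach under each scheme.

```python
import zlib

NUM_SHARDS = 4
HASH_SHARDS = [f"shard{i}" for i in range(NUM_SHARDS)]


def hash_placement(doc: dict) -> str:
    # Simplification: hash the whole composite id, e.g. "1234_Sep2012".
    # Docs from different months may share a shard; nothing prevents it.
    composite = f"{doc['id']}_{doc['shard']}"
    return HASH_SHARDS[zlib.crc32(composite.encode("utf-8")) % NUM_SHARDS]


def named_placement(doc: dict) -> str:
    # The doc's "shard" field is the shard name.
    return doc["shard"]


def shards_to_query(month: str, named: bool) -> list:
    if named:
        # All docs with shard == month live on exactly one shard.
        return [month]
    # Placement depends on the hashed id, so the month alone does not
    # identify a shard: the query has to fan out to every shard.
    return list(HASH_SHARDS)


docs = [{"id": "1234", "shard": "Sep2012"}, {"id": "4567", "shard": "Oct2012"}]
for d in docs:
    print(d["id"], d["shard"], hash_placement(d), named_placement(d))
```

Under named placement a Sep2012 query touches one shard; under the hashed scheme sketched here it touches all `NUM_SHARDS`.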

With the NamedShardPartitioner, since we know that all related docs live on the
same shard, it should be more obvious how to split/merge shard indices if desired.

These are just two implementations; since ShardPartitioner is abstracted out, a developer can
write something better suited to their needs.

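
To illustrate the pluggability point, here is a sketch of an abstract partitioner plus one custom implementation that routes by value range (one of the criteria the issue description mentions). `ShardPartitioner` is the thread's name, but its real Java signature is not shown here; the `shard_for` method and the `RangeShardPartitioner` class are hypothetical.

```python
import bisect
from abc import ABC, abstractmethod


class ShardPartitioner(ABC):
    """Hypothetical plug-in point: map a document to a shard name."""

    @abstractmethod
    def shard_for(self, doc: dict) -> str:
        ...


class RangeShardPartitioner(ShardPartitioner):
    """Hypothetical custom partitioner: route by a numeric field's range."""

    def __init__(self, field: str, upper_bounds: list, shard_names: list):
        # upper_bounds[i] is the exclusive upper bound for shard_names[i];
        # the last shard takes every value >= upper_bounds[-1].
        if len(shard_names) != len(upper_bounds) + 1:
            raise ValueError("need exactly one more shard name than bounds")
        if list(upper_bounds) != sorted(upper_bounds):
            raise ValueError("upper_bounds must be sorted")
        self.field = field
        self.upper_bounds = list(upper_bounds)
        self.shard_names = list(shard_names)

    def shard_for(self, doc: dict) -> str:
        # bisect_right counts how many bounds are <= the value,
        # which is exactly the index of the owning shard.
        idx = bisect.bisect_right(self.upper_bounds, doc[self.field])
        return self.shard_names[idx]
```

With bounds `[100, 200]` and shards `["low", "mid", "high"]`, a doc with `price` 100 goes to `"mid"`, since each bound is exclusive for the shard below it. Any caller written against `ShardPartitioner` can use this class unchanged.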
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>                 Key: SOLR-2592
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0-ALPHA
>            Reporter: Noble Paul
>            Assignee: Mark Miller
>         Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch,
>                      SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch,
> If the data in a cloud can be partitioned on some criteria (say range, hash, attribute
> value, etc.), it will be easy to narrow down the search to a smaller subset of shards and,
> in effect, achieve more efficient search.

This message is automatically generated by JIRA.