lucene-dev mailing list archives

From "Michael Garski (JIRA)" <>
Subject [jira] [Updated] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud
Date Tue, 15 May 2012 19:31:10 GMT


Michael Garski updated SOLR-2592:

    Attachment: pluggable_sharding.patch

This patch is intended as a cocktail-napkin sketch to get feedback (as such, forwarding
queries to the appropriate shards is not yet implemented). I can iterate on this as needed.

The attached patch is a very simple implementation of pluggable sharding which works as follows:

1. Configure a ShardingStrategy in SolrConfig under config/shardingStrategy. If none is configured,
the default implementation, which shards on the document's unique id, is used.
     <shardingStrategy class="solr.UniqueIdShardingStrategy"/>

2. The ShardingStrategy accepts an AddUpdateCommand, DeleteUpdateCommand, or SolrParams and
returns a BytesRef that is hashed to determine the destination slice.

3. I have only implemented updates at this time; queries are still distributed across all
shards in the collection. I have added a 'shard.keys' parameter to common.params.ShardParams.
It would contain the value(s) to be hashed to determine the shard(s) to query within the
HttpShardHandler.checkDistributed method. If 'shard.keys' does not have a value, the query
would be distributed across all shards in the collection.
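
The three steps above can be sketched in plain Java, without any Solr or Lucene
classes. This is only an illustration under stated assumptions, not the code in the
attached patch: the names ShardRoutingSketch, KeyStrategy, sliceFor and slicesForQuery
are hypothetical, a Map stands in for a document, Arrays.hashCode stands in for the
real hash, and a comma-separated 'shard.keys' value is assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical sketch; none of these names come from the patch. */
final class ShardRoutingSketch {

    /** Stand-in for the pluggable strategy: yields the bytes to hash for a document. */
    interface KeyStrategy {
        byte[] keyFor(Map<String, String> doc);
    }

    /** Default behaviour from step 1: shard on the document's unique id field. */
    static KeyStrategy uniqueId(String idField) {
        return doc -> {
            String id = doc.get(idField);
            if (id == null) {
                throw new IllegalArgumentException("missing unique id field: " + idField);
            }
            return id.getBytes(StandardCharsets.UTF_8);
        };
    }

    /** Step 2: hash the key bytes to a slice index in [0, numSlices). */
    static int sliceFor(byte[] key, int numSlices) {
        // Placeholder hash (assumption); the patch hashes a BytesRef instead.
        return Math.floorMod(Arrays.hashCode(key), numSlices);
    }

    /** Step 3: resolve 'shard.keys' to slices; a missing or blank value means all slices. */
    static List<Integer> slicesForQuery(String shardKeys, int numSlices) {
        TreeSet<Integer> slices = new TreeSet<>();
        if (shardKeys == null || shardKeys.trim().isEmpty()) {
            for (int i = 0; i < numSlices; i++) {
                slices.add(i);
            }
        } else {
            // Comma-separated keys are an assumption of this sketch.
            for (String key : shardKeys.split(",")) {
                slices.add(sliceFor(key.trim().getBytes(StandardCharsets.UTF_8), numSlices));
            }
        }
        return new ArrayList<>(slices);
    }

    public static void main(String[] args) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "doc1");
        byte[] key = uniqueId("id").keyFor(doc);
        System.out.println("update goes to slice " + sliceFor(key, 4));
        System.out.println("query without shard.keys hits " + slicesForQuery(null, 4));
        System.out.println("query with shard.keys hits " + slicesForQuery("doc1", 4));
    }
}
```

Because updates and queries go through the same sliceFor call, a query that passes a
document's key in 'shard.keys' is sent to the slice that received the update.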


There are no unit tests yet; however, all existing tests pass.

I am not quite sure about the configuration location within the solr config. However, as sharding
is used by both update and search requests, placing it in the updateHandler and (potentially
multiple) requestHandler sections would require duplicating the same information in the
solr config for what I believe is more of a collection-wide configuration.

As hashing currently requires the lucene.util.BytesRef class, the solrj client cannot currently
hash the request to send it to a specific node without solrj adding a dependency
on lucene core, something that is most likely not desired. Additionally, hashing on a unique
id requires access to the schema to determine the field that contains the unique
id. Are there any thoughts on how to alter the hashing to remove these dependencies and allow
for solrj to be a 'smart' client that submits requests directly to nodes that contain the

How would solrj work when a request includes multiple updates that belong to different
shards? Send the request to one of the nodes and let the server distribute the updates to the
proper nodes? Perform concurrent requests to the specific nodes?
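
To make the second option concrete, here is a rough sketch of how a client could split
one batch of updates into per-slice batches using only JDK classes. Everything in it is
an assumption for illustration: the names ClientBatchingSketch, sliceFor and groupBySlice
are hypothetical, java.util.zip.CRC32 is a stand-in hash rather than the hash the server
uses, and the caller is assumed to already know each document's unique id, which avoids
the schema lookup but does not solve it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical client-side sketch; not part of solrj or the patch. */
final class ClientBatchingSketch {

    /** Hash the id's UTF-8 bytes with a JDK-only checksum (placeholder hash). */
    static int sliceFor(String id, int numSlices) {
        CRC32 crc = new CRC32();
        crc.update(id.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is in [0, numSlices).
        return (int) (crc.getValue() % numSlices);
    }

    /** Group document ids by destination slice, keeping input order within each batch. */
    static Map<Integer, List<String>> groupBySlice(List<String> ids, int numSlices) {
        Map<Integer, List<String>> batches = new TreeMap<>();
        for (String id : ids) {
            batches.computeIfAbsent(sliceFor(id, numSlices), s -> new ArrayList<>()).add(id);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc1", "doc2", "doc3", "doc4");
        // Each entry could become one request, sent concurrently to that slice's node.
        groupBySlice(ids, 2).forEach(
            (slice, batch) -> System.out.println("slice " + slice + " <- " + batch));
    }
}
```

This only works if the client and the server agree on the hash and on the number of
slices. Otherwise, sending the whole request to one node and letting the server
redistribute the updates remains the safer choice.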

> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>                 Key: SOLR-2592
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: pluggable_sharding.patch
> If the data in a cloud can be partitioned on some criteria (say range, hash, attribute
> value, etc.), it will be easy to narrow down the search to a smaller subset of shards and,
> in effect, achieve more efficient search.
