Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Date: Fri, 14 Sep 2012 20:06:07 +1100 (NCT)
From: "Dan Rosher (JIRA)" <jira@apache.org>
To: dev@lucene.apache.org
Message-ID: <1641556857.79572.1347613567797.JavaMail.jiratomcat@arcas>
In-Reply-To: 
 <176709306.6153.1308125267048.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for
 SolrCloud
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13455681#comment-13455681 ] 

Dan Rosher commented on SOLR-2592:
----------------------------------

I think I should reiterate that the default is the HashShardPartitioner. The NamedShardPartitioner was supplied as an example, and does what we needed.

HashShardPartitioner partitions by hash(id) % num_shards, much as the existing implementation does.
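
As a rough illustration of hash(id) % num_shards (this is not the code from the patch), a minimal sketch might look like the following. The class name HashPartitionSketch and the use of String.hashCode() are assumptions made for the example; the real partitioner may use a different hash function.

```java
// Hypothetical sketch, not the HashShardPartitioner from the patch.
final class HashPartitionSketch {

    // Returns a shard index in [0, numShards) for the given document id.
    // String.hashCode() stands in for whatever hash the real code uses.
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        System.out.println("doc 1234 -> shard " + shardFor("1234", 4));
    }
}
```

Math.floorMod is used rather than % because hashCode() can be negative, and a negative shard index would be meaningless.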

NamedShardPartitioner sends the doc to a particular shard, so that e.g. shard Sep2012 ONLY contains docs with doc.shard=Sep2012. Docs with doc.shard=Oct2012 would live in another shard. I think this works much the way Lance pointed out on 07/Jun/12 04:49.
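
To make the idea concrete, here is a minimal sketch of named routing, assuming a doc is represented as a simple field map and that the routing field is called "shard". Both of those, and the class name NamedPartitionSketch, are assumptions for the example and not the patch's actual API.

```java
import java.util.Map;

// Hypothetical sketch, not the NamedShardPartitioner from the patch.
final class NamedPartitionSketch {

    // Returns the shard name taken directly from the doc's "shard" field,
    // so a doc with shard=Sep2012 can only ever land on shard Sep2012.
    static String shardFor(Map<String, String> doc) {
        String shard = doc.get("shard");
        if (shard == null || shard.isEmpty()) {
            throw new IllegalArgumentException("doc has no shard field: " + doc.get("id"));
        }
        return shard;
    }

    public static void main(String[] args) {
        System.out.println(shardFor(Map.of("id", "1234", "shard", "Sep2012")));
        System.out.println(shardFor(Map.of("id", "4567", "shard", "Oct2012")));
    }
}
```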

Michael - The problem for us with the patch you've submitted for the composite id is that it still uses hashing to determine which shard a doc resides on.
On the indexing side, hashing the composite ids means that e.g. doc=1234_Sep2012 and doc=4567_Oct2012 might fall in the same hash range and hence land on the same shard. One might even end up with ALL docs on the same shard.
On the searching side, again because hashing is used, it's not a simple task to determine which shard the docs for Sep2012 reside on, so a query would need to be sent everywhere. That would be less efficient, perhaps by a large margin, than sending the query directly to the shard.
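
The difference on the searching side can be sketched as follows. This is only an illustration under assumptions: QueryRoutingSketch and its method names are hypothetical, and shards are represented by plain name strings.

```java
import java.util.List;

// Hypothetical sketch of which shards a query must be sent to.
final class QueryRoutingSketch {

    // With hash-based placement the shard for a given month is not known
    // up front, so the query fans out to every shard.
    static List<String> shardsForHashed(List<String> allShards) {
        return allShards;
    }

    // With named placement the shard name is the routing value itself,
    // so the query goes to at most one shard.
    static List<String> shardsForNamed(String shardValue, List<String> allShards) {
        return allShards.contains(shardValue) ? List.of(shardValue) : List.of();
    }

    public static void main(String[] args) {
        List<String> all = List.of("Sep2012", "Oct2012", "Nov2012");
        System.out.println("hashed: " + shardsForHashed(all));
        System.out.println("named:  " + shardsForNamed("Sep2012", all));
    }
}
```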

With the NamedShardPartitioner I think that, since we know that all related docs live on the same shard, it should be more obvious how to split/merge shard indices if desired.

These are just two implementations, but since we abstract ShardPartitioner, a developer can write something that better suits their needs.
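
For example, a custom strategy might look something like the sketch below. The ShardPartitioner signature from the patch is not reproduced in this comment, so the PartitionerSketch interface, the RegionPartitionerSketch class, and the "region" field are all hypothetical stand-ins.

```java
import java.util.Map;

// Hypothetical abstraction standing in for the patch's ShardPartitioner.
interface PartitionerSketch {
    // Returns the name of the shard the given doc should be sent to.
    String shardFor(Map<String, String> doc);
}

// Example custom strategy: route by a "region" field, falling back to a
// default shard when the field is absent.
final class RegionPartitionerSketch implements PartitionerSketch {
    private final String defaultShard;

    RegionPartitionerSketch(String defaultShard) {
        this.defaultShard = defaultShard;
    }

    @Override
    public String shardFor(Map<String, String> doc) {
        return doc.getOrDefault("region", defaultShard);
    }
}
```

Callers would depend only on the interface, so swapping strategies would not touch the indexing or search code.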
                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0-ALPHA
>            Reporter: Noble Paul
>            Assignee: Mark Miller
>         Attachments: dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch, SOLR-2592.patch, SOLR-2592_r1373086.patch, SOLR-2592_r1384367.patch, SOLR-2592_rev_2.patch, SOLR_2592_solr_4_0_0_BETA_ShardPartitioner.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, attribute value, etc.), it will be easy to narrow down the search to a smaller subset of shards and, in effect, achieve more efficient search.
