lucene-dev mailing list archives

From "Michael Garski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud
Date Mon, 21 May 2012 00:45:41 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279906#comment-13279906 ]

Michael Garski commented on SOLR-2592:
--------------------------------------

I've been tinkering with this a bit more and discovered that the real-time get component must
be able to hash on the unique id of a document to determine which shard to forward the request
to, so any hashing implementation has to support hashing on the unique id of a document.
To account for this, any custom hashing based on an arbitrary document property or other value
would have to hash to the same value as the unique document id.

As for which representation of the unique id to hash on: if the hash is based on the string
value rather than the indexed value, solrj and any other smart client can compute it and submit
requests directly to the proper shard.  The method oas.common.util.ByteUtils.UTF16toUTF8 would
serve the purpose of generating the bytes for the murmur hash on both client and server.
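
To illustrate the idea, here is a minimal, self-contained sketch of hashing the string
value of a unique id. It is not the patch: the class and method names are hypothetical, it
uses the JDK's String.getBytes(UTF_8) in place of ByteUtils.UTF16toUTF8, it carries its
own copy of the public MurmurHash3 x86 32-bit algorithm, and the modulo mapping from hash
to shard index is only an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: hash the string form of a unique id so that a
// client and a server arrive at the same shard for the same document.
final class ShardHashSketch {

    // MurmurHash3, x86 32-bit variant, over a whole byte array.
    static int murmur3(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int len = data.length;
        int roundedEnd = len & ~3; // last index covered by full 4-byte blocks

        for (int i = 0; i < roundedEnd; i += 4) {
            // Read one little-endian 32-bit block.
            int k1 = (data[i] & 0xff)
                   | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16)
                   | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Mix in the remaining 1 to 3 tail bytes (cases fall through on purpose).
        int k1 = 0;
        switch (len & 3) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization mix.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Assumption: shards are numbered 0..numShards-1 and chosen by modulo.
    static int shardFor(String uniqueId, int numShards) {
        byte[] utf8 = uniqueId.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(murmur3(utf8, 0), numShards);
    }
}
```

Because the input is just the UTF-8 bytes of the id string, a client needs no knowledge of
the field's indexed form to compute the same shard as the server.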

On both the client and the server side, updates and deletes by id are simple enough to hash.
Queries would require an additional parameter that specifies a value to be hashed to determine
the shard to forward the request to. It would also be a good idea to make that value optional,
so that a query can be directed to a particular shard rather than broadcast to all shards.
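
As a rough sketch of that routing decision (not the patch itself): when a query carries a
value to hash on, it goes to a single shard, otherwise it is broadcast. The class name,
method name, shard numbering and modulo mapping are all hypothetical, and the hash
function is passed in so that any implementation can be plugged in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.ToIntFunction;

// Hypothetical sketch: pick the shards a query should be sent to.
final class QueryRoutingSketch {

    // Returns one shard index if a hash value was supplied,
    // otherwise every shard index (broadcast).
    static List<Integer> targetShards(Optional<String> hashValue,
                                      int numShards,
                                      ToIntFunction<String> hash) {
        if (hashValue.isPresent()) {
            // Assumption: shards are numbered 0..numShards-1 and chosen by modulo.
            int shard = Math.floorMod(hash.applyAsInt(hashValue.get()), numShards);
            return Collections.singletonList(shard);
        }
        List<Integer> all = new ArrayList<>(numShards);
        for (int i = 0; i < numShards; i++) {
            all.add(i);
        }
        return all;
    }
}
```

For example, targetShards(Optional.of("user42"), 4, String::hashCode) returns a single
shard index, while passing Optional.empty() returns all four. String::hashCode is only a
stand-in here; the same hash would have to be used for updates, deletes and queries.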

I'm going to rework what I had in the previous patch to account for this.

                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: pluggable_sharding.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, attribute
> value etc) It will be easy to narrow down the search to a smaller subset of shards and in
> effect can achieve more efficient search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

