lucene-dev mailing list archives

From "Michael Garski (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud
Date Mon, 21 May 2012 18:21:41 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Garski updated SOLR-2592:
---------------------------------

    Attachment: pluggable_sharding_V2.patch

Here is an update to my original patch that accounts for the requirement of hashing based
on the unique id. It works as follows:

1. Configure a ShardKeyParserFactory in SolrConfig under config/shardKeyParserFactory. If
none is configured, the default implementation, which shards on the document's unique id,
is used. The default configuration is equivalent to:
{code:xml} 
<shardKeyParserFactory class="solr.ShardKeyParserFactory"/>
{code}

2. The ShardKeyParser has two methods: one parses a shard key out of the unique id, the other
out of a delete-by-query. When parsing the unique id, the default implementation returns its
string value, so the document is forwarded to the specific shard that owns it. When parsing a
delete-by-query, it returns null, so the delete is broadcast to the entire collection.

3. Queries can be directed to a subset of shards in the collection by specifying one or more
shard keys in the request parameter 'shard.keys'.
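The parser contract and routing rules in steps 2 and 3 can be sketched in Java roughly as follows. This is a minimal illustration, not the patch's code: the method names `parseUniqueId` and `parseDeleteByQuery`, the `DefaultShardKeyParser` and `ShardRouting` classes, and the assumption that 'shard.keys' is a comma-separated list are all hypothetical. The key-to-shard function is passed in rather than fixed, since the hashing itself lives elsewhere.

```java
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToIntFunction;
import java.util.stream.IntStream;

// Hypothetical contract for the parser described in step 2; the method
// names are illustrative and are not taken from the patch.
interface ShardKeyParser {
    /** Returns the shard key derived from a document's unique id. */
    String parseUniqueId(Object uniqueId);

    /** Returns a shard key for a delete-by-query, or null to broadcast it. */
    String parseDeleteByQuery(String query);
}

// Default behavior as described in step 2: shard on the unique id's string
// value and broadcast every delete-by-query to the whole collection.
class DefaultShardKeyParser implements ShardKeyParser {
    @Override
    public String parseUniqueId(Object uniqueId) {
        // The string value (not the indexed form) of the id is the shard key.
        return Objects.toString(uniqueId, null);
    }

    @Override
    public String parseDeleteByQuery(String query) {
        // null means "no single target shard".
        return null;
    }
}

// Hypothetical routing helper; shardOf maps a shard key to a shard index.
final class ShardRouting {
    private ShardRouting() {}

    /** A non-null key targets one shard; a null key targets all shards. */
    static Set<Integer> targets(String shardKey, int numShards,
                                ToIntFunction<String> shardOf) {
        Set<Integer> shards = new TreeSet<>();
        if (shardKey == null) {
            IntStream.range(0, numShards).forEach(shards::add);
        } else {
            shards.add(shardOf.applyAsInt(shardKey));
        }
        return shards;
    }

    /** Assumes 'shard.keys' is comma-separated; no usable keys means all shards. */
    static Set<Integer> targetsForQuery(String shardKeysParam, int numShards,
                                        ToIntFunction<String> shardOf) {
        Set<Integer> shards = new TreeSet<>();
        if (shardKeysParam != null) {
            for (String part : shardKeysParam.split(",")) {
                String key = part.trim();
                if (!key.isEmpty()) {
                    shards.add(shardOf.applyAsInt(key));
                }
            }
        }
        return shards.isEmpty() ? targets(null, numShards, shardOf) : shards;
    }
}
```

In this sketch a null shard key means "send to every shard", which is also the fallback when a query carries no usable 'shard.keys' value.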

Notes:

There are no dedicated unit tests for this change yet; however, all existing unit tests pass.
Switching to hashing on the string value rather than the indexed value caused a test to fail,
which is how I realized that the real-time get component requires support for hashing based
on the document's unique id.

Because hashing is done on string values rather than indexed values, the SolrJ client could
direct queries to a specific shard; however, this is not yet implemented.

I put the hashing function in the oas.common.cloud.HashPartioner class, which encapsulates
hashing and partitioning in one place. I can see a desire for pluggable collection
partitioning, where a collection could be partitioned by time period or some other criterion,
but that is outside the scope of pluggable shard hashing.
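As a rough sketch of keeping hashing and partitioning together in one class, the hypothetical `RangeHashPartitioner` below splits the 32-bit hash space into one contiguous range per shard and routes a key to the range containing its hash. It uses `String.hashCode()` and equal-width ranges purely for illustration; neither is claimed to match what HashPartioner in the patch actually does.

```java
// Hypothetical sketch: hashing and range partitioning kept in one class.
// String.hashCode() and equal-width ranges are illustrative assumptions.
final class RangeHashPartitioner {
    /** Inclusive [min, max] slice of the 32-bit hash space owned by one shard. */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        boolean includes(int hash) {
            return hash >= min && hash <= max;
        }
    }

    private final Range[] ranges;

    RangeHashPartitioner(int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        ranges = new Range[numShards];
        // Work in long arithmetic so the 2^32-wide space does not overflow.
        long width = (1L << 32) / numShards;
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // The last range absorbs any remainder up to Integer.MAX_VALUE.
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + width - 1;
            ranges[i] = new Range((int) start, (int) end);
            start = end + 1;
        }
    }

    /** Hash of the key's string value (assumed hash function). */
    int hash(String key) {
        return key.hashCode();
    }

    /** Index of the shard whose range contains the key's hash. */
    int shardFor(String key) {
        int h = hash(key);
        for (int i = 0; i < ranges.length; i++) {
            if (ranges[i].includes(h)) {
                return i;
            }
        }
        throw new IllegalStateException("no range covers hash " + h);
    }

    Range rangeOf(int shard) {
        return ranges[shard];
    }
}
```

Assigning explicit hash ranges, rather than taking the hash modulo the shard count, makes each shard's ownership a visible interval, which may be easier to inspect or subdivide later.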

                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: pluggable_sharding.patch, pluggable_sharding_V2.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say, range, hash, attribute
> value, etc.), it will be easy to narrow down the search to a smaller subset of shards and,
> in effect, achieve more efficient search.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

