lucene-dev mailing list archives

From "Michael Garski (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2592) Pluggable shard lookup mechanism for SolrCloud
Date Sat, 23 Jun 2012 17:34:42 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399999#comment-13399999 ]

Michael Garski commented on SOLR-2592:
--------------------------------------

One of the use cases I have is identical to yours, Andy: shard membership is used to
ensure the accuracy of numGroups in the response for a distributed grouping query.
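
To illustrate why shard membership matters for numGroups, here is a minimal sketch in Python. It is not Solr code: the documents, the shard layouts, and the helper name summed_group_count are hypothetical, and it assumes each shard can only report the number of distinct group values it holds locally. Under that assumption, summing the per-shard counts matches the true group count only when every document of a group lives on the same shard.

```python
def summed_group_count(shards):
    """Sum the number of distinct group values seen on each shard."""
    return sum(len({doc["group"] for doc in shard}) for shard in shards)


# Hypothetical documents: group "a" has two members.
docs = [
    {"id": 1, "group": "a"},
    {"id": 2, "group": "a"},
    {"id": 3, "group": "b"},
    {"id": 4, "group": "c"},
]
true_groups = len({d["group"] for d in docs})

# Group "a" is split across both shards, so it is counted twice.
scattered = [[docs[0], docs[2]], [docs[1], docs[3]]]

# All members of each group share a shard, so the sum is exact.
colocated = [[docs[0], docs[1]], [docs[2], docs[3]]]

print(true_groups, summed_group_count(scattered), summed_group_count(colocated))
```

With the scattered layout the summed count overstates the number of groups; with the co-located layout it agrees with the true count.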

The challenge in that use case is that during an update a document could move
from one shard to another, which requires deleting it from its current shard as well as adding
it to the shard where it will now reside. If the previous value of the shardKey is not known,
the same delete-by-query operation you have in 'Now' would have to be broadcast to all shards
to ensure there are no duplicate unique ids in the collection. It looks like that would result
in the same overhead as using the composite id. Do you have any ideas on how to handle that
during an update?
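
The update problem described above can be sketched roughly as follows. This is an in-memory toy in Python, not Solr's implementation: the Cluster class, its method names, and the use of CRC32 modulo the shard count as the hash are all assumptions made for illustration. It shows the two cases: when the previous shardKey is known only the old shard needs a delete, and when it is unknown the delete has to go to every other shard.

```python
import zlib


class Cluster:
    """Toy model of shards keyed by a hashed shardKey (hypothetical)."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]
        self.delete_requests = 0  # counts deletes sent to shards

    def shard_for(self, shard_key):
        # Assumed hash: CRC32 of the key modulo the number of shards.
        return zlib.crc32(shard_key.encode("utf-8")) % len(self.shards)

    def _delete(self, shard_index, doc_id):
        self.delete_requests += 1
        self.shards[shard_index].pop(doc_id, None)

    def update(self, doc_id, shard_key, fields, previous_shard_key=None):
        target = self.shard_for(shard_key)
        if previous_shard_key is None:
            # Old location unknown: broadcast the delete to all other shards.
            for i in range(len(self.shards)):
                if i != target:
                    self._delete(i, doc_id)
        else:
            # Old location known: delete only if the document actually moves.
            old = self.shard_for(previous_shard_key)
            if old != target:
                self._delete(old, doc_id)
        self.shards[target][doc_id] = dict(fields, shard_key=shard_key)
        return target


cluster = Cluster(num_shards=4)
cluster.update("doc1", "userA", {"title": "first"})
cluster.update("doc1", "userB", {"title": "moved"}, previous_shard_key="userA")
copies = sum("doc1" in shard for shard in cluster.shards)
print(copies, cluster.delete_requests)
```

In this sketch the first update sends a delete to three shards because the old location is unknown, while the second sends at most one. Either way only a single copy of the document remains.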

Adding a separate shardKey definition to the schema would also cascade the change to the real-time
get handler, which currently takes only the unique document ids as input.

Regarding date-based sharding, I look at that as being handled differently. With hashing, a
document is assigned to a specific shard from a set of known shards, whereas with date-based
sharding I would imagine one would want to bring up a new shard for a specific time period,
perhaps daily or hourly. I can imagine that it might be desirable in some cases to merge shards
with older date ranges together as well, if the use case favors recent updates at search time.
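
As a rough illustration of the date-based idea, the following Python sketch brings up one shard per day on demand and folds older daily shards into a single merged shard. Everything here is hypothetical (the DateShards class, the daily granularity, and the "archive" shard name); it only models the bookkeeping, not how Solr would create or merge indexes.

```python
from datetime import datetime, timezone


class DateShards:
    """Toy model: one shard per calendar day, created on first use."""

    def __init__(self):
        self.shards = {}  # shard name -> list of document ids

    @staticmethod
    def shard_name(ts):
        # ISO dates sort lexicographically in chronological order.
        return ts.strftime("%Y-%m-%d")

    def add(self, doc_id, ts):
        name = self.shard_name(ts)
        # A new shard is brought up the first time its period is seen.
        self.shards.setdefault(name, []).append(doc_id)
        return name

    def merge_older_than(self, cutoff, merged_name="archive"):
        """Fold every daily shard dated before the cutoff into one shard."""
        cutoff_name = self.shard_name(cutoff)
        old = sorted(
            n for n in self.shards if n != merged_name and n < cutoff_name
        )
        for name in old:
            self.shards.setdefault(merged_name, []).extend(self.shards.pop(name))


def day(d):
    return datetime(2012, 6, d, tzinfo=timezone.utc)


index = DateShards()
index.add("a", day(21))
index.add("b", day(22))
index.add("c", day(23))
index.merge_older_than(day(23))
print(sorted(index.shards))
```

After the merge, the most recent day keeps its own shard while the two older days share the merged one, which is the layout that would favor recent updates at search time.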
                
> Pluggable shard lookup mechanism for SolrCloud
> ----------------------------------------------
>
>                 Key: SOLR-2592
>                 URL: https://issues.apache.org/jira/browse/SOLR-2592
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>    Affects Versions: 4.0
>            Reporter: Noble Paul
>         Attachments: SOLR-2592.patch, dbq_fix.patch, pluggable_sharding.patch, pluggable_sharding_V2.patch
>
>
> If the data in a cloud can be partitioned on some criteria (say range, hash, attribute
> value, etc.), it will be easy to narrow down the search to a smaller subset of shards and
> in effect achieve more efficient search.
