lucene-dev mailing list archives

From "Per Steffensen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server
Date Wed, 28 Nov 2012 11:20:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505368#comment-13505368
] 

Per Steffensen edited comment on SOLR-4114 at 11/28/12 11:20 AM:
-----------------------------------------------------------------

bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies of the data.
I also count the leader as a replica of a shard (which is logical). It follows from the clusterstate.json,
which lists all "replicas" for a shard and one of them just has a flag indicating it's the
leader. This also makes it easier to talk about a shard having 0 replicas (meaning there is
not even a leader).
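To illustrate the point, here is a minimal sketch of how a leader can appear as just one flagged entry among the listed replicas. The JSON keys and node names are illustrative only, not the exact clusterstate.json layout of any particular Solr version:

```python
import json

# Hypothetical clusterstate.json fragment: the leader is simply one of the
# listed "replicas", distinguished only by a flag.
clusterstate = json.loads("""
{
  "collection1": {
    "shard1": {
      "replicas": {
        "core_node1": {"base_url": "http://host1:8983/solr", "leader": "true"},
        "core_node2": {"base_url": "http://host2:8983/solr"}
      }
    }
  }
}
""")

replicas = clusterstate["collection1"]["shard1"]["replicas"]
leaders = [name for name, props in replicas.items()
           if props.get("leader") == "true"]

print(len(replicas))  # the leader counts as one of the 2 replicas
print(leaders)        # ['core_node1']
```

Under this reading, "0 replicas" naturally means an empty replicas map - no leader at all.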

I understand that you can view all shards under a slice as "replicas", but in my mind "replica"
is also a "role" that a shard plays at runtime - all shards except one under a slice play
the "replica role" at runtime, while the remaining shard plays the "leader role". To avoid
too much confusion, I suggest using the term "shards" for all the instances under a slice,
and using the terms "replica" and "leader" only for the roles a shard plays at runtime.
But that would of course require changes, e.g. to the Slice class, where getReplicas, getReplicasCopy
and getReplicasMap would need to be renamed to getShardsXXX. It probably shouldn't be done now,
but rather as part of a cross-codebase cleanup of term usage. Today there is a heavy mix-up of
terms in the code - "replica" and "shard" are sometimes used for a node, "replica" and "shard"
are used for the same thing, etc.

Suggested terms:
 * collection: A big logical bucket to fill data into
 * slice: A logical part of a collection. A part of the data going into a collection goes
into a particular slice. Slices for a particular collection are non-overlapping
 * shard: A physical instance of a slice. Running without replication there is one shard per slice.
Running with replication-factor X there are X+1 shards per slice.
 * replica and leader: Roles played by shards at runtime. As soon as the system is not running
there are no replicas or leaders - there are just shards
 * node-base-url: The prefix/base (up to and including the webapp-context) of the URL for
a specific Solr server
 * node-name: A logical name for the Solr server - the same as node-base-url except /'s are
replaced by _'s and the protocol part (http(s)://) is removed
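Two of the terms above can be made concrete with a short sketch. The URL and the function names below are hypothetical, chosen only to illustrate the node-base-url-to-node-name mapping and the shards-per-slice arithmetic as described:

```python
def node_name(node_base_url: str) -> str:
    """Sketch of the mapping described above: drop the protocol part
    (http:// or https://) and replace '/' with '_'."""
    _, _, rest = node_base_url.partition("://")
    return rest.replace("/", "_")


def shards_per_slice(replication_factor: int) -> int:
    # In the suggested terminology: one leader plus X replicas -> X+1 shards.
    return replication_factor + 1


print(node_name("http://solr1.example.com:8983/solr"))
# -> solr1.example.com:8983_solr
print(shards_per_slice(2))
# -> 3
```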

> Collection API: Allow multiple shards from one collection on the same Solr server
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-4114
>                 URL: https://issues.apache.org/jira/browse/SOLR-4114
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore, SolrCloud
>    Affects Versions: 4.0
>         Environment: Solr 4.0.0 release
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: collection-api, multicore, shard, shard-allocation
>         Attachments: SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same Solr server
- e.g. running a collection with 8 shards on a 4-server Solr cluster (each Solr server running
2 shards).
> Performance tests on our side have shown that this is a good idea, and it is also a good
idea for easy elasticity later on - it is much easier to move an entire existing shard from
one Solr server to another one that just joined the cluster than it is to split an existing
shard between the Solr server that used to run it and the new one.
> See dev mailing list discussion "Multiple shards for one collection on the same Solr
server"
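The 8-shards-on-4-servers layout from the description can be sketched with a simple round-robin assignment. The server names and the helper below are hypothetical, purely for illustration - Solr's actual placement logic is not shown here:

```python
def assign_shards(num_shards: int, servers: list[str]) -> dict[str, list[str]]:
    """Round-robin sketch: with more shards than servers, several shards
    of the same collection land on one server."""
    layout: dict[str, list[str]] = {s: [] for s in servers}
    for i in range(num_shards):
        layout[servers[i % len(servers)]].append(f"shard{i + 1}")
    return layout


servers = ["solr1", "solr2", "solr3", "solr4"]
layout = assign_shards(8, servers)
print(layout["solr1"])  # -> ['shard1', 'shard5']
```

Elasticity then amounts to moving whole entries between servers when a new node joins, rather than splitting any individual shard.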

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


