lucene-solr-user mailing list archives

From "Darren Govoni" <dar...@ontrenet.com>
Subject RE: Re: Terminology question: Core vs. Collection vs...
Date Thu, 03 Jan 2013 14:25:04 GMT
Ah, ok. Good. Makes sense.

I think I will draw all this up in a UML that includes the distinction between the "logical"
terms and the "physical" terms (and their mapping) as they do get intertwined. I'll post it
here when I'm done.

------- Original Message -------
On 1/3/2013 09:19 AM Jack Krupansky wrote:

A single shard MAY exist on a single core, but only if it is not replicated.
Generally, a single shard will exist on multiple cores, each a replica of
the source data as it comes into the update handler.

-- Jack Krupansky

-----Original Message-----
From: Darren Govoni
Sent: Thursday, January 03, 2013 9:10 AM
To: solr-user@lucene.apache.org
Subject: RE: Re: Terminology question: Core vs. Collection vs...

Thanks. I got that part.

A group of shards (and therefore cores) represents a collection, yes. But does
a single shard exist only on a single core?
<br>
------- Original Message -------
On 1/3/2013 09:03 AM Jack Krupansky wrote:

No, a shard is a subset (or "slice") of the collection. Sharding is a way of
"slicing" the original data, before we talk about how the shards get stored
and replicated on actual Solr cores. Replicas are instances of the data for
a shard.

Sometimes people may loosely speak of a replica as being "a shard", but
that's just loose use of the terminology.

So, we're not "sharding shards", but we are "replicating shards".

-- Jack Krupansky
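
The "slice first, then replicate" distinction above can be sketched in a few lines of Python. This is only an illustration: the function names, the CRC32-modulo routing, and the document ids are assumptions made for the example and do not reflect how Solr actually routes documents.

```python
import zlib
from collections import defaultdict


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard with a stable hash (illustrative only, not Solr's router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def slice_collection(doc_ids, num_shards):
    """Sharding: every document lands in exactly one shard (slice)."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(shards)


def replicate(shards, replication_factor):
    """Replication: every shard gets N identical copies, one per core."""
    return {
        shard: [list(docs) for _ in range(replication_factor)]
        for shard, docs in shards.items()
    }


docs = [f"doc{i}" for i in range(10)]           # hypothetical document ids
shards = slice_collection(docs, num_shards=2)    # "slicing" the collection
replicas = replicate(shards, replication_factor=3)  # "replicating shards"
```

Sharding never produces a second level of slices here; replication only copies a slice that already exists, which is the point of "replicating shards" rather than "sharding shards".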

-----Original Message-----
From: Darren Govoni
Sent: Thursday, January 03, 2013 8:51 AM
To: solr-user@lucene.apache.org
Subject: RE: Re: Terminology question: Core vs. Collection vs...

Thanks again. (And sorry to jump into this convo)

But I had a question on your statement:

On 1/3/2013 08:07 AM Jack Krupansky wrote:
    Collection is the more modern term and incorporates the fact that the
    collection may be sharded, with each shard on one or more cores, with
    each core being a replica of the other cores within that shard of that
    collection.

A collection is sharded, meaning it is distributed across cores. A shard
itself is not distributed across cores in the same sense. Rather, a shard
exists on a single core and is replicated on other cores. Is that right? The
way it's worded above, it sounds like a shard can also be sharded...

------- Original Message -------
On 1/3/2013 08:28 AM Jack Krupansky wrote:

A node is a machine in a cluster or cloud (graph). It could be a real
machine or a virtualized machine. Technically, you could have multiple
virtual nodes on the same physical "box". Each Solr replica would be on a
different node.

Technically, you could have multiple Solr instances running on a single
hardware node, each with a different port. They are simply instances of
Solr, although you could consider each Solr instance a node in a Solr cloud
as well, a "virtual" node. So, technically, you could have multiple replicas
on the same node, but that sort of defeats most of the purpose of having
replicas in the first place - to distribute the data for performance and
fault tolerance. But, you could have replicas of different shards on the
same node/box for a partial improvement of performance and fault tolerance.

A Solr "cloud" is really a cluster.

-- Jack Krupansky
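
The placement advice above (replicas of the same shard belong on different nodes, while replicas of different shards may share one) can be checked mechanically. The following is a small sketch under assumed data: the shard, core, and node names are hypothetical and the helper is not part of any Solr API.

```python
from collections import defaultdict

# Hypothetical layout: (shard, core name, node) for each replica.
placements = [
    ("shard1", "core_a", "node1"),
    ("shard1", "core_b", "node1"),  # same shard, same node: little fault tolerance
    ("shard2", "core_c", "node1"),  # different shard on node1: acceptable
    ("shard2", "core_d", "node2"),
]


def colocated_replicas(placements):
    """Return {(shard, node): [cores]} where a node holds 2+ replicas of one shard."""
    by_shard_node = defaultdict(list)
    for shard, core, node in placements:
        by_shard_node[(shard, node)].append(core)
    return {key: cores for key, cores in by_shard_node.items() if len(cores) > 1}


problems = colocated_replicas(placements)
```

With this layout only shard1 is flagged, because both of its replicas would be lost together if node1 went down.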

-----Original Message-----
From: Darren Govoni
Sent: Thursday, January 03, 2013 8:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Re: Terminology question: Core vs. Collection vs...

Good write up.

And what about "node"?

I think there needs to be an official glossary of terms that is sanctioned
by the Solr team, and some terms still in use may need to be labeled
"deprecated". After so many years, it's still confusing.

------- Original Message -------
On 1/3/2013 08:07 AM Jack Krupansky wrote:

Collection is the more modern term and incorporates the fact that the
collection may be sharded, with each shard on one or more cores, with each
core being a replica of the other cores within that shard of that
collection.

Instance is a general term, but is commonly used to refer to a running Solr
server, each of which can service any number of cores. A sharded collection
would typically require multiple instances of Solr, each with a shard of the
collection.

Multiple collections can be supported on a single instance of Solr. They
don't have to be sharded or replicated. But if they are, each Solr instance
will have a copy or replica of the data (index) of one shard of each sharded
collection - to the degree that each collection needs that many shards.

At the API level, you talk to a Solr instance, using a host and port, and
giving the collection name. Some operations will refer only to the portion
of a multi-shard collection on that Solr instance, but typically Solr will
"distribute" the operation, whether it be an update or a query, to all of
the shards of the named collection. In the case of update, the update will
be distributed to all replicas as well, but in the case of query only one
replica of each shard of the collection is needed.
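
To make "a host and port, and giving the collection name" concrete, here is a minimal Python sketch that only builds the request URL. It assumes the common `/solr/<collection>/select` path layout; the host, port, collection name, and query string are placeholders chosen for the example, and `select_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def select_url(host: str, port: int, collection: str, query: str) -> str:
    """Build a query URL addressed to one Solr instance and one collection."""
    return f"http://{host}:{port}/solr/{collection}/select?" + urlencode({"q": query})


# Placeholder values: any instance hosting part of the collection could be addressed.
url = select_url("localhost", 8983, "collection1", "title:solr")
```

The client names a single instance and a collection; whether the request is then fanned out to the other shards is decided on the Solr side, as described above.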

Before SolrCloud, Solr had master and slave and the slaves were replicas
of the master, but with SolrCloud there is no master and all the replicas of
the shard are peers, although at any moment of time one of them will be
considered the "leader" for coordination purposes, but not in the sense that
it is a master of the other replicas in that shard. A SolrCloud replica is a
replica of the data, in an abstract sense, for a single shard of a
collection. A SolrCloud replica is more of an instance of the data/index.

An index exists at two levels: the portion of a collection on a single Solr
core will have a Lucene index, but collectively the Lucene indexes for the
shards of a collection can be referred to as the index of the collection.
Each replica is a copy or instance of a portion of the collection's index.

The term slice is sometimes used to refer collectively to all of the
cores/replicas of a single shard, or sometimes to a single replica as it
contains only a "slice" of the full collection data.

-- Jack Krupansky
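
The mapping between the logical terms (collection, shard) and the physical ones (replica/core, instance) described in this message can be written down as a small data model. This is a sketch only: the class names, the example collection, and the rule of querying the first listed replica are assumptions for illustration, not Solr's own classes or its replica selection logic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    core: str       # physical: the Solr core holding this copy of the shard
    instance: str   # physical: "host:port" of the Solr server running the core


@dataclass
class Shard:
    name: str                                   # logical slice of the collection
    replicas: list = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: list = field(default_factory=list)

    def update_targets(self):
        """An update reaches every replica of every shard."""
        return [r for s in self.shards for r in s.replicas]

    def query_targets(self):
        """A query needs only one replica per shard (here: the first listed)."""
        return [s.replicas[0] for s in self.shards if s.replicas]


# Hypothetical two-shard collection, each shard replicated on two instances.
books = Collection("books", [
    Shard("shard1", [Replica("books_s1_r1", "host1:8983"),
                     Replica("books_s1_r2", "host2:8983")]),
    Shard("shard2", [Replica("books_s2_r1", "host1:8983"),
                     Replica("books_s2_r2", "host2:8983")]),
])
```

In this layout each instance holds one replica of each shard, an update touches all four cores, and a query touches two of them.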

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Thursday, January 03, 2013 4:42 AM
To: solr-user@lucene.apache.org
Subject: Terminology question: Core vs. Collection vs...

Hello,

I am trying to understand the core Solr terminology. I am looking for the
correct rather than loose meaning, as I am trying to teach an example that
starts from an easy scenario and may scale to a multi-core, multi-machine
situation.

Here are the terms that seem to be all overlapping and/or crossing over in
my mind at the moment.

1) Index
2) Core
3) Collection
4) Instance
5) Replica (Replica of _what_?)
6) Others?

I tried looking through documentation, but either there is a terminology
drift or I am having trouble understanding the distinctions.

If anybody has a clear picture in their mind, I would appreciate a
clarification.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous - via GTD book)
