lucene-solr-dev mailing list archives

From Jason Rutherglen <>
Subject Re: SolrCloud logical shards
Date Thu, 14 Jan 2010 23:21:27 GMT

> "core" to represent a single index and "shard" to be
> represented by a single core

Can you elaborate on what you mean? Isn't a core a single index
too? It seems like shard was used to represent a remote index
(perhaps?). Though here I'd prefer "remote core", because to the
uninitiated Solr outsider it's immediately obvious (i.e. they
need only know what a core is, in the Solr glossary or term

In Google vernacular, which is where the name shard came from, a
"shard" is basically a local sub-index, and there would be many
"shards" per server. However, that's a digression at this point.

I personally prefer relatively straightforward names that are
self-evident, rather than inventing new language for fairly
simple concepts. Slice, even though it comes from our buddy
Yonik, probably doesn't make any immediate sense to external
users when compared with the word shard. Of course software
projects have a tendency to create their own words to somewhat
mystify users into believing in some sort of magic occurring
underneath. If that's what we're after, it's cool, I mean that
makes sense. And I don't mean to be derogatory here; however, this
is an open source project created in part to educate users on
search and to be made as easily accessible as possible, to the
greatest number of users possible. I think Doug did a great job
of this when Lucene started with amazingly succinct code for
fairly complex concepts (e.g., anti-mystification of search).


On Thu, Jan 14, 2010 at 2:58 PM, Uri Boness <> wrote:
> Although Jason has some valid points here, I'm with Yonik here. I do believe
> that we've gotten used to the terms "core" to represent a single index and
> "shard" to be represented by a single core. A "node" seems to indicate a
> machine or a JVM. Changing any of these (informal perhaps) definitions will
> only cause confusion. That's why I think a "slice" is a good solution now...
> first, it's a new term for a new view of the index (logical shards AFAIK don't
> really exist yet) so people won't need to get used to it, but it's also
> descriptive and intuitive. I do like Jason's idea about having a protocol
> attached to the URLs.
> Cheers,
> Uri
> Jason Rutherglen wrote:
>>> But I've kind of gotten used to thinking of shards as the
>>> actual physical queryable things...
>> I think a mistake was made referring to Solr cores as shards.
>> It's the same thing with 2 different names. Slices adds yet
>> another name which seems to imply the same thing yet again. I'd
>> rather see disambiguation here, and call them cores (partially
>> because that's what's in the code and on the wiki), and cores
>> only. It's a Solr specific term, it's going to be confused with
>> microprocessor cores, but at least there's only one name, which
>> as search people, we know creates fewer posting lists :).
>> Logical groupings of cores can occur, which can be aptly named
>> core groups. This way I can submit a query to a core group, and
>> it's reasonable to assume I'm hitting N cores. Further, cores
>> could point to a logical or physical entity via a URL. (As a
>> side note, I've always found it odd that the shards param to
>> RequestHandler lacks the protocol, what if I want to use HTTPS
>> for example?).
>> So there could be http://host/solr/core1 (physical),
>> core://megacorename (logical),
>> coregroup://supergreatcoregroupname (a group of cores) in the
>> "shards" parameter (whose name can perhaps be changed for
>> clarity in a future release). Then people can mix and match and
>> we won't have many different XML elements floating around. We'd
>> have a simple list of URLs that are transposed into a real
>> physical network request.
>> On Thu, Jan 14, 2010 at 12:56 PM, Yonik Seeley
>> <> wrote:
>>> On Thu, Jan 14, 2010 at 1:38 PM, Yonik Seeley
>>> <> wrote:
>>>> On Thu, Jan 14, 2010 at 12:46 PM, Yonik Seeley
>>>> <> wrote:
>>>>> I'm actually starting to lean toward "slice" instead of "logical
>>>>> shard".
>>> Alternate terminology could be "index" for the actual physical Lucene
>>> index (and also enough of the URL that unambiguously identifies it),
>>> and then "shard" could be the logical entity.
>>> But I've kind of gotten used to thinking of shards as the actual
>>> physical queryable things...
>>> -Yonik
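
To make the quoted proposal concrete, here is a rough sketch of how a
"shards" value mixing http://, core:// and coregroup:// entries might be
turned into a flat list of physical URLs. This is only an illustration
under assumptions, not how Solr resolves the parameter: the function
resolve_shards, the two lookup tables, and the extra host names are all
hypothetical, and entries with no protocol are simply assumed to mean
plain HTTP.

```python
def resolve_shards(shards_param, cores, core_groups):
    """Expand a comma-separated "shards" value into physical core URLs.

    cores:       hypothetical map of logical core name -> physical URL
    core_groups: hypothetical map of group name -> list of logical core names
    """
    resolved = []
    for entry in shards_param.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        scheme, sep, rest = entry.partition("://")
        if not sep:
            # No protocol given (bare host/path form); assume plain HTTP.
            resolved.append("http://" + entry)
        elif scheme in ("http", "https"):
            # Already a physical address; pass it through unchanged.
            resolved.append(entry)
        elif scheme == "core":
            # Logical core: look up its physical location.
            resolved.append(cores[rest])
        elif scheme == "coregroup":
            # Group of cores: expand to every member's physical location.
            resolved.extend(cores[name] for name in core_groups[rest])
        else:
            raise ValueError("unsupported scheme in shards entry: " + entry)
    return resolved


# Hypothetical registries, for illustration only.
cores = {
    "megacorename": "https://host2/solr/core7",
    "core_a": "http://host3/solr/core_a",
    "core_b": "http://host4/solr/core_b",
}
core_groups = {"supergreatcoregroupname": ["core_a", "core_b"]}

shards = ("http://host/solr/core1,core://megacorename,"
          "coregroup://supergreatcoregroupname")
for url in resolve_shards(shards, cores, core_groups):
    print(url)
```

Keeping the protocol on every entry means an HTTPS core can sit next to
HTTP ones in the same list, and a core group is expanded into its member
cores before any network request is made.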
