lucene-solr-dev mailing list archives

From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: SolrCloud logical shards
Date Fri, 15 Jan 2010 16:51:46 GMT
> The point I was trying to make is that I believe that if you start changing terminologies
> now people will be very confused.

So shard -> remote core... Slice -> core group.  Though semantically
they're synonyms.  In any case, I need to spend some time looking at
the cloud branch, and less time jibber-jabberin' about it.

On Fri, Jan 15, 2010 at 1:24 AM, Uri Boness <uboness@gmail.com> wrote:
>>
>> Can you elaborate on what you mean, isn't a core a single index
>> too? It seems like shard was used to represent a remote index
>> (perhaps?).
>
> Yes, a core is a single index and a shard is a conceptual idea which at the
> moment concretely refers to a remote core (but not a specific one as the
> same shard can be represented by multiple core replicas). The point I was
> trying to make is that I believe that if you start changing terminologies
> now people will be very confused. And I thought of sticking to Yonik's
> suggestion of a "slice" just to prevent this confusion. On the other hand
> one can argue that the terminology as it is today is already confusing...
> and if you really want to get it right and be aligned with the "rest of the
> world" (if there is such a thing... from what I've seen so far sharding is
> used differently in different contexts), then perhaps a "good" time to
> make such terminology changes is a major release (Solr 2.0?), as with
> such a release people tend to be more open to new/changed concepts.
>
> Cheers,
> Uri
>
> Jason Rutherglen wrote:
>>
>> Uri,
>>
>>
>>>
>>> "core" to represent a single index and "shard" to be
>>> represented by a single core
>>>
>>
>> Can you elaborate on what you mean, isn't a core a single index
>> too? It seems like shard was used to represent a remote index
>> (perhaps?). Though here I'd prefer "remote core", because to the
>> uninitiated Solr outsider it's immediately obvious (i.e. they
>> need only know what a core is, in the Solr glossary or term
>> dictionary).
>>
>> In Google vernacular, which is where the name shard came from, a
>> "shard" is basically a local sub-index
>> http://research.google.com/archive/googlecluster.html where
>> there would be many "shards" per server. However that's a
>> digression at this point.
>>
>> I personally prefer relatively straightforward, self-evident names
>> rather than inventing new language for fairly simple concepts.
>> Slice, even though it comes from our buddy Yonik, probably doesn't
>> make any immediate sense to external users when compared with the
>> word shard. Of course, software projects have a tendency to create
>> their own words to somewhat mystify users into believing in some
>> sort of magic occurring underneath. If that's what we're after,
>> it's cool, I mean that makes sense. And I don't mean to be
>> derogatory here; however, this is an open source project created in
>> part to educate users on search and to be as easily accessible as
>> possible to the greatest number of users. I think Doug did a great
>> job of this when Lucene started, with amazingly succinct code for
>> fairly complex concepts (e.g., anti-mystification of search).
>>
>> Jason
>>
>> On Thu, Jan 14, 2010 at 2:58 PM, Uri Boness <uboness@gmail.com> wrote:
>>
>>>
>>> Although Jason has some valid points here, I'm with Yonik on this. I do
>>> believe that we've gotten used to the terms "core" to represent a single
>>> index and "shard" to be represented by a single core. A "node" seems to
>>> indicate a machine or a JVM. Changing any of these (informal, perhaps)
>>> definitions will only cause confusion. That's why I think a "slice" is a
>>> good solution now... First, it's a new term for a new view of the index
>>> (logical shards, AFAIK, don't really exist yet), so people won't need to
>>> get used to it, but it's also descriptive and intuitive. I do like Jason's
>>> idea about having a protocol attached to the URLs.
>>>
>>> Cheers,
>>> Uri
>>>
>>> Jason Rutherglen wrote:
>>>
>>>>>
>>>>> But I've kind of gotten used to thinking of shards as the
>>>>> actual physical queryable things...
>>>>>
>>>>>
>>>>
>>>> I think a mistake was made referring to Solr cores as shards.
>>>> It's the same thing with 2 different names. Slice adds yet
>>>> another name which seems to imply the same thing yet again. I'd
>>>> rather see disambiguation here, and call them cores (partially
>>>> because that's what's in the code and on the wiki), and cores
>>>> only. It's a Solr-specific term, and it's going to be confused with
>>>> microprocessor cores, but at least there's only one name, which,
>>>> as search people, we know creates fewer posting lists :).
>>>>
>>>> Logical groupings of cores can occur, which can be aptly named
>>>> core groups. This way I can submit a query to a core group, and
>>>> it's reasonable to assume I'm hitting N cores. Further, cores
>>>> could point to a logical or physical entity via a URL. (As a
>>>> side note, I've always found it odd that the shards param to
>>>> RequestHandler lacks the protocol, what if I want to use HTTPS
>>>> for example?).
>>>>
>>>> So there could be http://host/solr/core1 (physical),
>>>> core://megacorename (logical),
>>>> coregroup://supergreatcoregroupname (a group of cores) in the
>>>> "shards" parameter (whose name can perhaps be changed for
>>>> clarity in a future release). Then people can mix and match and
>>>> we won't have many different XML elements floating around. We'd
>>>> have a simple list of URLs that are translated into real
>>>> physical network requests.
>>>>
>>>>
>>>> On Thu, Jan 14, 2010 at 12:56 PM, Yonik Seeley
>>>> <yonik@lucidimagination.com> wrote:
>>>>
>>>>
>>>>>
>>>>> On Thu, Jan 14, 2010 at 1:38 PM, Yonik Seeley
>>>>> <yonik@lucidimagination.com> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> On Thu, Jan 14, 2010 at 12:46 PM, Yonik Seeley
>>>>>> <yonik@lucidimagination.com> wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I'm actually starting to lean toward "slice" instead of "logical
>>>>>>> shard".
>>>>>>>
>>>>>>>
>>>>>
>>>>> Alternate terminology could be "index" for the actual physical Lucene
>>>>> index (and also enough of the URL that unambiguously identifies it),
>>>>> and then "shard" could be the logical entity.
>>>>>
>>>>> But I've kind of gotten used to thinking of shards as the actual
>>>>> physical queryable things...
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>
>>
>
