Date: Fri, 15 Jan 2010 08:51:46 -0800
Message-ID: <85d3c3b61001150851o4ab55c0bu87f65d7c83cf5a6f@mail.gmail.com>
In-Reply-To: <4B50344F.5080704@gmail.com>
References: <85d3c3b61001141337w6cefda4fr19a2cd638ba75cef@mail.gmail.com>
 <4B4FA18E.5040202@gmail.com>
 <85d3c3b61001141521t357898b6hadca02c0eae5a9d6@mail.gmail.com>
 <4B50344F.5080704@gmail.com>
Subject: Re: SolrCloud logical shards
From: Jason Rutherglen
To: solr-dev@lucene.apache.org

> The point I was trying to make is that I believe that if you start
> changing terminologies now people will be very confused

So shard -> remote core... Slice -> core group. Though
semantically they're synonyms.

In any case, I need to spend some time looking at the cloud
branch, and less time jibber-jabberin' about it.

On Fri, Jan 15, 2010 at 1:24 AM, Uri Boness wrote:
>>
>> Can you elaborate on what you mean, isn't a core a single index
>> too? It seems like shard was used to represent a remote index
>> (perhaps?).
>
> Yes, a core is a single index and a shard is a conceptual idea which at the
> moment concretely refers to a remote core (but not a specific one as the
> same shard can be represented by multiple core replicas). The point I was
> trying to make is that I believe that if you start changing terminologies
> now people will be very confused. And I thought of sticking to Yonik's
> suggestion of a "slice" just to prevent this confusion. On the other hand
> one can argue that the terminology as it is today is already confusing...
> and if you really want to get it right and be aligned with the "rest of the
> world" (if there is such a thing... from what I've seen so far sharding is
> used differently in different contexts), then perhaps a "good" timing for
> making such terminology changes is with a major release (Solr 2.0?) as with
> such release people tend to be more open for new/changed concepts.
>
> Cheers,
> Uri
>
> Jason Rutherglen wrote:
>>
>> Uri,
>>
>>> "core" to represent a single index and "shard" to be
>>> represented by a single core
>>
>> Can you elaborate on what you mean, isn't a core a single index
>> too? It seems like shard was used to represent a remote index
>> (perhaps?). Though here I'd prefer "remote core", because to the
>> uninitiated Solr outsider it's immediately obvious (i.e. they
>> need only know what a core is, in the Solr glossary or term
>> dictionary).
>>
>> In Google vernacular, which is where the name shard came from, a
>> "shard" is basically a local sub-index
>> http://research.google.com/archive/googlecluster.html where
>> there would be many "shards" per server. However that's a
>> digression at this point.
>>
>> I personally prefer relatively straightforward names, that are
>> self-evident, rather than inventing new language for fairly
>> simple concepts. Slice, even though it comes from our buddy
>> Yonik, probably doesn't make any immediate sense to external
>> users when compared with the word shard. Of course software
>> projects have a tendency to create their own words to somewhat
>> mystify users into believing in some sort of magic occurring
>> underneath. If that's what we're after, it's cool, I mean that
>> makes sense. And I don't mean to be derogatory here however this
>> is an open source project created in part to educate users on
>> search and be made easily accessible as possible, to the
>> greatest number of users possible.
>> I think Doug did a great job
>> of this when Lucene started with amazingly succinct code for
>> fairly complex concepts (eg, anti-mystification of search).
>>
>> Jason
>>
>> On Thu, Jan 14, 2010 at 2:58 PM, Uri Boness wrote:
>>
>>> Although Jason has some valid points here, I'm with Yonik here. I do believe
>>> that we've gotten used to the terms "core" to represent a single index and
>>> "shard" to be represented by a single core. A "node" seems to indicate a
>>> machine or a JVM. Changing any of these (informal perhaps) definitions will
>>> only cause confusion. That's why I think a "slice" is a good solution now...
>>> first it's a new term to a new view of the index (logical shard AFAIK don't
>>> really exists yet) so people won't need to get used to it, but it's also
>>> descriptive and intuitive. I do like Jason's idea about having a protocol
>>> attached to the URLs.
>>>
>>> Cheers,
>>> Uri
>>>
>>> Jason Rutherglen wrote:
>>>
>>>>> But I've kind of gotten used to thinking of shards as the
>>>>> actual physical queryable things...
>>>>
>>>> I think a mistake was made referring to Solr cores as shards.
>>>> It's the same thing with 2 different names. Slices adds yet
>>>> another name which seems to imply the same thing yet again. I'd
>>>> rather see disambiguation here, and call them cores (partially
>>>> because that's what's in the code and on the wiki), and cores
>>>> only. It's a Solr specific term, it's going to be confused with
>>>> microprocessor cores, but at least there's only one name, which
>>>> as search people, we know creates fewer posting lists :).
>>>>
>>>> Logical groupings of cores can occur, which can be aptly named
>>>> core groups. This way I can submit a query to a core group, and
>>>> it's reasonable to assume I'm hitting N cores. Further, cores
>>>> could point to a logical or physical entity via a URL.
>>>> (As a side note, I've always found it odd that the shards param
>>>> to RequestHandler lacks the protocol, what if I want to use HTTPS
>>>> for example?).
>>>>
>>>> So there could be http://host/solr/core1 (physical),
>>>> core://megacorename (logical),
>>>> coregroup://supergreatcoregroupname (a group of cores) in the
>>>> "shards" parameter (whose name can perhaps be changed for
>>>> clarity in a future release). Then people can mix and match and
>>>> we won't have many different XML elements floating around. We'd
>>>> have a simple list of URLs that are transposed into a real
>>>> physical network request.
>>>>
>>>> On Thu, Jan 14, 2010 at 12:56 PM, Yonik Seeley wrote:
>>>>
>>>>> On Thu, Jan 14, 2010 at 1:38 PM, Yonik Seeley wrote:
>>>>>
>>>>>> On Thu, Jan 14, 2010 at 12:46 PM, Yonik Seeley wrote:
>>>>>>
>>>>>>> I'm actually starting to lean toward "slice" instead of "logical
>>>>>>> shard".
>>>>>
>>>>> Alternate terminology could be "index" for the actual physical lucene
>>>>> index (and also enough of the URL that unambiguously identifies it),
>>>>> and then "shard" could be the logical entity.
>>>>>
>>>>> But I've kind of gotten used to thinking of shards as the actual
>>>>> physical queryable things...
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
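
The mixed list of URLs proposed in this thread (http://host/solr/core1,
core://megacorename, coregroup://supergreatcoregroupname) could be turned into
physical requests roughly as sketched below. This is only an illustration of
the idea, not Solr code: the function names, the two lookup tables, and the
comma-separated format with explicit schemes are all assumptions made for the
example, and the sketch does not guard against a core group that refers back
to itself.

```python
from urllib.parse import urlsplit


def resolve_entry(entry, logical_cores, core_groups):
    """Expand one shards-list entry into a list of physical URLs.

    logical_cores: hypothetical map of logical core name -> physical URL.
    core_groups:   hypothetical map of group name -> list of entries,
                   each of which may itself be any supported URL form.
    """
    parts = urlsplit(entry)
    if parts.scheme in ("http", "https"):
        # Already a physical address; use it as-is.
        return [entry]
    if parts.scheme == "core":
        # Logical core: look up the physical URL by name.
        return [logical_cores[parts.netloc]]
    if parts.scheme == "coregroup":
        # Group of cores: expand every member recursively.
        urls = []
        for member in core_groups[parts.netloc]:
            urls.extend(resolve_entry(member, logical_cores, core_groups))
        return urls
    raise ValueError(f"unsupported scheme in shards entry: {entry!r}")


def resolve_shards(shards_param, logical_cores, core_groups):
    """Resolve a comma-separated shards value into physical URLs, in order."""
    urls = []
    for entry in shards_param.split(","):
        entry = entry.strip()
        if entry:
            urls.extend(resolve_entry(entry, logical_cores, core_groups))
    return urls


if __name__ == "__main__":
    # Made-up registries standing in for whatever the cluster state would hold.
    cores = {"megacorename": "http://host1/solr/core1"}
    groups = {
        "supergreatcoregroupname": [
            "core://megacorename",
            "https://host2/solr/core2",
        ]
    }
    shards = "http://host/solr/core1,coregroup://supergreatcoregroupname"
    print(resolve_shards(shards, cores, groups))
```

Keeping the scheme on every entry is what lets a single flat list carry
physical cores, logical cores, and core groups side by side, and it also
leaves room for HTTPS, which the side note above asks about.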