From: Sylvain Lebresne
To: user@cassandra.apache.org
Subject: Re: architectural understanding of write operation node flow
Date: Tue, 24 Jan 2012 10:30:55 +0100
In-Reply-To: <4F1E7278.1040701@adyen.com>
References: <4F1D80FF.5060008@adyen.com> <4F1E7278.1040701@adyen.com>

On Tue, Jan 24, 2012 at 9:57 AM, Peter Dijkshoorn wrote:
> yeah, well main question remains then, is the node receiving the request
> from the client called the coordinator (even if it is not responsible
> for that key)?

Yes.

> Or will that node forward the call to the first responsible node who
> does the coordinating stuff? (as the Cassandra and Dynamo papers state)
>
> In case of that forwarding, is the client told to connect to another
> node, or does the node receiving the call act as a proxy?
>
> so, is it a 3-deep or a 2-deep network call?

2-deep (except for counter increments which have a slightly different
protocol).

> I want to explain this in a small literature part of my thesis to
> distinguish the internal structure against BigTable and various kinds of
> RDBMS replication schemes, that's why I want to know precisely :)
>
> Thanks,
>
> Peter Dijkshoorn
> Adyen - Payments Made Easy
> www.adyen.com
>
> Visiting Address:                 Mail Address:
> Stationsplein 57 - 4th floor      P.O. Box 10095
> 1012 AB Amsterdam                 1001 EB Amsterdam
> The Netherlands                   The Netherlands
>
> Office +31.20.240.1240
> Email peter.dijkshoorn@adyen.com
>
>
> On 01/23/2012 06:59 PM, Daniel Doubleday wrote:
>> Ouch :-) you were asking write ...
>>
>> Well kind of similar
>>
>> 1. Coordinator calculates all nodes
>> 2. If not enough (according to CL) nodes are alive it throws unavailable
>> 3. If nodes are down and hh is enabled it writes a hint for that row
>> 4. It sends write request to all nodes (including itself / shortcutting messaging)
>> 5. If it receives enough (according to CL) acks before timeout everything is fine otherwise it throws unavailable
>>
>> errm ..
>> I'm more confident in the read path though, especially concerning hh handling, so I'm happy to be corrected here. I.e. I'm not sure if hints are written when requests time out but CL is reached.
>>
>> On Jan 23, 2012, at 6:47 PM, Daniel Doubleday wrote:
>>
>>> Your first thought was pretty much correct:
>>>
>>> 1. The node which is called by the client is the coordinator
>>> 2. The coordinator determines the nodes in the ring which can handle the request ordered by expected latency (via snitch). The coordinator may or may not be part of these nodes
>>> 3. Given the consistency level and read repair chance the coordinator calculates the min amount of nodes to ask and sends read requests to them
>>> 4. As soon as the minimum count (according to consistency) of responses is collected the coordinator will respond to the request. Mismatches will lead to repair write requests to the corresponding nodes
>>>
>>> Thus the minimal depth is one (CL = 1 and coordinator can handle the request itself) or two otherwise.
>>>
>>> Hope that helps
>>>
>>> On Jan 23, 2012, at 4:47 PM, Peter Dijkshoorn wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I got an architectural question about how a write operation flows
>>>> through the nodes.
>>>>
>>>> As far as I understand now, a client sends its write operation to
>>>> whatever node it was set to use and if that node does not contain the
>>>> data for this key K, then this node forwards the operation to the first
>>>> node given by the hash function. This first node having key K then
>>>> contacts the replication nodes depending on the selected consistency level.
>>>>
>>>> This means that in the unlucky event you always have a network call
>>>> sequence depth of 2 (consistency level one), or 3 (assumed that the
>>>> replication nodes are contacted in parallel)
>>>>
>>>> This is more than I expected, so I am not sure whether this is correct?
>>>> can someone help me out?
>>>>
>>>> At first I thought that the receiver was the coordinator, and thus doing
>>>> all further calls in parallel, the depth as described above would always
>>>> be 2. But I just discovered that I was wrong and that it should be
>>>> something like above.
>>>>
>>>> Another possibility would be that the client learnt the layout of the
>>>> cluster at connection time and thereby tries per request to contact the
>>>> coordinator directly, but I never read or see something like this happening.
>>>>
>>>> Remembering the picture of Dean about network and hard disk latencies,
>>>> is this 3-sequential-network-call still faster?
>>>>
>>>> Thanks for any thoughts :)
>>>>
>>>> Peter
>>>>
>>>> --
>>>> Peter Dijkshoorn
>>>> Adyen - Payments Made Easy
>>>> www.adyen.com
>>>>
>>>> Visiting Address:                 Mail Address:
>>>> Stationsplein 57 - 4th floor      P.O. Box 10095
>>>> 1012 AB Amsterdam                 1001 EB Amsterdam
>>>> The Netherlands                   The Netherlands
>>>>
>>>> Office +31.20.240.1240
>>>> Email peter.dijkshoorn@adyen.com
>>>>
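
For readers who want the write flow from this thread in one place (the node the client contacts is the coordinator, and it fans the write out to the replicas itself, giving two sequential network hops), here is a toy Python model of the five write steps quoted above. It is only a sketch under stated assumptions: `Node`, `required_acks`, `coordinate_write`, `Unavailable` and `WriteTimeout` are hypothetical names, not Cassandra APIs; the replica list is passed in rather than computed from the ring; the fan-out is a plain loop rather than parallel messaging; and a missed ack deadline is modelled as its own error, whereas the quoted step 5 calls both failure cases "unavailable".

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Too few live replicas to satisfy the consistency level (step 2)."""


class WriteTimeout(Exception):
    """Too few acks arrived before the timeout (step 5); hypothetical name."""


@dataclass
class Node:
    name: str
    alive: bool = True
    responds: bool = True  # assumption: stands in for "acks before the timeout"
    data: dict = field(default_factory=dict)
    hints: list = field(default_factory=list)

    def apply(self, key, value):
        """Apply a write locally and report whether an ack is returned."""
        if not self.responds:
            return False
        self.data[key] = value
        return True


def required_acks(cl, replication_factor):
    """Acks needed for an illustrative subset of consistency levels."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[cl]


def coordinate_write(coordinator, replicas, key, value, cl="QUORUM",
                     hinted_handoff=True):
    """Run the write on the coordinator; `replicas` is the result of step 1."""
    need = required_acks(cl, len(replicas))

    # Step 2: refuse up front if the CL cannot be met by live replicas.
    live = [node for node in replicas if node.alive]
    if len(live) < need:
        raise Unavailable(f"{len(live)} live replicas, {need} required")

    # Step 3: keep a hint on the coordinator for each replica that is down.
    if hinted_handoff:
        for node in replicas:
            if not node.alive:
                coordinator.hints.append((node.name, key, value))

    # Step 4: send the write to every live replica. The coordinator may be
    # one of them, in which case no network hop is needed for that copy.
    acks = sum(1 for node in live if node.apply(key, value))

    # Step 5: succeed only if enough acks came back.
    if acks < need:
        raise WriteTimeout(f"{acks} acks, {need} required")
    return acks


# Usage: client -> coordinator d -> replicas a, b, c (c is down).
a, b, c = Node("a"), Node("b"), Node("c", alive=False)
d = Node("d")  # coordinator that does not own the key
acks = coordinate_write(d, [a, b, c], "K", "v", cl="QUORUM")
print(acks, d.hints)
```

Whichever node the client picks, the client-to-coordinator call is the first hop and the coordinator-to-replica calls are the second, which is the 2-deep answer given at the top of this message.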