Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of sylvain@datastax.com
 designates 209.85.214.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <4F1E7278.1040701@adyen.com>
References: <4F1D80FF.5060008@adyen.com>
	<BB6C3728-69E0-44FA-ABA1-9FAE2BF74D33@gmx.net>
	<B9979A3E-F8C2-4DB7-BC0F-8A6AAAEEB059@gmx.net>
	<4F1E7278.1040701@adyen.com>
Date: Tue, 24 Jan 2012 10:30:55 +0100
Message-ID: 
 <CAKkz8Q3NzCwknzNTq=kL71UsSjRG10BEfX-zb-0V8vLeZj1E_w@mail.gmail.com>
Subject: Re: architectural understanding of write operation node flow
From: Sylvain Lebresne <sylvain@datastax.com>
To: user@cassandra.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Tue, Jan 24, 2012 at 9:57 AM, Peter Dijkshoorn
<peter.dijkshoorn@adyen.com> wrote:
> yeah, well main question remains then, is the node receiving the request
> from the client called the coordinator (even if it is not responsible
> for that key)?

Yes.

> Or will that node forward the call to the first responsible node who
> does the coordinating stuff? (as the cassandra and dynamo paper state)
>
> In case of that forwarding, is the client told to connect to another
> node, or does the node receiving the call act as a proxy?
>
> so, is it a 3-deep or a 2-deep network call?

2-deep (except for counter increments which have a slightly different
protocol).
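
To make the hop counting concrete, here is a minimal Python sketch (not
Cassandra code). It compares the proxy model, where the node that receives
the request is the coordinator, with the forwarding model assumed in the
question. The function names and the node labels n1..n4 are made up for
illustration, and the client-to-node call is counted as the first hop.

```python
def depth_receiver_coordinates(receiver, replicas):
    """Sequential network calls when the receiving node is the coordinator."""
    depth = 1  # hop 1: client -> receiving node (the coordinator)
    if any(r != receiver for r in replicas):
        depth += 1  # hop 2: coordinator -> remote replicas, sent in parallel
    return depth


def depth_forward_to_first_replica(receiver, replicas):
    """Hypothetical forwarding model: the first replica does the coordinating."""
    depth = 1  # hop 1: client -> receiving node
    first = replicas[0]
    if receiver != first:
        depth += 1  # extra hop: receiving node -> first responsible node
    if any(r != first for r in replicas):
        depth += 1  # last hop: first replica -> remaining replicas, in parallel
    return depth


replicas = ["n2", "n3", "n4"]  # hypothetical replica set for some key
print(depth_receiver_coordinates("n1", replicas))      # receiver holds no replica
print(depth_forward_to_first_replica("n1", replicas))  # same request, forwarding model
```

For a receiver that holds no replica, the first function gives 2 and the
second gives 3, which is the difference the question was about.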

> I want to explain this in a small literature part of my thesis to
> distinguish the internal structure against BigTable and various kinds of
> RDBMS replication schemes, that's why I want to know precisely :)
>
> Thanks,
>
> Peter Dijkshoorn
> Adyen - Payments Made Easy
> www.adyen.com
>
> Visiting Address:                 Mail Address:
> Stationsplein 57 - 4th floor      P.O. Box 10095
> 1012 AB Amsterdam                 1001 EB Amsterdam
> The Netherlands                   The Netherlands
>
> Office +31.20.240.1240
> Email peter.dijkshoorn@adyen.com
>
>
> On 01/23/2012 06:59 PM, Daniel Doubleday wrote:
>> Ouch :-) you were asking write ...
>>
>> Well kind of similar
>>
>> 1. Coordinator calculates all nodes
>> 2. If not enough (according to CL) nodes are alive it throws unavailable
>> 3. If nodes are down and hh is enabled it writes a hint for that row
>> 4. It sends write request to all nodes (including itself / shortcutting
>>    messaging)
>> 5. If it receives enough (according to CL) acks before timeout everything
>>    is fine otherwise it throws unavailable
>>
>> errm .. I'm more confident in the read path though, especially concerning
>> hh handling, so I'm happy to be corrected here. I.e. I'm not sure if hints
>> are written when requests time out but CL is reached.
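
The write steps quoted above can be sketched roughly as follows. This is a
simplified Python model, not the actual implementation: `coordinate_write`,
`block_for`, the `send` callback and both exception classes are names
invented for this example, and only ONE, QUORUM and ALL are modelled. The
sketch also assumes that missing acks in step 5 surface as a separate
timeout error rather than as unavailable, and that requests go only to the
replicas believed alive.

```python
class Unavailable(Exception):
    """Too few live replicas to satisfy the consistency level."""


class TimedOut(Exception):
    """Too few acks arrived before the timeout (assumed error type)."""


def block_for(cl, rf):
    """Number of acks required for consistency level cl at replication factor rf."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]


def coordinate_write(replicas, alive, cl, send, hinted_handoff=True):
    """Model one write; send(node) returns True if node acked before the timeout."""
    required = block_for(cl, len(replicas))   # step 1: replicas are passed in
    live = [r for r in replicas if r in alive]
    if len(live) < required:                  # step 2: fail fast
        raise Unavailable(f"{len(live)} live, {required} required")
    down = [r for r in replicas if r not in alive]
    hints = down if hinted_handoff else []    # step 3: one hint per down node
    acks = sum(1 for r in live if send(r))    # step 4: write to the live nodes
    if acks < required:                       # step 5: not enough acks in time
        raise TimedOut(f"{acks} acks, {required} required")
    return acks, hints


acks, hints = coordinate_write(
    ["n1", "n2", "n3"], {"n1", "n2"}, "QUORUM", lambda node: True
)
print(acks, hints)
```

With three replicas, one of them down and QUORUM, the write succeeds with
two acks and a hint is recorded for n3.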
>>
>> On Jan 23, 2012, at 6:47 PM, Daniel Doubleday wrote:
>>
>>> Your first thought was pretty much correct:
>>>
>>> 1. The node which is called by the client is the coordinator
>>> 2. The coordinator determines the nodes in the ring which can handle the
>>>    request ordered by expected latency (via snitch). The coordinator may
>>>    or may not be part of these nodes
>>> 3. Given the consistency level and read repair chance the coordinator
>>>    calculates the minimum number of nodes to ask and sends read requests
>>>    to them
>>> 4. As soon as the minimum count (according to consistency) of responses
>>>    is collected the coordinator will respond to the request. Mismatches
>>>    will lead to repair write requests to the corresponding nodes
>>>
>>> Thus the minimal depth is one (CL = 1 and coordinator can handle the
>>> request itself) or two otherwise.
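
As an illustration of steps 2 and 3 of the read path, the replica selection
could look something like this. It is a rough Python sketch under stated
assumptions: the latency table stands in for whatever the snitch reports,
`pick_read_targets` is a hypothetical name, and the sketch assumes that a
read chosen for read repair goes to every replica while other reads go only
to the closest replicas needed for the consistency level.

```python
import random


def pick_read_targets(replicas, latency_ms, required,
                      read_repair_chance=0.0, rng=random.random):
    """Return the replicas to read from, closest first."""
    # Step 2: order the candidate replicas by expected latency.
    ordered = sorted(replicas, key=lambda r: latency_ms[r])
    # Step 3: with probability read_repair_chance ask every replica
    # (assumption), otherwise only as many as the consistency level requires.
    if rng() < read_repair_chance:
        return ordered
    return ordered[:required]


latency_ms = {"n1": 0.2, "n2": 1.5, "n3": 0.9}  # made-up numbers
print(pick_read_targets(["n1", "n2", "n3"], latency_ms, required=2))
```

With the default read repair chance of 0.0 the call returns the two closest
replicas, n1 and n3.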
>>>
>>> Hope that helps
>>>
>>> On Jan 23, 2012, at 4:47 PM, Peter Dijkshoorn wrote:
>>>
>>>> Hi guys,
>>>>
>>>> I got an architectural question about how a write operation flows
>>>> through the nodes.
>>>>
>>>> As far as I understand now, a client sends its write operation to
>>>> whatever node it was set to use and if that node does not contain the
>>>> data for this key K, then this node forwards the operation to the first
>>>> node given by the hash function. This first node having key K then
>>>> contacts the replication nodes depending on the selected consistency
>>>> level.
>>>>
>>>> This means that in the unlucky event you always have a network call
>>>> sequence depth of 2 (consistency level one), or 3 (assumed that the
>>>> replication nodes are contacted in parallel)
>>>>
>>>> This is more than I expected, so I am not sure whether this is correct?
>>>> Can someone help me out?
>>>>
>>>> At first I thought that the receiver was the coordinator, and thus doing
>>>> all further calls in parallel, the depth as described above would always
>>>> be 2. But I just discovered that I was wrong and that it should be
>>>> something like above.
>>>>
>>>> Another possibility would be that the client learnt the layout of the
>>>> cluster at connection time and thereby tries per request to contact the
>>>> coordinator directly, but I have never read about or seen something like
>>>> this happening.
>>>>
>>>> Remembering the picture of Dean about network and hard disk latencies,
>>>> is this 3-sequential-network-call still faster?
>>>>
>>>> Thanks for any thoughts :)
>>>>
>>>> Peter
>>>>