From: aaron morton
To: user@cassandra.apache.org
Subject: Re: Get CL ONE / NTS
Date: Fri, 16 Sep 2011 10:11:30 +1200

> What I'm missing is a clear behavior for CL.ONE. I'm unsure about what
> nodes are used by ONE and how the filtering of missing data/error is done.
> I've landed in ReadCallback.java but error handling is out of my reach for
> the moment.

Start with StorageProxy.fetch() to see which nodes are considered to be
part of the request. ReadCallback.ctor() will decide which are actually
involved based on the CL and whether RR is enabled.

At CL ONE there is no checking of the replica responses for consistency,
as there is only one response.
If RR is enabled it will start from ReadCallback.maybeResolveForRepair().

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 7:21 PM, Pierre Chalamet wrote:

> I do not agree here. I trade "consistency" (it's more data miss than
> consistency here) for performance in my case.
> I'm okay to handle the popping of the Spanish inquisition in the current
> DC by triggering a new read with a stronger CL somewhere else (for example
> in other DCs).
> If the data is nowhere to be found or nothing is reachable, well, it's sad
> but true but it will be the end of the game. Fine.
>
> What I'm missing is a clear behavior for CL.ONE. I'm unsure about what
> nodes are used by ONE and how the filtering of missing data/error is done.
> I've landed in ReadCallback.java but error handling is out of my reach for
> the moment.
>
> Thanks,
> - Pierre
>
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Thursday, September 15, 2011 12:27 AM
> To: user@cassandra.apache.org
> Subject: Re: Get CL ONE / NTS
>
> Are you advising CL.ONE is not worth the game when considering
> read performance ?
> Consistency is not performance, it's a whole new thing to tune in your
> application. If you have performance issues deal with those as performance
> issues: better code / data model / hardware.
>
> By the way, I do not have consistency problem at all - data is only written
> once
> Nobody expects a consistency problem. Its chief weapon is surprise.
> Surprise and fear. Its two weapons are fear and surprise. And so forth
> http://www.youtube.com/watch?v=Ixgc_FGam3s
>
> If you write at LOCAL QUORUM in DC 1 and DC 2 is down at the start of the
> request, a hint will be stored in DC 1. Some time later when DC 2 comes
> back that hint will be sent to DC 2. If in the mean time you read from
> DC 2 at CL ONE you will not get that change.
> With Read Repair enabled it will repair in the background and you may get
> a different response on the next read (Am guessing here, cannot remember
> exactly how RR works cross DC)
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/09/2011, at 10:07 AM, Pierre Chalamet wrote:
>
> Thanks Aaron, didn't see your answer before mine.
>
> I do agree for 2/ I might have read error. Good suggestion to use
> EACH_QUORUM - it could be a good trade off to read at this level if ONE
> fails.
>
> Maybe using LOCAL_QUORUM might be a good answer and will avoid headache
> after all. Are you advising CL.ONE is not worth the game when considering
> read performance ?
>
> By the way, I do not have consistency problem at all - data is only written
> once (and if more it is always the same data) and read several times across
> DC. I only have replication problems. That's why I'm more inclined to use
> CL.ONE for read if possible.
>
> Thanks,
> - Pierre
>
> -----Original Message-----
> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Wednesday, September 14, 2011 11:48 PM
> To: user@cassandra.apache.org; pierre@chalamet.net
> Subject: Re: Get CL ONE / NTS
>
> Your current approach to Consistency opens the door to some inconsistent
> behavior.
>
> 1/ Will I have an error because DC2 does not have any copy of the data ?
> If you read from DC2 at CL ONE and the data is not replicated it will not
> be returned.
>
> 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2 ?
> Not at CL ONE. If you used CL EACH QUORUM then the read will go to all the
> DC's. If DC2 is behind DC1 then you will get the data from DC1.
>
> 3/ In case of partial replication to DC2, will I see sometimes errors
> about servers not holding the data in DC2 ?
> Depending on the API call and the client, working at CL ONE, you will see
> either errors or missing data.
>
> 4/ Does Get CL ONE fail as soon as the fastest server to answer tells it
> does not have the data or does it wait until all servers tell they do not
> have the data ?
> yes
>
> Consider
>
> using LOCAL QUORUM for write and read, will make things a bit more
> consistent but not add inter DC overhead into the request latency. Still
> possible to not get data in DC2 if it is totally disconnected from the DC1
>
> write at LOCAL QUORUM and read at EACH QUORUM. Will so you can always read,
> requests in DC2 will fail if DC1 is not reachable.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 15/09/2011, at 1:33 AM, Pierre Chalamet wrote:
>
> Hello,
>
> I have 2 datacenters. Cassandra is configured as follows:
> - RackInferringSnitch
> - NetworkTopologyStrategy for CF
> - strategy_options: DC1:3 DC2:3
>
> Data are written using CL LOCAL_QUORUM so data written from one datacenter
> will eventually be replicated to the other datacenter. Data is always
> written exactly once.
>
> On the other side, I'd like to improve the read path. I'm using actually
> the CL ONE since data is only written once (ie: timestamp is more or less
> meaningless in my case).
>
> This is where I have some doubts: if data is written on DC1 and
> tentatively read from DC2 while the data is still not replicated or
> partially replicated (for whatever good reason since replication is async),
> what is the behavior of Get with CL ONE / NTS ?
>
> 1/ Will I have an error because DC2 does not have any copy of the data ?
> 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2 ?
>
> 3/ In case of partial replication to DC2, will I see sometimes errors
> about servers not holding the data in DC2 ?
>
> 4/ Does Get CL ONE fail as soon as the fastest server to answer tells it
> does not have the data or does it wait until all servers tell they do not
> have the data ?
>
> Thanks a lot,
> - Pierre
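
The replica counts behind the advice in this thread can be worked out by
hand. What follows is a minimal Python sketch, not Cassandra's code: it
assumes the usual majority rule (quorum = RF / 2 rounded down, plus 1), the
DC1:3 DC2:3 setup from the first message, and the consistency levels as they
are described in the replies. The function names and the "any" key are made
up for illustration, and the model only covers how many replicas must answer,
not whether an answering replica actually holds the data.

```python
# Simplified model for illustration only; this is NOT Cassandra's code.
# It counts how many replica responses a coordinator must wait for.


def quorum(rf):
    """Majority of rf replicas."""
    return rf // 2 + 1


def required_responses(cl, rf_by_dc, local_dc):
    """Return {dc: responses needed}; the key 'any' means any datacenter."""
    if cl == "ONE":
        return {"any": 1}
    if cl == "QUORUM":
        return {"any": quorum(sum(rf_by_dc.values()))}
    if cl == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_by_dc[local_dc])}
    if cl == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}
    raise ValueError("unknown consistency level: %s" % cl)


def is_satisfiable(cl, rf_by_dc, local_dc, live_by_dc):
    """True if enough replicas are reachable to meet the consistency level."""
    needed = required_responses(cl, rf_by_dc, local_dc)
    for dc, count in needed.items():
        if dc == "any":
            live = sum(live_by_dc.values())
        else:
            live = live_by_dc.get(dc, 0)
        if live < count:
            return False
    return True


rf = {"DC1": 3, "DC2": 3}        # strategy_options from the first message
dc1_down = {"DC1": 0, "DC2": 3}  # a client in DC2 that cannot reach DC1

for cl in ("ONE", "LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
    print(cl,
          required_responses(cl, rf, "DC2"),
          is_satisfiable(cl, rf, "DC2", dc1_down))
```

In this model, with DC1 unreachable from DC2, ONE and LOCAL_QUORUM can still
be served by the three DC2 replicas, while EACH_QUORUM (2 in each DC) and
QUORUM (4 of 6) cannot, which matches the trade-off described in the replies.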