From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Replication factor 2, consistency and failover
Date: Mon, 10 Sep 2012 11:44:09 +1200

> In general we want to achieve strong consistency.

You need to have R + W > N.

> LOCAL_QUORUM and reads with ONE.

That gives you 2 + 1 > 2 when you use it. When you drop back to ONE / ONE you no longer have strong consistency.

> maybe advice on how to improve it.

Sounds like you know how to improve it :)

Things you could play with:

* hinted_handoff_throttle_delay_in_ms in the YAML, to reduce the time it takes for HH to deliver the messages.
* increase the read_repair_chance for the CFs. This will increase the chance of RR repairing an inconsistency behind the scenes, so the next read is consistent. It will also increase the IO load on the system.

With the RF 2 restriction you are probably doing the best you can. You are giving up consistency for availability and partition tolerance. The best thing to do is to get peeps to agree that "we will accept reduced consistency for high availability" rather than say "in general we want to achieve strong consistency".

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 9/09/2012, at 9:09 PM, Sergey Tryuber <stryuber@gmail.com> wrote:

> Hi
>
> We have to use Cassandra with RF=2 (don't ask why...).
There are two datacenters (RF=2 in each datacenter). Also we use Astyanax as a client library. In general we want to achieve strong consistency. Read performance is important for us, that's why we perform writes with LOCAL_QUORUM and reads with ONE. If one server is down, we automatically switch to Writes.ONE, Reads.ONE only for the replica set that has the failed node (we modified Astyanax to achieve that). When the server comes back, we switch back to Writes.LOCAL_QUORUM and Reads.ONE, but, of course, we see some inconsistencies during the switching process and for some time after (while hinted handoff runs).
>
> Basically I don't have any questions, just want to share our "ugly" failover algorithm, to hear your criticism and maybe advice on how to improve it. Unfortunately we can't change the replication factor, and most of the time we have to read with consistency level ONE (because we have strict requirements on read performance).
>
> Thank you!
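To make the R + W > N arithmetic from the reply concrete, here is a small sketch. It is not from the thread or from any Cassandra client library; the function name and variables are illustrative.

```python
def is_strongly_consistent(n_replicas, write_replicas, read_replicas):
    """Strong consistency holds when every read quorum must overlap
    every write quorum, i.e. R + W > N."""
    return read_replicas + write_replicas > n_replicas

RF = 2                        # replicas per datacenter, as in the thread
LOCAL_QUORUM = RF // 2 + 1    # 2 when RF = 2

# Normal mode: LOCAL_QUORUM writes (2) + ONE reads (1): 2 + 1 > 2
print(is_strongly_consistent(RF, LOCAL_QUORUM, 1))  # True

# Failover mode: ONE writes + ONE reads: 1 + 1 > 2 does not hold
print(is_strongly_consistent(RF, 1, 1))  # False
```

This is exactly why the ONE / ONE failover mode described above gives up strong consistency until hinted handoff (or read repair) catches the lagging replica up.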
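For reference, the two knobs suggested in the reply are set in different places; a sketch, with the value and column family name purely illustrative:

```yaml
# cassandra.yaml (Cassandra 1.x era): per-hint delivery throttle.
# Lowering it lets hinted handoff catch a returning node up faster,
# at the cost of more load during delivery. Value illustrative.
hinted_handoff_throttle_delay_in_ms: 1
```

read_repair_chance, by contrast, is a per-column-family setting, changed e.g. with cassandra-cli: `update column family MyCF with read_repair_chance = 1.0;` (column family name illustrative).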