From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Replication factor 2, consistency and failover
Date: Mon, 10 Sep 2012 11:44:09 +1200

> In general we want to achieve strong consistency.

You need to have R + W > N.

> LOCAL_QUORUM and reads with ONE.

That gives you 2 + 1 > 2 when you use it. When you drop back to ONE / ONE you no longer have strong consistency.

> maybe advice on how to improve it.

Sounds like you know how to improve it :)

Things you could play with:

* hinted_handoff_throttle_delay_in_ms in the YAML, to reduce the time it takes for HH to deliver the messages.
* increase the read_repair_chance for the CFs. This will increase the chance of RR repairing an inconsistency behind the scenes, so the next read is consistent. It will also increase the IO load on the system.

With the RF 2 restriction you are probably doing the best you can. You are giving up consistency for availability and partition tolerance. The best thing to do is to get peeps to agree that "we will accept reduced consistency for high availability" rather than say "in general we want to achieve strong consistency".

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 9/09/2012, at 9:09 PM, Sergey Tryuber <stryuber@gmail.com> wrote:

> Hi
>
> We have to use Cassandra with RF=2 (don't ask why...).
There are two datacenters (RF=2 in each datacenter). Also we use Astyanax as a client library. In general we want to achieve strong consistency. Read performance is important for us, that's why we perform writes with LOCAL_QUORUM and reads with ONE. If one server is down, we automatically switch to Writes.ONE, Reads.ONE only for the replica set that has the failed node (we modified Astyanax to achieve that). When the server comes back, we switch back to Writes.LOCAL_QUORUM and Reads.ONE, but, of course, we see some inconsistencies during the switching process and for some time after (while hinted handoff runs).
>
> Basically I don't have any questions, just want to share our "ugly" failover algorithm, to hear your criticism and maybe advice on how to improve it. Unfortunately we can't change the replication factor, and most of the time we have to read with consistency level ONE (because we have strict requirements on read performance).
>
> Thank you!
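To make the R + W > N arithmetic from the reply concrete, here is a small sketch. It is not from the thread or from any Cassandra client library; the function name and variables are illustrative.

```python
def is_strongly_consistent(n_replicas, write_replicas, read_replicas):
    """Strong consistency holds when every read quorum must overlap
    every write quorum, i.e. R + W > N."""
    return read_replicas + write_replicas > n_replicas

RF = 2                        # replicas per datacenter, as in the thread
LOCAL_QUORUM = RF // 2 + 1    # 2 when RF = 2

# Normal mode: LOCAL_QUORUM writes (2) + ONE reads (1): 2 + 1 > 2
print(is_strongly_consistent(RF, LOCAL_QUORUM, 1))  # True

# Failover mode: ONE writes + ONE reads: 1 + 1 > 2 does not hold
print(is_strongly_consistent(RF, 1, 1))  # False
```

This is exactly why the ONE / ONE failover mode described above gives up strong consistency until hinted handoff (or read repair) catches the lagging replica up.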
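For reference, the two knobs suggested in the reply are set in different places; a sketch, with the value and column family name purely illustrative:

```yaml
# cassandra.yaml (Cassandra 1.x era): per-hint delivery throttle.
# Lowering it lets hinted handoff catch a returning node up faster,
# at the cost of more load during delivery. Value illustrative.
hinted_handoff_throttle_delay_in_ms: 1
```

read_repair_chance, by contrast, is a per-column-family setting, changed e.g. with cassandra-cli: `update column family MyCF with read_repair_chance = 1.0;` (column family name illustrative).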