Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of dan.hendry.junk@gmail.com
 designates 209.85.220.172 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=from:to:references:in-reply-to:subject:date:message-id:mime-version
         :content-type:x-mailer:thread-index:content-language;
        b=uJ3Qt9PGEOHeccfjyHZJ9LdAk1BHlg3KCIciIiFRtscm5pQEp9KPY51/Nmq+1n54xD
         q+F2PIiNwV/VYfwVzp6HIj7fmhIfgjnf7BED1Lc0gg/jo9y8NmCyaCgV7PkwkAUN3FKe
         jzND/d+gfMiLGPADQijtvTf8RINqk4SicHgMA=
From: "Dan Hendry" <dan.hendry.junk@gmail.com>
To: <user@cassandra.apache.org>
References: <AANLkTimr1cyN8XAXSbsC0k+soKHwRzr7d_p4D5+K4Y4J@mail.gmail.com>
	<4d570708.4407dc0a.2c44.1110@mx.google.com>
 <AANLkTimQi4YohDL7W9DqXWSKMCz0DoTcZa6tpuTwBfQg@mail.gmail.com>
In-Reply-To: <AANLkTimQi4YohDL7W9DqXWSKMCz0DoTcZa6tpuTwBfQg@mail.gmail.com>
Subject: RE: per-connection "read-after-my-write" consistency
Date: Sat, 12 Feb 2011 18:02:09 -0500
Message-ID: <4d571185.863fdc0a.1d57.121d@mx.google.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_003F_01CBCADE.FB76D2D0"
thread-index: AcvLBW+7rfTC2bt4QIClIlC1W+sRcwAAzdXg
Content-Language: en-ca

This is a multi-part message in MIME format.

------=_NextPart_000_003F_01CBCADE.FB76D2D0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

> So the suggestion is to use at least 3 nodes with RF=3 and CL.QUORUM for
> both writes and reads where high consistency is required, right?

Yes, this is the typical way to use Cassandra when both consistency and
availability are required.

Dan

From: Michal Augustýn [mailto:augustyn.michal@gmail.com]
Sent: February-12-11 17:37
To: user@cassandra.apache.org
Subject: Re: per-connection "read-after-my-write" consistency

Hi,

I'm using .NET and I wrote my own client library (over Thrift), so I'm
absolutely sure that both operations are performed using the same
connection.

I can handle the current issue in the application, but I'm sure that I will
not be able to handle some future situations in the application.

So the suggestion is to use at least 3 nodes with RF=3 and CL.QUORUM for
both writes and reads where high consistency is required, right?

Thanks!

2011/2/12 Dan Hendry <dan.hendry.junk@gmail.com>

Are you using a higher level client (hector/pelops/pycassa/etc) or the
actual Thrift API? Higher level clients often pool connections, and two
subsequent operations (write then read) may be performed with connections
to different nodes.

If you are sure you are using the same connection (the actual Thrift API),
there is a possible race condition. To the best of my understanding, here
is how a write happens at CL ONE in your case:

- You make a request to node A, which initiates a write to nodes A and B

- The server reports success when the write to node A OR B is complete
  (can somebody else confirm?)

Typically the write to A will complete quicker, since that is the node you
are connected to and there is additional network overhead initiating the
write on node B. I suppose a 1:1000 chance of B completing first is
possible, particularly if all nodes and the client are on the same network
(or same machine) with very low latencies. In that case the write is
acknowledged before node A has applied it, so an immediate read served by
node A finds no row.
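
The race described above can be illustrated with a toy timing model. This is
only a sketch, not Cassandra code: the function name, the node labels and the
timings are all hypothetical, and it assumes the caller states which node
serves the read.

```python
from typing import Dict, Optional


def write_then_read_at_one(apply_times: Dict[str, float],
                           read_node: str,
                           read_delay: float,
                           value: str = "row") -> Optional[str]:
    """Toy model of a CL.ONE write followed by a CL.ONE read.

    apply_times: time at which each replica has applied the write,
                 measured from when the coordinator received the request.
    read_node:   replica that serves the read (an assumption of the model).
    read_delay:  time between the write being acknowledged and the read
                 reaching read_node.
    """
    # At CL.ONE the write is acknowledged once the first replica applies it.
    ack_time = min(apply_times.values())
    read_time = ack_time + read_delay
    # The read sees the row only if read_node has applied the write by then.
    if apply_times[read_node] <= read_time:
        return value
    return None


# Common case: node A (the coordinator) applies the write first.
print(write_then_read_at_one({"A": 1.0, "B": 3.0}, "A", 0.5))  # row
# Rare case: node B applies first, the read reaches A too early.
print(write_then_read_at_one({"A": 3.0, "B": 1.0}, "A", 0.5))  # None
```

In the second call the acknowledgement comes from B, so the client is free to
read before A holds the row, which matches the "no row" symptom.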

Cassandra allows you to explicitly specify the trade-off between
consistency and availability. When you read and write at ONE with RF=2,
consistency is not guaranteed but high availability is (you can lose a
node and continue to operate). If you require strong consistency with
RF=2, you will have to either read or write at consistency level ALL, so
that every read contacts at least one node that has the latest write. My
suggestion is to either design your application to tolerate inconsistency
(if possible) or move to RF=3 and quorum reads and quorum writes.
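
The trade-off can be checked with the usual overlap rule: reads are strongly
consistent when the number of replicas read plus the number written exceeds
the replication factor (R + W > RF). The helper names below are hypothetical
and the sketch covers only the ONE, QUORUM and ALL levels; it is an
illustration of the arithmetic, not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    """Number of replicas that form a quorum for replication factor rf."""
    return rf // 2 + 1


def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at a given consistency level."""
    return {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[level]


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


def survives_one_node_down(read_level: str, write_level: str, rf: int) -> bool:
    """True when both reads and writes still succeed with one replica down."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return max(r, w) <= rf - 1


for read_cl, write_cl, rf in [("ONE", "ONE", 2),
                              ("ONE", "ALL", 2),
                              ("QUORUM", "QUORUM", 3)]:
    print(read_cl, write_cl, rf,
          is_strongly_consistent(read_cl, write_cl, rf),
          survives_one_node_down(read_cl, write_cl, rf))
```

Under these assumptions, ONE/ONE at RF=2 is available but not consistent,
ONE/ALL at RF=2 is consistent but cannot write with a node down, and
QUORUM/QUORUM at RF=3 gives both.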

Dan

From: Michal Augustýn [mailto:augustyn.michal@gmail.com]
Sent: February-12-11 4:13
To: user@cassandra.apache.org
Subject: per-connection "read-after-my-write" consistency

Hi,

I'm running 2 nodes with RF=2 (not optimal, I know), Cassandra 0.7.1.

During one connection, I write (CL.ONE) a row and subsequently read (CL.ONE)
the same row (via Thrift).

I supposed that if I write a row to one node then I can immediately read
this row from that node.

It seems to be true in most cases, but roughly 1 in 1000 attempts doesn't
work as expected - I get no row :(

What is the problem, please? Should I use another CL for read and/or write?
I would just like to achieve "per-connection read-after-my-write
consistency".

Thank you very much!

Augi

No virus found in this incoming message.
Checked by AVG - www.avg.com
Version: 9.0.872 / Virus Database: 271.1.1/3439 - Release Date: 02/12/11
02:34:00



------=_NextPart_000_003F_01CBCADE.FB76D2D0--