Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy)
From: aaron morton <aaron@thelastpickle.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_DD5C7D41-448C-4DFE-960B-FADDF5EA0198"
Message-Id: <00CC516E-BF60-46A4-9FD7-D6A5E466A331@thelastpickle.com>
Mime-Version: 1.0 (Mac OS X Mail 6.0 \(1486\))
Subject: Re: QUORUM writes, QUORUM reads -- and eventual consistency
Date: Mon, 27 Aug 2012 20:56:34 +1200
References: <20120825045509.GA2237@loggly.com>
 <CAOBz8JYEghuXoEOaYjJTkqdpJGvOeUXJr1uVdwRUrTmo1NiLGg@mail.gmail.com>
 <20120825062704.GA2570@loggly.com>
 <CAKKFZSpK1mX4+qwfCCuFWB+wXcTnBQthh6CiwQkk+x_tq8zk3Q@mail.gmail.com>
 <CADG5=eWZp9+ZEMLFEdAZz7NWFjJxFPdYzAm+pGPQDQMzmQHoSQ@mail.gmail.com>
 <CAKKFZSpGi43kg3Ya5FRjamwazaHceiucvTGtXCFDGTp+YiorVg@mail.gmail.com>
To: user@cassandra.apache.org
In-Reply-To: 
 <CAKKFZSpGi43kg3Ya5FRjamwazaHceiucvTGtXCFDGTp+YiorVg@mail.gmail.com>


--Apple-Mail=_DD5C7D41-448C-4DFE-960B-FADDF5EA0198
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252

> Doesn't this mean that the read does not "reflect the most recent
> write"?
Yes.
A write that fails is not a write.

> If it were to have read the newer data from the 1 node and then
> afterwards read the old data from the other 2 then there is a
> consistency problem, but in the example you give the second reader seems
> to still have a consistent view.
In the scenario of a TimedOutException for a write, that is entirely
possible. Fewer than W replicas acknowledged the write in time, so it is
not considered successful at the CL requested, and R + W > N does not
hold for that datum.

When in doubt, ask Werner Vogels...

When R + W > N we have strong consistency...
"Strong consistency. After the update completes, any subsequent access
(by A, B, or C) will return the updated value."

When R + W <= N we have weak / eventual consistency...
"Eventual consistency. This is a specific form of weak consistency; the
storage system guarantees that if no new updates are made to the object,
eventually *all* accesses will return the last updated value."

http://queue.acm.org/detail.cfm?id=1466448
(emphasis added)
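
R + W > N works by pigeonhole. Any R replicas and any W replicas drawn from the same N must share at least one node, so a read always reaches at least one replica that acknowledged the write. The code below runs a brute-force check of that claim over every replica subset. The helper names are hypothetical and are not part of any Cassandra API.

```python
from itertools import combinations


def always_overlaps(n, r, w):
    """True if every read set of size r intersects every write set of size w."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


def quorum(n):
    """Majority of n replicas, e.g. 2 of 3."""
    return n // 2 + 1


for n in (3, 5):
    q = quorum(n)
    print(f"N={n} QUORUM={q} R+W={2 * q} overlaps={always_overlaps(n, q, q)}")
print("N=3 R=1 W=2 overlaps:", always_overlaps(3, 1, 2))
```

The guarantee only covers writes that W replicas actually acknowledged. That is why a timed-out write falls outside it.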

In Cassandra this may mean hinted handoff, read repair, anti-entropy
repair (nodetool repair) or the standard CL checks kicking in to make the
second read return the "correct" consistent value.
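
Roughly, read repair works like this. The coordinator compares the versions returned by the replicas it contacted and keeps the one with the highest timestamp (last write wins). It then writes that version back to any replica that returned something older. The sketch below is a simplified, hypothetical model of that idea, not Cassandra's implementation, which reconciles per column and also deals with tombstones and digest reads.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int


def read_with_repair(replicas, contacted):
    """Reconcile the contacted replicas and repair any stale ones in place.

    `replicas` maps replica name -> Cell; returns the winning value.
    """
    winner = max((replicas[name] for name in contacted), key=lambda c: c.timestamp)
    for name in contacted:
        if replicas[name].timestamp < winner.timestamp:
            # Push the newer version to the stale replica.
            replicas[name] = Cell(winner.value, winner.timestamp)
    return winner.value


# A timed-out write that reached only replica "a".
replicas = {"a": Cell("new", 2), "b": Cell("old", 1), "c": Cell("old", 1)}

first = read_with_repair(replicas, ["a", "b"])   # sees "new", repairs "b"
second = read_with_repair(replicas, ["b", "c"])  # "b" now has "new", repairs "c"
print(first, second, {name: cell.value for name, cell in replicas.items()})
```

Without the repair step, the second read (from "b" and "c") would still have returned "old".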

> Isn't it cheaper to retry the mutation on _any exception_ than to have
> a transaction in place for the majority of non failing writes?
Yes (with the exception of counters, which are not idempotent).
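
This toy model shows why a retry is safe for a normal column but not for a counter. A column write carries a timestamp chosen by the client, so applying the same mutation twice leaves the same state. A counter update is a delta, so it is added again on every retry. The model assumes a replica that applies each mutation but loses the first acknowledgement. `FlakyReplica`, `TimedOut` and `write_with_retry` are hypothetical names, not a client library API.

```python
class TimedOut(Exception):
    """Stand-in for a client-side TimedOutException."""


class FlakyReplica:
    """Applies every mutation, but 'loses' the first acknowledgement."""

    def __init__(self):
        self.columns = {}   # column name -> (value, timestamp)
        self.counters = {}  # counter name -> running total
        self._acks_to_drop = 1

    def _ack(self):
        # The mutation is already applied; only the reply goes missing.
        if self._acks_to_drop > 0:
            self._acks_to_drop -= 1
            raise TimedOut()

    def set_column(self, name, value, timestamp):
        current = self.columns.get(name)
        # Last write wins: re-applying the same (value, timestamp) changes nothing.
        if current is None or timestamp >= current[1]:
            self.columns[name] = (value, timestamp)
        self._ack()

    def increment(self, name, delta):
        # A counter update is a delta, so every application adds to the total.
        self.counters[name] = self.counters.get(name, 0) + delta
        self._ack()


def write_with_retry(op, attempts=3):
    """Call op(), retrying on TimedOut; the mutation may already be applied."""
    for attempt in range(attempts):
        try:
            return op()
        except TimedOut:
            if attempt == attempts - 1:
                raise


column_node = FlakyReplica()
# The client picks the timestamp once, so every retry sends the same mutation.
write_with_retry(lambda: column_node.set_column("status", "shipped", timestamp=1000))

counter_node = FlakyReplica()
write_with_retry(lambda: counter_node.increment("page_views", 1))

print(column_node.columns["status"])        # ('shipped', 1000), same as one write
print(counter_node.counters["page_views"])  # 2, the retry double-counted
```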

If you get an UnavailableException, it is from the point of view of the
coordinator. It may be the case that the coordinator is isolated and all
the other nodes are UP and happy.
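
This rough model shows why an UnavailableException only reflects the coordinator's view. The coordinator counts the replicas it currently believes are up. If that is fewer than the CL requires, it fails before sending the write to anyone. `coordinate_write` and its `believed_up` / `actually_up` parameters are hypothetical simplifications, not Cassandra code.

```python
class UnavailableException(Exception):
    pass


class TimedOutException(Exception):
    pass


def coordinate_write(replicas, believed_up, actually_up, required):
    """Return the replicas that applied the write, or raise like a coordinator.

    believed_up: replicas the coordinator *thinks* are alive (its own view).
    actually_up: replicas that are really reachable.
    required:    acks needed for the consistency level, e.g. 2 for QUORUM of 3.
    """
    live = [r for r in replicas if r in believed_up]
    if len(live) < required:
        # Fail fast: nothing was sent, so nothing was written.
        raise UnavailableException(f"{len(live)} live, {required} required")
    applied = [r for r in live if r in actually_up]
    if len(applied) < required:
        # Some replicas may have applied the write; the client cannot tell.
        raise TimedOutException(f"{len(applied)} acks, {required} required")
    return applied


replicas = ["a", "b", "c"]

# An isolated coordinator thinks every replica is down, although all are UP.
try:
    coordinate_write(replicas, believed_up=set(), actually_up={"a", "b", "c"}, required=2)
except UnavailableException as e:
    print("Unavailable:", e)

# The coordinator thinks all are up, but only "a" is actually reachable.
try:
    coordinate_write(replicas, believed_up={"a", "b", "c"}, actually_up={"a"}, required=2)
except TimedOutException as e:
    print("TimedOut:", e)
```

In this model the first case lines up with rule 1 in Russell's list below (nothing was sent, so nothing was written). The second lines up with rule 2 (replica "a" applied the write, yet the client saw a timeout).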

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/08/2012, at 5:03 AM, Guillermo Winkler <gwinkler@inconcertcc.com>
wrote:

> Isn't it cheaper to retry the mutation on _any exception_ than to have
> a transaction in place for the majority of non failing writes?
>
> The special case to be considered is obviously counters which are not
> idempotent
>
> https://issues.apache.org/jira/browse/CASSANDRA-2495
>
> On Sat, Aug 25, 2012 at 4:38 AM, Russell Haering
> <russellhaering@gmail.com> wrote:
> The "issue" is that it is possible for a quorum write to return an
> error, but for the result of the write to still be reflected in the
> view seen by the client. There is really no performant way around this
> (although reading at ALL can make it much less frequent). Guaranteeing
> complete success or failure would (barring a creative solution I'm
> unaware of) require a transactional commit of some sort across the
> replica nodes for the key being written to. The performance tradeoff
> might be desirable under some circumstances, but if this is a
> requirement you should probably look at other databases.
>
> Some good rules to play by (someone correct me if these aren't 100%
> true):
>=20
> 1. For writes to a single key, an UnavailableException means the write
> failed totally (clients will never see the data you wrote)
> 2. For writes to a single key, a TimedOutException means you cannot
> know whether the write succeeded or failed
> 3. For writes to multiple keys, either an UnavailableException or a
> TimedOutException means you cannot know whether the write succeeded or
> failed.
>=20
> -Russell
>
> On Sat, Aug 25, 2012 at 12:17 AM, Guillermo Winkler
> <gwinkler@inconcertcc.com> wrote:
> > Hi Philip,
> >
> > From http://wiki.apache.org/cassandra/ArchitectureOverview
> >
> > Quorum write: blocks until quorum is reached
> >
> > By my understanding if you _did_ a quorum write it means it
> > successfully completed.
> >
> > Guille
> >
> >
> >> I *think* we're saying the same thing here. The addition of the word
> >> "successful" (or something more suitable) would make the documentation
> >> more precise, not less.


--Apple-Mail=_DD5C7D41-448C-4DFE-960B-FADDF5EA0198--