Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
MIME-Version: 1.0
Date: Fri, 25 Jul 2014 10:46:17 -0700
Message-ID: 
 <CA+Rbu67_weeeMO2yVUnBGzCUBHN-rD-+7XaCTcedxX5q683Njw@mail.gmail.com>
Subject: Replication factor 2 with immutable data
From: Jon Travis <jtravis@p00p.org>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=e89a8f8397ad7f8b0504ff082664

--e89a8f8397ad7f8b0504ff082664
Content-Type: text/plain; charset=UTF-8

I have a couple of questions regarding the availability of my data in an
RF=2 scenario.

- The setup -
I am currently storing immutable data in a CF with RF=2 and
read_repair_chance = 0.0.  There is a lot of data, so bumping up to RF=3
would increase my storage costs quite dramatically (a third copy means
about 50% more raw storage).  For the most part I only add data to this CF
(plus some nightly deletes).  Writes and reads are both done at CL=ONE.

- The questions -
When I write a value, it is written to replicas A and B.  If B is down,
A will still acknowledge the write and the write will succeed.  Great.
Now suppose B comes back up and, before B gets the handoff of the data
from A, a client attempts to read the recently written data.  If that read
gets routed to replica B, the data will not exist there, and the read will
fail, correct?

But what I really want is for the read to hit both A and B and to succeed
with whichever one returns the data -- I only need one of them to actually
acknowledge having it.
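
To make the scenario concrete, here is a toy Python model of it.  This is
not Cassandra code and does not use any driver API; the Replica class and
the write_cl_one / read_any helpers are names I made up purely for
illustration, and an empty answer is modelled as None.

```python
class Replica:
    """Toy stand-in for a replica node (hypothetical, not Cassandra)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = {}

    def write(self, key, value):
        if not self.up:
            return False  # a down node cannot acknowledge the write
        self.rows[key] = value
        return True

    def read(self, key):
        if not self.up:
            raise ConnectionError(self.name + " is down")
        return self.rows.get(key)  # None means "I don't have it"


def write_cl_one(replicas, key, value):
    """The write succeeds if at least one replica acknowledges it."""
    acks = [r.write(key, value) for r in replicas]
    return any(acks)


def read_any(replicas, key):
    """Ask every live replica and return the first value that exists."""
    for r in replicas:
        if r.up:
            value = r.read(key)
            if value is not None:
                return value
    return None


a, b = Replica("A"), Replica("B")
b.up = False
assert write_cl_one([a, b], "k", "v")  # A acknowledges while B is down
b.up = True                            # B is back, handoff not yet delivered
stale = b.read("k")                    # a CL=ONE read that lands on B
wanted = read_any([a, b], "k")         # the behaviour I am after
print(stale, wanted)
```

In this model the read routed to B comes back empty, while read_any
finds the value on A.  Since the data is immutable, "first non-empty
answer wins" would be enough for me; there is never a newer version to
reconcile.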

My questions are:
  - Is it possible to achieve consistency with this approach?  Even if I
try at CL=TWO and back off to CL=ONE in a failure condition, there still
seems to be a race where I could hit the replica without the data.
  - Does a replica 'not having the data' count towards the CL requirements?
 I.e. replica B responds, "Nope, don't have it" -- I don't want the CL to
be satisfied by that, because the data is either there or it is not.  I
have not done updates to the data.
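
The race I am worried about in the first question can be sketched as
follows.  Again this is only a toy model: read_at, Unavailable and the
route argument are hypothetical names, and the model simply assumes that
an empty answer counts as a response towards the CL, which is exactly the
behaviour I am asking about.

```python
class Unavailable(Exception):
    """Raised in this toy model when fewer than `cl` replicas are up."""


def read_at(replicas, key, cl, route):
    """Toy read at consistency level `cl`.

    `route` is the order in which replicas are tried.  The first `cl`
    live replicas respond, and an empty answer (None) counts as a
    response.  That last part is an assumption, not confirmed behaviour.
    """
    live = [replicas[name] for name in route if replicas[name]["up"]]
    if len(live) < cl:
        raise Unavailable("need %d replicas, %d up" % (cl, len(live)))
    answers = [r["rows"].get(key) for r in live[:cl]]
    found = [v for v in answers if v is not None]
    return found[0] if found else None


# State after a CL=ONE write that only A received because B was down.
replicas = {
    "A": {"up": True, "rows": {"k": "v"}},
    "B": {"up": False, "rows": {}},
}
route = ["B", "A"]  # unlucky routing: B is tried first

fell_back = False
try:
    result = read_at(replicas, "k", 2, route)
except Unavailable:
    fell_back = True
    replicas["B"]["up"] = True  # B returns before the handoff arrives
    result = read_at(replicas, "k", 1, route)  # lands on B, comes back empty

both = read_at(replicas, "k", 2, route)  # a retry at CL=TWO now sees A too
print(fell_back, result, both)
```

Under those assumptions the fallback read at CL=ONE comes back empty even
though A has the row, while a retry at CL=TWO would have found it.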

This feels a bit quorum-ish, where a quorum under RF=3 will ask 3 nodes for
the data and return success when 2 have consistent results.

It feels strange to be able to write data at RF=2, then with only 1 node
being down, not be able to read it ...

Thanks,

-- Jon

--e89a8f8397ad7f8b0504ff082664--