Date: Fri, 6 May 2016 22:20:02 +0530
Subject: Re: Read data from specific node in cassandra
From: Joseph Tech
To: user@cassandra.apache.org

Please check if nodetool getendpoints can be used, if you know the key (going by your problem description).

On 6 May 2016 22:04, "Siddharth Verma" <verma.siddharth@snapdeal.com> wrote:

@Joseph,
An incident we saw in production, and a speculation as to how it might have occurred.

A detailed description of the use case:

*Incident*
We have 2 DCs, each with three nodes, and our keyspace has RF 3 per DC. read_repair_chance is 0.0 for all the tables.
After a while (we run periodic full table scans to dump the data somewhere else), we saw corrupted data being dumped.
We copied the SSTables of all nodes of one DC to a separate cluster created for debugging.
We shut down two nodes of the replica cluster so that only one was up, and queried the possibly corrupted data in cqlsh.
What we saw: out of the three nodes of the replica, two had similar data, and one had some extra data which shouldn't have been there for that particular partition key.

*Speculation*
A possible cause we could come up with: on a particular day, one of the nodes of the production DC might have gone down, for longer than the hinted handoff window.
Say the node went down at 12 PM.
Coordinator nodes stored hints from 12 PM - 3 PM.
The node was restarted at 6 PM.
All deletions/updates from 3 PM - 6 PM never reached that particular node, and repair wasn't run on it.
After 10 days, the tombstones were deleted (gc_grace_seconds).
Now that node still has data which was missed in the deletion, while the data has been removed from the other two nodes, so we can't run repair now.

Again, this is only speculation; we are not sure.
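The arithmetic behind that timeline can be sketched in a few lines (a toy model, not driver code; the constants mirror Cassandra's defaults of a 3-hour max_hint_window_in_ms and a 10-day gc_grace_seconds, and the helper names are illustrative):

```python
from datetime import datetime, timedelta

MAX_HINT_WINDOW = timedelta(hours=3)  # default max_hint_window_in_ms (3 h)
GC_GRACE = timedelta(days=10)         # default gc_grace_seconds (864000 s)

def missed_window(down_at, up_at, hint_window=MAX_HINT_WINDOW):
    """Span of mutations the node missed entirely: coordinators stop
    storing hints once the node has been down longer than hint_window."""
    downtime = up_at - down_at
    return max(timedelta(0), downtime - hint_window)

def repair_deadline(tombstone_written_at, gc_grace=GC_GRACE):
    """Approximate last moment a repair can still propagate a tombstone:
    after gc_grace the tombstone is purged on compaction, and the lagging
    node's undeleted copy can then resurrect."""
    return tombstone_written_at + gc_grace

down = datetime(2016, 4, 20, 12, 0)  # node goes down at 12 PM
up   = datetime(2016, 4, 20, 18, 0)  # restarted at 6 PM

print(missed_window(down, up))                        # 3:00:00 (the 3 PM - 6 PM gap)
print(repair_deadline(datetime(2016, 4, 20, 15, 0)))  # a 3 PM delete + 10 days
```

Under these assumptions the repair deadline passed before the scan, which is consistent with the stale data observed on one replica.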
This is the only cause we could come up with.

@User
Back to the requirement "*Read data from specific node in cassandra*":
I prematurely stated that the whitelist worked *perfectly*. However, while scanning the data, that isn't the case; it produced an ambiguous data dump, so this option didn't work for debugging.
Could someone suggest other alternatives?
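A toy illustration (pure Python, all names made up) of why whitelisting a single coordinator does not pin a CL=ONE scan to that node's own data: the whitelisted host only coordinates, and each read may still be served by whichever replica its load balancing picks, so divergent replicas yield an ambiguous dump:

```python
import itertools

# Three replicas of the same partition; node3 missed a delete and still
# holds a stale row (the situation described above). Names are illustrative.
replica_data = {
    "node1": {"k1": ["row_a", "row_b"]},
    "node2": {"k1": ["row_a", "row_b"]},
    "node3": {"k1": ["row_a", "row_b", "stale_row"]},
}

# Round-robin stand-in for the coordinator's replica selection: even when the
# client whitelists node1, CL=ONE reads may be answered by any replica.
pick_replica = itertools.cycle(sorted(replica_data)).__next__

def scan_cl_one(key):
    return list(replica_data[pick_replica()][key])

results = [scan_cl_one("k1") for _ in range(3)]
print(results[0] == results[2])  # False: the same scan returns different answers
```

Reading one node's data in isolation (the other replicas down, as done with the debug cluster above) or inspecting that node's SSTables directly stays closer to the stated goal than a driver-side whitelist.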
