From: Eric Stevens
Date: Wed, 08 Mar 2017 18:45:14 +0000
Subject: Re: Is it possible to recover a deleted-in-future record?
To: user@cassandra.apache.org, "anujw_2003@yahoo.co.in"

Those future tombstones are going to continue to cause problems on those partitions. If you're still writing to those partitions, you might be losing data in the meantime. It's going to be hard to get the tombstones out of the way so that new writes can begin to happen there (newly written data will be occluded by the existing tombstones). Manual cleanup might be required here, such as sstablefilter, or sstable2json -> clean up the data -> json2sstable. This could get really hairy.

Another option, depending on the kind of tombstones they were (e.g. cell level): my deleting compactor [1] might be able to clean them up on the live cluster via user-defined compaction, if you wrote a convictor for this purpose. But that tool has a gap: it doesn't yet properly recognize cluster- and/or partition-level tombstones (there's an open PR that provides a partial implementation, but I'm not sure it would get you what you need). You can see my talk about that [2].

One careful caveat, though: the deleting compactor was written to _avoid_ tombstones; it hasn't been well tested against data that already contains tombstones. So although time is critical for you here, to avoid ongoing corruption of your data while those bad tombstones remain in the way, I would still fully encourage you to validate whether it satisfies your use case before relying on it.

[1] https://github.com/protectwise/cassandra-util
[2] https://www.youtube.com/watch?v=BhGkSnBZgJA

On Wed, Mar 8, 2017 at 6:06 AM Arvydas Jonusonis <arvydas.jonusonis@gmail.com> wrote:

> That's a good point - a snapshot is certainly in order ASAP, if not
> already done.
>
> One more thing I'd add about "data has to be consolidated from all the
> nodes" (from #2 below):
>
> - EITHER run the sstable2json ops on each node
> - OR if size permits, copy the relevant sstables (containing the
>   desired keys, from the output of nodetool getsstables) locally or onto
>   a new single-node instance, start that instance, and run the commands
>   there
>
> If restoring the sstables from a snapshot, you'll need to do the latter
> anyway.
>
> Arvydas
>
> On Wed, Mar 8, 2017 at 1:55 PM, Anuj Wadehra <anujw_2003@yahoo.co.in>
> wrote:
>
> DISCLAIMER: This is only my personal opinion. Evaluate the situation
> carefully, and if you find the suggestions below useful, follow them at
> your own risk.
>
> If I have understood the problem correctly, the malicious deletes would
> actually lead to deletion of data. I am not sure how everything is
> normal after the deletes?
>
> If the data is critical, you could:
>
> 1. Take a database snapshot immediately so that you don't lose
> information if the delete entries in the sstables are compacted together
> with the original data.
>
> 2. Transfer the snapshot to a suitable place and run a utility such as
> sstable2json to get the keys impacted by the deletes and the original
> data for those keys. The data has to be consolidated from all the nodes.
>
> 3. Devise a strategy to restore the deleted data.
>
> Thanks
> Anuj
>
> On Tue, Mar 7, 2017 at 8:44 AM, Michael Fong wrote:
>
> Hi, all,
>
> We recently encountered an issue in production where some records were
> mysteriously deleted with a timestamp 100+ years from now. Everything is
> normal as of now, and how the deletion happened and the accuracy of the
> system timestamp at that moment are unknown. We were wondering if there
> is a general way to recover the mysteriously-deleted data when the
> timestamp metadata is screwed up.
>
> Thanks in advance,
>
> Regards,
>
> Michael Fong
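[Editor's note] The sstable2json -> clean up the data -> json2sstable step suggested above could be sketched roughly as below. This is a hypothetical illustration only: it assumes the Cassandra 2.x sstable2json dump layout, where each partition is `{"key": ..., "cells": [...]}` and a cell tombstone is a list whose fourth element is the marker `"d"` with the write timestamp (microseconds since epoch) in the third position. The one-day cutoff is an arbitrary choice for "in the future"; verify the exact JSON shape against your Cassandra version before using anything like this.

```python
import time

# Anything written more than a day ahead of "now" is treated as one of the
# bogus future-dated tombstones. Cassandra write times are microseconds
# since the epoch in sstable2json output (assumption; check your version).
FUTURE_CUTOFF_US = int((time.time() + 86400) * 1_000_000)

def is_future_tombstone(cell):
    """True for a cell tombstone whose write timestamp is in the future.

    Assumed 2.x sstable2json cell layout:
      [name, value, timestamp]                              live cell
      [name, value, timestamp, "d", local_deletion_time]    cell tombstone
    """
    return len(cell) >= 4 and cell[3] == "d" and cell[2] > FUTURE_CUTOFF_US

def strip_future_tombstones(partitions):
    """Drop future-dated cell tombstones from a sstable2json-style dump.

    `partitions` is the parsed JSON list produced by sstable2json; the
    cleaned structure can then be re-serialized and fed to json2sstable.
    """
    for part in partitions:
        part["cells"] = [c for c in part.get("cells", [])
                         if not is_future_tombstone(c)]
    return partitions
```

Note that this only handles cell-level tombstones; partition- and range-level deletions live in other parts of the dump and would need their own handling.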