Subject: Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3
From: Robert Coli <rcoli@eventbrite.com>
To: user@cassandra.apache.org
Date: Tue, 4 Feb 2014 10:39:56 -0800

On Tue, Feb 4, 2014 at 12:21 AM, olek.stasiak@gmail.com wrote:

> I don't know what the real cause of my problem is; we are still guessing.
> All of the operations I have done on the cluster are described on this timeline:
> 1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations -> 2.0.3
> -> normal operations -> now
> ("normal operations" means reads/writes/repairs.)
> Could you please describe briefly how to recover the data? I have a
> problem with the scenario described at
> http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html;
> I can't apply that solution to my case.
I think your only option is the following:

1) determine which SSTables contain rows that have doomstones (tombstones from the far future)
2) determine whether these tombstones mask a live or a dead version of the row, by looking at the other row fragments
3) dump/filter/re-write all of your data via some method, probably sstable2json/json2sstable (see the sketch below)
4) load the corrected SSTables by starting a node with them in its data directory

I understand you have a lot of data, but I am pretty sure there is no way for you to fix this from within Cassandra. Perhaps ask for advice on the JIRA ticket mentioned upthread if this answer is not sufficient?

=Rob
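A minimal sketch of the filtering part of step 3, assuming the 2.0-era sstable2json column layout ([name, value, timestamp, optional deletion flag]) and a simple "drop every future-dated deletion marker" policy; the script and helper names are illustrative, and whether a given tombstone should really be dropped depends on the analysis in step 2:

    #!/usr/bin/env python
    # Illustrative only: filter a sstable2json dump, dropping cell-level
    # deletion markers whose timestamps lie in the future ("doomstones").
    # Row-level tombstones ("metadata"/"deletionInfo") are not handled here.
    import json
    import sys
    import time

    NOW_MICROS = int(time.time() * 1000000)  # write timestamps default to microseconds


    def is_doomstone(column):
        # Deleted cells carry an extra flag element after the timestamp; treat
        # such a marker as a doomstone when its timestamp is in the future.
        return len(column) > 3 and column[2] > NOW_MICROS


    def filter_rows(rows):
        for row in rows:
            row["columns"] = [c for c in row.get("columns", []) if not is_doomstone(c)]
            yield row


    if __name__ == "__main__":
        rows = json.load(sys.stdin)                     # output of: sstable2json <sstable>
        json.dump(list(filter_rows(rows)), sys.stdout)  # feed the result to json2sstable

Roughly: run sstable2json on each affected SSTable, pipe the dump through a filter along these lines, write the result back out with json2sstable into a fresh file, and then start a node with the rewritten SSTables in its data directory (step 4).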