Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Message-ID: <4F572D50.3050500@unitedgames.com>
Date: Wed, 07 Mar 2012 10:41:36 +0100
From: Stefan Reek <stefan@unitedgames.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
 rv:1.9.1.16) Gecko/20110307 Iceowl/1.0b1 Icedove/3.0.11
MIME-Version: 1.0
To: user@cassandra.apache.org
Subject: Re: Old data coming alive after adding node
References: <4F55D522.7080807@unitedgames.com>
 <745045D3-5C52-4B71-BFF4-2B14EEBC6EFA@thelastpickle.com>
 <4F55E362.3080307@unitedgames.com>
 <0679AE72-BB02-4B19-ADE3-6DE95932E35F@thelastpickle.com>
In-Reply-To: <0679AE72-BB02-4B19-ADE3-6DE95932E35F@thelastpickle.com>
Content-Type: multipart/alternative;
 boundary="------------030505020104040105020301"

This is a multi-part message in MIME format.
--------------030505020104040105020301
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

After the old data came up we were able to delete it again, and the 
cluster is stable now.
We are in the process of upgrading to 1.0, but as you said that's a 
painful process.
I just hope 0.6 will keep running till we're done with the upgrade.
Anyway thanks for the help.

Cheers,

Stefan


On 03/06/2012 07:02 PM, aaron morton wrote:
>> All our writes/deletes are done with CL.QUORUM.
>> Our reads are done with CL.ONE. Although the reads that confirmed the 
>> old data were done with CL.QUORUM.
> mmmm
>
>> According to 
>> https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 0.6.6 
>> has the same patch
>> for (CASSANDRA-1074) as 0.7 and so I assumed that minor compactions 
>> in 0.6.6 and up also purged tombstones.
> My bad. As you were.
>
> After the repair, did the un-deleted data remain un-deleted? Are you 
> back to a stable situation?
>
> Without a lot more detail I am at a bit of a loss.
>
> I know it's painful but migrating to 1.0 *really* will make your life 
> so much easier and faster. At some point you may hit a bug or a 
> problem in 0.6 and the solution may be to upgrade, quickly.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/03/2012, at 11:13 PM, Stefan Reek wrote:
>
>> Hi Aaron,
>>
>> Thanks for the quick reply.
>> All our writes/deletes are done with CL.QUORUM.
>> Our reads are done with CL.ONE. Although the reads that confirmed the 
>> old data were done with CL.QUORUM.
>> According to 
>> https://svn.apache.org/viewvc/cassandra/branches/cassandra-0.6/CHANGES.txt 
>> 0.6.6 has the same patch
>> for (CASSANDRA-1074) as 0.7 and so I assumed that minor compactions 
>> in 0.6.6 and up also purged tombstones.
>> The only suspicious thing I noticed was that after adding the fourth 
>> node repairs became extremely slow and heavy.
>> Running it degraded the performance of the whole cluster and the new 
>> node even went OOM when running it.
>>
>> Cheers,
>>
>> Stefan
>>
>> On 03/06/2012 10:51 AM, aaron morton wrote:
>>>> After we added a fourth node, keeping RF=3, some old data appeared 
>>>> in the database.
>>> What CL are you working at? (Should not matter too much with repair 
>>> working, just asking)
>>>
>>>
>>>> We don't run compact on the nodes explicitly as I understand that 
>>>> running repair will trigger a
>>>> major compaction. I'm not entirely sure if it does so, but in any 
>>>> case the tombstones will be removed by a minor
>>>> compaction.
>>> In 0.6.x tombstones were only purged during a major / manual 
>>> compaction. Purging during minor compaction came in during 0.7
>>> https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L1467
>>>
>>>> Can anyone think of any reason why the old data reappeared?
>>> It sounds like you are doing things correctly. The complicating 
>>> factor is 0.6 is so very old.
>>>
>>>
>>> If I wanted to poke around some more I would conduct reads at CL ONE 
>>> against each node and see if they return the "deleted" data or not. This 
>>> would help me understand if the tombstone is still out there.
>>>
>>> I would also poke around a lot in the logs to make sure repair was 
>>> running as expected and completing. If you find anything suspicious 
>>> post examples.
>>>
>>> Finally I would ensure CL QUORUM was being used.
>>>
>>> Hope that helps.
>>>
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 6/03/2012, at 10:13 PM, Stefan Reek wrote:
>>>
>>>> Hi,
>>>>
>>>> We were running a 3-node cluster of cassandra 0.6.13 with RF=3.
>>>> After we added a fourth node, keeping RF=3, some old data appeared 
>>>> in the database.
>>>> As far as I understand this can only happen if nodetool repair 
>>>> wasn't run for more than GCGraceSeconds.
>>>> Our GCGraceSeconds is set to the default of 10 days (864000 seconds).
>>>> We have a scheduled cronjob to run repair once each week on every 
>>>> node, each on a different day.
>>>> I'm sure that none of the nodes ever skipped running a repair.
>>>> We don't run compact on the nodes explicitly as I understand that 
>>>> running repair will trigger a
>>>> major compaction. I'm not entirely sure if it does so, but in any 
>>>> case the tombstones will be removed by a minor
>>>> compaction. So I expected that the reappearing data, which is a 
>>>> couple of months old in some cases, was long gone
>>>> by the time we added the node.
>>>>
>>>> Can anyone think of any reason why the old data reappeared?
>>>>
>>>> Stefan
>>>
>>
>
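
For reference, the consistency levels discussed in this thread can be checked with a little arithmetic. This is only a sketch, assuming the usual rule that a read is guaranteed to see the latest write only when the number of replicas written plus the number of replicas read exceeds RF. The function names are made up for illustration and are not Cassandra APIs.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM: a strict majority of RF."""
    return rf // 2 + 1


def overlaps(write_replicas: int, read_replicas: int, rf: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > rf


RF = 3              # replication factor used in this thread
W = quorum(RF)      # writes and deletes are done at QUORUM

print("QUORUM write + ONE read overlap:", overlaps(W, 1, RF))
print("QUORUM write + QUORUM read overlap:", overlaps(W, quorum(RF), RF))
```

With RF=3 a read at CL.ONE may land on the one replica that missed a QUORUM delete, which is why the reads done at CL.QUORUM are the ones that actually confirm the old data was back.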

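The GCGraceSeconds reasoning in the original question can also be sketched as a toy model. This is not Cassandra code: the `Replica` class, `repair` and `run` are hypothetical helpers, and the only value taken from the thread is the 864000 second default. It shows the general mechanism only (a tombstone purged before every replica has seen it lets the old value come back) and does not claim to explain what happened on this cluster.

```python
GC_GRACE = 864000  # seconds, the 10-day default mentioned in the thread


class Replica:
    """Toy replica: key -> (timestamp, value); value None marks a tombstone."""

    def __init__(self):
        self.cells = {}

    def apply(self, key, ts, value):
        # Last write wins: keep whichever cell has the newest timestamp.
        current = self.cells.get(key)
        if current is None or ts > current[0]:
            self.cells[key] = (ts, value)

    def compact(self, now):
        # Purge tombstones that are older than GC_GRACE.
        self.cells = {
            k: (ts, v)
            for k, (ts, v) in self.cells.items()
            if not (v is None and now - ts > GC_GRACE)
        }


def repair(replicas):
    # Toy anti-entropy: offer every cell held anywhere to every replica.
    merged = [(k, ts, v) for r in replicas for k, (ts, v) in r.cells.items()]
    for r in replicas:
        for k, ts, v in merged:
            r.apply(k, ts, v)


def run(repair_before_purge):
    a, b, c = Replica(), Replica(), Replica()
    for r in (a, b, c):
        r.apply("row", 0, "old")      # original write reaches all replicas
    for r in (a, b):
        r.apply("row", 10, None)      # delete reaches only a quorum
    if repair_before_purge:
        repair([a, b, c])             # tombstone reaches c in time
    now = 10 + GC_GRACE + 1
    for r in (a, b, c):
        r.compact(now)                # tombstones are now purgeable
    repair([a, b, c])                 # a later repair
    return a.cells.get("row")


print("repair skipped:", run(False))
print("repair in time:", run(True))
```

In the first run the replica that never saw the delete hands the old value back to the others once their tombstones are gone. In the second run the tombstone reaches every replica before it is purged, so the row stays deleted.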


--------------030505020104040105020301--