incubator-cassandra-user mailing list archives

From Robert Coli <rc...@eventbrite.com>
Subject Re: How often to run `nodetool repair`
Date Thu, 01 Aug 2013 21:07:54 GMT
On Thu, Aug 1, 2013 at 1:16 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:

>
> On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli <rcoli@eventbrite.com> wrote:
>
>> TTL is effectively DELETE; you need to run a repair once every
>> gc_grace_seconds. If you don't, data might un-delete itself.
>>
>
> How is it possible? Every replica has the TTL, so when it expires every
> replica has a tombstone. I don't see how you can get data with no tombstone.
> What am I missing?
>

I knew I had heard of cases where repair is required despite TTL, but
didn't recall the specifics. Thanks for the opportunity to go look it up...

http://comments.gmane.org/gmane.comp.db.cassandra.user/21008

quoting Sylvain Lebresne :
"
The initial question was about "can I use inserting with ttl=1 instead of
issuing deletes", ***so that would be a case where you do shadow a previous
version with a very small ttl and so repair is important.*** (EMPHASIS
rcoli)

But you're right that if you only issue data with expiration (no deletes)
and that you
  * either do not overwrite columns
  * or are sure that when you do overwrite, the value you're overwriting has
    a ttl that is lesser or equal than the ttl of the value you're
    overwriting with (+gc_grace to be precise)
then yes, ***repair is not necessary because you can't have shadowed value
resurfacing.*** (EMPHASIS rcoli)
"
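As a rough illustration of the overwrite condition Sylvain describes, here is a minimal Python sketch. It reflects one reading of "(+gc_grace to be precise)": the old value must expire no later than the point at which the overwriting value's tombstone becomes eligible for purging. The function name and parameters are hypothetical, not part of any Cassandra API; 864000 is Cassandra's default gc_grace_seconds.

```python
from typing import Optional

# Cassandra's default gc_grace_seconds (10 days).
DEFAULT_GC_GRACE_SECONDS = 864000


def overwrite_cannot_resurface(
    old_ttl: Optional[int],
    new_ttl: int,
    gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
) -> bool:
    """Hypothetical helper: True if the shadowed value should not resurface.

    Assumption (one interpretation of the quoted rule): the overwrite is
    safe without repair when old_ttl <= new_ttl + gc_grace_seconds.
    old_ttl=None means the old value was written without a TTL.
    """
    if old_ttl is None:
        # The old value never expires, so a replica that missed the
        # overwrite can serve it again once the new tombstone is purged.
        return False
    return old_ttl <= new_ttl + gc_grace_seconds


# "ttl=1 instead of DELETE" over data written without a TTL: repair matters.
print(overwrite_cannot_resurface(None, 1))
# Overwriting a 1-hour value with a 2-hour value: old value expires first.
print(overwrite_cannot_resurface(3600, 7200))
# Overwriting a 30-day value with ttl=1: old value outlives the tombstone.
print(overwrite_cannot_resurface(30 * 86400, 1))
```

Under this reading, only the second case is safe to leave unrepaired.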

So, to be more precise with my initial statement :

"TTL is like DELETE in some cases, so unless you are certain that you are
not (and will not be) in those cases, you should run repair when using TTL."

Also, if you do skip repair for such column families, you will be unable to
repair entire keyspaces; you will have to repair on a per column family
basis, manually excluding the CFs matching these criteria, which increases
management complexity.
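
To give a sense of that bookkeeping, here is a small Python sketch that builds one `nodetool repair <keyspace> <cf>` invocation per column family while skipping an exclusion list. The keyspace and column family names are made up for illustration, and the helper only constructs the argument lists; it does not run anything.

```python
from typing import Iterable, List


def repair_commands(
    keyspace: str,
    column_families: Iterable[str],
    skip: Iterable[str],
) -> List[List[str]]:
    """Hypothetical helper: build per-CF repair commands, omitting skipped CFs."""
    skipped = set(skip)
    return [
        ["nodetool", "repair", keyspace, cf]
        for cf in column_families
        if cf not in skipped
    ]


# Example with made-up names: "sessions" is assumed to be TTL-only and
# never overwritten, so it is left out of the repair schedule.
commands = repair_commands(
    "my_keyspace",
    ["users", "events", "sessions"],
    skip=["sessions"],
)
for cmd in commands:
    print(" ".join(cmd))
```

The exclusion list has to be kept in sync with how each column family is actually written, which is the extra management burden mentioned above.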

=Rob
