cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From daemeon reiydelle <daeme...@gmail.com>
Subject Re: Questions about anti-entropy repair
Date Wed, 20 Jul 2016 15:31:23 GMT
I don't know if my perspective on this will assist, so YMMV:

Summary

   1. Nodetool repairs are required when a node has issues and can't get
   its (e.g. hinted handoff) resync done: culprit: usually network, sometimes
   container/vm, rarely disk.
   2. Scripts to do partition range are a pain to maintain, and you have to
   be CONSTANTLY checking for new keyspaces, parsing them, etc. Git hub
   project?
   3. Monitor/monitor/monitor: if you do a best practices job of actually
   monitoring the FULL stack, you only need to do repairs when the world goes
   south.
   4. Are you alerted when errors show up in the logs, network goes wacky,
   etc? No? then you have to CYA by doing hail mary passes with periodic
   nodetool repairs.
   5. Nodetool repair is a CYA for a cluster whose status is not well
   monitored.

Daemeon's thoughts:

Nodetool repair is not required for a cluster that is and "always has been"
in a known good state. Monitoring of the relevant logs/network/disk/etc. is
the only way that I know of to assure this state. Because (e.g. AWS, and
EVERY ONE OF my clients' infrastructures: screwed up networks) nodes can
disappear then the cluster *can* get overloaded (network traffic) causing
hinted handoffs to have all of the worst case corner cases you can never
hope to see.

So, if you have good monitoring in place to assure that there is known good
cluster behaviour (network, disk, etc.), repairs are not required until you
are alerted that a cluster health problem has occurred. Partition range
repair is a pain in various parts of the anatomy because one has to
CONSTANTLY be updating the scripts that generate the commands (I have not
seen a git hub project around this, would love to see responses that point
them out!).



*.......*



*Daemeon C.M. ReiydelleUSA (+1) 415.501.0198London (+44) (0) 20 8144 9872*

On Wed, Jul 20, 2016 at 4:33 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Hi Satoshi,
>
>
>> Q1:
>> According to the DataStax document, it's recommended to run full repair
>> weekly or monthly. Is it needed even if repair with partitioner range
>> option ("nodetool repair -pr", in C* v2.2+) is set to run periodically for
>> every node in the cluster?
>>
>
> More accurately you need to run a repair for each node and each table
> within the gc_grace_seconds value defined at the table level to ensure no
> deleted data will return. Also running this on a regular basis ensure a
> constantly low entropy in your cluster, allowing better consistency (if not
> using a strong consistency like with CL.R&W = quorum).
>
> A full repair means every piece of data have been repaired. On a 3 node
> cluster with RF=3, running 'nodetool repair -pr' on the 3 nodes or
> 'nodetool repair' on one node are an equivalent "full repair". The best
> approach is often to run repair with '-pr' on all the nodes indeed. This is
> a full repair.
>
> Is it a good practice to repair a node without using non-repaired
>> snapshots when I want to restore a node because repair process is too slow?
>
>
> I am sorry, this is unclear to me. But from this "actually 1GB data is
> updated because the snapshot is already repaired" I understand you are
> using incremental repairs (or that you think that Cassandra repair uses it
> by default, which is not the case in your version).
> http://www.datastax.com/dev/blog/more-efficient-repairs
>
> Also, be aware that repair is a PITA for all the operators using
> Cassandra, that lead to many tries to improve things:
>
> Range repair: https://github.com/BrianGallew/cassandra_range_repair
> Reaper: https://github.com/spotify/cassandra-reaper
> Ticket to automatically schedule / handle repairs in Cassandra:
> https://issues.apache.org/jira/browse/CASSANDRA-10070
> Ticket to switch to Mutation Based Repairs (MBR):
> https://issues.apache.org/jira/browse/CASSANDRA-8911
>
> And probably many more... There is a lot to read and try, repair is an
> important yet non trivial topic for any Cassandra operator.
>
> C*heers,
> -----------------------
> Alain Rodriguez - alain@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
>
>
> 2016-07-14 9:41 GMT+02:00 Satoshi Hikida <sahikida@gmail.com>:
>
>> Hi,
>>
>> I have two questions about anti-entropy repair.
>>
>> Q1:
>> According to the DataStax document, it's recommended to run full repair
>> weekly or monthly. Is it needed even if repair with partitioner range
>> option ("nodetool repair -pr", in C* v2.2+) is set to run periodically for
>> every node in the cluster?
>>
>> References:
>> - DataStax, "When to run anti-entropy repair",
>> http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsRepairNodesWhen.html
>>
>>
>> Q2:
>> Is it a good practice to repair a node without using non-repaired
>> snapshots when I want to restore a node because repair process is too slow?
>>
>> I've done some simple verifications for anti-entropy repair and found out
>> that the repair process spends too much time than simply transferring the
>> replica data from existing nodes to restoring node.
>>
>> My verification settings are as following:
>>
>> - 3 node cluster (N1, N2, N3)
>> - 2 CPUs, 8GB memory, 500GB HDD for each node
>> - Replication Factor is 3
>> - C* version is 2.2.6
>> - CS is LCS
>>
>> And I prepared test data as following:
>>
>> - a snapshot (10GB, full repaired) for N1, N2, N3.
>> - 1GB SSTables (by using incremental backup) for N1, N2, N3.
>> - another 1GB SSTables for N1, N2
>>
>> I've measured repair time for two cases.
>>
>> - Case 1: repair N3 with the snapshot and 1GB SStables
>> - Case 2: repair N3 with the snapshot only
>>
>> In case 1, N3 is needed to repair 12GB (actually 1GB data is updated
>> because the snapshot is already repaired) and received 1GB data from N1 or
>> N2. Whereas in case 2, N3 is needed to repair 12GB (actually just compare
>> merkle tree for 10GB) and received 2GB data from N1 or N2.
>>
>> The result showed that case 2 was faster than case 1 (case 1: 6889sec,
>> case 2: 4535sec). I guess the repair process is very slow and it would be
>> better to repair a node without (non repaired) backed up (snapshot or
>> incremental backup) files if the other replica nodes exists.
>>
>> So... I guess if I just have non-repaired backups, what's the point of
>> using them? Looks like there's no merit... Am I missing something?
>>
>> Regards,
>> Satoshi
>>
>
>

Mime
View raw message