cassandra-user mailing list archives

From Ryan Svihla ...@foundev.pro>
Subject Re: Questions about anti-entropy repair
Date Fri, 22 Jul 2016 18:36:50 GMT
I would say only repairing when there is a known problem has a couple of logical issues off
the top of my head:

1. You're assuming hints are successfully delivered within their time window. I've never
found any reliable indication that they are.
2. Unless you're using CL ALL, you have no indication whether the replicas not needed to
meet the consistency level succeeded the write on the initial attempt.

Now if you're using CL LOCAL_QUORUM you'll have reasonable consistency, and chances are pretty
good that you eventually hit your RF anyway via read_repair, so I get the thought process
behind what you're saying, Daemeon.

Likewise, I've seen well-sized clusters with steady workloads generally behave well
and not stream a lot of data during repair. But because of 1 and 2, even with
good monitoring, that's a bit "running with scissors" for my taste: I'm not confident any
monitoring coverage will ever guarantee you're "mostly meeting RF" or not.

Your workload should be able to handle running repair within gc_grace_seconds anyway, or
the cluster isn't sized correctly (else what happens when you need to run repair after
a major event?). So why not just keep it running?
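Ryan's sizing argument can be sanity-checked with a bit of arithmetic. A minimal sketch, assuming a weekly cadence and an illustrative 6-hour pass (both numbers are hypothetical; only the 10-day gc_grace_seconds default comes from Cassandra):

```python
# Sketch: sanity-check a repair schedule against gc_grace_seconds.
# The interval and duration below are illustrative assumptions, not
# measurements; gc_grace_seconds defaults to 864000 s (10 days) per table.

GC_GRACE_SECONDS = 864_000             # 10 days, the table-level default
REPAIR_INTERVAL_SECONDS = 7 * 86_400   # assumed weekly repair cycle
TYPICAL_REPAIR_DURATION = 6 * 3_600    # assumed 6 h per full pass (hypothetical)

def schedule_is_safe(interval, duration, gc_grace, headroom=0.8):
    """True if each pass completes comfortably inside gc_grace_seconds.

    `headroom` leaves slack for retries after a failed or interrupted pass.
    """
    return interval + duration <= gc_grace * headroom

print(schedule_is_safe(REPAIR_INTERVAL_SECONDS, TYPICAL_REPAIR_DURATION,
                       GC_GRACE_SECONDS))  # True for the numbers above
```

If the check fails for your cluster's actual repair duration, that is Ryan's point: the workload or sizing, not the repair schedule, is the problem.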

YMMV, and if someone has kept their cluster up and running and knows all the stuff to look for,
kudos. I still view regular repair as a cheap way to CYA: even after working with Cassandra for 3 years
in a wide variety of pretty crazy situations, I'm not confident I could keep a cluster healthy
without running repair consistently.
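For what it's worth, "keep it running" usually means wrapping nodetool in a small script. A hypothetical sketch that only builds the per-keyspace `nodetool repair -pr` command strings (the keyspace names are invented; a real script would pull them from `cqlsh -e 'DESCRIBE KEYSPACES'`):

```python
# Sketch: generate per-keyspace `nodetool repair -pr` commands for a cron
# wrapper. Keyspaces are hard-coded for illustration; in practice the list
# would come from cqlsh, which is Daemeon's "constantly re-parse keyspaces"
# maintenance burden.

SYSTEM_KEYSPACES = {"system", "system_schema", "system_auth",
                    "system_distributed", "system_traces"}

def repair_commands(keyspaces):
    """One primary-range repair command per user (non-system) keyspace."""
    return [f"nodetool repair -pr {ks}"
            for ks in sorted(keyspaces) if ks not in SYSTEM_KEYSPACES]

# 'users' and 'events' are hypothetical keyspace names.
for cmd in repair_commands(["system", "users", "events"]):
    print(cmd)
```

Running each command per node, staggered across the week, gives the rolling `-pr` full repair discussed further down the thread.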

regards,

Ryan Svihla

On Jul 20, 2016, 10:32 AM -0500, daemeon reiydelle <daemeonr@gmail.com> wrote:
> I don't know if my perspective on this will assist, so YMMV:
>
> Summary
> Nodetool repairs are required when a node has issues and can't get its resync (e.g. hinted
> handoff) done. Culprit: usually network, sometimes container/VM, rarely disk.
> Scripts to do partition-range repair are a pain to maintain, and you have to be CONSTANTLY
> checking for new keyspaces, parsing them, etc. Is there a GitHub project for this?
> Monitor, monitor, monitor: if you do a best-practices job of actually monitoring the FULL
> stack, you only need to do repairs when the world goes south.
> Are you alerted when errors show up in the logs, the network goes wacky, etc.? No? Then you
> have to CYA with hail-mary passes of periodic nodetool repairs.
> Nodetool repair is a CYA for a cluster whose status is not well monitored.
> Daemeon's thoughts:
>
> Nodetool repair is not required for a cluster that is, and "always has been", in a known
> good state. Monitoring of the relevant logs/network/disk/etc. is the only way that I know
> of to assure this state. Because nodes can disappear (e.g. on AWS, and in EVERY ONE OF my
> clients' infrastructures: screwed-up networks), the cluster *can* get overloaded (network
> traffic), causing hinted handoffs to hit all of the worst-case corner cases you could never
> hope to see.
>
> So, if you have good monitoring in place to assure known good cluster behaviour
> (network, disk, etc.), repairs are not required until you are alerted that a cluster health
> problem has occurred. Partition-range repair is a pain in various parts of the anatomy
> because one has to CONSTANTLY update the scripts that generate the commands (I have not
> seen a GitHub project around this; would love to see responses that point one out!).
>
>
>
> .......
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Wed, Jul 20, 2016 at 4:33 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
> > Hi Satoshi,
> >
> > > Q1:
> > > According to the DataStax document, it's recommended to run full repair weekly
> > > or monthly. Is it needed even if repair with the partitioner range option ("nodetool
> > > repair -pr", in C* v2.2+) is set to run periodically for every node in the cluster?
> >
> >
> > More accurately, you need to run a repair for each node and each table within the
> > gc_grace_seconds value defined at the table level, to ensure no deleted data will return.
> > Running this on a regular basis also ensures constantly low entropy in your cluster,
> > allowing better consistency (if not using strong consistency, e.g. CL.R&W = QUORUM).
> >
> > A full repair means every piece of data has been repaired. On a 3-node cluster
> > with RF=3, running 'nodetool repair -pr' on all 3 nodes or 'nodetool repair' on one node
> > are equivalent "full repairs". The best approach is often indeed to run repair with '-pr'
> > on all the nodes. This is a full repair.
> >
> > > Is it a good practice to repair a node without using non-repaired snapshots
> > > when I want to restore a node, because the repair process is too slow?
> >
> > I am sorry, this is unclear to me. But from "actually 1GB data is updated because
> > the snapshot is already repaired" I understand you are using incremental repairs (or that
> > you think Cassandra repair uses them by default, which is not the case in your version):
> > http://www.datastax.com/dev/blog/more-efficient-repairs
> >
> > Also, be aware that repair is a PITA for all operators using Cassandra, which has
> > led to many attempts to improve things:
> >
> > Range repair: https://github.com/BrianGallew/cassandra_range_repair
> > Reaper: https://github.com/spotify/cassandra-reaper
> > Ticket to automatically schedule / handle repairs in Cassandra: https://issues.apache.org/jira/browse/CASSANDRA-10070
> > Ticket to switch to Mutation Based Repairs (MBR): https://issues.apache.org/jira/browse/CASSANDRA-8911
> >
> > And probably many more... There is a lot to read and try; repair is an important
> > yet non-trivial topic for any Cassandra operator.
> >
> > C*heers,
> > -----------------------
> > Alain Rodriguez - alain@thelastpickle.com
> > France
> >
> > The Last Pickle - Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> >
> >
> >
> >
> > 2016-07-14 9:41 GMT+02:00 Satoshi Hikida <sahikida@gmail.com>:
> > > Hi,
> > >
> > > I have two questions about anti-entropy repair.
> > >
> > > Q1:
> > > According to the DataStax document, it's recommended to run full repair weekly
> > > or monthly. Is it needed even if repair with the partitioner range option ("nodetool
> > > repair -pr", in C* v2.2+) is set to run periodically for every node in the cluster?
> > >
> > > References:
> > > - DataStax, "When to run anti-entropy repair", http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsRepairNodesWhen.html
> > >
> > >
> > > Q2:
> > > Is it a good practice to repair a node without using non-repaired snapshots
> > > when I want to restore a node, because the repair process is too slow?
> > >
> > > I've done some simple verifications of anti-entropy repair and found that the
> > > repair process takes much more time than simply transferring the replica data from
> > > existing nodes to the restoring node.
> > >
> > > My verification settings are as following:
> > >
> > > - 3 node cluster (N1, N2, N3)
> > > - 2 CPUs, 8GB memory, 500GB HDD for each node
> > > - Replication Factor is 3
> > > - C* version is 2.2.6
> > > - Compaction strategy is LCS
> > >
> > > And I prepared test data as following:
> > >
> > > - a snapshot (10GB, full repaired) for N1, N2, N3.
> > > - 1GB SSTables (by using incremental backup) for N1, N2, N3.
> > > - another 1GB SSTables for N1, N2
> > >
> > > I've measured repair time for two cases.
> > >
> > > - Case 1: repair N3 with the snapshot and 1GB SStables
> > > - Case 2: repair N3 with the snapshot only
> > >
> > > In case 1, N3 needed to repair 12GB (actually 1GB of data is updated, because
> > > the snapshot is already repaired) and received 1GB of data from N1 or N2. Whereas in
> > > case 2, N3 needed to repair 12GB (actually just compare Merkle trees for 10GB) and
> > > received 2GB of data from N1 or N2.
> > >
> > > The result showed that case 2 was faster than case 1 (case 1: 6889 sec, case
> > > 2: 4535 sec). I guess the repair process is very slow, and it would be better to repair
> > > a node without (non-repaired) backed-up files (snapshot or incremental backup) if the
> > > other replica nodes exist.
> > >
> > > So... I guess if I just have non-repaired backups, what's the point of using
> > > them? Looks like there's no merit... Am I missing something?
> > >
> > > Regards,
> > > Satoshi
> > >
> >
> >
> >
>
