cassandra-user mailing list archives

From Viktor Jevdokimov <Viktor.Jevdoki...@adform.com>
Subject RE: nodetool repair -pr enough in this scenario?
Date Tue, 05 Jun 2012 09:19:34 GMT
But in any case, is repair a two-way process?
I mean, will repair without -pr on node N1 repair N1, N2 and N3, because N2 is a replica
of N1's range and N1 is a replica of N3's range?
And if there are more ranges that do not belong to N1, will those ranges and nodes not be repaired?


Do I understand correctly that "repair", with or without -pr, is not a "repair the selected node"
process, but a "synchronize data range(s) between replicas" process?
Single DC scenario:
With -pr: synchronize data for only the primary range of the selected node between all nodes
for that range (max number of nodes for the range = RF).
Without -pr: synchronize data for all ranges of the selected node (primary and replica) between
all nodes of those ranges (max number of nodes for the ranges = RF*RF). Not efficient, since
ranges overlap: the same ranges will be synchronized more than once (max = RF times).
Multiple DC scenario with 100% of the data range in each DC: the same, only RF = sum of the RFs
of all DCs.
Is that correct?

Finally - is this process for SSTables only, excluding memtables and hints?





Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
What is Adform: watch this short video<http://vimeo.com/adform/display>

[Adform News] <http://www.adform.com>


Disclaimer: The information contained in this message and attachments is intended solely for
the attention and use of the named addressee and may be confidential. If you are not the intended
recipient, you are reminded that the information remains the property of the sender. You must
not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this
message in error, please contact the sender immediately and irrevocably delete this message
and any copies.

From: Sylvain Lebresne [mailto:sylvain@datastax.com]
Sent: Tuesday, June 05, 2012 11:02
To: user@cassandra.apache.org
Subject: Re: nodetool repair -pr enough in this scenario?

On Tue, Jun 5, 2012 at 8:44 AM, Viktor Jevdokimov <Viktor.Jevdokimov@adform.com> wrote:
Understand simple mechanics first, decide how to act later.

Without -pr it makes no difference which host you run repair from: it runs for the whole 100%
range, from start to end, the whole cluster, all nodes, at once.

That's not exactly true. A repair without -pr will repair all the ranges of the node on which
repair is run. So it will only repair the ranges that the node is a replica for. It will *not*
repair the whole cluster (unless the replication factor is equal to the number of nodes in
the cluster, but that's a degenerate case). Hence it does matter on which host repair is
run (it always matters, whether you use -pr or not).
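
As a rough illustration of the point above, here is a toy Python model of a token ring. It is not Cassandra code. It assumes SimpleStrategy-style placement (a range lives on its primary owner and on the next RF-1 nodes around the ring); the function names and the 6-node, RF=3 ring are hypothetical and chosen only for the sketch.

```python
# Toy model only. Nodes are numbered 0..num_nodes-1 in ring order, and each
# range is identified by the node that is its primary owner.

def replicas_of(primary, num_nodes, rf):
    """Nodes holding the primary range of `primary` (assumed SimpleStrategy-like)."""
    return [(primary + i) % num_nodes for i in range(rf)]

def ranges_repaired(node, num_nodes, rf, primary_only):
    """Ranges touched by a repair started on `node`."""
    if primary_only:  # models `nodetool repair -pr`
        return {node}
    # Without -pr: every range for which `node` is one of the replicas.
    return {p for p in range(num_nodes) if node in replicas_of(p, num_nodes, rf)}

if __name__ == "__main__":
    num_nodes, rf = 6, 3  # hypothetical cluster size and replication factor
    for node in range(num_nodes):
        touched = sorted(ranges_repaired(node, num_nodes, rf, primary_only=False))
        print(f"repair on node {node} covers ranges {touched}")
```

In this model a repair without -pr started on node 0 covers only ranges 0, 4 and 5, and a different host covers a different set. Only when RF equals the number of nodes does a single run cover every range.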

In general you want to use repair without -pr when you want to repair a specific
node. Typically, if a node was dead for a reasonably long time, you may want to run a repair
(without -pr) on that specific node to have it catch up faster (faster than if you were only
relying on read-repair and hinted-handoff).

For repairing a whole cluster, as is the case for the weekly scheduled repairs in the initial
question, you want to use -pr. You *do not* want to use repair without -pr in that case,
because for that task using -pr is more efficient (and to be clear, not using -pr won't
cause problems, but it is less efficient).
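
The efficiency argument can be checked with a small count. The sketch below is a toy model, not Cassandra code: it assumes SimpleStrategy-style placement and a hypothetical RF of 3 on a 4-node ring (the thread does not state the actual replication factor), and tallies how often each range is repaired when repair is run once on every node.

```python
from collections import Counter

def replicas_of(primary, num_nodes, rf):
    """Nodes holding the primary range of `primary` (assumed SimpleStrategy-like)."""
    return [(primary + i) % num_nodes for i in range(rf)]

def repair_counts(num_nodes, rf, primary_only):
    """Times each range is repaired if repair runs once on every node."""
    counts = Counter()
    for node in range(num_nodes):
        if primary_only:
            counts[node] += 1  # -pr: only the node's own primary range
        else:
            for p in range(num_nodes):  # no -pr: every range the node replicates
                if node in replicas_of(p, num_nodes, rf):
                    counts[p] += 1
    return dict(counts)

if __name__ == "__main__":
    print("with -pr:   ", repair_counts(4, 3, primary_only=True))
    print("without -pr:", repair_counts(4, 3, primary_only=False))
```

Under these assumptions every range is repaired exactly once per cycle with -pr and RF times without it, which is the redundancy described above.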

--
Sylvain



With -pr it runs only for the primary range of the node on which you are running the repair.
Let's say you have a simple ring of 3 nodes with RF=2 and ranges (per node) N1=C-A, N2=A-B, N3=B-C
(node tokens are N1=A, N2=B, N3=C). No rack or DC awareness.
So running repair with -pr on node N2 will only repair the range A-B, for which node N2 is the
primary and N3 is a backup. N2 and N3 will synchronize the A-B range with each other. For the other
ranges you need to run it on the other nodes.

Without -pr, running on any node will repair all ranges: A-B, B-C, C-A. The node on which you run a
repair without -pr is just a repair coordinator, so it makes no difference which one you use next time.
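
For reference, the 3-node example can be worked through with a short Python sketch. It is a toy model, not Cassandra code: it assumes replicas are placed on the next node around the ring (RF=2, no rack or DC awareness, as stated in the example), and the helper names are hypothetical. It follows the description in Sylvain's reply, where a repair without -pr covers only the ranges the node is a replica for.

```python
# Ring from the example: node tokens N1=A, N2=B, N3=C, so the primary ranges
# are N1=C-A, N2=A-B, N3=B-C.
RING = ["N1", "N2", "N3"]
PRIMARY_RANGE = {"N1": "C-A", "N2": "A-B", "N3": "B-C"}
RF = 2

def replicas(primary):
    """Primary owner plus the next RF-1 nodes on the ring (assumed placement)."""
    start = RING.index(primary)
    return [RING[(start + i) % len(RING)] for i in range(RF)]

def sync_plan(node, primary_only):
    """Map each range repaired from `node` to the replicas that synchronize it."""
    plan = {}
    for owner in RING:
        nodes = replicas(owner)
        if owner == node or (not primary_only and node in nodes):
            plan[PRIMARY_RANGE[owner]] = nodes
    return plan

if __name__ == "__main__":
    print("N2, with -pr:   ", sync_plan("N2", primary_only=True))
    print("N2, without -pr:", sync_plan("N2", primary_only=False))
```

In this model, repair with -pr on N2 synchronizes A-B between N2 and N3. Without -pr it also synchronizes C-A between N1 and N2, but it does not touch B-C, because N2 holds no replica of that range.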




From: David Daeschler [mailto:david.daeschler@gmail.com]
Sent: Tuesday, June 05, 2012 08:59
To: user@cassandra.apache.org
Subject: nodetool repair -pr enough in this scenario?

Hello,

Currently I have a 4 node cassandra cluster on CentOS64. I have been running nodetool repair
(no -pr option) on a weekly schedule like:

Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri

In this scenario, if I were to add the -pr option, would this still be sufficient to prevent
forgotten deletes and properly maintain consistency?

Thank you,
- David

