cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: repair takes 10x more time in one DC compared to the other
Date Wed, 25 Jun 2014 16:48:51 GMT
I see. Well, you shouldn't use "-local" and "-pr" together; the combination
doesn't make sense, which is why it will be rejected in 2.0.9 (see
https://issues.apache.org/jira/browse/CASSANDRA-7317 for details).
Basically, the result of using both is that a lot of data doesn't get
repaired.
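
As a rough illustration of the "primary range" explanation quoted further
down, the Python sketch below computes the width of each node's primary
range when the tokens of one DC are offset slightly from the other. It is
not Cassandra code: the toy ring of 1000 tokens, the four nodes per DC, the
10-token offset, and the function names are all assumptions made for the
example.

```python
from collections import defaultdict


def primary_range_widths(tokens_by_node, ring_size):
    """Width of each node's primary range, taken here as
    (previous token on the ring, own token], ignoring which DC owns it."""
    ordered = sorted(tokens_by_node.items(), key=lambda item: item[1])
    widths = {}
    for i, (node, token) in enumerate(ordered):
        # Index -1 wraps around to the last token for the first node.
        prev_token = ordered[i - 1][1]
        # A single-node ring owns everything, hence the "or ring_size".
        widths[node] = (token - prev_token) % ring_size or ring_size
    return widths


def width_by_dc(widths):
    """Sum primary-range widths per data center; nodes are (dc, index)."""
    totals = defaultdict(int)
    for (dc, _index), width in widths.items():
        totals[dc] += width
    return dict(totals)


# Hypothetical layout: DC1 evenly spaced, DC2 shifted by 10 tokens.
RING_SIZE = 1000
tokens = {("DC1", i): i * 250 for i in range(4)}
tokens.update({("DC2", i): i * 250 + 10 for i in range(4)})

widths = primary_range_widths(tokens, RING_SIZE)
print(width_by_dc(widths))  # {'DC1': 960, 'DC2': 40}
```

In this toy layout each DC1 node has a primary range of 240 tokens and each
DC2 node only 10, so a "-pr" repair started on a DC1 node covers far more of
the ring than one started on a DC2 node. That is in line with the point
below that one DC ends up coordinating more repair work even though the
data itself is not imbalanced.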


On Wed, Jun 25, 2014 at 6:11 PM, Paulo Ricardo Motta Gomes <
paulo.motta@chaordicsystems.com> wrote:

> Thanks for the explanation, but I got slightly confused:
>
> From my understanding, you just described the behavior of the
> -pr/--partitioner-range option: "Repair only the first range returned by
> the partitioner for the node." , so I would understand that repairs in the
> same CFs in different DCs with only the -pr option could take different
> times.
>
> However according to the description of the -local/--in-local-dc option,
> it "only repairs against nodes in the same data center", but you said that "the
> range will be repaired for all replica in all data-centers", even with the
> "-local" option. Or did you confuse it with the "-pr" option?
>
> In any case, I'm using both the "-local" and "-pr" options. What is the
> expected behavior in that case?
>
> Cheers,
>
>
>
> On Wed, Jun 25, 2014 at 12:46 PM, Sylvain Lebresne <sylvain@datastax.com>
> wrote:
>
>> TL;DR: this is not unexpected and it is perfectly fine.
>>
>> For every node, 'repair --local' will repair the "primary" range of the
>> node in the ring (where primary means "the first range on the ring picked
>> by the consistent hashing for this node given its token", nothing more).
>> And that range will be repaired for all replicas in all data-centers. When
>> you assign tokens to multiple DCs, it's actually pretty common to offset the
>> tokens of one DC slightly compared to the other one. This will result in
>> the "primary" ranges always being small in one DC but not in the other. But
>> please note that this is perfectly ok; it does not imply any imbalance in
>> data-centers. It also doesn't really mean that the nodes of one DC actually
>> do a lot more work than the other ones: all nodes most likely contribute
>> roughly the same amount of work to the repair. It only means that the nodes
>> of one DC "coordinate" more repair work than those of the other DC. Which
>> is not really a big deal since coordinating a repair is cheap.
>>
>> --
>> Sylvain
>>
>>
>> On Wed, Jun 25, 2014 at 4:43 PM, Paulo Ricardo Motta Gomes <
>> paulo.motta@chaordicsystems.com> wrote:
>>
>>> Hello,
>>>
>>> I'm running repair on a large CF with the "--local" flag in 2 different
>>> DCs. In one of the DCs the operation takes about 1 hour per node, while in
>>> the other it takes 10 hours per node.
>>>
>>> I would expect the times to differ, but not by so much. The writes on that
>>> CF all come from the DC where it takes 10 hours per node. Could this be the
>>> reason it takes so long in this DC?
>>>
>>> Additional info: C* 1.2.16, both DCs have the same replication factor.
>>>
>>> Cheers,
>>>
>>> --
>>> *Paulo Motta*
>>>
>>> Chaordic | *Platform*
>>> *www.chaordic.com.br <http://www.chaordic.com.br/>*
>>> +55 48 3232.3200
>>>
>>
>>
>
>
> --
> *Paulo Motta*
>
> Chaordic | *Platform*
> *www.chaordic.com.br <http://www.chaordic.com.br/>*
> +55 48 3232.3200
>
