cassandra-user mailing list archives

From Bryan Cheng <br...@blockcypher.com>
Subject Re: Checking replication status
Date Fri, 26 Feb 2016 23:38:11 GMT
Hi Jimmy,

If you sustain a long downtime, repair is almost always the way to go.

It seems like you're asking to what extent a cluster is able to
recover/resync a downed peer.

A peer will not attempt to reacquire all the data it has missed while being
down. Recovery happens in a few ways:

1) Hints: Assuming that enough peers are up to satisfy the consistency
level of your writes, the coordinator will queue up a hint for each write
the downed peer misses, for up to max_hint_window_in_ms (from
cassandra.yaml) after the peer goes down. These hints will be delivered
once the peer recovers.
2) Read repair: With some probability, a read will trigger a consistency
check across replicas and update any stale copies, but _only for the data
being queried_.
3) Repair.

If a machine goes down for longer than max_hint_window_in_ms, AFAIK you
_will_ have missing data. If you cannot tolerate this situation, you need
to take a look at your tunable consistency and/or trigger a repair.
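
As a rough illustration of the rule of thumb above, here is a minimal Python sketch that compares an outage duration against the hint window. The function and helper names are made up for this example, and the default of 3 hours is the value that ships in cassandra.yaml; check your own config. Staying inside the window does not guarantee that every hint was delivered, so treat a False result as "repair may not be strictly required", not as proof of consistency.

```python
# Hypothetical helper, not part of Cassandra or any driver.
# 10800000 ms (3 hours) is the default max_hint_window_in_ms in
# cassandra.yaml; read the real value from your own configuration.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def hours_to_ms(hours: float) -> int:
    """Convert hours to milliseconds."""
    return int(hours * 60 * 60 * 1000)


def needs_full_repair(downtime_ms: int,
                      max_hint_window_ms: int = DEFAULT_MAX_HINT_WINDOW_MS) -> bool:
    """Return True if the node was down longer than the hint window.

    Past the window, peers stop storing hints for the downed node, so
    writes made after that point can only reach it through read repair
    or an explicit repair (for example `nodetool repair`).
    """
    if downtime_ms < 0:
        raise ValueError("downtime_ms must be non-negative")
    return downtime_ms > max_hint_window_ms


if __name__ == "__main__":
    # A 2 hour outage fits inside the default 3 hour window.
    print(needs_full_repair(hours_to_ms(2)))
    # A 4 hour outage does not, so a repair is called for.
    print(needs_full_repair(hours_to_ms(4)))
```

With the default window, the 2 hour downtime asked about below would still be covered by hints, but a repair remains the only way to be sure.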

On Thu, Feb 25, 2016 at 7:26 PM, Jimmy Lin <y2klyf+work@gmail.com> wrote:

> So far they have not been long, just a config change and a restart.
> If it is a 2 hr downtime for whatever reason, is a repair a better
> option than trying to figure out whether the replication sync has finished?
>
> On Thu, Feb 25, 2016 at 1:09 PM, daemeon reiydelle <daemeonr@gmail.com>
> wrote:
>
>> Hmm. What are your processes when a node comes back after "a long
>> offline"? Long enough to take the node offline and do a repair? Run the
>> risk of serving stale data? Parallel repairs? ???
>>
>> So, what sort of time frames are "a long time"?
>>
>>
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>> On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin <y2klyf@gmail.com> wrote:
>>
>>> hi all,
>>>
>>> What are the better ways to check the overall replication status of a
>>> Cassandra cluster?
>>>
>>> Within a single DC, unless a node is down for a long time, I feel it is
>>> mostly a non-issue and things are replicated pretty fast. But when a
>>> node comes back from a long offline period, is there a way to check that
>>> the node has finished its data sync with the other nodes?
>>>
>>> Across DCs, we have frequent VPN outages (sometimes short, sometimes
>>> long) between DCs. I would also like to know if there is a way to find
>>> out how far replication between DCs has caught up under this condition.
>>>
>>> Also, if I understand correctly, the only guaranteed way to make sure
>>> data is synced is to run a complete repair job, is that correct? I am
>>> trying to see if there is a way to "force a quick replication sync"
>>> between DCs after a VPN outage.
>>> Or maybe this is unnecessary, as Cassandra will catch up as fast as it
>>> can, and there is nothing else we (system admins) can do to make it
>>> faster or better?
>>>
>>>
>>>
>>> Sent from my iPhone
>>>
>>
>>
>
