cassandra-user mailing list archives

From Jean Tremblay <jean.tremb...@zen-innovations.com>
Subject Re: Catastrophy Recovery.
Date Mon, 15 Jun 2015 15:58:26 GMT
That is really wonderful. Thank you very much, Alain. You gave me a lot of leads to investigate.
Thanks again for your help.

On 15 Jun 2015, at 17:49, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

Hi, it looks like you're starting to use Cassandra.

Welcome.

I invite you to read as much as you can, starting here: http://docs.datastax.com/en/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html.

When a node loses some data, you have several anti-entropy mechanisms:

Hinted Handoff --> Replays writes that occurred while the node was down, but only those made while the other nodes knew it was down.
Read repair --> On each read, with a probability you can configure, other replicas are checked and out-of-date ones are corrected.
Repair (called manual / anti-entropy / full / ... repair) --> Gives a node back its missing data, either only for the range this node is primary for (-pr) or for all its data (its own range plus the ranges it holds as a replica). This is something you generally want to run on all nodes on a regular basis, at an interval shorter than the lowest gc_grace_seconds set on any of your tables.
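To make the repair-interval rule concrete, here is a minimal Python sketch that checks a planned repair interval against the lowest gc_grace_seconds of a set of tables. The table names and values are hypothetical; 864000 seconds (10 days) is Cassandra's default gc_grace_seconds, but your tables may override it, so take the real values from your schema.

```python
# Minimal sketch: check a repair schedule against gc_grace_seconds.
# Table names and values are hypothetical; 864000 s (10 days) is the
# default gc_grace_seconds in Cassandra.
from typing import Dict

DEFAULT_GC_GRACE_SECONDS = 864000


def lowest_gc_grace(gc_grace_by_table: Dict[str, int]) -> int:
    """The smallest gc_grace_seconds across all tables bounds the repair interval."""
    if not gc_grace_by_table:
        raise ValueError("no tables given")
    return min(gc_grace_by_table.values())


def repair_interval_is_safe(interval_s: int, gc_grace_by_table: Dict[str, int]) -> bool:
    """Each node should finish a repair more often than the lowest gc_grace_seconds;
    otherwise tombstones can be purged before every replica has seen them and
    deleted data may come back."""
    return interval_s < lowest_gc_grace(gc_grace_by_table)


if __name__ == "__main__":
    tables = {"ks.events": DEFAULT_GC_GRACE_SECONDS, "ks.sessions": 86400}  # hypothetical
    one_week = 7 * 24 * 3600
    print(repair_interval_is_safe(one_week, tables))   # False: ks.sessions allows only 1 day
    print(repair_interval_is_safe(12 * 3600, tables))  # True
```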

Also, you are getting wrong values because your consistency level (CL) is probably too low.
If you never want this to happen, you have to set the read (R) and write (W) consistency levels
so that R + W > RF (Replication Factor); otherwise you can see what you are currently seeing.
I advise you to set your consistency to "local_quorum" or "quorum" in a single-DC environment.
Also, with 3 nodes you should set RF to 3: with RF 2, a quorum is both replicas, so a single node
down means you either get errors at QUORUM or have to drop to a lower level and lose strong consistency.
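The R + W > RF rule, and why RF 2 leaves no room for a node failure at QUORUM, can be checked with a few lines of Python. This is a sketch for a single data center only: each consistency level is reduced to the number of replicas that must answer (LOCAL_QUORUM behaves like QUORUM when there is only one DC), and only ONE, QUORUM and ALL are modelled.

```python
# Sketch of the R + W > RF rule for a single data center.
# Each consistency level is reduced to the number of replicas that must answer.


def replicas_for(cl: str, rf: int) -> int:
    """Replicas that must respond for a read or write at this consistency level."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":  # same as LOCAL_QUORUM with a single DC
        return rf // 2 + 1
    if cl == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {cl}")


def is_strongly_consistent(read_cl: str, write_cl: str, rf: int) -> bool:
    """True when every set of replicas read overlaps every set of replicas written."""
    return replicas_for(read_cl, rf) + replicas_for(write_cl, rf) > rf


def available_levels(rf: int, replicas_down: int) -> list:
    """Consistency levels that can still be served with some replicas down."""
    if not 0 <= replicas_down <= rf:
        raise ValueError("replicas_down must be between 0 and rf")
    up = rf - replicas_down
    return [cl for cl in ("ONE", "QUORUM", "ALL") if up >= replicas_for(cl, rf)]


if __name__ == "__main__":
    print(is_strongly_consistent("ONE", "ONE", 2))        # False
    print(is_strongly_consistent("QUORUM", "QUORUM", 3))  # True
    print(available_levels(2, 1))  # ['ONE']: QUORUM needs both replicas
    print(available_levels(3, 1))  # ['ONE', 'QUORUM']
```

The thread does not say which CL the test used; assuming it was ONE, the first check is the situation described, since R + W = 2 is not greater than RF = 2.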

There is a lot more to know; you should read about all of this. Using Cassandra without knowing
about its internals will lead you to very poor and unexpected results.

To answer your questions:

"From what I understand, if you have a fixed node with no data, it will automatically bootstrap
and recover all its old data from its neighbours during the joining phase. Is this correct?"

--> Not at all, unless it joins the ring for the first time, which is not your case. It will,
however, (by default) slowly recover through read repair as you read.

"After such a catastrophe, and after the joining phase is done, should the cluster not be ready
to always deliver consistent data if there were no inserts or deletes during the catastrophe?"

No, we can't ensure that, short of removing the node and bootstrapping a new one. What we
can make sure of is that there are enough replicas remaining to serve consistent data (search
for RF and CL).

"After the bootstrap of a broken node is finished, i.e. after the joining phase, is there not
simply a repair to be done on that node using “node repair”?"

This premise is false: bootstrap / the joining phase ≠ a broken node coming back. You are
right about repair: if a broken node (or one down for longer than the hint window, 3 hours by
default) comes back, you have to repair it. But repair is slow, so make sure you can afford to
lose a node (see my previous answer).
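The rule of thumb for when a returning node needs a repair can be written down as a small Python function. It is a simplification under stated assumptions: it only models the two cases from the answer above (data lost, or downtime beyond the hint window), uses the 3-hour default of max_hint_window_in_ms, and ignores other ways hints can go missing. The function and its parameters are illustrative, not part of any Cassandra tool.

```python
# Rule-of-thumb sketch: does a node coming back need "nodetool repair"?
# Assumes the default max_hint_window_in_ms of 10800000 ms (3 hours).
# needs_repair() is a hypothetical helper, not a Cassandra API.

DEFAULT_HINT_WINDOW_S = 10800000 / 1000  # 3 hours, in seconds


def needs_repair(downtime_s: float, lost_data: bool,
                 hint_window_s: float = DEFAULT_HINT_WINDOW_S) -> bool:
    """Hints only replay writes made while the node was down, and are only stored
    for up to the hint window, so:
      - a node that lost its data files always needs a repair (hints cannot
        restore data that was already on disk before the failure);
      - a node down for longer than the hint window missed writes no hint covers."""
    if lost_data:
        return True
    return downtime_s > hint_window_s


if __name__ == "__main__":
    print(needs_repair(downtime_s=3600, lost_data=False))      # False: within the window
    print(needs_repair(downtime_s=4 * 3600, lost_data=False))  # True: beyond the window
    print(needs_repair(downtime_s=60, lost_data=True))         # True: the scenario in this thread
```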

Testing is a really good idea, but you also have to read a lot, IMHO.

Good luck,

C*heers,

Alain


2015-06-15 11:13 GMT+02:00 Jean Tremblay <jean.tremblay@zen-innovations.com>:

Hi,

I have a cluster of 3 nodes RF: 2.
There are about 2 billion rows in one table.
I use LeveledCompactionStrategy on my table.
I use version 2.1.6.
I use the default cassandra.yaml; only the IP addresses for the seeds and the throughput have been changed.

I have tested a scenario where one node crashes and loses all its data.
I deleted all data on this node after stopping Cassandra.
At this point I noticed that the cluster was still giving proper results, which is what I
expected from a cluster DB.

I then restarted that node and observed that it was joining the cluster.
After an hour or so the old “defective” node was up and normal.
I noticed that its hard disk held much less data than its neighbours'.

When I queried the DB, the cluster gave me different results for successive identical
queries.
I guess the old “defective” node was returning fewer rows than it should have.
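A toy Python model can show why identical queries returned different row counts: with RF 2, each row lives on two replicas, and at a low consistency level such as ONE the coordinator answers from a single replica. This assumes the test read at CL ONE, which the message does not state; the replica contents are made up, and the merge ignores timestamps and tombstones.

```python
# Toy model of one partition stored on two replicas (RF 2), one of them wiped.
# Replica contents are made up for illustration.

REPLICAS = [
    {"row1": "a", "row2": "b", "row3": "c"},  # healthy replica
    {},                                        # wiped replica, not yet repaired
]


def read_one(replica_index: int) -> dict:
    """CL ONE: the answer comes from whichever single replica is asked."""
    return dict(REPLICAS[replica_index])


def read_both() -> dict:
    """Ask both replicas and merge their rows, roughly what a read at QUORUM or
    ALL does with RF 2, since both levels need both replicas. Real reconciliation
    keeps the newest cell by timestamp; a plain union is enough here because
    nothing was updated or deleted."""
    merged: dict = {}
    for replica in REPLICAS:
        merged.update(replica)
    return merged


if __name__ == "__main__":
    print(len(read_one(0)))  # 3
    print(len(read_one(1)))  # 0
    print(len(read_both()))  # 3
```

Which replica serves a CL ONE read depends on the coordinator's choice, so in this model successive identical queries can alternate between 3 rows and 0 rows, matching the symptom described.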

1) From what I understand, if you have a fixed node with no data, it will automatically bootstrap
and recover all its old data from its neighbours during the joining phase. Is this correct?
2) After such a catastrophe, and after the joining phase is done, should the cluster not be ready
to always deliver consistent data if there were no inserts or deletes during the catastrophe?
3) After the bootstrap of a broken node is finished, i.e. after the joining phase, is there
not simply a repair to be done on that node using “node repair”?


Thanks for your comments.

Kind regards

Jean


