incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: best practices on EC2 question
Date Fri, 17 May 2013 19:02:14 GMT
I was considering that once bootstrapping starts, the new nodes receive writes, so when the
process is complete they have both the data from the streaming process and all writes made
since they started, and a repair is not needed. Compare that with bootstrapping a node from
a backup, where a (non -pr) repair is needed on the node to achieve consistency. In that sense
the node has all its data when the bootstrap has finished.
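
As a rough illustration of the difference described above, here is a toy model in Python. It is only a sketch: sets of keys stand in for a node's data, and the function names (bootstrap, restore_from_backup, repair) are made up for the example and are not Cassandra APIs.

```python
# Toy model only: keys stand in for rows, sets stand in for a node's data.
# All names here are hypothetical and are not Cassandra APIs.

def bootstrap(streamed_from_replicas, writes_during_bootstrap):
    """A joining node ends up with the streamed data plus the writes it
    received while it was joining."""
    return set(streamed_from_replicas) | set(writes_during_bootstrap)

def restore_from_backup(backup_snapshot):
    """A restored node only has what the backup captured."""
    return set(backup_snapshot)

def repair(node_data, other_replicas):
    """Simplified stand-in for a repair: copy in whatever the other
    replicas hold for the same range."""
    merged = set(node_data)
    for replica in other_replicas:
        merged |= set(replica)
    return merged

replica_at_stream_start = {"k1", "k2", "k3", "k4"}
writes_while_joining = {"k5"}          # arrive during the bootstrap
replica_now = replica_at_stream_start | writes_while_joining
backup = {"k1", "k2"}                  # taken before k3, k4 and k5 were written

bootstrapped = bootstrap(replica_at_stream_start, writes_while_joining)
restored = restore_from_backup(backup)
restored_then_repaired = repair(restored, [replica_now])

print(sorted(bootstrapped))            # matches the surviving replica
print(sorted(restored))                # missing everything after the backup
print(sorted(restored_then_repaired))  # consistent again after the repair
```

In this model the bootstrapped node matches the surviving replica without a repair, while the restored node only catches up once the repair step has run.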

If there is data that is replicated to a single node there is always a risk of data loss.
The data could have been written in the time between the last backup and the node failing.
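
The same kind of toy model can show the data loss window, and the case Rob raises in the quoted message below. Again this is only a sketch with made-up key names, assuming some keys were held by the failed node alone.

```python
# Toy model: which keys are lost after a node failure under each recovery path.
# The key names and the replica layout are assumptions for illustration.
failed_node = {"a", "b", "c", "d"}
other_replicas = {"a", "b"}      # "c" and "d" lived only on the failed node
last_backup = {"a", "b", "c"}    # "d" was written after the backup was taken

# Bootstrapping a replacement can only stream what other replicas hold.
recovered_by_bootstrap = set(other_replicas)
# Restoring the backup and then repairing yields backup plus other replicas.
recovered_by_restore_and_repair = set(last_backup) | set(other_replicas)

lost_by_bootstrap = failed_node - recovered_by_bootstrap
lost_by_restore = failed_node - recovered_by_restore_and_repair

print(sorted(lost_by_bootstrap))  # ['c', 'd']
print(sorted(lost_by_restore))    # ['d']
```

Key "d" is lost either way, because it existed on a single node and was written after the last backup.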


Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/05/2013, at 6:32 AM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Fri, May 17, 2013 at 11:13 AM, aaron morton <aaron@thelastpickle.com> wrote:
>> Bootstrapping a new node into the cluster has a small impact on the existing
>> nodes, and the new nodes have all the data they need when they finish the
>> process.
> 
> Sorry for the pedantry, but bootstrapping from existing replicas
> cannot guarantee that the new nodes have "all" the data they need when
> they finish the process. There is a non-zero chance that the failed
> node contained the single under-replicated copy of a given datum. In
> practice if your RF is >=2, you are unlikely to experience this type
> of data loss. But restore-a-backup-then-repair protects you against
> this unlikely case.
> 
> =Rob

