cassandra-user mailing list archives

From Kai Wang <>
Subject Re: Is there a way to add a new node to a cluster but not sync old data?
Date Thu, 22 Jan 2015 13:50:15 GMT
At last year's summit there was a presentation from Instaclustr -
It could be the solution you are looking for. However, I don't see any code
checked in or a JIRA ticket created for it. So for now you'd better plan
your capacity carefully.

On Wed, Jan 21, 2015 at 11:21 PM, Yatong Zhang <> wrote:

> Yes, my cluster is almost full and there are lots of pending tasks. You
> helped me a lot, thank you Eric~
> On Thu, Jan 22, 2015 at 11:59 AM, Eric Stevens <> wrote:
>> Yes, bootstrapping a new node will cause read loads on your existing
>> nodes - it is becoming the owner and replica of a whole new set of existing
>> data.  To do that it needs to know what data it's now responsible for, and
>> that's what bootstrapping is for.
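A rough, back-of-the-envelope sketch of how much existing data a bootstrapping node has to pull from its peers, and therefore how much extra read load they take on. It assumes token ownership is perfectly even across nodes (e.g. vnodes with the same num_tokens everywhere) and that unique_data_gb is the data size before replication; the function name is hypothetical and real streaming volumes will differ.

```python
def bootstrap_stream_estimate_gb(unique_data_gb: float, rf: int, nodes_before: int) -> float:
    """Rough estimate of how much data a newly bootstrapped node will own.

    Hypothetical helper. Assumes perfectly even token ownership: after the
    join, each of the nodes_before + 1 nodes holds
    unique_data_gb * rf / (nodes_before + 1), and all of that has to be
    streamed in from (i.e. read off) the existing replicas.
    """
    if rf < 1 or nodes_before < 1:
        raise ValueError("rf and nodes_before must be positive")
    nodes_after = nodes_before + 1
    # A node holds at most one copy of any row, so RF cannot exceed the node count.
    effective_rf = min(rf, nodes_after)
    return unique_data_gb * effective_rf / nodes_after


if __name__ == "__main__":
    # 6 existing nodes, RF=3, 1200 GB of unique data.
    print(round(bootstrap_stream_estimate_gb(1200, 3, 6), 1))
```

Under these assumptions, with 6 nodes, RF=3 and 1200 GB of unique data, the seventh node would need roughly 514 GB streamed to it, all of it read from the existing nodes.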
>> If you're at the point where bootstrapping a new node is placing a
>> too-heavy burden on your existing nodes, you may be dangerously close to or
>> even past the tipping point where you ought to have already grown your
>> cluster.  You need to grow your cluster as soon as possible, and chances
>> are you're close to no longer being able to keep up with compaction (see
>> nodetool compactionstats, make sure pending tasks is <5, preferably 0 or
>> 1).  Once you're falling behind on compaction, it becomes difficult to
>> successfully bootstrap new nodes, and you're in a very tough spot.
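The compaction check described above can be scripted. This is a minimal sketch, assuming nodetool is on the PATH and that its compactionstats output contains a line of the form "pending tasks: N" (the exact layout varies between Cassandra versions); the helper names and the MAX_PENDING constant are illustrative, with the threshold of 5 taken from the advice above.

```python
import re
import subprocess

# Threshold from the advice above: keep pending compactions below 5, ideally 0 or 1.
MAX_PENDING = 5


def parse_pending_tasks(output: str) -> int:
    """Extract N from a 'pending tasks: N' line in nodetool compactionstats output.

    Assumes that line is present; the exact output format differs between versions.
    """
    match = re.search(r"pending tasks:\s*(\d+)", output, re.IGNORECASE)
    if match is None:
        raise ValueError("could not find 'pending tasks' in output")
    return int(match.group(1))


def compaction_is_keeping_up(output: str, max_pending: int = MAX_PENDING) -> bool:
    """True if the pending compaction count is below max_pending."""
    return parse_pending_tasks(output) < max_pending


def run_compactionstats() -> str:
    """Run `nodetool compactionstats` on this host and return its stdout."""
    result = subprocess.run(
        ["nodetool", "compactionstats"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Canned sample output, so the check can be tried without a running node.
    sample = "pending tasks: 12\nActive compaction remaining time :        n/a\n"
    print(compaction_is_keeping_up(sample))
```

On a live node you would pass run_compactionstats() instead of the canned sample; a False result suggests the node is falling behind on compaction.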
>> On Wed, Jan 21, 2015 at 7:43 PM, Yatong Zhang <>
>> wrote:
>>> Thanks for the reply. The bootstrap of the new node put a heavy burden on
>>> the whole cluster and I don't know why. So that's actually the issue I want
>>> to fix.
>>> On Mon, Jan 12, 2015 at 6:08 AM, Eric Stevens <> wrote:
>>>> Yes, but it won't do what I suspect you're hoping for.  If you disable
>>>> auto_bootstrap in cassandra.yaml the node will join the cluster and will
>>>> not stream any old data from existing nodes.
>>>> The cluster will now be in an inconsistent state.  If you bring enough
>>>> nodes online this way to violate your read consistency level (e.g. with RF=3
>>>> and CL=QUORUM, if you bring on 2 nodes this way), some of your queries will be
>>>> missing data that they ought to have returned.
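A small worked example of the consistency arithmetic above. A QUORUM read needs floor(RF/2) + 1 replicas to answer, so with RF=3 that is 2. If 2 of the 3 replicas for a token range joined with auto_bootstrap disabled, a coordinator can, in the worst case, be answered by exactly those two empty nodes and return nothing. The sketch assumes the empty nodes all replicate the same range and that the remaining replicas hold the data; the function names are hypothetical.

```python
def quorum_size(rf: int) -> int:
    """Number of replicas that must answer a QUORUM read: floor(RF / 2) + 1."""
    return rf // 2 + 1


def quorum_can_miss_data(rf: int, empty_replicas: int) -> bool:
    """True if a QUORUM read could be served entirely by replicas that
    joined without streaming and therefore hold none of the old data.

    Worst case: the coordinator happens to pick only empty replicas, which
    is possible exactly when there are at least quorum_size(rf) of them.
    """
    if not 0 <= empty_replicas <= rf:
        raise ValueError("empty_replicas must be between 0 and rf")
    return empty_replicas >= quorum_size(rf)


if __name__ == "__main__":
    for empty in range(4):
        print(f"RF=3, {empty} empty replica(s): can miss data = {quorum_can_miss_data(3, empty)}")
```

For RF=3 this reports False for 0 or 1 empty replicas and True for 2 or 3, which matches the "bring on 2 nodes" case.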
>>>> There is no way to bring a new node online and have it be responsible
>>>> just for new data, and have no responsibility for old data.  It *will* be
>>>> responsible for old data, it just won't *know* about the old data it
>>>> should be responsible for.  Executing a repair will fix this, but only
>>>> because the existing nodes will stream all the missing data to the new
>>>> node.  This will create more pressure on your cluster than just normal
>>>> bootstrapping would have.
>>>> I can't think of any reason you'd want to do that unless you needed to
>>>> grow your cluster really quickly, and were ok with corrupting your old data.
>>>> On Sat, Jan 10, 2015 at 12:39 AM, Yatong Zhang <>
>>>> wrote:
>>>>> Hi there,
>>>>> I am using C* 2.0.10 and I was trying to add a new node to a
>>>>> cluster (actually, to replace a dead node). But after adding the new node,
>>>>> some other nodes in the cluster had a very high workload, which hurt the
>>>>> performance of the cluster.
>>>>> So I am wondering: is there a way to add a new node so that it only
>>>>> handles new data?
