cassandra-user mailing list archives

From Yatong Zhang <bluefl...@gmail.com>
Subject Re: Is there a way to add a new node to a cluster but not sync old data?
Date Thu, 22 Jan 2015 04:21:57 GMT
Yes, my cluster is almost full and there are lots of pending tasks. You
helped me a lot. Thank you, Eric!
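The pending-task check Eric recommends below can be scripted. This is a minimal Python sketch. It assumes the output of nodetool compactionstats contains a line of the form "pending tasks: N", and it parses captured text rather than calling nodetool itself. The thresholds (under 5, ideally 0 or 1) come from this thread, and the function names are hypothetical.

```python
import re

# Thresholds taken from the advice in this thread:
# pending tasks should be under 5, ideally 0 or 1.
WARN_AT = 5


def pending_compactions(compactionstats_output: str) -> int:
    """Extract the pending-task count from captured compactionstats text.

    Assumes the text contains a line like 'pending tasks: N'.
    """
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def compaction_health(pending: int) -> str:
    """Classify a pending-task count against the thread's rule of thumb."""
    if pending <= 1:
        return "ok"      # keeping up with compaction
    if pending < WARN_AT:
        return "watch"   # under 5, but above the ideal 0 or 1
    return "behind"      # falling behind; bootstrapping new nodes gets hard


sample = "pending tasks: 37\n"  # hypothetical captured output
n = pending_compactions(sample)
print(n, compaction_health(n))
```

A node reporting "behind" here is in the situation Eric describes: compaction is not keeping up, and adding capacity should not wait.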

On Thu, Jan 22, 2015 at 11:59 AM, Eric Stevens <mightye@gmail.com> wrote:

> Yes, bootstrapping a new node will cause read load on your existing nodes:
> the new node is becoming the owner and a replica of a whole set of existing
> data.  To take on that responsibility it has to receive a copy of that
> data, streamed from the nodes that currently hold it, and that's what
> bootstrapping does.
>
> If you're at the point where bootstrapping a new node is placing a
> too-heavy burden on your existing nodes, you may be dangerously close to or
> even past the tipping point where you ought to have already grown your
> cluster.  You need to grow your cluster as soon as possible, and chances
> are you're close to no longer being able to keep up with compaction (run
> nodetool compactionstats and make sure "pending tasks" is below 5,
> preferably 0 or 1).  Once you're falling behind on compaction, it becomes
> difficult to successfully bootstrap new nodes, and you're in a very tough
> spot.
>
>
> On Wed, Jan 21, 2015 at 7:43 PM, Yatong Zhang <blueflycn@gmail.com> wrote:
>
>> Thanks for the reply. The bootstrap of the new node put a heavy burden on
>> the whole cluster and I don't know why. So that's actually the issue I
>> want to fix.
>>
>> On Mon, Jan 12, 2015 at 6:08 AM, Eric Stevens <mightye@gmail.com> wrote:
>>
>>> Yes, but it won't do what I suspect you're hoping for.  If you disable
>>> auto_bootstrap in cassandra.yaml the node will join the cluster and will
>>> not stream any old data from existing nodes.
>>>
>>> The cluster will now be in an inconsistent state.  If you bring enough
>>> nodes online this way to violate your read consistency level (e.g. with
>>> RF=3 and CL=QUORUM, if two of a partition's three replicas are nodes
>>> brought on this way, a quorum read can be answered entirely by replicas
>>> that never received the data), some of your queries will be missing data
>>> that they ought to have returned.
>>>
>>> There is no way to bring a new node online and have it be responsible
>>> just for new data, and have no responsibility for old data.  It *will* be
>>> responsible for old data, it just won't *know* about the old data it
>>> should be responsible for.  Executing a repair will fix this, but only
>>> because the existing nodes will stream all the missing data to the new
>>> node.  This will create more pressure on your cluster than just normal
>>> bootstrapping would have.
>>>
>>> I can't think of any reason you'd want to do that unless you needed to
>>> grow your cluster really quickly, and were ok with corrupting your old data.
>>>
>>> On Sat, Jan 10, 2015 at 12:39 AM, Yatong Zhang <blueflycn@gmail.com>
>>> wrote:
>>>
>>>> Hi there,
>>>>
>>>> I am using C* 2.0.10 and I was trying to add a new node to a
>>>> cluster (actually, to replace a dead node). But after I added the new
>>>> node, some other nodes in the cluster had a very high workload, which
>>>> hurt the performance of the whole cluster.
>>>> So I am wondering: is there a way to add a new node so that it only
>>>> handles new data?
>>>>
>>>
>>>
>>
>
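Eric's RF=3 / CL=QUORUM example above can be checked with a little arithmetic. A quorum is floor(RF/2) + 1 replicas, so with RF=3 a quorum read waits for 2 replicas. If 2 of a partition's 3 replicas joined without bootstrapping and have not been repaired, the 2 replicas that answer may both lack the data. The Python sketch below assumes the remaining replicas hold the data and ignores hinted handoff and read repair. The function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Number of replicas a QUORUM read must hear from: floor(RF / 2) + 1."""
    return rf // 2 + 1


def quorum_read_can_miss_data(rf: int, empty_replicas: int) -> bool:
    """True if the replicas answering a QUORUM read could all be replicas
    that never received the data (e.g. nodes that joined with
    auto_bootstrap disabled and were not repaired)."""
    if not 0 <= empty_replicas <= rf:
        raise ValueError("empty_replicas must be between 0 and rf")
    # A miss is possible once the empty replicas alone can form a quorum.
    return empty_replicas >= quorum(rf)


for empty in range(4):
    print(f"RF=3, empty replicas={empty}: "
          f"can miss data -> {quorum_read_can_miss_data(3, empty)}")
```

With RF=3 this reports no possible miss for 0 or 1 empty replicas and a possible miss for 2 or 3, which matches the example in the thread. The same reasoning applies to other levels: a read at CL=ONE can miss data as soon as a single replica is empty.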
