incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Adding nodes in 1.2 with vnodes requires huge disks
Date Mon, 29 Apr 2013 09:24:36 GMT
Is this understanding correct: "we had a 12 node cluster with 256 vnodes on each node (upgraded from 1.1); we added two additional nodes that streamed so much data (600+GB when other nodes had 150-200GB) during the joining phase that they filled their local disks and had to be killed"?

Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA and update this thread with the ticket number?

Can you show the output from nodetool status so we can get a feel for the ring?
Can you include the logs from one of the nodes that failed to join?
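
For reference (assuming a stock install with nodetool on the PATH), the ring and streaming views come from:

    nodetool status
    nodetool netstats

netstats in particular, run against a joining node, should list any streams still in flight.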

Thanks

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/04/2013, at 10:01 AM, John Watson <john@disqus.com> wrote:

> On Sun, Apr 28, 2013 at 2:19 PM, aaron morton <aaron@thelastpickle.com> wrote:
>> We're going to try running a shuffle before adding a new node again... maybe that will help
> 
> I don't think it will hurt, but I doubt it will help.
> 
> We had to bail on shuffle since we need to add capacity ASAP and not in 20 days.
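> 
> (For the record, this was with the bundled shuffle utility; from memory the invocation was something like
> 
>     cassandra-shuffle create
>     cassandra-shuffle enable
> 
> though the exact binary name and subcommands may differ by package.)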
>  
> 
>>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
> 
> How many nodes did you join, and what was num_tokens?
> Did you notice streaming from all nodes (in the logs), or are you saying this in response to the cluster load increasing?
> 
>  
> Was only adding 2 nodes at the time (planning to add a total of 12). Started with a cluster of 12, but now 11, since one node entered some weird state when one of the new nodes ran out of disk space.
> num_tokens is set to 256 on all nodes.
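> For reference, that corresponds to a single line in cassandra.yaml on each node:
> 
>     num_tokens: 256
> 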
> Yes, nearly all current nodes were streaming to the new ones (which was great until disk space was an issue).
>>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream).
> Which were the new nodes?
> Can you show the output from nodetool status?
> 
> 
> The new nodes are the purple and gray lines above all the others.
> 
> nodetool status doesn't show joining nodes. I think I saw a bug already filed for this but I can't seem to find it.
>  
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 27/04/2013, at 9:35 AM, Bryan Talbot <btalbot@aeriagames.com> wrote:
> 
>> I believe that "nodetool rebuild" is used to add a new datacenter, not just a new host to an existing cluster. Is that what you ran to add the node?
>> 
>> -Bryan
>> 
>> 
>> 
>> On Fri, Apr 26, 2013 at 1:27 PM, John Watson <john@disqus.com> wrote:
>> Small relief we're not the only ones that had this issue.
>> 
>> We're going to try running a shuffle before adding a new node again... maybe that will help
>> 
>> - John
>> 
>> 
>> On Fri, Apr 26, 2013 at 5:07 AM, Francisco Nogueira Calmon Sobral <fsobral@igcorp.com.br> wrote:
>> I am using the same version and observed something similar.
>> 
>> I've added a new node, but the instructions from Datastax did not work for me. Then I ran "nodetool rebuild" on the new node. After this command finished, the new node contained twice the load of the other nodes. Even when I ran "nodetool cleanup" on the older nodes, the situation stayed the same.
>> 
>> The problem only seemed to disappear when "nodetool repair" was applied to all nodes.
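>> 
>> In command form, the sequence was roughly:
>> 
>>     nodetool rebuild    # on the new node only
>>     nodetool cleanup    # then on each of the older nodes
>>     nodetool repair     # finally on every node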
>> 
>> Regards,
>> Francisco Sobral.
>> 
>> 
>> 
>> 
>> On Apr 25, 2013, at 4:57 PM, John Watson <john@disqus.com> wrote:
>> 
>>> After finally upgrading to 1.2.3 from 1.1.9, enabling vnodes, and running upgradesstables, I figured it would be safe to start adding nodes to the cluster. Guess not?
>>> 
>>> It seems when new nodes join, they are streamed *all* sstables in the cluster.
>>> 
>>> https://dl.dropbox.com/s/bampemkvlfck2dt/Screen%20Shot%202013-04-25%20at%2012.35.24%20PM.png
>>> 
>>> The gray line machine ran out of disk space and for some reason cascaded into errors in the cluster about 'no host id' when trying to store hints for it (even though it hadn't joined yet).
>>> The purple line machine, I just stopped the joining process because the main cluster was dropping mutation messages at this point on a few nodes (and it still had dozens of sstables to stream).
>>> 
>>> I followed this: http://www.datastax.com/docs/1.2/operations/add_replace_nodes
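>>> 
>>> For the record, the cassandra.yaml changes that page calls for on a new vnode-enabled node amount to something like (placeholder values; seeds and addresses are per host):
>>> 
>>>     cluster_name: '<same as the existing cluster>'
>>>     num_tokens: 256
>>>     # initial_token left unset when using vnodes
>>> 
>>> followed by starting the node and, after it finishes joining, running "nodetool cleanup" on the pre-existing nodes.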
>>> 
>>> Is there something missing in that documentation?
>>> 
>>> Thanks,
>>> 
>>> John
>> 
>> 
>> 
> 
> 

