Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Thu, 7 Nov 2013 15:19:17 +0000 (UTC)
From: "Quentin Conner (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12671666.1380662395262.35084.1383837557727@arcas>
In-Reply-To: <JIRA.12671666.1380662395262@arcas>
References: <JIRA.12671666.1380662395262@arcas>
Subject: [jira] [Commented] (CASSANDRA-6127) vnodes don't scale to hundreds
 of nodes
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816037#comment-13816037 ] 

Quentin Conner commented on CASSANDRA-6127:
-------------------------------------------

Good morning. We saw the same CPU usage profile with cassandra-1.2 at 8e7d7285cdeac4f2527c933280d595bbddd26935 (which includes the patch that stops flushing the peers CF).

CPU time was spent either looking up EndpointState or computing phi. We found no surprises: no race conditions, no deadlocks, and no mutex/monitor contention.
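For readers unfamiliar with the phi calculation mentioned above: it is the suspicion level of a phi accrual failure detector, recomputed per endpoint from a window of heartbeat inter-arrival times. The following is a minimal sketch only, assuming exponentially distributed inter-arrival times (the approximation Cassandra's failure detector is commonly described as using). The names ArrivalWindow and is_convicted, the window size of 1000, and the threshold of 8.0 (assumed to mirror the default phi_convict_threshold) are illustrative and not taken from Cassandra's source.

```python
import math
from collections import deque

# Converts a natural-log ratio into log10 units, so phi = -log10(P(still alive)).
PHI_FACTOR = 1.0 / math.log(10.0)


class ArrivalWindow:
    """Hypothetical sliding window of heartbeat inter-arrival times for one endpoint."""

    def __init__(self, size=1000):
        self.intervals = deque(maxlen=size)
        self.total = 0.0   # running sum so mean() is O(1)
        self.last = None   # timestamp of the most recent heartbeat

    def add(self, t):
        if self.last is not None:
            # When full, the deque drops its oldest entry; keep the running sum in step.
            if len(self.intervals) == self.intervals.maxlen:
                self.total -= self.intervals[0]
            interval = t - self.last
            self.intervals.append(interval)
            self.total += interval
        self.last = t

    def mean(self):
        return self.total / len(self.intervals)

    def phi(self, now):
        # No history yet: nothing to base suspicion on.
        if not self.intervals:
            return 0.0
        # Exponential model: P(no heartbeat for delta) = exp(-delta / mean),
        # so phi = -log10(exp(-delta / mean)) = (delta / mean) / ln(10).
        delta = now - self.last
        return PHI_FACTOR * delta / self.mean()


def is_convicted(window, now, threshold=8.0):
    """Mark the endpoint down once its suspicion level exceeds the threshold."""
    return window.phi(now) > threshold
```

With heartbeats arriving once per second, phi stays well below 8 for a short gap and crosses it after roughly 8 * ln(10), about 18.4 seconds of silence. Every gossip round repeats this computation for every known endpoint, so its cost grows with cluster size.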

I do not know whether flapping happens on 1.2 head without vnodes. I will try to find out today if I can get the nodes (I'm having trouble allocating instances from EC2 this morning). I will keep trying (Fridays seem better), but this could slip into the weekend.


> vnodes don't scale to hundreds of nodes
> ---------------------------------------
>
>                 Key: CASSANDRA-6127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6127
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Any cluster that has vnodes and consists of hundreds of physical nodes.
>            Reporter: Tupshin Harper
>            Assignee: Jonathan Ellis
>         Attachments: 2013-11-05_18-04-03_no_compression_cpu_time.png, 2013-11-05_18-09-38_compression_on_cpu_time.png, 6000vnodes.patch, AdjustableGossipPeriod.patch, delayEstimatorUntilStatisticallyValid.patch
>
>
> There are many gossip-related issues with very wide clusters that also have vnodes enabled. Let's use this ticket as a master in case there are sub-tickets.
> The most obvious symptom I've seen is with 1000 nodes in EC2 on m1.xlarge instances, each node configured with 32 vnodes.
> Without vnodes, the cluster spins up fine and is ready to handle requests within 30 minutes or less.
> With vnodes, nodes report constant up/down flapping with no external load on the cluster. After a couple of hours they were still flapping and had very high CPU load, and the cluster showed no sign of stabilizing or becoming usable for traffic.


--
This message was sent by Atlassian JIRA
(v6.1#6144)