Date: Tue, 12 Nov 2013 21:26:20 +0000 (UTC)
From: "Quentin Conner (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-6127) vnodes don't scale to hundreds of nodes

     [ https://issues.apache.org/jira/browse/CASSANDRA-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quentin Conner updated CASSANDRA-6127:
--------------------------------------
    Attachment: flaps-vs-tokens.png

Flapping occurs with or without vnodes. Please see the attached plot.
Using vnodes appears to exacerbate it, possibly through longer messages, probably through higher CPU utilization. Either would delay the timestamp used for the Failure Detector's interarrival time.
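
To illustrate why a delayed timestamp matters, here is a minimal sketch of a phi accrual style failure detector. It is not Cassandra's FailureDetector code: the class name, the method names, and the sliding window size are hypothetical, and it assumes heartbeat gaps follow an exponential distribution. The threshold of 8 is assumed here to match the default phi_convict_threshold.

```python
import math
from collections import deque

# Assumed conviction threshold (taken to match the phi_convict_threshold default).
PHI_CONVICT_THRESHOLD = 8.0


class PhiAccrualDetector:
    """Simplified phi accrual failure detector for one peer (illustrative only)."""

    def __init__(self, window=1000):
        # Recent heartbeat interarrival times, in seconds.
        self.intervals = deque(maxlen=window)
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat that was timestamped at time `now`."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level: -log10 of P(gap >= elapsed) under an exponential model."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_arrival
        return elapsed / (mean * math.log(10))

    def is_convicted(self, now):
        """True when the peer would be marked down at time `now`."""
        return self.phi(now) > PHI_CONVICT_THRESHOLD


# Heartbeats arrive once per second for ten seconds.
detector = PhiAccrualDetector()
for t in range(11):
    detector.heartbeat(float(t))

# One second after the last heartbeat the peer looks healthy.
on_time = detector.is_convicted(11.0)
# If processing of the next heartbeat is delayed (long messages, busy CPU),
# the elapsed time keeps growing and phi eventually crosses the threshold.
delayed = detector.is_convicted(30.0)
```

In this model phi grows linearly with the time since the last recorded heartbeat, so anything that postpones recording the arrival timestamp pushes a healthy peer toward conviction, and the peer is marked up again once the late heartbeat is finally processed.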

> vnodes don't scale to hundreds of nodes
> ---------------------------------------
>
>                 Key: CASSANDRA-6127
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6127
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Any cluster that has vnodes and consists of hundreds of physical nodes.
>            Reporter: Tupshin Harper
>            Assignee: Jonathan Ellis
>         Attachments: 2013-11-05_18-04-03_no_compression_cpu_time.png, 2013-11-05_18-09-38_compression_on_cpu_time.png, 6000vnodes.patch, AdjustableGossipPeriod.patch, delayEstimatorUntilStatisticallyValid.patch, flaps-vs-tokens.png
>
>
> There are a lot of gossip-related issues related to very wide clusters that also have vnodes enabled. Let's use this ticket as a master in case there are sub-tickets.
> The most obvious symptom I've seen is with 1000 nodes in EC2 with m1.xlarge instances. Each node configured with 32 vnodes.
> Without vnodes, cluster spins up fine and is ready to handle requests within 30 minutes or less.
> With vnodes, nodes are reporting constant up/down flapping messages with no external load on the cluster. After a couple of hours, they were still flapping, had very high cpu load, and the cluster never looked like it was going to stabilize or be useful for traffic.

--
This message was sent by Atlassian JIRA
(v6.1#6144)