Date: Sun, 28 Apr 2013 14:21:20 -0700
From: John Watson
To: user@cassandra.apache.org
Subject: cassandra-shuffle time to completion and required disk space

The amount of time and disk space cassandra-shuffle requires when upgrading to vnodes should really be made apparent in the documentation (when some is written).

The only semi-noticeable remark about the exorbitant amount of time is a bullet point in:
http://wiki.apache.org/cassandra/VirtualNodes/Balance

"Shuffling will entail moving a lot of data around the cluster and so has the potential to consume a lot of disk and network I/O, and to take a considerable amount of time. For this to be an online operation, the shuffle will need to operate on a lower priority basis to other streaming operations, and should be expected to take days or weeks to complete."

We tried running shuffle on a QA version of our cluster, and two things were brought to light:
 - Even with no reads/writes, it was going to take 20 days
 - Each machine needed enough free disk space to potentially hold the entire cluster's sstables on disk

Regards,

John