From: Rustam Aliyev
Date: Mon, 12 Mar 2012 11:14:24 +0000
To: user@cassandra.apache.org
Subject: Re: Adding node to Cassandra

Hi,

If you use SizeTieredCompactionStrategy, you should have 2x the disk space to
be on the safe side. So if you want to store 2 TB of data, you need a partition of at least 4 TB. LeveledCompactionStrategy is available in 1.x and is supposed to require less free disk space (but comes at the price of extra I/O).
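The sizing rule above can be sketched as a back-of-the-envelope helper (illustrative only, not part of Cassandra; the 2.0 overhead factor is the rule of thumb from this thread, not a hard limit):

```python
# Under SizeTieredCompactionStrategy a major compaction can, in the worst
# case, rewrite all SSTables at once, so the node may briefly need roughly
# as much free space as the data it already stores.
def min_disk_size_tb(data_tb, overhead_factor=2.0):
    """Disk size needed to safely hold `data_tb` of data under STCS."""
    return data_tb * overhead_factor

print(min_disk_size_tb(2))  # 2 TB of data -> 4.0 TB partition
```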

--
Rustam.

On 12/03/2012 09:23, Vanger wrote:
We have a 4-node Cassandra cluster with RF = 3 (nodes named 'A' to 'D', initial tokens:
A (25%): 20543402371996174596346065790779111550,
B (25%): 63454860067234500516210522518260948578,
C (25%): 106715317233367107622067286720208938865,
D (25%): 150141183460469231731687303715884105728),
and we want to add a 5th node ('E') with initial token 164163260474281062972548100673162157075, then rebalance nodes A, D, and E so that they own equal percentages of data. All nodes have ~400 GB of data and around ~300 GB of free disk space.
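For reference, the evenly spaced tokens such a rebalance aims at can be computed from the RandomPartitioner ring size. This is an illustrative sketch; the values it prints are not the tokens used in this cluster, and a real rebalance would also account for the tokens existing nodes already hold:

```python
# RandomPartitioner's token space is [0, 2**127); evenly spaced initial
# tokens give each node an equal share of the ring.
RING = 2 ** 127

def balanced_tokens(node_count):
    """Tokens spaced so each of `node_count` nodes owns 1/node_count of the ring."""
    return [i * RING // node_count for i in range(node_count)]

for token in balanced_tokens(5):
    print(token)
```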
What we did:
1. Joined the new Cassandra instance (node 'E') to the cluster and waited until it loaded the data for its token range.

2. Moved node 'D''s initial token down from 150... to 130...
Here we ran into a problem. When the "move" started, disk usage on node 'C' grew from 400 GB to 750 GB. We saw compactions running on node 'D', but some of them failed with "WARN [CompactionExecutor:580] 2012-03-11 16:57:56,036 CompactionTask.java (line 87) insufficient space to compact all requested files SSTableReader". After that we killed the "move" process to avoid an "out of disk space" error (when only 5 GB of free space was left). After a restart it freed 100 GB, and we now have a total of 105 GB of free disk space on node 'D'. We also noticed disk usage on node 'B' increased by ~150 GB, but it stopped growing before we stopped the "move token".
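A minimal sketch of the kind of pre-move safety check that could catch this situation, assuming the STCS worst case of needing free space roughly equal to the node's current load (the headroom threshold is a hypothetical parameter, not a Cassandra setting):

```python
# Hypothetical pre-flight check before running `nodetool move`: require free
# space at least `headroom_factor` times the node's current load, since
# compactions triggered by the move can temporarily double disk usage.
def safe_to_move(load_gb, free_gb, headroom_factor=1.0):
    """Return True if free space covers the expected compaction overhead."""
    return free_gb >= headroom_factor * load_gb

# Node 'D' before the move: ~400 GB load, ~300 GB free -> not enough headroom.
print(safe_to_move(400, 300))  # False
print(safe_to_move(400, 450))  # True
```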


So now we have 5 nodes in the cluster, in a state like this:
Node   Owns%   Load     Init. token
A      16%     400 GB   020...
B      25%     520 GB   063...
C      25%     400 GB   106...
D      25%     640 GB   150...
E       9%     300 GB   164...

We'll add disk space to all nodes and run some cleanups, but some questions remain:

What is the best next step for us from this point?
What is the correct procedure after all, and what should we expect when adding a node to a Cassandra cluster?
We expected disk usage on node 'D' to decrease, since we shrank its token range, but saw the opposite. Why did that happen, and is it normal behavior?
What if we had 2 TB of data on a 2.5 TB disk and wanted to add another node and move tokens?
Is it possible to automate adding a node to the cluster and be sure we won't run out of space?

Thanks.