From: Rodrigo Felix
Date: Wed, 10 Jul 2013 15:23:27 -0300
Subject: Re: General doubts about bootstrap
To: user@cassandra.apache.org

Currently, I'm using Cassandra 1.1.5, but I'm considering upgrading to 1.2.x in order to make use of vnodes.
Doubling the size is not possible for me because I want to measure the response while adding (or removing) single nodes.
Thank you, guys. It helped me a lot to better understand how Cassandra works.

Att.

Rodrigo Felix de Almeida
LSBD - Universidade Federal do Ceará
Project Manager
MBA, CSM, CSPO, SCJP


On Wed, Jul 10, 2013 at 11:11 AM, Eric Stevens <mightye@gmail.com> wrote:
> => Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this imply having one node overloaded (with a bigger range: 1/2 of the ring, while the other 2 nodes have 1/4 each, supposing 3 nodes)? I'm referring to http://wiki.apache.org/cassandra/Operations#Load_balancing
    Yes, if you're using a single vnode per server, or are running an older version of Cassandra. For lowest impact, doubling the size of your cluster is recommended so that you can avoid doing moves. Or, if you're on Cassandra 1.2+, you can use vnodes, and you should not typically need to rebalance after bringing a new server online.
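
The imbalance discussed above can be checked with a little arithmetic. The following is a minimal sketch, not Cassandra code: it assumes the RandomPartitioner token space of 0 to 2**127, one token per node, and the convention that a node owns the range from the previous token (exclusive) up to its own token (inclusive). The function names `balanced_tokens` and `ownership` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

RING_SIZE = 2**127  # assumed RandomPartitioner token space: [0, 2**127)


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes (one token per node)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns.

    A node owns (previous_token, own_token], wrapping around the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for index, token in enumerate(ordered):
        previous = ordered[index - 1]  # index -1 wraps to the last token
        # A width of 0 only happens with a single node, which owns everything.
        width = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = Fraction(width, RING_SIZE)
    return shares


# Two seeds, as in the thread: tokens 0 and 85070591730234615865843651857942052864.
two_nodes = balanced_tokens(2)

# Add one node halfway between the two seeds, without moving anything.
three_nodes = two_nodes + [2**125]
for token, share in sorted(ownership(three_nodes).items()):
    print(token, share)  # shares are 1/2, 1/4 and 1/4

# Doubling instead keeps the old tokens in place and stays balanced.
four_nodes = balanced_tokens(4)
print(set(ownership(four_nodes).values()))  # every node owns 1/4
```

With a single added node the existing seed at token 0 keeps half of the ring, which is the overload the question describes. Doubling from two to four nodes reuses tokens 0 and 2**126 unchanged, which is why no move is needed in that case.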

    On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <rodrigofelixdealmeida@gmail.com> wrote:
    Thank you very much for your response. My comments on your email follow.

    Att.

    Rodrigo Felix de Almeida
    LSBD - Universidade Federal do Ceará
    Project Manager
    MBA, CSM, CSPO, SCJP


    On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli <rcoli@eventbrite.com> wrote:
    On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <rodrigofelixdealmeida@gmail.com> wrote:
    • Is it normal to take about 9 minutes to add a new node? The log generated by a script to add a new node follows.
    Sure. => OK
    • Is there a way to reduce the time to start Cassandra?
    Not usually. => OK
    • Sometimes the cleanup operation takes many minutes (about 10). Is this normal, given that the amount of data is small (at most 1.7 GB per seed)?
    Compaction is throttled, and cleanup is a type of compaction. Bootstrap is also throttled via the streaming throttle. => OK
    • Considering that I have two seeds in the beginning, their tokens are 0 and 85070591730234615865843651857942052864. When I add a new machine, do I need to execute move and cleanup on both seeds? Currently, I'm running cleanup on seed 0, move + cleanup on the other seed, and neither move nor cleanup on the newly added node. Is this OK?
    Only nodes which have "lost" ranges need to run cleanup. In general you should add new nodes "between" other nodes such that "move" is not required at all.

    => Adding a new node between other nodes would avoid running move, but the ring would be unbalanced, right? Would this imply having one node overloaded (with a bigger range: 1/2 of the ring, while the other 2 nodes have 1/4 each, supposing 3 nodes)? I'm referring to http://wiki.apache.org/cassandra/Operations#Load_balancing
    • What if I do not run cleanup on any existing node when adding or removing a node? Is the data that was not "cleaned up" still served if I send a scan, for instance, whose range is still on the node but would not be there if I had run cleanup? Or would the data be gathered from another node, i.e. the one that properly owns the range specified in the scan query?
    If data for range [x] is on node [a] but node [a] is no longer considered an endpoint for range [x], it will never receive a request to serve range [x]. => OK
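
The point about endpoints can be illustrated with a simplified ring lookup. This is only a sketch under stated assumptions: replication factor 1, one token per node, and a hypothetical helper `primary_endpoint` with made-up node names. Real Cassandra routing also depends on the replication strategy and snitch.

```python
from bisect import bisect_left


def primary_endpoint(ring, key_token):
    """Return the node whose range contains key_token.

    ring maps each node's token to a node name. The owner is the node with
    the first token >= key_token, wrapping to the lowest token past the end.
    """
    tokens = sorted(ring)
    index = bisect_left(tokens, key_token) % len(tokens)
    return ring[tokens[index]]


# Hypothetical node names; the tokens are the two seeds from the thread.
ring_before = {0: "seed-a", 2**126: "seed-b"}
ring_after = dict(ring_before)
ring_after[2**125] = "new-node"  # a node added between the two seeds

key_token = 2**124
print(primary_endpoint(ring_before, key_token))  # seed-b served this token
print(primary_endpoint(ring_after, key_token))   # now new-node serves it
```

In this model seed-b may still hold the old rows on disk until cleanup runs, but the lookup never selects it for that token again, so the leftover data is not read.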
    • After decommissioning a node, is it advisable to run cleanup on the remaining nodes? Are the consequences of not running it the same as when adding a node?
    Cleanup is only for the node which lost a range. In the decommission case, no live nodes lost a range; only some nodes gained one. => OK
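
The rule "only nodes that lost a range need cleanup" can be sketched by comparing each node's range before and after a topology change. This is an illustration, not Cassandra's own logic: it assumes replication factor 1, one token per node, and no moves, and the helpers `owned_ranges`, `range_width` and `nodes_needing_cleanup` are hypothetical names.

```python
RING_SIZE = 2**127  # assumed RandomPartitioner token space


def owned_ranges(tokens):
    """Map each token to the (start, end] range its node owns."""
    ordered = sorted(tokens)
    return {token: (ordered[i - 1], token) for i, token in enumerate(ordered)}


def range_width(start, end):
    """Width of (start, end] on the ring; a lone node owns the whole ring."""
    return (end - start) % RING_SIZE or RING_SIZE


def nodes_needing_cleanup(tokens_before, tokens_after):
    """Tokens of surviving nodes whose owned range shrank."""
    old = owned_ranges(tokens_before)
    new = owned_ranges(tokens_after)
    return sorted(
        token
        for token in new
        if token in old and range_width(*new[token]) < range_width(*old[token])
    )


two_seeds = [0, 2**126]
with_new_node = [0, 2**125, 2**126]

# Bootstrap: the node at 2**126 gives up (0, 2**125] to the new node.
print(nodes_needing_cleanup(two_seeds, with_new_node))

# Decommission: the survivors only gain ranges, so the list is empty.
print(nodes_needing_cleanup(with_new_node, two_seeds))
```

Under these assumptions, adding a node between the seeds marks only the seed at 2**126 for cleanup, and removing that node again marks nobody, which matches the answer above.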

    =Rob


