From: Akshit Jain
Date: Wed, 13 Sep 2017 12:54:30 +0530
Subject: Re: Rebalance a cassandra cluster
To: Hannu Kröger
Cc: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Content-Type: multipart/alternative; boundary="f403045e9e102a551505590d0e49"
archived-at: Wed, 13 Sep 2017 07:24:59 -0000

--f403045e9e102a551505590d0e49
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Suppose I have a Cassandra cluster whose data is skewed such that one node
has 40% more data than the other nodes. When the cluster was created, the
tokens were distributed uniformly.

To make the data uniform, I have to recalculate the tokens and assign them
to the nodes in the cluster, then run repair and cleanup.

The question is: how do I recalculate the tokens and assign them to nodes,
keeping cost, distance between nodes, and data movement in mind?

Regards
Akshit Jain
B-Tech,2013124
9891724697

On Wed, Sep 13, 2017 at 11:54 AM, Hannu Kröger wrote:

> Hi,
>
> you should make sure that the token range is evenly distributed if you
> have a single token configured per node. You can use e.g. this tool to
> calculate tokens:
> https://www.geroba.com/cassandra/cassandra-token-calculator/
>
> Also, make sure that none of the partitions in your data model are
> hotspots that contain a lot more data than average. Also check
> materialized views if you use them.
>
> Also, due to the way compactions work, it’s normal that the disk usage
> goes up and down. Since nodes often do that in different rhythms, you
> will always see some node(s) using more disk space than others at some
> point in time, especially if you do updates and deletes and not just
> inserts.
>
> Cheers,
> Hannu
>
> On 13 September 2017 at 07:47:09, Akshit Jain (akshit13124@iiitd.ac.in)
> wrote:
>
> Hi,
> Can a Cassandra cluster be unbalanced in terms of data?
> If yes, how can a Cassandra cluster be rebalanced?

--f403045e9e102a551505590d0e49
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--f403045e9e102a551505590d0e49--
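
The token arithmetic discussed in this thread can be sketched roughly as follows. This is a minimal illustration, assuming the cluster uses Murmur3Partitioner (token range -2^63 to 2^63-1) and a single token per node; the function names `even_tokens` and `ownership` are hypothetical helpers, not part of Cassandra. The first computes evenly spaced tokens, the second estimates what fraction of the ring each existing token owns.

```python
# Assumption: Murmur3Partitioner, whose tokens span -2**63 .. 2**63 - 1.
RING_SIZE = 2**64
MIN_TOKEN = -(2**63)


def even_tokens(node_count):
    """Return node_count evenly spaced tokens, one per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [MIN_TOKEN + i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Map each token to the fraction of the ring it owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around the ring.
    """
    ordered = sorted(tokens)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tokens must be distinct")
    shares = {}
    for idx, tok in enumerate(ordered):
        prev = ordered[idx - 1]  # idx 0 wraps around to the last token
        # A span of 0 only happens with a single token, which owns everything.
        span = (tok - prev) % RING_SIZE or RING_SIZE
        shares[tok] = span / RING_SIZE
    return shares


if __name__ == "__main__":
    for token in even_tokens(4):
        print(token)
    print(ownership(even_tokens(4)))
```

Comparing `ownership` of the tokens currently assigned with 1/N gives a first hint of whether the skew comes from token placement at all. If the tokens are already even, the hotspot partitions and compaction timing mentioned in the reply are the more likely causes. On a single-token cluster, a node is usually moved to a new token with `nodetool move`, followed by `nodetool cleanup` on the affected nodes.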