From: Jeff Jirsa <jeff.jirsa@crowdstrike.com>
To: user@cassandra.apache.org
Subject: Re: Nodetool rebuild question
Date: Wed, 5 Oct 2016 21:20:12 +0000
List-Id: user@cassandra.apache.org
If you set RF to 0, you can ignore my second sentence/paragraph. The third still applies.

From: Anubhav Kale <Anubhav.Kale@microsoft.com>
Reply-To: user@cassandra.apache.org
Date: Wednesday, October 5, 2016 at 1:56 PM
To: user@cassandra.apache.org
Subject: RE: Nodetool rebuild question

Thanks.

We always set RF to 0 and then "removenode" all nodes in the DC that we want to decom. So, I highly doubt that is the problem. Plus, #SSTables on a given node on average is ~2000 (we have 140 nodes in one ring and two rings overall).

From: Jeff Jirsa [mailto:jeff.jirsa@crowdstrike.com]
Sent: Wednesday, October 5, 2016 1:44 PM
To: user@cassandra.apache.org
Subject: Re: Nodetool rebuild question

Both of your statements are true.

During your decom, you likely streamed LOTs of sstables to the remaining nodes (especially true if you didn't drop the replication factor to 0 for the DC you decommissioned). Since those tens of thousands of sstables take a while to compact, if you then rebuild (or bootstrap) before compaction is done, you'll get a LOT of extra sstables.
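For concreteness, the "drop the DC's replication, then remove its nodes" sequence described above looks roughly like this; the keyspace, DC, and host names are placeholders, not taken from the thread:

```shell
# Sketch only, against a hypothetical cluster. For each non-system
# keyspace, stop replicating to the DC being retired by dropping that
# DC from the replication map (equivalent to RF 0 for it):
cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc_keep': 3};"

# Then retire the nodes: run on each live node in the old DC...
nodetool decommission

# ...or, for a node that is already down, from any live node:
nodetool removenode <host-id-from-nodetool-status>
```

With the DC's replication gone first, decommissioning its nodes has no replicas left to stream, which avoids the SSTable flood described above.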
This is one of the reasons that people with large clusters don't use vnodes – if you needed to bootstrap ~100 more nodes into a cluster, you'd have to wait potentially a day or more per node to compact away the leftovers before bootstrapping the next, which is prohibitive at scale.

- Jeff

From: Anubhav Kale <Anubhav.Kale@microsoft.com>
Reply-To: user@cassandra.apache.org
Date: Wednesday, October 5, 2016 at 1:34 PM
To: user@cassandra.apache.org
Subject: Nodetool rebuild question

Hello,

As part of a rebuild, I noticed that the destination node gets -tmp- files from other nodes. Are the following statements correct?

1. The files are written to disk without going through memtables.
2. Regular compactors eventually compact them to bring the number of SSTables down to a reasonable level.

We have noticed that the destination node created > 40K *Data* files in the first hour of streaming alone. We have not seen such a pattern before, so we are trying to understand what could have changed. (We do use vnodes, and we haven't increased the number of nodes recently, but we have decommissioned a DC.)

Thanks much!

____________________________________________________________________
CONFIDENTIALITY NOTE: This e-mail and any attachments are confidential and may be legally privileged. If you are not the intended recipient, do not disclose, copy, distribute, or use this email or any attachments. If you have received this in error please let the sender know and then delete the email and all attachments.
____________________________________________________________________
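To check whether streaming has outrun compaction on a rebuilding node, as suspected in this thread, the pending-compaction backlog and on-disk SSTable counts can be inspected directly. The table name and data path below are placeholders:

```shell
# Pending compactions on this node; a large and growing number means
# compaction is falling behind the incoming stream:
nodetool compactionstats

# Per-table SSTable count ("cfstats" on Cassandra 2.x; renamed
# "tablestats" in 3.x):
nodetool cfstats my_ks.my_table | grep -i 'SSTable count'

# Or count Data components on disk (path is an assumption; adjust to
# your data_file_directories setting):
ls /var/lib/cassandra/data/my_ks/my_table-*/*-Data.db | wc -l
```

Comparing the on-disk count over time against the pending-compaction figure shows whether the ~40K files are a transient streaming backlog or a steady-state problem.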