From: Jack Krupansky
To: user@cassandra.apache.org
Date: Thu, 14 Apr 2016 10:36:42 -0400
Subject: Re: Cassandra 2.1.12 Node size

The four criteria I would suggest for evaluating node size:

1. Query latency.
2. Query throughput/load.
3. Repair time - worst case, a full repair, which you can least afford if it happens at the worst time.
4. Expected growth over the next six to 18 months - you don't want to be scrambling with latency, throughput, and repair problems when you bump into a wall on capacity. 20% to 30% is a fair number.

Alas, it is very difficult to determine how much spare capacity you have, other than with an artificial, synthetic load test: try 30% more clients and queries with 30% more (synthetic) data and see what happens to query latency, total throughput, and repair time. Run such a test periodically (monthly) to get a heads-up when load is getting closer to a wall.
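If it helps, the stress tool that ships with Cassandra can drive that kind of test. A minimal sketch - host addresses, row counts, and thread counts below are placeholders you would scale roughly 30% past production levels:

    # Load a synthetic dataset ~30% larger than today's per-node data
    # (n, threads, and node addresses are placeholders):
    cassandra-stress write n=100000000 -rate threads=260 -node 10.0.0.1,10.0.0.2

    # Replay a read-heavy mix and watch latency and op rate in the output:
    cassandra-stress mixed ratio\(write=1,read=3\) n=100000000 -rate threads=260 -node 10.0.0.1,10.0.0.2

Run it against a test ring if at all possible; against production it will obviously add real load on top of real traffic.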

Incremental repair is great to streamline and optimize your day-to-day operations, but focus attention on replacement of down nodes during times of stress.

-- Jack Krupansky

On Thu, Apr 14, 2016 at 10:14 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Would adding nodes be the right way to start if I want to get the data
> per node down?

Yes, if everything else is fine, the last and always available option to reduce the disk size per node is to add new nodes. Sometimes it is the first option considered, as it is relatively quick and quite straightforward.

Again, 50% of free disk space is not a hard limit. To give you a rough idea, if the biggest sstable is 100 GB and you still have 400 GB free, you will probably be good to go - except if four 100 GB compactions trigger at the same time, filling up the disk.
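A rough way to check where a node stands, assuming the default data directory (adjust the path to your installation):

    # Free space on the data volume:
    df -h /var/lib/cassandra/data

    # Biggest sstables first, to estimate what a worst-case compaction may need:
    find /var/lib/cassandra/data -name '*-Data.db' -printf '%s\t%p\n' \
        | sort -rn | head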

Now is a good time to think of a plan to handle the growth, but don't worry if data reaches 60%; it will probably not be a big deal.

You can make sure that:

- There are no snapshots, heap dumps, or data unrelated to C* taking up space (a quick way to check this and the next point is sketched after this list).
- The tombstone ratios of your biggest sstables are not too high (are tombstones being evicted correctly?).
- You are using compression (if you want to).
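For the first two points, something along these lines should do; paths and names below are examples only:

    # List snapshots and the space they hold:
    nodetool listsnapshots

    # Without arguments this removes ALL snapshots on the node:
    nodetool clearsnapshot

    # Estimated droppable tombstones for a given sstable (2.1 'ka' format):
    sstablemetadata /var/lib/cassandra/data/my_ks/my_table-*/my_ks-my_table-ka-1234-Data.db | grep -i tombstone

Compression settings are visible with DESCRIBE TABLE in cqlsh.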

Consider:

- Adding TTLs to data you don't want to keep forever, shortening TTLs as much as allowed (a sketch follows this list).
- Migrating to C* 3.0+ to take advantage of the new storage engine.
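A sketch of the first point - the keyspace and table names are made up, and a default TTL only applies to data written after the change:

    # 30-day default TTL on newly written rows (hypothetical table):
    cqlsh -e "ALTER TABLE my_ks.events WITH default_time_to_live = 2592000;"

Existing rows keep whatever TTL they were written with, so the effect on disk space is gradual.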

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-14 15:41 GMT+02:00 Aiman Parvaiz <aiman@flipagram.com>:

Thanks for the response Alain. I am using STCS and would like to take some action, as we will be hitting 50% disk space pretty soon. Would adding nodes be the right way to start if I want to get the data per node down? Otherwise, can you or someone on the list please suggest the right way to go about it?

Thanks

Sent from my iPhone

On Apr 14, 2016, at 5:17 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

Hi,

> I seek advice on data size per node. Each of my nodes has close to 1 TB
> of data. I am not seeing any issues as of now, but wanted to run it by
> you guys whether this data size is pushing the limits in any manner and
> whether I should be working on reducing the data size per node.

There is no real limit to the data size other than 50% of the machine's disk space using STCS and 80% if you are using LCS. Those are 'soft' limits, as it mainly depends on your biggest sstable sizes and the number of concurrent compactions, but to stay away from trouble it is better to keep things under control, below the limits mentioned above.

> I will be migrating to incremental repairs shortly, and a full repair as
> of now takes 20 hr/node. I am not seeing any issues with the nodes for
> now.

As you noticed, you need to keep in mind that the larger the dataset is, the longer operations will take: repairs, but also bootstrapping a node, replacing a node, removing a node - any operation that requires streaming or reading data. Repair time can indeed be mitigated by using incremental repairs.

> I am running a 9 node C* 2.1.12 cluster.

It should be quite safe to give incremental repair a try, as many bugs have been fixed in this version:

FIX 2.1.12 - A lot of sstables created when using range repairs, due to anticompaction - incremental only
https://issues.apache.org/jira/browse/CASSANDRA-10422

FIX 2.1.12 - repair hangs when a replica is down - incremental only
https://issues.apache.org/jira/browse/CASSANDRA-10288
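The invocation itself is simple on 2.1 - a sketch with a placeholder keyspace (check 'nodetool help repair' on your exact version; note there is also a documented migration procedure for clusters with existing data):

    # Parallel incremental repair of one keyspace on this node:
    nodetool repair -par -inc my_keyspace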

If you are using DTCS, be aware of https://issues.apache.org/jira/browse/CASSANDRA-11113

If using LCS, watch sstable counts and pending compactions closely.
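For example (the table name is a placeholder):

    # Pending compactions and anything currently compacting:
    nodetool compactionstats

    # Per-table sstable counts; with LCS also check 'SSTables in each level':
    nodetool cfstats my_ks.my_table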

As a general comment, I would say that Cassandra has evolved to be able to handle huge datasets (off-heap memory structures, larger heaps using G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big datasets. I have seen clusters with 4+ TB nodes and others using a few GB per node. It all depends on your requirements and your machine specs. If fast operations are absolutely necessary, keep it small. If you want to use the entire disk space (50/80% of total disk space max), go ahead, as long as other resources are fine (CPU, memory, disk throughput, ...).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



