From: Alain RODRIGUEZ
Date: Fri, 20 May 2016 17:15:51 +0200
Subject: Re: on-disk size vs partition-size in cfhistograms
To: user@cassandra.apache.org

Hi Joseph,

> The approach I took was to insert an increasing number of rows into a
> replica of the table to be sized, watch the size of the "data" directory
> (after doing nodetool flush and compact), and calculate the average size
> per row (total directory size / count of rows). Can this be considered a
> valid approach to extrapolate for future growth of data?

You also need to consider the replication factor you are going to use and
the percentage of the data owned by the node you are looking at. With a
replication factor of 3, for example, every row is stored on three nodes,
so cluster-wide disk use is roughly triple the logical data size.

Also, when you run "nodetool compact" you get the minimal possible size,
while in real conditions you will probably never be in this state. If you
update the same row again and again, fragments of the row will be spread
across multiple SSTables, with more overhead. Plus, if you plan to TTL or
delete data, you will always have some tombstones in there too, possibly
for a long time, depending on how you tune Cassandra and on your use case.

So I would say this approach is not very accurate. My guess is you will
end up using more space than you think. But it is also harder to do
capacity planning from nothing than from a working system.
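That said, the measurement loop itself is easy to script if you want to
keep refining it. A minimal sketch, assuming a hypothetical test keyspace
and table named ks.abc, the default data directory, and a load_rows
stand-in for whatever you use to insert the test rows:

  #!/bin/sh
  # Rough per-row sizing: load rows, flush and major-compact, then
  # divide the on-disk size of the keyspace data directory by the
  # row count. ks/abc and the path below are placeholders.
  KS=ks
  CF=abc
  DATA="/var/lib/cassandra/data/$KS"

  for N in 100000 1000000 10000000; do
      ./load_rows "$KS" "$CF" "$N"       # hypothetical loader script
      nodetool flush "$KS" "$CF"
      nodetool compact "$KS" "$CF"
      BYTES=$(du -sb "$DATA" | cut -f1)  # GNU du: total size in bytes
      echo "$N rows: $BYTES bytes, $((BYTES / N)) bytes/row"
  done

Keep in mind that the numbers it prints are still the post-compaction
floor, for the reasons above.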
> It seems the size in cfhistograms has a wide variation with the value
> calculated using the approach detailed above (avg 2 KB/row). Could this
> difference be due to compression, or are there any other factors at play
> here?

It could be compression indeed. To check that, you need to dig into the
code. What Cassandra version are you planning to use? By the way, if disk
space matters to you, as it seems to, you might want to use Cassandra 3.0+:
http://www.datastax.com/2015/12/storage-engine-30,
http://www.planetcassandra.org/blog/this-week-in-cassandra-3-0-storage-engine-deep-dive-3112016/,
http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

> What would be the typical use/interpretation of the "partition size"
> metric?

I guess people mainly use it to spot wide rows, but if you are happy
summing those values, that should work as long as you know what you are
summing. Each Cassandra operator has their own tips and their own way of
using the available tools, and may perform operations differently
depending on their needs and experience :-). So if it looks relevant to
you, go ahead. For example, if you find out that this is the data before
compression, then just applying the compression ratio to your sum should
be good. Still, take care of my first point above.
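If it does turn out that the histogram reports pre-compression sizes,
that sum is easy to automate too. A rough sketch, assuming you paste the
"Partition Size" section of the cfhistograms output into a file, with
0.33 standing in for the "Compression ratio" value that nodetool cfstats
reports for the table:

  # Each cfhistograms line ("924 bytes: 328858") gives a bucket's
  # upper bound and the number of partitions in it, so the total is
  # a slight overestimate of the uncompressed size.
  RATIO=0.33   # placeholder: use the value from nodetool cfstats
  awk -v ratio="$RATIO" '/bytes:/ { total += $1 * $3 }
      END { printf "uncompressed ~ %.0f bytes, on disk ~ %.0f bytes\n",
                   total, total * ratio }' partition_sizes.txt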
C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
2016-05-06 13:27 GMT+02:00 Joseph Tech <jaalex.tech@gmail.com>:

> Hi,
>
> I am trying to get some baselines for capacity planning. The approach I
> took was to insert an increasing number of rows into a replica of the
> table to be sized, watch the size of the "data" directory (after doing
> nodetool flush and compact), and calculate the average size per row
> (total directory size / count of rows). Can this be considered a valid
> approach to extrapolate for future growth of data?
>
> Related to this, is there any information we can gather from the
> partition-size section of cfhistograms (snipped output for my table
> below):
>
> Partition Size (bytes)
>      642 bytes: 221
>      770 bytes: 2328
>      924 bytes: 328858
> ..
>     8239 bytes: 153178
> ...
>    24601 bytes: 16973
>    29521 bytes: 10805
> ...
>   219342 bytes: 23
>   263210 bytes: 6
>   315852 bytes: 4
>
> It seems the size in cfhistograms has a wide variation with the value
> calculated using the approach above (avg 2 KB/row). Could this
> difference be due to compression, or are there any other factors at
> play here? What would be the typical use/interpretation of the
> "partition size" metric?
>
> The table definition is like:
>
> CREATE TABLE abc (
>   key1 text,
>   col1 text,
>   PRIMARY KEY ((key1))
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.100000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.000000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'sstable_size_in_mb': '50', 'class': 'LeveledCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> Thanks,
> Joseph