From: Alain RODRIGUEZ
Date: Fri, 20 May 2016 17:15:51 +0200
Subject: Re: on-disk size vs partition-size in cfhistograms
To: user@cassandra.apache.org

Hi Joseph,

> The approach I took was to insert an increasing number of rows into a
> replica of the table to be sized, watch the size of the "data" directory
> (after doing nodetool flush and compact), and calculate the average size
> per row (total directory size / count of rows). Can this be considered a
> valid approach to extrapolate for future growth of data?

You also need to consider the replication factor you are going to use and
the percentage of the data owned by the node you are looking at. With a
replication factor of 3, for example, every row is stored on three nodes,
so cluster-wide disk use is roughly triple the logical data size.

Also, when you run "nodetool compact" you get the minimal possible size,
while in real conditions you will probably never be in this state. If you
update the same row again and again, fragments of the row will be spread
across multiple SSTables, with more overhead. Plus, if you plan to TTL or
delete data, you will always have some tombstones in there too, possibly
for a long time, depending on how you tune Cassandra and on your use case.

So I would say this approach is not very accurate. My guess is you will
end up using more space than you think. But it is also harder to do
capacity planning from nothing than from a working system.
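That said, the measurement loop itself is easy to script if you want to
keep refining it. A minimal sketch, assuming a hypothetical test keyspace
and table named ks.abc, the default data directory, and a load_rows
stand-in for whatever you use to insert the test rows:

  #!/bin/sh
  # Rough per-row sizing: load rows, flush and major-compact, then
  # divide the on-disk size of the keyspace data directory by the
  # row count. ks/abc and the path below are placeholders.
  KS=ks
  CF=abc
  DATA="/var/lib/cassandra/data/$KS"

  for N in 100000 1000000 10000000; do
      ./load_rows "$KS" "$CF" "$N"       # hypothetical loader script
      nodetool flush "$KS" "$CF"
      nodetool compact "$KS" "$CF"
      BYTES=$(du -sb "$DATA" | cut -f1)  # GNU du: total size in bytes
      echo "$N rows: $BYTES bytes, $((BYTES / N)) bytes/row"
  done

Keep in mind that the numbers it prints are still the post-compaction
floor, for the reasons above.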
> It seems the size in cfhistograms has a wide variation with the value
> calculated using the approach detailed above (avg 2 KB/row). Could this
> difference be due to compression, or are there any other factors at play
> here?

It could be compression indeed. To check that, you need to dig into the
code. What Cassandra version are you planning to use? By the way, if disk
space matters to you, as it seems to, you might want to use Cassandra 3.0+:
http://www.datastax.com/2015/12/storage-engine-30,
http://www.planetcassandra.org/blog/this-week-in-cassandra-3-0-storage-engine-deep-dive-3112016/,
http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

> What would be the typical use/interpretation of the "partition size"
> metric?

I guess people mainly use it to spot wide rows, but if you are happy
summing those values, that should work as long as you know what you are
summing. Each Cassandra operator has their own tips and their own way of
using the available tools, and may perform operations differently
depending on their needs and experience :-). So if it looks relevant to
you, go ahead. For example, if you find out that this is the data before
compression, then just applying the compression ratio to your sum should
be good. Still, take care of my first point above.
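If it does turn out that the histogram reports pre-compression sizes,
that sum is easy to automate too. A rough sketch, assuming you paste the
"Partition Size" section of the cfhistograms output into a file, with
0.33 standing in for the "Compression ratio" value that nodetool cfstats
reports for the table:

  # Each cfhistograms line ("924 bytes: 328858") gives a bucket's
  # upper bound and the number of partitions in it, so the total is
  # a slight overestimate of the uncompressed size.
  RATIO=0.33   # placeholder: use the value from nodetool cfstats
  awk -v ratio="$RATIO" '/bytes:/ { total += $1 * $3 }
      END { printf "uncompressed ~ %.0f bytes, on disk ~ %.0f bytes\n",
                   total, total * ratio }' partition_sizes.txt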
C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
2016-05-06 13:27 GMT+02:00 Joseph Tech <jaalex.tech@gmail.com>:

> Hi,
>
> I am trying to get some baselines for capacity planning. The approach I
> took was to insert an increasing number of rows into a replica of the
> table to be sized, watch the size of the "data" directory (after doing
> nodetool flush and compact), and calculate the average size per row
> (total directory size / count of rows). Can this be considered a valid
> approach to extrapolate for future growth of data?
>
> Related to this, is there any information we can gather from the
> partition-size section of cfhistograms (snipped output for my table
> below):
>
> Partition Size (bytes)
>      642 bytes: 221
>      770 bytes: 2328
>      924 bytes: 328858
> ..
>     8239 bytes: 153178
> ...
>    24601 bytes: 16973
>    29521 bytes: 10805
> ...
>   219342 bytes: 23
>   263210 bytes: 6
>   315852 bytes: 4
>
> It seems the size in cfhistograms has a wide variation with the value
> calculated using the approach above (avg 2 KB/row). Could this
> difference be due to compression, or are there any other factors at
> play here? What would be the typical use/interpretation of the
> "partition size" metric?
>
> The table definition is like:
>
> CREATE TABLE abc (
>   key1 text,
>   col1 text,
>   PRIMARY KEY ((key1))
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.100000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.000000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'sstable_size_in_mb': '50', 'class': 'LeveledCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> Thanks,
> Joseph