From: Jens Rantil <jens.rantil@tink.se>
Date: Wed, 2 Nov 2016 10:43:30 -1000
Subject: Re: Cassandra Poor Read Performance Response Time
To: Cassandra Group <user@cassandra.apache.org>

Hi,

I am by no means an expert on Cassandra, nor on DateTieredCompactionStrategy. However, looking in "Query 2.xlsx" I see a lot of

    Partition index with 0 entries found for sstable 186

To me, that looks like Cassandra is inspecting a lot of sstables and realizing only late in the read path that they don't contain any relevant data. Are you using TTLs when you write data? Do the TTLs vary? If they do, there is a risk that Cassandra has to inspect many sstables that turn out to hold only expired data.

Also, have you checked `nodetool cfstats` and its bloom filter false positive counts? Does `nodetool cfhistograms` give you any insights? I'm mostly thinking in terms of unbalanced partition keys.

Have you checked the logs for how long the GC pauses are?

Somewhat implementation specific: would adjusting the time bucket to a smaller time resolution be an option?

Also, since you are using DateTieredCompactionStrategy, have you considered using a TIMESTAMP constraint [1]? That might actually help you a lot.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5514
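To make the checks above concrete, here is a rough sketch of the commands I have in mind (the log path assumes a standard package install; adjust it, and the keyspace/table names, to your setup):

    # Bloom filter false positives and per-table sstable counts:
    nodetool cfstats tracker.all_ad_impressions_counter_1d

    # Partition size and cell count distribution; unbalanced partitions show up here:
    nodetool cfhistograms tracker all_ad_impressions_counter_1d

    # GC pauses, as logged by GCInspector:
    grep GCInspector /var/log/cassandra/system.log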
Cheers,
Jens

On Mon, Oct 31, 2016 at 11:10 PM, _ _ <rage39a@hotmail.com> wrote:

> Hi
>
> Currently I am running a Cassandra cluster of 3 nodes (with data replicated to both other nodes) and am experiencing poor read performance, usually getting multi-second response times on queries where I am expecting/needing millisecond response times. Currently I have a table which looks like:
>
>     CREATE TABLE tracker.all_ad_impressions_counter_1d (
>         time_bucket bigint,
>         ad_id text,
>         uc text,
>         count counter,
>         PRIMARY KEY ((time_bucket, ad_id), uc)
>     ) WITH CLUSTERING ORDER BY (uc ASC)
>         AND bloom_filter_fp_chance = 0.01
>         AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>         AND comment = ''
>         AND compaction = {'base_time_seconds': '3600',
>             'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy',
>             'max_sstable_age_days': '30', 'max_threshold': '32',
>             'min_threshold': '4', 'timestamp_resolution': 'MILLISECONDS'}
>         AND compression = {'chunk_length_in_kb': '64',
>             'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>         AND crc_check_chance = 1.0
>         AND dclocal_read_repair_chance = 0.1
>         AND default_time_to_live = 0
>         AND gc_grace_seconds = 864000
>         AND max_index_interval = 2048
>         AND memtable_flush_period_in_ms = 0
>         AND min_index_interval = 128
>         AND read_repair_chance = 0.0
>         AND speculative_retry = '99PERCENTILE';
>
> and queries which look like:
>
>     SELECT
>         time_bucket,
>         uc,
>         count
>     FROM
>         all_ad_impressions_counter_1d
>     WHERE ad_id = ?
>         AND time_bucket = ?
>
> The cluster is running on servers with 16 GB RAM, 4 CPU cores and 3 x 100 GB datastores; the storage is not local, and the VMs are managed through OpenStack. There are roughly 200 million records written per day (one time_bucket) and at most a few thousand records per partition (time_bucket, ad_id). The volume of writes is not having a significant effect on our read performance: when writes are stopped, the read response time does not improve noticeably. I have attached a trace of one query I ran which took around 3 seconds when I would expect it to take well below a second. I have also included the cassandra.yaml file and the JVM options file. We do intend to change to local storage and expect this to have a significant impact, but I was wondering if there is anything else that could be changed which would also significantly improve read performance?
>
> Thanks
> Ian

--
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se