From: Eric Stevens
Date: Mon, 23 Mar 2015 12:53:05 -0600
Subject: Re: Really high read latency
To: user@cassandra.apache.org

Enable tracing in cqlsh and see how many sstables are being lifted to
satisfy the query (are you repeatedly writing to the same partition
[row_time] over time?). Also watch for whether you're hitting a lot of
tombstones (are you deleting lots of values in the same partition over
time?).
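For example, something like this in cqlsh (a rough sketch; the row_time
and attrs values are taken from your example query below, and the LIMIT
is only there to keep the trace short):

    TRACING ON;
    SELECT * FROM "default".metrics
    WHERE row_time = 5 AND attrs = 'potatoes_and_jam'
    LIMIT 100;
    -- The trace printed after the query shows how many sstables were
    -- read and how many tombstone cells were scanned for this partition.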
On Mon, Mar 23, 2015 at 4:01 AM, Dave Galbraith
<david92galbraith@gmail.com> wrote:

> Duncan: I'm thinking it might be something like that. I'm also seeing
> just a ton of garbage collection on the box. Could it be pulling rows
> for all 100k attrs for a given row_time into memory, since only
> row_time is the partition key?
>
> Jens: I'm not using EBS (although I used to, until I read up on how
> useless it is). I'm not sure what constitutes proper paging, but my
> client has a pretty small amount of available memory, so I'm doing
> pages of size 5k using the DataStax C++ driver.
>
> Thanks for the replies!
>
> -Dave
>
> On Mon, Mar 23, 2015 at 2:00 AM, Jens Rantil <jens.rantil@tink.se> wrote:
>
>> Also, two control questions:
>>
>> - Are you using EBS for data storage? It might introduce additional
>>   latencies.
>> - Are you doing proper paging when querying the keyspace?
>>
>> Cheers,
>> Jens
>>
>> On Mon, Mar 23, 2015 at 5:56 AM, Dave Galbraith
>> <david92galbraith@gmail.com> wrote:
>>
>>> Hi! So I've got a table like this:
>>>
>>> CREATE TABLE "default".metrics (
>>>     row_time int,
>>>     attrs varchar,
>>>     offset int,
>>>     value double,
>>>     PRIMARY KEY (row_time, attrs, offset)
>>> ) WITH COMPACT STORAGE
>>>   AND bloom_filter_fp_chance=0.01
>>>   AND caching='KEYS_ONLY'
>>>   AND comment=''
>>>   AND dclocal_read_repair_chance=0
>>>   AND gc_grace_seconds=864000
>>>   AND index_interval=128
>>>   AND read_repair_chance=1
>>>   AND replicate_on_write='true'
>>>   AND populate_io_cache_on_flush='false'
>>>   AND default_time_to_live=0
>>>   AND speculative_retry='NONE'
>>>   AND memtable_flush_period_in_ms=0
>>>   AND compaction={'class':'DateTieredCompactionStrategy','timestamp_resolution':'MILLISECONDS'}
>>>   AND compression={'sstable_compression':'LZ4Compressor'};
>>>
>>> I'm running Cassandra on an EC2 m3.2xlarge out in the cloud, with 4 GB
>>> of heap space. It's timeseries data: I increment "row_time" each day,
>>> "attrs" is additional identifying information about each series, and
>>> "offset" is the number of milliseconds into the day for each data
>>> point. For the past 5 days I've been inserting 3k points/second
>>> distributed across 100k distinct "attrs"es. Now when I try to run
>>> queries on this data that look like
>>>
>>>     SELECT * FROM "default".metrics WHERE row_time = 5 AND attrs = 'potatoes_and_jam';
>>>
>>> it takes an absurdly long time and sometimes just times out. I ran
>>> "nodetool cfstats default" and here's what I get:
>>>
>>> Keyspace: default
>>>     Read Count: 59
>>>     Read Latency: 397.12523728813557 ms
>>>     Write Count: 155128
>>>     Write Latency: 0.3675690719921613 ms
>>>     Pending Flushes: 0
>>>         Table: metrics
>>>         SSTable count: 26
>>>         Space used (live): 35146349027
>>>         Space used (total): 35146349027
>>>         Space used by snapshots (total): 0
>>>         SSTable Compression Ratio: 0.10386468749216264
>>>         Memtable cell count: 141800
>>>         Memtable data size: 31071290
>>>         Memtable switch count: 41
>>>         Local read count: 59
>>>         Local read latency: 397.126 ms
>>>         Local write count: 155128
>>>         Local write latency: 0.368 ms
>>>         Pending flushes: 0
>>>         Bloom filter false positives: 0
>>>         Bloom filter false ratio: 0.00000
>>>         Bloom filter space used: 2856
>>>         Compacted partition minimum bytes: 104
>>>         Compacted partition maximum bytes: 36904729268
>>>         Compacted partition mean bytes: 986530969
>>>         Average live cells per slice (last five minutes): 501.66101694915255
>>>         Maximum live cells per slice (last five minutes): 502.0
>>>         Average tombstones per slice (last five minutes): 0.0
>>>         Maximum tombstones per slice (last five minutes): 0.0
>>>
>>> Ouch! 400ms of read latency, orders of magnitude higher than it has
>>> any right to be. How could this have happened? Is there something
>>> fundamentally broken about my data model? Thanks!
>>
>> --
>> Jens Rantil
>> Backend engineer
>> Tink AB
>>
>> Email: jens.rantil@tink.se
>> Phone: +46 708 84 18 32
>> Web: www.tink.se
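One more thought on the data model question at the bottom: with only
row_time as the partition key, a whole day's worth of data for all 100k
"attrs"es lands in a single partition, which lines up with the ~37 GB
"Compacted partition maximum bytes" above. A rough sketch of a schema
that bounds partition size by promoting attrs into the partition key
(the table name is made up, and existing data would have to be migrated):

    CREATE TABLE "default".metrics_by_series (  -- hypothetical name
        row_time int,
        attrs varchar,
        offset int,
        value double,
        PRIMARY KEY ((row_time, attrs), offset)
    );
    -- Carry over the DateTieredCompactionStrategy / LZ4 compression
    -- options from the original table as needed.

Your example query would still work unchanged, since it already
restricts both row_time and attrs.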