From: Anshu Vajpayee
Date: Mon, 5 Sep 2016 22:58:01 +0530
Subject: Re: Read timeouts on primary key queries
To: user@cassandra.apache.org, Romain Hardouin

We have seen read timeout issues in Cassandra due to a high droppable
tombstone ratio on a table. Please check whether the droppable tombstone
ratio is high for your table.

On Mon, Sep 5, 2016 at 8:11 PM, Romain Hardouin wrote:

> Yes, dclocal_read_repair_chance will reduce the cross-DC traffic and
> latency, so you can swap the values
> (https://issues.apache.org/jira/browse/CASSANDRA-7320). I guess
> sstable_size_in_mb was set to 50 because back in the day (C* 1.0) the
> default size was way too small: 5 MB. So maybe someone in your company
> tried "10 * the default", i.e. 50 MB. Now the default is 160 MB. I'm not
> saying you should change the value, but keep in mind that you're using a
> small value here; it could help you someday.
>
> Regarding the cells, the histograms show an *estimation* of the min, p50,
> ..., p99, max of cells based on SSTable metadata. On your screenshot, the
> max is 4768, so you have a partition key with ~4768 cells. The p99 is
> 1109, so 99% of your partition keys have less than (or equal to) 1109
> cells. You can see these data for a given SSTable with the
> sstablemetadata tool.
>
> Best,
>
> Romain
>
> On Monday, 5 September 2016 at 15:17, Joseph Tech wrote:
>
> Thanks, Romain. We will try to enable the DEBUG logging (assuming it
> won't clog the logs much). Regarding the table configs,
> read_repair_chance must be carried over from older versions - mostly
> defaults. I think sstable_size_in_mb was set to limit the max SSTable
> size, though I am not sure of the reason for the 50 MB value.
>
> Does setting dclocal_read_repair_chance help in reducing cross-DC
> traffic? (I haven't looked into this parameter, just going by the name.)
>
> On the cell count definition: is it incremented based on the number of
> writes for a given name (key?) and value? This table is heavy on reads
> and writes, so if that is the case, shouldn't the value be much higher?
>
> On Mon, Sep 5, 2016 at 7:35 AM, Romain Hardouin wrote:
>
> Hi,
>
> Try to put org.apache.cassandra.db.ConsistencyLevel at DEBUG level; it
> could help to find a regular pattern. By the way, I see that you have set
> a global read repair chance:
>     read_repair_chance = 0.1
> and not the local read repair:
>     dclocal_read_repair_chance = 0.0
> Is there any reason to do that, or is it just the old (pre 2.0.9) default
> configuration?
>
> The cell count is the number of triplets: (name, value, timestamp).
>
> Also, I see that you have set sstable_size_in_mb to 50 MB. What is the
> rationale behind this? (Yes, I'm curious :-) ). Anyway, your "SSTables
> per read" are good.
>
> Best,
>
> Romain
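For reference, a rough sketch of how these checks could be run from a shell
on one node. This assumes Cassandra 2.1 tooling; the data path is a
placeholder, and db.tbl is the (renamed) keyspace/table from the desc output
later in the thread.

    # Raise the log level for the consistency-level code path without a
    # restart; revert to INFO once enough samples are collected.
    nodetool setlogginglevel org.apache.cassandra.db.ConsistencyLevel DEBUG

    # Inspect per-SSTable metadata: the output includes the estimated
    # droppable tombstone ratio and the per-SSTable estimates mentioned
    # above. Adjust the path to the real data directory of the table.
    for f in /var/lib/cassandra/data/db/tbl-*/*-Data.db; do
        echo "== $f"
        sstablemetadata "$f"
    done

    # If you decide to swap the two read repair settings as suggested:
    cqlsh -e "ALTER TABLE db.tbl WITH read_repair_chance = 0.0 AND dclocal_read_repair_chance = 0.1;"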
>
> On Monday, 5 September 2016 at 13:32, Joseph Tech wrote:
>
> Hi Ryan,
>
> Attached are the cfhistograms runs taken within a few minutes of each
> other. On the surface, I don't see anything which indicates too much
> skewing (assuming skewing == keys spread across many SSTables); please
> confirm. Related to this, what does the "cell count" metric indicate? I
> didn't find a clear explanation in the documentation.
>
> Thanks,
> Joseph
>
> On Thu, Sep 1, 2016 at 6:30 PM, Ryan Svihla wrote:
>
> Have you looked at cfhistograms/tablehistograms? Your data may just be
> skewed (the most likely explanation is probably the correct one here).
>
> Regards,
>
> Ryan Svihla
>
> _____________________________
> From: Joseph Tech
> Sent: Wednesday, August 31, 2016 11:16 PM
> Subject: Re: Read timeouts on primary key queries
> To: user@cassandra.apache.org
>
> Patrick,
>
> The desc table is below (only column names changed):
>
> CREATE TABLE db.tbl (
>     id1 text,
>     id2 text,
>     id3 text,
>     id4 text,
>     f1 text,
>     f2 map<text, text>,
>     f3 map<text, text>,
>     created timestamp,
>     updated timestamp,
>     PRIMARY KEY (id1, id2, id3, id4)
> ) WITH CLUSTERING ORDER BY (id2 ASC, id3 ASC, id4 ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
>     AND comment = ''
>     AND compaction = {'sstable_size_in_mb': '50', 'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
>     AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.1
>     AND speculative_retry = '99.0PERCENTILE';
>
> and the query is: select * from tbl where id1=? and id2=? and id3=? and id4=?
>
> The timeouts happen within ~2s to ~5s, while the successful calls have an
> avg of 8ms and a p99 of 15s. These times are seen from the app side; the
> actual query times would be slightly lower.
>
> Is there a way to capture traces only when queries take longer than a
> specified duration? We can't enable tracing in production given the
> volume of traffic. We see that the same query which timed out works fine
> later, so I'm not sure if the trace of a successful run would help.
>
> Thanks,
> Joseph
>
> On Wed, Aug 31, 2016 at 8:05 PM, Patrick McFadin wrote:
>
> If you are getting a timeout on one table, then a mismatch of RF and node
> count doesn't seem as likely.
>
> Time to look at your query. You said it was a 'select * from table where
> key=?' type query. I would next use the trace facility in cqlsh to
> investigate further. That's a good way to find hard-to-find issues. You
> should be looking for a clear ledge where you go from single-digit ms to
> 4 or 5 digit ms times.
>
> The other place to look is your data model for that table, if you want to
> post the output from a desc table.
>
> Patrick
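As far as I know there is no built-in "trace only if slower than X ms"
switch in Cassandra 2.1, but traces can be sampled instead of enabled
globally; DSE's Performance Service also has a CQL slow log that may cover
the "only slow queries" case, if you can enable it. A rough sketch (the
0.001 probability is illustrative, and db.tbl is the table from the desc
output above):

    # Trace one suspect query interactively:
    #   cqlsh> TRACING ON;
    #   cqlsh> SELECT * FROM db.tbl WHERE id1=... AND id2=... AND id3=... AND id4=...;

    # Or sample a small random fraction of all requests on a node during
    # the problem window, then turn sampling off again:
    nodetool settraceprobability 0.001
    # ... wait for a few timeouts to occur ...
    nodetool settraceprobability 0

    # Sampled sessions land in system_traces; duration is in microseconds,
    # so the timed-out reads should stand out:
    cqlsh -e "SELECT session_id, duration, started_at FROM system_traces.sessions LIMIT 50;"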
>
> On Tue, Aug 30, 2016 at 11:07 AM, Joseph Tech wrote:
>
> On further analysis, this issue happens only on 1 table in the keyspace,
> which has the max reads.
>
> @Atul, I will look at system health, but I didn't see anything standing
> out from the GC logs (using JDK 1.8_92 with G1GC).
>
> @Patrick, could you please elaborate on the "mismatch on node count + RF"
> part.
>
> On Tue, Aug 30, 2016 at 5:35 PM, Atul Saroha wrote:
>
> There could be many reasons for this if it is intermittent: CPU usage,
> I/O wait status. As reads are I/O intensive, your IOPS requirement should
> be met under that load. It could be a heap issue if the CPU is busy with
> GC only. Network health could also be the reason. So it is better to look
> at system health during the time when it happens.
>
> ----------------------------------------------------------------------
> Atul Saroha
> Lead Software Engineer
> M: +91 8447784271  T: +91 124-415-6069  EXT: 12369
> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
> Phase-4, Sector 18, Gurgaon, Haryana 122016, INDIA
>
> On Tue, Aug 30, 2016 at 5:10 PM, Joseph Tech wrote:
>
> Hi Patrick,
>
> The nodetool status shows all nodes up and normal now. From the OpsCenter
> "Event Log", there are some nodes reported as being down/up etc. during
> the timeframe of the timeouts, but these are Search workload nodes from
> the remote (non-local) DC. The RF is 3 and there are 9 nodes per DC.
>
> Thanks,
> Joseph
>
> On Mon, Aug 29, 2016 at 11:07 PM, Patrick McFadin wrote:
>
> You aren't achieving quorum on your reads, as the error explains. That
> means you either have some nodes down or your topology is not matching
> up. The fact that you are using LOCAL_QUORUM might point to a datacenter
> mismatch on node count + RF.
>
> What does your nodetool status look like?
>
> Patrick
>
> On Mon, Aug 29, 2016 at 10:14 AM, Joseph Tech wrote:
>
> Hi,
>
> We recently started getting intermittent timeouts on primary key queries
> (select * from table where key=<key>).
>
> The error is: com.datastax.driver.core.exceptions.ReadTimeoutException:
> Cassandra timeout during read query at consistency LOCAL_QUORUM (2
> responses were required but only 1 replica responded)
>
> The same query would work fine when tried directly from cqlsh. There are
> no indications in system.log for the table in question, though there were
> compactions in progress for tables in another keyspace which is more
> frequently accessed.
>
> My understanding is that the chances of primary key queries timing out
> are very minimal. Please share the possible reasons / ways to debug this
> issue.
>
> We are using Cassandra 2.1 (DSE 4.8.7).
>
> Thanks,
> Joseph

--
Regards,
Anshu
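To make the "node count + RF" check above concrete, a minimal sketch; the
keyspace name db is the (renamed) one from the desc table, and the commands
assume C* 2.1 / DSE 4.8 tooling:

    # Per-DC view of the keyspace: with RF=3, LOCAL_QUORUM needs 2 replicas
    # in the local DC to answer within the timeout.
    nodetool status db

    # Confirm the replication strategy actually defines the expected RF per
    # DC (system.schema_keyspaces is the 2.1-era schema table).
    cqlsh -e "SELECT keyspace_name, strategy_class, strategy_options FROM system.schema_keyspaces WHERE keyspace_name = 'db';"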