Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: cassandra-user@incubator.apache.org
Received-SPF: pass (nike.apache.org: domain of chris.were@gmail.com designates
 209.85.222.174 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:reply-to:in-reply-to:references:from:date:message-id
         :subject:to:content-type;
        b=R7Xge3f8vt3oEl1nJlP60hnHZ1Ptf5dqwYfOlxiAkL5jA82ipUvWz4bVN3f+HDQDj2
         Ii/EVM2CNBXA+m8qQ0vpT8EiOna1zbtwsd9JkPavJtEhjmhpaOnRJWsNH6R+J6gNdBNl
         viLBz8ZuNWDAVfnVJDhs3k2OtE7hJKiKcau9k=
MIME-Version: 1.0
Reply-To: chris@chriswere.com
In-Reply-To: 
 <59DD1BA8FD3C0F4C90771C18F2B5B53A4C842D4E4A@GVW0432EXB.americas.hpqcorp.net>
References: <35bb42690911092025l109b871exa58ff629d624e299@mail.gmail.com>
	<e06563880911092122v17340b79q401004ac18b3844b@mail.gmail.com>
	<35bb42690911101123y795c80erb18c2091fe960ae2@mail.gmail.com>
	<e06563880911101134r6f24aee3x735e7283800d2f06@mail.gmail.com>
	<35bb42690911101149i18fcc590v1cbc2ba9b2b99356@mail.gmail.com>
	<e06563880911101150i48ecb865x2d540f139da3b3c5@mail.gmail.com>
	<35bb42690911101153y3a998431se86a64613f31b030@mail.gmail.com>
	<e06563880911101155u29379034wefa41e5ccda256a6@mail.gmail.com>
	<35bb42690911160946pb37f763x52666a890ded9a91@mail.gmail.com>
	<59DD1BA8FD3C0F4C90771C18F2B5B53A4C842D4E4A@GVW0432EXB.americas.hpqcorp.net>
From: Chris Were <chris.were@gmail.com>
Date: Mon, 16 Nov 2009 10:13:06 -0800
Message-ID: <35bb42690911161013y3ee067cao637c189e751fea49@mail.gmail.com>
Subject: Re: Timeout Exception
To: cassandra-user@incubator.apache.org
Content-Type: multipart/alternative; boundary=000e0cd1061c3eadc2047880f747

--000e0cd1061c3eadc2047880f747
Content-Type: text/plain; charset=ISO-8859-1

Hi Tim,

Thanks for the great pointers.

The si and so values are regularly in the 100-2000 range. I'll need to read up
on what they mean, but are you effectively saying I should tell Cassandra to
use less memory? Cassandra is the only Java app running on the server.
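
In case it helps anyone else reading along, here is a minimal Python sketch
of the check Tim describes below: look at the "si" and "so" columns of
"vmstat 5" output and flag any row after the first with a value above 0. The
function name swap_activity and the sample text are made up for illustration,
and the column layout is assumed to match the usual Linux vmstat output, with
a header row that names "si" and "so".

```python
def swap_activity(vmstat_output):
    """Return (si, so) pairs for vmstat rows that show swap traffic.

    Assumes the usual Linux layout: a header row naming the columns,
    followed by numeric rows. The first numeric row is skipped because
    it reports averages since boot rather than a live sample.
    """
    si_idx = so_idx = None
    samples = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        # The column-name row tells us where "si" and "so" live.
        if "si" in fields and "so" in fields:
            si_idx, so_idx = fields.index("si"), fields.index("so")
            continue
        # Ignore banner rows, blank lines, and anything before the header.
        if si_idx is None or not fields or not fields[0].isdigit():
            continue
        if len(fields) <= max(si_idx, so_idx):
            continue
        samples.append((int(fields[si_idx]), int(fields[so_idx])))
    return [(si, so) for si, so in samples[1:] if si > 0 or so > 0]


# Made-up sample output, not captured from a real machine.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0 204800  51200   1024  90000   12   30   100   200  500  900 10  5 80  5
 2  1 204900  50000   1024  90000  150 1800   400   900  700 1200 12  8 60 20
 0  0 204900  50500   1024  90000    0    0    10    20  300  400  3  1 95  1
"""

print(swap_activity(SAMPLE))  # prints [(150, 1800)]
```

Any non-empty result means the machine was swapping during that sample, which
is the situation where a long GC pause would be expected.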

Cheers,
Chris

On Mon, Nov 16, 2009 at 9:59 AM, Freeman, Tim <tim.freeman@hp.com> wrote:

>  I'm running 0.4.1.  I used to get timeouts, then I changed my timeout
> from 5 seconds to 30 seconds and I get no more timeouts.  The relevant line
> from storage-conf.xml is:
>
>
>
>   <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>
>
>
>
> The maximum latency is often just over 5 seconds in the worst case when I
> fetch thousands of records, so the default timeout of 5 seconds happens to
> be a little bit too low for me.  My records are ~100 KB each.  You may get
> different results if your records are much larger or much smaller.
>
>
>
> The other issue I was having a few days ago was that the machine was page
> faulting, so garbage collections were taking forever.  Some GCs took 20
> minutes in another Java process.  I didn't have -verbose:gc turned on in
> Cassandra, so I'm not sure how it fared there, but there's little reason
> to expect it to be qualitatively better, since it's pretty random which
> process gets some of its pages swapped out.  On a Linux machine, run
> "vmstat 5" when your machine is loaded; if you see numbers greater than 0
> in the "si" and "so" columns in rows after the first, tell one of your Java
> processes to take less memory.
>
>
>
> Tim Freeman
> Email: tim.freeman@hp.com
> Desk in Palo Alto: (650) 857-2581
> Home: (408) 774-1298
> Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and
> Thursday; call my desk instead.)
>
>
>
> *From:* Chris Were [mailto:chris.were@gmail.com]
> *Sent:* Monday, November 16, 2009 9:47 AM
> *To:* Jonathan Ellis
> *Cc:* cassandra-user@incubator.apache.org
> *Subject:* Re: Timeout Exception
>
>
>
> I turned on debug logging for a few days and timeouts happened across
> pretty much all requests. I couldn't see any particular request that was
> consistently the problem.
>
>
>
> After some experimenting, it seems that shutting down Cassandra and
> restarting it resolves the problem. Once it hits the JVM memory limit,
> however, the timeouts start again. I have read the page on MemTable
> thresholds and have tried thresholds of 32MB, 64MB and 128MB with no
> noticeable difference. Cassandra is set to use 7GB of memory. I have 12
> CFs, but only 6 of those have lots of data.
>
>
>
> Cheers,
>
> Chris
>
> On Tue, Nov 10, 2009 at 11:55 AM, Jonathan Ellis <jbellis@gmail.com>
> wrote:
>
> if you're timing out doing a slice on 10 columns w/ 10% cpu used,
> something is broken
>
> is it consistent as to which keys this happens on?  try turning on
> debug logging and seeing where the latency is coming from.
>
>
> On Tue, Nov 10, 2009 at 1:53 PM, Chris Were <chris.were@gmail.com> wrote:
> >
> > On Tue, Nov 10, 2009 at 11:50 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>
> >> On Tue, Nov 10, 2009 at 1:49 PM, Chris Were <chris.were@gmail.com> wrote:
> >> > Maybe... but it's not just multigets, it also happens when retrieving
> >> > one row with get_slice.
> >>
> >> how many of the 3M columns are you trying to slice at once?
> >
> > Sorry, I must have mixed up the terminology.
> > There are ~3M keys, but fewer than 10 columns in each. The get_slice calls
> > are to retrieve all the columns (10) for a given key.
>
>
>

--000e0cd1061c3eadc2047880f747--