Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: cassandra-user@incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <35bb42690911161812q7ce42990v340fc99b413c2134@mail.gmail.com>
References: <35bb42690911092025l109b871exa58ff629d624e299@mail.gmail.com>
	<35bb42690911101149i18fcc590v1cbc2ba9b2b99356@mail.gmail.com>
	<e06563880911101150i48ecb865x2d540f139da3b3c5@mail.gmail.com>
	<35bb42690911101153y3a998431se86a64613f31b030@mail.gmail.com>
	<e06563880911101155u29379034wefa41e5ccda256a6@mail.gmail.com>
	<35bb42690911160946pb37f763x52666a890ded9a91@mail.gmail.com>
	<59DD1BA8FD3C0F4C90771C18F2B5B53A4C842D4E4A@GVW0432EXB.americas.hpqcorp.net>
	<35bb42690911161013y3ee067cao637c189e751fea49@mail.gmail.com>
	<59DD1BA8FD3C0F4C90771C18F2B5B53A4C842D4E8B@GVW0432EXB.americas.hpqcorp.net>
	<35bb42690911161812q7ce42990v340fc99b413c2134@mail.gmail.com>
From: Igor Katkov <ikatkov@gmail.com>
Date: Mon, 16 Nov 2009 21:25:53 -0500
Message-ID: <23b1e84e0911161825w6adae420n935f3dbdb52b2325@mail.gmail.com>
Subject: Re: Timeout Exception
To: cassandra-user@incubator.apache.org, chris@chriswere.com
Content-Type: multipart/alternative; boundary=0023545bd7609bdfa0047887d965

--0023545bd7609bdfa0047887d965
Content-Type: text/plain; charset=ISO-8859-1

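On most hardware reasonable for Cassandra, the JVM will run in server mode by
default. The page below defines a "server-class machine" as one with at least
2 CPUs and at least 2 GB of physical memory (32-bit Windows excepted).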
http://java.sun.com/j2se/1.5.0/docs/guide/vm/server-class.html
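One way to answer the "did I miss it?" question empirically is to ask the JVM itself: HotSpot's `java -version` output names the VM it selected ("Server VM" or "Client VM"). This is a minimal sketch, assuming a HotSpot JVM and that the `java` on the PATH is the same binary Cassandra launches; `is_server_vm` and `java_version_output` are hypothetical helper names, not part of Cassandra.

```python
import subprocess


def is_server_vm(version_output: str) -> bool:
    """Return True if `java -version` output reports a HotSpot Server VM."""
    # HotSpot prints a line such as
    #   "Java HotSpot(TM) 64-Bit Server VM (build ..., mixed mode)"
    # or "... Client VM ..." depending on which VM was selected.
    return any("Server VM" in line for line in version_output.splitlines())


def java_version_output(java: str = "java") -> str:
    """Run `<java> -version` and return what it prints."""
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr
```

Usage: `is_server_vm(java_version_output())`. To test with the options Cassandra actually uses, run the same binary with the same flags plus `-version`; passing `-server` explicitly removes any dependence on server-class detection.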

On Mon, Nov 16, 2009 at 9:12 PM, Chris Were <chris.were@gmail.com> wrote:

> Reading more about JVM GC led me to investigate the java -server flag
> (http://stackoverflow.com/questions/198577/real-differences-between-java-server-and-java-client).
>
> From what I can see, Cassandra's startup scripts don't invoke this mode, or
> did I miss it?
>
> Chris.
>
>
> On Mon, Nov 16, 2009 at 10:33 AM, Freeman, Tim <tim.freeman@hp.com> wrote:
>
>>  You'll have to stop the swapping somehow.  Maybe you can install more
>> memory, maybe you can run Cassandra smaller, maybe you can get some other
>> process on the machine to be smaller or on some other machine, maybe you can
>> move Cassandra to some other machine with more available physical memory.
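Before deciding how much smaller to run Cassandra, it can help to compare the configured heap against physical memory. This is a minimal sketch, assuming a Linux host (it parses the text of `/proc/meminfo`, which reports `MemTotal` in kB) and a heap set with an `-Xmx` option. `headroom_bytes` is a hypothetical reserve for the OS and for the JVM's non-heap memory, since the process footprint is larger than `-Xmx` alone; the function names are illustrative only.

```python
import re
from typing import Optional

# Size suffixes the JVM accepts on -Xmx (either case); no suffix means bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_xmx(jvm_opts: str) -> Optional[int]:
    """Return the -Xmx heap size in bytes, or None if it is not set."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", jvm_opts)
    if not matches:
        return None
    number, unit = matches[-1]  # the last -Xmx on the command line wins
    return int(number) * _UNITS[unit.lower()]


def mem_total_bytes(meminfo_text: str) -> int:
    """Parse MemTotal (reported in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def heap_fits(jvm_opts: str, meminfo_text: str,
              headroom_bytes: int = 1024 ** 3) -> bool:
    """True if the heap plus an assumed headroom fits in physical memory."""
    heap = parse_xmx(jvm_opts)
    if heap is None:
        raise ValueError("no -Xmx option found")
    return heap + headroom_bytes <= mem_total_bytes(meminfo_text)
```

Usage: `heap_fits("-Xms1G -Xmx7G", open("/proc/meminfo").read())`. A `False` result suggests the heap setting alone leaves little room for anything else on the box.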
>>
>>
>>
>> I don't have experience with running Cassandra smaller than the
>> recommended size, so one of those options might not work.
>>
>>
>>
>> Caching database information in swapped-out pages usually isn't a win. To
>> a first approximation, you need an I/O to fetch the swapped-out page, but
>> you'd need an I/O anyway to get the information from the database. Swapping
>> on modern machines usually isn't a win in general: memory got bigger and
>> CPUs got faster in the last decade, but disks didn't get much faster.
>>
>>
>>
>> Tim Freeman
>> Email: tim.freeman@hp.com
>> Desk in Palo Alto: (650) 857-2581
>> Home: (408) 774-1298
>> Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and
>> Thursday; call my desk instead.)
>>
>>
>>
>> *From:* Chris Were [mailto:chris.were@gmail.com]
>> *Sent:* Monday, November 16, 2009 10:13 AM
>> *To:* cassandra-user@incubator.apache.org
>> *Subject:* Re: Timeout Exception
>>
>>
>>
>> Hi Tim,
>>
>>
>>
>> Thanks for the great pointers.
>>
>>
>>
>> The si and so values are regularly in the 100-2000 range. I'll need to
>> Google more about what these mean, but are you effectively saying to tell
>> Cassandra to use less memory? Cassandra is the only Java app running on the
>> server.
>>
>>
>>
>> Cheers,
>>
>> Chris
>>
>> On Mon, Nov 16, 2009 at 9:59 AM, Freeman, Tim <tim.freeman@hp.com> wrote:
>>
>> I'm running 0.4.1.  I used to get timeouts, then I changed my timeout from
>> 5 seconds to 30 seconds and I get no more timeouts.  The relevant line from
>> storage-conf.xml is:
>>
>>
>>
>>   <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>
>>
>>
>>
>> The maximum latency is often just over 5 seconds in the worst case when I
>> fetch thousands of records, so the default timeout of 5 seconds happens to
>> be a little too low for me. My records are ~100 KB each. You may get
>> different results if your records are much larger or much smaller.
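The reasoning above (worst-case latency just over 5 s against a 5 s timeout) can be checked mechanically. This is a sketch, assuming `storage-conf.xml` contains a single `RpcTimeoutInMillis` element somewhere in its tree and that request latencies are measured client-side in milliseconds; the helper names and the sample latencies in the usage line are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def rpc_timeout_ms(storage_conf_xml: str) -> Optional[int]:
    """Return the RpcTimeoutInMillis value, or None if the element is absent."""
    root = ET.fromstring(storage_conf_xml)
    # iter() also matches the root element itself, so the element's exact
    # position in the file does not matter.
    elem = next(root.iter("RpcTimeoutInMillis"), None)
    if elem is None or elem.text is None:
        return None
    return int(elem.text.strip())


def times_out(timeout_ms: int, latencies_ms: Iterable[float]) -> bool:
    """True if the slowest observed request would hit the RPC timeout."""
    return max(latencies_ms) >= timeout_ms
```

Usage: with a worst case just over 5000 ms, `times_out(5000, [1200, 4800, 5150])` is `True`, while the same latencies against 30000 ms are not.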
>>
>>
>>
>> The other issue I was having a few days ago was that the machine was page
>> faulting, so garbage collections were taking forever. Some GCs took 20
>> minutes in another Java process. I didn't have -verbose:gc turned on in
>> Cassandra, so I'm not sure what the score was there, but there's little
>> reason to expect it to be qualitatively better, since it's pretty random
>> which process gets some of its pages swapped out. On a Linux machine, run
>> "vmstat 5" when your machine is loaded, and if you see numbers greater than
>> 0 in the "si" and "so" (swap-in and swap-out) columns in rows after the
>> first, tell one of your Java processes to take less memory.
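The vmstat rule of thumb can be automated for longer runs. This is a minimal sketch, assuming procps-style `vmstat` output on Linux with `si` and `so` column headers; it locates the columns by header name rather than fixed position, tolerates repeated header lines, and skips the first data row because vmstat reports averages since boot there. The function name is hypothetical.

```python
from typing import List, Tuple


def swap_activity(vmstat_text: str) -> List[Tuple[int, int]]:
    """Return (si, so) for every sample row with nonzero swap-in or swap-out."""
    si_col = so_col = None
    seen_first_row = False
    busy = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "si" in fields and "so" in fields:
            # Column header line; vmstat may repeat it periodically.
            si_col, so_col = fields.index("si"), fields.index("so")
            continue
        if si_col is None or not fields[0].isdigit():
            continue  # banner line such as "procs ---memory--- ---swap-- ..."
        if not seen_first_row:
            seen_first_row = True  # first row = averages since boot
            continue
        si, so = int(fields[si_col]), int(fields[so_col])
        if si > 0 or so > 0:
            busy.append((si, so))
    return busy
```

Usage: pass it `subprocess.run(["vmstat", "5", "12"], capture_output=True, text=True).stdout` to sample for roughly a minute while the node is under load; any non-empty result means the machine was swapping during the run.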
>>
>>
>>
>>
>>
>>
>> *From:* Chris Were [mailto:chris.were@gmail.com]
>> *Sent:* Monday, November 16, 2009 9:47 AM
>> *To:* Jonathan Ellis
>> *Cc:* cassandra-user@incubator.apache.org
>> *Subject:* Re: Timeout Exception
>>
>>
>>
>> I turned on debug logging for a few days and timeouts happened across
>> pretty much all requests. I couldn't see any particular request that was
>> consistently the problem.
>>
>>
>>
>> After some experimenting, it seems that shutting down Cassandra and
>> restarting it resolves the problem. Once it hits the JVM memory limit,
>> however, the timeouts start again. I have read the page on Memtable
>> thresholds and have tried thresholds of 32MB, 64MB and 128MB with no
>> noticeable difference. Cassandra is set to use 7GB of memory. I have 12
>> column families (CFs); however, only 6 of those have lots of data.
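A back-of-the-envelope check on how much heap the memtables alone might pin, using the numbers above. This is a rough sketch, not Cassandra's actual memory accounting: it assumes one full memtable per actively written CF, and `overhead_factor` is a hypothetical multiplier for JVM object overhead, since in-heap size generally exceeds raw data size. The function names are illustrative only.

```python
def memtable_heap_mb(threshold_mb: float, active_cfs: int,
                     overhead_factor: float = 1.0) -> float:
    """Rough heap held by full memtables, in MB (one per active CF)."""
    return threshold_mb * active_cfs * overhead_factor


def heap_fraction(used_mb: float, heap_mb: float) -> float:
    """Fraction of the configured heap that used_mb represents."""
    return used_mb / heap_mb


# Numbers from the message: 128 MB threshold, 6 CFs with lots of data, 7 GB heap.
estimate = memtable_heap_mb(128, 6)
print(f"{estimate:.0f} MB, {heap_fraction(estimate, 7 * 1024):.1%} of heap")
```

Varying `overhead_factor` shows how sensitive the estimate is to that assumption before comparing it against the 7 GB heap.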
>>
>>
>>
>> Cheers,
>>
>> Chris
>>
>> On Tue, Nov 10, 2009 at 11:55 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> if you're timing out doing a slice on 10 columns w/ 10% cpu used,
>> something is broken
>>
>> is it consistent as to which keys this happens on?  try turning on
>> debug logging and seeing where the latency is coming from.
>>
>>
>> On Tue, Nov 10, 2009 at 1:53 PM, Chris Were <chris.were@gmail.com> wrote:
>> >
>> > On Tue, Nov 10, 2009 at 11:50 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> >>
>> >> On Tue, Nov 10, 2009 at 1:49 PM, Chris Were <chris.were@gmail.com> wrote:
>> >> > Maybe... but it's not just multigets, it also happens when retrieving
>> >> > one row with get_slice.
>> >>
>> >> how many of the 3M columns are you trying to slice at once?
>> >
>> > Sorry, I must have mixed up the terminology.
>> > There are ~3M keys, but fewer than 10 columns in each. The get_slice calls
>> > are to retrieve all the columns (10) for a given key.
>>
>>
>>
>>
>>
>
>


--0023545bd7609bdfa0047887d965--