From: Chris Were
Reply-To: chris@chriswere.com
Date: Mon, 16 Nov 2009 10:13:06 -0800
Subject: Re: Timeout Exception
To: cassandra-user@incubator.apache.org

Hi Tim,

Thanks for the great pointers.

si, so are regularly in the 100-2000 range. I'll need to Google more about what these mean etc, but are you effectively saying to tell cassandra to use less memory? Cassandra is the only Java App running on the server.

Cheers,
Chris

On Mon, Nov 16, 2009 at 9:59 AM, Freeman, Tim <tim.freeman@hp.com> wrote:

I'm running 0.4.1. I used to get timeouts, then I changed my timeout from 5 seconds to 30 seconds and I get no more timeouts. The relevant line from storage-conf.xml is:

  <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>

The maximum latency is often just over 5 seconds in the worst case when I fetch thousands of records, so default timeout of 5 seconds happens to be a little bit too low for me. My records are ~100Kbytes each. You may get different results if your records are much larger or much smaller.
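
One way to sanity-check this setting is to read it back from the config and compare it with the worst-case latency actually observed. The following is only a rough sketch: the `<Storage>` wrapper element, the helper names `rpc_timeout_ms` and `timeout_is_sufficient`, and the 5200 ms latency figure are assumptions made for illustration, not values taken from a real cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of storage-conf.xml. Only the RpcTimeoutInMillis
# line comes from the message above; the <Storage> wrapper is assumed.
SAMPLE_CONF = """
<Storage>
  <RpcTimeoutInMillis>30000</RpcTimeoutInMillis>
</Storage>
"""


def rpc_timeout_ms(conf_xml: str) -> int:
    """Return the RpcTimeoutInMillis value found anywhere in the config."""
    root = ET.fromstring(conf_xml)
    node = next(root.iter("RpcTimeoutInMillis"), None)
    if node is None or node.text is None:
        raise ValueError("RpcTimeoutInMillis not found in config")
    return int(node.text.strip())


def timeout_is_sufficient(conf_xml: str, worst_latency_ms: float) -> bool:
    """True when the configured timeout exceeds the observed worst case."""
    return rpc_timeout_ms(conf_xml) > worst_latency_ms


if __name__ == "__main__":
    observed_ms = 5200  # assumed stand-in for "just over 5 seconds"
    print(rpc_timeout_ms(SAMPLE_CONF))
    print(timeout_is_sufficient(SAMPLE_CONF, observed_ms))
```

With a worst case just over 5 seconds, a 5000 ms timeout fails this check while 30000 ms passes, which matches the behaviour described above.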

The other issue I was having a few days ago was that the machine was page faulting so garbage collections were taking forever. Some GC's took 20 minutes in another Java process. I didn't have verbose:gc turned on in Cassandra so I'm not sure what the score was there, but there's little reason to expect it to be qualitatively better, since it's pretty random which process gets some of its pages swapped out. On a Linux machine, run "vmstat 5" when your machine is loaded and if you see numbers greater than 0 in the "si" and "so" columns in rows after the first, tell one of your Java processes to take less memory.
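
The vmstat check described above could be automated roughly as follows. This is a sketch under assumptions: the sample output is made up for illustration (it is not from the machine discussed here), and `swapping_samples` is a hypothetical helper name. It skips the first data row, which vmstat reports as an average since boot, and flags later rows where "si" or "so" is greater than 0.

```python
from typing import List, Tuple

# Made-up "vmstat 5" output, for illustration only.
SAMPLE_VMSTAT = """
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0 204800  51200   1024 812000    3    5   120    80  300  500 10  2 85  3
 2  1 204800  49800   1024 811000  150  420   900   700  450  800 12  4 60 24
 1  0 204800  50100   1024 811500    0    0    40    20  280  470  8  2 89  1
 3  2 205300  48000   1024 810000 1800  950  2500  1900  600 1100 15  6 40 39
"""


def swapping_samples(vmstat_output: str) -> List[Tuple[int, int]]:
    """Return (si, so) pairs from interval rows that show swap activity."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    # The column-name header is the row that names both "si" and "so".
    header_idx = next(i for i, cols in enumerate(rows)
                      if "si" in cols and "so" in cols)
    header = rows[header_idx]
    si_col, so_col = header.index("si"), header.index("so")

    data_rows = rows[header_idx + 1:]
    hits = []
    for cols in data_rows[1:]:  # skip the first row (average since boot)
        # Ignore repeated headers or malformed lines.
        if len(cols) != len(header):
            continue
        if not (cols[si_col].isdigit() and cols[so_col].isdigit()):
            continue
        si, so = int(cols[si_col]), int(cols[so_col])
        if si > 0 or so > 0:
            hits.append((si, so))
    return hits


if __name__ == "__main__":
    for si, so in swapping_samples(SAMPLE_VMSTAT):
        print(f"swap activity: si={si} so={so}")
```

Any non-empty result suggests the machine is paging under load, which is the condition under which GC pauses can stretch far beyond the RPC timeout.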

Tim Freeman
Email: tim.freeman@hp.com
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and Thursday; call my desk instead.)

From: Chris Were [mailto:chris.were@gmail.com]
Sent: Monday, November 16, 2009 9:47 AM
To: Jonathan Ellis
Cc: cassandra-user@incubator.apache.org
Subject: Re: Timeout Exception

I turned on debug logging for a few days and timeouts happened across pretty much all requests. I couldn't see any particular request that was consistently the problem.

After some experimenting it seems that shutting down cassandra and restarting resolves the problem. Once it hits the JVM memory limit however, the timeouts start again. I have read the page on MemTable thresholds and have tried thresholds of 32MB, 64MB and 128MB with no noticeable difference. Cassandra is set to use 7GB of memory. I have 12 CF's, however only 6 of those have lots of data.

Cheers,

Chris

On Tue, Nov 10, 2009 at 11:55 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

if you're timing out doing a slice on 10 columns w/ 10% cpu used,
something is broken

is it consistent as to which keys this happens on? try turning on
debug logging and seeing where the latency is coming from.


On Tue, Nov 10, 2009 at 1:53 PM, Chris Were <chris.were@gmail.com> wrote:
>
> On Tue, Nov 10, 2009 at 11:50 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>
>> On Tue, Nov 10, 2009 at 1:49 PM, Chris Were <chris.were@gmail.com> wrote:
>> > Maybe... but it's not just multigets, it also happens when retrieving
>> > one row with get_slice.
>>
>> how many of the 3M columns are you trying to slice at once?
>
> Sorry, I must have mixed up the terminology.
> There are ~3M keys, but fewer than 10 columns in each. The get_slice calls are
> to retrieve all the columns (10) for a given key.
