From: ruslan usifov
To: Peter Schuller, user@cassandra.apache.org
Date: Tue, 8 Mar 2011 23:14:15 +0300
Subject: Re: Nodes frozen in GC
References: <4D7553DD.1080702@yellowseo.com>

2011/3/8 Peter Schuller

> >     $client->batch_mutate($mutations,
> >         cassandra_ConsistencyLevel::QUORUM);
>
> Btw, what are the mutations? Are you doing something like inserting
> both very small values and very large ones?

I have a big XML file (5 GB, a MySQL dump in XML format) and read the
data from it with a SAX XML parser. All records in that file look like
this:

    <row>
        <field name="uid">5</field>
        <field name="aid">3619780:1</field>
        <field name="cleanness">0</field>
        <field name="counter">7</field>
        <field name="gcount">0</field>
        <field name="lastchange">1291053619</field>
        <field name="disaster">0</field>
        <field name="tdisaster">0</field>
    </row>

The mutations in this case are batches of 10 such records. The
following code fragment shows the situation:

    // Wrap the row's columns in a super column named after the "aid" field.
    $l_supercolumn = new cassandra_SuperColumn(array("name" => $l_row["aid"], "columns" => $l_columns));
    $l_c_or_sc = new cassandra_ColumnOrSuperColumn(array("super_column" => $l_supercolumn));
    $l_mutation = new cassandra_Mutation(array("column_or_supercolumn" => $l_c_or_sc));

    // Group mutations by row key, under the 'aquarium_friend' column family.
    if(array_key_exists($l_key, $mutations))
    {
        array_push($mutations[$l_key]['aquarium_friend'], $l_mutation);
    }
    else
    {
        $mutations[$l_key] = array('aquarium_friend' => array($l_mutation));
    }

    // Send the accumulated batch every 10 records, then start a new one.
    if(!($l_i % 10))
    {
        make_mutation($client, $mutations, $g_loger, $g_rloger);
        $mutations = array();

        // Log progress every 1000 records.
        if(!($l_i % 1000))
        {
            $g_loger->info(sprintf("inserted: %s", $l_i));
        }
    }

> That's why I asked about the frequency. If you're doing a long-term
> stress test and seeing a 30 second pause once per week, that's a lot
> more likely to be "normal" for your workload than if you're seeing it
> happen once every three minutes. The issue is that if you want to fix
> your problem, one must first figure out what the problem *is*. Based
> on past mailing list traffic, it seems most people's problems that are
> seemingly "due to GC" end up being because of a too high live set size
> or the CMS phase triggering too late. These are fixable issues if you
> are running into them.

In my case this happens from time to time. For example, inserting the
whole 5 GB XML file took about 30-40 minutes, and the nodes froze about
5-10 times during that period (average freeze time 15 seconds).

> If all you have is a single column family with a 64 mb flush threshold
> and doing a bunch of insertions, and have a heap size of 5 (was it?)
> gig, you should not be having these issues. But stating that helps no
> one, which is why I'm asking for more information. (Widely
> extrapolating and suggesting that all Cassandra nodes will always
> freeze for 30 seconds every now and then is also helping no one, other
> than not being true.)

Initially the heap was 6 GB. When I increased the heap size to 7 GB, the
nodes froze only once, but that freeze lasted much longer (40 seconds).

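The two causes named in the quoted text (a live set that is too large, and the CMS phase triggering too late) can usually be told apart from a GC log. Below is a minimal sketch of JVM options for that, assuming a HotSpot JVM of that era (Java 6) and a startup script that passes a `JVM_OPTS` variable to `java`. The 75 percent threshold and the log path are illustrative assumptions, not values taken from this thread.

```shell
# Assumption: JVM_OPTS is the variable the node's startup script hands to java.
JVM_OPTS="${JVM_OPTS:-}"

# Use the concurrent (CMS) old-generation collector with ParNew for the young generation.
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"

# Start a CMS cycle once the old generation is 75% full (illustrative value),
# and use only this threshold instead of the JVM's own heuristics.
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# Log each collection and how long application threads were stopped.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"

# Hypothetical log location; adjust to the local layout.
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

export JVM_OPTS
```

With such a log, a "concurrent mode failure" entry followed by a long stop would point at CMS starting too late, while an old generation that stays nearly full even right after a completed CMS cycle would point at the live set being too large for the heap.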