From: Chris Baron
To: "user@cassandra.apache.org"
Date: Tue, 15 Feb 2011 23:25:32 -0500
Subject: Hinted Handoff/GC Tuning Headache

Recently upgraded my 8 node cluster from 0.6.6 to 0.7.0 (even more
recently 0.7.1) for ExpiringColumn, among the many other spectacular improvements.

Retuned the GC settings based on experience from 0.6.6 and the new defaults.

After about a week, two of the nodes were very far behind on minor compactions (2k+ SSTables per CF and growing, 20k+ pending compactions). The SSTable switch rate on these two nodes was about 10x higher than on the other nodes. I also observed rolling long pause deaths (Gossip saying node X is dead); seemingly every three minutes one of the nodes would hit a long GC pause. I saw this behavior also when I upgraded from 0.6.6 to 0.6.8, but I rolled back to 0.6.6 because time did not allow for a deeper observation at that time. (found this: https://issues.apache.org/jira/browse/CASSANDRA-1656)

I eventually traced this behavior back to a nasty interaction between Hinted Handoff and GC tuned for normal operating conditions.

If I understand the code correctly, when a node replays a hint it reads the hinted data directly from the application tables (read: my ColumnFamily). If the replaying node happens to also be a replica, it will resend the entire row, even if only one column was mutated. Because of the rolling GC pause deaths the HHs rarely succeeded, and if they did it wasn't long before a new set of hints was recorded.

Disabling Hinted Handoffs has fixed this problem, for me.

Looking into the intermittent GC issues further, the verbose gc log showed ParNew promotion failures, so I conservatively lowered CMSInitiatingOccupancyFraction, MAX_NEWSIZE, and in_memory_compaction_limit_in_mb. I'm now seeing long CMS times (8000ms+) but no failures, which leads me to believe a 6G heap may be too large for the current tuning.
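
For anyone who wants to check their own verbose gc log for the same symptoms, here is a minimal sketch. It assumes a HotSpot log written with -XX:+PrintGCDetails, where failed young collections are tagged "promotion failed" and each event ends with a "real=N secs" figure. The function name and the 1 second threshold are made up for illustration, and log layouts vary by JVM version, so treat the patterns as assumptions to adjust.

```python
import re

# Markers HotSpot prints (with -XX:+PrintGCDetails) when CMS is in trouble.
FAILURE_MARKERS = ("promotion failed", "concurrent mode failure")

# Matches the wall-clock figure at the end of a GC event, e.g. "real=8.21 secs".
REAL_TIME = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def summarize_gc_log(lines, pause_threshold_secs=1.0):
    """Count failure markers and collect long wall-clock times.

    Hypothetical helper: the name and threshold are illustrative only.
    """
    counts = {marker: 0 for marker in FAILURE_MARKERS}
    long_pauses = []
    for line in lines:
        for marker in FAILURE_MARKERS:
            if marker in line:
                counts[marker] += 1
        # Concurrent CMS phases also report real=..., but they run alongside
        # the application, so they are skipped when looking for pauses.
        if "CMS-concurrent" in line:
            continue
        match = REAL_TIME.search(line)
        if match and float(match.group(1)) >= pause_threshold_secs:
            long_pauses.append(float(match.group(1)))
    return counts, long_pauses
```

Usage would be something like `summarize_gc_log(open("gc.log"))`, where gc.log is whatever file -Xloggc points at. A steady count of "promotion failed" alongside multi-second entries is the pattern described above.
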

It's worth noting that I saw no increase in ColumnFamily WriteCount or StorageProxy.WriteOperations, only ColumnFamily MemtableColumnsCount and MemtableDataSize were increasing very rapidly on the target node while HintedHandoffs were replaying.

-- Chris