Date: Thu, 31 Jan 2013 04:09:13 +0000 (UTC)
From: "sunjian (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-5205) The first three Cassandra node is very busy , GC pause the world (Real production Env. Exp.)

[ https://issues.apache.org/jira/browse/CASSANDRA-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sunjian updated CASSANDRA-5205:
-------------------------------

    Attachment: the-trouble-maker-node.jpg

One of the trouble-maker nodes, which keeps freezing as its JVM heap grows.

> The first three Cassandra node is very busy , GC pause the world (Real production Env. Exp.)
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5205
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5205
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.5
>         Environment: cassandra 1.1.5 release
> centos 5.5
> jdk1.7u9
> VMware ESXi-based VM: 30GB RAM, 4*4-core CPU
> Hardware: Dell R720, 2*6-core CPU, 128GB RAM, made 3 nodes as above
> Data hosted by each node: about 8GB
>            Reporter: sunjian
>            Priority: Minor
>             Fix For: 1.1.10
>
>         Attachments: the-normal-free-node-no-presure.jpg, the-trouble-maker-node.jpg
>
>
> Hi all,
> I previously had 10 nodes, all CentOS VMs with 16GB RAM and 8-core CPUs, running Cassandra 1.1.5 with only one user keyspace (RF=3). Heap (old: 8GB, new: 2GB).
> Problems:
> 1. The first three nodes (starting from token 0) are very busy all the time, while the remaining 7 nodes seem to have nothing to do; their CPU and RAM are mostly free.
> 2. On all of the first three nodes, JVM memory use grows rapidly, and CMS GC fires nearly every second.
> 3. When GC happens, the world seems to stop. Running nodetool on one of the first three nodes hangs; running it on any of the other 7 nodes shows the first three nodes as down.
> 4. When GC finishes, the node comes back, but it goes down again a few minutes later.
> 5. If I kill the Java process and reboot the frozen node, it comes up within minutes, but the JVM heap fills up again within minutes as well, and everything above repeats.
> 6. Even if only one of the first three nodes freezes, client requests fail, although my clients use CL=QUORUM (with the Hector client library).
> 7. Disabling the Thrift API on the three nodes changed nothing.
> ----------changes------------
> 0. Stopped incoming user requests (stopped our user service so Cassandra was idle).
> 1. Decommissioned 4 nodes (one at a time).
> 2. Moved tokens to balance the remaining 6 nodes (one at a time).
> 3. Changed the remaining 6 nodes' resources to 30GB RAM and 16-core CPUs, heap (old: 16GB, new: 4GB).
> 4. Enabled JNA.
> 5. Ran a major compaction and then a repair on the 6 nodes.
> 6. Started the new cluster.
> 7. Everything seemed OK at first, but after 5 hours all of the problems came back.
> 8. Because we now have double the RAM, the freeze-and-recover cycle now repeats hourly.
> Some screenshots are attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
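For step 2 of the change list (moving tokens to balance the remaining 6 nodes), evenly spaced tokens are the usual way to balance a ring without virtual nodes. The sketch below assumes the cluster used the RandomPartitioner (the 1.1 default, tokens in the range 0 to 2**127); the report does not name the partitioner, and `balanced_tokens` is a hypothetical helper, not a Cassandra API. Each node would then be moved to its token with `nodetool move <token>`, one node at a time.

```python
from __future__ import annotations

# Assumption: RandomPartitioner, whose tokens lie in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial tokens for node_count nodes, starting at 0.

    Hypothetical helper for illustration; not part of Cassandra.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    # Python integers are arbitrary precision, so the 128-bit values stay exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Six remaining nodes, as in step 2 of the change list.
    for index, token in enumerate(balanced_tokens(6)):
        print(f"node {index}: {token}")
```

Integer floor division is used instead of floating point because a float cannot represent 2**127 / 6 exactly.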
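Problems 1 and 6 involve how keys map to replicas and how many replicas a QUORUM request waits for. The sketch below assumes SimpleStrategy placement in a single data center (the report states RF=3 but not the replication strategy): a key belongs to the first node whose token is greater than or equal to the key's token, and the next RF-1 nodes clockwise hold the other replicas. `replicas_for` and `quorum` are hypothetical helpers for illustration only. With RF=3, `quorum(3)` is 2, so a QUORUM request needs two of the three replicas for that key to respond.

```python
from __future__ import annotations

import bisect


def replicas_for(key_token: int, ring_tokens: list[int], rf: int) -> list[int]:
    """Tokens of the nodes holding a key, assuming SimpleStrategy placement.

    The primary replica is the first node whose token is >= key_token,
    wrapping to the lowest token past the end of the ring; the other
    rf - 1 replicas are the following nodes clockwise.
    """
    ring = sorted(ring_tokens)
    if not ring:
        raise ValueError("ring must contain at least one node")
    start = bisect.bisect_left(ring, key_token) % len(ring)
    count = min(rf, len(ring))  # cannot place more replicas than nodes
    return [ring[(start + i) % len(ring)] for i in range(count)]


def quorum(rf: int) -> int:
    """Replicas that must respond at CL=QUORUM: floor(rf / 2) + 1."""
    return rf // 2 + 1


if __name__ == "__main__":
    ring = [0, 10, 20, 30, 40]
    print(replicas_for(5, ring, 3))   # [10, 20, 30]
    print(replicas_for(45, ring, 3))  # [0, 10, 20] (wraps around the ring)
    print(quorum(3))                  # 2
```

On this toy ring, any key whose token falls between two node tokens lands on the same three consecutive nodes, which is how load on one token range concentrates on a fixed replica set.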