Date: Wed, 5 Jan 2011 11:10:18 -0500
Subject: JVM OOM
From: Wayne
To: user@hbase.apache.org

I am still struggling with the JVM. We just had a hard OOM crash of a region server after running for only 36 hours. Any help would be greatly appreciated. Do we need to restart nodes every 24 hours under load? GC pauses are something we are trying to plan for, but outright OOM crashes are a new problem.

The message below seems to be where it starts going bad. It is followed by no fewer than 63 Concurrent Mode Failure errors over a 16-minute period.

*GC locker: Trying a full collection because scavenge failed*

Lastly, here is the end of the log (after the 63 CMF errors).

Heap
 par new generation   total 1887488K, used 303212K [0x00000005fae00000, 0x000000067ae00000, 0x000000067ae00000)
  eden space 1677824K,  18% used [0x00000005fae00000, 0x000000060d61b078, 0x0000000661480000)
  from space 209664K,   0% used [0x000000066e140000, 0x000000066e140000, 0x000000067ae00000)
  to   space 209664K,   0% used [0x0000000661480000, 0x0000000661480000, 0x000000066e140000)
 concurrent mark-sweep generation total 6291456K, used 2440155K [0x000000067ae00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 31704K, used 18999K [0x00000007fae00000, 0x00000007fccf6000, 0x0000000800000000)

Here again are our custom settings, in case anyone has suggestions. Are we making it worse with these settings? What should we try next?

-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=60
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:NewRatio=3
-XX:MaxTenuringThreshold=1

Thanks!
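As a sanity check on the figures quoted in this message, here is a small sketch (Python, not part of the original post) that recomputes the generation sizes from the heap dump and compares them with the flags. It assumes the usual HotSpot reading of the flags: NewRatio as old:young, SurvivorRatio as eden:one survivor space, and CMSInitiatingOccupancyFraction as a percentage of the old generation. The constant and function names are illustrative only.

```python
# Figures copied from the heap dump in the message (all in KB).
EDEN_KB = 1677824
SURVIVOR_KB = 209664        # size of each of the from/to spaces
PAR_NEW_TOTAL_KB = 1887488  # as reported: eden plus one survivor space
CMS_TOTAL_KB = 6291456      # old (concurrent mark-sweep) generation
CMS_USED_KB = 2440155

# Flags copied from the message.
NEW_RATIO = 3
SURVIVOR_RATIO = 8
CMS_INITIATING_OCCUPANCY_PCT = 60


def summarize():
    """Derive sizes and ratios from the logged figures (assumed semantics)."""
    # The young generation holds eden and both survivor spaces.
    young_kb = EDEN_KB + 2 * SURVIVOR_KB
    return {
        "young_kb": young_kb,
        "heap_gib": (young_kb + CMS_TOTAL_KB) / (1024 * 1024),
        "old_to_young": CMS_TOTAL_KB / young_kb,
        "eden_to_survivor": EDEN_KB / SURVIVOR_KB,
        "cms_used_pct": 100.0 * CMS_USED_KB / CMS_TOTAL_KB,
        "cms_trigger_kb": CMS_TOTAL_KB * CMS_INITIATING_OCCUPANCY_PCT / 100.0,
    }


if __name__ == "__main__":
    s = summarize()
    print("young generation (KB):", s["young_kb"])
    print("young + old heap (GiB):", s["heap_gib"])
    print("old:young ratio:", s["old_to_young"], "vs NewRatio =", NEW_RATIO)
    print("eden:survivor ratio:", round(s["eden_to_survivor"], 2),
          "vs SurvivorRatio =", SURVIVOR_RATIO)
    print("old gen used (%):", round(s["cms_used_pct"], 1),
          "vs initiating occupancy =", CMS_INITIATING_OCCUPANCY_PCT)
    print("old gen KB at which CMS would start:", s["cms_trigger_kb"])
```

The sketch only confirms that the logged sizes agree with the flags and shows where the old generation stood in this one final snapshot. It does not by itself explain the Concurrent Mode Failure errors that came before it.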