Date: Wed, 7 Jan 2015 12:12:23 +0800
From: Shuai Lin
To: user@hbase.apache.org
Subject: Re: Reply: Region Server OutOfMemory Error

Yeah, I know a heap dump would work, but I'm a little worried about dumping
22GB of data on a production server, since it could take quite a while and
make recovery even slower.

On Wed, Jan 7, 2015 at 10:51 AM, 谢良 wrote:

> Could you retry with "-XX:+HeapDumpOnOutOfMemoryError"?
> The heap dump will make things clear.
> ________________________________________
> From: Shuai Lin
> Sent: January 6, 2015 19:32
> To: user@hbase.apache.org
> Subject: Region Server OutOfMemory Error
>
> Hi all,
>
> We have an HBase cluster of 5 region servers, each hosting 60+ regions.
>
> But under heavy load the region servers crash with an OOME now and then:
>
> #
> # java.lang.OutOfMemoryError: Java heap space
> # -XX:OnOutOfMemoryError="kill -9 %p"
> #   Executing /bin/sh -c "kill -9 16820"...
>
> We have the max heap size set to 22GB (-Xmx22528m) for each RS, and use
> G1GC (-XX:+UseG1GC). To debug the problem we have turned on the JVM GC
> log. The last few lines of the GC log before each crash always look like
> this:
>
> 2015-01-06T11:10:19.087+0000: 5035.720: [Full GC 7122M->5837M(21G), 0.8867660 secs]
>    [Eden: 1024.0K(7278.0M)->0.0B(8139.0M) Survivors: 68.0M->0.0B Heap: 7122.7M(22.0G)->5837.2M(22.0G)]
>  [Times: user=1.42 sys=0.00, real=0.89 secs]
> 2015-01-06T11:10:19.976+0000: 5036.608: [Full GC 5837M->5836M(21G), 0.6378260 secs]
>    [Eden: 0.0B(8139.0M)->0.0B(8139.0M) Survivors: 0.0B->0.0B Heap: 5837.2M(22.0G)->5836.5M(22.0G)]
>  [Times: user=0.93 sys=0.00, real=0.63 secs]
>
> From the last line I see that the heap occupies only 5837MB while the
> capacity is 22GB, so how can the OOM happen? Or is my interpretation of
> the GC log wrong?
>
> I read some articles and only got a basic understanding of G1GC. I've
> tried tools like GCViewer, but none gives me a useful explanation of the
> details of the GC log.
>
> Regards,
> Shuai
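
As a rough check on the reading of the two Full GC lines quoted above, here is a minimal Python sketch that pulls the before/after/capacity figures out of the summary part of each line and computes the occupancy. The regular expression is written only against the two lines shown in this thread and is an assumption about the format, not a general G1 log parser; the helper names (parse_full_gc, to_mb) are made up for illustration.

```python
import re

# Matches the summary part of a "Full GC" line as it appears in the log
# quoted in this thread, e.g. "[Full GC 7122M->5837M(21G), 0.8867660 secs]".
# Assumption: sizes are whole numbers followed by K, M or G.
FULL_GC = re.compile(
    r"\[Full GC (?P<before>\d+)(?P<bu>[KMG])->(?P<after>\d+)(?P<au>[KMG])"
    r"\((?P<cap>\d+)(?P<cu>[KMG])\), (?P<secs>[\d.]+) secs\]"
)

# Conversion factors from each unit suffix to megabytes.
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def to_mb(value, unit):
    """Convert a size such as ("21", "G") to megabytes."""
    return float(value) * UNIT_MB[unit]


def parse_full_gc(line):
    """Return heap figures for one Full GC line, or None if it does not match."""
    m = FULL_GC.search(line)
    if m is None:
        return None
    before = to_mb(m["before"], m["bu"])
    after = to_mb(m["after"], m["au"])
    cap = to_mb(m["cap"], m["cu"])
    return {
        "before_mb": before,
        "after_mb": after,
        "capacity_mb": cap,
        "freed_mb": before - after,
        "occupancy_after": after / cap,
        "pause_secs": float(m["secs"]),
    }


if __name__ == "__main__":
    lines = [
        "2015-01-06T11:10:19.087+0000: 5035.720: [Full GC 7122M->5837M(21G), 0.8867660 secs]",
        "2015-01-06T11:10:19.976+0000: 5036.608: [Full GC 5837M->5836M(21G), 0.6378260 secs]",
    ]
    for line in lines:
        print(parse_full_gc(line))
```

On the two quoted lines this gives 1285 MB and 1 MB freed, with the heap at about 27% of the reported capacity after each collection. Note that the summary prints the capacity as 21G while the detailed Heap field prints 22.0G; the sketch uses the summary figure.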