From: Henry Hung
To: "user@hbase.apache.org"
Date: Fri, 25 Apr 2014 14:07:13 +0800
Subject: suggestion for how eliminate memory problem in heavy-write hbase region server

Dear All,

My current HBase environment is a write-heavy cluster with a constant 2000+ row inserts per second spread across 10 region servers.
Each day I also need to delete data, which adds a lot of I/O to the cluster.

The problem is that sometimes, after a week, one of the region servers crashes because of a GC event like this:

2014-04-10T10:17:47.200+0800: 1281486.956: [GC 1281486.956: [ParNew (promotion failed): 235959K->235959K(235968K), 0.0836790 secs]1281487.040: [CMS2014-04-10T10:21:14.957+0800: 1281694.712: [CMS-concurrent-sweep: 267.111/279.155 secs] [Times: user=334.79 sys=14.38, real=279.11 secs]
 (concurrent mode failure): 13961950K->6802914K(16515072K), 209.9436660 secs] 14186496K->6802914K(16751040K), [CMS Perm : 42864K->42859K(71816K)], 210.0274680 secs] [Times: user=210.18 sys=0.01, real=209.99 secs]

When I look into the GC log, I usually find CMS concurrent sweeps that took a very long time to complete, such as:

2014-04-10T10:15:56.929+0800: 1281376.684: [CMS-concurrent-sweep: 48.834/58.027 secs] [Times: user=101.52 sys=11.82, real=58.02 secs]

I have done a lot of googling and have already read Todd Lipcon's posts on avoiding full GC, as well as other blogs that suggest setting JVM flags such as these:

-XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70
-Xmn256m
-Xmx16384m
-XX:+DisableExplicitGC
-XX:+UseCompressedOops
-XX:PermSize=160m
-XX:MaxPermSize=160m
-XX:GCTimeRatio=19
-XX:SoftRefLRUPolicyMSPerMB=0
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=1
-XX:+UseFastAccessorMethods
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSClassUnloadingEnabled
-XX:CMSMaxAbortablePrecleanTime=300
-XX:+CMSScavengeBeforeRemark

But alas, the problem still exists.

I also know that Java 1.7 has a new collector, G1GC, that can probably be used to fix this problem, but I don't know whether HBase 0.96 is ready to use it.

I would really appreciate it if someone could share one or two things about JVM configuration that would make the region servers more stable.

Best regards,
Henry
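
For anyone who wants to run the same check on their own GC logs, here is a minimal sketch of how the log can be scanned for the two symptoms described above: long CMS concurrent sweeps, and the "promotion failed" / "concurrent mode failure" markers. It is only an illustration. The function name `scan_gc_log` and the 30-second threshold are my own assumptions, and the pattern is based solely on the log lines quoted in this message, where the second number in the sweep entry lines up with the `real=` time.

```python
import re

# Matches entries such as "[CMS-concurrent-sweep: 48.834/58.027 secs]".
# In the quoted samples the second number corresponds to the wall-clock time.
SWEEP_RE = re.compile(r"\[CMS-concurrent-sweep: (\d+\.\d+)/(\d+\.\d+) secs\]")

# Failure markers exactly as they appear in the quoted log.
FAILURE_MARKERS = ("promotion failed", "concurrent mode failure")


def scan_gc_log(lines, sweep_threshold_secs=30.0):
    """Scan an iterable of GC log lines.

    Returns a pair (long_sweeps, failures):
      long_sweeps: (line_number, wall_secs) for each sweep at or above the
                   threshold (the 30 s default is an arbitrary assumption).
      failures:    (line_number, marker) for each failure marker found.
    """
    long_sweeps = []
    failures = []
    for number, line in enumerate(lines, start=1):
        for match in SWEEP_RE.finditer(line):
            wall_secs = float(match.group(2))
            if wall_secs >= sweep_threshold_secs:
                long_sweeps.append((number, wall_secs))
        for marker in FAILURE_MARKERS:
            if marker in line:
                failures.append((number, marker))
    return long_sweeps, failures


if __name__ == "__main__":
    sample = [
        "2014-04-10T10:15:56.929+0800: 1281376.684: "
        "[CMS-concurrent-sweep: 48.834/58.027 secs] "
        "[Times: user=101.52 sys=11.82, real=58.02 secs]",
        "2014-04-10T10:17:47.200+0800: 1281486.956: [GC 1281486.956: "
        "[ParNew (promotion failed): 235959K->235959K(235968K), 0.0836790 secs]",
    ]
    sweeps, fails = scan_gc_log(sample)
    print("long sweeps:", sweeps)
    print("failures:", fails)
```

To use it on a real log, pass an open file object instead of the sample list, for example `scan_gc_log(open("gc.log"))`, where `gc.log` stands for whatever path the region server writes its GC log to.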