From: Sandy Pratt
To: "user@hbase.apache.org"
Date: Mon, 16 Jul 2012 15:55:03 -0700
Subject: RE: Hmaster and HRegionServer disappearance reason to ask

This sounds similar to something I've seen before, but in that case I found the winning GC arguments to be something like

-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:MaxDirectMemorySize=128m

(note the old gen parallel compacting collector rather than the ParNew collector, which IIRC is used with concurrent GC by default)

I don't recall the MaxDirectMemorySize on its own preventing massive off-heap memory allocations from piling up.

Just my 2 cents, YMMV.

Sandy

-----Original Message-----
From: Laxman [mailto:lakshman.ch@huawei.com]
Sent: Wednesday, July 11, 2012 9:22 PM
To: 'Pablo Musa'; user@hbase.apache.org
Subject: RE: Hmaster and HRegionServer disappearance reason to ask

> > 1) Fix the direct memory usage to a fixed value -XX:MaxDirectMemorySize=1G
>
> Should this flag be set on the RS or the DN?

We need to apply it to both, but the limit can be increased based on your load (maybe 2G).
We can also apply it to all processes that show the following symptoms:
1) Allocated heap is a few GB (4 to 8)
2) VIRT/RES occupies double the heap (like 15GB) or even more
3) Long pauses in the GC log (even though the allocated heap is just <=8GB)
4) Your application makes a lot of NIO/RMI calls (e.g. DataNode, RegionServer)

In our cluster we apply it to all server processes (NN, DN, HM, RS, JT, TT, ZooKeeper).
The long pauses disappeared after we set this flag (esp. for DN and RS).
--
Regards,
Laxman
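
For anyone wanting to check symptoms 2 and 4 from inside a process rather than from top, here is a minimal sketch, not something from this thread. It assumes Java 7 or later, where the standard BufferPoolMXBean exposes the NIO "direct" buffer pool. The class name DirectMemoryCheck and the "resident at least double the heap" test are made up for illustration; the factor of two is only the rule of thumb from symptom 2, not a measured threshold.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper, named here for illustration only.
class DirectMemoryCheck {

    /** Bytes currently held by NIO direct buffers in this JVM, or -1 if no "direct" pool is reported. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    /** Assumed rule of thumb from symptom 2: resident size is at least double the max heap. */
    static boolean residentLooksInflated(long residentBytes, long maxHeapBytes) {
        return residentBytes >= 2 * maxHeapBytes;
    }

    public static void main(String[] args) {
        // Hold one direct buffer so the pool has something to report.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        System.out.println("direct buffer capacity held: " + buf.capacity());
        System.out.println("direct bytes used: " + directBytesUsed());

        long gb = 1L << 30;
        // Example figures only: a 6 GB heap with 15 GB resident.
        System.out.println("inflated: " + residentLooksInflated(15 * gb, 6 * gb));
    }
}
```

The resident size itself has to come from the OS (the RES column in top), since the JVM does not report it through this bean. If the direct pool figure stays small while RES keeps growing, the off-heap growth is probably coming from somewhere other than NIO direct buffers, and MaxDirectMemorySize alone may not help, which would match Sandy's experience.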