Subject: Re: JVM OPTS about HDFS
From: Gurmukh Singh
To: gu.yizhou@zte.com.cn
Cc: user@hadoop.apache.org
Date: Sat, 19 Aug 2017 01:42:48 +1000

400 GB of heap space for the Namenode is far too high. GC pause times at that heap size would be very long.

For a cluster with about 6 PB of data, roughly 20 GB of Namenode heap is adequate.
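A heap in that range would be set via HADOOP_NAMENODE_OPTS in hadoop-env.sh. A minimal sketch, assuming a ~20 GB heap with CMS (the specific sizes and flag values are illustrative, not tuned recommendations):

```shell
# hadoop-env.sh (sketch only -- sizes and GC flags are assumptions, tune per cluster)
export HADOOP_NAMENODE_OPTS="-Xms20g -Xmx20g \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  ${HADOOP_NAMENODE_OPTS}"
```

Setting Xms equal to Xmx avoids heap resizing pauses on a long-lived daemon like the Namenode.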

As you mentioned the cluster is HA, it is safe to assume that the fsimage is checkpointed at regular intervals, so on a manual Namenode restart we do not need to worry much about the memory required to replay edits into the fsimage. It is still good to account for that as a delta, but nowhere near 400 GB.
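Before a planned restart, a checkpoint can also be forced by hand so the fsimage is current and the edit-replay delta is small. An ops sketch only (requires HDFS superuser privileges; safe mode briefly blocks writes):

```shell
# Force a checkpoint so edits are merged into a fresh fsimage before restarting.
hdfs dfsadmin -safemode enter     # stop namespace modifications
hdfs dfsadmin -saveNamespace      # write a new fsimage from the current namespace
hdfs dfsadmin -safemode leave     # resume normal operation
```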


A good way to estimate:

Some of my tests:

Writing about 2 TB of data to HDFS with block size = 128 MB and replication 3 creates about 18k blocks (18,051).

Memory needed for those blocks:

hdfs oiv -p XML -printToScreen -i /mnt/namenode/current/fsimage_0000000000000051228 \
  | egrep "block|inode" | wc -l \
  | awk '{printf "Objects=%d : Suggested Xms=%dm Xmx=%dm\n", $1, ($1 / 1000000) * 1024, ($1 / 1000000) * 1024}'

Objects=18051 : Suggested Xms=18m Xmx=18m
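The rule of thumb inside that awk one-liner, roughly 1 GB (1024 MB) of heap per million Namenode objects, can be wrapped in a small function. This is the heuristic used above, not an official formula:

```shell
# ~1024 MB of Namenode heap per 1,000,000 objects (blocks + inodes).
# Heuristic sketch only, matching the one-liner above.
suggest_heap_mb() {
  awk -v n="$1" 'BEGIN { printf "%d\n", (n / 1000000) * 1024 }'
}

suggest_heap_mb 18051       # prints 18, matching "Suggested Xms=18m" above
suggest_heap_mb 131072000   # unique block count from the 2000-node example below
```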

 

Maths for Cluster:
----------------

150 bytes of Namenode memory per object (an object is a block, file, or directory)

24 TB x 2000 nodes = 48000 TB

Block size = 128 MB

Total blocks = 48,000 TB / 128 MB = 393,216,000 blocks

Adjusting for the replication factor, which is 3 by default: each replicated block takes only about 16 bytes of Namenode memory.

393,216,000 / 3 = 131,072,000 unique blocks

(131,072,000 x 150 bytes) + (131,072,000 x 16 bytes) = 19,660,800,000 + 2,097,152,000 = 21,757,952,000 bytes, i.e. about 20.26 GB

In addition to this, memory is needed for namespace metadata: each file name is also accounted for with 150 bytes of Namenode memory.
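The cluster arithmetic above can be reproduced in a few lines of awk (all numbers copied from the worked example; 150 and 16 bytes are the per-object and per-replicated-block figures used there):

```shell
# Recompute the 2000-node example: 24 TB x 2000 nodes, 128 MB blocks, replication 3.
awk 'BEGIN {
  tb     = 24 * 2000                   # total raw capacity: 48000 TB
  blocks = tb * 1024 * 1024 / 128      # raw blocks: 393216000
  unique = blocks / 3                  # unique blocks after replication
  bytes  = unique * 150 + unique * 16  # 150 B per object + 16 B per replicated block
  printf "unique blocks = %d\n", unique             # 131072000
  printf "heap estimate = %.2f GB\n", bytes / (1024 * 1024 * 1024)
}'
```

The sum works out to about 20.26 GB, consistent with the hand calculation above.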




On 18/8/17 3:19 pm, gu.yizhou@zte.com.cn wrote:

Hi All,

    HDFS Federation with PB+ of data at rest (a single Name Service is HA, based on QJM), Apache Hadoop 2.7.3 on Red Hat 6.5 with JDK 1.7.


    1. We plan to deploy the NN on a server with 32 cores and 512 GB RAM; any advice about JVM opts? If we set the heap size to about 400 GB with the CMS GC collector, are there any obvious problems?


    2. If there are many groups of Name Services, is it more efficient for some Name Services to share one group of JournalNodes? Any advice on JNs?


    3. Any words on Federation are welcome, thanks!


Thanks in advance,

Doris




