Subject: Re: JVM OPTS about HDFS
From: Gurmukh Singh
To: gu.yizhou@zte.com.cn
Cc: user@hadoop.apache.org
Date: Sat, 19 Aug 2017 01:42:48 +1000

400 GB of heap space for the Namenode is far too high. GC pause times at that heap size would be very long.

For a cluster with about 6 PB of data, roughly 20 GB of Namenode heap is adequate.
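A heap in that range would be set via HADOOP_NAMENODE_OPTS in hadoop-env.sh. A minimal sketch, assuming a ~20 GB heap with CMS (the specific sizes and flag values are illustrative, not tuned recommendations):

```shell
# hadoop-env.sh (sketch only -- sizes and GC flags are assumptions, tune per cluster)
export HADOOP_NAMENODE_OPTS="-Xms20g -Xmx20g \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  ${HADOOP_NAMENODE_OPTS}"
```

Setting Xms equal to Xmx avoids heap resizing pauses on a long-lived daemon like the Namenode.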

As you mentioned the cluster is HA, it is safe to assume that the fsimage is checkpointed at regular intervals, so on a manual Namenode restart we do not need to worry much about the memory required to replay edits into the fsimage. It is still good to account for that as a delta, but nowhere near 400 GB.
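Before a planned restart, a checkpoint can also be forced by hand so the fsimage is current and the edit-replay delta is small. An ops sketch only (requires HDFS superuser privileges; safe mode briefly blocks writes):

```shell
# Force a checkpoint so edits are merged into a fresh fsimage before restarting.
hdfs dfsadmin -safemode enter     # stop namespace modifications
hdfs dfsadmin -saveNamespace      # write a new fsimage from the current namespace
hdfs dfsadmin -safemode leave     # resume normal operation
```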


A good way to estimate:

Some of my tests:

Writing about 2 TB of data to HDFS with block size = 128 MB and replication 3 creates about 18k blocks (18,051).

Memory needed for those blocks:

hdfs oiv -p XML -printToScreen -i /mnt/namenode/current/fsimage_0000000000000051228 \
  | egrep "block|inode" | wc -l \
  | awk '{printf "Objects=%d : Suggested Xms=%dm Xmx=%dm\n", $1, ($1 / 1000000) * 1024, ($1 / 1000000) * 1024}'

Objects=18051 : Suggested Xms=18m Xmx=18m
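The rule of thumb inside that awk one-liner, roughly 1 GB (1024 MB) of heap per million Namenode objects, can be wrapped in a small function. This is the heuristic used above, not an official formula:

```shell
# ~1024 MB of Namenode heap per 1,000,000 objects (blocks + inodes).
# Heuristic sketch only, matching the one-liner above.
suggest_heap_mb() {
  awk -v n="$1" 'BEGIN { printf "%d\n", (n / 1000000) * 1024 }'
}

suggest_heap_mb 18051       # prints 18, matching "Suggested Xms=18m" above
suggest_heap_mb 131072000   # unique block count from the 2000-node example below
```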

 

Maths for Cluster:
----------------

150 bytes of Namenode memory per object (an object is a block, file, or directory)

24 TB x 2000 nodes = 48000 TB

Block size = 128 MB

Total blocks = 48,000 TB / 128 MB = 393,216,000 blocks

Adjusting for the replication factor, which is 3 by default: each replicated block takes only about 16 bytes of Namenode memory.

393,216,000 / 3 = 131,072,000 unique blocks

(131,072,000 x 150 bytes) + (131,072,000 x 16 bytes) = 19,660,800,000 + 2,097,152,000 = 21,757,952,000 bytes, i.e. about 20.26 GB

In addition to this, memory is needed for namespace metadata: each file name is also accounted for with 150 bytes of Namenode memory.
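The cluster arithmetic above can be reproduced in a few lines of awk (all numbers copied from the worked example; 150 and 16 bytes are the per-object and per-replicated-block figures used there):

```shell
# Recompute the 2000-node example: 24 TB x 2000 nodes, 128 MB blocks, replication 3.
awk 'BEGIN {
  tb     = 24 * 2000                   # total raw capacity: 48000 TB
  blocks = tb * 1024 * 1024 / 128      # raw blocks: 393216000
  unique = blocks / 3                  # unique blocks after replication
  bytes  = unique * 150 + unique * 16  # 150 B per object + 16 B per replicated block
  printf "unique blocks = %d\n", unique             # 131072000
  printf "heap estimate = %.2f GB\n", bytes / (1024 * 1024 * 1024)
}'
```

The sum works out to about 20.26 GB, consistent with the hand calculation above.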




On 18/8/17 3:19 pm, gu.yizhou@zte.com.cn wrote:

Hi All,

    HDFS Federation with PB+ of data at rest (a single Name Service is HA, based on QJM), Apache Hadoop 2.7.3 on Red Hat 6.5 with JDK 1.7.


    1. We plan to deploy the NN on a server with 32 cores and 512 GB RAM; any advice about JVM opts? If we set the heap size to about 400 GB with the CMS GC collector, are there any obvious problems?


    2. If there are many groups of Name Services, is it more efficient for some Name Services to share one group of JournalNodes? Any advice on JNs?


    3. Any words on Federation are welcome, thanks!


Thanks in advance,

Doris




