Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of bhuffman@etinternational.com
 designates 65.222.140.81 as permitted sender)
Message-ID: <5463AB4C.8010703@etinternational.com>
Date: Wed, 12 Nov 2014 13:47:40 -0500
From: "Brian C. Huffman" <bhuffman@etinternational.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: user@hadoop.apache.org
Subject: Re: ext4 on a hadoop cluster datanodes
References: 
 <CAB-gU_s_9fEFQZMukbqe3uRb1cfQZhFt7JG2GpAHMosW=AdSiQ@mail.gmail.com>
 <CAChq9g3BCDXwox49AOzBXUXX2gRR7WwgYeeotO7zCpjdEjfjew@mail.gmail.com>
In-Reply-To: 
 <CAChq9g3BCDXwox49AOzBXUXX2gRR7WwgYeeotO7zCpjdEjfjew@mail.gmail.com>
Content-Type: multipart/alternative;
 boundary="------------040305060108020301050604"

This is a multi-part message in MIME format.
--------------040305060108020301050604
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Would the ext4 parameters quoted below be OK for a 500GB HDFS data drive?

Thanks,
Brian

On 10/06/2014 06:09 PM, Travis wrote:
> For filesystem creation, we use the following with mkfs.ext4
>
> mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L "$HDFS_LABEL" /dev/${DEV}1
>
> By default, mkfs creates far more inodes than HDFS needs, so we tune it
> a bit with the "largefile" option, which raises the inode_ratio (bytes
> of disk per inode). This gives us ~2 million usable inodes on a 2TB
> filesystem.
>
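
Working through that arithmetic for a 500 GB drive, here is a rough Python sketch. It assumes the stock /etc/mke2fs.conf values (inode_ratio 16384 by default, 1048576 for largefile). The function and constant names are my own. Because mke2fs rounds to whole block groups, the real counts will differ a little.

```python
# Approximate number of inodes mkfs.ext4 creates: one inode per
# inode_ratio bytes of filesystem. Assumption: the ratios below are the
# stock /etc/mke2fs.conf values; mke2fs rounds to whole block groups, so
# real counts differ slightly.

DEFAULT_INODE_RATIO = 16_384       # bytes per inode without -T
LARGEFILE_INODE_RATIO = 1_048_576  # bytes per inode with -T largefile


def approx_inodes(fs_bytes: int, inode_ratio: int) -> int:
    """Estimate the inode count for a filesystem of fs_bytes bytes."""
    return fs_bytes // inode_ratio


if __name__ == "__main__":
    for label, size in (("2 TB", 2 * 10**12), ("500 GB", 500 * 10**9)):
        default = approx_inodes(size, DEFAULT_INODE_RATIO)
        largefile = approx_inodes(size, LARGEFILE_INODE_RATIO)
        print(f"{label}: ~{default:,} inodes by default, ~{largefile:,} with largefile")
```

This agrees with Travis's ~2 million on 2 TB and suggests roughly 477,000 inodes on a 500 GB drive. A DataNode stores each HDFS block as a data file plus a .meta checksum file. So the practical question is whether the drive will ever hold more than about half that many blocks.
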
> Also, by default mkfs reserves 5% of the blocks for the root user, which
> wastes a fair amount of space, since only root can use it. We lower
> this to 1% at mkfs time (-m 1), but you can also change it at runtime
> with tune2fs.
>
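
The same kind of estimate works for the reserved-block percentage. It shows how much space -m 5 versus -m 1 holds back on a 500 GB drive, assuming decimal units; the real reserve is rounded to whole blocks. On an existing filesystem, the runtime change would be something like tune2fs -m 1 /dev/sdX1, where /dev/sdX1 is only a placeholder device name.

```python
# Approximate space reserved for root by ext4's reserved-block percentage
# (mkfs.ext4 -m N, or tune2fs -m N later). Assumption: decimal bytes; the
# real reserve is rounded to whole filesystem blocks.

def reserved_bytes(fs_bytes: int, percent: int) -> int:
    """Bytes held back for root at an integer reserved-block percentage."""
    return fs_bytes * percent // 100


if __name__ == "__main__":
    size = 500 * 10**9  # the 500 GB HDFS data drive in question
    for pct in (5, 1):
        print(f"-m {pct}: ~{reserved_bytes(size, pct) / 10**9:.0f} GB reserved")
```

So -m 1 gives back about 20 GB per 500 GB drive, and that saving repeats on every data disk in a DataNode.
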
> I don't know that I would use data=writeback. This mode is problematic
> after a crash because it can leave old data on the FS paired with new
> metadata, which I consider corruption. Unless you know your environment
> to be super stable (meaning no OS- or hardware-induced crashes) AND you
> have stable, UPS-backed power, I would steer clear of it.
>
> If you're looking for the utmost in filesystem performance, you're
> better off looking at the controller card you're using. Right now,
> we're using an LSI 9207-8i and seeing an aggregate 1.6-1.8 GB/s of
> throughput across 12 drives in JBOD. Our older LSI-based cards can
> only sustain maybe a quarter of that with the same disk configuration.
>
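
To put Travis's controller numbers in per-disk terms, the averages work out as follows. This assumes decimal units (MB = 10**6 bytes) and takes "a quarter of that" literally for the older cards. The helper name is my own.

```python
# Average per-drive throughput implied by an aggregate figure across a
# JBOD of N drives. Inputs are the numbers quoted above; MB = 10**6 bytes.

def per_drive_mb_s(aggregate_mb_s: float, drives: int) -> float:
    """Average MB/s per drive if the aggregate is spread evenly."""
    return aggregate_mb_s / drives


if __name__ == "__main__":
    drives = 12
    for label, agg in (("LSI 9207-8i", (1600, 1800)),
                       ("older card, ~1/4 of that", (400, 450))):
        low, high = (per_drive_mb_s(a, drives) for a in agg)
        print(f"{label}: ~{low:.0f}-{high:.0f} MB/s per drive")
```

The same disks reach roughly 133-150 MB/s each on the newer card. The ~33-38 MB/s per drive on the older cards therefore points at the controller rather than the filesystem settings, which is Travis's point.
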
> Travis
>
> On Mon, Oct 6, 2014 at 4:46 PM, Colin Kincaid Williams <discord@uw.edu> wrote:
>
>     Hi,
>
>     I'm trying to figure out better settings for ext4 on Hadoop
>     cluster DataNodes. According to the Hadoop site, the nodelalloc
>     option is recommended in fstab. Is that still the preferred
>     option?
>
>     I read elsewhere that you should disable the ext4 journal and use
>     data=writeback.
>
>     http://fenidik.blogspot.com/2010/03/ext4-disable-journal.html
>
>     Finally, some slides I read suggest using
>     dir_index,sparse_super,extent when creating the filesystem, and
>     mounting with noatime and nodiratime.
>
>     http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud
>
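
On Colin's mount-option questions: whatever ends up in /etc/fstab (noatime, nodelalloc, data=...), it is worth confirming what the kernel actually applied. Below is a small sketch that parses /proc/mounts. It assumes Linux and its usual "device mountpoint fstype options dump pass" line format. The function name is my own.

```python
# List the mount options the kernel applied to each ext4 filesystem, to
# confirm that fstab settings such as noatime or nodelalloc took effect.
# Assumption: Linux /proc/mounts format, one mount per line:
#   device mountpoint fstype options dump pass

def ext4_mount_options(mounts_text: str) -> dict[str, set[str]]:
    """Map each ext4 mount point to the set of options it is mounted with."""
    result: dict[str, set[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "ext4":
            result[fields[1]] = set(fields[3].split(","))
    return result


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            mounts = ext4_mount_options(f.read())
    except OSError:
        mounts = {}  # not on Linux, or /proc is unavailable
    for mountpoint, opts in sorted(mounts.items()):
        print(mountpoint, "noatime" in opts, "nodelalloc" in opts)
```

One related detail from mount(8): noatime applies to all inode types, directories included, so it already implies nodiratime.
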
> -- 
> Travis Campbell
> travis@ghostar.org


--------------040305060108020301050604
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit

<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Would the ext4 parameters quoted below be OK for a 500GB HDFS data drive?<br>
    <br>
    Thanks,<br>
    Brian<br>
    <br>
    <div class="moz-cite-prefix">On 10/06/2014 06:09 PM, Travis wrote:<br>
    </div>
    <blockquote
cite="mid:CAChq9g3BCDXwox49AOzBXUXX2gRR7WwgYeeotO7zCpjdEjfjew@mail.gmail.com"
      type="cite">
      <div dir="ltr">For filesystem creation, we use the following with
        mkfs.ext4
        <div><br>
        </div>
        <code>mkfs.ext4 -T largefile -m 1 -O dir_index,extent,sparse_super -L "$HDFS_LABEL" /dev/${DEV}1</code>
        <div><br>
        </div>
        <div>By default, mkfs creates far more inodes than HDFS needs, so
          we tune it a bit with the "largefile" option, which raises the
          inode_ratio (bytes of disk per inode). This gives us ~2 million
          usable inodes on a 2TB filesystem.</div>
        <div><br>
        </div>
        <div>Also, by default mkfs reserves 5% of the blocks for the root
          user, which wastes a fair amount of space, since only root can
          use it. We lower this to 1% at mkfs time (-m 1), but you can
          also change it at runtime with tune2fs.</div>
        <div><br>
        </div>
        <div>I don't know that I would use data=writeback. This mode is
          problematic after a crash because it can leave old data on the
          FS paired with new metadata, which I consider corruption. Unless
          you know your environment to be super stable (meaning no OS- or
          hardware-induced crashes) AND you have stable, UPS-backed power,
          I would steer clear of it.</div>
        <div><br>
        </div>
        <div>If you're looking for the utmost in filesystem performance,
          you're better off looking at the controller card you're
          using. Right now, we're using an LSI 9207-8i and seeing an
          aggregate 1.6-1.8 GB/s of throughput across 12 drives in
          JBOD. Our older LSI-based cards can only sustain maybe a
          quarter of that with the same disk configuration.</div>
        <div><br>
        </div>
        <div>Travis</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Mon, Oct 6, 2014 at 4:46 PM, Colin
          Kincaid Williams <span dir="ltr">&lt;<a
              moz-do-not-send="true" href="mailto:discord@uw.edu"
              target="_blank">discord@uw.edu</a>&gt;</span> wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">Hi,
              <div><br>
              </div>
              <div>I'm trying to figure out better settings for ext4 on
                Hadoop cluster DataNodes. According to the Hadoop site,
                the nodelalloc option is recommended in fstab. Is that
                still the preferred option?</div>
              <div><br>
              </div>
              <div>I read elsewhere that you should disable the ext4
                journal and use data=writeback.</div>
              <div><br>
              </div>
              <div><a moz-do-not-send="true"
                  href="http://fenidik.blogspot.com/2010/03/ext4-disable-journal.html"
                  target="_blank">http://fenidik.blogspot.com/2010/03/ext4-disable-journal.html</a><br>
              </div>
              <div><br>
              </div>
              <div>Finally, some slides I read suggest using
                dir_index,sparse_super,extent when creating the
                filesystem, and mounting with noatime and nodiratime.</div>
              <div><br>
              </div>
              <div><a moz-do-not-send="true"
href="http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud"
                  target="_blank">http://www.slideshare.net/leonsp/best-practices-for-deploying-hadoop-biginsights-in-the-cloud</a><br>
              </div>
              <div><br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
        -- <br>
        Travis Campbell<br>
        <a moz-do-not-send="true" href="mailto:travis@ghostar.org"
          target="_blank">travis@ghostar.org</a>
      </div>
    </blockquote>
    <br>
  </body>
</html>

--------------040305060108020301050604--