hadoop-general mailing list archives

From Andy Isaacson <...@cloudera.com>
Subject Re: one or more file system
Date Tue, 16 Oct 2012 23:45:15 GMT
RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
other problems). Read this paper for details:

"Disks are like Snowflakes: No Two Are Alike"

For best performance configure your storage as JBOD instead of RAID,
format each spindle as a separate ext4 filesystem, and put a datadir
on each spindle.
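
As a rough sketch of what "a datadir on each spindle" can look like in
practice, the snippet below builds the comma-separated value for the DataNode
storage property (dfs.data.dir in Hadoop 1.x, dfs.datanode.data.dir in 2.x).
The mount points /data1 ... /data12 are borrowed from the quoted message; the
"dfs/dn" subdirectory and the helper name are assumptions made for
illustration only.

```python
def data_dir_property(mount_points, subdir="dfs/dn"):
    """Build a comma-separated datadir list, one entry per mount point.

    Assumes each mount point is a separate filesystem on its own spindle.
    The "dfs/dn" subdirectory is a hypothetical choice, not a requirement.
    """
    dirs = [f"{m.rstrip('/')}/{subdir}" for m in mount_points]
    return ",".join(dirs)


# Twelve mount points, /data1 through /data12, as in the quoted message.
mounts = [f"/data{i}" for i in range(1, 13)]
print(data_dir_property(mounts))
```

The printed string would go in the value of that property in hdfs-site.xml.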

Your disk array will have a configuration utility to set JBOD instead
of RAID. Please consult the documentation for your disk array for the

If you must use RAID5 then one filesystem and one datadir is your best option.

For *BAD* performance, put multiple logical volumes on a single RAID
and put multiple datadirs on the RAID. This will result in low IOPS,
low throughput, and high contention.
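
One partial sanity check, sketched under assumptions: the helper below groups
datadirs by the device ID that os.stat reports and returns any group with more
than one directory, i.e. datadirs that share a filesystem. It cannot see
through a RAID controller, so several logical volumes carved from one RAID
still look like distinct devices; the function names are made up for this
example.

```python
import os
from collections import defaultdict


def device_of(path):
    """Return the device ID of the filesystem that holds `path`."""
    return os.stat(path).st_dev


def shared_filesystems(data_dirs, device_fn=device_of):
    """Return {device_id: [dirs]} for devices holding more than one datadir.

    An empty result means every datadir is on its own filesystem. It does
    NOT prove each filesystem is on its own physical spindle.
    """
    groups = defaultdict(list)
    for d in data_dirs:
        groups[device_fn(d)].append(d)
    return {dev: dirs for dev, dirs in groups.items() if len(dirs) > 1}
```

Calling shared_filesystems() with the configured datadir paths on a DataNode
host should return an empty dict when the one-datadir-per-filesystem layout is
in place.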


On Tue, Oct 9, 2012 at 2:13 AM, Xiang Hua <beatls@gmail.com> wrote:
> Hi,
>    But how do we "configure the disk array as JBOD"? We plan to use a disk
> array with RAID5 and create LUNs of 1T each.
>   So we have many 1T LUNs, and we run mkfs on every LUN, so we
> have 12 filesystems, /data1 ... /data12, which will be put into HDFS.
> Best R.
> beatls
> On Tue, Oct 9, 2012 at 1:45 AM, Andy Isaacson <adi@cloudera.com> wrote:
>> On Mon, Oct 8, 2012 at 8:30 AM, Xiang Hua <beatls@gmail.com> wrote:
>> > Hi,
>> >    We have 4T of disk from a disk array.
>> >    I want to split 2T*1 into 1T*2, then add them to HDFS, which leads to more
>> > local storage directories.
>> >    This time we have 12 local directories (1T each). Is it harmful to HDFS
>> > performance?
>> Assuming you're running a modern Hadoop on a recent Linux (2.6.38 or
>> later, or RHEL6):
>> For best performance you should configure your disk array as JBOD
>> rather than RAID, then put one ext4 filesystem on each spindle. Do not
>> put multiple storage directories on a single spindle; that results in
>> very bad performance and no benefit over a single storage directory
>> per spindle. And do not put multiple spindles under a single storage
>> directory; that results in poor utilization and bad performance with
>> no significant benefit.
>> 12 local storage directories will perform just fine assuming you have
>> enough CPU power to use them.
>> -andy
