hadoop-general mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: one or more file system
Date Tue, 16 Oct 2012 23:55:15 GMT
Can you guys pls move this discussion to user@? Thanks.

On Oct 16, 2012, at 4:45 PM, Andy Isaacson wrote:

> RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
> other problems). Read this paper for details:
> 
> "Disks are like Snowflakes: No Two Are Alike"
> www.usenix.org/event/hotos11/tech/final_files/Krevat.pdf
> 
> For best performance configure your storage as JBOD instead of RAID,
> format each spindle as a separate ext4 filesystem, and put a datadir
> on each spindle.
> 
> Your disk array will have a configuration utility to set JBOD instead
> of RAID. Please consult the documentation for your disk array for the
> details.
> 
> If you must use RAID5 then one filesystem and one datadir is your best option.
> 
> For *BAD* performance, put multiple logical volumes on a single RAID
> and put multiple datadirs on the RAID. This will result in low IOPS,
> low throughput, and high contention.
> 
> -andy
> 
> On Tue, Oct 9, 2012 at 2:13 AM, Xiang Hua <beatls@gmail.com> wrote:
>> Hi,
>>   But how do we "configure the disk array as JBOD"? We plan to use a disk
>> array with RAID5 and make LUNs of 1T each.
>>  So we will have many 1T LUNs, and we run mkfs on every LUN, so we
>> have 12 filesystems, /data1 ... /data12, which will be given to HDFS.
>> 
>> 
>> Best R.
>> 
>> beatls
>> 
>> On Tue, Oct 9, 2012 at 1:45 AM, Andy Isaacson <adi@cloudera.com> wrote:
>> 
>>> On Mon, Oct 8, 2012 at 8:30 AM, Xiang Hua <beatls@gmail.com> wrote:
>>>> Hi,
>>>>   We have 4T of disk from a disk array.
>>>>   I want to split each 2T volume into two 1T volumes and then add them
>>>> to HDFS, which leads to more local storage directories.
>>>>   In this case we have 12 local directories (1T each). Is it harmful to
>>>> HDFS performance?
>>> 
>>> Assuming you're running a modern Hadoop on a recent Linux (2.6.38 or
>>> later, or RHEL6):
>>> 
>>> For best performance you should configure your disk array as JBOD
>>> rather than RAID, then put one ext4 filesystem on each spindle. Do not
>>> put multiple storage directories on a single spindle; that results in
>>> very bad performance and no benefit over a single storage directory
>>> per spindle. And do not put multiple spindles under a single storage
>>> directory; that results in poor utilization and bad performance with
>>> no significant benefit.
>>> 
>>> 12 local storage directories will perform just fine assuming you have
>>> enough CPU power to use them.
>>> 
>>> -andy
>>> 
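
The layout advice quoted above (one filesystem and one datadir per spindle) can be sanity-checked with a small script. What follows is only a rough sketch under stated assumptions, not anything from the thread: the helper names and the "dfs/dn" subdirectory are made up for illustration. The property that takes the resulting list is dfs.data.dir on Hadoop 1.x and dfs.datanode.data.dir on 2.x. Note that st_dev identifies a filesystem, not a physical spindle, so this catches two datadirs on the same filesystem but cannot see several LUNs carved out of one RAID set.

```python
import os
import posixpath
from collections import defaultdict


def datadirs_by_device(datadirs):
    """Group data directories by the device ID of the filesystem they live on."""
    groups = defaultdict(list)
    for d in datadirs:
        groups[os.stat(d).st_dev].append(d)
    return dict(groups)


def shared_devices(datadirs):
    """Return the groups of datadirs that share one filesystem (the layout to avoid)."""
    return [dirs for dirs in datadirs_by_device(datadirs).values() if len(dirs) > 1]


def datadir_property(mounts, subdir="dfs/dn"):
    """Build the comma-separated datadir value, one directory per mount point.

    The "dfs/dn" subdirectory is an illustrative assumption, not a Hadoop default.
    """
    return ",".join(posixpath.join(m, subdir) for m in mounts)
```

For the 12 mounts described earlier in the thread, datadir_property(["/data%d" % i for i in range(1, 13)]) yields the value to paste into hdfs-site.xml, and a non-empty result from shared_devices() on the same directories means at least two of them sit on one filesystem.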

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
