hadoop-mapreduce-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Multiple dfs.data.dir vs RAID0
Date Mon, 11 Feb 2013 01:57:39 GMT

I have a quick question regarding RAID0 performance vs multiple
dfs.data.dir entries.

Let's say I have 2 x 2TB drives.

I can configure them as 2 separate drives mounted on 2 folders and
assign both folders to Hadoop using dfs.data.dir. Or I can combine the
2 drives with RAID0 and assign the array to dfs.data.dir as a single
folder.

With RAID0, the reads and writes are going to be spread over the 2
disks, which significantly increases the speed. But if I put 2
entries in dfs.data.dir, Hadoop is going to spread the data over those
2 directories too, so in the end the results should be the same, no?
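
To make the comparison concrete, here is a rough sketch of how data
might land on the 2 disks in each setup. It is only a toy model, not
Hadoop or RAID code: the 64 MB block size, the 64 KB stripe size, the
round-robin choice of directory, and all function names are
assumptions made for illustration.

```python
# Toy model: compare bytes per disk under RAID0 striping versus
# round-robin placement of whole blocks across two dfs.data.dir
# entries. All sizes and the placement policy are assumptions.

BLOCK_SIZE = 64 * 1024 * 1024   # assumed HDFS block size (64 MB)
STRIPE_SIZE = 64 * 1024         # assumed RAID0 chunk size (64 KB)
NUM_DISKS = 2


def raid0_layout(num_blocks):
    """Bytes per disk when every block is striped across all disks."""
    per_disk = [0] * NUM_DISKS
    chunks_per_block = BLOCK_SIZE // STRIPE_SIZE
    for chunk in range(num_blocks * chunks_per_block):
        per_disk[chunk % NUM_DISKS] += STRIPE_SIZE
    return per_disk


def round_robin_layout(num_blocks):
    """Bytes per disk when each whole block goes to the next directory."""
    per_disk = [0] * NUM_DISKS
    for block in range(num_blocks):
        per_disk[block % NUM_DISKS] += BLOCK_SIZE
    return per_disk


def disks_touched_per_block(striped):
    """Number of disks involved in reading one single block."""
    return NUM_DISKS if striped else 1


if __name__ == "__main__":
    print("RAID0 bytes per disk:      ", raid0_layout(100))
    print("Round-robin bytes per disk:", round_robin_layout(100))
    print("Disks per block, RAID0:    ", disks_touched_per_block(True))
    print("Disks per block, 2 dirs:   ", disks_touched_per_block(False))
```

In this model both layouts put the same number of bytes on each disk,
so the aggregate load looks the same. The difference it exposes is
that a single block read touches both disks under RAID0 but only one
disk with 2 directories, so any gain from 2 directories would depend
on several blocks being read or written at the same time.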

Any experience/advice/results to share?
