On 8 August 2012 09:46, Sourygna Luangsay <sluangsay@pragsis.com> wrote:

Hi folks!

One of the scenario I can think in order to take advantage of HDFS RAID without suffering this penalty is:

-          Using normal HDFS with default replication=3 for my “fresh data”

-          Using HDFS RAID for my historical data (that is barely used by M/R)


exactly: less space use on cold data, with the penalty that access performance can be worse. As the majority of data on a hadoop cluster is usually "cold", it's a space and power efficient story for the archive data

Steve Loughran
Hortonworks Inc