hadoop-mapreduce-user mailing list archives

From Pankaj Misra <pankaj.mi...@impetus.co.in>
Subject Erasure Coding in HDFS
Date Thu, 13 Dec 2012 09:50:50 GMT
Dear All,

I was looking at options for reducing the overall cost of storage that is incurred due to
replication of data across the datanodes for higher availability and data locality.

I stumbled on a few articles suggesting erasure coding (software RAID) as one such mechanism,
which can provide 5 to 8 nines of availability while keeping the replication factor low.

I also came across a JIRA for erasure coding in HDFS.

I need some help understanding the following:
1. How can I use erasure coding with the Hadoop 1.1.1 release?
2. How will erasure coding work with the replication mechanism, and how will it affect
data locality for data processing, given that erasure coding fragments the data?
3. How mature is the current implementation of erasure coding in HDFS?

Any help will be greatly appreciated.

Thanks and Regards
Pankaj Misra


