hadoop-mapreduce-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1969) Allow raid to use Reed-Solomon erasure codes
Date Mon, 26 Jul 2010 22:24:16 GMT
Allow raid to use Reed-Solomon erasure codes

                 Key: MAPREDUCE-1969
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1969
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: contrib/raid
            Reporter: Scott Chen
             Fix For: 0.22.0

Currently raid uses one parity block per stripe which corrects one missing block on one stripe.
Using Reed-Solomon code, we can add any number of parity blocks to tolerate more missing blocks.
This way we can get a good file corrupt probability even if we set the replication to 1.

Here are some simple comparisons:
1. No raid, replication = 3:
File corruption probability = O(p^3), Storage space = 3x

2. Signal parity raid with stripe size = 10, replication = 2:
File corruption probability = O(p^4), Storage space = 2.2x 

3. Reed-Solomon raid with parity size = 4 and stripe size = 10, replication = 1:
File corruption probability = O(p^5), Storage space = 1.4x

where p is the missing block probability.
Reed-Solomon code can save lots of space without compromising the corruption probability.

To achieve this, we need some changes to raid:
1. Add a block placement policy that knows about raid logic and do not put blocks on the same
stripe on the same node.
2. Add an automatic block fixing mechanism. The block fixing will replace the replication
of under replicated blocks.
3. Allow raid to use general erasure code. It is now hard coded using Xor.
4. Add a Reed-Solomon code implementation

We are planing to use it on the older data only.
Because setting replication = 1 hurts the data locality.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message