hadoop-mapreduce-issues mailing list archives

From "Wittawat Tantisiriroj (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1969) Allow raid to use Reed-Solomon erasure codes
Date Mon, 30 Aug 2010 20:31:55 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904327#action_12904327 ]

Wittawat Tantisiriroj commented on MAPREDUCE-1969:

How fast does this RS implementation encode per second? In case we need a faster encoder, I am
thinking about porting Cauchy Reed-Solomon, as described at
http://www.cs.utk.edu/~plank/plank/papers/FAST-2009.pdf, to Java. James S. Plank, the author,
has already given me permission to release it under the Apache License.

> Allow raid to use Reed-Solomon erasure codes
> --------------------------------------------
>                 Key: MAPREDUCE-1969
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1969
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/raid
>            Reporter: Scott Chen
>             Fix For: 0.22.0
> Currently raid uses one parity block per stripe which corrects one missing block on one
> Using Reed-Solomon code, we can add any number of parity blocks to tolerate more missing
> This way we can get a good file corruption probability even if we set the replication to
> Here are some simple comparisons:
> 1. No raid, replication = 3:
> File corruption probability = O(p^3), Storage space = 3x
> 2. Single parity raid with stripe size = 10, replication = 2:
> File corruption probability = O(p^4), Storage space = 2.2x 
> 3. Reed-Solomon raid with parity size = 4 and stripe size = 10, replication = 1:
> File corruption probability = O(p^5), Storage space = 1.4x
> where p is the missing block probability.
> Reed-Solomon code can save lots of space without compromising the corruption probability.
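
The storage and corruption figures in the comparison above follow from simple arithmetic. Here is a minimal Java sketch, assuming a stripe becomes unrecoverable only once more than `paritySize` of its blocks have lost every replica. The class and method names are hypothetical and are not part of contrib/raid.

```java
// Hypothetical helper, not part of Hadoop: reproduces the arithmetic of the comparison.
class RaidComparison {
    /** Raw storage used per unit of data: replication * (stripe + parity) / stripe. */
    static double storageFactor(int replication, int stripeSize, int paritySize) {
        return replication * (stripeSize + paritySize) / (double) stripeSize;
    }

    /**
     * Exponent k in the O(p^k) corruption estimate: the fewest block replicas
     * that must go missing before a stripe can no longer be rebuilt.
     * A stripe tolerates paritySize lost blocks, and each block has
     * `replication` copies, so k = replication * (paritySize + 1).
     */
    static int corruptionExponent(int replication, int paritySize) {
        return replication * (paritySize + 1);
    }

    static void report(String name, int replication, int stripeSize, int paritySize) {
        System.out.printf("%s: O(p^%d), storage = %.1fx%n",
                name,
                corruptionExponent(replication, paritySize),
                storageFactor(replication, stripeSize, paritySize));
    }

    public static void main(String[] args) {
        report("No raid, replication = 3", 3, 10, 0);
        report("Single parity, replication = 2", 2, 10, 1);
        report("Reed-Solomon, parity = 4, replication = 1", 1, 10, 4);
    }
}
```

With no parity the stripe size cancels out of the storage formula, so the value passed for the "no raid" case is arbitrary.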
> To achieve this, we need some changes to raid:
> 1. Add a block placement policy that knows about raid logic and does not put blocks from
> the same stripe on the same node.
> 2. Add an automatic block fixing mechanism. The block fixing will replace the replication
> of under-replicated blocks.
> 3. Allow raid to use a general erasure code. It is currently hard-coded to use XOR.
> 4. Add a Reed-Solomon code implementation.
> We are planning to use it on older data only, because setting replication = 1 hurts data locality.
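
For readers unfamiliar with the single-parity scheme that the issue describes as the current behavior, the following is a rough Java sketch of XOR parity over one stripe. It is an illustration only: the class and method names are hypothetical, all blocks are assumed to have equal length, and it does not reflect the actual contrib/raid code.

```java
import java.util.Arrays;

// Hypothetical illustration of single-parity XOR coding; not the contrib/raid API.
class XorParitySketch {
    /** Computes one parity block as the byte-wise XOR of every data block in the stripe. */
    static byte[] encode(byte[][] dataBlocks) {
        byte[] parity = new byte[dataBlocks[0].length];
        for (byte[] block : dataBlocks) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= block[i];
            }
        }
        return parity;
    }

    /** Rebuilds the one missing data block by XOR-ing the parity with the surviving blocks. */
    static byte[] recover(byte[][] dataBlocks, int missingIndex, byte[] parity) {
        byte[] rebuilt = Arrays.copyOf(parity, parity.length);
        for (int b = 0; b < dataBlocks.length; b++) {
            if (b == missingIndex) {
                continue; // contents of the missing block are never read
            }
            for (int i = 0; i < rebuilt.length; i++) {
                rebuilt[i] ^= dataBlocks[b][i];
            }
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        byte[][] stripe = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] parity = encode(stripe);
        byte[] rebuilt = recover(stripe, 1, parity);
        System.out.println(Arrays.equals(rebuilt, stripe[1]));
    }
}
```

A single XOR parity block can rebuild only one missing block per stripe, which is the limitation that the proposed Reed-Solomon code, with several parity blocks, is meant to lift.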

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
