hadoop-mapreduce-issues mailing list archives

From "Ramkumar Vadali (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive
Date Mon, 30 Aug 2010 23:02:56 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904399#action_12904399 ]

Ramkumar Vadali commented on MAPREDUCE-2036:

Hi Wittawat, good work on generating this patch! The concept of RAIDing files in a directory
is a good complement to the existing RAID, which requires larger files.
Some thoughts:
1. It would be really good to integrate this with the current RAID. Apart from the obvious
code reuse in DistributedRaidFileSystem, the current RAID already has some automation around
generating parity files. I also have several upcoming patches that automate repair of lost blocks.
2. I did not see code that reduces the replication for RAIDed files. Is that supposed to be
done independently of this tool?
3. A usage-related question: I assume the source directory under consideration holds older data,
so users can tolerate some increase in read latency. If so, the source directory could
be HAR'ed and the resulting files then RAIDed using the current RAID. Thoughts?
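The lost-block repair mentioned in point 1 rests on a simple property of RAID5-style parity: if the parity block is the XOR of all data blocks in a stripe, any single missing data block is the XOR of the surviving blocks and the parity. A minimal sketch, assuming one XOR parity block per stripe of equal-length blocks; `XorRepair`, `xor`, and `repair` are hypothetical names, not code from contrib/raid or the attached patch.

```java
import java.util.Arrays;

public class XorRepair {
    /** Returns the byte-wise XOR of equal-length blocks. */
    static byte[] xor(byte[]... blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] b : blocks) {
            for (int i = 0; i < out.length; i++) {
                out[i] ^= b[i];
            }
        }
        return out;
    }

    /**
     * Rebuilds the data block at index lost. With single XOR parity, the
     * missing block equals the XOR of every surviving data block plus parity.
     */
    static byte[] repair(byte[][] stripe, int lost, byte[] parity) {
        if (lost < 0 || lost >= stripe.length) {
            throw new IllegalArgumentException("no such block: " + lost);
        }
        // Survivors: all data blocks except the lost one, followed by the parity block.
        byte[][] survivors = new byte[stripe.length][];
        int n = 0;
        for (int i = 0; i < stripe.length; i++) {
            if (i != lost) {
                survivors[n++] = stripe[i];
            }
        }
        survivors[n] = parity;
        return xor(survivors);
    }

    public static void main(String[] args) {
        byte[][] stripe = { {1, 2}, {3, 4}, {5, 6} };
        byte[] parity = xor(stripe); // [7, 0]
        byte[] rebuilt = repair(stripe, 1, parity);
        System.out.println(Arrays.toString(rebuilt)); // prints [3, 4]
    }
}
```

A RAID6-style code would need a second, independent parity block (for example Reed-Solomon) to survive two losses per stripe; plain XOR only covers one.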

Looking forward to a good discussion!

> Enable Erasure Code in Tool similar to Hadoop Archive
> -----------------------------------------------------
>                 Key: MAPREDUCE-2036
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2036
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid, harchive
>            Reporter: Wittawat Tantisiriroj
>            Assignee: Wittawat Tantisiriroj
>            Priority: Minor
>         Attachments: hdfs-raid.tar.gz, MAPREDUCE-2036.patch, RaidTool.pdf
> Features:
> 1) HAR-like Tool
> 2) RAID5/RAID6 & pluggable interface to implement additional coding
> 3) Ability to group blocks across files
> 4) Portable across clusters, since all necessary metadata is embedded
> Although it was developed separately from HAR and RAID due to time constraints, it would
> make sense to integrate it with either of them.
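Features 2 and 3 above can be illustrated together: a pluggable coding interface, and stripes built from the blocks of several small files rather than from a single large file. A minimal sketch, assuming fixed-size blocks with the tail of each file zero-padded; `ErasureCode`, `XorCode`, `toBlocks`, and `encodeAll` are hypothetical names and are not the interface used in the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StripeAcrossFiles {
    /** Hypothetical pluggable coding interface (illustrative only). */
    interface ErasureCode {
        /** Returns the parity blocks for one stripe of equal-length data blocks. */
        byte[][] encode(byte[][] stripe);
    }

    /** RAID5-style code: a single parity block, the XOR of all data blocks. */
    static final class XorCode implements ErasureCode {
        @Override
        public byte[][] encode(byte[][] stripe) {
            byte[] parity = new byte[stripe[0].length];
            for (byte[] block : stripe) {
                for (int i = 0; i < parity.length; i++) {
                    parity[i] ^= block[i];
                }
            }
            return new byte[][] {parity};
        }
    }

    /**
     * Splits each file into fixed-size blocks and concatenates the block lists
     * of all files, so a single stripe may span several files.
     */
    static List<byte[]> toBlocks(List<byte[]> files, int blockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (byte[] file : files) {
            for (int off = 0; off < file.length; off += blockSize) {
                // copyOfRange zero-pads when the range runs past the end of the file
                blocks.add(Arrays.copyOfRange(file, off, off + blockSize));
            }
        }
        return blocks;
    }

    /** Groups consecutive blocks into stripes of stripeLength and encodes each one. */
    static List<byte[][]> encodeAll(List<byte[]> blocks, int stripeLength, ErasureCode code) {
        List<byte[][]> parity = new ArrayList<>();
        for (int start = 0; start < blocks.size(); start += stripeLength) {
            int end = Math.min(start + stripeLength, blocks.size());
            byte[][] stripe = blocks.subList(start, end).toArray(new byte[0][]);
            parity.add(code.encode(stripe));
        }
        return parity;
    }

    public static void main(String[] args) {
        // Three small files; with 4-byte blocks they yield 1 + 2 + 1 = 4 blocks.
        List<byte[]> files = List.of(
            new byte[] {1, 2, 3},
            new byte[] {4, 5, 6, 7, 8},
            new byte[] {9});
        List<byte[]> blocks = toBlocks(files, 4);
        List<byte[][]> parity = encodeAll(blocks, 3, new XorCode());
        System.out.println(blocks.size() + " blocks, " + parity.size() + " stripes"); // prints 4 blocks, 2 stripes
    }
}
```

Padding each file's last block keeps all blocks in a stripe the same length, at the cost of some wasted space for very small files. A RAID6-style code could implement the same interface and simply return two parity blocks per stripe.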

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
