hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
Date Wed, 01 Apr 2015 18:44:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391207#comment-14391207 ]

Zhe Zhang commented on HDFS-7285:

Yesterday we had another offline meetup. I think the discussion was very productive. Here
is a summary:
*Attendees*: Nicholas, Jing, Zhe

*Project phasing*
We went over the list of subtasks under this JIRA and separated them into 3 categories:
# Basic EC functionalities under the striping layout. Those subtasks were kept under this
umbrella JIRA. The goal is for the HDFS-7285 branch to be ready for merging into trunk upon
their completion.
# Follow-on tasks for EC+striping (including code and performance optimization, as well as
support for advanced HDFS features). Those subtasks were moved under HDFS-8031. Following
the common practice, those follow-on tasks are targeted for trunk, after HDFS-7285 is merged.
# EC with non-striping / contiguous block layout. Those subtasks were moved to HDFS-8030,
which represents the 2nd phase of the erasure coding project.

Extending from the initial [PoC prototype | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14339006&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14339006],
the following _basic EC functionalities_ will be finished under this JIRA ([~szetszwo] please
let me know if I missed anything from your list):
* A striped block group is distributed evenly on racks
* NN handles striped block groups in existing block management logic:
** Missing and corrupted blocks
** To-invalidate blocks
** Lease recovery
** DN decommissioning
* NN periodically distributes tasks to DN to reconstruct missing striped blocks
* DN executes the reconstruction task by pulling data from peer DNs
* Client can read a striped block group even if some blocks are missing, through decoding
* Client should handle DN failures during writing
* Basic command for directory-level EC configuration (similar to a zone)
* Correctly handle striped block groups in file system statistics and metrics
* Documentation
* More comprehensive testing
* _Optional_: instead of hard-coding, incorporate the {{ECSchema}} class with 1~2 schemas
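The striped layout behind the items above can be sketched as follows: consecutive cells of a file are laid out round-robin across the data blocks of a block group. All constants and names here are illustrative assumptions for this sketch, not values or code fixed by the branch.

```python
# Rough sketch (assumed, not the branch's code) of the round-robin striping
# layout: a logical file offset maps to a data block within the block group
# plus an offset inside that block.

CELL_SIZE = 1024 * 1024   # striping cell size (assumed 1 MB default)
DATA_BLOCKS = 6           # data blocks per striped block group (assumed schema)

def locate(offset):
    """Map a logical offset to (block index in group, offset within block)."""
    cell = offset // CELL_SIZE           # global cell index within the group
    block_index = cell % DATA_BLOCKS     # cells are laid out round-robin
    stripe = cell // DATA_BLOCKS         # full stripes before this cell
    offset_in_block = stripe * CELL_SIZE + offset % CELL_SIZE
    return block_index, offset_in_block
```

Because each stripe spreads across all data blocks of the group, losing any one block (or DN) costs at most one cell per stripe, which is what makes decoding on read and rack-even placement effective.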

*Key remaining tasks*
We think the following remaining tasks are _key_ in terms of complexity and amount of work:
# Client writing: the basic striped writing logic is close to complete (patch available under
HDFS-7889), but it's challenging to handle failures during writing in an elegant way. 
# Client reading: the logic isn't too complex, but the amount of work is non-trivial
# DN reconstruction: the logic is clean, but work has not started yet

*Client design*
We also dived into more details of the design of client reading/writing paths, and are synced
on the overall approach. A few points were raised and will be addressed:
# Cell size in striping currently defaults to 1 MB. We should study its impact more
carefully. Intuitively, a smaller value (such as 128 KB) might be more suitable.
# Pread in striping format should always try to fetch data in parallel when the requested
range spans multiple striping cells.
# Stateful read in striping format should maintain multiple block readers to minimize the
overhead of creating new readers.
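The parallel-pread point can be made concrete with a small sketch: given a positional read, compute which striping cells it covers, so the client knows how many fetches to issue in parallel. The 1 MB cell size is an assumed default, and {{cells_for_pread}} is our illustrative name, not an HDFS API.

```python
# Sketch: which striping cells does a pread of [offset, offset + length) touch?
# Each cell may live on a different datanode, so each can be fetched in parallel.

CELL_SIZE = 1024 * 1024   # assumed 1 MB default cell size

def cells_for_pread(offset, length):
    """Global cell indices covered by the requested byte range."""
    if length <= 0:
        return []
    first = offset // CELL_SIZE
    last = (offset + length - 1) // CELL_SIZE
    return list(range(first, last + 1))
```

For example, a 3 MB pread starting in the middle of a cell touches four cells, hence four parallel fetches; a smaller cell size would raise the fan-out (and parallelism) of the same read.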

> Erasure Coding Support inside HDFS
> ----------------------------------
>                 Key: HDFS-7285
>                 URL: https://issues.apache.org/jira/browse/HDFS-7285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Weihua Jiang
>            Assignee: Zhe Zhang
>         Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFSErasureCodingDesign-20141028.pdf,
HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf,
> Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data
reliability, compared to the existing HDFS 3-replica approach. For example, with a 10+4
Reed-Solomon coding, we can tolerate the loss of 4 blocks with a storage overhead of only 40%.
This makes EC quite an attractive alternative for big data storage, particularly for cold data.
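The overhead numbers above can be checked with trivial arithmetic; the function names here are ours for illustration, not HDFS code.

```python
# Storage overhead comparison from the paragraph above: 3-way replication
# versus a (10, 4) Reed-Solomon schema. "Overhead" is extra storage as a
# fraction of the logical data size.

def replication_overhead(replicas):
    # n replicas keep n full copies, so n - 1 copies are pure overhead.
    return replicas - 1

def rs_overhead(data_units, parity_units):
    # Parity units are the only extra storage beyond the data itself.
    return parity_units / data_units

print(replication_overhead(3))  # -> 2 (200% overhead, tolerates 2 lost replicas)
print(rs_overhead(10, 4))       # -> 0.4 (40% overhead, tolerates 4 lost blocks)
```

So the (10, 4) schema both tolerates more failures than 3-way replication and stores a fifth of the extra bytes.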

> Facebook had a related open-source project called HDFS-RAID. It used to be one of the
contrib packages in HDFS but was removed in Hadoop 2.0 for maintenance reasons. Its
drawbacks are: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding
tasks; 2) it can only be used for cold files that will no longer be appended to;
3) the pure-Java EC coding implementation is extremely slow in practice. Given these drawbacks,
it might not be a good idea to simply bring HDFS-RAID back.
> We (Intel and Cloudera) are working on a design to build EC into HDFS that removes
external dependencies, making it self-contained and independently maintained. This design
layers the EC feature on top of the storage-type support and aims to be compatible with
existing HDFS features such as caching, snapshots, encryption, and high availability. It will
also support different EC coding schemes, implementations, and policies for different deployment
scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can
greatly improve the performance of EC encoding/decoding and make the EC solution even more
attractive. We will post the design document soon.
