hadoop-common-dev mailing list archives

From "Zheng, Kai" <kai.zh...@intel.com>
Subject RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk
Date Wed, 23 Sep 2015 01:17:17 GMT
Non-binding +1

According to our extensive performance tests, erasure coding based on striping and the ISA-L
coder not only saves storage but can also increase the throughput of a client or a cluster.
It will be a great addition to HDFS and its users. Based on the latest branch code, we also
observed that it is very reliable in concurrent tests. We'll provide the perf test report once
it's sorted out, and hope it helps.


-----Original Message-----
From: Gangumalla, Uma [mailto:uma.gangumalla@intel.com] 
Sent: Wednesday, September 23, 2015 8:50 AM
To: hdfs-dev@hadoop.apache.org; common-dev@hadoop.apache.org
Subject: Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk


Great addition to HDFS. Thanks all contributors for the nice work.


On 9/22/15, 3:40 PM, "Zhe Zhang" <zhezhang@cloudera.com> wrote:

>I'd like to propose a vote to merge the HDFS-7285 feature branch back 
>to trunk. Since November 2014 we have been designing and developing 
>this feature under the umbrella JIRAs HDFS-7285 and HADOOP-11264, and 
>have committed approximately 210 patches.
>The HDFS-7285 feature branch was created to support the first phase of 
>HDFS erasure coding (HDFS-EC). The objective of HDFS-EC is to 
>significantly reduce storage space usage in HDFS clusters. Instead of 
>always creating 3 replicas of each block with 200% storage space 
>overhead, HDFS-EC provides data durability through parity data blocks. 
>With most EC configurations, the storage overhead is no more than 50%. 
>Based on profiling results of production clusters, we decided to 
>support EC with the striped block layout in the first phase, so that 
>small files can be better handled. This means dividing each logical 
>HDFS file block into smaller units (striping cells) and spreading them 
>on a set of DataNodes in round-robin fashion. Parity cells are 
>generated for each stripe of original data cells. We have made changes 
>to NameNode, client, and DataNode to generalize the block concept and 
>handle the mapping between a logical file block and its internal 
>storage blocks. For further details please see the design doc. 
>HADOOP-11264 focuses on providing flexible and high-performance codec 
>calculation support.
>The nightly Jenkins job of the branch has reported several successful 
>runs, and doesn't show new flaky tests compared with trunk. We have 
>posted several versions of the test plan including both unit testing 
>and cluster testing, and have executed most tests in the plan. The most 
>basic functionalities have been extensively tested and verified in 
>several real clusters with different hardware configurations; results 
>have been very stable. We have created follow-on tasks for more 
>advanced error handling and optimization under the umbrella HDFS-8031. 
>We also plan to implement or harden the integration of EC with existing 
>features such as WebHDFS, snapshot, append, truncate, hflush, hsync, 
>and so forth.
>Development of this feature has been a collaboration across many 
>companies and institutions. I'd like to thank J. Andreina, Takanobu 
>Asanuma, Vinayakumar B, Li Bo, Takuya Fukudome, Uma Maheswara Rao G, 
>Rui Li, Yi Liu, Colin McCabe, Xinwei Qin, Rakesh R, Gao Rui, Kai 
>Sasaki, Walter Su, Tsz Wo Nicholas Sze, Andrew Wang, Yong Zhang, Jing 
>Zhao, Hui Zheng and Kai Zheng for their code contributions and reviews. 
>Andrew and Kai Zheng also made fundamental contributions to the initial 
>design. Rui Li, Gao Rui, Kai Sasaki, Kai Zheng and many other 
>contributors have made great efforts in system testing. Many thanks go 
>to Weihua Jiang for proposing the JIRA, and ATM, Todd Lipcon, Silvius 
>Rus, Suresh, as well as many others for providing helpful feedback.
>Following the community convention, this vote will last for 7 days 
>(ending September 29th). Votes from Hadoop committers are binding but 
>non-binding votes are very welcome as well. And here's my non-binding +1.
>Zhe Zhang
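
The storage overhead figures quoted above (200% for 3 replicas, no more than 50% with most EC configurations) can be checked with a little arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the 6 data + 3 parity layout is an assumed example configuration that the message itself does not name.

```python
def storage_overhead(data_units: int, redundant_units: int) -> float:
    """Extra storage consumed, as a fraction of the original data size."""
    return redundant_units / data_units


# 3-way replication: 1 original block plus 2 extra copies.
replication = storage_overhead(data_units=1, redundant_units=2)

# Assumed example EC layout (not stated in the message): 6 data + 3 parity.
erasure_coded = storage_overhead(data_units=6, redundant_units=3)

print(f"replication overhead: {replication:.0%}")      # 200%
print(f"erasure coding overhead: {erasure_coded:.0%}")  # 50%
```

Under that assumed layout, the same data is stored with a quarter of the extra space that 3-way replication needs.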

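The striped layout described in the quoted proposal (cells spread round-robin over a set of DataNodes, with parity generated per stripe) can be sketched as follows. This is a toy model under stated assumptions, not the HDFS implementation: the cell size, the number of data blocks, and all function names are hypothetical, block groups are ignored, and a single XOR parity cell stands in for the real codec so the example stays short.

```python
from functools import reduce

CELL_SIZE = 4        # bytes per striping cell (tiny, for illustration only)
NUM_DATA_BLOCKS = 3  # hypothetical number of data blocks per stripe


def locate(offset):
    """Map a logical byte offset to (stripe, data block index, offset in that block)."""
    cell = offset // CELL_SIZE
    stripe = cell // NUM_DATA_BLOCKS
    block_index = cell % NUM_DATA_BLOCKS  # round-robin placement
    offset_in_block = stripe * CELL_SIZE + offset % CELL_SIZE
    return stripe, block_index, offset_in_block


def stripe_cells(data):
    """Split data into stripes of NUM_DATA_BLOCKS cells, zero-padding the tail."""
    stripe_bytes = CELL_SIZE * NUM_DATA_BLOCKS
    padded = data + bytes(-len(data) % stripe_bytes)
    cells = [padded[i:i + CELL_SIZE] for i in range(0, len(padded), CELL_SIZE)]
    return [cells[i:i + NUM_DATA_BLOCKS]
            for i in range(0, len(cells), NUM_DATA_BLOCKS)]


def xor_parity(cells):
    """Byte-wise XOR of equally sized cells (stand-in for a real erasure code)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*cells))


stripes = stripe_cells(b"abcdefghijklmnopqrstuvwx")
parity = xor_parity(stripes[0])

# Rebuild a lost data cell from the parity cell and the surviving cells.
recovered = xor_parity([parity, stripes[0][1], stripes[0][2]])
```

With XOR parity, any single missing cell of a stripe equals the XOR of the remaining cells and the parity cell, which is why `recovered` matches the first cell. A production codec tolerates more than one lost cell per stripe.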