hadoop-common-issues mailing list archives

From "Sammi Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15558) Implementation of Clay Codes plugin (Coupled Layer MSR codes)
Date Wed, 22 Aug 2018 14:09:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588900#comment-16588900

Sammi Chen commented on HADOOP-15558:

Hi [~shreya2205], several comments after going through the slides and code.
1. The encoding and decoding of Clay Codes involve PFT, PRT and RS computation. So basically
the idea is to reduce network traffic and disk reads during the data repair phase by adding
additional computation. In the single-data-node-failure case, Clay Codes can save 2/3 of the
network bandwidth compared with RS. In the worst case, Clay Codes behave the same as RS in
terms of network bandwidth. Given that most failures in a storage cluster are single-node
failures, clusters can benefit from Clay Codes without doubt. I assume all the benchmark data
in the slides were collected in the single-data-node-failure case. Correct me if that's wrong.
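To make the bandwidth claim in point 1 concrete, here is a rough illustration of my own (not taken from the slides or the patch), using the standard MSR repair-bandwidth formula d * B / (d - k + 1); the (6, 3) parameters, d = k + m - 1, and 128 MB block size are all assumptions for the sake of the example.

```java
// Rough sketch: repair network traffic of RS vs an MSR (Clay-like) code
// for a single-node failure. Numbers are illustrative, not benchmark data.
public class RepairTraffic {
  public static void main(String[] args) {
    int k = 6;
    int m = 3;
    int d = k + m - 1;        // MSR repair contacts d = 8 helper nodes
    double blockMB = 128.0;   // assumed block size B

    // RS repair reads k full blocks over the network.
    double rsTraffic = k * blockMB;
    // MSR repair: each of d helpers sends B / (d - k + 1).
    double msrTraffic = d * blockMB / (d - k + 1);

    System.out.printf("RS repair traffic:  %.1f MB%n", rsTraffic);
    System.out.printf("MSR repair traffic: %.1f MB%n", msrTraffic);
    System.out.printf("Savings: %.0f%%%n",
        100 * (1 - msrTraffic / rsTraffic));
  }
}
```

With these assumed parameters the savings come out near the 2/3 range; the exact figure depends on k, m and d.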
2. On P22 of the slides, it says "total encoding time remains the same while Clay Codec has
70% higher encode computation time". I'm confused; could you explain this further?
3. On P21 of the slides, Fragmented Read, it says there is no impact on SSD when the sub-chunk
size reaches 4KB. Do you have any data for HDD? In Hadoop/HDFS deployments, HDD is still the
majority.

4. P23, what does the "Degraded I/O" scenario in the slides mean?
5. From the slides, we can see that to configure a Clay Codec, k, m, d and the sub-chunk size
all matter. In the implementation, however, only k and m are configurable. What about d and
the sub-chunk size?

6. I googled a lot but found very few links about the PFT and PRT matrices. Do you have any
documents on them?
7. For the implementation part, is cloning the input blocks a must in prepareEncodingStep?
Also, could you add more comments, such as which part is the PFT computation and which is the
PRT computation? I will go through the code again later. Also, ClayCodeUtil would be better
placed in its own file.
8. Code style. Here is a list of Hadoop code styles to follow.
    a. Import * is not recommended.
    b. A line cannot exceed 80 characters.
    c. A tab is 4 spaces.
    d. New-line indent is 2 spaces.
    e. Cross-line indent is 4 spaces.
    f. Remove unnecessary empty lines.
    g. Put 1 space between an operator and its operands, for example:
        if (rsRawDecoder==null) {        =>  if (rsRawDecoder == null) {
        new ErasureCoderOptions(2,2);    =>  new ErasureCoderOptions(2, 2);
        if(erasedIndexes.length==1){     =>  if (erasedIndexes.length == 1) {
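For what it's worth, a fragment following rules b through g might look like the sketch below; the class and member names here are illustrative and are not taken from the patch.

```java
// Illustrative only: a fragment formatted per the style rules above
// (2-space block indent, spaces around operators and after commas).
public class StyleExample {
  private Object rsRawDecoder = null;

  boolean isSingleErasure(int[] erasedIndexes) {
    // 1 space around each operator; no space omitted after "if".
    if (erasedIndexes == null || erasedIndexes.length != 1) {
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    StyleExample example = new StyleExample();
    System.out.println(example.isSingleErasure(new int[] {0}));
  }
}
```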

> Implementation of Clay Codes plugin (Coupled Layer MSR codes) 
> --------------------------------------------------------------
>                 Key: HADOOP-15558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15558
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Chaitanya Mukka
>            Assignee: Chaitanya Mukka
>            Priority: Major
>         Attachments: ClayCodeCodecDesign-20180630.pdf, HADOOP-15558.001.patch, HADOOP-15558.002.patch
> [Clay Codes|https://www.usenix.org/conference/fast18/presentation/vajha] are new erasure
> codes developed as a research project at Codes and Signal Design Lab, IISc Bangalore. A
> particular Clay code, with storage overhead 1.25x, has been shown to reduce repair network
> traffic, disk read and repair times by factors of 2.9, 3.4 and 3 respectively compared to
> the RS codes with the same parameters.
> This Jira aims to introduce Clay Codes to HDFS-EC as one of the pluggable erasure codecs.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
