hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10971) Distcp should not copy replication factor if source file is erasure coded
Date Tue, 28 Mar 2017 23:02:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946138#comment-15946138
] 

Andrew Wang commented on HDFS-10971:
------------------------------------

This looks great. A couple nitty comments:

{noformat}
170	      // If the target directory is Erasure Coded, target FileSystem takes care
171	      // of creating the erasure coded file where the file replication factor
172	      // is not applicable and the passed in replication factor is ignored.
{noformat}

Don't need to capitalize Erasure Coded. Also to be a little more accurate, I'd say something
like:

If there is an erasure coding policy set on the target directory, files will be written to
the target directory using this EC policy. The replication factor of the source file is ignored
and not preserved.

{noformat}
239	    // Preserve the replication attribute only when both source and
240	    // destination files are not Erasure Coded.
{noformat}

I'd expand this comment a bit to cover all the conditions in the if statement. You could also
restate the current as "The replication factor can only be preserved for replicated files.
It is ignored when either the source or target file are erasure coded."

Finally, in the new unit test:
* It'd be good to add messages to the asserts as a form of documentation.
* The test is also named "testPreserve..." whereas we might want to name it "testReplFactorNotPreserved..."
or "...Ignored..." for clarity
* Consider doing a static import on the asserts to make them a little more concise
* We test EC src and repl dest, should we also test repl src and EC dst? Also add EC to EC
test with different EC policies for completeness?
* We added StripedFileTestUtil#getDefaultECPolicy, maybe we also add a helper function like
DFSTestUtil#enableAllECPolicies but instead #enableDefaultECPolicy? Perhaps I should have
originally put #enableAllECPolicies in StripedFileTestUtil also. Could also be a cleanup JIRA
for later.

> Distcp should not copy replication factor if source file is erasure coded
> -------------------------------------------------------------------------
>
>                 Key: HDFS-10971
>                 URL: https://issues.apache.org/jira/browse/HDFS-10971
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: distcp
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-10971.01.patch, HDFS-10971.testcase.patch
>
>
> The current erasure coding implementation uses replication factor field to store erasure
coding policy.
> Distcp copies the source file's replication factor to the destination if {{-pr}} is specified.
However, if the source file is EC, the replication factor (which is EC policy) should not
be replicated to the destination file. When a HdfsFileStatus is converted to FileStatus, the
replication factor is set to 0 if it's an EC file.
> In fact, I will attach a test case that shows trying to replicate the replication factor
of an EC file results in an IOException: "Requested replication factor of 0 is less than the
required minimum of 1 for /tmp/dst/dest2"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message