hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10996) Ability to specify per-file EC policy at create time
Date Wed, 29 Mar 2017 21:29:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947931#comment-15947931 ]

Andrew Wang commented on HDFS-10996:

Hi Sammi, thanks for working on this, took an initial pass at the patch:

* We should file a follow-on to clean up all the create() overloads in DFSClient; they seem
unnecessary, since DistributedFileSystem also has very similar overloads for filling in defaults.

   * Another addition is ecPolicyName which specify a specific erasure coding
   * policy on this new file, stop new file from inheriting its parent
   * group's whatever policy. A value of null means inheriting policy from
   * parent group.

Recommend we reword as follows:

   * A non-null ecPolicyName specifies an explicit erasure coding policy for this
   * file, overriding the inherited policy. A null ecPolicyName means the file
   * will inherit its EC policy from an ancestor (the default).

* DistributedFileSystem, setEcPolicyName, should we have a Preconditions.checkNotNull check?
Since this is trunk-only, we can also add the @Nonnull Java 8 annotation on the parameter.
* {{addFile}}, can we reuse the existing code in setErasureCodingPolicy that checks if a policy
is enabled? This way it's centralized.
* {{addFile}}, also prefer that we stick with checking for null rather than {{StringUtils.isBlank}}
since we haven't put any restrictions yet on EC policy names. That'll probably happen as part
of Kai's pluggable EC policy work.
* When changing the test mocks, we should use anyObject() instead of null for the new parameter.
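
For the null-check point, roughly what I have in mind is an eager check in the setter. Sketch below uses illustrative names and {{java.util.Objects.requireNonNull}} standing in for Guava's {{Preconditions.checkNotNull}} so it is self-contained; it is not the actual patch code:

```java
import java.util.Objects;

// Illustrative builder-style setter with an eager null check.
// Names are hypothetical, not taken from the HDFS-10996 patch.
public class EcBuilderSketch {
    private String ecPolicyName;

    // Objects.requireNonNull stands in for Guava's Preconditions.checkNotNull.
    public EcBuilderSketch setEcPolicyName(String ecPolicyName) {
        this.ecPolicyName = Objects.requireNonNull(ecPolicyName,
            "ecPolicyName must not be null; omit the call to inherit the parent policy");
        return this;
    }

    public String getEcPolicyName() {
        return ecPolicyName;
    }
}
```

This way a null argument fails fast at the client rather than surfacing later on the NameNode side; callers that want inheritance simply never call the setter.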

New unit test:

* I don't think we need a new test class for this.
* We already have builder-related tests where we can do validation of the additional parameter.
* Besides that, we just need a unit test that tries a few different combinations of parent
and per-file policies. This can be parameterized to make the code simple. I don't think validating
file contents is necessary since it's done by other tests.
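
The rule the parameterized test would exercise is just "non-null per-file policy wins, otherwise inherit from the ancestor." A self-contained model of that resolution, with illustrative policy names (this only sketches the expected semantics, it is not DFS code):

```java
import java.util.Arrays;
import java.util.List;

// Models the expected policy-resolution semantics for the parameterized test:
// a non-null per-file policy overrides; null means inherit the ancestor's.
public class EffectivePolicy {
    public static String resolve(String parentPolicy, String perFilePolicy) {
        return perFilePolicy != null ? perFilePolicy : parentPolicy;
    }

    public static void main(String[] args) {
        // Each row: {parent policy (null = none), per-file policy (null = inherit)}.
        List<String[]> cases = Arrays.asList(
            new String[]{null, null},
            new String[]{"RS-6-3-64k", null},
            new String[]{null, "RS-3-2-64k"},
            new String[]{"RS-6-3-64k", "RS-3-2-64k"});
        for (String[] c : cases) {
            System.out.println(Arrays.toString(c) + " -> " + resolve(c[0], c[1]));
        }
    }
}
```

The four rows above are exactly the combinations the parameterized unit test should cover.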

Nit stuff:
* Need to start the block comments with two {{*}} for the method and class comments to be
* Timeout is set too long; the unit is milliseconds
* Typo: "stripped" -> "striped"
* {{filePath}} should be final.
* A lot of the policy-related class fields aren't used outside of setup, remove them?

    String ecPolicies =
            .collect(Collectors.joining(", "));
    conf.set(DFSConfigKeys.DFS_NAMENODE_EC_POLICIES_ENABLED_KEY, ecPolicies);

Could you use DFSTestUtil#enableAllECPolicies instead? If you copied this from somewhere else,
please update to use the helper method as well.
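
For reference, the inlined snippet presumably boils down to the stream pattern below, joining policy names into the comma-separated config value (policy names here are illustrative); the helper centralizes exactly this so tests don't each reimplement it:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration of the join pattern the inlined test code uses to build the
// comma-separated value for DFS_NAMENODE_EC_POLICIES_ENABLED_KEY.
public class JoinPolicies {
    public static String join(List<String> policyNames) {
        return policyNames.stream().collect(Collectors.joining(", "));
    }
}
```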

    Assert.assertTrue("Erasure coding policy mismatch!",

Can use assertEquals here.

* testParameters, file1, no need for a try/catch, just let it throw on error
* testParameters, file2, we should assert the exception is expected with GenericTestUtils#assertExceptionContains
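
In plain Java, the pattern {{GenericTestUtils#assertExceptionContains}} encapsulates looks roughly like this (a self-contained illustration of the idea, not Hadoop code): run the action, require that it throws, and require that the message mentions the expected text.

```java
// Sketch of the assert-exception-contains pattern: fail if the action does not
// throw, or if the thrown exception's message lacks the expected substring.
public class ExpectException {
    public static void assertThrowsContaining(Runnable action, String expected) {
        try {
            action.run();
        } catch (Exception e) {
            if (e.getMessage() == null || !e.getMessage().contains(expected)) {
                throw new AssertionError(
                    "Expected message containing \"" + expected + "\" but got: " + e);
            }
            return; // threw with the expected message, as required
        }
        throw new AssertionError("Expected an exception containing \"" + expected + "\"");
    }
}
```

Asserting on the message (not just the exception type) is what makes the file2 case meaningful: it proves the create failed for the intended reason.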

> Ability to specify per-file EC policy at create time
> ----------------------------------------------------
>                 Key: HDFS-10996
>                 URL: https://issues.apache.org/jira/browse/HDFS-10996
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-10996-v1.patch, HDFS-10996-v2.patch, HDFS-10996-v3.patch, HDFS-10996-v4.patch
> Based on discussion in HDFS-10971, it would be useful to specify the EC policy when the
file is created. This is useful for situations where app requirements do not map nicely to
the current directory-level policies.

This message was sent by Atlassian JIRA
