hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11382) Persist Erasure Coding Policy ID in a new optional field in INodeFile in FSImage
Date Fri, 24 Feb 2017 01:02:44 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881673#comment-15881673
] 

Andrew Wang commented on HDFS-11382:
------------------------------------

Hi Manoj, thanks for the rev! Looks like my last wave of comments was mostly addressed; a few follow-ups:

* In FSDirWriteFileOp#addFileForEditLog and #addFile, I'd prefer that we set up the parameters for newINodeFile outside of the call, like how we set up {{blockType}}. This is a little cleaner.
* [~ehiggs] hasn't responded, but I'd like to remove {{blockType}} from the INodeFile PB definition
since it looks extraneous to me, and is extra overhead. BlockType can remain as an in-memory
abstraction.
* When reading and writing the fsimage, shouldn't we set replication XOR EC policy? This is in FSImageFormatPBINode. I'd also prefer we set up the parameters outside of the call, similar to the above review comment.
* In FSImageFormatPBSnapshot, if we use the boxed Short and Byte, we can avoid the ternary checks, right? Paranoid precondition checks are never a bad idea if you want to add some.
* INodeFile#DEFAULT_REPL_FOR_STRIPED_BLOCKS could use a javadoc comment; we can also explain why we chose "1" for the value.
* It looks like we use {{getSysPoliciesMinID}} and {{MaxID}} to check that the policy ID is valid; could we just check that {{getPolicyByPolicyID}} isn't null instead? I get that the range check is more efficient, but it's cleaner to use the helper function, and we can optimize the helper to use a hashtable later if it becomes a bottleneck.
* The new INodeFile header helpers look good, except it seems like there is some boilerplate that callers now need to do around blockType. I think what Zhe wanted was for callers to use a function like your {{getBlockLayoutRedundancy}} directly, which would in turn call into more specific {{getContiguous}} and {{getStriped}} helpers that would be the two halves of the {{blockType == STRIPED}} if statement currently in {{getBlockLayoutRedundancy}}. Hopefully that makes sense; see the sketch after this list.
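
Roughly, something along these lines (just a rough sketch; the helper names, the {{BLOCK_TYPE_MASK_STRIPED}} constant, and the precondition messages are placeholders, not the exact code):
{noformat}
static long getBlockLayoutRedundancy(BlockType blockType,
    Short replication, Byte ecPolicyID) {
  return blockType == BlockType.STRIPED
      ? getStripedBlockLayoutRedundancy(ecPolicyID)
      : getContiguousBlockLayoutRedundancy(replication);
}

// Striped half: callers pass only the EC policy ID.
private static long getStripedBlockLayoutRedundancy(Byte ecPolicyID) {
  Preconditions.checkArgument(ecPolicyID != null,
      "EC policy ID must be set for striped files");
  return BLOCK_TYPE_MASK_STRIPED | ecPolicyID;
}

// Contiguous half: callers pass only the replication factor.
private static long getContiguousBlockLayoutRedundancy(Short replication) {
  Preconditions.checkArgument(replication != null,
      "Replication must be set for contiguous files");
  return replication;
}
{noformat}
That way the {{blockType == STRIPED}} branch lives in one place and callers only touch the single entry point.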

For the findbugs issue, we should add an exclusion to the hdfs findbugsExcludeFile.xml, and possibly add a code comment near the signed OR for future readers.
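
For the comment, something along these lines next to the OR would do ({{layoutRedundancy}} is just a stand-in for whatever the flagged variable is called, and the wording is only a suggestion):
{noformat}
// The EC policy ID is a small positive byte, so OR-ing it into the long
// header cannot set unintended high bits; the findbugs warning on this
// line is suppressed in findbugsExcludeFile.xml.
layoutRedundancy |= ecPolicyID;
{noformat}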

> Persist Erasure Coding Policy ID in a new optional field in INodeFile in FSImage
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-11382
>                 URL: https://issues.apache.org/jira/browse/HDFS-11382
>             Project: Hadoop HDFS
>          Issue Type: Task
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11382.01.patch, HDFS-11382.02.patch
>
>
> For Erasure Coded files, the replication field in the INodeFile message is re-used to store the EC Policy ID.
> *FSDirWriteFileOp#addFile*
> {noformat}
>   private static INodesInPath addFile(
>       FSDirectory fsd, INodesInPath existing, byte[] localName,
>       PermissionStatus permissions, short replication, long preferredBlockSize,
>       String clientName, String clientMachine)
>       throws IOException {
>     .. .. ..
>     try {
>       ErasureCodingPolicy ecPolicy = FSDirErasureCodingOp.
>           getErasureCodingPolicy(fsd.getFSNamesystem(), existing);
>       if (ecPolicy != null) {
>         replication = ecPolicy.getId();   <===
>       }
>       final BlockType blockType = ecPolicy != null?
>           BlockType.STRIPED : BlockType.CONTIGUOUS;
>       INodeFile newNode = newINodeFile(fsd.allocateNewInodeId(), permissions,
>           modTime, modTime, replication, preferredBlockSize, blockType);
>       newNode.setLocalName(localName);
>       newNode.toUnderConstruction(clientName, clientMachine);
>       newiip = fsd.addINode(existing, newNode, permissions.getPermission());
> {noformat}
> With the HDFS-11268 fix, {{FSImageFormatPBINode#Loader#loadInodeFile}} correctly reads the EC policy ID from the replication field and then uses the right policy to construct the blocks.
> *FSImageFormatPBINode#Loader#loadInodeFile*
> {noformat}
>       ErasureCodingPolicy ecPolicy = (blockType == BlockType.STRIPED) ?
>           ErasureCodingPolicyManager.getPolicyByPolicyID((byte) replication) :
>           null;
> {noformat}
> The original intention was to re-use the replication field so the in-memory representation would be compact. But this isn't necessary for the on-disk representation: replication is an optional field, and if we add another optional field for the EC policy, it won't take any extra space.
> Also, we need to make sure to have the appropriate asserts in place to ensure both fields aren't set for the same INodeFile.
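>
> For example, a check along these lines when loading the INodeFile section would do (where {{f}} is the loaded INodeSection.INodeFile, and the {{hasErasureCodingPolicyID}} accessor is just a placeholder for whatever getter the new optional field ends up generating):
> {noformat}
>   // An fsimage INodeFile entry should carry either a replication factor
>   // or an EC policy ID, never both.
>   Preconditions.checkArgument(
>       !(f.hasReplication() && f.hasErasureCodingPolicyID()),
>       "INodeFile must not set both replication and an EC policy ID");
> {noformat}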



