hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Virajith Jalaparti (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11639) [READ] Encode the BlockAlias in the client protocol
Date Thu, 11 May 2017 21:39:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007247#comment-16007247

Virajith Jalaparti commented on HDFS-11639:

Thanks for working on this [~ehiggs]. I think the general direction of the patch (using {{BlockProvider#resolve()}}
in {{ProvidedBlocksBuilder}} to get the {{BlockAlias}}, passing it along as a set of bytes
to the client, and resolving it only on the Datanode) looks good. 

A few comments:

* In {{ProvidedBlocksBuilder#newLocatedBlock}}, the {{fileRegion}} should be resolved only
if the block has {{PROVIDED}} locations (i.e., {{hasProvidedLocations}} is true). When {{dfs.namenode.provided.enabled}}
is set to true, all {{LocatedBlock}} are created in this method, and for non-provided blocks,
a resolution of {{BlockAlias}} is needed.

* {{PBHelperClient#convertLocatedBlockProto()}} and {{PBHelperClient#convertLocatedBlock()}}
should be modified to decode/encode the {{BlockAlias}} bytes.

* How about decoding the {{blockAlias}} bytes in {{DataXceiver#readBlock}} using a new {{DataTransferProtoUtil#blockAliasFromProto(bytes[]
blockAlias}} method instead of using the {{BlockAlias#builder()}}? The former will be in-line
with the way the protobufs are decoded in {{DataXceiver}}. Further, if in the future a different
{{BlockAlias}} is used, the current implementation of using the {{FileRegion#Builder}} in
{{DataXceiver#readBlock}}  will be hard to extend (will end up being try {{FileRegion#Builder}},
if null try {{BlockAliasXX#Builder}} and so on). 

* Similar to passing on BlockAlias from {{DataXceiver#readBlock}} to {{BlockSender}}, it should
be passed along from {{DataXceiver#readBlock}}  to {{BlockReceiver}}. However, we would not
need it till we have writes implemented.

* {{DataStreamer#blockAlias}} will never be non-null. I think it should be initialized in

* Current implementation of {{TextReader#resolve}} can be expensive as it has to scan through
the whole list of blocks. I don't have a very good solution for this as one alternative would
be to maintain an in-memory map to store this information. I think it would be good to have
both implementations, although this is not a blocker.

* In {{DataTransfer#run()}}, the {{blockAlias}} should be null unless it is for a provided
block. I think this will entail adding BlockAlias to {{transferBlock()}} and also to {{BlockCommand}}
for the {{DatanodeProtocol.DNA_TRANSFER}} command. However, this will only be relevant for
writing provided blocks (and in particular recovery).

* I don't think {{TestDataTransferProtocol#testDataTransferProtocolWithBlockAlias}} is currently
correct/tests something useful. Shouldn't block alias be relevant for only provided blocks?

Looking over this patch, one thing that occurred to me is if it makes sense to unify {{FileRegionProvider}}
with {{BlockProvider}}? They both have very close functionality.

I like the use of {{BlockProvider#resolve()}}. If we unify {{FileRegionProvider}} with {{BlockProvider}},
then {{resolve}} can return {{null}} if the block map is accessible from the Datanodes also.
If it is accessible only from the Namenode, then a non-null value can be propagated to the

One of the motivations for adding the {{BlockAlias}} to the client protocol was to have the
blocks map only on the Namenode. In this scenario, the {{ReplicaMap}} in {{FsDatasetImpl}}
of will not have any replicas apriori. Thus, one way to ensure that the {{FsDatasetImpl}}
interface continues to function as today is to create a {{FinalizedProvidedReplica}} in {{FsDatasetImpl#getBlockInputStream}}
when {{BlockAlias}} is not null.

> [READ] Encode the BlockAlias in the client protocol
> ---------------------------------------------------
>                 Key: HDFS-11639
>                 URL: https://issues.apache.org/jira/browse/HDFS-11639
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Ewan Higgs
>            Assignee: Virajith Jalaparti
>         Attachments: HDFS-11639-HDFS-9806.001.patch, HDFS-11639-HDFS-9806.002.patch
> As part of the {{PROVIDED}} storage type, we have a {{BlockAlias}} type which encodes
information about where the data comes from. i.e. URI, offset, length, and nonce value. This
data should be encoded in the protocol ({{LocatedBlockProto}} and the {{BlockTokenIdentifier}})
when a block is available using the PROVIDED storage type.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

View raw message