hadoop-hdfs-issues mailing list archives

From "Ewan Higgs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11828) [READ] Refactor FsDatasetImpl to use the BlockAlias from ClientProtocol for PROVIDED blocks.
Date Wed, 23 Aug 2017 08:06:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138040#comment-16138040 ]

Ewan Higgs commented on HDFS-11828:

{quote} Ewan Higgs, Similar to HDFS-11639, we can make this a sub-task of HDFS-12090?{quote}
I moved it now.

> [READ] Refactor FsDatasetImpl to use the BlockAlias from ClientProtocol for PROVIDED
> --------------------------------------------------------------------------------------------
>                 Key: HDFS-11828
>                 URL: https://issues.apache.org/jira/browse/HDFS-11828
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ewan Higgs
>            Assignee: Ewan Higgs
> From HDFS-11639:
> {quote}[~virajith]
> Looking over this patch, one thing that occurred to me is whether it makes sense to unify
> FileRegionProvider with BlockProvider. They have very similar functionality.
> I like the use of BlockProvider#resolve(). If we unify FileRegionProvider with BlockProvider,
> then resolve can return null if the block map is also accessible from the Datanodes. If it
> is accessible only from the Namenode, then a non-null value can be propagated to the Datanode.
> One of the motivations for adding the BlockAlias to the client protocol was to have the
> block map only on the Namenode. In this scenario, the ReplicaMap in FsDatasetImpl will
> not have any replicas a priori. Thus, one way to ensure that the FsDatasetImpl interface continues
> to function as it does today is to create a FinalizedProvidedReplica in FsDatasetImpl#getBlockInputStream
> when BlockAlias is not null.
> {quote}
> {quote}[~ehiggs]
> With the pending refactoring of FsDatasetImpl, which won't have replicas a priori,
> I wonder if it makes sense for the Datanode to have a FileRegionProvider or BlockProvider
> at all. Datanodes are given the appropriate block ID and block alias in the readBlock or writeBlock
> message. Maybe I'm overlooking what's still being provided.{quote}
> {quote}[~virajith]
> I was trying to reconcile the existing design (FsDatasetImpl knows about provided blocks
> a priori) with the new design, where FsDatasetImpl will not know about these beforehand but
> constructs them on the fly using the BlockAlias from readBlock or writeBlock. Using BlockProvider#resolve()
> allows both designs to exist in parallel. I was wondering whether we should still retain
> the earlier design given the latter.
> {quote}
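
The two designs discussed in the quoted description can be sketched roughly as follows. This is a minimal illustration and not the actual Hadoop code: the classes `BlockAlias`, `ProvidedReplica`, `BlockProvider`, and `Dataset`, the method `replicaForRead`, and the example URI are all hypothetical stand-ins for the real `BlockAlias`, `FinalizedProvidedReplica`, `BlockProvider#resolve()`, and `FsDatasetImpl#getBlockInputStream`. It assumes that a null alias means "the Datanode can look the block up itself" and that a non-null alias means "build the replica on the fly".

```java
import java.util.HashMap;
import java.util.Map;

public class ProvidedReplicaSketch {

    /** Hypothetical alias: where a provided block's bytes live in external storage. */
    public static final class BlockAlias {
        public final String uri;
        public final long offset;
        public final long length;

        public BlockAlias(String uri, long offset, long length) {
            this.uri = uri;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical stand-in for a finalized provided replica on the Datanode. */
    public static final class ProvidedReplica {
        public final long blockId;
        public final BlockAlias alias;

        public ProvidedReplica(long blockId, BlockAlias alias) {
            this.blockId = blockId;
            this.alias = alias;
        }
    }

    /** Hypothetical Namenode-side provider holding the block map. */
    public static final class BlockProvider {
        private final Map<Long, BlockAlias> blockMap = new HashMap<>();
        private final boolean datanodesCanReadMap;

        public BlockProvider(boolean datanodesCanReadMap) {
            this.datanodesCanReadMap = datanodesCanReadMap;
        }

        public void put(long blockId, BlockAlias alias) {
            blockMap.put(blockId, alias);
        }

        /** Returns null when Datanodes can read the map themselves; otherwise the alias to propagate. */
        public BlockAlias resolve(long blockId) {
            return datanodesCanReadMap ? null : blockMap.get(blockId);
        }
    }

    /** Hypothetical Datanode-side dataset with an optional a priori replica map. */
    public static final class Dataset {
        private final Map<Long, ProvidedReplica> replicaMap = new HashMap<>();

        /** Existing design: the replica is registered before any read arrives. */
        public void addKnownReplica(long blockId, BlockAlias alias) {
            replicaMap.put(blockId, new ProvidedReplica(blockId, alias));
        }

        /** New design: a non-null alias from the read request builds the replica on the fly. */
        public ProvidedReplica replicaForRead(long blockId, BlockAlias alias) {
            if (alias != null) {
                return new ProvidedReplica(blockId, alias);
            }
            ProvidedReplica known = replicaMap.get(blockId);
            if (known == null) {
                throw new IllegalStateException("No replica and no alias for block " + blockId);
            }
            return known;
        }
    }

    public static void main(String[] args) {
        // Hypothetical example location; any external store URI would do.
        BlockAlias alias = new BlockAlias("s3a://bucket/file", 0L, 128L);

        BlockProvider namenodeOnly = new BlockProvider(false);
        namenodeOnly.put(1L, alias);

        Dataset datanode = new Dataset(); // starts with an empty replica map
        ProvidedReplica replica = datanode.replicaForRead(1L, namenodeOnly.resolve(1L));
        System.out.println("Block " + replica.blockId + " read from " + replica.alias.uri);
    }
}
```

Because `resolve` may return null, both designs can coexist in this sketch: the Datanode falls back to its own replica map only when no alias arrives with the request.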
