hadoop-hdfs-dev mailing list archives

From "Ewan Higgs (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-11828) Refactor FsDatasetImpl as the BlockAlias is in the wire protocol for PROVIDED blocks.
Date Tue, 16 May 2017 08:14:04 GMT
Ewan Higgs created HDFS-11828:

             Summary: Refactor FsDatasetImpl as the BlockAlias is in the wire protocol for PROVIDED blocks.
                 Key: HDFS-11828
                 URL: https://issues.apache.org/jira/browse/HDFS-11828
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: Ewan Higgs
            Assignee: Ewan Higgs

From HDFS-11639:

{quote}Looking over this patch, one thing that occurred to me is whether it makes sense to unify
FileRegionProvider with BlockProvider. They have very similar functionality.

I like the use of BlockProvider#resolve(). If we unify FileRegionProvider with BlockProvider,
then resolve() can return null if the block map is also accessible from the Datanodes. If it
is accessible only from the Namenode, then a non-null value can be propagated to the Datanode.
One of the motivations for adding the BlockAlias to the client protocol was to keep the blocks
map only on the Namenode. In this scenario, the ReplicaMap in FsDatasetImpl will not have
any replicas a priori. Thus, one way to ensure that the FsDatasetImpl interface continues to
function as it does today is to create a FinalizedProvidedReplica in FsDatasetImpl#getBlockInputStream
when the BlockAlias is not null.

With the pending refactoring of FsDatasetImpl, which won't have replicas a priori, I wonder
if it makes sense for the Datanode to have a FileRegionProvider or BlockProvider at all. Datanodes
are given the appropriate block ID and block alias in the readBlock or writeBlock message.
Maybe I'm overlooking what is still being provided.{quote}
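To make the resolve() idea concrete, here is a minimal sketch of a unified provider whose resolve() returns null when the Datanodes can read the block map themselves, and returns the alias to propagate otherwise. All names here (Alias, UnifiedProvider, mapVisibleToDatanodes, ResolveDemo) are hypothetical simplifications for illustration only; they are not the Hadoop BlockProvider, FileRegionProvider, or BlockAlias APIs, and the boolean flag is an assumed way of modelling where the block map lives.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified location of a PROVIDED block (not Hadoop's BlockAlias). */
final class Alias {
    final String path;   // file in the external store holding the block
    final long offset;   // byte offset of the block within that file
    final long length;   // block length in bytes

    Alias(String path, long offset, long length) {
        this.path = path;
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical provider standing in for a unified FileRegionProvider/BlockProvider. */
final class UnifiedProvider {
    private final Map<Long, Alias> blockMap = new HashMap<>();
    // Assumption: a flag recording whether Datanodes can also read the block map.
    private final boolean mapVisibleToDatanodes;

    UnifiedProvider(boolean mapVisibleToDatanodes) {
        this.mapVisibleToDatanodes = mapVisibleToDatanodes;
    }

    void put(long blockId, Alias alias) {
        blockMap.put(blockId, alias);
    }

    /**
     * Returns null when Datanodes can resolve the block themselves;
     * otherwise returns the alias the Namenode should send to the Datanode.
     */
    Alias resolve(long blockId) {
        if (mapVisibleToDatanodes) {
            return null;
        }
        return blockMap.get(blockId);
    }
}

class ResolveDemo {
    public static void main(String[] args) {
        Alias alias = new Alias("file:///external/data.bin", 0L, 134217728L);
        UnifiedProvider shared = new UnifiedProvider(true);
        shared.put(1L, alias);
        UnifiedProvider namenodeOnly = new UnifiedProvider(false);
        namenodeOnly.put(1L, alias);
        System.out.println(shared.resolve(1L) == null);        // true: Datanode looks it up itself
        System.out.println(namenodeOnly.resolve(1L) == alias); // true: alias travels with the request
    }
}
```

In this model a null result simply means "nothing needs to travel on the wire", which is what lets the same call serve both deployments.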

I was trying to reconcile the existing design (FsDatasetImpl knows about provided blocks a priori)
with the new design, where FsDatasetImpl does not know about them in advance but constructs
them on the fly using the BlockAlias from readBlock or writeBlock. Using BlockProvider#resolve()
allows both designs to exist in parallel. I was wondering whether we should still retain
the earlier design given the latter.
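A minimal sketch of how the Datanode side could support both designs at once, assuming the read path receives a possibly-null alias from the readBlock request: a non-null alias yields a replica built on the fly (the role FinalizedProvidedReplica plays in the proposal), and a null alias falls back to the pre-populated replica map. WireAlias, ProvidedReplica, SimpleDataset, and getReplicaForRead are hypothetical names for illustration, not the FsDatasetImpl API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical alias carried in a readBlock/writeBlock message. */
final class WireAlias {
    final String path;
    final long offset;
    final long length;

    WireAlias(String path, long offset, long length) {
        this.path = path;
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical stand-in for a finalized PROVIDED replica. */
final class ProvidedReplica {
    final long blockId;
    final WireAlias alias;
    final boolean constructedOnTheFly; // true if built from the request, not the map

    ProvidedReplica(long blockId, WireAlias alias, boolean constructedOnTheFly) {
        this.blockId = blockId;
        this.alias = alias;
        this.constructedOnTheFly = constructedOnTheFly;
    }
}

/** Hypothetical dataset that keeps the existing a priori path alongside the new one. */
final class SimpleDataset {
    private final Map<Long, ProvidedReplica> replicaMap = new HashMap<>();

    // Existing design: replicas are known before any request arrives.
    void preload(long blockId, WireAlias alias) {
        replicaMap.put(blockId, new ProvidedReplica(blockId, alias, false));
    }

    // New design first: a non-null alias from the request is used directly;
    // otherwise fall back to the replica map populated a priori.
    ProvidedReplica getReplicaForRead(long blockId, WireAlias aliasFromRequest) {
        if (aliasFromRequest != null) {
            return new ProvidedReplica(blockId, aliasFromRequest, true);
        }
        ProvidedReplica replica = replicaMap.get(blockId);
        if (replica == null) {
            throw new IllegalStateException("no replica for block " + blockId);
        }
        return replica;
    }
}

class DatasetDemo {
    public static void main(String[] args) {
        SimpleDataset ds = new SimpleDataset();
        ds.preload(7L, new WireAlias("file:///external/a", 0L, 64L));
        System.out.println(ds.getReplicaForRead(7L, null).constructedOnTheFly); // false
        WireAlias fromWire = new WireAlias("file:///external/b", 64L, 64L);
        System.out.println(ds.getReplicaForRead(8L, fromWire).constructedOnTheFly); // true
    }
}
```

If the a priori path were dropped, preload() and the map lookup would go away and a null alias would become an error, which is the trade-off the question above is weighing.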

This message was sent by Atlassian JIRA