hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7876) DataNodes start to scan blocks earlier
Date Wed, 06 May 2015 03:28:20 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated HDFS-7876:
    Labels: BB2015-05-TBR  (was: )

> DataNodes start to scan blocks earlier
> --------------------------------------
>                 Key: HDFS-7876
>                 URL: https://issues.apache.org/jira/browse/HDFS-7876
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 3.0.0
>            Reporter: Xinwei Qin 
>            Assignee: Xinwei Qin 
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7876.001.patch
> When a Hadoop cluster restarts, DataNodes scan their local blocks and report this information
to the NameNode. DataNodes start scanning local blocks only after obtaining the NamespaceInfo from
the NameNode via the RPC call versionRequest(), which requires the NameNode RPC server to be running.
> Currently, the RPC server is not created and started until FsImage loading has completed.
So DataNodes cannot start scanning blocks immediately and must wait for the NameNode to load
the FsImage. This wastes DataNode time when the FsImage is very large.
> Since the RPC server depends very little on the FsImage, and the NamespaceInfo (namespaceID,
clusterID, blockpoolID, cTime, etc.) can be constructed from the VERSION file, we can create and
start the RPC server before loading the FsImage, so that DataNodes can get the NamespaceInfo from
the NameNode via RPC call as soon as possible and start scanning blocks earlier, which will shorten restart
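
To illustrate the idea of building the NamespaceInfo fields from the VERSION file alone, here is a minimal sketch. It is not the attached patch and does not use Hadoop classes: `VersionFileSketch`, `NamespaceInfoSketch`, `parse`, and `require` are hypothetical names, and the sketch assumes the VERSION file is a Java properties file whose keys match the field names listed in the description (namespaceID, clusterID, blockpoolID, cTime).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch: neither the HDFS-7876 patch nor Hadoop's own NamespaceInfo class.
class VersionFileSketch {

    // Illustrative stand-in holding the fields named in the issue description.
    static final class NamespaceInfoSketch {
        final int namespaceID;
        final String clusterID;
        final String blockpoolID;
        final long cTime;

        NamespaceInfoSketch(int namespaceID, String clusterID,
                            String blockpoolID, long cTime) {
            this.namespaceID = namespaceID;
            this.clusterID = clusterID;
            this.blockpoolID = blockpoolID;
            this.cTime = cTime;
        }
    }

    // Parses VERSION-style text (assumed to be in Java properties format)
    // without touching the FsImage at all.
    static NamespaceInfoSketch parse(String versionFileText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(versionFileText));
        return new NamespaceInfoSketch(
                Integer.parseInt(require(props, "namespaceID")),
                require(props, "clusterID"),
                require(props, "blockpoolID"),
                Long.parseLong(require(props, "cTime")));
    }

    // Fails loudly if an expected key is absent rather than guessing a default.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("VERSION text is missing key: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample values, for illustration only.
        String sample = String.join("\n",
                "#sample VERSION content",
                "namespaceID=12345",
                "clusterID=CID-example",
                "cTime=0",
                "blockpoolID=BP-example",
                "");
        NamespaceInfoSketch info = parse(sample);
        System.out.println(info.clusterID + " " + info.blockpoolID);
    }
}
```

The point of the sketch is only that these fields are available from a small local file, so an RPC server answering versionRequest() would not need to wait for the FsImage to load in order to return them.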

This message was sent by Atlassian JIRA
