Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 11 Jan 2017 02:01:02 +0000 (UTC)
From: "Konstantin Shvachko (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13033573.1484099359000.703925.1484100062583@Atlassian.JIRA>
In-Reply-To: <JIRA.13033573.1484099359000@Atlassian.JIRA>
References: <JIRA.13033573.1484099359000@Atlassian.JIRA> <JIRA.13033573.1484099359517@arcas>
Subject: [jira] [Commented] (HDFS-11313) Segmented Block Reports
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 11 Jan 2017 02:01:04 -0000


    [ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816841#comment-15816841 ] 

Konstantin Shvachko commented on HDFS-11313:
--------------------------------------------

Full block report (FBR) processing on the NameNode does four things:
# Update replica information reported by the DataNode for known blocks
# Add new replicas reported by the DataNode
# Instruct the DataNode to delete replicas that belong to nonexistent blocks
# Remove replicas that the NameNode assumed to be present on the DataNode but that did not appear in the report
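
The four outcomes above can be pictured as a set comparison. The following is only a minimal sketch under stated assumptions, not the actual NameNode code: the class name FullReportDiff, its fields, and the use of plain block ID sets are all hypothetical, and real replica state (generation stamps, lengths, storage IDs) is ignored.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of FBR processing; not the HDFS BlockManager implementation. */
final class FullReportDiff {
    final Set<Long> toUpdate = new TreeSet<>();     // 1. known block, replica already recorded for this DN
    final Set<Long> toAdd = new TreeSet<>();        // 2. known block, replica newly reported by this DN
    final Set<Long> toInvalidate = new TreeSet<>(); // 3. block does not exist; DN should delete the replica
    final Set<Long> toRemove = new TreeSet<>();     // 4. recorded for this DN but absent from the report

    /**
     * @param knownBlocks  IDs of all blocks the NameNode knows about (assumed input)
     * @param recordedOnDn IDs the NameNode currently believes this DataNode holds
     * @param reported     IDs listed in the full block report
     */
    static FullReportDiff diff(Set<Long> knownBlocks, Set<Long> recordedOnDn, Set<Long> reported) {
        FullReportDiff d = new FullReportDiff();
        for (long id : reported) {
            if (!knownBlocks.contains(id)) {
                d.toInvalidate.add(id);
            } else if (recordedOnDn.contains(id)) {
                d.toUpdate.add(id);
            } else {
                d.toAdd.add(id);
            }
        }
        // Outcome 4 is only safe because the report is complete:
        // anything recorded but not reported is really gone from the DN.
        for (long id : recordedOnDn) {
            if (!reported.contains(id)) {
                d.toRemove.add(id);
            }
        }
        return d;
    }
}
```

Note that the last loop is the step that depends on the report being complete, which is why splitting a report arbitrarily is a problem.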

The main problem with current FBRs is that they are processed under the global namesystem lock, and since the reports are big, other operations cannot proceed until the lock is released. On large clusters the current trend is to decrease FBR frequency, sending FBRs once every 6, 10, or even 12 hours. It would be beneficial to split FBRs into smaller, even if more frequent, RPC calls.

If a DataNode were to split its FBR into multiple RPCs arbitrarily, the NameNode would not be able to distinguish replicas that do not exist on the DataNode from those that have not yet been reported (see item 4 above).
Therefore, the proposal is to introduce segmented block reports (SBR), where each report includes a segment of IDs. So the DataNode reports all its replicas in the given range of blockIDs, and if some block is not present in the report, the respective replica must be removed from the NameNode.
More details:
* NameNode allocates blockIDs sequentially. It should partition the set of block IDs allocated so far into reasonably sized segments. The last segment is open ended.
* BlockReportCommand is a new DatanodeCommand, which NameNode should send to a DataNode (in reply to a heartbeat) to order a block report within a specified segment.
* When a DN receives a BlockReportCommand, it forms an SBR for the requested segment and sends it to the NN. The report also includes the segment boundaries. This could be done per storage.
* NN processing of SBR is similar to FBR, but bounded to the reported segment.
* NN can eventually start optimizing to request SBRs when it is less busy.
* Periodic FBRs can eventually be removed, but for now should remain for backward compatibility. That is, if a DN does not receive any BlockReportCommands from NN, it should send an FBR.
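
To make the segment idea concrete, here is a rough sketch of how block IDs could be partitioned and how removal could be bounded to one segment. Everything in it is an assumption made for illustration: the names SegmentedReportSketch, Segment, partition, and missingInSegment are hypothetical, segments are modeled as half-open ranges, Long.MAX_VALUE stands in for the open end of the last segment, and IDs are assumed non-negative with firstId <= lastAllocatedId.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical illustration of segmented block reports; not part of HDFS. */
final class SegmentedReportSketch {

    /** Half-open block ID range [start, end). */
    static final class Segment {
        final long start;
        final long end;

        Segment(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits the IDs allocated so far into fixed-size segments; the last one is open ended. */
    static List<Segment> partition(long firstId, long lastAllocatedId, long segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<Segment> segments = new ArrayList<>();
        long start = firstId;
        while (lastAllocatedId - start >= segmentSize) {
            segments.add(new Segment(start, start + segmentSize));
            start += segmentSize;
        }
        // Open-ended tail: also covers IDs allocated after the partition was computed.
        segments.add(new Segment(start, Long.MAX_VALUE));
        return segments;
    }

    /**
     * Replicas the NameNode recorded for this DataNode inside the segment that the
     * SBR did not list. Replicas outside the segment are never touched.
     */
    static SortedSet<Long> missingInSegment(NavigableSet<Long> recordedOnDn,
                                            Set<Long> reported, Segment seg) {
        SortedSet<Long> missing =
            new TreeSet<>(recordedOnDn.subSet(seg.start, true, seg.end, false));
        missing.removeAll(reported);
        return missing;
    }
}
```

With recorded replicas {5, 12, 17, 30} and an SBR for segment [10, 20) listing only block 12, just replica 17 would be considered missing; 5 and 30 lie outside the segment and wait for their own reports.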

P.S. There are a lot of jiras discussing partial block reports since prehistoric times. I scanned through many, but found only one mention of a similar proposal. In HDFS-395 [~cutting] in [his comment|https://issues.apache.org/jira/browse/HDFS-395?focusedCommentId=12593583&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12593583] posted a link to an old discussion on the topic. Unfortunately the link is now stale.

> Segmented Block Reports
> -----------------------
>
>                 Key: HDFS-11313
>                 URL: https://issues.apache.org/jira/browse/HDFS-11313
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.6.2
>            Reporter: Konstantin Shvachko
>
> Block reports from a single DataNode can currently be split into multiple RPCs, each reporting a single DataNode storage (disk). The reports are still large since disks are getting bigger. Splitting blockReport RPCs into multiple smaller calls would improve NameNode performance and overall HDFS stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode divide blockID space into segments and then ask DataNodes to report replicas in a particular range of IDs.

