Date: Wed, 11 Jan 2017 02:01:02 +0000 (UTC)
From: "Konstantin Shvachko (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-11313) Segmented Block Reports

[ https://issues.apache.org/jira/browse/HDFS-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816841#comment-15816841 ]

Konstantin Shvachko commented on HDFS-11313:
--------------------------------------------

Full block report (FBR) processing on the NameNode does four things:
# Update replica information reported by the DataNode for known blocks
# Add new replicas reported by the DataNode
# Instruct the DataNode to delete replicas that belong to nonexistent blocks
# Remove replicas that the NameNode assumed to be present on the DataNode but that did not appear in the report

The main problem with current FBRs is that they are processed under the global namesystem lock, and since the reports are big, other operations cannot proceed until the lock is released. On large clusters the current trend is to decrease FBR frequency, sending FBRs once every 6, 10, or even 12 hours. It would be beneficial to split FBRs into smaller, even if more frequent, RPC calls.

If a DataNode were to split its FBR into multiple RPCs arbitrarily, the NameNode wouldn't be able to distinguish replicas that do not exist on the DataNode from those that have not yet been reported (see 4 above). Therefore, the proposal is to introduce segmented block reports (SBR), where each report covers a segment of block IDs. The DataNode reports all its replicas in the given range of blockIDs, and if some block is not present in the report, the respective replica must be removed from the NameNode.

More details:
* NameNode allocates blockIDs sequentially. It should partition the set of block IDs allocated so far into reasonably sized segments. The last segment is open ended.
* BlockReportCommand is a new DatanodeCommand, which NameNode should send to a DataNode (in reply to a heartbeat) to order a block report within a specified segment.
* When DN receives a BlockReportCommand, it forms an SBR for the requested segment and sends it to NN. The report also includes the segment boundaries. This could be done per storage.
* NN processing of an SBR is similar to that of an FBR, but bounded to the reported segment.
* NN can eventually start optimizing to request SBRs when it is less busy.
* Periodic FBRs can eventually be removed, but for now should remain for backward compatibility.
That is, if a DN does not receive any BlockReportCommands from NN, it should send an FBR.

P.S. There are a lot of jiras discussing partial block reports since prehistoric times. I scanned through many, but found only one mention of a similar proposal. In HDFS-395 [~cutting] in [his comment|https://issues.apache.org/jira/browse/HDFS-395?focusedCommentId=12593583&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12593583] posted a link to an old discussion on the topic. Unfortunately the link is now stale.

> Segmented Block Reports
> -----------------------
>
>                 Key: HDFS-11313
>                 URL: https://issues.apache.org/jira/browse/HDFS-11313
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.6.2
>            Reporter: Konstantin Shvachko
>
> Block reports from a single DataNode can be currently split into multiple RPCs each reporting a single DataNode storage (disk). The reports are still large since disks are getting bigger. Splitting blockReport RPCs into multiple smaller calls would improve NameNode performance and overall HDFS stability.
> This was discussed in multiple jiras. Here the approach is to let NameNode divide blockID space into segments and then ask DataNodes to report replicas in a particular range of IDs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
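
The segmented block report idea in the comment above can be illustrated with a small sketch. This is not HDFS code: it is written in Python for brevity, and every name in it (`Segment`, `partition`, `SbrResult`, `process_sbr`) is hypothetical. It assumes block IDs are plain integers, segments are half-open ranges, and the NameNode's view of one DataNode is a simple set of block IDs. It only shows how bounding the four FBR steps to a segment lets the NameNode safely treat unreported replicas inside that segment as missing.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Segment:
    """Half-open range of block IDs [start, end); end=None means open ended."""
    start: int
    end: Optional[int] = None

    def contains(self, block_id: int) -> bool:
        return block_id >= self.start and (self.end is None or block_id < self.end)


def partition(first_id: int, last_allocated_id: int, size: int) -> List[Segment]:
    """Split the IDs allocated so far into segments of `size` IDs.

    The last segment is left open ended so that newly allocated IDs fall into it.
    """
    segments = [Segment(s, s + size)
                for s in range(first_id, last_allocated_id + 1, size)]
    if not segments:
        return [Segment(first_id, None)]
    segments[-1] = Segment(segments[-1].start, None)
    return segments


@dataclass
class SbrResult:
    updated: Set[int] = field(default_factory=set)          # step 1
    added: Set[int] = field(default_factory=set)            # step 2
    to_delete_on_dn: Set[int] = field(default_factory=set)  # step 3
    removed: Set[int] = field(default_factory=set)          # step 4


def process_sbr(segment: Segment, reported: Iterable[int],
                known_blocks: Set[int], nn_replicas: Set[int]) -> SbrResult:
    """Apply the four report-processing steps, bounded to one segment.

    known_blocks: block IDs that exist in the namespace (assumed model).
    nn_replicas:  block IDs the NameNode believes this DataNode holds;
                  updated in place.
    """
    reported_set = set(reported)
    for block_id in reported_set:
        if not segment.contains(block_id):
            raise ValueError(f"block {block_id} is outside the reported segment")

    result = SbrResult()
    valid = reported_set & known_blocks
    result.updated = valid & nn_replicas
    result.added = valid - nn_replicas
    result.to_delete_on_dn = reported_set - known_blocks
    # Only replicas inside the segment can be declared missing; replicas in
    # other segments simply have not been reported yet.
    in_segment = {b for b in nn_replicas if segment.contains(b)}
    result.removed = in_segment - reported_set

    nn_replicas.update(result.added)
    nn_replicas.difference_update(result.removed)
    return result
```

In this model, a report for segment [0, 4) never touches the NameNode's record of a replica with ID 5, which is the distinction an arbitrarily split FBR cannot make.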