hadoop-hdfs-issues mailing list archives

From "Tsz Wo Nicholas Sze (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9011) Support splitting BlockReport of a storage into multiple RPC
Date Tue, 13 Oct 2015 04:25:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954355#comment-14954355
] 

Tsz Wo Nicholas Sze commented on HDFS-9011:
-------------------------------------------

Here is a new idea -- we may partition the block ID space so that datanodes can send multiple
small full block reports, one for each partition.  The partitions need not be fixed.
- When a full block report is larger than a threshold, the report is split into two reports,
one for blocks with odd IDs and one for blocks with even IDs.  If these reports are still too
large, they are split into four reports by the last two bits of the block ID, i.e. ID suffixes
00, 01, 10 and 11.  The process continues until the reports are no larger than the threshold.
The datanode sends each partitioned report together with its suffix.
- Since the block ID space is partitioned, the Namenode can process each partitioned report
without knowing the remaining partitioned reports.
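
The splitting rule above can be sketched as follows.  This is only an illustration in Python,
not code from any of the attached patches: the names partition_report and missing_in_partition,
the max_bits cap, and the use of plain integer lists for block IDs are all assumptions made
for the example.

```python
from typing import Dict, Iterable, List, Tuple


def partition_report(block_ids: Iterable[int], threshold: int,
                     max_bits: int = 16) -> Tuple[int, Dict[int, List[int]]]:
    """Split a full block report by the low-order bits of the block ID.

    Starts with one report (0 suffix bits) and adds one suffix bit at a
    time (odd/even, then 00/01/10/11, ...) until no partition is larger
    than `threshold`.  `max_bits` is a hypothetical safety cap.
    Returns (number of suffix bits, {suffix: block IDs}).
    """
    ids = list(block_ids)
    bits = 0
    while True:
        mask = (1 << bits) - 1
        # Keep empty partitions too: an empty report for a suffix still
        # tells the receiver that no block with that suffix exists.
        parts: Dict[int, List[int]] = {s: [] for s in range(1 << bits)}
        for b in ids:
            parts[b & mask].append(b)
        if bits >= max_bits or all(len(p) <= threshold for p in parts.values()):
            return bits, parts
        bits += 1


def missing_in_partition(known_ids: Iterable[int], bits: int, suffix: int,
                         reported: Iterable[int]) -> List[int]:
    """Receiver side: blocks it knows with this suffix that were not reported.

    Only IDs matching (bits, suffix) are examined, so one partitioned
    report can be processed without seeing the other partitions.
    """
    mask = (1 << bits) - 1
    reported_set = set(reported)
    return sorted(b for b in known_ids
                  if (b & mask) == suffix and b not in reported_set)


if __name__ == "__main__":
    bits, parts = partition_report(range(1, 11), threshold=3)
    for suffix, ids in sorted(parts.items()):
        print(format(suffix, "0{}b".format(bits)), ids)
```

In this sketch all partitions are split to the same depth, as in the description above; the
receiver needs only the suffix and its length to know which of its own blocks a report covers.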

> Support splitting BlockReport of a storage into multiple RPC
> ------------------------------------------------------------
>
>                 Key: HDFS-9011
>                 URL: https://issues.apache.org/jira/browse/HDFS-9011
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-9011.000.patch, HDFS-9011.001.patch, HDFS-9011.002.patch
>
>
> Currently if a DataNode has too many blocks (more than 1m by default), it sends multiple
RPCs to the NameNode for the block report, with each RPC containing the report for a single
storage. However, in practice we've sometimes seen that even a single storage can contain a
large number of blocks, so that its report exceeds the max RPC data length. It may be helpful
to support sending multiple RPCs for the block report of a storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
