hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9122) DN automatically add more volumes to avoid large volume
Date Tue, 22 Sep 2015 16:34:06 GMT

https://issues.apache.org/jira/browse/HDFS-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902920#comment-14902920

Colin Patrick McCabe commented on HDFS-9122:

It is an interesting idea, but I think moving to automatically created volumes is a pretty
big step to take.  It would violate a lot of assumptions currently in the code.  "Unsplitting"
volumes when blocks are removed would also be tricky.

Also, just like HDFS-9011, this doesn't solve the main problem with super-large block reports,
which is time consumed on the NN for the processing.  I think federation is a better workaround
in the short term than this JIRA.  We need better long term solutions, of course.

> DN automatically add more volumes to avoid large volume
> -------------------------------------------------------
>                 Key: HDFS-9122
>                 URL: https://issues.apache.org/jira/browse/HDFS-9122
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Walter Su
> Currently, if a DataNode has too many blocks, it partitions the block report by storage. In
practice, we've seen a single storage contain such a large number of blocks that the report
exceeds the max RPC data length. Storage density is increasing quickly, so a DataNode can hold
more and more blocks, and it is getting harder to fit so many blocks into one RPC report. One
option is "Support splitting BlockReport of a storage into multiple RPC" (HDFS-9011).
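The HDFS-9011 alternative mentioned above could, in outline, look like the following. This is only an illustrative sketch, not the actual Hadoop API: the class name, method name, and chunk-size parameter are all hypothetical; the real work would involve the DataNode's block-report protocol, not a bare array of IDs.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of splitting one storage's block list into multiple RPC-sized
 *  reports, in the spirit of HDFS-9011. Names and types here are
 *  illustrative only, not Hadoop's real block-report code. */
public class BlockReportSplitter {

  /** Split blockIds into chunks of at most maxBlocksPerRpc entries,
   *  so that no single RPC exceeds the configured data-length limit. */
  public static List<long[]> split(long[] blockIds, int maxBlocksPerRpc) {
    List<long[]> chunks = new ArrayList<>();
    for (int start = 0; start < blockIds.length; start += maxBlocksPerRpc) {
      int len = Math.min(maxBlocksPerRpc, blockIds.length - start);
      long[] chunk = new long[len];
      System.arraycopy(blockIds, start, chunk, 0, len);
      chunks.add(chunk);
    }
    return chunks;
  }
}
```

Even with such splitting, as the comment above notes, the NameNode still pays the full processing cost across all the chunks.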
> I'm thinking maybe we could add more "logical" volumes (more storage directories on one
device). A DataNodeStorageInfo in the NameNode is cheap. And processing a single block report
requires the NN to hold its lock, so splitting one big volume into many volumes keeps any
single report from holding the lock too long.
> We could support wildcards in dfs.datanode.data.dir, e.g. /physical-volume/dfs/data/dir*
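Under this proposal, the configuration might look like the fragment below. Note that wildcard expansion in dfs.datanode.data.dir is exactly what this JIRA proposes, not an existing feature; today each storage directory must be listed explicitly.

```xml
<!-- Hypothetical: wildcard expansion here is the proposal of this
     JIRA (HDFS-9122), not an existing HDFS feature. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/physical-volume/dfs/data/dir*</value>
</property>
```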
> When a volume exceeds a threshold (e.g. 1M blocks), the DN automatically creates a new storage
directory, which is also a new volume. We would have to change RoundRobinVolumeChoosingPolicy as
well: once we have chosen a physical volume, we choose the logical volume that holds the fewest
blocks.
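The two-level selection described in the quoted proposal could be sketched as below. This is an illustrative sketch only: the LogicalVolume type and the chooser class are hypothetical stand-ins, not Hadoop's actual RoundRobinVolumeChoosingPolicy, which operates on FsVolumeSpi instances.

```java
import java.util.List;

/** Sketch of the proposed second stage of volume choosing: after
 *  round-robin picks a physical device, pick the logical volume
 *  (storage directory) on that device with the fewest blocks.
 *  LogicalVolume is a hypothetical stand-in type. */
public class LeastBlocksVolumeChooser {

  public static class LogicalVolume {
    public final String dir;       // storage directory path
    public final long numBlocks;   // blocks currently on this volume
    public LogicalVolume(String dir, long numBlocks) {
      this.dir = dir;
      this.numBlocks = numBlocks;
    }
  }

  /** Among the logical volumes on one physical device, return the
   *  one holding the fewest blocks, so new blocks fill the emptiest
   *  storage directory first. */
  public static LogicalVolume choose(List<LogicalVolume> onSameDevice) {
    LogicalVolume best = onSameDevice.get(0);
    for (LogicalVolume v : onSameDevice) {
      if (v.numBlocks < best.numBlocks) {
        best = v;
      }
    }
    return best;
  }
}
```

Keeping the least-loaded choice confined to logical volumes on the same device preserves the existing round-robin balancing across physical disks.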

This message was sent by Atlassian JIRA
