hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11847) Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
Date Tue, 19 Dec 2017 05:46:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296254#comment-16296254 ]

Xiao Chen commented on HDFS-11847:
----------------------------------

Thanks for working on this Manoj! This will be a nice tool for troubleshooting decommissioning.

Some comments:
- Since HDFS-10480 has already been released, we unfortunately cannot change the existing APIs. It seems to me we'd
have to provide an overload of {{listOpenFiles}}. I like the use of enums though; maybe we
should deprecate the existing API to encourage use of the new one.
- From the API, do we support specifying both {{BLOCKING_DECOMMISSION}} and {{ALL_OPEN_FILES}}?
The implementation in {{FSN#listOpenFiles}} doesn't look like it does, but I'm also wondering how we
plan to support them on the same {{OpenFilesIterator}}. Do we want to have types on {{OpenFileEntry}}?
- From a usage perspective, it may also be useful to print out the DataNodes.
- {{DatanodeAdminManager#processBlocksInternal}}: maybe we can skip a block when it and its inode are
inconsistent, instead of throwing from the preconditions? We could log in the NN to help debugging, and from
dfsadmin we could still see the other open files.
- {{DatanodeAdminManager#processBlocksInternal}}, can we simply use {{lowRedundancyOpenFiles.size()}}
and get rid of {{lowRedundancyBlocksInOpenFiles}}?
- {{LeavingServiceStatus}}: similar to above, do we need both the counter and the set of open files?
(Holding all the inode IDs would consume more memory, but since this only happens with decommissioning
plus open files, which hopefully is a tiny portion of all files, I think we're okay.)
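
To make the overload-plus-deprecation idea from the first comment concrete, here is a minimal sketch. The class {{OpenFilesApiSketch}}, its {{Entry}} type, {{addOpenFile}}, and the enum name {{OpenFilesType}} are hypothetical stand-ins, not the actual HDFS code. Treating both flags together as a union is only one possible answer to the question raised above.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical stand-in for the admin API; NOT the real HDFS classes.
class OpenFilesApiSketch {

  // Filter values named in the review; the enum name is an assumption.
  enum OpenFilesType { ALL_OPEN_FILES, BLOCKING_DECOMMISSION }

  // Hypothetical entry: a path plus whether it blocks decommissioning.
  static final class Entry {
    final String path;
    final boolean blocksDecommission;

    Entry(String path, boolean blocksDecommission) {
      this.path = path;
      this.blocksDecommission = blocksDecommission;
    }
  }

  private final List<Entry> openFiles = new ArrayList<>();

  void addOpenFile(String path, boolean blocksDecommission) {
    openFiles.add(new Entry(path, blocksDecommission));
  }

  // Already-released signature: kept so existing callers still compile,
  // marked deprecated, and delegating to the new overload.
  @Deprecated
  List<String> listOpenFiles() {
    return listOpenFiles(EnumSet.of(OpenFilesType.ALL_OPEN_FILES));
  }

  // New overload taking a set of filter types.
  // Assumption: when both types are given, the result is their union.
  List<String> listOpenFiles(EnumSet<OpenFilesType> types) {
    List<String> result = new ArrayList<>();
    for (Entry e : openFiles) {
      boolean wanted = types.contains(OpenFilesType.ALL_OPEN_FILES)
          || (types.contains(OpenFilesType.BLOCKING_DECOMMISSION)
              && e.blocksDecommission);
      if (wanted) {
        result.add(e.path);
      }
    }
    return result;
  }
}
```

Old callers keep working through the deprecated method, while new callers pass an {{EnumSet}} and get a compiler warning if they reach for the old signature.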

Nits:
- {{LeavingServiceStatus}} trivial and pre-existing: comment at the end of this class should
say {{End of class LeavingServiceStatus}}, not {{DecommissioningStatus}}
- {{FSN#getFilesBlockingDecom}}: suggest adding {{assert hasReadLock();}} to safeguard against future
changes
- {{TestDecommission#verifyOpenFilesBlockingDecommission}}: Should save the previous {{System.out}}
as a local var, and set back when we're done. {{System.setOut(System.out);}} won't restore
to the old out. Also the restore logic should be in a finally block.
- {{TestDecommission}}: can we set {{DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY}} to
{{Integer.MAX_VALUE}}? A 1-second interval may not be robust enough.
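
For the {{System.out}} nit, the save-and-restore pattern could look roughly like the sketch below. The helper class {{StdoutCapture}} and its {{capture}} method are hypothetical names for illustration, not part of {{TestDecommission}}. Only {{System.setOut}}, {{PrintStream}}, and {{ByteArrayOutputStream}} are real JDK APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical test helper; the name and shape are assumptions.
class StdoutCapture {

  // Runs the action with System.out redirected to a buffer and
  // returns whatever the action printed.
  static String capture(Runnable action) {
    // Save the current stream BEFORE redirecting. Calling
    // System.setOut(System.out) later would be a no-op, because by then
    // System.out already refers to the capture stream.
    final PrintStream oldOut = System.out;
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    PrintStream captureStream = new PrintStream(buffer, true);
    System.setOut(captureStream);
    try {
      action.run();
    } finally {
      // Restore in a finally block so a failing action cannot leak
      // the redirect into later tests.
      captureStream.flush();
      System.setOut(oldOut);
    }
    return buffer.toString();
  }
}
```

A test could then call something like {{StdoutCapture.capture(() -> runListOpenFiles())}}, where {{runListOpenFiles}} is a placeholder for whatever invokes the admin command, and assert on the returned string.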

> Enhance dfsadmin listOpenFiles command to list files blocking datanode decommissioning
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-11847
>                 URL: https://issues.apache.org/jira/browse/HDFS-11847
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11847.01.patch, HDFS-11847.02.patch
>
>
> HDFS-10480 adds the {{listOpenFiles}} option to the {{dfsadmin}} command to list all the open files in the system.
> Additionally, it would be very useful to list only the open files that are blocking DataNode decommissioning. With thousand+ node clusters, where machines might be added and removed regularly for maintenance, any option to monitor and debug decommissioning status is very helpful. The proposal here is to add suboptions to {{listOpenFiles}} for the above case.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
