hadoop-hdfs-issues mailing list archives

From Íñigo Goiri (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HDFS-14624) When decommissioning a node, log remaining blocks to replicate periodically
Date Tue, 02 Jul 2019 20:31:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877291#comment-16877291 ]

Íñigo Goiri commented on HDFS-14624:
------------------------------------

Thanks [~sodonnell] for the details.
I guess this is not too bad.

Any other insight on why you want to do this?
Let's see if we can add more details here.
Otherwise, I'm fine with [^HDFS-14624.001.patch].

> When decommissioning a node, log remaining blocks to replicate periodically
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-14624
>                 URL: https://issues.apache.org/jira/browse/HDFS-14624
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14624.001.patch
>
>
> When a node is marked for decommission, a monitor thread runs every 30
> seconds by default and checks whether the node still has pending blocks to be replicated before
> the node can complete decommission.
> There are two existing debug-level messages in the monitor thread, DatanodeAdminManager$Monitor.check(),
> which already log the relevant information. The first is logged as the pending blocks are replicated:
> {code:java}
> LOG.debug("Node {} still has {} blocks to replicate "
>     + "before it is a candidate to finish {}.",
>     dn, blocks.size(), dn.getAdminState());{code}
> And then after the initial set of blocks has completed and a rescan happens:
> {code:java}
> LOG.debug("Node {} {} healthy."
>     + " It needs to replicate {} more blocks."
>     + " {} is still in progress.", dn,
>     isHealthy ? "is": "isn't", blocks.size(), dn.getAdminState());{code}
> I would like to propose moving these messages to INFO level so it is easier to monitor
> decommission progress over time from the Namenode log.
> Based on the default settings, this would result in at most 1 log message per node being
> decommissioned every 30 seconds. It is "at most" because the monitor thread
> stops after checking 500K blocks, so in practice it could be as few as
> 1 log message per 30 seconds, even if many DNs are being decommissioned at the same time.
> Note that the namenode webUI does display the above information, but having this in the
> NN logs would allow progress to be tracked more easily.
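
For illustration only, here is a minimal, self-contained sketch of the proposal described above: the same progress message emitted at INFO level, plus the upper bound on log volume implied by "at most 1 message per node per 30 seconds". It is not the attached patch. The class name, method names, host name, and sample values are hypothetical, and java.util.logging stands in for the logger used in the Namenode so that the example runs without extra dependencies.

```java
import java.util.logging.Logger;

// Hypothetical sketch, not code from HDFS-14624.001.patch.
public class DecommissionLogSketch {
    // java.util.logging is used here only so the example is self-contained.
    private static final Logger LOG =
        Logger.getLogger(DecommissionLogSketch.class.getName());

    // Upper bound on messages per hour: one per node per monitor interval.
    public static long maxMessagesPerHour(int nodes, int intervalSeconds) {
        return (long) nodes * (3600L / intervalSeconds);
    }

    // Builds text in the same shape as the first debug message quoted above.
    public static String progressMessage(String node, int remainingBlocks,
                                         String adminState) {
        return String.format(
            "Node %s still has %d blocks to replicate "
                + "before it is a candidate to finish %s.",
            node, remainingBlocks, adminState);
    }

    public static void main(String[] args) {
        // Placeholder node name, block count, and admin state.
        LOG.info(progressMessage("dn1.example.com", 1234,
            "Decommission In Progress"));

        // 10 nodes at the default 30-second interval: 10 * 120 = 1200.
        System.out.println(maxMessagesPerHour(10, 30));
    }
}
```

Because the monitor stops after checking 500K blocks, the real volume can be well below this bound when many nodes are decommissioned at once.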



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

