hadoop-hdfs-issues mailing list archives

From "Chen Zhang (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation
Date Wed, 11 Sep 2019 07:19:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927314#comment-16927314 ]

Chen Zhang commented on HDFS-12288:

Hi [~shahrs87] [~elgoiri], do you have time to take a look? I changed the code according to the
previous discussion and uploaded patch v3. It is not a complete patch, only a draft without tests.
{quote}The method {{DataNode#getActiveNumberOfThreads()}} will return the sum of
{{new DataNode#getXceiverCount() * 2}} + {{Num of Block recovery threads}}.
We just need to have another metric or member variable to track currently running
{{Block recovery threads}}.
The reason we have the multiplier of 2 is that for every {{DataXceiver}} thread, we also create a
{{PacketResponder thread}}.{quote}
Actually, not every DataXceiver thread creates a PacketResponder thread; only an xceiver processing
a WRITE_BLOCK operation does. So I added two additional metrics:
{{dataNodePacketResponderCount}} and {{dataNodeBlockRecoveryWorkerCount}}.
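
To illustrate the difference between the two ways of counting, here is a minimal sketch. It is not the code in patch v3: the class {{ThreadLoadCounters}} and all of its method names are hypothetical, and only the three thread categories (xceivers, packet responders, block recovery workers) come from the discussion above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the DataNode's thread accounting; not the actual patch. */
final class ThreadLoadCounters {
    // One counter per thread category, updated when a thread starts or finishes.
    private final AtomicInteger xceivers = new AtomicInteger();
    private final AtomicInteger packetResponders = new AtomicInteger();
    private final AtomicInteger blockRecoveryWorkers = new AtomicInteger();

    void xceiverStarted() { xceivers.incrementAndGet(); }
    void xceiverFinished() { xceivers.decrementAndGet(); }

    // Assumed to be called only by xceivers handling WRITE_BLOCK.
    void packetResponderStarted() { packetResponders.incrementAndGet(); }
    void packetResponderFinished() { packetResponders.decrementAndGet(); }

    void blockRecoveryWorkerStarted() { blockRecoveryWorkers.incrementAndGet(); }
    void blockRecoveryWorkerFinished() { blockRecoveryWorkers.decrementAndGet(); }

    /** Sum of the three explicitly tracked categories. */
    int getActiveNumberOfThreads() {
        return xceivers.get() + packetResponders.get() + blockRecoveryWorkers.get();
    }

    /** The quoted proposal: assume every xceiver also owns a packet responder. */
    int getDoubledEstimate() {
        return xceivers.get() * 2 + blockRecoveryWorkers.get();
    }
}
```

With three xceivers of which only one is writing, plus one block recovery worker, the explicit sum counts five threads while the doubled estimate counts seven, so the multiplier overstates the load whenever reads dominate.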

> Fix DataNode's xceiver count calculation
> ----------------------------------------
>                 Key: HDFS-12288
>                 URL: https://issues.apache.org/jira/browse/HDFS-12288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Major
>         Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch, HDFS-12288.003.patch
> The problem with the ThreadGroup.activeCount() method is that it is only a very
> rough estimate, and in reality returns the total number of threads in the thread group as
> opposed to the threads actually running.
> In some DNs, we saw this return around 50 for a long time, even though the actual number
> of DataXceiver threads was next to none.
> This is a big issue as we use the xceiverCount to make decisions on the NN for choosing
> the replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, which only
> accounts for the actual number of DataXceiver threads currently running and thus represents the
> load on the DN much better.
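
The gap between a thread-group estimate and an explicit counter can be reproduced outside Hadoop. The sketch below is an assumption-laden illustration, not DataNode code: the class {{ActiveCountDemo}}, the group name, and the {{busy}} counter are all hypothetical. It starts threads that only block on a latch and then compares ThreadGroup.activeCount() with a counter that is incremented only around real work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: live-but-idle threads still show up in ThreadGroup.activeCount(). */
final class ActiveCountDemo {

    /** Returns {thread-group estimate, explicit busy counter} for `idle` blocked threads. */
    static int[] measure(int idle) {
        ThreadGroup group = new ThreadGroup("demo-xceivers");
        CountDownLatch started = new CountDownLatch(idle);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger busy = new AtomicInteger(); // would be incremented only around real work

        Thread[] threads = new Thread[idle];
        for (int i = 0; i < idle; i++) {
            threads[i] = new Thread(group, () -> {
                started.countDown();
                try {
                    release.await(); // alive, but doing no work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].setDaemon(true);
            threads[i].start();
        }

        try {
            started.await();                     // all threads are now live and blocked
            int estimate = group.activeCount();  // counts live threads in the group
            int explicit = busy.get();           // no thread ever reported work
            release.countDown();
            for (Thread t : threads) {
                t.join();
            }
            return new int[] {estimate, explicit};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling {{ActiveCountDemo.measure(4)}} should report a non-zero group estimate alongside an explicit count of zero, which mirrors the mismatch described in the issue.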

This message was sent by Atlassian Jira

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
