hadoop-hdfs-issues mailing list archives

From "Rushabh S Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation
Date Mon, 14 Aug 2017 16:05:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125883#comment-16125883 ]

Rushabh S Shah commented on HDFS-12288:

bq. Add a new method called DataNode#getActiveNumberOfThreads() which has the old behavior
from DataNode#getXceiverCount() (using threadGroup.activeCount()). 
{{thread group}} _was intended_ to hold the sum of {{DataXceiver thread}}, {{Packet Responder thread}} and
{{BlockRecoveryWorker#recoverBlocks()}} threads, as [~hkoneru] mentioned in [this comment|https://issues.apache.org/jira/browse/HDFS-12288?focusedCommentId=16123794&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16123794].
Since the current count is not accurate, we can fix this.
The method {{DataNode#getActiveNumberOfThreads()}} will return the sum of {{new DataNode#getXceiverCount()
* 2}} + {{Num of Block recovery threads}}.
We just need to have another metric or member variable to track currently running {{Block
recovery threads}}.
The reason for the multiplier of 2 is that for every {{DataXceiver}} thread, we also create a {{Packet
Responder thread}}.
This way, {{DataNode#getXceiverCount()}} will return the correct number of currently running
DataXceiver threads, and {{DataNode#getActiveNumberOfThreads()}} will give a corrected replacement
for threadGroup.activeCount().
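To make the arithmetic concrete, here is a minimal standalone sketch of the accounting described above. It is not the actual DataNode code: the class name, the counter fields, and the started/finished hooks are all hypothetical stand-ins, and the xceiver counter merely plays the role of {{DataNodeMetrics.dataNodeActiveXceiversCount}}.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of the proposed accounting; not the real DataNode class. */
public class XceiverAccountingSketch {
  // Stand-in (assumption) for the existing active xceiver metric.
  private final AtomicInteger activeXceivers = new AtomicInteger();
  // Hypothetical new counter for currently running block recovery threads.
  private final AtomicInteger activeBlockRecoveryThreads = new AtomicInteger();

  // Hypothetical hooks, called when the corresponding thread starts or exits.
  public void xceiverStarted() { activeXceivers.incrementAndGet(); }
  public void xceiverFinished() { activeXceivers.decrementAndGet(); }
  public void blockRecoveryStarted() { activeBlockRecoveryThreads.incrementAndGet(); }
  public void blockRecoveryFinished() { activeBlockRecoveryThreads.decrementAndGet(); }

  /** New behavior: only the currently running DataXceiver threads. */
  public int getXceiverCount() {
    return activeXceivers.get();
  }

  /**
   * Proposed replacement for threadGroup.activeCount(): each xceiver is
   * counted twice (itself plus its packet responder, as assumed in the
   * comment above), plus the running block recovery threads.
   */
  public int getActiveNumberOfThreads() {
    return getXceiverCount() * 2 + activeBlockRecoveryThreads.get();
  }

  public static void main(String[] args) {
    XceiverAccountingSketch dn = new XceiverAccountingSketch();
    dn.xceiverStarted();
    dn.xceiverStarted();
    dn.blockRecoveryStarted();
    System.out.println("xceivers=" + dn.getXceiverCount()
        + " activeThreads=" + dn.getActiveNumberOfThreads());
  }
}
```

With two xceivers and one block recovery thread running, {{getXceiverCount()}} is 2 and {{getActiveNumberOfThreads()}} is 2 * 2 + 1 = 5.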
Hope this proposal makes sense.
If not, I can write a simple patch to make it clear.
I don't want to hijack this jira. So I will let Lukas work on this since he found the bug
and did all the analysis.

> Fix DataNode's xceiver count calculation
> ----------------------------------------
>                 Key: HDFS-12288
>                 URL: https://issues.apache.org/jira/browse/HDFS-12288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>         Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch
> The problem with the ThreadGroup.activeCount() method is that the method is only a very
rough estimate, and in reality returns the total number of threads in the thread group as
opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the actual number
of DataXceiver threads was close to zero.
> This is a big issue as we use the xceiverCount to make decisions on the NN for choosing
replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, which only
accounts for the actual number of DataXceiver threads currently running and thus represents the
load on the DN much better.
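
The difference between the two counts described in the issue can be reproduced outside Hadoop. The following is a small standalone sketch, with hypothetical class and variable names, in which some threads in a group are parked without doing any work: ThreadGroup.activeCount() still includes them, while an explicit counter maintained around the work itself does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo; unrelated to the real DataNode thread group. */
public class ThreadGroupCountSketch {

  /**
   * Starts idle + working threads in one group, parks all of them, and
   * returns { group.activeCount(), explicit busy counter } sampled while
   * every thread is still alive.
   */
  public static int[] measure(int idle, int working) {
    ThreadGroup group = new ThreadGroup("sketch-xceivers");
    CountDownLatch started = new CountDownLatch(idle + working);
    CountDownLatch release = new CountDownLatch(1);
    AtomicInteger busy = new AtomicInteger(); // tracks only threads doing work
    List<Thread> threads = new ArrayList<>();

    for (int i = 0; i < idle + working; i++) {
      final boolean doesWork = i < working;
      Thread t = new Thread(group, () -> {
        if (doesWork) {
          busy.incrementAndGet();
        }
        started.countDown();
        try {
          release.await(); // parked: alive, but not necessarily doing work
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          if (doesWork) {
            busy.decrementAndGet();
          }
        }
      });
      t.setDaemon(true);
      threads.add(t);
      t.start();
    }

    try {
      started.await(); // every thread has started and updated the counter
      int[] sample = { group.activeCount(), busy.get() };
      release.countDown();
      for (Thread t : threads) {
        t.join();
      }
      return sample;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while sampling", e);
    }
  }

  public static void main(String[] args) {
    int[] sample = measure(3, 1);
    System.out.println("activeCount=" + sample[0] + " busy=" + sample[1]);
  }
}
```

Here three threads are idle and one is marked as working, so the group-level estimate covers all four live threads while the explicit counter reports only the one doing work.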

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
