hadoop-hdfs-issues mailing list archives

From "Rushabh S Shah (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12288) Fix DataNode's xceiver count calculation
Date Mon, 14 Aug 2017 16:05:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125883#comment-16125883 ]

Rushabh S Shah commented on HDFS-12288:
---------------------------------------

bq. Add a new method called DataNode#getActiveNumberOfThreads() which has the old behavior
from DataNode#getXceiverCount() (using threadGroup.activeCount()). 
The {{thread group}} _was intended_ to hold the sum of the {{DataXceiver}} threads, the {{PacketResponder}} threads, and the
{{BlockRecoveryWorker#recoverBlocks()}} threads, as [~hkoneru] mentioned in [this comment|https://issues.apache.org/jira/browse/HDFS-12288?focusedCommentId=16123794&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16123794].
Since this count is not accurate, we can fix it.
The method {{DataNode#getActiveNumberOfThreads()}} will return {{new DataNode#getXceiverCount() * 2}} + {{number of block recovery threads}}.
We just need to have another metric or member variable to track currently running {{Block
recovery threads}}.
The reason for the multiplier of 2 is that for every {{DataXceiver}} thread, we also create a {{PacketResponder}} thread.
This way, {{DataNode#getXceiverCount()}} will return the correct number of currently running
DataXceiver threads, and {{DataNode#getActiveNumberOfThreads()}} will fix the threadGroup.activeCount
bug.
Hope this proposal makes sense.
If not, I can write a simple patch to make it clear.
I don't want to hijack this jira. So I will let Lukas work on this since he found the bug
and did all the analysis.
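
For illustration only, the proposal above could be sketched roughly as follows. This is not the actual HDFS code or the attached patch: the class {{DataNodeThreadCounts}}, its counters, and the start/finish hooks are all hypothetical names, and the xceiver counter is merely assumed to play the role of {{DataNodeMetrics.dataNodeActiveXceiversCount}}.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the DataNode's thread accounting; not real HDFS code.
class DataNodeThreadCounts {
    // Assumed to mirror the metric that tracks running DataXceiver threads.
    private final AtomicInteger activeXceivers = new AtomicInteger();
    // Hypothetical new counter for currently running block recovery threads.
    private final AtomicInteger activeBlockRecoveryThreads = new AtomicInteger();

    // Hypothetical hooks, called when the corresponding thread starts or exits.
    void xceiverStarted() { activeXceivers.incrementAndGet(); }
    void xceiverFinished() { activeXceivers.decrementAndGet(); }
    void blockRecoveryStarted() { activeBlockRecoveryThreads.incrementAndGet(); }
    void blockRecoveryFinished() { activeBlockRecoveryThreads.decrementAndGet(); }

    // New behavior: only the DataXceiver threads that are currently running.
    int getXceiverCount() {
        return activeXceivers.get();
    }

    // Replacement for threadGroup.activeCount(): each DataXceiver is assumed
    // to be paired with one PacketResponder, hence the multiplier of 2.
    int getActiveNumberOfThreads() {
        return getXceiverCount() * 2 + activeBlockRecoveryThreads.get();
    }
}
```

With three running xceivers and one block recovery thread, this sketch reports an xceiver count of 3 and 3 * 2 + 1 = 7 active threads.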


> Fix DataNode's xceiver count calculation
> ----------------------------------------
>
>                 Key: HDFS-12288
>                 URL: https://issues.apache.org/jira/browse/HDFS-12288
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>         Attachments: HDFS-12288.001.patch, HDFS-12288.002.patch
>
>
> The problem with the ThreadGroup.activeCount() method is that it is only a very rough estimate, and in reality it returns the total number of threads in the thread group as opposed to the threads actually running.
> In some DNs, we saw this return ~50 for a long time, even though the actual number of DataXceiver threads was next to none.
> This is a big issue, as we use the xceiverCount to make decisions on the NN when choosing a replication source DN or returning DNs to clients for R/W.
> The plan is to reuse the DataNodeMetrics.dataNodeActiveXceiversCount value, which only accounts for the actual number of DataXceiver threads currently running and thus represents the load on the DN much better.
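
The behavior described in the issue can be reproduced in isolation with a small sketch. It is not DataNode code: the class {{ActiveCountDemo}} and the group name are hypothetical. It starts a few threads in a fresh thread group, parks them on a latch so they do no work, and reads {{ThreadGroup.activeCount()}} while they are idle.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; shows that idle but live threads are still counted.
class ActiveCountDemo {
    // Starts n threads in a fresh group, parks them on a latch, and returns
    // what ThreadGroup.activeCount() reports while all of them are idle.
    static int activeCountWhileIdle(int n) {
        ThreadGroup group = new ThreadGroup("demo-group");
        CountDownLatch started = new CountDownLatch(n);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(group, () -> {
                started.countDown();
                try {
                    release.await(); // idle: blocked until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            started.await(); // wait until every thread is running its body
            int reported = group.activeCount(); // documented as an estimate
            release.countDown();
            for (Thread t : threads) {
                t.join();
            }
            return reported;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The reported value covers every live thread in the group regardless of what kind of work it does, so a group shared by several kinds of threads cannot serve as a count of one kind.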



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

