hadoop-hdfs-issues mailing list archives

From "David Mollitor (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-14295) Add Threadpool for DataTransfers
Date Tue, 12 Mar 2019 14:53:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-14295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated HDFS-14295:
----------------------------------
    Status: Patch Available  (was: Open)

> Add Threadpool for DataTransfers
> --------------------------------
>
>                 Key: HDFS-14295
>                 URL: https://issues.apache.org/jira/browse/HDFS-14295
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.2.0
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>         Attachments: HDFS-14295.1.patch, HDFS-14295.2.patch, HDFS-14295.3.patch, HDFS-14295.4.patch, HDFS-14295.5.patch, HDFS-14295.6.patch, HDFS-14295.7.patch, HDFS-14295.8.patch, HDFS-14295.9.patch
>
>
> When a DataNode transfers a block, it spins up a new thread for each transfer:
[here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L2339]
and [here|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L3019-L3022].
Instead, submit the transfers to a {{CachedThreadPool}} so that when a thread completes its
transfer, it can be reused for another transfer. This should save the resources spent on creating
and starting transfer threads.
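As a rough illustration of the idea above (not the code from the attached patches), the sketch below submits several short tasks to a pool created with {{Executors.newCachedThreadPool()}}. The class {{TransferPoolSketch}}, the {{FakeTransfer}} runnable, and the {{runTransfers}} helper are hypothetical names invented for this example; the real {{DataTransfer}} class in {{DataNode.java}} is not reproduced here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class TransferPoolSketch {

  // Hypothetical stand-in for a block transfer; it only counts completions.
  static final class FakeTransfer implements Runnable {
    private final AtomicInteger completed;

    FakeTransfer(AtomicInteger completed) {
      this.completed = completed;
    }

    @Override
    public void run() {
      completed.incrementAndGet();
    }
  }

  // Submits n transfers to a cached pool and returns how many completed.
  // A cached pool creates threads on demand and reuses idle ones, rather
  // than starting a fresh thread for every task.
  static int runTransfers(int n) {
    ExecutorService pool = Executors.newCachedThreadPool();
    AtomicInteger completed = new AtomicInteger();
    for (int i = 0; i < n; i++) {
      pool.execute(new FakeTransfer(completed));
    }
    pool.shutdown(); // stop accepting new work; queued tasks still run
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("transfers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return completed.get();
  }

  public static void main(String[] args) {
    System.out.println("completed transfers: " + runTransfers(5));
  }
}
```

Note that a cached pool is unbounded: it grows whenever no idle thread is available, so it saves thread start-up cost but does not by itself cap the number of concurrent transfers.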
> One thing I'll point out that's a bit off, which I address in this patch, ...
> There are two places in the code where a {{DataTransfer}} thread is started. In [one
place|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L2339-L2341],
it's started in a default thread group. In [another place|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L3019-L3022],
it's started in the [dataXceiverServer|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L1164]
thread group.
> I do not think it's correct to include any of these threads in the {{dataXceiverServer}}
thread group. Anything submitted to the {{dataXceiverServer}} should probably be tied to the
{{dfs.datanode.max.transfer.threads}} configuration, and neither of these methods is. Instead,
both should be submitted to the same thread pool with its own thread group (probably the
default thread group, unless someone suggests otherwise), which is what I have included in this
patch.
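
To make the thread group point concrete, here is a minimal sketch, assuming a custom {{ThreadFactory}} is used to build the pool. The names {{TransferThreadGroupSketch}}, {{TransferThreadFactory}}, and {{groupOfPooledThread}} are hypothetical and are not taken from the patch. Passing {{null}} as the group lets each new thread inherit the group of the thread that creates it, which is one way to get the "default" behaviour; passing an explicit {{ThreadGroup}} pins pool threads to that group instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class TransferThreadGroupSketch {

  // Hypothetical factory: names the threads and optionally pins them to a
  // thread group. A null group means "inherit from the creating thread".
  static final class TransferThreadFactory implements ThreadFactory {
    private final ThreadGroup group;
    private final AtomicInteger counter = new AtomicInteger();

    TransferThreadFactory(ThreadGroup group) {
      this.group = group;
    }

    @Override
    public Thread newThread(Runnable r) {
      Thread t = new Thread(group, r, "DataTransfer-" + counter.incrementAndGet());
      t.setDaemon(true);
      return t;
    }
  }

  // Runs one task on a pool built with the factory above and returns the
  // name of the thread group that the pooled thread belongs to.
  static String groupOfPooledThread(ThreadGroup group) {
    ExecutorService pool =
        Executors.newCachedThreadPool(new TransferThreadFactory(group));
    try {
      return pool
          .submit(() -> Thread.currentThread().getThreadGroup().getName())
          .get(10, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    // Inherits the caller's group.
    System.out.println(groupOfPooledThread(null));
    // Pinned to an explicitly named group.
    System.out.println(groupOfPooledThread(new ThreadGroup("dataXceiverServer")));
  }
}
```

With a single factory shared by both call sites, every transfer thread lands in the same group regardless of which code path submitted it.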



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


