hadoop-hdfs-issues mailing list archives

From "BELUGA BEHR (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-14292) Introduce Java ExecutorService to DataXceiverServer
Date Mon, 18 Feb 2019 19:25:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BELUGA BEHR updated HDFS-14292:
    Attachment:     (was: HDFS-14292.2.patc)

> Introduce Java ExecutorService to DataXceiverServer
> ---------------------------------------------------
>                 Key: HDFS-14292
>                 URL: https://issues.apache.org/jira/browse/HDFS-14292
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.2.0
>            Reporter: BELUGA BEHR
>            Assignee: BELUGA BEHR
>            Priority: Major
>         Attachments: HDFS-14292.1.patch
> I wanted to investigate {{dfs.datanode.max.transfer.threads}} from {{hdfs-site.xml}}.
 It is described as "Specifies the maximum number of threads to use for transferring data
in and out of the DN."   The default value is 4096.  I found it interesting because 4096 threads
sounds like a lot to me.  I'm not sure how a system with 8-16 cores would react to this large
a thread count.  Intuitively, I would say that the overhead of context switching would be
> During my investigation, I discovered the [following|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java#L203-L216]
setup in the {{DataXceiverServer}} class:
> # A peer connects to a DataNode
> # A new thread is spun up to service this connection
> # The thread runs to completion
> # The thread dies
> It would perhaps be better to use a thread pool to manage the lifecycle of
the service threads and to allow the DataNode to reuse existing threads, avoiding the need
to create and start threads on demand.
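
The difference between the two lifecycles can be sketched in plain Java. This is only an illustration, not code from the patch: the class and method names are hypothetical, the "connections" are simulated by trivial tasks, and a small fixed-size pool stands in for whatever pool the DataNode would actually use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo class; not part of Hadoop. */
final class ThreadReuseSketch {

  /** Thread-per-connection: every simulated connection gets a brand-new thread. */
  static int threadsUsedPerConnection(int connections) {
    Set<Thread> seen = ConcurrentHashMap.newKeySet();
    List<Thread> started = new ArrayList<>();
    for (int i = 0; i < connections; i++) {
      // The "service" work here is just recording which thread ran it.
      Thread t = new Thread(() -> seen.add(Thread.currentThread()));
      t.start();
      started.add(t);
    }
    try {
      for (Thread t : started) {
        t.join(); // each thread runs to completion and then dies
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen.size();
  }

  /** Pooled: the same simulated connections share at most poolSize threads. */
  static int threadsUsedWithPool(int connections, int poolSize) {
    Set<Thread> seen = ConcurrentHashMap.newKeySet();
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      for (int i = 0; i < connections; i++) {
        pool.execute(() -> seen.add(Thread.currentThread()));
      }
      pool.shutdown(); // stop accepting work, let queued tasks finish
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("pool did not drain in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen.size();
  }

  public static void main(String[] args) {
    System.out.println("per-connection threads: " + threadsUsedPerConnection(8));
    System.out.println("pooled threads (max 2): " + threadsUsedWithPool(8, 2));
  }
}
```

In the first method the number of threads created grows with the number of connections; in the second it is bounded by the pool size, and the worker threads are reused across tasks.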
> In this JIRA, I have added a couple of things:
> # Added a thread pool that always keeps a single thread running, awaiting
a new connection should one arrive.  On demand, it will create up to {{dfs.datanode.max.transfer.threads}} threads.
 A thread that has completed its prior duties will stay idle for up to 30 seconds and will
be retired if no new work arrives in that time.
> # Added new methods to the {{Peer}} Interface to allow for better logging and less code
within each Thread ({{DataXceiver}}).
> # Updated the Thread code ({{DataXceiver}}) regarding its interactions with the {{blockReceiver}}
instance variable.
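
One way to get the pool behaviour described in the first item is java.util.concurrent.ThreadPoolExecutor with a core size of 1, a maximum equal to the configured limit, and a 30 second keep-alive. A minimal sketch follows; the class name, the method name and the choice of SynchronousQueue as the hand-off queue are assumptions made for illustration and are not taken from the attached patch.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; names are illustrative and not from the patch. */
final class XceiverPoolSketch {

  /** Default of dfs.datanode.max.transfer.threads as quoted in the description. */
  static final int DEFAULT_MAX_TRANSFER_THREADS = 4096;

  /** Idle time after which a non-core worker thread is retired. */
  static final long IDLE_TIMEOUT_SECONDS = 30L;

  static ThreadPoolExecutor newXceiverPool(int maxTransferThreads) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        1,                        // core size: one thread is always kept alive
        maxTransferThreads,       // upper bound on worker threads
        IDLE_TIMEOUT_SECONDS,     // extra threads retire after this idle time
        TimeUnit.SECONDS,
        new SynchronousQueue<>()); // assumption: direct hand-off, no queueing
    // Start the single core thread eagerly so it is already waiting for work.
    pool.prestartCoreThread();
    return pool;
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool = newXceiverPool(DEFAULT_MAX_TRANSFER_THREADS);
    try {
      System.out.println("core=" + pool.getCorePoolSize()
          + " max=" + pool.getMaximumPoolSize()
          + " keepAliveSeconds=" + pool.getKeepAliveTime(TimeUnit.SECONDS));
    } finally {
      pool.shutdown();
    }
  }
}
```

With a SynchronousQueue, a task submitted while all worker threads are busy and the maximum has been reached is rejected rather than queued (a RejectedExecutionException under the default handler), so a caller using this configuration would need to handle that case.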

