hadoop-hdfs-issues mailing list archives

From "Jiandan Yang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid high cpu utilization
Date Wed, 26 Jul 2017 09:27:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiandan Yang  updated HDFS-12200:
---------------------------------
    Description: 
1. Background:
Our Hadoop cluster separates storage and compute: HDFS is deployed on 600+ machines, while YARN is deployed on a separate machine pool.

We found that NameNode CPU utilization sometimes reaches 90% or even 100%. In the worst case, CPU utilization stays at 100% for a long time, writes to the JournalNodes time out, and eventually the NameNode hangs. The cause is that offline tasks running on a few hundred servers access HDFS at the same time; the NameNode resolves the rack of each client machine and spawns several hundred to two thousand sub-processes.

{code:java}
"process reaper"#10864 daemon prio=10 os_prio=0 tid=0x00007fe270a31800 nid=0x38d93 runnable
[0x00007fcdc36fc000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.UNIXProcess.waitForProcessExit(Native Method)
        at java.lang.UNIXProcess.lambda$initStreams$4(UNIXProcess.java:301)
        at java.lang.UNIXProcess$$Lambda$7/1447689627.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
        at java.lang.Thread.run(Thread.java:834)
{code}

Our configuration is as follows:
{code}
net.topology.node.switch.mapping.impl = ScriptBasedMapping
net.topology.script.file.name = 'a python script'
{code}
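
For context, a minimal sketch of the equivalent programmatic setup, assuming the standard ScriptBasedMapping API; the script path and host below are illustrative placeholders, not values from our cluster. Every resolve() call for uncached hosts forks the topology script as a child process, which is what produces "process reaper" threads like the one in the stack trace above.
{code:java}
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.DNSToSwitchMapping;
import org.apache.hadoop.net.ScriptBasedMapping;

public class TopologyConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Same two settings as above; the script path is a placeholder.
    conf.setClass("net.topology.node.switch.mapping.impl",
        ScriptBasedMapping.class, DNSToSwitchMapping.class);
    conf.set("net.topology.script.file.name", "/path/to/topology-script.py");

    ScriptBasedMapping mapping = new ScriptBasedMapping();
    mapping.setConf(conf);

    // A cache miss here runs the configured script in a forked child process.
    System.out.println(mapping.resolve(Arrays.asList("10.0.0.1")));
  }
}
{code}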



2. Optimization
To solve these two problems (the high CPU utilization and the resulting NameNode hang), we optimized CachedDNSToSwitchMapping as follows (a minimal sketch of the resulting resolution flow is given after the list):
(1) Add the DataNode IP list to the file configured by dfs.hosts. When the NameNode starts, it preloads the DataNodes' rack information into the cache, resolving a batch of hosts per script invocation (the batch size is controlled by net.topology.script.number.args, default 100).

(2) Step (1) guarantees that the cache contains the rack of every DataNode, so on a cache miss the host must be a client machine and we can directly return /default-rack.

(3) Whenever new DataNodes are added, their IP addresses must be added to the file specified by dfs.hosts, followed by bin/hdfs dfsadmin -refreshNodes, which puts the new DataNodes' racks into the cache.

(4) Add a new configuration key, dfs.namenode.topology.resolve-non-cache-host: false enables the behavior above (cache misses return /default-rack without running the script), and true keeps the original behavior. The default is true for compatibility.
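
The sketch below illustrates this resolution flow. It assumes only the public DNSToSwitchMapping interface; the class name PreloadedRackMapping, the preload() helper, and the way the dfs.hosts list is wired in are illustrative placeholders, not the actual change (see HDFS-12200-001.patch for the real implementation).
{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.net.DNSToSwitchMapping;
import org.apache.hadoop.net.NetworkTopology;

/**
 * Illustrative sketch only: a cache-first mapping that preloads DataNode racks
 * at startup and, when the new flag is false, maps any cache miss (assumed to
 * be a client machine) straight to /default-rack without forking the script.
 */
public class PreloadedRackMapping implements DNSToSwitchMapping {
  private final DNSToSwitchMapping rawMapping;   // e.g. a ScriptBasedMapping
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final boolean resolveNonCacheHost;     // dfs.namenode.topology.resolve-non-cache-host
  private final int batchSize;                   // net.topology.script.number.args, default 100

  public PreloadedRackMapping(DNSToSwitchMapping rawMapping,
                              boolean resolveNonCacheHost, int batchSize) {
    this.rawMapping = rawMapping;
    this.resolveNonCacheHost = resolveNonCacheHost;
    this.batchSize = batchSize;
  }

  /** Steps (1) and (3): called at startup and on -refreshNodes with the dfs.hosts list. */
  public void preload(List<String> dataNodeHosts) {
    for (int i = 0; i < dataNodeHosts.size(); i += batchSize) {
      List<String> batch = new ArrayList<>(
          dataNodeHosts.subList(i, Math.min(i + batchSize, dataNodeHosts.size())));
      List<String> racks = rawMapping.resolve(batch);   // one script run per batch
      for (int j = 0; j < batch.size(); j++) {
        cache.put(batch.get(j), racks.get(j));
      }
    }
  }

  /** Steps (2) and (4): cache hit -> cached rack; miss -> /default-rack unless the flag is true. */
  @Override
  public List<String> resolve(List<String> names) {
    List<String> result = new ArrayList<>(names.size());
    for (String name : names) {
      String rack = cache.get(name);
      if (rack != null) {
        result.add(rack);                               // DataNode preloaded in step (1)
      } else if (!resolveNonCacheHost) {
        result.add(NetworkTopology.DEFAULT_RACK);       // client machine: no script fork
      } else {
        // Compatibility mode: fall back to the raw (script) mapping.
        result.add(rawMapping.resolve(Collections.singletonList(name)).get(0));
      }
    }
    return result;
  }

  @Override
  public void reloadCachedMappings() {
    cache.clear();
  }

  @Override
  public void reloadCachedMappings(List<String> names) {
    for (String name : names) {
      cache.remove(name);
    }
  }
}
{code}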


  was:
1. Background:
Our Hadoop cluster separates storage and compute: HDFS is deployed on 600+ machines, while YARN is deployed on a separate machine pool.

We found that NameNode CPU utilization sometimes reaches 90% or even 100%. In the worst case, CPU utilization stays at 100% for a long time, writes to the JournalNodes time out, and eventually the NameNode hangs. The cause is that offline tasks running on a few hundred servers access HDFS at the same time; the NameNode resolves the rack of each client machine and spawns several hundred sub-processes.

{code:java}
"process reaper"#10864 daemon prio=10 os_prio=0 tid=0x00007fe270a31800 nid=0x38d93 runnable [0x00007fcdc36fc000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.UNIXProcess.waitForProcessExit(Native Method)
        at java.lang.UNIXProcess.lambda$initStreams$4(UNIXProcess.java:301)
        at java.lang.UNIXProcess$$Lambda$7/1447689627.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
        at java.lang.Thread.run(Thread.java:834)
{code}

Our configuration is as follows:
{code}
net.topology.node.switch.mapping.impl = ScriptBasedMapping
net.topology.script.file.name = 'a python script'
{code}

2. Optimization
To solve these two problems (the high CPU utilization and the resulting NameNode hang), we optimized CachedDNSToSwitchMapping as follows:
(1) Add the DataNode IP list to the file configured by dfs.hosts. When the NameNode starts, it preloads the DataNodes' rack information into the cache, resolving a batch of hosts per script invocation (the batch size is controlled by net.topology.script.number.args, default 100).

(2) Step (1) guarantees that the cache contains the rack of every DataNode, so on a cache miss the host must be a client machine and we can directly return /default-rack.

(3) Whenever new DataNodes are added, their IP addresses must be added to the file specified by dfs.hosts, followed by bin/hdfs dfsadmin -refreshNodes, which puts the new DataNodes' racks into the cache.

(4) Add a new configuration key, dfs.namenode.topology.resolve-non-cache-host: false enables the behavior above (cache misses return /default-rack without running the script), and true keeps the original behavior. The default is true for compatibility.



> Optimize CachedDNSToSwitchMapping to avoid high cpu utilization
> ---------------------------------------------------------------
>
>                 Key: HDFS-12200
>                 URL: https://issues.apache.org/jira/browse/HDFS-12200
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Jiandan Yang 
>         Attachments: cpu_ utilization.png, HDFS-12200-001.patch, nn_thread_num.png
>
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

