hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12200) Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
Date Wed, 26 Jul 2017 12:02:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101594#comment-16101594
] 

Hadoop QA commented on HDFS-12200:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 13s{color} | {color:blue}
Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} |
{color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color}
| {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 23s{color} | {color:blue}
Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 12s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 36s{color} |
{color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 58s{color}
| {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 27s{color} |
{color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 44s{color} | {color:red}
hadoop-hdfs-project/hadoop-hdfs in trunk has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 37s{color} |
{color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 13s{color} | {color:blue}
Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 28s{color}
| {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 16s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 16s{color} | {color:green}
the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  1m 58s{color}
| {color:orange} root: The patch generated 15 new + 449 unchanged - 0 fixed = 464 total (was
449) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 26s{color} |
{color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  0s{color} | {color:red}
The patch has 6 line(s) that end in whitespace. Use git apply --whitespace=fix <<patch_file>>.
Refer https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 25s{color} |
{color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 37s{color} |
{color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  8m  7s{color} | {color:red}
hadoop-common in the patch failed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 17s{color} | {color:red}
hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 36s{color}
| {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}133m 10s{color} | {color:black}
{color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.net.TestDNS |
|   | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 |
|   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure070 |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-12200 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12878967/HDFS-12200-002.patch
|
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs
 checkstyle  |
| uname | Linux 3f4a5266f42b 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / a92bf39 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/20415/artifact/patchprocess/branch-findbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html
|
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/20415/artifact/patchprocess/diff-checkstyle-root.txt
|
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/20415/artifact/patchprocess/whitespace-eol.txt
|
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/20415/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common.txt
|
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/20415/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
|
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/20415/testReport/ |
| modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/20415/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Optimize CachedDNSToSwitchMapping to avoid 100% cpu utilization
> ---------------------------------------------------------------
>
>                 Key: HDFS-12200
>                 URL: https://issues.apache.org/jira/browse/HDFS-12200
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Jiandan Yang 
>            Assignee: Jiandan Yang 
>         Attachments: cpu_ utilization.png, HDFS-12200-001.patch, HDFS-12200-002.patch,
nn_thread_num.png
>
>
> 1. Background :
> Our hadoop cluster is disaggregated storage and compute, HDFS is deployed to 600+ machines,
YARN is deployed to another machine pool.
> We found that sometimes NameNode cpu utilization rate of 90% or even 100%. The most serious
is cpu utilization rate of 100% for a long time result in writing journalNode timeout, eventually
leading to NameNode hang up. The reason is  offline tasks running in a few hundred servers
access HDFS at the same time, NameNode resolve rack of client machine, started several hundreds
to two thousand  sub-process. 
> {code:java}
> "process reaper"#10864 daemon prio=10 os_prio=0 tid=0x00007fe270a31800 nid=0x38d93 runnable
[0x00007fcdc36fc000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.UNIXProcess.waitForProcessExit(Native Method)
>         at java.lang.UNIXProcess.lambda$initStreams$4(UNIXProcess.java:301)
>         at java.lang.UNIXProcess$$Lambda$7/1447689627.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
>         at java.lang.Thread.run(Thread.java:834
> {code}
> Our configuration as follows:
> {code:java}
> net.topology.node.switch.mapping.impl = ScriptBasedMapping, 
> net.topology.script.file.name = 'a python script'
> {code}
> 2. Optimization
> In order to solve these two problems, we have optimized the CachedDNSToSwitchMapping
> (1) Added the DataNode IP list  to the file of  dfs.hosts configured. when NameNode starts
it  preloads DataNode rack information to the cache, get a batch of racks of hosts when running
script once (the corresponding configuration is net.topology.script.number,the default value
of 100)
> (2) Step (1) has ensured that the cache has all the DataNodes’ rack,  so if the cache
did not hit, then the host must be a client machine, then directly return /default-rack,
> (3) Each time you add new DataNodes you need to add the new DataNodes’ IP address to
the file specified by dfs.hosts, and then run command of bin/hdfs dfsadmin -refreshNodes,
it will put the newly added DataNodes’ rack into cache
> (4) Add new configuration items dfs.namenode.topology.resolve-non-cache-host, the value
is false to open the above function, and the value is true to turn off the above functions,
default value is true to keep compatibility



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message