Subject: Container exited with a non-zero exit code
From: Zhudacai
To: user@hadoop.apache.org
Date: Fri, 24 Jul 2015 14:34:59 +0800

Hi all,

I've just done a fresh install of Hadoop on three nodes: one master (NameNode, SecondaryNameNode, ResourceManager) and two slaves (DataNode). HDFS was formatted successfully and all services are up. When I run the examples, e.g. teragen and terasort, I occasionally get this exception:

15/07/23 19:55:34 INFO mapreduce.Job:  map 0% reduce 0%
15/07/23 19:55:40 INFO mapreduce.Job: Task Id : attempt_1437652487249_0001_m_000000_0, Status : FAILED
Exception from container-launch.
Container id: container_1437652487249_0001_01_000002
Exit code: 134
Exception message: /bin/bash: line 1: 21736 Aborted                 
/usr/openjdk-1.8.0-internal/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx200m
-Djava.io.tmpdir=/home/hadoop3/tmp/nm-local-dir/usercache/root/appcache/application_1437652487249_0001/container_1437652487249_0001_01_000002/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/home/hadoop3/hadoop-2.6.0/logs/userlogs/application_1437652487249_0001/container_1437652487249_0001_01_000002
-Dyarn.app.container.log.filesize=0
-Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.1.9 39868 attempt_1437652487249_0001_m_000000_0 2 >
/home/hadoop3/hadoop-2.6.0/logs/userlogs/application_1437652487249_0001/container_1437652487249_0001_01_000002/stdout 2> /home/hadoop3/hadoop-2.6.0/logs/userlogs/application_1437652487249_0001/container_1437652487249_0001_01_000002/stderr

Stack trace: ExitCodeException exitCode=134: /bin/bash: line 1: 21736 Aborted                 
/usr/openjdk-1.8.0-internal/bin/java -Djava.net.preferIPv4Stack=true
-Dhadoop.metrics.log.level=WARN -Xmx200m -Djava.io.tmpdir=/home/hadoop3/tmp/nm-local-dir/usercache/root/appcache/application_1437652487249_0001/container_1437652487249_0001_01_000002/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/home/hadoop3/hadoop-2.6.0/logs/userlogs/application_1437652487249_0001/container_1437652487249_0001_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 192.168.1.9 39868 attempt_1437652487249_0001_m_000000_0 2 > /home/hadoop3/hadoop-2.6.0/logs/userlogs/application_1437652487249_0001/container_1437652487249_0001_01_000002/stdout 2> /home/hadoop3/hadoop-2.6.0/logs/userlogs/application_1437652487249_0001/container_1437652487249_0001_01_000002/stderr

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 134

15/07/23 19:55:46 INFO mapreduce.Job:  map 3% reduce 0%

But the job still completed successfully.
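
From what I understand, exit code 134 is 128 + 6, i.e. the child JVM was killed by SIGABRT (which matches the "Aborted" in the bash line above). As a sanity check of the numbering, nothing Hadoop-specific, any process that dies from SIGABRT reports the same status to the shell:

$ bash -c 'kill -ABRT $$'; echo $?
134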
Here are the configurations:

core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
        <final>true</final>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop3/tmp</value>
    </property>
</configuration>

hdfs-site.xml
<configuration>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop3/tmp/dfs/name</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop3/tmp/dfs/data</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
</configuration>

yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
   <name>yarn.resourcemanager.address</name>
   <value>master:8032</value>
  </property>
  <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>master:8030</value>
  </property>
  <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>master:8035</value>
  </property>
  <property>
   <name>yarn.resourcemanager.admin.address</name>
   <value>master:8033</value>
  </property>
  <property>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>master:8088</value>
  </property>
  <property>
   <name>yarn.nodemanager.resource.cpu-vcores</name>
   <value>16</value>
  </property>
</configuration>

mapred-site.xml
<configuration>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
</configuration>
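
For reference, mapred-site.xml sets no heap options, so the -Xmx200m in the launch command above is just the stock default (mapred.child.java.opts). If one wanted to rule out heap size, a hypothetical override would look like this (the property names are standard Hadoop 2.x; the 512m values are examples only, not something I have tested):

<property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx512m</value>
</property>
<property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx512m</value>
</property>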

I also noticed that when Hadoop is installed on a single node, the exception never shows up.
The number of failed attempts increases with the values of -Dmapred.map.tasks and -Dmapred.reduce.tasks.
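
For reference, a typical invocation that triggers it looks like this (a sketch only: the row count, task counts, and HDFS paths are placeholders, not my exact command):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar teragen -Dmapred.map.tasks=16 10000000 /teraInput
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar terasort -Dmapred.reduce.tasks=16 /teraInput /teraOutput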

I'm using Hadoop 2.6.0 and OpenJDK 1.8, running on an arm64 platform.

Best Regards

Jared
