hadoop-common-user mailing list archives

From: blinkeye <li...@blinkeye.ch.INVALID>
Subject: WordCount demo does not work with YARN: NodeManager "RECEIVED SIGNAL 15: SIGTERM" and exit code from container xy is : 143 [Hadoop 2.6.4, 2.7.2, 2.7.3]
Date: Sun, 04 Sep 2016 16:33:55 GMT
Hi everybody

I'm trying to run the WordCount example on 1000 copies of a Project 
Gutenberg text, downloaded and extracted from 
http://www.gutenberg.lib.md.us/1/0/0/0/10001/10001.zip, e.g. like this:

$ cd /home/input
$ for i in `seq 1 999`; do cp 0.txt $i.txt; done   # together with 0.txt: 1000 files
$ hdfs namenode -format                            # format before starting HDFS
$ start-dfs.sh
$ hdfs dfs -mkdir /user
$ hdfs dfs -mkdir /user/hduser
$ hdfs dfs -put /home/input/ input
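
For what it's worth, the upload can also be sanity-checked from the 
shell; a quick generic check, using the same paths as above (note that 
-ls prints a "Found N items" header line):

$ hdfs dfs -count input        # dir count, file count, content size, path
$ hdfs dfs -ls input | wc -l   # 1000 files plus the header line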

As you can see from http://hdmaster:50070/dfshealth.html#tab-datanode, 
the replicated Gutenberg file has been successfully uploaded to HDFS 
1000 times:

Node:             hdmaster (10.10.10.10:50010)
Last contact:     1
Admin State:      In Service
Capacity:         238.32 GB
Used:             56.11 MB
Non DFS Used:     48.73 GB
Remaining:        189.53 GB
Blocks:           1000
Block pool used:  56.11 MB (0.02%)
Failed Volumes:   0
Version:          2.6.4

When I run the WordCount example without YARN (i.e. with the local job 
runner) against the pseudo-distributed HDFS, it works, and fast:

$ hadoop jar ./hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output

16/09/04 18:18:29 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:29 INFO reduce.MergeManagerImpl: finalMerge called with 1000 in-memory map-outputs and 0 on-disk map-outputs
16/09/04 18:18:29 INFO mapred.Merger: Merging 1000 sorted segments
16/09/04 18:18:29 INFO mapred.Merger: Down to the last merge-pass, with 1000 segments left of total size: 40716000 bytes
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merged 1000 segments, 40722000 bytes to disk to satisfy reduce memory limit
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merging 1 files, 40720006 bytes from disk
16/09/04 18:18:31 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
16/09/04 18:18:31 INFO mapred.Merger: Merging 1 sorted segments
16/09/04 18:18:31 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 40719996 bytes
16/09/04 18:18:31 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:31 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
16/09/04 18:18:33 INFO mapred.Task: Task:attempt_local1736492896_0001_r_000000_0 is done. And is in the process of committing
16/09/04 18:18:33 INFO mapred.LocalJobRunner: 1000 / 1000 copied.
16/09/04 18:18:33 INFO mapred.Task: Task attempt_local1736492896_0001_r_000000_0 is allowed to commit now
16/09/04 18:18:33 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1736492896_0001_r_000000_0' to hdfs://hdmaster:54310/user/hduser/output/_temporary/0/task_local1736492896_0001_r_000000
16/09/04 18:18:33 INFO mapred.LocalJobRunner: reduce > reduce
16/09/04 18:18:33 INFO mapred.Task: Task 'attempt_local1736492896_0001_r_000000_0' done.
16/09/04 18:18:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local1736492896_0001_r_000000_0
16/09/04 18:18:33 INFO mapred.LocalJobRunner: reduce task executor complete.
16/09/04 18:18:33 INFO mapreduce.Job:  map 100% reduce 100%
16/09/04 18:18:33 INFO mapreduce.Job: Job job_local1736492896_0001 completed successfully
16/09/04 18:18:33 INFO mapreduce.Job: Counters: 38
	File System Counters
		FILE: Number of bytes read=2865334368
		FILE: Number of bytes written=21137755248
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=26333765000
		HDFS: Number of bytes written=37800
		HDFS: Number of read operations=1007007
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1003
	Map-Reduce Framework
		Map input records=958000
		Map output records=8807000
		Map output bytes=85735000
		Map output materialized bytes=40726000
		Input split bytes=111890
		Combine input records=8807000
		Combine output records=3035000
		Reduce input groups=3035
		Reduce shuffle bytes=40726000
		Reduce input records=3035000
		Reduce output records=3035
		Spilled Records=6070000
		Shuffled Maps =1000
		Failed Shuffles=0
		Merged Map outputs=1000
		GC time elapsed (ms)=6358
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=1586989891584
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=52510000
	File Output Format Counters
		Bytes Written=37800
90.20user 2.64system 1:01.70elapsed 150%CPU (0avgtext+0avgdata 1821840maxresident)k
0inputs+170688outputs (0major+61251minor)pagefaults 0swaps


When I add the minimal settings needed to run the program on a 
pseudo-distributed YARN cluster:

mapred-site.xml:

    <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
       <description>The framework for running mapreduce jobs</description>
    </property>

and yarn-site.xml:

   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
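
(Both snippets sit inside the usual <configuration> element of the 
stock templates; nothing else is changed. As a quick, generic check 
after restarting the daemons, this should show the single registered 
NodeManager:

$ yarn node -list -all
)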

With these two settings in place, the WordCount program randomly exits 
and never finishes. I've tried different Hadoop versions (2.6.4, 2.7.2, 
2.7.3) on both an up-to-date Ubuntu 16.04 release and an up-to-date 
Gentoo Linux system, with Oracle JDK 1.7 as well as 1.8, to no avail.

I cannot figure out how to debug the problem; I've now spent days 
investigating it.
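
The only extra per-container diagnostics I know of would be the 
aggregated container logs, e.g. (this assumes yarn.log-aggregation-enable 
were set to true, which my minimal config does not do, and uses the 
application id visible in the NodeManager log below):

$ yarn logs -applicationId application_1473002371122_0001

Anyway, the failing run: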

$ hadoop jar ./hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output

Below is the output after about 190 reduce operations. As you can see, 
the whole cluster comes down (the login shell gets terminated as well). 
Sometimes the cluster is terminated after only a handful of operations.

As you can see from the dstat output captured while the WordCount 
example was running, memory is not the problem:

$ dstat -a -m

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- ------memory-usage-----
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw | used  buff  cach  free
  97   3   0   0   0   0|   0  1316k|  18k  776B|   0     0 |3284    10k|8396M  119M 4338M 18.7G
  96   4   0   0   0   0|   0     0 | 241B  152B|   0     0 |3182    11k|8705M  119M 4339M 18.4G
  96   4   0   0   0   0|   0   936k| 582B  316B|   0     0 |3288    15k|8251M  119M 4339M 18.9G
  96   4   0   0   0   0|  52k  628k| 518B  316B|   0     0 |3141    10k|8326M  119M 4339M 18.8G
  97   3   0   0   0   0|   0    32k| 652B  450B|   0     0 |3081  9674 |8490M  119M 4339M 18.7G
  97   3   0   0   0   0|  52k 1372k| 393B  292B|   0     0 |3115    12k|8514M  119M 4340M 18.6G
  96   4   0   0   0   0|   0   184k|   0    70B|   0     0 |3107    11k|8291M  119M 4340M 18.8G
  97   3   0   0   0   0|   0    72k| 847B  474B|   0     0 |3156  8630 |8217M  119M 4344M 18.9G
  97   3   0   0   0   0|   0   144k| 546B  298B|   0     0 |3176  7967 |8048M  119M 4345M 19.1G
  97   3   0   0   0   0|   0   836k| 399B  298B|   0     0 |3228    11k|8231M  119M 4345M 18.9G
  97   3   0   0   0   0|   0  1448k|  70B   70B|   0     0 |3096    16k|8469M  119M 4345M 18.7G
  96   4   0   0   0   0|   0    88k| 259B  228B|   0     0 |3197    13k|8387M  119M 4346M 18.7G
  97   3   0   0   0   0|   0   144k| 917B  544B|   0     0 |3044  7261 |8293M  119M 4346M 18.8G
  96   4   0   0   0   0|   0    72k| 259B  158B|   0     0 |3231    12k|8342M  119M 4346M 18.8G
  97   3   0   0   0   0|   0   828k| 518B  316B|   0     0 |3091  9285 |8828M  119M 4347M 18.3G
  97   3   0   0   0   0|   0  2884k|   0     0 |   0     0 |3082    11k|8511M  119M 4347M 18.6G
  96   4   0   0   0   0|   0   188k| 259B  158B|   0     0 |3105    24k|8303M  119M 4347M 18.8G
  96   4   0   0   0   0|   0   184k| 518B  316B|   0     0 |3194  9256 |8161M  119M 4347M 19.0G
  97   3   0   0   0   0|  52k   40k| 259B  158B|   0     0 |3014  8124 |8503M  119M 4348M 18.6G
  97   3   0   0   0   0|  52k  812k|   0     0 |   0     0 |3217    23k|8531M  119M 4348M 18.6G
  95   4   0   0   0   0|   0  1692k| 329B  428B|   0     0 |3345    18k|7954M  119M 4348M 19.2G
  17   5  78   0   0   0|   0   616k| 288B  213B|   0     0 |2349    11k|3462M  119M 4347M 23.6G
   0   0  99   0   0   0|   0  1012k|   0     0 |   0     0 | 290   640 |3462M  119M 4347M 23.6G
   0   0 100   0   0   0|   0     0 |   0     0 |   0     0 | 340   979 |3462M  119M 4347M 23.6G
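
Since the login shell dies too, the kernel OOM killer would be an 
obvious suspect; one generic way to rule that out would be:

$ dmesg | grep -iE 'oom|killed process'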


Here's the relevant excerpt from yarn-hduser-nodemanager-hdmaster.log:

2016-09-04 17:22:44,012 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1473002371122_0001_01_000194 by user hduser
2016-09-04 17:22:44,012 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser IP=10.10.10.10 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1473002371122_0001 CONTAINERID=container_1473002371122_0001_01_000194
2016-09-04 17:22:44,015 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1473002371122_0001_01_000194 to application application_1473002371122_0001
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000194 transitioned from NEW to LOCALIZING
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1473002371122_0001
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_INIT for appId application_1473002371122_0001
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got APPLICATION_INIT for service mapreduce_shuffle
2016-09-04 17:22:44,016 INFO org.apache.hadoop.mapred.ShuffleHandler: Added token for job_1473002371122_0001
2016-09-04 17:22:44,016 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000194 transitioned from LOCALIZING to LOCALIZED
2016-09-04 17:22:44,032 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1473002371122_0001_01_000193 by user hduser
2016-09-04 17:22:44,036 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser IP=10.10.10.10 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1473002371122_0001 CONTAINERID=container_1473002371122_0001_01_000193
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1473002371122_0001_01_000193 to application application_1473002371122_0001
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000193 transitioned from NEW to LOCALIZING
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1473002371122_0001
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_INIT for appId application_1473002371122_0001
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got APPLICATION_INIT for service mapreduce_shuffle
2016-09-04 17:22:44,037 INFO org.apache.hadoop.mapred.ShuffleHandler: Added token for job_1473002371122_0001
2016-09-04 17:22:44,037 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000193 transitioned from LOCALIZING to LOCALIZED
2016-09-04 17:22:44,097 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000194 transitioned from LOCALIZED to RUNNING
2016-09-04 17:22:44,112 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /home/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1473002371122_0001/container_1473002371122_0001_01_000194/default_container_executor.sh]
2016-09-04 17:22:44,118 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 1354 for container-id container_1473002371122_0001_01_000179: 167.5 MB of 1 GB physical memory used; 683.8 MB of 6 GB virtual memory used
2016-09-04 17:22:44,188 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000193 transitioned from LOCALIZED to RUNNING
2016-09-04 17:22:44,191 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /home/hduser/tmp/nm-local-dir/usercache/hduser/appcache/application_1473002371122_0001/container_1473002371122_0001_01_000193/default_container_executor.sh]
2016-09-04 17:22:44,212 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1473002371122_0001_000001 (auth:SIMPLE)
2016-09-04 17:22:44,213 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Stopping container with container Id: container_1473002371122_0001_01_000174
2016-09-04 17:22:44,213 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hduser IP=10.10.10.10 OPERATION=Stop Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1473002371122_0001 CONTAINERID=container_1473002371122_0001_01_000174
2016-09-04 17:22:44,218 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000174 transitioned from RUNNING to KILLING
2016-09-04 17:22:44,218 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1473002371122_0001_01_000174
2016-09-04 17:22:44,245 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000181 is : 143
2016-09-04 17:22:44,246 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000183 is : 143
2016-09-04 17:22:44,246 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000189 is : 143
2016-09-04 17:22:44,246 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000191 is : 143
2016-09-04 17:22:44,247 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000190 is : 143
2016-09-04 17:22:44,247 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000182 is : 143
2016-09-04 17:22:44,248 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000186 is : 143
2016-09-04 17:22:44,252 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000192 is : 143
2016-09-04 17:22:44,255 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000180 is : 143
2016-09-04 17:22:44,255 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000187 is : 143
2016-09-04 17:22:44,270 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000185 is : 143
2016-09-04 17:22:44,281 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000184 is : 143
2016-09-04 17:22:44,281 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000188 is : 143
2016-09-04 17:22:44,281 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000179 is : 143
2016-09-04 17:22:44,282 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000001 is : 143
2016-09-04 17:22:44,282 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000194 is : 143
2016-09-04 17:22:44,296 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000178 is : 143
2016-09-04 17:22:44,296 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000175 is : 143
2016-09-04 17:22:44,307 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000176 is : 143
2016-09-04 17:22:44,308 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000174 is : 143
2016-09-04 17:22:44,316 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000177 is : 143
2016-09-04 17:22:44,323 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2016-09-04 17:22:44,348 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000076 is : 143
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000181 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000189 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000183 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000191 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000190 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000182 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000186 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000192 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000180 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000187 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000185 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000184 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000188 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000179 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000194 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000178 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000175 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,384 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000176 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,385 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000174 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL
2016-09-04 17:22:44,385 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000177 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,385 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1473002371122_0001_01_000076 transitioned from RUNNING to EXITED_WITH_FAILURE
2016-09-04 17:22:44,385 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1473002371122_0001_01_000181
2016-09-04 17:22:44,389 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1473002371122_0001_01_000193 is : 143
2016-09-04 17:22:44,394 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 1579 for container-id container_1473002371122_0001_01_000183: 135.3 MB of 1 GB physical memory used; 682.6 MB of 6 GB virtual memory used
2016-09-04 17:22:44,399 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2016-09-04 17:22:44,407 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 1722 for container-id container_1473002371122_0001_01_000186: 0B of 1 GB physical memory used; 0B of 6 GB virtual memory used
2016-09-04 17:22:44,414 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:8042
2016-09-04 17:22:44,415 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1473002371122_0001_01_000189
2016-09-04 17:22:44,420 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL 15: SIGTERM
2016-09-04 17:22:44,420 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 1265 for container-id container_1473002371122_0001_01_000176: 0B of 1 GB physical memory used; 0B of 6 GB virtual memory used


Here's the output from running:

$ hadoop org.apache.hadoop.conf.Configuration


<?xml version="1.0" encoding="UTF-8" standalone="no"?><configuration>
<property><name>ha.failover-controller.cli-check.rpc-timeout.ms</name><value>20000</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.max.retries.on.timeouts</name><value>45</value><source>core-default.xml</source></property>
<property><name>hadoop.user.group.static.mapping.overrides</name><value>dr.who=;</value><source>core-default.xml</source></property>
<property><name>hadoop.tmp.dir</name><value>/home/hduser/tmp</value><source>core-site.xml</source></property>
<property><name>hadoop.security.java.secure.random.algorithm</name><value>SHA1PRNG</value><source>core-default.xml</source></property>
<property><name>nfs.exports.allowed.hosts</name><value>* rw</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.check-interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>ipc.client.idlethreshold</name><value>4000</value><source>core-default.xml</source></property>
<property><name>fs.trash.checkpoint.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>io.skip.checksum.errors</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.negative-cache.secs</name><value>30</value><source>core-default.xml</source></property>
<property><name>fs.har.impl.disable.cache</name><value>true</value><source>core-default.xml</source></property>
<property><name>fs.defaultFS</name><value>hdfs://hdmaster:54310</value><source>core-site.xml</source></property>
<property><name>fs.client.resolve.remote.symlinks</name><value>true</value><source>core-default.xml</source></property>
<property><name>hadoop.rpc.socket.factory.class.default</name><value>org.apache.hadoop.net.StandardSocketFactory</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.authentication.retry-count</name><value>1</value><source>core-default.xml</source></property>
<property><name>io.mapfile.bloom.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>hadoop.rpc.protection</name><value>authentication</value><source>core-default.xml</source></property>
<property><name>net.topology.impl</name><value>org.apache.hadoop.net.NetworkTopology</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.require.client.cert</name><value>false</value><source>core-default.xml</source></property>
<property><name>io.bytes.per.checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>file.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.new-active.rpc-timeout.ms</name><value>60000</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.acl</name><value>world:anyone:rwcda</value><source>core-default.xml</source></property>
<property><name>fs.ftp.host.port</name><value>21</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping</name><value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.keystores.factory.class</name><value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value><source>core-default.xml</source></property>
<property><name>s3.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>net.topology.node.switch.mapping.impl</name><value>org.apache.hadoop.net.ScriptBasedMapping</value><source>core-default.xml</source></property>
<property><name>fs.s3.buffer.dir</name><value>${hadoop.tmp.dir}/s3</value><source>core-default.xml</source></property>
<property><name>s3native.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.purge</name><value>false</value><source>core-default.xml</source></property>
<property><name>s3.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>io.mapfile.bloom.error.rate</name><value>0.005</value><source>core-default.xml</source></property>
<property><name>ftp.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.attr.group.name</name><value>cn</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.rpc-timeout.ms</name><value>45000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.authorization</name><value>false</value><source>core-default.xml</source></property>
<property><name>s3.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.uploads.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>ipc.client.fallback-to-simple-auth-allowed</name><value>false</value><source>core-default.xml</source></property>
<property><name>ipc.server.listen.queue.size</name><value>128</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.enabled.protocols</name><value>TLSv1</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.low-watermark</name><value>0.3f</value><source>core-default.xml</source></property>
<property><name>s3native.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>file.replication</name><value>1</value><source>core-default.xml</source></property>
<property><name>ftp.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>hadoop.work.around.non.threadsafe.getpwuid</name><value>false</value><source>core-default.xml</source></property>
<property><name>fs.du.interval</name><value>600000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.type</name><value>simple</value><source>core-default.xml</source></property>
<property><name>hadoop.http.staticuser.user</name><value>dr.who</value><source>core-default.xml</source></property>
<property><name>hadoop.util.hash.type</name><value>murmur</value><source>core-default.xml</source></property>
<property><name>hadoop.security.instrumentation.requires.admin</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.size</name><value>500</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.maximum</name><value>15</value><source>core-default.xml</source></property>
<property><name>fs.s3a.attempts.maximum</name><value>10</value><source>core-default.xml</source></property>
<property><name>io.map.index.interval</name><value>128</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.client.conf</name><value>ssl-client.xml</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.expiry</name><value>43200000</value><source>core-default.xml</source></property>
<property><name>hadoop.kerberos.kinit.command</name><value>kinit</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.hdfs.impl</name><value>org.apache.hadoop.fs.Hdfs</value><source>core-default.xml</source></property>
<property><name>io.map.index.skip</name><value>0</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.token.validity</name><value>36000</value><source>core-default.xml</source></property>
<property><name>hadoop.jetty.logs.serve.aliases</name><value>true</value><source>core-default.xml</source></property>
<property><name>ftp.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>io.compression.codec.bzip2.library</name><value>system-native</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.graceful-fence.connection.retries</name><value>1</value><source>core-default.xml</source></property>
<property><name>fs.swift.impl</name><value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.sleep-after-disconnect.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.timeout</name><value>5000</value><source>core-default.xml</source></property>
<property><name>ipc.client.rpc-timeout.ms</name><value>0</value><source>core-default.xml</source></property>
<property><name>file.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.viewfs.impl</name><value>org.apache.hadoop.fs.viewfs.ViewFs</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.filter.group</name><value>(objectClass=group)</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name><value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,org.apache.hadoop.crypto.JceAesCtrCryptoCodec</value><source>core-default.xml</source></property>
<property><name>fs.s3n.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.cipher.suite</name><value>AES/CTR/NoPadding</value><source>core-default.xml</source></property>
<property><name>net.topology.script.number.args</name><value>100</value><source>core-default.xml</source></property>
<property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.authentication</name><value>simple</value><source>core-default.xml</source></property>
<property><name>tfile.fs.output.buffer.size</name><value>262144</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.cache.secs</name><value>300</value><source>core-default.xml</source></property>
<property><name>ha.failover-controller.graceful-fence.rpc-timeout.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.file.impl</name><value>org.apache.hadoop.fs.local.LocalFs</value><source>core-default.xml</source></property>
<property><name>fs.s3a.impl</name><value>org.apache.hadoop.fs.s3a.S3AFileSystem</value><source>core-default.xml</source></property>
<property><name>ha.health-monitor.connect-retry-interval.ms</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.threshold</name><value>2147483647</value><source>core-default.xml</source></property>
<property><name>fs.s3.maxRetries</name><value>4</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.uploads.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.directory.search.timeout</name><value>10000</value><source>core-default.xml</source></property>
<property><name>file.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>fs.ftp.host</name><value>0.0.0.0</value><source>core-default.xml</source></property>
<property><name>file.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.parent-znode</name><value>/hadoop-ha</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.size</name><value>104857600</value><source>core-default.xml</source></property>
<property><name>fs.s3a.multipart.purge.age</name><value>86400</value><source>core-default.xml</source></property>
<property><name>fs.s3n.multipart.copy.block.size</name><value>5368709120</value><source>core-default.xml</source></property>
<property><name>fs.trash.interval</name><value>0</value><source>core-default.xml</source></property>
<property><name>fs.s3.sleepTimeSeconds</name><value>10</value><source>core-default.xml</source></property>
<property><name>rpc.metrics.quantile.enable</name><value>false</value><source>core-default.xml</source></property>
<property><name>ftp.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.signature.secret.file</name><value>${user.home}/hadoop-http-auth-signature-secret</value><source>core-default.xml</source></property>
<property><name>io.seqfile.sorter.recordlimit</name><value>1000000</value><source>core-default.xml</source></property>
<property><name>s3.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>fs.permissions.umask-mode</name><value>022</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.server.conf</name><value>ssl-server.xml</value><source>core-default.xml</source></property>
<property><name>fs.s3a.connection.ssl.enabled</name><value>true</value><source>core-default.xml</source></property>
<property><name>fs.s3a.buffer.dir</name><value>${hadoop.tmp.dir}/s3a</value><source>core-default.xml</source></property>
<property><name>s3native.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>hadoop.security.groups.cache.warn.after.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.kerberos.principal</name><value>HTTP/_HOST@LOCALHOST</value><source>core-default.xml</source></property>
<property><name>hadoop.security.kms.client.encrypted.key.cache.num.refill.threads</name><value>2</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.filter.user</name><value>(&amp;(objectClass=user)(sAMAccountName={0}))</value><source>core-default.xml</source></property>
<property><name>fs.automatic.close</name><value>true</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.retry.interval</name><value>1000</value><source>core-default.xml</source></property>
<property><name>fs.s3a.paging.maximum</name><value>5000</value><source>core-default.xml</source></property>
<property><name>s3.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>ha.zookeeper.session-timeout.ms</name><value>5000</value><source>core-default.xml</source></property>
<property><name>fs.AbstractFileSystem.har.impl</name><value>org.apache.hadoop.fs.HarFs</value><source>core-default.xml</source></property>
<property><name>io.seqfile.compress.blocksize</name><value>1000000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.filter.initializers</name><value>org.apache.hadoop.http.lib.StaticUserWebFilter</value><source>core-default.xml</source></property>
<property><name>fs.s3.block.size</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.simple.anonymous.allowed</name><value>true</value><source>core-default.xml</source></property>
<property><name>ftp.blocksize</name><value>67108864</value><source>core-default.xml</source></property>
<property><name>io.seqfile.lazydecompress</name><value>true</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.enabled</name><value>false</value><source>core-default.xml</source></property>
<property><name>hadoop.common.configuration.version</name><value>0.23.0</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.search.attr.member</name><value>member</value><source>core-default.xml</source></property>
<property><name>hadoop.security.random.device.file.path</name><value>/dev/urandom</value><source>core-default.xml</source></property>
<property><name>ipc.client.connection.maxidletime</name><value>10000</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.timeout</name><value>20000</value><source>core-default.xml</source></property>
<property><name>hadoop.security.uid.cache.secs</name><value>14400</value><source>core-default.xml</source></property>
<property><name>ipc.client.ping</name><value>true</value><source>core-default.xml</source></property>
<property><name>ipc.client.kill.max</name><value>10</value><source>core-default.xml</source></property>
<property><name>ipc.client.connect.max.retries</name><value>10</value><source>core-default.xml</source></property>
<property><name>ipc.ping.interval</name><value>60000</value><source>core-default.xml</source></property>
<property><name>io.seqfile.local.dir</name><value>${hadoop.tmp.dir}/io/local</value><source>core-default.xml</source></property>
<property><name>hadoop.security.crypto.buffer.size</name><value>8192</value><source>core-default.xml</source></property>
<property><name>io.native.lib.available</name><value>true</value><source>core-default.xml</source></property>
<property><name>io.file.buffer.size</name><value>4096</value><source>core-default.xml</source></property>
<property><name>io.serializations</name><value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value><source>core-default.xml</source></property>
<property><name>tfile.fs.input.buffer.size</name><value>262144</value><source>core-default.xml</source></property>
<property><name>hadoop.security.group.mapping.ldap.ssl</name><value>false</value><source>core-default.xml</source></property>
<property><name>fs.df.interval</name><value>60000</value><source>core-default.xml</source></property>
<property><name>hadoop.http.authentication.kerberos.keytab</name><value>${user.home}/hadoop.keytab</value><source>core-default.xml</source></property>
<property><name>s3native.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property>
<property><name>s3native.replication</name><value>3</value><source>core-default.xml</source></property>
<property><name>tfile.io.chunk.size</name><value>1048576</value><source>core-default.xml</source></property>
<property><name>hadoop.ssl.hostname.verifier</name><value>DEFAULT</value><source>core-default.xml</source></property>



