hadoop-mapreduce-user mailing list archives

From sunww <spe...@outlook.com>
Subject container fail after nodemanager restart
Date Wed, 16 Dec 2015 03:07:33 GMT
Hi,
    I'm using Hadoop 2.7.1 with Kerberos enabled. After I restart a NodeManager, some of
that NodeManager's containers sometimes fail.
    I found the errors below in the NodeManager log. Any suggestions would be appreciated. Thanks.
    The permissions on the container-executor binary are:
    ---Sr-s--- 1 root hadoop 114398 Oct  1 02:31 container-executor
    
2015-12-16 09:06:55,478 INFO  nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(286)) - The configured nodemanager group 1001 is different from the group of the executable 0
2015-12-16 09:06:55,479 ERROR launcher.RecoveredContainerLaunch (RecoveredContainerLaunch.java:call(88)) - Unable to recover container container_e14_1449136946007_0006_01_000010
java.io.IOException: Problem signalling container 967 with NULL; output: The configured nodemanager group 1001 is different from the group of the executable 0
 and exitCode: 22
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:483)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.isContainerProcessAlive(LinuxContainerExecutor.java:538)
        at org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor.reacquireContainer(ContainerExecutor.java:182)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.reacquireContainer(LinuxContainerExecutor.java:441)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:83)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.RecoveredContainerLaunch.call(RecoveredContainerLaunch.java:46)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: ExitCodeException exitCode=22: Invalid permissions on container-executor binary.

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
        at org.apache.hadoop.util.Shell.run(Shell.java:487)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.signalContainer(LinuxContainerExecutor.java:474)
        ... 9 more
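
For anyone who wants to cross-check the binary against the error above, here is a minimal sketch in Python. It assumes the requirements described in the Hadoop secure-mode documentation for LinuxContainerExecutor: the binary is owned by root, group-owned by the NodeManager group, and has mode 6050 (shown by ls as ---Sr-s---). The function names check_container_executor and check_file are hypothetical helpers written for this illustration and are not part of Hadoop. The sketch only reports mismatches; it does not explain why the group sometimes shows up as 0 in the log.

```python
import os
import stat

# Assumption: mode 6050 (setuid + setgid, group r-x, nothing for others),
# as given in the Hadoop secure-mode documentation.
EXPECTED_MODE = 0o6050


def check_container_executor(uid, gid, mode, expected_gid):
    """Return a list of human-readable problems; an empty list means OK.

    uid, gid      -- owner and group of the binary (from os.stat)
    mode          -- permission bits including setuid/setgid (stat.S_IMODE)
    expected_gid  -- numeric id of the configured NodeManager group
    """
    problems = []
    if uid != 0:
        problems.append("owner uid is %d, expected 0 (root)" % uid)
    if gid != expected_gid:
        problems.append("group gid is %d, expected %d" % (gid, expected_gid))
    if not mode & stat.S_ISUID:
        problems.append("setuid bit is not set")
    if mode & stat.S_IRWXO:
        problems.append("others have permissions on the binary")
    if mode != EXPECTED_MODE:
        problems.append("mode is %o, expected %o" % (mode, EXPECTED_MODE))
    return problems


def check_file(path, expected_gid):
    """Stat the file at path and run the checks above on it."""
    st = os.stat(path)
    return check_container_executor(
        st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode), expected_gid
    )
```

To use it, call check_file with the real location of container-executor on the node and the numeric id of the configured group (1001 in the log above), then print the returned list. Running it both as root and as the user that runs the NodeManager may show whether the two see the same owner, group, and mode.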
 		 	   		  