hadoop-mapreduce-issues mailing list archives

From "Eric Yang (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3112) Calling hadoop cli inside mapreduce job leads to errors
Date Wed, 28 Sep 2011 15:48:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116551#comment-13116551 ]

Eric Yang commented on MAPREDUCE-3112:
--------------------------------------

In previous releases of Hadoop, we didn't have this problem because HADOOP_OPTS was always reconstructed
from scratch in the invoking process.  hadoop.log.dir is set up by the parent process to ensure
the output is redirected to the desired location.  This change was made at HCatalog's request,
to give it the ability to override HADOOP_OPTS.  HCatalog's request could instead be supported
by moving the override to HADOOP_USER_OPTS and making HADOOP_USER_OPTS a prefix of HADOOP_OPTS.
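
A minimal sketch of that idea, assuming a hypothetical HADOOP_USER_OPTS hook in hadoop-env.sh (this is an illustration, not a committed change):

{noformat}
# hadoop-env.sh (sketch): keep the user-facing hook separate from the
# options the invoking process builds up.  HADOOP_USER_OPTS is placed
# first, so any -D flag appended later by the parent process takes
# precedence if the same property appears twice.
export HADOOP_OPTS="${HADOOP_USER_OPTS} -Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"
{noformat}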

In a streaming job, we should unset the HADOOP_ROOT_LOGGER environment variable to ensure that
a hadoop command invoked inside the streaming job logs to the console, which gets redirected
to the TaskLogAppender by the task attempt.
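
As a rough illustration of that behavior from the user side (the wrapper script below is hypothetical, not part of this issue):

{noformat}
#!/bin/sh
# mapper-wrapper.sh (sketch): runs as the streaming mapper inside the
# task attempt.  Drop the logger override inherited from the tasktracker
# so the child hadoop CLI logs to the console, which the task attempt
# captures and routes to the TaskLogAppender.
unset HADOOP_ROOT_LOGGER
exec hadoop --config /etc/hadoop/ dfs -help
{noformat}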
                
> Calling hadoop cli inside mapreduce job leads to errors
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-3112
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3112
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.205.0
>         Environment: Java, Linux
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>             Fix For: 0.20.205.0
>
>
> When running a streaming job whose mapper invokes the hadoop CLI:
> bin/hadoop --config /etc/hadoop/ jar contrib/streaming/hadoop-streaming-0.20.205.0.jar -mapper "hadoop --config /etc/hadoop/ dfs -help" -reducer NONE -input "/tmp/input.txt" -output NONE
> Task log shows:
> {noformat}
> Exception in thread "main" java.lang.ExceptionInInitializerError
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> 	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
> Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable.
> 	at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
> 	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
> 	at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142)
> 	... 3 more
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
> 	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> 	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> 	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> 	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:255)
> {noformat}
> Upon inspection, there are two problems in the inherited environment which prevent the logger initialization from working properly.  In hadoop-env.sh, HADOOP_OPTS is inherited from the parent process.  This configuration was requested by users as a way to override the Hadoop environment in the configuration template:
> {noformat}
> export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
> {noformat}
> -Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user is injected into HADOOP_OPTS in the tasktracker environment.  Hence, the running task inherits the wrong logging directory, which the end user might not have sufficient access to write to.  Second, $HADOOP_ROOT_LOGGER is overridden to -Dhadoop.root.logger=INFO,TLA by the task controller; therefore the bin/hadoop script will attempt to use hadoop.root.logger=INFO,TLA but fail to initialize.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
