hadoop-mapreduce-issues mailing list archives

From "Matt Foley (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3112) Calling hadoop cli inside mapreduce job leads to errors
Date Wed, 28 Sep 2011 10:07:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116318#comment-13116318 ]

Matt Foley commented on MAPREDUCE-3112:
---------------------------------------

If HADOOP_OPTS is viewed as a dictionary for sharing key/value pairs among Hadoop processes,
then it seems that "hadoop.log.dir" should not be in HADOOP_OPTS.  Either:
* all processes can continue to use the name "hadoop.log.dir" for this parameter, but not
share it through HADOOP_OPTS.  Instead, sets of processes that need to share this value can
share it through some other mechanism, perhaps a <PROCESS_SET_X>_SHARED_LOG parameter
list, where each such process set has a differently named list; or
* each set of processes that CAN share a value for log location should have its own name for
the log location parameter, such as "hadoop.log.dir" and "tasktracker.log.dir".  Then all
(or none) of these parameters could be shared in HADOOP_OPTS.
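
A minimal sketch of the first option, assuming a launcher script that assembles JVM options per process set. SHARED_OPTS, TASKTRACKER_SHARED_LOG, CLIENT_SHARED_LOG, the opts_for helper, and the directory paths are all hypothetical names chosen for illustration; none of them are existing Hadoop variables.

```shell
# Hypothetical sketch: key/value pairs that every process may share.
# "hadoop.log.dir" is deliberately NOT part of this shared string.
SHARED_OPTS="-Djava.net.preferIPv4Stack=true"

# One differently named list per process set (illustrative names and paths).
TASKTRACKER_SHARED_LOG="-Dhadoop.log.dir=/var/log/hadoop/tasktracker"
CLIENT_SHARED_LOG="-Dhadoop.log.dir=/tmp/hadoop-client-logs"

# Build the JVM options for one process set: the shared pairs plus
# that set's own log location, if it has one.
opts_for() {
  case "$1" in
    tasktracker) printf '%s %s\n' "$SHARED_OPTS" "$TASKTRACKER_SHARED_LOG" ;;
    client)      printf '%s %s\n' "$SHARED_OPTS" "$CLIENT_SHARED_LOG" ;;
    *)           printf '%s\n' "$SHARED_OPTS" ;;
  esac
}
```

With this layout, a process in the hypothetical "client" set would never see the tasktracker's log directory, because that value is not carried in the shared string.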

                
> Calling hadoop cli inside mapreduce job leads to errors
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-3112
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3112
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.205.0
>         Environment: Java, Linux
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>             Fix For: 0.20.205.0
>
>
> When running a streaming job whose mapper invokes the hadoop CLI:
> bin/hadoop --config /etc/hadoop/ jar contrib/streaming/hadoop-streaming-0.20.205.0.jar -mapper "hadoop --config /etc/hadoop/ dfs -help" -reducer NONE -input "/tmp/input.txt" -output NONE
> the task log shows:
> {noformat}
> Exception in thread "main" java.lang.ExceptionInInitializerError
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> 	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
> Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable.
> 	at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
> 	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
> 	at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142)
> 	... 3 more
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
> 	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> 	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> 	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> 	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:255)
> {noformat}
> Upon inspection, two problems in the inherited environment prevent logger initialization
> from working properly.  First, in hadoop-env.sh, HADOOP_OPTS is inherited from the parent
> process.  This configuration was requested by a user as a way to override the Hadoop
> environment in the configuration template:
> {noformat}
> export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
> {noformat}
> -Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user is injected into HADOOP_OPTS in the
> tasktracker environment.  Hence, the running task inherits the wrong logging directory,
> which the end user might not have permission to write to.  Second, $HADOOP_ROOT_LOGGER
> is overridden to -Dhadoop.root.logger=INFO,TLA by the task controller; the bin/hadoop
> script therefore attempts to use hadoop.root.logger=INFO,TLA but fails to initialize it.
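
As a possible workaround on the job side (a sketch only, not the fix applied for this issue), a mapper wrapper script could clear both inherited settings before calling the nested hadoop command. The strip_jvm_prop helper is hypothetical, and the example values merely stand in for what a task might inherit; the sketch also assumes the options are whitespace-separated tokens with no embedded spaces or glob characters.

```shell
# Hypothetical helper: remove every -D<name>=... token from an options string.
# $1 = property name, $2 = options string (whitespace-separated tokens).
strip_jvm_prop() {
  out=""
  for tok in $2; do          # intentional word splitting on whitespace
    case "$tok" in
      -D"$1"=*) ;;           # drop the inherited setting
      *) out="${out:+$out }$tok" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Example values standing in for the environment inherited from the tasktracker.
HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/var/log/hadoop/task_tracker_user"
HADOOP_ROOT_LOGGER="INFO,TLA"

# Drop the tasktracker's log directory and the task-specific root logger
# so that the nested bin/hadoop call falls back to its own defaults.
HADOOP_OPTS=$(strip_jvm_prop hadoop.log.dir "$HADOOP_OPTS")
export HADOOP_OPTS
unset HADOOP_ROOT_LOGGER
```

After these lines, the wrapper would invoke the nested command (for example, hadoop --config /etc/hadoop/ dfs -help) with the remaining options intact.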

