hadoop-mapreduce-issues mailing list archives

From "Hudson (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3112) Calling hadoop cli inside mapreduce job leads to errors
Date Tue, 04 Oct 2011 13:37:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120108#comment-13120108 ]

Hudson commented on MAPREDUCE-3112:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #29 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/29/])
    MAPREDUCE-3112. Fixed recursive sourcing of HADOOP_OPTS environment variable. (Eric Yang)

eyang : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1178658
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/packages/templates/conf/hadoop-env.sh
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java

                
> Calling hadoop cli inside mapreduce job leads to errors
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-3112
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3112
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.20.205.0, 0.23.0
>         Environment: Java, Linux
>            Reporter: Eric Yang
>            Assignee: Eric Yang
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: HAPREDUCE-3112-1.patch, MAPREDUCE-3112-trunk-2.patch, MAPREDUCE-3112-trunk.patch, MAPREDUCE-3112.patch
>
>
> When running a streaming job with the mapper:
> {noformat}
> bin/hadoop --config /etc/hadoop/ jar contrib/streaming/hadoop-streaming-0.20.205.0.jar -mapper "hadoop --config /etc/hadoop/ dfs -help" -reducer NONE -input "/tmp/input.txt" -output NONE
> {noformat}
> Task log shows:
> {noformat}
> Exception in thread "main" java.lang.ExceptionInInitializerError
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> 	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
> Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable.
> 	at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
> 	at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
> 	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
> 	at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142)
> 	... 3 more
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
> 	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
> 	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
> 	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> 	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:255)
> {noformat}
> Upon inspection, there are two problems in the inherited environment that prevent the logger initialization from working properly.  First, in hadoop-env.sh, HADOOP_OPTS is inherited from the parent process.  This configuration was requested by users as a way to override the Hadoop environment in the configuration template:
> {noformat}
> export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
> {noformat}
> -Dhadoop.log.dir=$HADOOP_LOG_DIR/task_tracker_user is injected into HADOOP_OPTS in the tasktracker environment, so the running task inherits the wrong logging directory, which the end user may not have sufficient permission to write to.  Second, $HADOOP_ROOT_LOGGER is overridden to -Dhadoop.root.logger=INFO,TLA by the task controller, so the bin/hadoop script attempts to use hadoop.root.logger=INFO,TLA but fails to initialize.
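The recursive-sourcing problem described above can be sketched as follows. This is a minimal illustration, not the committed patch: the log paths are hypothetical, and the "guarded" variant simply shows the general idea of resetting rather than appending the inherited value.

```shell
#!/bin/sh
# Simulate the environment the tasktracker hands to a task
# (hypothetical paths; real values come from the task controller):
HADOOP_OPTS="-Dhadoop.log.dir=/var/log/hadoop/task_tracker_user -Dhadoop.root.logger=INFO,TLA"
export HADOOP_OPTS

# The problematic hadoop-env.sh line appends the inherited, task-scoped
# value, so a nested `bin/hadoop` call carries the parent's log dir and
# root logger and the child's logging fails to initialize:
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_OPTS"
echo "nested call inherits: $HADOOP_OPTS"

# Guarded variant (illustration only): reset instead of append, and drop
# the task-injected root logger, so a nested CLI starts clean.
HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_OPTS
unset HADOOP_ROOT_LOGGER
echo "after reset: $HADOOP_OPTS"
```

Running the sketch shows the task-scoped flags piling up in the appending form and disappearing in the reset form.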

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
