hadoop-user mailing list archives

From Adam Kawa <kawa.a...@gmail.com>
Subject Re: How to limit MRJob's stdout/stderr size(yarn2.3)
Date Tue, 08 Jul 2014 23:21:44 GMT
There is a setting like

<property>
  <name>mapreduce.task.userlog.limit.kb</name>
  <value>0</value>
  <description>The maximum size of user-logs of each task in KB. 0 disables
the cap.
  </description>
</property>

but I have not tried it on YARN.

If your disks are full because you run many applications and tasks that produce
logs, you could also consider enabling log aggregation to HDFS. Truncating
logs has the disadvantage that you might lose important information that
could be useful for debugging or performance analysis (e.g. a limit can be
good for some jobs, but for some of them you might want to access a
complete log).
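
To illustrate that trade-off, here is a minimal bash sketch (not tested on
YARN) of the same "tail -c" plus $PIPESTATUS pattern used in the quoted
command below. The function noisy_task and the 64-byte cap are hypothetical
stand-ins for a real task and a real limit.

```shell
#!/usr/bin/env bash
# Sketch only: noisy_task is a hypothetical stand-in for a task that logs heavily.
noisy_task() {
  echo "FIRST LINE: startup configuration"
  for i in $(seq 1 1000); do
    echo "progress line $i"
  done
  return 3   # pretend the task failed
}

log_file=$(mktemp)

# Keep only the last 64 bytes of stdout, as "tail -c" does in the quoted command.
noisy_task | tail -c 64 > "$log_file"
task_status=${PIPESTATUS[0]}   # exit status of noisy_task, not of tail

log_size=$(wc -c < "$log_file" | tr -d ' ')
echo "log size: $log_size bytes, task exit status: $task_status"

# Check whether the early startup line survived the truncation.
if grep -q "FIRST LINE" "$log_file"; then
  first_line_kept=yes
else
  first_line_kept=no
fi
echo "startup line kept: $first_line_kept"
```

The cap holds and the task's exit status is preserved, but the startup line
is gone, which is exactly the kind of information you may need later.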


2014-07-03 6:21 GMT+02:00 huozhanfeng@gmail.com <huozhanfeng@gmail.com>:

> Hi friend,
>
>     When an MR job prints too much to stdout or stderr, the disk fills up.
> This is now affecting our platform management.
>
>     I have modified org.apache.hadoop.mapred.MapReduceChildJVM.java (using
> code from org.apache.hadoop.mapred.TaskLog) to generate the launch command
> as follows:
>
> exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true
> -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp
> -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002
> -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA
> org.apache.hadoop.mapred.YarnChild $test_IP 53911
> attempt_1403930653208_0003_m_000000_0 2 | tail -c 102
> >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stdout
> ; exit $PIPESTATUS ) 2>&1 | tail -c 10240
> >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stderr
> ; exit $PIPESTATUS "
>
>
>     But it doesn't take effect.
>
>     Then, when I use "export YARN_NODEMANAGER_OPTS=-Xdebug
> -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y" to debug the
> NodeManager and set breakpoints at
> org.apache.hadoop.util.Shell (line 450: process = builder.start()) and
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line
> 161: List<String> newCmds = new ArrayList<String>(command.size())), the
> command works.
>
>     I suspect a concurrency problem is preventing the shell pipeline from
> working properly. This matters to us, and I need help.
>
>    See https://issues.apache.org/jira/browse/YARN-2231
>
> thanks
>
> ------------------------------
> Zhanfeng Huo
>
