hadoop-yarn-issues mailing list archives

From "huozhanfeng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-2231) Provide feature to limit MRJob's stdout/stderr size
Date Sun, 29 Jun 2014 09:03:24 GMT

    [ https://issues.apache.org/jira/browse/YARN-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047081#comment-14047081 ]

huozhanfeng commented on YARN-2231:
-----------------------------------

Index: MapReduceChildJVM.java
===================================================================
--- MapReduceChildJVM.java	(revision 1387)
+++ MapReduceChildJVM.java	(revision 1388)
@@ -37,6 +37,7 @@
 @SuppressWarnings("deprecation")
 public class MapReduceChildJVM {
 
+  private static final String tailCommand = "tail";
   private static String getTaskLogFile(LogName filter) {
     return ApplicationConstants.LOG_DIR_EXPANSION_VAR + Path.SEPARATOR + 
         filter.toString();
@@ -161,9 +162,12 @@
 
     TaskAttemptID attemptID = task.getTaskID();
     JobConf conf = task.conf;
-
+    long logSize = TaskLog.getTaskLogLength(conf);
+    
     Vector<String> vargs = new Vector<String>(8);
-
+    if (logSize > 0) {
+      vargs.add("(");
+    }
     vargs.add(Environment.JAVA_HOME.$() + "/bin/java");
 
     // Add child (task) java-vm options.
@@ -206,7 +210,6 @@
     vargs.add("-Djava.io.tmpdir=" + childTmpDir);
 
     // Setup the log4j prop
-    long logSize = TaskLog.getTaskLogLength(conf);
     setupLog4jProperties(task, vargs, logSize);
 
     if (conf.getProfileEnabled()) {
@@ -229,8 +232,22 @@
 
     // Finally add the jvmID
     vargs.add(String.valueOf(jvmID.getId()));
-    vargs.add("1>" + getTaskLogFile(TaskLog.LogName.STDOUT));
-    vargs.add("2>" + getTaskLogFile(TaskLog.LogName.STDERR));
+    if (logSize > 0) {
+      vargs.add("|");
+      vargs.add(tailCommand);
+      vargs.add("-c");
+      vargs.add(String.valueOf(logSize));
+      vargs.add(">>" + getTaskLogFile(TaskLog.LogName.STDOUT));
+      vargs.add("; exit $PIPESTATUS ) 2>&1 | ");
+      vargs.add(tailCommand);
+      vargs.add("-c");
+      vargs.add(String.valueOf(logSize));
+      vargs.add(">>" + getTaskLogFile(TaskLog.LogName.STDERR));
+      vargs.add("; exit $PIPESTATUS");
+    } else {
+      vargs.add("1>" + getTaskLogFile(TaskLog.LogName.STDOUT));
+      vargs.add("2>" + getTaskLogFile(TaskLog.LogName.STDERR));
+    }
 
     // Final commmand
     StringBuilder mergedCommand = new StringBuilder();
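
The shell pattern this patch builds can be sketched independently of Hadoop: a subshell's stdout is piped through `tail -c` into the stdout log, the subshell's stderr is folded in via `2>&1` and capped by a second `tail`, and `$PIPESTATUS` preserves the wrapped command's own exit code instead of `tail`'s. In the sketch below the byte limit, log paths, and the noisy `printf` stand-in for the child JVM are all placeholders, not the patch's actual values:

```shell
#!/bin/bash
# Cap a noisy command's stdout and stderr at a fixed byte count while
# preserving the command's own exit status via $PIPESTATUS.
LIMIT=64                      # byte cap per stream (placeholder)
STDOUT_LOG=$(mktemp)          # stands in for <LOG_DIR>/stdout
STDERR_LOG=$(mktemp)          # stands in for <LOG_DIR>/stderr

( ( printf 'o%.0s' {1..1000}        # 1000 bytes to stdout (JVM stand-in)
    printf 'e%.0s' {1..1000} >&2    # 1000 bytes to stderr
    exit 3 ) \
  | tail -c "$LIMIT" >> "$STDOUT_LOG"; exit "${PIPESTATUS[0]}" ) 2>&1 \
  | tail -c "$LIMIT" >> "$STDERR_LOG"
status="${PIPESTATUS[0]}"

echo "exit=$status"
echo "stdout bytes: $(wc -c < "$STDOUT_LOG")"
echo "stderr bytes: $(wc -c < "$STDERR_LOG")"
```

Both log files end up at the 64-byte cap while the reported exit status is 3, the inner command's own code, which is what the two `exit $PIPESTATUS` clauses in the generated command are for.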


> Provide feature  to limit MRJob's stdout/stderr size
> ----------------------------------------------------
>
>                 Key: YARN-2231
>                 URL: https://issues.apache.org/jira/browse/YARN-2231
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation, nodemanager
>    Affects Versions: 2.3.0
>         Environment: CentOS release 5.8 (Final)
>            Reporter: huozhanfeng
>              Labels: features
>
> When an MR job prints too much to stdout or stderr, the log files fill the disk. This has affected the manageability of our platform.
> I have modified org.apache.hadoop.mapred.MapReduceChildJVM (adapting the approach from org.apache.hadoop.mapred.TaskLog) to generate the launch command as follows:
> exec /bin/bash -c "( $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -Djava.io.tmpdir=$PWD/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002 -Dyarn.app.container.log.filesize=10240 -Dhadoop.root.logger=DEBUG,CLA org.apache.hadoop.mapred.YarnChild 10.106.24.108 53911 attempt_1403930653208_0003_m_000000_0 2 | tail -c 102 >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stdout ; exit $PIPESTATUS ) 2>&1 | tail -c 10240 >/logs/userlogs/application_1403930653208_0003/container_1403930653208_0003_01_000002/stderr ; exit $PIPESTATUS "
> But it doesn't take effect.
> Then, when I use "export YARN_NODEMANAGER_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,address=8788,server=y,suspend=y" to debug the NodeManager, I find that if I set breakpoints at org.apache.hadoop.util.Shell (line 450: process = builder.start()) and org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch (line 161: List<String> newCmds = new ArrayList<String>(command.size())), the command does work.
> I suspect a concurrency problem is causing the piped shell not to behave properly. This matters to us, and I need your help.
> my email: huozhanfeng@gmail.com
> thanks
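
One detail worth noting about the generated command above: in bash, a bare `$PIPESTATUS` expands to only the first element of the `PIPESTATUS` array, i.e. the exit status of the left-most command in the most recent foreground pipeline, which is exactly what the `exit $PIPESTATUS` clauses rely on. A standalone illustration, unrelated to Hadoop itself:

```shell
#!/bin/bash
# PIPESTATUS holds the exit status of each command in the most recent
# foreground pipeline; an unindexed $PIPESTATUS is its first element.
( exit 7 ) | cat            # left command exits 7, cat exits 0
first=$PIPESTATUS           # first element: the left command's status
( exit 7 ) | cat
all="${PIPESTATUS[*]}"      # whole array joined as one string
echo "first=$first all=$all"
```

Because any subsequent command overwrites `PIPESTATUS`, the `exit $PIPESTATUS` must immediately follow the pipeline it reports on, as it does in the patch.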



--
This message was sent by Atlassian JIRA
(v6.2#6252)
