From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Fri, 26 Oct 2007 09:00:28 -0000
Message-ID: <20071026090028.11474.17169@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by Amareshwari

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by Amareshwari:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

------------------------------------------------------------------------------
  }}}

  and run your executable under the debugger or valgrind. It will run as if the framework were feeding it commands and data, and produce an output file downlink.data.out with the binary commands that it would have sent up to the framework. Eventually, I'll probably make the downlink.data.out file into a text-based format, but for now it is binary. Most problems, however, will be pretty clear in the debugger or valgrind, even without looking at the generated data.

- = The following sections are applicable only for Hadoop 0.15.0 and above =
+ = The following sections are applicable only for Hadoop 0.16.0 and above =

  = Run a debug script when Task fails =

@@ -83, +83 @@

  == How to submit debug script ==

- A quick way to set debug script is to set the properties "mapred.map.task.debug.script" and "mapred.reduce.task.debug.script" for debugging map task and reduce task respectively. These properties can also be set by APIs conf.setMapDebugScript(String script) and conf.setReduceDebugScript(String script).
+ A quick way to set a debug script is to set the properties "mapred.map.task.debug.script" and "mapred.reduce.task.debug.script" for debugging the map task and the reduce task respectively. These properties can also be set through the APIs JobConf.setMapDebugScript and JobConf.setReduceDebugScript.
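+ 
+ For example, a minimal sketch (the script name "./myscript" and the driver class MyJob are placeholders, not part of the API):
+ {{{
+ JobConf conf = new JobConf(MyJob.class);
+ conf.set("mapred.map.task.debug.script", "./myscript");  // raw property for map tasks
+ conf.setMapDebugScript("./myscript");                    // equivalent typed API
+ conf.setReduceDebugScript("./myscript");                 // typed API for reduce tasks
+ }}}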
- The debug command is run as $script $stdout $stderr $syslog $jobconf. Task's stdout, stderr, syslog and jobconf files can be accessed inside the script as $1, $2, $3 and $4. In case of streaming, debug script can be submitted with command-line options -mapdebug, -reducedebug for debugging mapper and redcuer respectively.
- To submit the debug script file, first put the file in dfs.
- Make sure the property "mapred.create.symlink" is set to "yes". This can also be set by [http://lucene.apache.org/hadoop/api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration) DistributedCache.createSymlink]
+ The script is given the task's stdout, stderr, syslog, and jobconf files as arguments.
+ The debug command, run on the node where the map/reduce task failed, is:
+ {{{ $script $stdout $stderr $syslog $jobconf }}}
+ 
+ For Streaming, the debug script can be submitted with the command-line options -mapdebug and -reducedebug for debugging the mapper and the reducer respectively.
+ 
+ Pipes programs have the C++ program name as a fifth argument. Thus, for Pipes programs the command is
+ 
+ {{{ $script $stdout $stderr $syslog $jobconf $program }}}
+ 
+ To submit the debug script file, first put the file in DFS.
+ 
- The file can be added by setting the property "mapred.cache.files" with value #. For more than one file, they can be added as comma seperated paths.
+ The file can be distributed by setting the property "mapred.cache.files" with a value of the form "<path>#<script-name>". More than one file can be added as comma-separated paths.
+ The script file needs to be symlinked.
+ This property can also be set by the APIs [http://lucene.apache.org/hadoop/api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration) DistributedCache.addCacheFile(URI,conf)] and [http://lucene.apache.org/hadoop/api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles DistributedCache.setCacheFiles(URIs,conf)], where the URI is of the form "hdfs://host:port/<path>#<script-name>". For Streaming, the file can be added through the command-line option -cacheFile.
+ To create the symlink for the file, set the property "mapred.create.symlink" to "yes". This can also be set by [http://lucene.apache.org/hadoop/api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration) DistributedCache.createSymlink]
+ 
+ Here is an example of how to submit a script:
+ {{{
+ jobConf.setMapDebugScript("./myscript");
+ DistributedCache.createSymlink(jobConf);
+ DistributedCache.addCacheFile(new URI("/debug/scripts/myscript#myscript"), jobConf);
+ }}}
+ A fuller, self-contained version of this example is sketched at the end of the page.

  == Default Behavior ==

@@ -101, +121 @@

  For Pipes: Stdout, stderr are shown on the job UI.
- Default gdb script is run which prints info abt threads: thread Id and function in which it was running when task failed.
+ If the failed task has a core file, a default gdb script is run which prints info about the threads: each thread's id and the function it was running when the task failed.
- And prints stack tarce where task has failed.
+ It also prints the stack trace where the task failed.
  For Streaming: Stdout, stderr are shown on the Job UI.
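+ 
+ Putting together the submission steps from the previous section, a complete driver might look like the following. This is only a sketch: the class names and the DFS path /debug/scripts/myscript are placeholders carried over from the example above.
+ {{{
+ import java.net.URI;
+ 
+ import org.apache.hadoop.filecache.DistributedCache;
+ import org.apache.hadoop.mapred.JobClient;
+ import org.apache.hadoop.mapred.JobConf;
+ 
+ public class DebugScriptJob {
+   public static void main(String[] args) throws Exception {
+     JobConf jobConf = new JobConf(DebugScriptJob.class);
+     // ... usual job setup: mapper, reducer, input and output paths ...
+ 
+     // Run ./myscript on the failing task's node when a map or reduce task fails.
+     jobConf.setMapDebugScript("./myscript");
+     jobConf.setReduceDebugScript("./myscript");
+ 
+     // Distribute the script (already copied to DFS) and symlink it into the
+     // task's working directory under the name "myscript".
+     DistributedCache.createSymlink(jobConf);
+     DistributedCache.addCacheFile(new URI("/debug/scripts/myscript#myscript"), jobConf);
+ 
+     JobClient.runJob(jobConf);
+   }
+ }
+ }}}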