hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "HowToDebugMapReducePrograms" by AmareshwariSriRamadasu
Date Thu, 21 Feb 2008 06:52:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by AmareshwariSriRamadasu:
http://wiki.apache.org/hadoop/HowToDebugMapReducePrograms

------------------------------------------------------------------------------
  
  = Run a debug script when Task fails =
  
- A facility is provided, via user-provided scripts, for doing post-processing on task logs,
task's stdout, stderr, syslog. For pipes, a default script is run which processes core dumps
under gdb, prints stack trace and gives info about running threads. The stdout and stderr
of debug script are printed on the diagnostics. These outputs are displayed on job UI on demand.

+ When a map/reduce task fails, a facility is provided, via user-provided scripts, for
doing post-processing on the task logs, i.e. the task's stdout, stderr, and syslog. The stdout and stderr
of the user-provided debug script are printed on the diagnostics. These outputs are displayed
on the job UI on demand.
+ 
+ For pipes, a default script is run which processes core dumps under gdb, prints the stack trace,
and gives info about the running threads.
+ 
+ In the following sections we discuss how to submit a debug script along with the job. We also
discuss what the default behavior is.
+ To submit a debug script, it first has to be distributed; then the script has to be supplied
in the configuration.
+ 
+ == How to submit the debug script file ==
+ 
+ To submit the debug script file, first put the file in DFS.
+ 
+ The file can be distributed by setting the property "mapred.cache.files" with the value <path>#<script-name>.
More than one file can be added as comma-separated paths.
+ The script file needs to be symlinked.
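+ 
+ For example, a minimal sketch of the property route (the DFS path and script name here are hypothetical):
+ {{{
+     // assumes the script was already put in DFS at /debug/myscript
+     jobConf.set("mapred.cache.files", "/debug/myscript#myscript");
+     // symlink the cached file into the task's working directory (see below)
+     jobConf.set("mapred.create.symlink", "yes");
+ }}}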
+ 
+ This property can also be set by the APIs
[http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)
DistributedCache.addCacheFile(URI,conf)] and [http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles
DistributedCache.setCacheFiles(URIs,conf)], where the URI is of the form "hdfs://host:port/<absolutepath>#<script-name>".
+ For Streaming, the file can be added through the command line option -cacheFile.
+ To create a symlink for the file, the property "mapred.create.symlink" is set to "yes". This
can also be set by [http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)
DistributedCache.createSymlink].
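+ 
+ A minimal sketch of the equivalent API route (the host, port, and path here are hypothetical):
+ {{{
+     // distribute the script from DFS, symlinked as "myscript" in the task's working directory
+     DistributedCache.addCacheFile(
+         new java.net.URI("hdfs://host:port/debug/myscript#myscript"), jobConf);
+     DistributedCache.createSymlink(jobConf);
+ }}}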
  
  == How to submit the debug script ==
  
- A quick way to set debug script is to set the properties "mapred.map.task.debug.script"
and "mapred.reduce.task.debug.script" for debugging map task and reduce task respectively.
These properties can also be set by APIs JobConf.setMapDebugScript and JobConf.setReduceDebugScript.
+ A quick way to submit a debug script is to set values for the properties "mapred.map.task.debug.script"
and "mapred.reduce.task.debug.script", for debugging the map task and the reduce task respectively.
These properties can also be set by the APIs [http://hadoop.apache.org/core/api/org/apache/hadoop/mapred/JobConf.html#setMapDebugScript(java.lang.String)
JobConf.setMapDebugScript]
- 
+ and [http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/mapred/JobConf.html#setReduceDebugScript(java.lang.String)
JobConf.setReduceDebugScript].
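+ 
+ For example, a minimal sketch setting both properties directly (the script name here is hypothetical,
and the script is assumed to have been distributed and symlinked as described above):
+ {{{
+     jobConf.set("mapred.map.task.debug.script", "./myscript");
+     jobConf.set("mapred.reduce.task.debug.script", "./myscript");
+ }}}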
  The script is given the task's stdout, stderr, syslog, and jobconf files as arguments.
The debug command, run on the node where the map/reduce task failed, is:
  
@@ -96, +113 @@

  
  {{{ $script $stdout $stderr $syslog $jobconf $program }}}
  
- 
- To submit the debug script file, first put the file in dfs. 
- 
- The file can be distributed by setting the property "mapred.cache.files" with value <path>#<script-name>.
For more than one file, they can be added as comma seperated paths.
- The script file needs to be symlinked.
- 
- This property can also be set by APIs 
- [http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)
DistributedCache.addCacheFile(URI,conf)] and [http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles
DistributedCache.setCacheFiles(URIs,conf)] where URI is of the form "hdfs://host:port/<absolutepath>#<script-name>".
- For Streaming, the file can be added through command line option -cacheFile.
- To create symlink for the file, the property "mapred.create.symlink" is set to "yes". This
can also be set by [http://hadoop.apache.org/core/api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)
DistributedCache.createSymLink]
- 
  Here is an example of how to submit a script:
  {{{
      jobConf.setMapDebugScript("./myscript");
@@ -115, +121 @@

  }}}
  
  == Default Behavior ==
+ The default behavior for failed map/reduce tasks is as follows.
  
For Java programs:
Stdout and stderr are shown on the job UI. The stack trace is printed on the diagnostics.
