
From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by Amareshwari
Date Thu, 11 Oct 2007 09:55:52 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by Amareshwari:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

------------------------------------------------------------------------------
  
  = Run a debug script when Task fails =
  
+ A facility is provided, via user-provided scripts, for doing post-processing on task logs,
i.e. the task's stdout, stderr and syslog. For pipes, a default script is run, which processes
core dumps under gdb, prints the stack trace and gives information about the running threads.
The stdout and stderr of the debug script are printed on the diagnostics and displayed on the
job UI on demand.

- A facility is provided, via user-provided scripts, for doing post-processing on task logs,
task's stdout, stderr, syslog and core files. There is a default script which processes core
dumps under gdb and prints the stack trace. The last five lines from stdout and stderr of the
debug script are printed on the diagnostics. These outputs are displayed on the job UI on demand.
- 
- == How to submit debug command ==
- 
- A quick way to set the debug command is to set the properties "mapred.map.task.debug.command"
and "mapred.reduce.task.debug.command" for debugging map tasks and reduce tasks respectively.
- These properties can also be set by the APIs conf.setMapDebugCommand(String cmd) and conf.setReduceDebugCommand(String
cmd).
- The debug command can consist of @stdout@, @stderr@, @syslog@ and @core@ to access the task's
stdout, stderr, syslog and core files respectively.
- In case of streaming, the debug command can be submitted with the command-line options -mapdebug
and -reducedebug for debugging the mapper and reducer respectively.
- 
- For example, the debug command can be 'myScript @stderr@', where myScript is an executable
that processes the failed task's stderr.
- 
- The debug command can also be a gdb command, where the user can submit a command file to execute
using the -x option.
- Then the debug command can look like 'gdb <program-name> -c @core@ -x <gdb-cmd-file>'.
This command processes the core file of the failed task <program-name> and executes the commands
in <gdb-cmd-file>. Please make sure the gdb command file has 'quit' in its last line.
  
  == How to submit debug script ==
  
+ A quick way to set the debug script is to set the properties "mapred.map.task.debug.script"
and "mapred.reduce.task.debug.script" for debugging map tasks and reduce tasks respectively.
These properties can also be set by the APIs conf.setMapDebugScript(String script) and conf.setReduceDebugScript(String
script).
+ The debug script is run as $script $stdout $stderr $syslog, so the task's stdout, stderr and syslog
files can be accessed inside the script as $1, $2 and $3 respectively.
+ In case of streaming, the debug script can be submitted with the command-line options -mapdebug
and -reducedebug for debugging the mapper and reducer respectively.
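
For example, setting the script from a job driver might look like the following minimal sketch, where "MyJob" and "debug.sh" are placeholder names and the script is assumed to have been shipped to the tasks as described below:

{{{
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyJob.class);  // MyJob is a hypothetical job class

// Equivalent to setting "mapred.map.task.debug.script" and
// "mapred.reduce.task.debug.script"; on task failure the framework runs
// ./debug.sh <stdout-file> <stderr-file> <syslog-file>
conf.setMapDebugScript("./debug.sh");
conf.setReduceDebugScript("./debug.sh");
}}}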
+ 
  To submit the debug script file, first put the file in dfs.
+ Make sure the property "mapred.create.symlink" is set to "yes". This can also be done via
[http://lucene.apache.org/hadoop/api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)
DistributedCache.createSymlink].
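
Concretely, a small sketch reusing the conf object from the example above:

{{{
import org.apache.hadoop.filecache.DistributedCache;

// Enable symlinking of cached files into the task's working directory,
// either by setting the property directly...
conf.set("mapred.create.symlink", "yes");
// ...or through the API linked above:
DistributedCache.createSymlink(conf);
}}}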
  
- The executable can be added by setting the property "mapred.cache.executables" with the value
<path>#<executable-name>. For more than one executable, they can be added as comma-separated
executable paths.
+ The file can be added by setting the property "mapred.cache.files" with the value <path>#<script-name>.
For more than one file, they can be added as comma-separated paths.
- The executable property can also be set by the APIs DistributedCache.addCacheExecutable(URI,conf)
and DistributedCache.setCacheExecutables(URI[],conf), where URI is of the form "hdfs://host:port/<path>#<executable-name>".
- For Streaming, the executable can be added through -cacheExecutable URI.
+ This property can also be set by the APIs
[http://lucene.apache.org/hadoop/api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)
DistributedCache.addCacheFile(URI,conf)] and [http://lucene.apache.org/hadoop/api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles
DistributedCache.setCacheFiles(URIs,conf)], where the URI is of the form "hdfs://host:port/<absolutepath>#<script-name>".
+ For Streaming, the file can be added through the command-line option -cacheFile.
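
Putting this together, a minimal sketch of adding the script to the cache (host, port and the /scripts path are placeholders; it assumes debug.sh was already copied into dfs):

{{{
import java.net.URI;

// The "#debug.sh" fragment names the symlink created in the task's
// working directory, matching the "./debug.sh" used when setting the
// debug script properties above.
conf.set("mapred.cache.files", "hdfs://host:port/scripts/debug.sh#debug.sh");

// Equivalent API call:
DistributedCache.addCacheFile(
    new URI("hdfs://host:port/scripts/debug.sh#debug.sh"), conf);
}}}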
  
- For gdb, the gdb command file need not be executable, but it does need to be
in dfs. It can be added to the cache by setting the property "mapred.cache.files" with the value
<path>#<cmd-file> or through the API DistributedCache.addCacheFile(URI,conf).
- Please make sure the property "mapred.create.symlink" is set to "yes".
- 
