hadoop-common-dev mailing list archives

From "Amareshwari Sri Ramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1857) Ability to run a script when a task fails to capture stack traces
Date Wed, 12 Sep 2007 08:37:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526709 ]

Amareshwari Sri Ramadasu commented on HADOOP-1857:

The proposal is as follows:

1. API for the script:

  The API will be added to JobConf.
  JobConf.{set/get}DebugScript(file) will set or get the debug script the user wants to run
when a task fails.
  JobConf.{set/get}DebugCommand(String cmd) will set or get the debug command used to run the
script.
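
A minimal sketch of what the proposed accessor pairs could look like, assuming they follow the usual JobConf pattern of storing each setting as a string property. The class DebugJobConf and the property keys mapred.debug.script and mapred.debug.command are hypothetical; the proposal names only the accessors.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the proposed JobConf accessors.
 * The property keys are illustrative only; the proposal does not name them.
 */
class DebugJobConf {
    static final String DEBUG_SCRIPT_KEY = "mapred.debug.script";   // hypothetical key
    static final String DEBUG_COMMAND_KEY = "mapred.debug.command"; // hypothetical key

    // Settings are kept as plain string properties, like a Configuration.
    private final Map<String, String> props = new HashMap<>();

    void setDebugScript(String file) { props.put(DEBUG_SCRIPT_KEY, file); }

    /** Returns null when no debug script has been configured. */
    String getDebugScript() { return props.get(DEBUG_SCRIPT_KEY); }

    void setDebugCommand(String cmd) { props.put(DEBUG_COMMAND_KEY, cmd); }

    /** Returns null when no debug command has been configured. */
    String getDebugCommand() { return props.get(DEBUG_COMMAND_KEY); }
}
```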

  For example, the command can look like the following:

  $> script_name -input $stdout / $stderr
                 -core $core
                 -output $output_file

  $stdout and $stderr are the task's stdout and stderr files, respectively.
  $core is the core file to be processed.
  $output_file is the file in which the script stores its output.
  The user can combine the $stdout, $stderr and $core parameters to get the required work done.
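
The placeholder substitution described above could work roughly as follows. This is a sketch under stated assumptions: the class DebugCommandExpander is hypothetical, a placeholder is replaced only when it forms a whole whitespace-separated token, and quoted arguments are not supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: expands $stdout, $stderr, $core, $output_file (or any
 * other key in the map) in a debug command and splits it into an argv list.
 * Splitting on whitespace is a simplification; paths with spaces break it.
 */
final class DebugCommandExpander {
    private DebugCommandExpander() {}

    static List<String> expand(String command, Map<String, String> values) {
        List<String> argv = new ArrayList<>();
        for (String token : command.trim().split("\\s+")) {
            String name = token.startsWith("$") ? token.substring(1) : null;
            // Substitute only whole tokens whose name is a known placeholder.
            if (name != null && values.containsKey(name)) {
                argv.add(values.get(name));
            } else {
                argv.add(token);
            }
        }
        return argv;
    }
}
```

Matching whole tokens rather than substrings keeps a literal such as $stdout_backup from being partially rewritten.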

2. Distributed Cache:

   The script is copied to the nodes using the DistributedCache. To support this, the methods
addCacheExecutable() and getCacheExecutables() and a variable isExecutable will be added, analogous
to the existing addCacheArchive(), getCacheArchives() and isArchive.

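
One way the isExecutable flag might be honored when a cached file is localized on a node: copy it into the task's cache directory and set its execute bit. CacheLocalizer and localize() are hypothetical names; only the java.nio.file and java.io.File calls are standard library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical localization step for a cached file. When isExecutable is
 * set, the local copy is marked executable so the debug command can run it.
 */
final class CacheLocalizer {
    private CacheLocalizer() {}

    static Path localize(Path source, Path cacheDir, boolean isExecutable) throws IOException {
        Files.createDirectories(cacheDir);
        Path target = cacheDir.resolve(source.getFileName());
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        // setExecutable(true) sets the owner's execute permission only.
        if (isExecutable && !target.toFile().setExecutable(true)) {
            throw new IOException("could not mark " + target + " executable");
        }
        return target;
    }
}
```
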
3. gdb:
   Default scripts to run core dumps under gdb will be provided. Users can specify gdb parameters
in a .gdbinit file.
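
A default gdb script might boil down to a command line like the one built below, which loads the task binary together with its core dump and prints a backtrace. GdbCommand is a hypothetical helper; -batch, -x and -ex are standard gdb options, and passing the user's .gdbinit explicitly with -x is an assumption (recent gdb versions may decline to auto-load a .gdbinit from the working directory).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical default debug script: builds (but does not run) a gdb
 * command that prints a backtrace from a core dump.
 */
final class GdbCommand {
    private GdbCommand() {}

    static List<String> build(Path program, Path core, Path workDir) {
        List<String> argv = new ArrayList<>();
        argv.add("gdb");
        argv.add("-batch");                    // run non-interactively, then exit
        Path gdbinit = workDir.resolve(".gdbinit");
        if (Files.isRegularFile(gdbinit)) {
            argv.add("-x");                    // apply the user's gdb settings first
            argv.add(gdbinit.toString());
        }
        argv.add("-ex");
        argv.add("bt");                        // backtrace of the current (faulting) thread
        argv.add(program.toString());
        argv.add(core.toString());
        return argv;
    }
}
```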

4. When to call the script?
   The script can be called at one of two points:
   i) Whenever a task fails, before releaseCache() is called.
   ii) Whenever a job fails; in this case we have to make sure the cache files still exist.
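
Option (i) depends on ordering: the script needs the cached script, logs and core file, so it must finish before releaseCache() runs. A minimal sketch, assuming hypothetical Cache and DebugScript interfaces in place of the actual TaskTracker internals; option (ii) is not covered.

```java
/**
 * Hypothetical failure hook for option (i): run the debug script on task
 * failure, and release the cache only after the script has finished.
 */
final class FailedTaskHandler {
    interface Cache { void releaseCache(); }
    interface DebugScript { void run() throws Exception; }

    private final Cache cache;
    private final DebugScript script;

    FailedTaskHandler(Cache cache, DebugScript script) {
        this.cache = cache;
        this.script = script;
    }

    void onTaskFinished(boolean failed) {
        try {
            if (failed && script != null) {
                script.run();        // cached files and core are still present here
            }
        } catch (Exception e) {
            // Do not let a broken debug script escape the failure handler.
            System.err.println("debug script failed: " + e);
        } finally {
            cache.releaseCache();    // always runs, even if the script threw
        }
    }
}
```

Releasing the cache in a finally block means a crashing debug script cannot leak cache entries, and catching the script's exception keeps a script error from propagating out of the failure handler.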

5. Display output:

   The output of the script is saved in $output_file. It is then sent to the JobTracker
using TaskTracker.reportDiagnosticInfo() and displayed on the job UI on demand.
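
Forwarding the script's output could look like this sketch, which reads $output_file and passes its last few lines on as diagnostic text. The Reporter interface is a hypothetical stand-in for TaskTracker.reportDiagnosticInfo(), whose real signature may differ, and truncating to maxLines is an assumption to keep messages on the job UI short.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical forwarder: sends the tail of the debug script's output
 * file to a diagnostic reporter.
 */
final class DebugOutputForwarder {
    interface Reporter { void reportDiagnosticInfo(String taskId, String info); }

    private DebugOutputForwarder() {}

    static void forward(String taskId, Path outputFile, int maxLines, Reporter reporter)
            throws IOException {
        List<String> lines = Files.readAllLines(outputFile, StandardCharsets.UTF_8);
        // Keep only the last maxLines lines of the script output.
        int from = Math.max(0, lines.size() - maxLines);
        String tail = String.join("\n", lines.subList(from, lines.size()));
        reporter.reportDiagnosticInfo(taskId, tail);
    }
}
```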

Please let me know your comments on the proposal, especially on when to call the script.

> Ability to run a script when a task fails to capture stack traces
> -----------------------------------------------------------------
>                 Key: HADOOP-1857
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1857
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.0
>            Reporter: Amareshwari Sri Ramadasu
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.15.0
> This basically is for providing a better user interface for debugging failed
> jobs. Today we see stack traces for failed tasks on the job UI if the job
> happened to be a Java MR job. For non-Java jobs such as Streaming and Pipes, the
> diagnostic info on the job UI is not helpful enough to debug what might have
> gone wrong. They are usually framework traces and not app traces.
> We want to be able to provide a facility, via user-provided scripts, for doing
> post-processing on task logs, input, output, etc. There should be some default
> scripts like running core dumps under gdb for locating illegal instructions,
> the last few lines from stderr, etc.  These outputs could be sent to the
> tasktracker and in turn to the jobtracker which would then display it on the
> job UI on demand.

This message is automatically generated by JIRA.
