From: "Sameer Paranjpye (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Thu, 13 Sep 2007 09:34:32 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1857) Ability to run a script when a task fails to capture stack traces
Message-ID: <29971645.1189701272315.JavaMail.jira@brutus>
In-Reply-To: <33070422.1189155571005.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527161 ]

Sameer Paranjpye commented on HADOOP-1857:
------------------------------------------

A couple of comments:

- Why have separate config variables for the script and the command line? It seems like you could just supply the command line. The DistributedCache should be used to send the script over if needed (otherwise we can assume that the specified program is installed).
- Where does the .gdbinit file get picked up from? The user's home directory? That seems brittle. How does it get sent over to the executing node? The .gdbinit file should also be sent over via the DistributedCache.

I'd vote for handling perl and python in a second pass.

> Ability to run a script when a task fails to capture stack traces
> -----------------------------------------------------------------
>
>                 Key: HADOOP-1857
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1857
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.14.0
>            Reporter: Amareshwari Sri Ramadasu
>            Assignee: Amareshwari Sri Ramadasu
>             Fix For: 0.15.0
>
>
> This basically is for providing a better user interface for debugging failed
> jobs. Today we see stack traces for failed tasks on the job UI if the job
> happened to be a Java MR job. For non-Java jobs like Streaming, Pipes, the
> diagnostic info on the job UI is not helpful enough to debug what might have
> gone wrong. They are usually framework traces and not app traces.
> We want to be able to provide a facility, via user-provided scripts, for doing
> post-processing on task logs, input, output, etc. There should be some default
> scripts like running core dumps under gdb for locating illegal instructions,
> the last few lines from stderr, etc. These outputs could be sent to the
> tasktracker and in turn to the jobtracker which would then display it on the
> job UI on demand.
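
For illustration only, here is a rough Python sketch of the two ideas discussed in this thread: a default "last few lines from stderr" helper, and running a debug step that is given as a single command line rather than as a separate script setting plus arguments. The names `tail_lines` and `run_debug_command` are hypothetical and do not come from the HADOOP-1857 patch. The sketch also assumes that the program named on the command line (and any file such as .gdbinit) is already present in the working directory, for example because it was shipped via the DistributedCache.

```python
import shlex
import subprocess
import sys
from collections import deque


def tail_lines(path, count=10):
    """Return the last `count` lines of a text file, without trailing newlines."""
    # deque with maxlen keeps only the most recent `count` lines while reading.
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def run_debug_command(command_line, workdir="."):
    """Run one user-supplied command line; return (exit code, combined output).

    Hypothetical helper: the whole debug step is a single command-line string,
    so no separate "script" setting is needed. `workdir` stands in for the
    failed task's working directory.
    """
    result = subprocess.run(
        shlex.split(command_line),   # split like a POSIX shell, without invoking one
        cwd=workdir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,    # fold stderr into stdout so nothing is lost
        universal_newlines=True,
    )
    return result.returncode, result.stdout
```

A caller could, for instance, pass a gdb batch invocation as the command line and forward the returned text as the task's diagnostic output. How that text would reach the tasktracker and jobtracker is outside the scope of this sketch.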