From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Mon, 18 Jun 2007 05:53:56 -0000
Message-ID: <20070618055356.27277.82501@eos.apache.org>
Subject: [Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by OwenOMalley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

------------------------------------------------------------------------------
  In order to debug Pipes programs, you need to keep the downloaded commands.
- First, to keep the !TaskTracker from deleting the files when the task is finished, you need to set either keep.failed.task.files (set to true if the task you want to debug fails) or keep.task.files.pattern (set to a regex of the task name you want to debug).
+ First, to keep the !TaskTracker from deleting the files when the task is finished, you need to set either keep.failed.task.files (set it to true if the interesting task always fails) or keep.task.files.pattern (set to a regex that includes the interesting task name).
- Second, your job should set hadoop.pipes.command-file.keep to true in the JobConf. This will cause all of the tasks in the job to write their command stream to a file in the working directory named downlink.data. This file will contain the JobConf, the task information, and the task input, so it may be large. But it provides enough information that your executable will run without any interaction with the framework.
+ Second, your job should set hadoop.pipes.command-file.keep to true in the !JobConf. This will cause all of the tasks in the job to write their command stream to a file in the working directory named downlink.data. This file will contain the JobConf, the task information, and the task input, so it may be large. But it provides enough information that your executable will run without any interaction with the framework.
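For example, if the job is driven from Java, the first two steps might be set on the JobConf roughly as in the following sketch; the class name and the task-name regex are illustrative placeholders, and you would pick whichever keep.* property fits your case:
{{{
import org.apache.hadoop.mapred.JobConf;

// Illustrative sketch of the settings described above, assuming the job is
// configured from Java.  The task-name regex is a made-up example; use a
// pattern that matches the task you actually want to keep.
public class KeepPipesTaskFiles {
  public static void configure(JobConf conf) {
    // Either keep the files of tasks that fail...
    conf.set("keep.failed.task.files", "true");
    // ...or keep the files of any task whose name matches this regex.
    conf.set("keep.task.files.pattern", ".*_m_000003_.*");

    // Have every task in the job write its command stream to downlink.data.
    conf.set("hadoop.pipes.command-file.keep", "true");
  }
}
}}}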
  Third, go to the host where the problem task ran, go into the work directory, and
{{{
setenv hadoop.pipes.command.file downlink.data
}}}
- and run your executable under the debugger or valgrind. It will run as if the framework were feeding it commands and data, and it will produce an output file, downlink.data.out, with the binary commands that it would have sent up to the framework. I guess eventually, I should have the output file be written in text rather than binary...
+ and run your executable under the debugger or valgrind. It will run as if the framework were feeding it commands and data, and it will produce an output file, downlink.data.out, with the binary commands that it would have sent up to the framework. Eventually, I'll probably make the downlink.data.out file into a text-based format, but for now it is binary. Most problems, however, will be pretty clear in the debugger or valgrind, even without looking at the generated data.
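For the third step, the session on the task's host might look roughly like the following sketch (csh, matching the setenv line above; the directory path and program name are placeholders, and gdb could be swapped for valgrind or another debugger):
{{{
# Illustrative csh session; the directory and program name are placeholders.
cd /path/to/the/saved/task/work/directory
setenv hadoop.pipes.command.file downlink.data
gdb ./my-pipes-program        # then type 'run' at the gdb prompt
# or, to look for memory errors instead:
valgrind ./my-pipes-program
}}}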