From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Wed, 09 Aug 2006 22:54:17 -0000
Message-ID: <20060809225417.24443.95165@ajax.apache.org>
Subject: [Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by OwenOMalley

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

New page:
= How to Debug Map/Reduce Programs =

Debugging distributed programs is always difficult, because very few debuggers will let you connect to a remote program that wasn't started with the proper command line arguments.

 1. Start by getting everything running (likely on a small input) in the local runner. You do this by setting your job tracker to "local" in your config. The local runner runs on your development machine and can run under the debugger.

 2. Run the small input on a single-node cluster. This will smoke out all of the issues that come with distribution and the "real" task runner, but you only have a single place to look at logs. The most useful are the task tracker and job tracker logs. Make sure you are logging at the INFO level, or you will miss clues like the output of your tasks.

 3. Run on a big cluster. Recently, I added the keep.failed.task.files config variable, which tells the system to keep the files for tasks that fail. This leaves "dead" files around that you can debug with. On the node with the failed task, go to the task tracker's local directory, cd to ''''/taskTracker/'''', and run
{{{
% hadoop org.apache.hadoop.mapred.IsolationRunner job.xml
}}}
 This will run the failed task in a single JVM, which can be under the debugger, over precisely the same input.

There is also a configuration variable (keep.task.files.pattern) that lets you specify a task to keep by name, even if it doesn't fail.

Other than that, logging is your friend.
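
As a rough illustration of the config variables named above, the fragment below shows how they might look in a site configuration file. It is a sketch under assumptions and not part of the original page: the file name hadoop-site.xml and the property name mapred.job.tracker are assumed (the page only says to set the job tracker to "local"), and the pattern value is a hypothetical task name, assumed to be matched as a regular expression.
{{{
<?xml version="1.0"?>
<!-- hadoop-site.xml (assumed file name): settings used while debugging -->
<configuration>

  <!-- Step 1: run everything in the local runner.
       The property name mapred.job.tracker is an assumption. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>

  <!-- Step 3: keep the files of failed tasks so they can be rerun in isolation. -->
  <property>
    <name>keep.failed.task.files</name>
    <value>true</value>
  </property>

  <!-- Optional: keep the files of a named task even if it does not fail.
       The value is a hypothetical example pattern, not a real task name. -->
  <property>
    <name>keep.task.files.pattern</name>
    <value>.*_m_000003_0</value>
  </property>

</configuration>
}}}
The "local" job tracker setting only applies to step 1. For steps 2 and 3 you would point the job tracker at the real cluster and leave the keep.* settings in place.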