Mailing-List: contact hadoop-commits-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Apache Wiki <wikidiffs@apache.org>
To: hadoop-commits@lucene.apache.org
Date: Wed, 09 Aug 2006 22:54:17 -0000
Message-ID: <20060809225417.24443.95165@ajax.apache.org>
Subject: [Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by
 OwenOMalley

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

New page:
= How to Debug Map/Reduce Programs =

Debugging distributed programs is always difficult, because very few debuggers will let you attach to a remote program that wasn't started with the proper command-line arguments. The steps below work up from the easiest environment to debug in to the hardest.

 1. Start by getting everything running (likely on a small input) in the local runner.
    You do this by setting your job tracker (the mapred.job.tracker property) to "local" in your config.
    The local runner runs on your development machine and can run under the debugger.

 2. Run the small input on a one-node cluster. This will smoke out the issues that come from
    distribution and the "real" task runner, while leaving you only a single place to look for logs. The most
    useful are the task tracker and job tracker logs. Make sure you are logging at the INFO level, or you will
    miss clues such as the output of your tasks.

 3. Run on a big cluster. Recently, I added the keep.failed.task.files config variable, which tells the
    system to keep the files of tasks that fail. This leaves "dead" files around that you can debug with.
    On the node with the failed task, go to the task tracker's local directory, cd to
    ''<local>''/taskTracker/''<taskid>'', and run
    {{{
% hadoop org.apache.hadoop.mapred.IsolationRunner job.xml
    }}}
    This will rerun the failed task in a single JVM, which can be under the debugger, over precisely the same
    input.
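
The settings named in steps 1 and 3 could be written roughly as follows. This is only a sketch: it assumes the job picks up its configuration from a site file such as conf/hadoop-site.xml, that mapred.job.tracker is the property behind "job tracker" in step 1, and that keep.failed.task.files is a boolean flag. The two properties belong to different steps and are shown together only for brevity.
{{{
<configuration>
  <!-- Step 1 (assumed property name): run the job in the local runner
       instead of submitting it to a job tracker. Remove this entry
       before moving on to steps 2 and 3. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>

  <!-- Step 3 (assumed to be a boolean): keep the files of tasks that
       fail so they can be rerun with the IsolationRunner. -->
  <property>
    <name>keep.failed.task.files</name>
    <value>true</value>
  </property>
</configuration>
}}}
Kept files take up disk space on the task tracker nodes, so it is probably worth switching keep.failed.task.files off again once the failure has been reproduced.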

There is also a configuration variable (keep.task.files.pattern) that lets you name the tasks whose files should be kept, even if they don't fail. Other than that, logging is your friend.
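
As an illustration of keep.task.files.pattern, the fragment below keeps the files of a single map task. It is a sketch under two assumptions: the value is treated as a regular expression matched against task names, and the task name used here (task_0001_m_000003_0) is hypothetical, so substitute a name taken from your own job's logs.
{{{
<configuration>
  <!-- Keep the files of the (hypothetical) task task_0001_m_000003_0,
       whether or not it fails. The value is assumed to be a regular
       expression matched against task names. -->
  <property>
    <name>keep.task.files.pattern</name>
    <value>.*_m_000003_0</value>
  </property>
</configuration>
}}}
With the files kept, the task can then be rerun from its ''<local>''/taskTracker/''<taskid>'' directory in the same way as a failed one.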