hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "HowToDebugMapReducePrograms" by OwenOMalley
Date Wed, 09 Aug 2006 22:54:17 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by OwenOMalley:

New page:
= How to Debug Map/Reduce Programs =

Debugging distributed programs is always difficult, because very few debuggers will let you
connect to a remote program that wasn't run with the proper command line arguments.

 1. Start by getting everything running (likely on a small input) in the local runner.
    You do this by setting your job tracker to "local" in your config. The local runner
    can run under the debugger and runs on your development machine.

 2. Run the small input on a 1 node cluster. This will smoke out all of the issues that
    happen with distribution and the "real" task runner, but you only have a single place
    to look at logs. Most useful are the task and job tracker logs. Make sure you are
    logging at the INFO level or you will miss clues like the output of your tasks.

 3. Run on a big cluster. Recently, I added the keep.failed.task.files config variable
    that tells the system to keep files for tasks that fail. This leaves "dead" files
    around that you can debug with. On the node with the failed task, go to the task
    tracker's local directory, cd to ''<local>''/taskTracker/''<taskid>'', and run
% hadoop org.apache.hadoop.IsolationRunner job.xml
    This will run the failed task in a single jvm, which can be run in the debugger,
    over precisely the same input.
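
The config settings mentioned in steps 1 and 3 can be sketched as a configuration fragment. This is only an illustration: the hadoop-site.xml file name and the mapred.job.tracker property name are assumptions based on that era's conventions; keep.failed.task.files is the variable named above.

```xml
<!-- hadoop-site.xml (sketch): mapred.job.tracker is an assumed property name;
     keep.failed.task.files is the config variable described in step 3 -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>           <!-- step 1: run jobs in the local runner -->
  </property>
  <property>
    <name>keep.failed.task.files</name>
    <value>true</value>            <!-- step 3: keep files for failed tasks -->
  </property>
</configuration>
```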

There is also a configuration variable (keep.task.files.pattern) that will let you specify
a task to keep by name, even if it doesn't fail. Other than that, logging is your friend.
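
For instance, keep.task.files.pattern might be set like this (a sketch; the regex value is a hypothetical task-name pattern, not taken from the text):

```xml
<property>
  <name>keep.task.files.pattern</name>
  <!-- hypothetical pattern: keep the files of map task 000023, pass or fail -->
  <value>.*_m_000023_.*</value>
</property>
```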
