From: Yuri Pradkin
Organization: USC/ISI
To: core-user@hadoop.apache.org
Subject: Re: extracting input to a task from a (streaming) job?
Date: Fri, 8 Aug 2008 10:09:48 -0700
Cc: John Heidemann

On Thursday 07 August 2008 16:43:10 John Heidemann wrote:
> On Thu, 07 Aug 2008 19:42:05 +0200, "Leon Mergen" wrote:
> >Hello John,
> >
> >On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann wrote:
> >> I have a large Hadoop streaming job that generally works fine,
> >> but a few (2-4) of the ~3000 maps and reduces have problems.
> >> To make matters worse, the problems are system-dependent (we run on a
> >> cluster with machines of slightly different OS versions).
> >> I'd of course like to debug these problems, but they are embedded in a
> >> large job.
> >>
> >> Is there a way to extract the input given to a reducer from a job, given
> >> the task identity? (This would also be helpful for mappers.)
> >
> >I believe you should set "keep.failed.tasks.files" to true -- this way,
> > given a task id, you can see what input files it has in ~/
> >taskTracker/${taskid}/work (source:
> >http://hadoop.apache.org/core/docs/r0.17.0/mapred_tutorial.html#IsolationRunner )

IsolationRunner does not work as described in the tutorial. After the task hung, I failed it via the web interface.
Then I went to the node that was running this task:

  $ cd ...local/taskTracker/jobcache/job_200808071645_0001/work

(this path is already different from the tutorial's)

  $ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
  Exception in thread "main" java.lang.NullPointerException
          at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:164)

Looking at IsolationRunner code, I see this:

  164     File workDirName = new File(lDirAlloc.getLocalPathToRead(
  165                                   TaskTracker.getJobCacheSubdir()
  166                                   + Path.SEPARATOR + taskId.getJobID()
  167                                   + Path.SEPARATOR + taskId
  168                                   + Path.SEPARATOR + "work",
  169                                   conf).toString());

I.e. it assumes there is supposed to be a taskID subdirectory under the job dir, but:

  $ pwd
  ...mapred/local/taskTracker/jobcache/job_200808071645_0001
  $ ls
  jars  job.xml  work

-- it's not there. Any suggestions?

Thanks,

  -Yuri
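
For anyone checking the same thing on their own node, the lookup in lines 164-169 can be reproduced by hand. This is a minimal sketch, assuming the path is simply <jobcache dir>/<job ID>/<task ID>/work as the snippet suggests. The function names are hypothetical helpers, not anything Hadoop provides:

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop): rebuild the per-task work
# directory that IsolationRunner.java:164-169 appears to look up, i.e.
#   <jobcache dir>/<job ID>/<task ID>/work

expected_work_dir() {
    # $1 = jobcache directory, $2 = job ID, $3 = task ID
    printf '%s/%s/%s/work\n' "$1" "$2" "$3"
}

check_work_dir() {
    # Print whether the expected directory exists; return 1 if it does not.
    dir=$(expected_work_dir "$1" "$2" "$3")
    if [ -d "$dir" ]; then
        echo "found: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

Calling check_work_dir with the jobcache directory, job_200808071645_0001, and the ID of the failed task should report the directory as missing on a node laid out like the one above, since the job directory there contains only jars, job.xml and work.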