Message-ID: <1135246120.1245036967782.JavaMail.jira@brutus>
Date: Sun, 14 Jun 2009 20:36:07 -0700 (PDT)
From: "Philip Zeyliger (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Updated: (HADOOP-4041) IsolationRunner does not work as documented
In-Reply-To: <262249696.1219940324151.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Zeyliger updated HADOOP-4041:
------------------------------------

    Comment: was deleted

(was: bq. In DistributedCacheHandle the class doc should go before the class declaration, not at the beginning of the file. Also need to add Apache license.

Done.

bq. Use an enum rather than a boolean for isArchive in CacheFile.

Done.

bq. We shouldn't remove public methods to DistributedCache, but rather deprecate them and remove them in a future release. Can DistributedCache delegate to DistributedCacheManager? I like the fact you have documented the intended audience for each public method of DistributedCache. (This paves the way to separating the public and private interfaces in future.)

Done. My current thinking on APIs (for a future JIRA) is that users should access DistributedCache through Job.addToCache(URI, flags) and Context.getCachedFiles(). But there's some more work to get there.

bq. Is there duplication between TestMRWithDistributedCache and tests that use MRCaching that could be avoided?

Probably, but it's hard to tease out. MRCaching is more complicated than the test I'm adding, and does, I believe, test some things that I don't. On the other hand, TestMRWithDistributedCache tests the classpath stuff. I'm loath to delete tests too eagerly.

bq. Could TestMRWithDistributedCache also test symlinking?

It does now test symlinking. However, I couldn't (easily) get LocalJobRunner to do symlinks appropriately. LocalJobRunner doesn't currently have a notion of task directory, and I think this patch is already quite large.)

> IsolationRunner does not work as documented
> -------------------------------------------
>
>                 Key: HADOOP-4041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4041
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: documentation, mapred
>    Affects Versions: 0.18.0
>            Reporter: Yuri Pradkin
>            Assignee: Philip Zeyliger
>         Attachments: HADOOP-4041-v2.patch, HADOOP-4041-v3.patch, HADOOP-4041-v4.patch, hadoop-4041.patch, org.apache.hadoop.fs.LocalDirAllocator.html
>
>
> IsolationRunner does not work as documented in the tutorial.
> The tutorial says "To use the IsolationRunner, first set keep.failed.tasks.files to true (also see keep.tasks.files.pattern)."
> Should be:
> keep.failed.task.files (not tasks)
> After the above was set (quoted from my message on hadoop-core):
> > After the task
> > hung, I failed it via the web interface. Then I went to the node that was
> > running this task
> >
> > $ cd ...local/taskTracker/jobcache/job_200808071645_0001/work
> > (this path is already different from the tutorial's)
> >
> > $ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
> > Exception in thread "main" java.lang.NullPointerException
> >         at
> > org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:164)
> >
> > Looking at IsolationRunner code, I see this:
> >
> > 164     File workDirName = new File(lDirAlloc.getLocalPathToRead(
> > 165                                   TaskTracker.getJobCacheSubdir()
> > 166                                   + Path.SEPARATOR + taskId.getJobID()
> > 167                                   + Path.SEPARATOR + taskId
> > 168                                   + Path.SEPARATOR + "work",
> > 169                                   conf).toString());
> >
> > I.e. it assumes there is supposed to be a taskID subdirectory under the job
> > dir, but:
> > $ pwd
> > ...mapred/local/taskTracker/jobcache/job_200808071645_0001
> > $ ls
> > jars  job.xml  work
> >
> > -- it's not there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
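
The layout mismatch described in the report can be sketched without any Hadoop dependency. The class below is only an illustration: the class name, the method names and the task attempt id are hypothetical, and the "taskTracker/jobcache" prefix is assumed from the paths shown in the report rather than read from TaskTracker.getJobCacheSubdir(). It builds the relative path that the quoted lines 164-169 appear to ask for and the path that was actually seen on the node, so the difference can be compared directly.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class WorkDirLayout {

    // Relative path the quoted IsolationRunner code appears to look up:
    //   <jobCacheSubdir>/<jobId>/<taskId>/work
    static Path expectedByIsolationRunner(String jobCacheSubdir, String jobId, String taskId) {
        return Paths.get(jobCacheSubdir, jobId, taskId, "work");
    }

    // Relative path observed on the node in the report:
    //   <jobCacheSubdir>/<jobId>/work
    static Path observedOnNode(String jobCacheSubdir, String jobId) {
        return Paths.get(jobCacheSubdir, jobId, "work");
    }

    public static void main(String[] args) {
        // Assumption: prefix taken from the paths in the report.
        String jobCacheSubdir = "taskTracker/jobcache";
        String jobId = "job_200808071645_0001";
        // Hypothetical attempt id; the report does not give one.
        String taskId = "attempt_200808071645_0001_m_000000_0";

        Path expected = expectedByIsolationRunner(jobCacheSubdir, jobId, taskId);
        Path observed = observedOnNode(jobCacheSubdir, jobId);

        System.out.println("expected: " + expected);
        System.out.println("observed: " + observed);
        System.out.println("same layout: " + expected.equals(observed));
    }
}
```

The two paths differ only by the task id component, which is consistent with the report that no such subdirectory exists under the job directory. How a failed lookup then surfaces as the NullPointerException at line 164 is not shown in the report, so the sketch does not try to model it.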