From: Harsh J
Date: Sat, 6 Oct 2012 21:41:18 +0530
Subject: Re: Job jar not removed from staging directory on job failure/how to share a job jar using distributed cache
To: user@hadoop.apache.org

Bertrand,

Yes, this is an unfortunate edge case. However, it is fixed in the trunk/2.x
client rewrite and is now tracked as a test via
https://issues.apache.org/jira/browse/MAPREDUCE-2384.

On Fri, Oct 5, 2012 at 10:28 PM, Bertrand Dechoux wrote:
> Hi,
>
> I am launching my job using the command line, and I observed that when the
> provided input path does not match any files, the jar in the staging
> directory is not removed.
> It is removed on job termination (success or failure), but here the job
> isn't even really started, so it may be an edge case.
> Has anyone seen the same behaviour? (I am using 1.0.3.)
>
> Here is an extract of the stack trace, with the Hadoop-related classes.
>
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: [removed]
>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>>     at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:902)
>>     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:919)
>>     at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
>>     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
>
> My second question is somewhat related, because one of its consequences
> would nullify the impact of the above 'bug'.
> Is it possible to set the main job jar directly to a jar already inside
> HDFS?
> From what I know, the configuration points to a local jar archive, which is
> uploaded each time to the staging directory.
>
> The same question was asked in the JIRA, but without a clear resolution:
> https://issues.apache.org/jira/browse/MAPREDUCE-236
>
> My question might be related to
> https://issues.apache.org/jira/browse/MAPREDUCE-4408,
> which is resolved for the next version. But it seems to be only about the
> uber jar, and I am using a standard jar.
> If it works with an HDFS location, what are the details? Won't the jar be
> cleaned up on job termination? Why not? Will it also be set up within the
> distributed cache?
>
> Regards
>
> Bertrand
>
> PS: I know there are other solutions to my problem.
> I will look at Oozie.
> And in the worst case, I can create a FileSystem instance myself to check
> whether the job should really be launched or not. Both could work, but both
> seem overkill in my context.

--
Harsh J
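The FileSystem-based pre-check mentioned in the PS above could be sketched roughly as follows. This is a minimal sketch against the Hadoop 1.x API, not the list's recommended fix; the class name is hypothetical, and the input path is taken from the command line:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical driver wrapper: verify the input path exists before
// submitting, so the job jar is never copied to the staging directory
// for a submission that is doomed to fail in getSplits().
public class InputPathPreCheck {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml etc. from the classpath
        FileSystem fs = FileSystem.get(conf);

        Path input = new Path(args[0]); // the job's input directory

        if (!fs.exists(input)) {
            System.err.println("Input path does not exist, not submitting: " + input);
            System.exit(1);
        }

        // ...otherwise configure the Job and call job.waitForCompletion(true) as usual.
    }
}
```

Note that this only avoids the stale jar for the missing-input case discussed in this thread; any other submission-time failure after the jar upload would still leave it behind on 1.0.3.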