hadoop-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Job jar not removed from staging directory on job failure/how to share a job jar using distributed cache
Date Sat, 06 Oct 2012 16:11:18 GMT

Yes, this is an unfortunate edge case. It is fixed, though, in the
trunk/2.x client rewrite and is now tracked as a test by

On Fri, Oct 5, 2012 at 10:28 PM, Bertrand Dechoux <dechouxb@gmail.com> wrote:
> Hi,
> I am launching my job using the command line and I observed that when the
> provided input path does not match any files, the jar in the staging
> repository is not removed.
> It is removed on job termination (success or failure) but here the job isn't
> even really started so it may be an edge case.
> Has anyone seen the same behaviour? (I am using 1.0.3)
> Here is an extract of the stack trace with hadoop related classes.
>> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: [removed]
>>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
>>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
>>         at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:902)
>>         at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:919)
>>         at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:838)
>>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
>>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
> The second question is somewhat related, because one of its consequences
> would nullify the impact of the above 'bug'.
> Is it possible to set directly the main job jar as a jar already inside
> From what I know, the configuration points to a local jar archive which is
> uploaded each time to the staging repository.
> The same question was asked in the jira but without clear resolution.
> https://issues.apache.org/jira/browse/MAPREDUCE-236
> My question might be related to
> https://issues.apache.org/jira/browse/MAPREDUCE-4408
> which is resolved for next version. But it seems to be only about uberjar
> and I am using a standard jar.
> If it works with an HDFS location, what are the details? Won't it be cleaned
> up during job termination? Why not? Will it also be set up within the
> distributed cache?
> Regards
> Bertrand
> PS: I know there are other solutions to my problem. I will look at Oozie.
> And worst case, I can create a FileSystem instance myself to check whether
> the job should be really launched or not. Both could work but both seem
> overkill in my context.
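
[Editorial sketch] The pre-check mentioned in the PS above (verify that the input exists before submitting, so nothing is uploaded to staging for a job that cannot start) might look roughly like the following. This is only an illustration in plain Java: it uses java.nio.file against the local filesystem as a stand-in, whereas on a cluster the same check would go through Hadoop's own FileSystem API, which is not shown here. The class name InputPreCheck and the method hasMatchingInput are hypothetical and do not come from the thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: local-filesystem stand-in for an input pre-check.
class InputPreCheck {

    /** Returns true if at least one entry directly under dir matches the glob. */
    static boolean hasMatchingInput(Path dir, String glob) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false; // a missing directory cannot contain any input
        }
        // The stream is closed by try-with-resources once the first entry is checked.
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            return matches.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        String glob = args.length > 1 ? args[1] : "*";
        if (!hasMatchingInput(dir, glob)) {
            // Skipping submission here means no job jar is ever copied to staging.
            System.err.println("No input matches " + dir + "/" + glob + "; not submitting.");
            return;
        }
        System.out.println("Input found; the job would be submitted here.");
    }
}
```

The check runs before any submission call, so the failure path described in the stack trace is never reached. It does not clean up a jar that an earlier failed submission already left behind.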

Harsh J
