[ https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860382#comment-15860382 ]
Chris Trezzo commented on MAPREDUCE-6846:
-----------------------------------------
bq. I was under the impression that if the wildcard mapped to only one file then we would
not convey this as a wildcard through to the staging directory but instead remap it to the
one entry that it globbed to (i.e.: as if the user had specified the one path directly rather
than a glob to that one path).
True, once it is in the staging dir it will not look like a wildcard. That being said, there
is a second part to the feature. I will attempt to explain my current understanding:
See {{JobResourceUploader#uploadLibJars}}:
{code:java}
private void uploadLibJars(Configuration conf, Collection<String> libjars,
    Path submitJobDir, FsPermission mapredSysPerms, short submitReplication)
    throws IOException {
  Path libjarsDir = JobSubmissionFiles.getJobDistCacheLibjars(submitJobDir);
  if (!libjars.isEmpty()) {
    FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
    for (String tmpjars : libjars) {
      Path tmp = new Path(tmpjars);
      Path newPath =
          copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);
      // Add each file to the classpath
      DistributedCache.addFileToClassPath(
          new Path(newPath.toUri().getPath()), conf, jtFs, !useWildcard);
    }
    if (useWildcard) {
      // Add the whole directory to the cache
      Path libJarsDirWildcard =
          jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));
      DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
    }
  }
}
{code}
{{useWildcard}} is set by the {{mapreduce.client.libjars.wildcard}} config parameter. If it
is set to true, we still add the files individually to the classpath (i.e. {{mapreduce.job.classpath.files}}),
but we glob them all together into a single wildcard entry when adding them to the distributed
cache (i.e. {{mapreduce.job.cache.files}}). At that point we would lose the fragment name,
because the LocalResource objects submitted to YARN are created from those paths.
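To make the concern concrete, here is a minimal sketch in plain Java. It is not Hadoop code: the class {{LibjarFragmentSketch}}, the method {{cacheEntries}}, and the {{/staging/libjars}} directory are hypothetical names used only for illustration, and the per-file branch shows where a fragment could be carried, not what the client does today. The point is that a single wildcard entry has no place to hold a per-file fragment.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of how libjars could end up as cache-file entries.
// All names here are hypothetical; this does not call any Hadoop API.
public class LibjarFragmentSketch {
  static final String WILDCARD = "*";

  // Returns the cache entries recorded for the given libjars.
  public static List<String> cacheEntries(
      String libjarsDir, List<URI> libjars, boolean useWildcard) {
    List<String> entries = new ArrayList<>();
    if (useWildcard) {
      // One entry for the whole directory: per-file fragments cannot survive.
      if (!libjars.isEmpty()) {
        entries.add(libjarsDir + "/" + WILDCARD);
      }
      return entries;
    }
    for (URI jar : libjars) {
      String path = jar.getPath();
      String name = path.substring(path.lastIndexOf('/') + 1);
      String entry = libjarsDir + "/" + name;
      // One entry per file: the fragment can be carried along.
      if (jar.getFragment() != null) {
        entry += "#" + jar.getFragment();
      }
      entries.add(entry);
    }
    return entries;
  }

  public static void main(String[] args) {
    List<URI> libjars = Arrays.asList(
        URI.create("file:/home/mapred/a.jar#renamed.jar"),
        URI.create("file:/home/mapred/b.jar"));
    // prints [/staging/libjars/a.jar#renamed.jar, /staging/libjars/b.jar]
    System.out.println(cacheEntries("/staging/libjars", libjars, false));
    // prints [/staging/libjars/*]
    System.out.println(cacheEntries("/staging/libjars", libjars, true));
  }
}
```

With the wildcard enabled, the only path a LocalResource could be built from is the directory glob, so the name {{renamed.jar}} never reaches YARN.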
> Fragments specified for libjar paths are not handled correctly
> --------------------------------------------------------------
>
> Key: MAPREDUCE-6846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.7.3, 3.0.0-alpha2
> Reporter: Chris Trezzo
> Assignee: Chris Trezzo
> Priority: Minor
>
> If a user specifies a fragment for a libjars path via the generic options parser, the client crashes with a FileNotFoundException:
> {noformat}
> java.io.FileNotFoundException: File file:/home/mapred/test.txt#testFrag.txt does not exist
> at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:638)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:864)
> at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:628)
> at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
> at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:387)
> at org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:154)
> at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:105)
> at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:102)
> at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
> at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
> at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {noformat}
> This is actually inconsistent with the behavior for files and archives. Here is a table showing the current behavior for each type of path and resource:
> | || Qualified path (e.g. file:/home/mapred/test.txt#frag.txt) || Absolute path (e.g. /home/mapred/test.txt#frag.txt) || Relative path (e.g. test.txt#frag.txt) ||
> || -libjars | FileNotFound | FileNotFound | FileNotFound |
> || -files | (/) | (/) | (/) |
> || -archives | (/) | (/) | (/) |