hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Trezzo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6846) Fragments specified for libjar paths are not handled correctly
Date Thu, 09 Feb 2017 23:25:42 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860382#comment-15860382
] 

Chris Trezzo commented on MAPREDUCE-6846:
-----------------------------------------

bq. I was under the impression that if the wildcard mapped to only one file then we would
not convey this as a wildcard through to the staging directory but instead remap it to the
one entry that it globbed to (i.e.: as if the user had specified the one path directly rather
than a glob to that one path).

True, once it is in the staging dir it will not look like a wildcard. That being said, there
is a second part to the feature. I will attempt to explain my current understanding:

See {{JobResourceUploader#uploadLibJars}}:
{code:java}
  private void uploadLibJars(Configuration conf, Collection<String> libjars,
      Path submitJobDir, FsPermission mapredSysPerms, short submitReplication)
      throws IOException {
    Path libjarsDir = JobSubmissionFiles.getJobDistCacheLibjars(submitJobDir);
    if (!libjars.isEmpty()) {
      FileSystem.mkdirs(jtFs, libjarsDir, mapredSysPerms);
      for (String tmpjars : libjars) {
        Path tmp = new Path(tmpjars);
        Path newPath =
            copyRemoteFiles(libjarsDir, tmp, conf, submitReplication);

        // Add each file to the classpath
        DistributedCache.addFileToClassPath(
            new Path(newPath.toUri().getPath()), conf, jtFs, !useWildcard);
      }

      if (useWildcard) {
        // Add the whole directory to the cache
        Path libJarsDirWildcard =
            jtFs.makeQualified(new Path(libjarsDir, DistributedCache.WILDCARD));

        DistributedCache.addCacheFile(libJarsDirWildcard.toUri(), conf);
      }
    }
  }
{code}
{{useWildcard}} is set by the {{mapreduce.client.libjars.wildcard}} config parameter. If this
is set to true, then we add the files individually to the classpath (i.e. {{mapreduce.job.classpath.files}}),
but then we glob them all together when adding them to the distributed cache (i.e. {{mapreduce.job.cache.files}}).
At that point, we would loose the fragment name because the LocalResource objects submitted
to YARN are created based off of those paths.

> Fragments specified for libjar paths are not handled correctly
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-6846
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6846
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.7.3, 3.0.0-alpha2
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>            Priority: Minor
>
> If a user specifies a fragment for a libjars path via generic options parser, the client
crashes with a FileNotFoundException:
> {noformat}
> java.io.FileNotFoundException: File file:/home/mapred/test.txt#testFrag.txt does not
exist
> 	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:638)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:864)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:628)
> 	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
> 	at org.apache.hadoop.mapreduce.JobResourceUploader.copyRemoteFiles(JobResourceUploader.java:387)
> 	at org.apache.hadoop.mapreduce.JobResourceUploader.uploadLibJars(JobResourceUploader.java:154)
> 	at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:105)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:102)
> 	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
> 	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1344)
> 	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1341)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892)
> 	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1341)
> 	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1362)
> 	at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306)
> 	at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:359)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
> 	at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:367)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> 	at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> 	at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> {noformat}
> This is actually inconsistent with the behavior for files and archives. Here is a table
showing the current behavior for each type of path and resource:
> | || Qualified path (i.e. file://home/mapred/test.txt#frag.txt) || Absolute path (i.e.
/home/mapred/test.txt#frag.txt) || Relative path (i.e. test.txt#frag.txt) ||
> || -libjars | FileNotFound | FileNotFound|FileNotFound|
> || -files | (/) | (/) | (/) |
> || -archives | (/) | (/) | (/) |



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org


Mime
View raw message