hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4820) MRApps distributed-cache duplicate checks are incorrect
Date Tue, 26 Mar 2013 19:49:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614471#comment-13614471 ]

Robert Joseph Evans commented on MAPREDUCE-4820:
------------------------------------------------

Looking at the confs, I see the following in launcher-job.conf.xml:

{noformat}
<property><name>mapreduce.job.cache.files</name><value>hdfs://ip-10-113-15-16.ec2.internal:17020/user/root/oozie-oozi/0000003-130320172938946-oozie-oozi-W/mr-node--map-reduce/map-reduce-launcher.jar,hdfs://ip-10-113-15-16.ec2.internal:17020/user/root/examples/apps/map-reduce/lib/oozie-examples-3.3.1.jar</value><source>programatically</source><source>job.xml</source></property>
{noformat}

But there is no mapreduce.job.cache.files set in mr-job.conf.xml.

Also, mapreduce.job.cache.archives is not set in either of these configs.

The missing cache.files seems more likely to be the cause of the issue. The code in MRApps
does not manipulate the conf; it just translates it into a Map that can be sent to the RM.
This suggests the issue happens before the MRApps code is reached: somewhere while the conf
is being generated inside the launcher job, or possibly further up in the MR client that
sets up the distributed-cache items.
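To make the "translate, don't manipulate" point concrete, here is a minimal sketch of that kind of translation. It is not the actual MRApps implementation: the class `CacheEntryParser` and its methods are hypothetical, and it assumes each entry is named by its URI fragment if present, otherwise by its file name (the Hadoop 2 behaviour described in the issue below). It only reads the comma-separated value of mapreduce.job.cache.files and never adds or removes entries, so a conf without that property simply yields an empty map.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, NOT the real MRApps code: turn the value of
// mapreduce.job.cache.files into a map of link name -> URI, rejecting
// duplicate link names among the cache entries themselves.
final class CacheEntryParser {

    static Map<String, URI> parse(String commaSeparatedUris) {
        Map<String, URI> byLinkName = new LinkedHashMap<>();
        if (commaSeparatedUris == null) {
            return byLinkName; // property not set: nothing to translate
        }
        for (String raw : commaSeparatedUris.split(",")) {
            if (raw.trim().isEmpty()) {
                continue; // tolerate stray commas
            }
            URI uri = URI.create(raw.trim());
            String name = linkName(uri);
            URI previous = byLinkName.putIfAbsent(name, uri);
            if (previous != null) {
                throw new IllegalArgumentException(
                    "Duplicate cache entry name '" + name + "': " + previous + " and " + uri);
            }
        }
        return byLinkName;
    }

    // Assumed naming rule: fragment ("#FOO") if present, else the last path segment.
    // Assumes hierarchical URIs such as hdfs://host:port/path.
    static String linkName(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

Fed the launcher-job.conf.xml value quoted above, this yields two entries, map-reduce-launcher.jar and oozie-examples-3.3.1.jar; fed the missing value from mr-job.conf.xml, it yields none.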

I'm just trying to help you not go chasing a white rabbit in MAPREDUCE-4549 and MAPREDUCE-4503.
                
> MRApps distributed-cache duplicate checks are incorrect
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-4820
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4820
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.2-alpha
>            Reporter: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 2.0.4-alpha
>
>         Attachments: launcher-job.conf.xml, launcher-job.logs.txt, mr-job.conf.xml, mr-job.logs.txt
>
>
> This seems to be a combination of issues that are being exposed in 2.0.2-alpha by MAPREDUCE-4549.
> MAPREDUCE-4549 introduces a check to ensure there are no duplicate JARs in the distributed-cache
> (using the JAR name as identity).
> In Hadoop 2 (unlike Hadoop 1), all JARs in the distributed-cache are symlinked into the
> current directory of the task.
> When setting up the DistributedCache (MRApps#setupDistributedCache->parseDistributedCacheArtifacts),
> MRApps assumes that the local resources (including files in CURRENT_DIR/, CURRENT_DIR/classes/
> and CURRENT_DIR/lib/) are already part of the distributed-cache.
> For systems like Oozie, which use a launcher job to submit the real job, this poses a problem
> because MRApps runs inside the launcher job when it submits the real job. The configuration of
> the real job has the correct distributed-cache entries (no duplicates), but because the current
> dir contains the same files, the submission fails.
> It seems that MRApps should not check the distributed-cache entries for duplicates against JARs
> in CURRENT_DIR/ or CURRENT_DIR/lib/. The duplicate check should be done among distributed-cache
> entries only.
> It seems YARNRunner is symlinking all files in the distributed cache into the current directory.
> In Hadoop 1 this was done only for files added to the distributed cache with a fragment
> (i.e. "#FOO") to trigger symlink creation.
> Marking as a blocker because without a fix for this, Oozie cannot submit jobs to Hadoop 2.
> I've debugged Oozie on a live cluster that BigTop is using to test their release work (thanks,
> Roman), and I've verified that Oozie 3.3 does not create duplicate entries in the distributed-cache.
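The difference between the two duplicate checks, and why it only bites once every cache entry becomes a link, can be sketched as follows. This is a hypothetical illustration of the behaviour as described in the issue, not code from Hadoop: the class `CacheLinkCheck` and all of its methods are invented names, and the naming rules simply restate the Hadoop 1 and Hadoop 2 behaviour described above.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration of the MAPREDUCE-4820 discussion;
// none of these names exist in Hadoop.
final class CacheLinkCheck {

    // Hadoop 1 (as described): a symlink is created only when the cache URI
    // carries a fragment such as "#FOO"; otherwise no link name.
    static Optional<String> hadoop1LinkName(URI uri) {
        return Optional.ofNullable(uri.getFragment());
    }

    // Hadoop 2 (as described): every cache entry is symlinked; the fragment is
    // used if present, otherwise the file name. Assumes hierarchical URIs.
    static String hadoop2LinkName(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Problematic check: names of files already in the working directory
    // (CURRENT_DIR/, CURRENT_DIR/lib/) are treated as if they were cache entries.
    static boolean dupIncludingLocalFiles(List<URI> cache, Set<String> localFiles) {
        Set<String> seen = new HashSet<>(localFiles);
        for (URI uri : cache) {
            if (!seen.add(hadoop2LinkName(uri))) {
                return true;
            }
        }
        return false;
    }

    // Check proposed in the issue: compare distributed-cache entries only
    // with each other.
    static boolean dupAmongCacheEntries(List<URI> cache) {
        return dupIncludingLocalFiles(cache, new HashSet<>());
    }
}
```

In the Oozie scenario, assume the launcher task's working directory already holds oozie-examples-3.3.1.jar and the real job's cache lists that JAR once. `dupIncludingLocalFiles` then reports a duplicate even though the cache list is clean, while `dupAmongCacheEntries` does not.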

--
This message is automatically generated by JIRA.
