Return-Path:
X-Original-To: apmail-hadoop-mapreduce-dev-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-mapreduce-dev-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id 3D451E5E3
	for ; Mon, 26 Nov 2012 20:03:00 +0000 (UTC)
Received: (qmail 10339 invoked by uid 500); 26 Nov 2012 20:02:59 -0000
Delivered-To: apmail-hadoop-mapreduce-dev-archive@hadoop.apache.org
Received: (qmail 10224 invoked by uid 500); 26 Nov 2012 20:02:59 -0000
Mailing-List: contact mapreduce-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: mapreduce-dev@hadoop.apache.org
Delivered-To: mailing list mapreduce-dev@hadoop.apache.org
Received: (qmail 10125 invoked by uid 99); 26 Nov 2012 20:02:59 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
	by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Nov 2012 20:02:58 +0000
Date: Mon, 26 Nov 2012 20:02:58 +0000 (UTC)
From: "Alejandro Abdelnur (JIRA)"
To: mapreduce-dev@hadoop.apache.org
Message-ID: <759619677.24370.1353960178991.JavaMail.jiratomcat@arcas>
Subject: [jira] [Created] (MAPREDUCE-4820) MRApps distributed-cache duplicate checks are incorrect
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

Alejandro Abdelnur created MAPREDUCE-4820:
---------------------------------------------

             Summary: MRApps distributed-cache duplicate checks are incorrect
                 Key: MAPREDUCE-4820
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4820
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am
    Affects Versions: 2.0.2-alpha
            Reporter: Alejandro Abdelnur
            Priority: Blocker
             Fix For: 2.0.3-alpha

This seems a combination of issues that are being exposed in 2.0.2-alpha by MAPREDUCE-4549.

MAPREDUCE-4549 introduces a check to ensure there are no duplicate JARs in the distributed-cache (using the JAR name as identity).

In Hadoop 2 (unlike Hadoop 1), all JARs in the distributed-cache are symlinked into the current directory of the task.

MRApps, when setting up the DistributedCache (MRApps#setupDistributedCache->parseDistributedCacheArtifacts), assumes that the local resources (this includes files in CURRENT_DIR/, CURRENT_DIR/classes/ and CURRENT_DIR/lib/) are already part of the distributed-cache.

For systems like Oozie, which use a launcher job to submit the real job, this poses a problem because MRApps is run from the launcher job to submit the real job. The configuration of the real job has the correct distributed-cache entries (no duplicates), but because the current directory of the launcher contains the same files, the submission fails.

It seems that MRApps should not be checking for duplicates between distributed-cache entries and JARs in CURRENT_DIR/ or CURRENT_DIR/lib/. The duplicate check should be done among distributed-cache entries only.

It seems YARNRunner is symlinking all files in the distributed-cache into the current directory. In Hadoop 1 this was done only for files added to the distributed-cache with a fragment (i.e. "#FOO") to trigger symlink creation.

Marking as a blocker because, without a fix for this, Oozie cannot submit jobs to Hadoop 2. (I've debugged Oozie in a live cluster being used by BigTop, thanks Roman, to test their release work, and I've verified that Oozie 3.3 does not create duplicated entries in the distributed-cache.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
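
For illustration only, here is a minimal sketch of the behaviour proposed in this report: a duplicate check that compares distributed-cache entries against each other and never consults files in the current working directory. It is not the MRApps code. The class CacheDupCheck and the methods linkName and findDuplicates are hypothetical names, and the sketch assumes that an entry's identity is its URI fragment when one is present (the "#FOO" symlink name) and the last path segment otherwise.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop.
class CacheDupCheck {

    // Assumed identity of a cache entry: the fragment ("#FOO") if present,
    // otherwise the file name at the end of the path.
    static String linkName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        if (path == null) {
            // Opaque URIs have no path; fall back to the whole URI string.
            return uri.toString();
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Returns the names that occur more than once among the cache entries.
    // Only the entries passed in are compared; the local file system
    // (CURRENT_DIR/, CURRENT_DIR/lib/) is deliberately not inspected.
    static List<String> findDuplicates(List<URI> cacheEntries) {
        Set<String> seen = new HashSet<>();
        Set<String> reported = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (URI entry : cacheEntries) {
            String name = linkName(entry);
            // add() returns false when the name was already present.
            if (!seen.add(name) && reported.add(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<URI> entries = new ArrayList<>();
        entries.add(URI.create("hdfs://namenode:8020/apps/lib/a.jar"));
        entries.add(URI.create("hdfs://namenode:8020/other/lib/a.jar"));
        entries.add(URI.create("hdfs://namenode:8020/apps/lib/b.jar#FOO"));
        System.out.println(findDuplicates(entries));
    }
}
```

With a check shaped like this, a launcher job whose working directory happens to contain the same JARs would not cause a failure, because only the entries in the real job's configuration are compared.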