Date: Sun, 1 Feb 2015 19:25:34 +0000 (UTC)
From: "zhihai xu (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-6238) MR2 can't run local jobs with -libjars command options which is a regression from MR1

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6238:
---------------------------------
    Attachment: MAPREDUCE-6238.000.patch

> MR2 can't run local jobs with -libjars command options which is a regression from MR1
> -------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6238
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6238
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: MAPREDUCE-6238.000.patch
>
>
> MR2 can't run local jobs with the -libjars command option, which is a regression from MR1.
> When an MR2 job is run with -jt local and -libjars, the job fails with java.io.FileNotFoundException: File does not exist: hdfs://XXXXXXXXXXXXXXX.jar.
> The same command works in MR1.
> The cause is that when MR2 runs a local job using LocalJobRunner from JobSubmitter, JobSubmitter#jtFs is the local filesystem,
> so copyRemoteFiles returns from [the middle of the function|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L138]
> because the source and destination filesystems are the same.
> {code}
>     if (compareFs(remoteFs, jtFs)) {
>       return originalPath;
>     }
> {code}
> The following code at [JobSubmitter.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L219]
> then tries to add the destination file to DistributedCache, which introduces a bug for local jobs.
> {code}
>         Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
>         DistributedCache.addFileToClassPath(
>             new Path(newPath.toUri().getPath()), conf);
> {code}
> Because new Path(newPath.toUri().getPath()) drops the filesystem information from newPath, the file added to DistributedCache is qualified against the default filesystem URI (hdfs), based on the following code.
> This causes the FileNotFoundException when the file is accessed later in
> [determineTimestampsAndCacheVisibilities|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L270]
> {code}
>   public static void addFileToClassPath(Path file, Configuration conf)
>     throws IOException {
>     addFileToClassPath(file, conf, file.getFileSystem(conf));
>   }
>   public static void addFileToClassPath
>            (Path file, Configuration conf, FileSystem fs)
>         throws IOException {
>     String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
>     conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null ? file.toString()
>              : classpath + "," + file.toString());
>     URI uri = fs.makeQualified(file).toUri();
>     addCacheFile(uri, conf);
>   }
> {code}
> Compare this with the following [MR1 code|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobClient.java#L811]:
> {code}
>         Path newPath = copyRemoteFiles(fs, libjarsDir, tmp, job, replication);
>         DistributedCache.addFileToClassPath(
>           new Path(newPath.toUri().getPath()), job, fs);
> {code}
> MR1 doesn't have this issue because it passes the local filesystem into DistributedCache#addFileToClassPath instead of using the default filesystem URI (hdfs).
> We should do the same in MR2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
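
The scheme loss described in the report can be illustrated without Hadoop on the classpath. The following is a minimal sketch using only java.net.URI. The class name, the qualify helper, the namenode address and the jar path are all hypothetical, and qualify is only a rough stand-in for what FileSystem#makeQualified does with a scheme-less path, not Hadoop's actual implementation.

```java
import java.net.URI;

// Hypothetical demo class; not part of Hadoop.
class LibjarsQualifyDemo {

    // Rough stand-in for FileSystem#makeQualified (an assumption, not the
    // Hadoop code): a path without a scheme is resolved against the URI of
    // the filesystem it is qualified with.
    static URI qualify(URI fsUri, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already fully qualified, keep as-is
        }
        return fsUri.resolve(p.getPath());
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        URI defaultFs = URI.create("hdfs://namenode:8020/"); // default filesystem
        URI localFs = URI.create("file:///");                // jtFs under -jt local
        URI newPath = URI.create("file:///tmp/libs/udf.jar"); // result of copyRemoteFiles

        // Equivalent of new Path(newPath.toUri().getPath()): the "file" scheme is gone.
        String stripped = newPath.getPath();

        // MR2 behaviour in the report: qualified against the default filesystem.
        System.out.println("default fs: " + qualify(defaultFs, stripped));

        // MR1 behaviour: qualified against the filesystem that was passed in.
        System.out.println("job fs:     " + qualify(localFs, stripped));
    }
}
```

With these assumed values, the first result carries the hdfs scheme even though the jar only exists on the local disk, which corresponds to the FileNotFoundException in the report. The second result keeps the file scheme, which matches the MR1 approach of passing the filesystem explicitly.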