From mapreduce-commits-return-498-apmail-hadoop-mapreduce-commits-archive=hadoop.apache.org@hadoop.apache.org Fri Nov 27 12:37:56 2009 Return-Path: Delivered-To: apmail-hadoop-mapreduce-commits-archive@minotaur.apache.org Received: (qmail 35866 invoked from network); 27 Nov 2009 12:37:56 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 27 Nov 2009 12:37:56 -0000 Received: (qmail 79413 invoked by uid 500); 27 Nov 2009 10:51:15 -0000 Delivered-To: apmail-hadoop-mapreduce-commits-archive@hadoop.apache.org Received: (qmail 79350 invoked by uid 500); 27 Nov 2009 10:51:15 -0000 Mailing-List: contact mapreduce-commits-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: mapreduce-dev@hadoop.apache.org Delivered-To: mailing list mapreduce-commits@hadoop.apache.org Received: (qmail 79340 invoked by uid 99); 27 Nov 2009 10:51:14 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Nov 2009 10:51:14 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Nov 2009 10:51:07 +0000 Received: by eris.apache.org (Postfix, from userid 65534) id 8BF7F23888C5; Fri, 27 Nov 2009 10:50:45 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: svn commit: r884832 - in /hadoop/mapreduce/trunk: ./ src/docs/src/documentation/content/xdocs/ src/java/org/apache/hadoop/mapreduce/ src/java/org/apache/hadoop/mapreduce/filecache/ src/test/mapred/org/apache/hadoop/mapred/ src/test/mapred/testshell/ Date: Fri, 27 Nov 2009 10:50:45 -0000 To: mapreduce-commits@hadoop.apache.org From: sharad@apache.org X-Mailer: svnmailer-1.0.8 Message-Id: <20091127105045.8BF7F23888C5@eris.apache.org> 
X-Virus-Checked: Checked by ClamAV on apache.org Author: sharad Date: Fri Nov 27 10:50:44 2009 New Revision: 884832 URL: http://svn.apache.org/viewvc?rev=884832&view=rev Log: MAPREDUCE-787. Fix JobSubmitter to honor user given symlink path. Contributed by Amareshwari Sriramadasu. Modified: hadoop/mapreduce/trunk/CHANGES.txt hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/JobSubmitter.java hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TaskDistributedCacheManager.java hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestCommandLineJobSubmission.java hadoop/mapreduce/trunk/src/test/mapred/testshell/ExternalMapReduce.java Modified: hadoop/mapreduce/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/CHANGES.txt?rev=884832&r1=884831&r2=884832&view=diff ============================================================================== --- hadoop/mapreduce/trunk/CHANGES.txt (original) +++ hadoop/mapreduce/trunk/CHANGES.txt Fri Nov 27 10:50:44 2009 @@ -921,3 +921,6 @@ MAPREDUCE-1239. Fix contrib components build dependencies. (Giridharan Kesavan and omalley) + MAPREDUCE-787. Fix JobSubmitter to honor user given symlink path. 
+ (Amareshwari Sriramadasu via sharad) + Modified: hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=884832&r1=884831&r2=884832&view=diff ============================================================================== --- hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original) +++ hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Fri Nov 27 10:50:44 2009 @@ -607,17 +607,35 @@ would be present in the current working directory of the task using the option -files. The -libjars option allows applications to add jars to the classpaths of the maps - and reduces. The -archives allows them to pass archives - as arguments that are unzipped/unjarred and a link with name of the - jar/zip are created in the current working directory of tasks. More + and reduces. The option -archives allows them to pass + comma separated list of archives as arguments. These archives are + unarchived and a link with name of the archive is created in + the current working directory of tasks. More details about the command line options are available at Hadoop Commands Guide.

Running wordcount example with - -libjars and -files:
+ -libjars, -files and -archives: +
hadoop jar hadoop-examples.jar wordcount -files cachefile.txt - -libjars mylib.jar input output -

+ -libjars mylib.jar -archives myarchive.zip input output + Here, myarchive.zip will be placed and unzipped into a directory + by the name "myarchive.zip" +

+ +

Users can specify a different symbolic name for + files and archives passed through -files and -archives option, using #. +

+ +

For example, + hadoop jar hadoop-examples.jar wordcount + -files dir1/dict.txt#dict1,dir2/dict.txt#dict2 + -archives mytar.tgz#tgzdir input output + Here, the files dir1/dict.txt and dir2/dict.txt can be accessed by + tasks using the symbolic names dict1 and dict2 respectively. + And the archive mytar.tgz will be placed and unarchived into a + directory by the name tgzdir +
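
Editor's sketch (not part of r884832): the naming rule described in the tutorial text above can be illustrated with plain java.net.URI. This is a minimal sketch that assumes each -files/-archives entry parses as a URI, as the JobSubmitter change in this commit does. The class SymlinkNameSketch and the method linkName are hypothetical names used only for illustration. They are not Hadoop APIs, and the sketch does not copy files or create links.

```java
import java.net.URI;

// Hypothetical illustration only; not part of Hadoop.
final class SymlinkNameSketch {

    // Returns the name a task would use for one -files/-archives entry:
    // the text after '#' when the user supplied one, else the last path component.
    static String linkName(String entry) {
        // URI.create throws IllegalArgumentException on a malformed entry,
        // similar in spirit to how the commit wraps URISyntaxException.
        URI uri = URI.create(entry);
        String fragment = uri.getFragment();
        if (fragment != null) {
            return fragment;
        }
        String path = uri.getPath();
        // lastIndexOf returns -1 when there is no '/', so substring(0) keeps the whole name.
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String option = "dir1/dict.txt#dict1,dir2/dict.txt#dict2,mytar.tgz#tgzdir,myarchive.zip";
        // The options take a comma separated list, so split before parsing each entry.
        for (String entry : option.split(",")) {
            System.out.println(entry + " -> " + linkName(entry));
        }
    }
}
```

The fallback to the last path component mirrors the default that getPathURI applies in the JobSubmitter diff later in this message when no fragment is given. Entries that are not valid URIs (for example, paths containing spaces) would be rejected by this sketch.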

Modified: hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml?rev=884832&r1=884831&r2=884832&view=diff ============================================================================== --- hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml (original) +++ hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/streaming.xml Fri Nov 27 10:50:44 2009 @@ -322,6 +322,10 @@ -files hdfs://host:fs_port/user/testfile.txt +

User can specify a different symlink name for -files using #.

+ +-files hdfs://host:fs_port/user/testfile.txt#testfile +

Multiple entries can be specified like this:

@@ -342,6 +346,10 @@ -archives hdfs://host:fs_port/user/testfile.jar +

User can specify a different symlink name for -archives using #.

+ +-archives hdfs://host:fs_port/user/testfile.tgz#tgzdir +

In this example, the input.txt file has two lines specifying the names of the two files: cachedir.jar/cache.txt and cachedir.jar/cache2.txt. Modified: hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/JobSubmitter.java URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/JobSubmitter.java?rev=884832&r1=884831&r2=884832&view=diff ============================================================================== --- hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/JobSubmitter.java (original) +++ hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/JobSubmitter.java Fri Nov 27 10:50:44 2009 @@ -160,15 +160,20 @@ FileSystem.mkdirs(jtFs, filesDir, mapredSysPerms); String[] fileArr = files.split(","); for (String tmpFile: fileArr) { - Path tmp = new Path(tmpFile); + URI tmpURI = null; + try { + tmpURI = new URI(tmpFile); + } catch (URISyntaxException e) { + throw new IllegalArgumentException(e); + } + Path tmp = new Path(tmpURI); Path newPath = copyRemoteFiles(filesDir, tmp, conf, replication); try { - URI pathURI = new URI(newPath.toUri().toString() + "#" - + newPath.getName()); + URI pathURI = getPathURI(newPath, tmpURI.getFragment()); DistributedCache.addCacheFile(pathURI, conf); } catch(URISyntaxException ue) { //should not throw a uri exception - throw new IOException("Failed to create uri for " + tmpFile); + throw new IOException("Failed to create uri for " + tmpFile, ue); } DistributedCache.createSymlink(conf); } @@ -188,16 +193,21 @@ FileSystem.mkdirs(jtFs, archivesDir, mapredSysPerms); String[] archivesArr = archives.split(","); for (String tmpArchives: archivesArr) { - Path tmp = new Path(tmpArchives); + URI tmpURI; + try { + tmpURI = new URI(tmpArchives); + } catch (URISyntaxException e) { + throw new IllegalArgumentException(e); + } + Path tmp = new Path(tmpURI); Path newPath = copyRemoteFiles(archivesDir, tmp, conf, replication); try { - URI pathURI = new URI(newPath.toUri().toString() + "#" - 
+ newPath.getName()); + URI pathURI = getPathURI(newPath, tmpURI.getFragment()); DistributedCache.addCacheArchive(pathURI, conf); } catch(URISyntaxException ue) { //should not throw an uri excpetion - throw new IOException("Failed to create uri for " + tmpArchives); + throw new IOException("Failed to create uri for " + tmpArchives, ue); } DistributedCache.createSymlink(conf); } @@ -207,6 +217,19 @@ TrackerDistributedCacheManager.determineTimestamps(conf); } + private URI getPathURI(Path destPath, String fragment) + throws URISyntaxException { + URI pathURI = destPath.toUri(); + if (pathURI.getFragment() == null) { + if (fragment == null) { + pathURI = new URI(pathURI.toString() + "#" + destPath.getName()); + } else { + pathURI = new URI(pathURI.toString() + "#" + fragment); + } + } + return pathURI; + } + private void copyJar(Path originalJarPath, Path submitJarFile, short replication) throws IOException { jtFs.copyFromLocalFile(originalJarPath, submitJarFile); Modified: hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TaskDistributedCacheManager.java URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TaskDistributedCacheManager.java?rev=884832&r1=884831&r2=884832&view=diff ============================================================================== --- hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TaskDistributedCacheManager.java (original) +++ hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/filecache/TaskDistributedCacheManager.java Fri Nov 27 10:50:44 2009 @@ -96,7 +96,7 @@ Map classPaths = new HashMap(); if (paths != null) { for (Path p : paths) { - classPaths.put(p.toString(), p); + classPaths.put(p.toUri().getPath().toString(), p); } } for (int i = 0; i < uris.length; ++i) { Modified: hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestCommandLineJobSubmission.java URL: 
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestCommandLineJobSubmission.java?rev=884832&r1=884831&r2=884832&view=diff ============================================================================== --- hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestCommandLineJobSubmission.java (original) +++ hadoop/mapreduce/trunk/src/test/mapred/org/apache/hadoop/mapred/TestCommandLineJobSubmission.java Fri Nov 27 10:50:44 2009 @@ -19,11 +19,13 @@ import java.io.File; import java.io.FileOutputStream; +import java.io.IOException; import junit.framework.TestCase; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.*; +import org.apache.hadoop.util.StringUtils; import org.apache.hadoop.util.ToolRunner; import org.apache.hadoop.hdfs.MiniDFSCluster; @@ -60,17 +62,59 @@ FileOutputStream fstream = new FileOutputStream(f); fstream.write("somestrings".getBytes()); fstream.close(); - String[] args = new String[8]; + File f1 = new File(thisbuildDir, "files_tmp1"); + fstream = new FileOutputStream(f1); + fstream.write("somestrings".getBytes()); + fstream.close(); + + // copy files to dfs + Path cachePath = new Path("/cacheDir"); + if (!fs.mkdirs(cachePath)) { + throw new IOException( + "Mkdirs failed to create " + cachePath.toString()); + } + Path localCachePath = new Path(System.getProperty("test.cache.data")); + Path txtPath = new Path(localCachePath, new Path("test.txt")); + Path jarPath = new Path(localCachePath, new Path("test.jar")); + Path zipPath = new Path(localCachePath, new Path("test.zip")); + Path tarPath = new Path(localCachePath, new Path("test.tar")); + Path tgzPath = new Path(localCachePath, new Path("test.tgz")); + fs.copyFromLocalFile(txtPath, cachePath); + fs.copyFromLocalFile(jarPath, cachePath); + fs.copyFromLocalFile(zipPath, cachePath); + + // construct options for -files + String[] files = new String[3]; + files[0] = f.toString(); + files[1] = f1.toString() + "#localfilelink"; 
+ files[2] = + fs.getUri().resolve(cachePath + "/test.txt#dfsfilelink").toString(); + + // construct options for -libjars + String[] libjars = new String[2]; + libjars[0] = "build/test/mapred/testjar/testjob.jar"; + libjars[1] = fs.getUri().resolve(cachePath + "/test.jar").toString(); + + // construct options for archives + String[] archives = new String[3]; + archives[0] = tgzPath.toString(); + archives[1] = tarPath + "#tarlink"; + archives[2] = + fs.getUri().resolve(cachePath + "/test.zip#ziplink").toString(); + + String[] args = new String[10]; args[0] = "-files"; - args[1] = f.toString(); + args[1] = StringUtils.arrayToString(files); args[2] = "-libjars"; // the testjob.jar as a temporary jar file // rather than creating its own - args[3] = "build/test/mapred/testjar/testjob.jar"; - args[4] = "-D"; - args[5] = "mapred.output.committer.class=testjar.CustomOutputCommitter"; - args[6] = input.toString(); - args[7] = output.toString(); + args[3] = StringUtils.arrayToString(libjars); + args[4] = "-archives"; + args[5] = StringUtils.arrayToString(archives); + args[6] = "-D"; + args[7] = "mapred.output.committer.class=testjar.CustomOutputCommitter"; + args[8] = input.toString(); + args[9] = output.toString(); JobConf jobConf = mr.createJobConf(); //before running the job, verify that libjar is not in client classpath Modified: hadoop/mapreduce/trunk/src/test/mapred/testshell/ExternalMapReduce.java URL: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/test/mapred/testshell/ExternalMapReduce.java?rev=884832&r1=884831&r2=884832&view=diff ============================================================================== --- hadoop/mapreduce/trunk/src/test/mapred/testshell/ExternalMapReduce.java (original) +++ hadoop/mapreduce/trunk/src/test/mapred/testshell/ExternalMapReduce.java Fri Nov 27 10:50:44 2009 @@ -66,12 +66,21 @@ if (classpath.indexOf("testjob.jar") == -1) { throw new IOException("failed to find in the library " + classpath); } + if 
(classpath.indexOf("test.jar") == -1) { + throw new IOException("failed to find the library test.jar in" + + classpath); + } //fork off ls to see if the file exists. // java file.exists() will not work on // cygwin since it is a symlink - String[] argv = new String[2]; + String[] argv = new String[7]; argv[0] = "ls"; argv[1] = "files_tmp"; + argv[2] = "localfilelink"; + argv[3] = "dfsfilelink"; + argv[4] = "tarlink"; + argv[5] = "ziplink"; + argv[6] = "test.tgz"; Process p = Runtime.getRuntime().exec(argv); int ret = -1; try {