Return-Path: Delivered-To: apmail-hadoop-common-commits-archive@www.apache.org Received: (qmail 22351 invoked from network); 3 Aug 2009 12:37:34 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 3 Aug 2009 12:37:34 -0000 Received: (qmail 34337 invoked by uid 500); 3 Aug 2009 12:37:39 -0000 Delivered-To: apmail-hadoop-common-commits-archive@hadoop.apache.org Received: (qmail 34274 invoked by uid 500); 3 Aug 2009 12:37:38 -0000 Mailing-List: contact common-commits-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: common-dev@hadoop.apache.org Delivered-To: mailing list common-commits@hadoop.apache.org Received: (qmail 34265 invoked by uid 99); 3 Aug 2009 12:37:38 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 03 Aug 2009 12:37:38 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 03 Aug 2009 12:37:28 +0000 Received: by eris.apache.org (Postfix, from userid 65534) id 457F32388895; Mon, 3 Aug 2009 12:37:08 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: svn commit: r800334 [2/2] - /hadoop/common/branches/branch-0.19/docs/jdiff/hadoop_0.19.2.xml Date: Mon, 03 Aug 2009 12:37:07 -0000 To: common-commits@hadoop.apache.org From: tomwhite@apache.org X-Mailer: svnmailer-1.0.8 Message-Id: <20090803123708.457F32388895@eris.apache.org> X-Virus-Checked: Checked by ClamAV on apache.org Added: hadoop/common/branches/branch-0.19/docs/jdiff/hadoop_0.19.2.xml URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.19/docs/jdiff/hadoop_0.19.2.xml?rev=800334&view=auto ============================================================================== --- hadoop/common/branches/branch-0.19/docs/jdiff/hadoop_0.19.2.xml (added) +++ hadoop/common/branches/branch-0.19/docs/jdiff/hadoop_0.19.2.xml Mon Aug 3 12:37:07 2009 @@ -0,0 +1,44204 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + final. + + @param name resource to be added, the classpath is examined for a file + with that name.]]> + + + + + + final. + + @param url url of the resource to be added, the local filesystem is + examined directly to find the resource, without referring to + the classpath.]]> + + + + + + final. + + @param file file-path of resource to be added, the local filesystem is + examined directly to find the resource, without referring to + the classpath.]]> + + + + + + final. + + @param in InputStream to deserialize the object from.]]> + + + + + + + + + + + name property, null if + no such property exists. + + Values are processed for variable expansion + before being returned. + + @param name the property name. + @return the value of the name property, + or null if no such property exists.]]> + + + + + + name property, without doing + variable expansion. + + @param name the property name. + @return the value of the name property, + or null if no such property exists.]]> + + + + + + + value of the name property. + + @param name property name. + @param value property value.]]> + + + + + + + name property. If no such property + exists, then defaultValue is returned. + + @param name property name. + @param defaultValue default value. + @return property value, or defaultValue if the property + doesn't exist.]]> + + + + + + + name property as an int. + + If no such property exists, or if the specified value is not a valid + int, then defaultValue is returned. + + @param name property name. + @param defaultValue default value. + @return property value as an int, + or defaultValue.]]> + + + + + + + name property to an int. + + @param name property name. + @param value int value of the property.]]> + + + + + + + name property as a long. + If no such property is specified, or if the specified value is not a valid + long, then defaultValue is returned. + + @param name property name. + @param defaultValue default value. + @return property value as a long, + or defaultValue.]]> + + + + + + + name property to a long. + + @param name property name. + @param value long value of the property.]]> + + + + + + + name property as a float. + If no such property is specified, or if the specified value is not a valid + float, then defaultValue is returned. + + @param name property name. + @param defaultValue default value. + @return property value as a float, + or defaultValue.]]> + + + + + + + name property as a boolean. + If no such property is specified, or if the specified value is not a valid + boolean, then defaultValue is returned. + + @param name property name. + @param defaultValue default value. + @return property value as a boolean, + or defaultValue.]]> + + + + + + + name property to a boolean. + + @param name property name. + @param value boolean value of the property.]]> + + + + + + + + + + + + + name property as + a collection of Strings. + If no such property is specified then empty collection is returned. +

+ This is an optimized version of {@link #getStrings(String)} + + @param name property name. + @return property value as a collection of Strings.]]> + + + + + + name property as + an array of Strings. + If no such property is specified then null is returned. + + @param name property name. + @return property value as an array of Strings, + or null.]]> + + + + + + + name property as + an array of Strings. + If no such property is specified then default value is returned. + + @param name property name. + @param defaultValue The default value + @return property value as an array of Strings, + or default value.]]> + + + + + + + name property as + as comma delimited values. + + @param name property name. + @param values The values]]> + + + + + + + + + + + + + + name property + as an array of Class. + The value of the property specifies a list of comma separated class names. + If no such property is specified, then defaultValue is + returned. + + @param name the property name. + @param defaultValue default value. + @return property value as a Class[], + or defaultValue.]]> + + + + + + + name property as a Class. + If no such property is specified, then defaultValue is + returned. + + @param name the class name. + @param defaultValue default value. + @return property value as a Class, + or defaultValue.]]> + + + + + + + + name property as a Class + implementing the interface specified by xface. + + If no such property is specified, then defaultValue is + returned. + + An exception is thrown if the returned class does not implement the named + interface. + + @param name the class name. + @param defaultValue default value. + @param xface the interface implemented by the named class. + @return property value as a Class, + or defaultValue.]]> + + + + + + + + name property to the name of a + theClass implementing the given interface xface. + + An exception is thrown if theClass does not implement the + interface xface. + + @param name property name. + @param theClass property value. + @param xface the interface implemented by the named class.]]> + + + + + + + + dirsProp with + the given path. If dirsProp contains multiple directories, + then one is chosen based on path's hash code. If the selected + directory does not exist, an attempt is made to create it. + + @param dirsProp directory in which to locate the file. + @param path file-path. + @return local file under the directory with the given path.]]> + + + + + + + + dirsProp with + the given path. If dirsProp contains multiple directories, + then one is chosen based on path's hash code. If the selected + directory does not exist, an attempt is made to create it. + + @param dirsProp directory in which to locate the file. + @param path file-path. + @return local file under the directory with the given path.]]> + + + + + + + + + + + + name. + + @param name configuration resource name. + @return an input stream attached to the resource.]]> + + + + + + name. + + @param name configuration resource name. + @return a reader attached to the resource.]]> + + + + + + + + + + + + + + + String + key-value pairs in the configuration. + + @return an iterator over the entries.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + true to set quiet-mode on, false + to turn it off.]]> + + + + + + + + + + + + + + + + + + + Resources + +

Configurations are specified by resources. A resource contains a set of + name/value pairs as XML data. Each resource is named by either a + String or by a {@link Path}. If named by a String, + then the classpath is examined for a file with that name. If named by a + Path, then the local filesystem is examined directly, without + referring to the classpath. + +

Unless explicitly turned off, Hadoop by default specifies two + resources, loaded in-order from the classpath:

    +
  1. hadoop-default.xml + : Read-only defaults for hadoop.
  2. +
  3. hadoop-site.xml: Site-specific configuration for a given hadoop + installation.
  4. +
+ Applications may add additional resources, which are loaded + subsequent to these resources in the order they are added. + +

Final Parameters

+ +

Configuration parameters may be declared final. + Once a resource declares a value final, no subsequently-loaded + resource can alter that value. + For example, one might define a final parameter with: +

+  <property>
+    <name>dfs.client.buffer.dir</name>
+    <value>/tmp/hadoop/dfs/client</value>
+    <final>true</final>
+  </property>
+ + Administrators typically define parameters as final in + hadoop-site.xml for values that user applications may not alter. + +

Variable Expansion

+ +

Value strings are first processed for variable expansion. The + available properties are:

    +
  1. Other properties defined in this Configuration; and, if a name is + undefined here,
  2. +
  3. Properties in {@link System#getProperties()}.
  4. +
+ +

For example, if a configuration resource contains the following property + definitions: +

+  <property>
+    <name>basedir</name>
+    <value>/user/${user.name}</value>
+  </property>
+  
+  <property>
+    <name>tempdir</name>
+    <value>${basedir}/tmp</value>
+  </property>
+ + When conf.get("tempdir") is called, then ${basedir} + will be resolved to another property in this Configuration, while + ${user.name} would then ordinarily be resolved to the value + of the System property with that name.]]> +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + DistributedCache is a facility provided by the Map-Reduce + framework to cache files (text, archives, jars etc.) needed by applications. +

+ +

Applications specify the files, via urls (hdfs:// or http://) to be cached + via the {@link org.apache.hadoop.mapred.JobConf}. + The DistributedCache assumes that the + files specified via hdfs:// urls are already present on the + {@link FileSystem} at the path specified by the url.

+ +

The framework will copy the necessary files on to the slave node before + any tasks for the job are executed on that node. Its efficiency stems from + the fact that the files are only copied once per job and the ability to + cache archives which are un-archived on the slaves.

+ +

DistributedCache can be used to distribute simple, read-only + data/text files and/or more complex types such as archives, jars etc. + Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave nodes. + Jars may be optionally added to the classpath of the tasks, a rudimentary + software distribution mechanism. Files have execution permissions. + Optionally users can also direct it to symlink the distributed cache file(s) + into the working directory of the task.

+ +

DistributedCache tracks modification timestamps of the cache + files. Clearly the cache files should not be modified by the application + or externally while the job is executing.

+ +

Here is an illustrative example on how to use the + DistributedCache:

+

+     // Setting up the cache for the application
+     
+     1. Copy the requisite files to the FileSystem:
+     
+     $ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat  
+     $ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip  
+     $ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
+     $ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar
+     $ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz
+     $ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz
+     
+     2. Setup the application's JobConf:
+     
+     JobConf job = new JobConf();
+     DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), 
+                                   job);
+     DistributedCache.addCacheArchive(new URI("/myapp/map.zip", job);
+     DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
+     DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar", job);
+     DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz", job);
+     DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz", job);
+     
+     3. Use the cached files in the {@link org.apache.hadoop.mapred.Mapper}
+     or {@link org.apache.hadoop.mapred.Reducer}:
+     
+     public static class MapClass extends MapReduceBase  
+     implements Mapper<K, V, K, V> {
+     
+       private Path[] localArchives;
+       private Path[] localFiles;
+       
+       public void configure(JobConf job) {
+         // Get the cached archives/files
+         localArchives = DistributedCache.getLocalCacheArchives(job);
+         localFiles = DistributedCache.getLocalCacheFiles(job);
+       }
+       
+       public void map(K key, V value, 
+                       OutputCollector<K, V> output, Reporter reporter) 
+       throws IOException {
+         // Use data from the cached archives/files here
+         // ...
+         // ...
+         output.collect(k, v);
+       }
+     }
+     
+ 

+ + @see org.apache.hadoop.mapred.JobConf + @see org.apache.hadoop.mapred.JobClient]]> +
+
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + BufferedFSInputStream + with the specified buffer size, + and saves its argument, the input stream + in, for later use. An internal + buffer array of length size + is created and stored in buf. + + @param in the underlying input stream. + @param size the buffer size. + @exception IllegalArgumentException if size <= 0.]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + setReplication of FileSystem + @param src file name + @param replication new replication + @throws IOException + @return true if successful; + false if file does not exist or is a directory]]> + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +