+ This is an optimized version of {@link #getStrings(String)}
+
+ @param name property name.
+ @return property value as a collection of
+ Configurations are specified by resources. A resource contains a set of
+ name/value pairs as XML data. Each resource is named by either a
+ Hadoop by default specifies two resources, loaded in-order from the
+ classpath:
+ Configuration parameters may be declared final.
+ Once a resource declares a value final, no subsequently-loaded
+ resource can alter that value.
+ For example, one might define a final parameter with:
+ Value strings are first processed for variable expansion. The
+ available properties are:
+ For example, if a configuration resource contains the following property
+ definitions:
+ null if
+ no such property exists.
+
+ Values are processed for variable expansion
+ before being returned.
+
+ @param name the property name.
+ @return the value of the name property,
+ or null if no such property exists.]]>
+ name property,
+ or null if no such property exists.]]>
+ name property.
+
+ @param name property name.
+ @param value property value.]]>
+ defaultValue is returned.
+
+ @param name property name.
+ @param defaultValue default value.
+ @return property value, or defaultValue if the property
+ doesn't exist.]]>
+ int.
+
+ If no such property exists, or if the specified value is not a valid
+ int, then defaultValue is returned.
+
+ @param name property name.
+ @param defaultValue default value.
+ @return property value as an int,
+ or defaultValue.]]>
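The fallback rule described above (return defaultValue when the property is missing or is not a valid int) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Hadoop's implementation: the class `IntLookupSketch`, its backing map, and the decision to trim whitespace before parsing are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a configuration object; not Hadoop code.
class IntLookupSketch {
    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    // Returns the property parsed as an int, or defaultValue when the
    // property is absent or its text is not a valid int.
    int getInt(String name, int defaultValue) {
        String raw = props.get(name);
        if (raw == null) {
            return defaultValue;
        }
        try {
            // Trimming is an assumption of this sketch.
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

The same shape would apply to the long, float, and boolean getters, with the parsing call swapped for the matching type.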
+ int.
+
+ @param name property name.
+ @param value int value of the property.]]>
+ long.
+ If no such property is specified, or if the specified value is not a valid
+ long, then defaultValue is returned.
+
+ @param name property name.
+ @param defaultValue default value.
+ @return property value as a long,
+ or defaultValue.]]>
+ long.
+
+ @param name property name.
+ @param value long value of the property.]]>
+ float.
+ If no such property is specified, or if the specified value is not a valid
+ float, then defaultValue is returned.
+
+ @param name property name.
+ @param defaultValue default value.
+ @return property value as a float,
+ or defaultValue.]]>
+ boolean.
+ If no such property is specified, or if the specified value is not a valid
+ boolean, then defaultValue is returned.
+
+ @param name property name.
+ @param defaultValue default value.
+ @return property value as a boolean,
+ or defaultValue.]]>
+ boolean.
+
+ @param name property name.
+ @param value boolean value of the property.]]>
+ Strings.
+ If no such property is specified then empty collection is returned.
+ Strings.]]>
+ Strings.
+ If no such property is specified then null is returned.
+
+ @param name property name.
+ @return property value as an array of Strings,
+ or null.]]>
+ Strings.
+ If no such property is specified then default value is returned.
+
+ @param name property name.
+ @param defaultValue The default value
+ @return property value as an array of Strings,
+ or default value.]]>
+ Class.
+ If no such property is specified, then defaultValue is
+ returned.
+
+ @param name the class name.
+ @param defaultValue default value.
+ @return property value as a Class,
+ or defaultValue.]]>
+ Class
+ implementing the interface specified by xface.
+
+ If no such property is specified, then defaultValue is
+ returned.
+
+ An exception is thrown if the returned class does not implement the named
+ interface.
+
+ @param name the class name.
+ @param defaultValue default value.
+ @param xface the interface implemented by the named class.
+ @return property value as a Class,
+ or defaultValue.]]>
+ theClass implementing the given interface xface.
+
+ An exception is thrown if theClass does not implement the
+ interface xface.
+
+ @param name property name.
+ @param theClass property value.
+ @param xface the interface implemented by the named class.]]>
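The interface check described above can be expressed with `Class.isAssignableFrom`. The following is a rough sketch only: `InterfaceCheckSketch` and `requireImplements` are hypothetical names, and the choice of `RuntimeException` is an assumption, since the text says only that "an exception is thrown".

```java
// Hypothetical helper illustrating the check; not Hadoop code.
class InterfaceCheckSketch {

    // Throws if theClass does not implement (or extend) xface.
    static void requireImplements(Class<?> theClass, Class<?> xface) {
        if (!xface.isAssignableFrom(theClass)) {
            // The exception type is an assumption of this sketch.
            throw new RuntimeException(
                theClass.getName() + " does not implement " + xface.getName());
        }
    }

    public static void main(String[] args) {
        // ArrayList implements List, so this call returns normally.
        requireImplements(java.util.ArrayList.class, java.util.List.class);
        System.out.println("check passed");
    }
}
```

Performing the check when the value is set, rather than when it is read, surfaces a misconfigured class name as early as possible.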
+ false
+ to turn it off.]]>
+ String or by a {@link Path}. If named by a String,
+ then the classpath is examined for a file with that name. If named by a
+ Path, then the local filesystem is examined directly, without
+ referring to the classpath.
+
+
+
+ Applications may add additional resources, which are loaded
+ subsequent to these resources in the order they are added.
+
+ Final Parameters
+
+
+ <property>
+ <name>dfs.client.buffer.dir</name>
+ <value>/tmp/hadoop/dfs/client</value>
+ <final>true</final>
+ </property>
+
+ Administrators typically define parameters as final in
+ hadoop-site.xml for values that user applications may not alter.
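The final-parameter rule (once a resource declares a value final, later resources cannot alter it) can be modelled with a set of locked names. This is a minimal sketch under stated assumptions, not Hadoop's loader: `FinalParamSketch` and its `load` method are hypothetical, and silently ignoring a later override is an assumption about how a rejected value is handled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of in-order resource loading; not Hadoop code.
class FinalParamSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finalNames = new HashSet<>();

    // Applies one property from a resource. Returns false when the
    // property was already declared final and the new value is ignored.
    boolean load(String name, String value, boolean isFinal) {
        if (finalNames.contains(name)) {
            return false;
        }
        values.put(name, value);
        if (isFinal) {
            finalNames.add(name);
        }
        return true;
    }

    String get(String name) {
        return values.get(name);
    }
}
```

Loading `dfs.client.buffer.dir` as final from a first resource and then attempting to override it from a second leaves the first value in place.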
+
+ Variable Expansion
+
+
+
+
+
+ <property>
+ <name>basedir</name>
+ <value>/user/${user.name}</value>
+ </property>
+
+ <property>
+ <name>tempdir</name>
+ <value>${basedir}/tmp</value>
+ </property>
+
+ When conf.get("tempdir") is called, then ${basedir}
+ will be resolved to another property in this Configuration, while
+ ${user.name} would then ordinarily be resolved to the value
+ of the System property with that name.]]>
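The `tempdir` example above can be traced with a small stand-alone sketch of variable expansion. It is illustrative only and is not Hadoop's implementation: `ExpansionSketch`, the lookup order (this object's own properties first, then System properties), and the cap of 20 substitution passes are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a configuration object; not Hadoop code.
class ExpansionSketch {
    // Matches ${name}, where name contains no '}', '$', or whitespace.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$\\s]+)\\}");
    // Assumed cap on substitutions, guarding against self-referencing values.
    private static final int MAX_PASSES = 20;

    private final Map<String, String> props = new HashMap<>();

    void set(String name, String value) {
        props.put(name, value);
    }

    // Returns the expanded value, or null if no such property exists.
    String get(String name) {
        String raw = props.get(name);
        return raw == null ? null : expand(raw);
    }

    private String expand(String value) {
        String current = value;
        for (int pass = 0; pass < MAX_PASSES; pass++) {
            Matcher m = VAR.matcher(current);
            if (!m.find()) {
                return current; // nothing left to expand
            }
            String var = m.group(1);
            String replacement = props.get(var);
            if (replacement == null) {
                replacement = System.getProperty(var);
            }
            if (replacement == null) {
                return current; // stop at the first unresolvable reference
            }
            current = current.substring(0, m.start())
                    + replacement
                    + current.substring(m.end());
        }
        return current;
    }
}
```

With `basedir` set to `/user/${user.name}` and `tempdir` set to `${basedir}/tmp`, `get("tempdir")` first substitutes `basedir` from the object's own properties and then `user.name` from the System properties.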
+ DistributedCache is a facility provided by the Map-Reduce
+ framework to cache files (text, archives, jars etc.) needed by applications.
+
+ Applications specify the files, via urls (hdfs:// or http://) to be cached
+ via the {@link JobConf}. The DistributedCache assumes that the
+ files specified via hdfs:// urls are already present on the
+ {@link FileSystem} at the path specified by the url.
+
+ The framework will copy the necessary files on to the slave node before
+ any tasks for the job are executed on that node. Its efficiency stems from
+ the fact that the files are only copied once per job and the ability to
+ cache archives which are un-archived on the slaves.
+
+ DistributedCache can be used to distribute simple, read-only
+ data/text files and/or more complex types such as archives, jars etc.
+ Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave nodes.
+ Jars may be optionally added to the classpath of the tasks, a rudimentary
+ software distribution mechanism. Files have execution permissions.
+ Optionally users can also direct it to symlink the distributed cache file(s)
+ into the working directory of the task.
+
+ DistributedCache tracks modification timestamps of the cache
+ files. Clearly the cache files should not be modified by the application
+ or externally while the job is executing.
+
+ Here is an illustrative example on how to use the
+ DistributedCache:
+
+ // Setting up the cache for the application
+
+ 1. Copy the requisite files to the FileSystem:
+
+ $ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat
+ $ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip
+ $ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
+ $ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar
+ $ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz
+ $ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz
+
+ 2. Set up the application's JobConf:
+
+ JobConf job = new JobConf();
+ DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
+                               job);
+ DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
+ DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
+ DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
+ DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
+ DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);
+
+ 3. Use the cached files in the {@link Mapper} or {@link Reducer}:
+
+ public static class MapClass extends MapReduceBase
+ implements Mapper<K, V, K, V> {
+
+   private Path[] localArchives;
+   private Path[] localFiles;
+
+   public void configure(JobConf job) {
+     // Get the cached archives/files
+     localArchives = DistributedCache.getLocalCacheArchives(job);
+     localFiles = DistributedCache.getLocalCacheFiles(job);
+   }
+
+   public void map(K key, V value,
+                   OutputCollector<K, V> output, Reporter reporter)
+   throws IOException {
+     // Use data from the cached archives/files here
+     // ...
+     // ...
+     output.collect(key, value);
+   }
+ }
+
+ @see JobConf
+ @see JobClient]]>
in, for later use. An internal
+ buffer array of length size
+ is created and stored in buf.
+
+ @param in the underlying input stream.
+ @param size the buffer size.
+ @exception IllegalArgumentException if size <= 0.]]>
+