hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "FAQ" by QwertyManiac
Date Thu, 27 Jan 2011 04:21:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "FAQ" page has been changed by QwertyManiac.
The comment on this change is: Reading cluster configuration values in Job..


  == What is the Distributed Cache used for? ==
  The distributed cache is used to distribute large read-only files that are needed by map/reduce jobs to the cluster. The framework copies the necessary files from a URL (either hdfs: or http:) onto the slave node before any tasks for the job are executed on that node. The files are copied only once per job and so should not be modified by the application.
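  A minimal sketch of the usual pattern, assuming the classic org.apache.hadoop.mapred-era API (the class name, HDFS path, and file name below are hypothetical):
  {{{
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CacheExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // At job-submission time: register a read-only HDFS file for caching.
    DistributedCache.addCacheFile(new URI("hdfs://namenode/data/lookup.dat"), conf);
    // ... configure and submit the job with this conf ...
  }

  // Inside a task (e.g. in Mapper.configure()), read the node-local copy:
  static Path firstCachedFile(Configuration conf) throws Exception {
    Path[] cached = DistributedCache.getLocalCacheFiles(conf);
    return cached[0]; // local path of lookup.dat on this slave node
  }
}
  }}}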
+ == How do I get my MapReduce Java program to read the cluster's configuration and not just the defaults? ==
+ The configuration property files ({core|mapred|hdfs}-site.xml) available in the various '''conf/''' directories of your Hadoop installation need to be on the '''CLASSPATH''' of your Java application for them to be found and applied. Another way of ensuring that no configured value gets overridden by any job is to mark those properties as final; for example:
+ {{{
+ <property>
+   <name>mapreduce.task.io.sort.mb</name>
+   <value>400</value>
+   <final>true</final>
+ </property>
+ }}}
+ Marking configuration properties as final is a common practice among administrators, as noted in the [[http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/conf/Configuration.html|Configuration]] API docs.
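  As a concrete illustration, a JobConf built while the site files are on the CLASSPATH picks the cluster values up automatically. This is a sketch only, reusing the property from the example above (whether it resolves depends on your Hadoop version and site files):
  {{{
import org.apache.hadoop.mapred.JobConf;

public class ShowConf {
  public static void main(String[] args) {
    // JobConf adds mapred-default.xml and mapred-site.xml to the resources
    // that Configuration already loads (core-default.xml, core-site.xml),
    // provided those files are on the CLASSPATH.
    JobConf conf = new JobConf();
    // Prints the cluster's configured value rather than the shipped default.
    System.out.println(conf.get("mapreduce.task.io.sort.mb"));
  }
}
  }}}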
  == Can I create/write-to HDFS files directly from map/reduce tasks? ==
  Yes. (Clearly, you want this since you need to create/write-to files other than the output file written out by [[http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/OutputCollector.html|OutputCollector]].)
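  One common pattern (a sketch only; the class name and side-file name are hypothetical) is to get a FileSystem handle from the job configuration and create the extra file under the task's work directory, so that speculatively executed duplicate tasks do not collide:
  {{{
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class SideFile {
  // Call from inside a map or reduce task configured with this JobConf.
  static void writeSideFile(JobConf job) throws java.io.IOException {
    FileSystem fs = FileSystem.get(job);
    // The task's work directory is promoted to the job output directory
    // only when the task commits, keeping speculative duplicates apart.
    Path side = new Path(FileOutputFormat.getWorkOutputPath(job), "side-data.txt");
    FSDataOutputStream out = fs.create(side);
    out.writeBytes("extra output produced directly by this task\n");
    out.close();
  }
}
  }}}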
