hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "HowToConfigure" by OwenOMalley
Date Fri, 30 Jun 2006 23:36:21 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by OwenOMalley:
http://wiki.apache.org/lucene-hadoop/HowToConfigure

------------------------------------------------------------------------------
  = How To Configure Hadoop =
  
+ == Primary XML Files ==
+ 
- Hadoop is configured with a set of files. The files are loaded in order with the lower files taking priority over the higher ones:
+ Hadoop is configured with a set of files. The files are loaded in the order listed in the table below, with the lower files in the table overriding the higher ones:
  
  || '''Filename''' || '''Description''' ||
  || hadoop-default.xml || Generic default values ||
@@ -10, +12 @@

  || job.xml || Configuration for a specific map/reduce job ||
  || hadoop-site.xml || Site-specific values that cannot be modified by the job ||
  
- == Look up path ==
- == Look up path ==
+ === Lookup path ===
  
  Configuration files are found via Java's classpath. Only the first instance of each file is used. The bin/hadoop script adds $HADOOP_CONF_DIR to the front of the classpath. When installing Hadoop on a cluster, it is best to use a conf directory outside of the distribution; that allows you to easily update the release on the cluster without changing your configuration by mistake.
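+ 
+ For example, a simplified sketch of the classpath setup that bin/hadoop performs (the real script adds many more entries; treat this as illustrative only):
+ 
+ {{{
+ #!/bin/sh
+ # simplified sketch: the conf directory goes first, so its files win the
+ # "first instance on the classpath" lookup described above
+ CLASSPATH="${HADOOP_CONF_DIR}"
+ for jar in "${HADOOP_HOME}"/hadoop-*.jar; do
+   CLASSPATH="${CLASSPATH}:${jar}"
+ done
+ java -classpath "$CLASSPATH" "$@"
+ }}}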
  
- == Hadoop-default.xml ==
+ === Hadoop-default.xml ===
  
  This file has the default values for many of the configuration variables that are used by
Hadoop. This file should never be in $HADOOP_CONF_DIR so that the version in the hadoop-*.jar
is used. (Otherwise, if a variable is added to this file in a new release, you won't have
it defined.)
  
- == mapred-default.xml ==
+ === mapred-default.xml ===
  
- This file should contain the majority of your customization of hadoop. Useful variables
are:
+ This file should contain the majority of your site's customization of Hadoop. Although this
file is named mapred, it is really used for both the map/reduce and DFS servers, because they
all use !JobConf objects rather than Configuration objects.
+ 
+ Some useful variables are:
  
  || '''Name''' || '''Meaning''' ||
  || dfs.block.size || size in bytes of each data block in DFS ||
@@ -34, +38 @@

  || mapred.output.compress || Should the reduce outputs be compressed? ||
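+ 
+ For example, to override these two variables site-wide, mapred-default.xml might contain (the values here are illustrative, not recommendations):
+ 
+ {{{
+ <?xml version="1.0"?>
+ <configuration>
+ 
+ <property>
+   <name>dfs.block.size</name>
+   <value>134217728</value>  <!-- 128 MB blocks -->
+ </property>
+ 
+ <property>
+   <name>mapred.output.compress</name>
+   <value>true</value>  <!-- compress reduce outputs -->
+ </property>
+ 
+ </configuration>
+ }}}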
  
  
+ === job.xml ===
  
- == job.xml ==
+ This file is never created explicitly by the user. The map/reduce application creates a [http://wiki.apache.org/lucene-hadoop/JobConfFile JobConf], which is serialized when the job is submitted.
  
- This file is never created explicitly by the user. The map/reduce application creates a JobConf, which is serialized when the job is submitted.
- 
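+ For illustration, a minimal sketch of the application side (class names and paths here are hypothetical, and API details may vary by release):
+ 
+ {{{
+ import org.apache.hadoop.fs.Path;
+ import org.apache.hadoop.mapred.JobClient;
+ import org.apache.hadoop.mapred.JobConf;
+ 
+ public class MyJob {
+   public static void main(String[] args) throws Exception {
+     // the framework serializes this JobConf into job.xml at submission time
+     JobConf job = new JobConf(MyJob.class);
+     job.setJobName("my-job");
+     job.setInputPath(new Path("in"));      // hypothetical input directory
+     job.setOutputPath(new Path("out"));    // hypothetical output directory
+     job.setMapperClass(MyMapper.class);    // user-supplied Mapper (hypothetical)
+     job.setReducerClass(MyReducer.class);  // user-supplied Reducer (hypothetical)
+     JobClient.runJob(job);                 // submit and wait for completion
+   }
+ }
+ }}}
+ 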
- == hadoop-site.xml ==
+ === hadoop-site.xml ===
  
  This file overrides any settings in the job.xml and therefore should be very minimal. Usually it just contains the addresses of the NameNode and JobTracker, and the ports and working directories for the various servers.
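+ 
+ For example, a minimal hadoop-site.xml might look like this (hostnames and port numbers are placeholders for your cluster):
+ 
+ {{{
+ <?xml version="1.0"?>
+ <configuration>
+ 
+ <property>
+   <name>fs.default.name</name>
+   <value>namenode.example.com:9000</value>  <!-- NameNode address -->
+ </property>
+ 
+ <property>
+   <name>mapred.job.tracker</name>
+   <value>jobtracker.example.com:9001</value>  <!-- JobTracker address -->
+ </property>
+ 
+ </configuration>
+ }}}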
  
+ == Environment Variables ==
+ 
+ For the most part, you should only need to define $HADOOP_CONF_DIR. Other environment variables
are defined in $HADOOP_CONF_DIR/hadoop-env.sh. 
+ 
+ Variables in hadoop-env.sh include:
+ 
+ || '''Name''' || '''Meaning''' ||
+ || JAVA_HOME || Root of the Java installation ||
+ || HADOOP_HEAPSIZE || MB of heap for the servers ||
+ || HADOOP_IDENT_STRING || User name of the cluster ||
+ || HADOOP_OPTS || Extra arguments to the JVM ||
+ || HADOOP_HOME || Hadoop release directory ||
+ || HADOOP_LOG_DIR || Directory for log files ||
+ || HADOOP_PID_DIR || Directory to store the PID for the servers ||
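+ 
+ For example, a minimal hadoop-env.sh might contain (all paths are placeholders):
+ 
+ {{{
+ # where the JDK lives
+ export JAVA_HOME=/usr/local/java
+ # give each server a 1000 MB heap
+ export HADOOP_HEAPSIZE=1000
+ # keep logs outside of the release directory
+ export HADOOP_LOG_DIR=/var/hadoop/logs
+ }}}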
+ 
+ == Log4j Configuration ==
+ 
+ Hadoop uses Log4j for its logging by default. Log4j is configured via log4j.properties on the classpath. This file defines both what is logged and where. For applications, the default root logger is "INFO,console", which logs all messages at level INFO and above to the console's stderr. Servers log to "INFO,DRFA", which logs to a file that is rolled daily. Log files are named $HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-<server>.log.
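+ 
+ For example, the console configuration corresponds to log4j.properties lines along these lines (a sketch using Log4j's standard appender classes, not copied from the shipped file):
+ 
+ {{{
+ # log INFO and above to stderr
+ log4j.rootLogger=INFO,console
+ log4j.appender.console=org.apache.log4j.ConsoleAppender
+ log4j.appender.console.target=System.err
+ log4j.appender.console.layout=org.apache.log4j.PatternLayout
+ log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
+ }}}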
+ 
+ For Hadoop developers, it is often convenient to get additional logging from particular classes. If you are working on the TaskTracker, for example, you would likely want
+  log4j.logger.org.apache.hadoop.mapred.!TaskTracker=DEBUG
+ in your log4j.properties.
+ 
