hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-785) Divide the server and client configurations
Date Wed, 06 Dec 2006 20:15:23 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-785?page=comments#action_12456192 ] 
            
Doug Cutting commented on HADOOP-785:
-------------------------------------

I think this is the right direction.  We logically have a tree: each node corresponds to
a config file that inherits from and overrides its parent's file.

The need is that users be able to easily (1) remember the tree, and (2) know where in the
tree to specify a given property.

I propose that the tree be organized around *where* in the cluster things are used, not *what*
part of the code they configure (that's determined by the parameter name).  This addresses
the primary source of confusion, and thus is what we must clarify.  In particular, we should
distinguish between things used only by servers and things that clients may specify.

I propose the following tree:

default -- read-only defaults for things that clients can override
  site -- site-specific defaults
    server-default -- read-only defaults for server-only configuration
      server -- server overrides for this site
    client -- user overrides

The read-only default files serve as documentation of what parameters can be added to files
lower in the tree.  It is a configuration error to specify something that does not have a
default value above it.
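The layering rule above can be sketched in code.  This is a minimal, hypothetical model (the
class name LayeredConf and its methods are not part of Hadoop): each layer holds its own
properties and points at its parent, lookups walk up the tree, and setting a key that no
default above defines is rejected.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the layered-defaults rule: override files may only
// set keys that some read-only default file above them already defines.
public class LayeredConf {
    private final Map<String, String> props = new LinkedHashMap<>();
    private final LayeredConf parent; // null for a read-only defaults layer

    public LayeredConf(LayeredConf parent) { this.parent = parent; }

    // Defaults layers (parent == null) may introduce new keys;
    // override layers may not.
    public void set(String key, String value) {
        if (parent != null && !isDefined(key)) {
            throw new IllegalArgumentException(
                "No default above this file defines: " + key);
        }
        props.put(key, value);
    }

    private boolean isDefined(String key) {
        return props.containsKey(key)
            || (parent != null && parent.isDefined(key));
    }

    // Lookup checks this layer first, then walks up to the root defaults.
    public String get(String key) {
        String v = props.get(key);
        if (v != null) return v;
        return parent == null ? null : parent.get(key);
    }
}
```

A misspelled parameter name in a site or client file would then fail loudly instead of being
silently ignored, which is the documentation value of the read-only defaults.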

Some examples of what might be in the three non-read-only files:

site -- site-specific defaults
  dfs.namenode.host&port
  dfs.block.size
  dfs.replication
  mapred.jobtracker.host&port
  mapred.map.tasks
  mapred.reduce.tasks

server -- server-specifics
  dfs.name.dir
  dfs.data.dir
  mapred.local.dir

client -- user can override default and site values here, but not server values
  dfs.replication -- user overrides site
  mapred.map.tasks -- user overrides site

From this, we'd have two instantiable classes:

ServerConfiguration
  reads default, site, server-default, server, in that order.
  used by daemons

ClientConfiguration
  reads default, site, client, in that order.
  used by client applications
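As a sketch, the two classes could just load their resources in the stated order.  The file
names below (hadoop-default.xml, hadoop-site.xml, server-default.xml, hadoop-server.xml,
hadoop-client.xml) are assumptions for illustration, and the base class is a stand-in that
only records load order rather than parsing XML.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for a Configuration base class: it only records which
// resources are read, and in what order. Real parsing is omitted.
class Configuration {
    private final List<String> resources = new ArrayList<>();
    protected void addResource(String name) { resources.add(name); }
    public List<String> getResources() { return resources; }
}

// Used by daemons: default, site, server-default, server.
class ServerConfiguration extends Configuration {
    public ServerConfiguration() {
        addResource("hadoop-default.xml");  // read-only client-visible defaults
        addResource("hadoop-site.xml");     // site-specific defaults
        addResource("server-default.xml");  // read-only server-only defaults
        addResource("hadoop-server.xml");   // server overrides for this site
    }
}

// Used by client applications: default, site, client.
class ClientConfiguration extends Configuration {
    public ClientConfiguration() {
        addResource("hadoop-default.xml");
        addResource("hadoop-site.xml");
        addResource("hadoop-client.xml");   // user overrides
    }
}
```

Because later resources override earlier ones, the load order alone encodes the tree: a
client can never shadow a server-only file, since it never reads one.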

Rather than provide subclasses for different parts of the system, we should instead use static
methods.  For example, we might have:

JobConf.setNumMapTasks(ClientConfiguration conf, int count);
HdfsConf.setReplication(ClientConfiguration conf, int replicas);

The point of these is compile-time checking of names and values while keeping the code well
partitioned.  When we add a new HDFS parameter we should not have to change code outside of
HDFS; yet, without multiple inheritance, we cannot have a single object that permits configuration
of HDFS, MapReduce, etc.
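A sketch of that static-method style, assuming a simple string-keyed ClientConfiguration
(the key names and validation below are illustrative, not a spec):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal ClientConfiguration with untyped string storage.
class ClientConfiguration {
    private final Map<String, String> props = new HashMap<>();
    public void set(String key, String value) { props.put(key, value); }
    public String get(String key) { return props.get(key); }
}

// Lives in MapReduce code: only MapReduce knows this key and its legal values.
class JobConf {
    public static void setNumMapTasks(ClientConfiguration conf, int count) {
        if (count < 1) throw new IllegalArgumentException("count must be >= 1");
        conf.set("mapred.map.tasks", Integer.toString(count));
    }
}

// Lives in HDFS code: adding an HDFS parameter touches only HDFS.
class HdfsConf {
    public static void setReplication(ClientConfiguration conf, int replicas) {
        if (replicas < 1) throw new IllegalArgumentException("replicas must be >= 1");
        conf.set("dfs.replication", Integer.toString(replicas));
    }
}
```

Each subsystem gets typed, checked setters over the same shared configuration object, with
no subclass-per-subsystem and no multiple inheritance needed.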

Thoughts?


  

> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: http://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>             Fix For: 0.10.0
>
>
> The configuration system is easy to misconfigure and I think we need to strongly divide
the server from client configs. 
> An example of the problem was a configuration where the task tracker had a hadoop-site.xml
that set mapred.reduce.tasks to 1. Therefore, the job tracker had the right number of reduces,
but the map task thought there was a single reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
> Note, in particular, that nothing corresponds to hadoop-site.xml, which overrides both
client and server configs. Furthermore, the properties from the *-default.xml files should
never be saved into the job.xml.

