hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-785) Divide the server and client configurations
Date Thu, 02 Aug 2007 18:16:53 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-785:
---------------------------------

    Fix Version/s: 0.15.0
         Assignee: Arun C Murthy  (was: Milind Bhandarkar)

I'll try and take this forward from now on...

After some hallway discussions, here are some of my ideas; clearly they are fairly nascent and open to discussion...

The *whys* for this issue are fairly clear and I'm not getting into them again...

The *hows* are a mixture of ideas already thrown around here and some of my own... (so yeah,
clearly there is a fair amount of plagiarism involved! *smile* ).

-*-*-

Proposal:

Like all previous proposals, I'm all for splitting up client and server configs; this would let the administrators of large clusters change them independently (e.g. configure dfs.client.buffer.dir separately on the actual cluster and on the submission nodes; this is important in cases where the submission nodes lie outside the Hadoop cluster itself). I'm also for separating configuration variables according to the contexts in which they are used, and not which part of the code they configure.

One break from the past in this proposal is to split up hadoop-site.xml into hadoop-server.xml & hadoop-client.xml, to reflect that we have separate configs for servers (hadoop daemons) and clients (job-clients or dfs-clients). Both of these start out empty, just as hadoop-site.xml does today.
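
For concreteness, both new files would start out as the usual empty configuration skeleton (the same boilerplate an empty hadoop-site.xml carries today; the comment is mine):

{noformat}
<?xml version="1.0"?>
<!-- hadoop-server.xml: overrides for hadoop daemons go here -->
<configuration>

</configuration>
{noformat}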

Thus the class hierarchy would look like:

{{Configuration}} (reads hadoop-default.xml)

{{ServerConfiguration}} (reads hadoop-default.xml & hadoop-server.xml)

{{ClientConfiguration}} (reads hadoop-default.xml & hadoop-client.xml)

{{JobConfiguration}} (reads hadoop-default.xml & hadoop-client.xml & maybe a user-defined
job config file)

Thus hadoop daemons (i.e. servers) use only ServerConfiguration, and clients (e.g. DFSClient) use only ClientConfiguration, so that neither gets polluted by the other. Clearly mapred jobs use JobConfiguration, as they do today.
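
To make the shape of that hierarchy concrete, here is a minimal sketch in Java. This is purely illustrative: the constructors and the {{loadResource}} helper are assumptions standing in for however {{Configuration}} actually reads its XML resources, not API from this patch.

{noformat}
// Illustrative sketch only; loadResource() is a hypothetical helper.
class Configuration {
  Configuration() {
    loadResource("hadoop-default.xml");    // defaults for everyone
  }
  protected void loadResource(String name) {
    // parse the named XML resource and fold its properties in
  }
}

class ServerConfiguration extends Configuration {
  ServerConfiguration() {
    super();                               // hadoop-default.xml
    loadResource("hadoop-server.xml");     // server-side overrides
  }
}

class ClientConfiguration extends Configuration {
  ClientConfiguration() {
    super();                               // hadoop-default.xml
    loadResource("hadoop-client.xml");     // client-side overrides
  }
}

class JobConfiguration extends ClientConfiguration {
  JobConfiguration() {
    super();                               // hadoop-default.xml + hadoop-client.xml
  }
  JobConfiguration(String jobFile) {
    this();
    loadResource(jobFile);                 // optional user-defined job config
  }
}
{noformat}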

To ensure users know where to override specific config values (i.e. should I override fs.default.name in hadoop-server.xml or hadoop-client.xml to make sure my clients pick up the *right* values?), I propose we add a {{context}} tag to {{property}} (or just an attribute), whose value is one of server, client or job.

E.g.

{noformat}
<property> 
  <context>client</context>
  <name>dfs.client.buffer.dir</name>
  <value>/tmp/dfs/bufdir</value>
  <description>...</description>
</property>
{noformat}
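
If we went with an attribute rather than a nested tag, the same property could carry the context inline (same hypothetical markup, just in attribute form):

{noformat}
<property context="client">
  <name>dfs.client.buffer.dir</name>
  <value>/tmp/dfs/bufdir</value>
  <description>...</description>
</property>
{noformat}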

This could alternatively be done via a comment for each {{property}} in hadoop-default.xml
but I reckon the tag (or attribute) sort of institutionalizes it... *smile*.

Similarly we could also add a {{level}} tag (or attribute) to each property, one of expert, intermediate or beginner, to let users know how much of an effect changing a specific knob has... (again, this could be just a comment, and at the risk of repeating myself... yadda, yadda, yadda).
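
Putting both proposed annotations together, a fully-annotated property in hadoop-default.xml might then look like this (again, the {{context}} and {{level}} tags are the proposal, not existing markup):

{noformat}
<property>
  <context>client</context>
  <level>expert</level>
  <name>dfs.client.buffer.dir</name>
  <value>/tmp/dfs/bufdir</value>
  <description>...</description>
</property>
{noformat}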

-*-*-

Overall, the idea is to have a simple, reasonably error-resistant configuration system without falling into the trap of over-generalising it.

Thoughts?


> Divide the server and client configurations
> -------------------------------------------
>
>                 Key: HADOOP-785
>                 URL: https://issues.apache.org/jira/browse/HADOOP-785
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 0.9.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>
> The configuration system is easy to misconfigure and I think we need to strongly divide the server configs from the client configs.
> An example of the problem was a configuration where the task tracker had a hadoop-site.xml that set mapred.reduce.tasks to 1. Therefore, the job tracker had the right number of reduces, but the map task thought there was a single reduce. This led to a hard-to-diagnose failure.
> Therefore, I propose separating out the configuration types as:
> class Configuration;
> // reads site-default.xml, hadoop-default.xml
> class ServerConf extends Configuration;
> // reads hadoop-server.xml, $super
> class DfsServerConf extends ServerConf;
> // reads dfs-server.xml, $super
> class MapRedServerConf extends ServerConf;
> // reads mapred-server.xml, $super
> class ClientConf extends Configuration;
> // reads hadoop-client.xml, $super
> class JobConf extends ClientConf;
> // reads job.xml, $super
> Note in particular that nothing corresponds to hadoop-site.xml, which overrides both client and server configs. Furthermore, the properties from the *-default.xml files should never be saved into the job.xml.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

