hadoop-common-dev mailing list archives

From "Yoram Arnon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-88) Configuration: separate client config from server config (and from other-server config)
Date Tue, 21 Mar 2006 02:02:00 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-88?page=comments#action_12371177 ] 

Yoram Arnon commented on HADOOP-88:

I'd also separate the dfs config from the map-reduce config: they have nothing in common,
and dfs has a life of its own, in that it could support applications other than map-reduce.

Taking this one step further, I'd separate the name node config from the data node config, and the job
tracker config from the task tracker config. While it is useful to have them all bundled together when running
on a single node, in a real system they typically run on distinct nodes, and always
in different processes, so separate configs make sense.
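A minimal Java sketch of the per-role split, assuming each daemon reads one shared cluster file plus one file for its own role only. The class name RoleConfig, the example values, and the use of java.util.Properties in place of Hadoop's XML config files are illustrative assumptions, not Hadoop's API:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: a daemon's effective config is the shared cluster
// settings overlaid with the settings for its own role, and nothing else.
class RoleConfig {
    /** Returns shared settings overlaid with role-specific ones (role wins). */
    static Properties forRole(Properties shared, Properties roleSpecific) {
        Properties merged = new Properties();
        merged.putAll(shared);       // cluster-wide settings first
        merged.putAll(roleSpecific); // role-specific values override them
        return merged;
    }

    /** Parses "key=value" lines; stands in for reading a real config file. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // StringReader does not fail in practice
        }
        return p;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Properties shared = parse("fs.default.name=namenode-host:9000\n");
        Properties dataNode = parse("dfs.data.dir=/data/dfs\n");
        Properties jobTracker = parse("mapred.job.tracker=jobtracker-host:9001\n");

        Properties dn = forRole(shared, dataNode);
        Properties jt = forRole(shared, jobTracker);
        System.out.println(dn.getProperty("dfs.data.dir")); // /data/dfs
        System.out.println(jt.getProperty("dfs.data.dir")); // null: not a job tracker setting
    }
}
```

The point of the design is that a job tracker never even sees data node settings such as dfs.data.dir, so a value that only makes sense on one kind of node cannot leak into another.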

As for the client config, it should be really easy:
 - a config file, rather than a directory
 - the config file can reside anywhere, including the same directory as the application, and can have any name
 - no reliance on environment variables
 - the config file is specified on the command line (client -f <file>), allowing concurrent clients for multiple hadoop clusters
 - as few knobs as possible, so it is simple to configure manually
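A rough Java sketch of the client behaviour listed above: a single file named with "-f" on the command line, which may live anywhere and have any name, read without consulting environment variables. The class name ClientConfig and the properties-file format are assumptions made for illustration:

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical sketch of a client that takes its whole config from one file.
class ClientConfig {
    /** Returns the path given after "-f"; throws if the flag or its value is missing. */
    static Path configPath(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-f")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-f requires a file name");
                }
                return Paths.get(args[i + 1]);
            }
        }
        throw new IllegalArgumentException("usage: client -f <file>");
    }

    /** Loads the file; nothing here reads System.getenv(). */
    static Properties load(Path file) throws IOException {
        Properties p = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            p.load(in);
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Two clients can talk to different clusters at the same time
        // just by being given different files, e.g.
        //   java ClientConfig -f clusterA.properties
        //   java ClientConfig -f ./apps/clusterB.properties
        Properties conf = load(configPath(args));
        System.out.println(conf);
    }
}
```

Because the file is passed explicitly, the config travels with the command line rather than with the shell environment, which is what makes concurrent clients for different clusters straightforward.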

> Configuration: separate client config from server config (and from other-server config)
> ---------------------------------------------------------------------------------------
>          Key: HADOOP-88
>          URL: http://issues.apache.org/jira/browse/HADOOP-88
>      Project: Hadoop
>         Type: Wish
>     Reporter: Michel Tourn
>     Assignee: Doug Cutting

> servers = JobTracker, NameNode, TaskTracker, DataNode
> clients = runs JobClient (to submit MapReduce jobs), or runs DFSShell (to browse DFS)
> Server machines are administered together.
> So it is OK to have all server config together (esp file paths and network ports).
> This is stored in hadoop-default.xml or hadoop-mycluster.xml
> Client machines:
> There may be as many client machines as there are MapRed developers.
> The temp space for DFS needs to be writable by the active user.
> So it should be possible to select the client temp space directory per machine and per user.
> (The global /tmp is not an option, as discussed elsewhere: the partition may be full.)
> Current situation: 
> Both the server and the clients have a copy of the server config: hadoop-default.xml
> But the XML property "dfs.data.dir" is being used as a LOCAL directory path
> on both the server machines (Data nodes) and the client machines.
> Effect:
> Exception in thread "main" java.io.IOException: No valid local directories in property:
>  at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:286)
>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.newBackupFile(DFSClient.java:560)
>  ...
>  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:267)
> Current Workaround:
> On the client use hadoop-site.xml to override dfs.data.dir
> One proposed solution:
> For the purpose of JobClient operations, use a different property in place of dfs.data.dir.
> (Ex: dfs.client.data.dir) 
> On the client, set this property in hadoop-site.xml so that it will override hadoop-default.xml
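
A small Java sketch of that first proposal, assuming java.util.Properties layering as a stand-in for hadoop-site.xml overriding hadoop-default.xml. The fallback from dfs.client.data.dir to dfs.data.dir is an assumption added here so that clients already using the current workaround keep working; the class and method names are hypothetical:

```java
import java.util.Properties;

// Hypothetical sketch: client-side code asks for dfs.client.data.dir and
// only falls back to dfs.data.dir when the client property is unset.
class ClientDataDir {
    /** Site values shadow defaults, mimicking hadoop-site.xml over hadoop-default.xml. */
    static Properties layered(Properties defaults, Properties site) {
        Properties conf = new Properties(defaults); // lookups fall back to defaults
        conf.putAll(site);                          // site values take precedence
        return conf;
    }

    /** Directory the client should use for its local DFS temp space. */
    static String clientDataDir(Properties conf) {
        String dir = conf.getProperty("dfs.client.data.dir");
        return (dir != null) ? dir : conf.getProperty("dfs.data.dir");
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("dfs.data.dir", "/data/dfs"); // data node path, may not exist on a client
        Properties site = new Properties();
        site.setProperty("dfs.client.data.dir", "/home/alice/dfs-tmp"); // illustrative client path
        System.out.println(clientDataDir(layered(defaults, site))); // /home/alice/dfs-tmp
    }
}
```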

> Another proposed solution:
> Handle the fact that the world is made of a federation of independent Hadoop systems.
> They can talk to each other (as peers) but they are administered separately.
> Each Hadoop system should have its own separate XML config file.
> Clients should be able to specify the Hadoop system they want to talk to.
> An advantage is that clients can then easily sync their local copy of a given Hadoop
> system's config: just pull its config file.
> In this view of the world, a Job client is also a kind of independent (serverless) Hadoop system.
> In this case the client config file may have its own dfs.data.dir, which is 
> separate from the dfs.data.dir in the server config file.
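For a client's own dfs.data.dir to satisfy the requirement that temp space be writable by the active user, a shared client machine needs one directory per user. A minimal sketch, assuming the client config may contain a ${user.name} placeholder expanded from the JVM's user.name system property; the placeholder convention, the /scratch path, and the class name are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: expand ${name} placeholders in a configured path so
// that each user on a shared client machine gets a writable directory.
class UserDirExpander {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces ${name} with vars.get(name); unknown names are left untouched. */
    static String expand(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' or '\' in the value from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("user.name", System.getProperty("user.name"));
        // e.g. "/scratch/alice/dfs-client" when run by user alice
        System.out.println(expand("/scratch/${user.name}/dfs-client", vars));
    }
}
```

Choosing the partition (here /scratch) stays a per-machine decision in that machine's client file, while the placeholder makes the same file usable by every user on it.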

This message is automatically generated by JIRA.