hadoop-common-dev mailing list archives

From "Michel Tourn (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-88) Configuration: separate client config from server config (and from other-server config)
Date Fri, 17 Mar 2006 01:16:01 GMT
Configuration: separate client config from server config (and from other-server config)

         Key: HADOOP-88
         URL: http://issues.apache.org/jira/browse/HADOOP-88
     Project: Hadoop
        Type: Wish
    Reporter: Michel Tourn
 Assigned to: Doug Cutting 

servers = JobTracker, NameNode, TaskTracker, DataNode
clients = machines that run JobClient (to submit MapReduce jobs) or DFSShell (to browse DFS)

Server machines are administered together,
so it is acceptable to keep all server config in one place (especially file paths and network ports).
This config is stored in hadoop-default.xml or hadoop-mycluster.xml.

Client machines:
There may be as many client machines as there are MapReduce developers.
The DFS temp space on a client must be writable by the active user,
so it should be possible to choose the client temp directory per machine and per user.
(The global /tmp is not an option, as discussed elsewhere: that partition may be full.)
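A minimal sketch of choosing a client temp directory per machine and per user, assuming a hypothetical machine-specific base directory (passed as an argument here) and using the JVM's standard user.name property for the active user. The class and method names are illustrative only; they are not part of Hadoop.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ClientTempDir {
    // Hypothetical helper: derive a per-user scratch directory under a
    // machine-specific base directory, so each developer sharing a client
    // machine gets a directory of their own that they can write to.
    static Path clientTempDir(String machineBaseDir, String userName) {
        return Paths.get(machineBaseDir, "hadoop-" + userName);
    }

    public static void main(String[] args) {
        // "/local/scratch" is an example base directory, not a Hadoop default.
        String base = args.length > 0 ? args[0] : "/local/scratch";
        String user = System.getProperty("user.name"); // the active user
        System.out.println(clientTempDir(base, user));
    }
}
```

Keeping the base directory configurable per machine addresses the "partition may be full" concern, while the user-name suffix keeps developers on the same machine from colliding.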

Current situation:
Both the servers and the clients have a copy of the server config, hadoop-default.xml.
But the XML property "dfs.data.dir" is used as a LOCAL directory path
on both the server machines (DataNodes) and the client machines,
so a client with no usable local dfs.data.dir fails when it submits a job:

Exception in thread "main" java.io.IOException: No valid local directories in property: dfs.data.dir
 at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:286)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.newBackupFile(DFSClient.java:560)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:267)

Current workaround:
On the client, override dfs.data.dir in hadoop-site.xml.
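The override layering behind this workaround can be modelled with java.util.Properties, whose defaults constructor gives the same "site value wins, otherwise fall back to the default" lookup. This is a sketch of the semantics only, not Hadoop's Configuration class; the directory paths are example values.

```java
import java.util.Properties;

public class SiteOverride {
    // Builds the effective client configuration: entries in the site layer
    // (standing in for hadoop-site.xml) shadow those in the default layer
    // (standing in for hadoop-default.xml).
    static Properties effectiveConfig(Properties defaults, Properties site) {
        Properties merged = new Properties(defaults); // unmatched keys fall back to defaults
        merged.putAll(site);                          // site entries win
        return merged;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("dfs.data.dir", "/hadoop/dfs/data"); // server-side path (example)

        Properties site = new Properties();
        site.setProperty("dfs.data.dir", "/home/alice/hadoop-tmp"); // client override (example)

        Properties conf = effectiveConfig(defaults, site);
        System.out.println(conf.getProperty("dfs.data.dir")); // prints /home/alice/hadoop-tmp
    }
}
```

The drawback is visible in the sketch: the client has to redefine a key whose meaning is "DataNode storage", purely to get a writable local directory.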

One proposed solution:

For JobClient operations, use a different property in place of dfs.data.dir
(e.g. dfs.client.data.dir).
On the client, set this property in hadoop-site.xml so that it overrides hadoop-default.xml.
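A sketch of this first proposal, again using java.util.Properties to stand in for the default and site layers. dfs.client.data.dir is only the name suggested in the proposal, not an existing Hadoop property, and the clientScratchDir helper and paths are hypothetical. The point it shows is that client-side code reads only the client key, so the servers' dfs.data.dir is never treated as a local path on a client machine.

```java
import java.util.Properties;

public class ClientDataDir {
    // dfs.data.dir is the existing server (DataNode) key; dfs.client.data.dir
    // is the client-only key suggested in the proposal (hypothetical).
    static final String SERVER_DATA_DIR = "dfs.data.dir";
    static final String CLIENT_DATA_DIR = "dfs.client.data.dir";

    // Hypothetical client-side lookup, e.g. for the local backup file that
    // DFSClient creates in the stack trace above: it reads only the client
    // key and never consults dfs.data.dir.
    static String clientScratchDir(Properties conf) {
        String dir = conf.getProperty(CLIENT_DATA_DIR);
        if (dir == null) {
            throw new IllegalStateException(CLIENT_DATA_DIR + " is not set");
        }
        return dir;
    }

    public static void main(String[] args) {
        // Default layer (hadoop-default.xml): server path plus a client default.
        Properties defaults = new Properties();
        defaults.setProperty(SERVER_DATA_DIR, "/hadoop/dfs/data");
        defaults.setProperty(CLIENT_DATA_DIR, "/local/hadoop-client");

        // Site layer (hadoop-site.xml on one developer's machine).
        Properties conf = new Properties(defaults);
        conf.setProperty(CLIENT_DATA_DIR, "/home/alice/hadoop-tmp");

        System.out.println(clientScratchDir(conf)); // prints /home/alice/hadoop-tmp
    }
}
```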

Another proposed solution:

Handle the fact that the world is made of a federation of independent Hadoop systems.
They can talk to each other (as peers), but they are administered separately.
Each Hadoop system should have its own separate XML config file.
Clients should be able to specify the Hadoop system they want to talk to.
An advantage is that a client can then easily sync its local copy of a given Hadoop system's config:
it just pulls that system's config file.

In this view of the world, a job client is also a kind of independent (serverless) Hadoop system.
In this case the client config file may have its own dfs.data.dir, which is
separate from the dfs.data.dir in the server config file.
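A minimal sketch of the federation proposal, assuming a hypothetical client-side registry that keeps one configuration per named Hadoop system plus a separate configuration for the client itself. The hadoop-&lt;name&gt;.xml naming convention is an assumption extrapolated from the hadoop-mycluster.xml example; none of these classes or methods exist in Hadoop.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class HadoopSystems {
    // Hypothetical registry: one configuration per independently
    // administered Hadoop system, keyed by a name the client chooses.
    private final Map<String, Properties> systems = new HashMap<>();

    // The client's own configuration, treated as a separate "serverless"
    // Hadoop system with its own local settings (e.g. its own dfs.data.dir).
    private final Properties clientConf = new Properties();

    // Hypothetical naming convention, following the hadoop-mycluster.xml
    // example: the file a client would pull to sync with a given system.
    static String configFileFor(String systemName) {
        return "hadoop-" + systemName + ".xml";
    }

    void register(String systemName, Properties serverConf) {
        systems.put(systemName, serverConf);
    }

    // Look up the configuration of the system the client wants to talk to.
    Properties system(String systemName) {
        Properties conf = systems.get(systemName);
        if (conf == null) {
            throw new IllegalArgumentException("unknown Hadoop system: " + systemName);
        }
        return conf;
    }

    Properties client() {
        return clientConf;
    }

    public static void main(String[] args) {
        HadoopSystems hs = new HadoopSystems();

        Properties cluster = new Properties();
        cluster.setProperty("dfs.data.dir", "/hadoop/dfs/data"); // DataNode path (example)
        hs.register("mycluster", cluster);

        hs.client().setProperty("dfs.data.dir", "/home/alice/hadoop-tmp"); // client-local path (example)

        System.out.println(configFileFor("mycluster"));                         // hadoop-mycluster.xml
        System.out.println(hs.system("mycluster").getProperty("dfs.data.dir")); // /hadoop/dfs/data
        System.out.println(hs.client().getProperty("dfs.data.dir"));            // /home/alice/hadoop-tmp
    }
}
```

Because the client's settings live in their own object, the same key can carry a client-local value without touching, or being confused with, any server's config.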

