hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: How do people keep their client configurations in sync with the remote cluster(s)
Date Thu, 15 May 2008 15:56:27 GMT
Allen Wittenauer wrote:
> On 5/15/08 5:05 AM, "Steve Loughran" <stevel@apache.org> wrote:
>> I have a question for users: how do they ensure their client apps have
>> configuration XML files that are kept up to date?
>     We control the client as well as the servers, so it all gets pushed at
> once. :)

Yes, but you use NFS, so you have your own problems, like the log
message "NFS Server not responding still trying" appearing across
everyone's machines simultaneously, which is to be feared almost as much
as when ClearCase announces that its filesystem is offline.

>     That said, we're starting to allow clients that aren't controlled by us
> to talk to our grids.  We'll likely re-bundle our configs into digestible
> packages for them at some point and then have flag days.

Mmm. But then you have the problem that once you change your settings,
all code that compiles the old settings into its JAR breaks.

>> I'm thinking of looking at what it would take for a job submitter to ask
>> the tracker for its config data, to get things like the various
>> directory bases from the cluster, instead of being compiled into the
>> client. Then the management problem becomes one of keeping the cluster
>> configuration under control, which is a much easier proposition.
>     But I like this idea a lot.  The tricky part comes when clients really
> do need to modify something (# of mappers, heap size, whatever).

Yes. I think the jobs need to have the right to override most of a site's
settings, but I don't see why they should have the responsibility of
getting all those settings in the first place, at build time. At the
very least, they should be retrieved at run time.
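
To make the layering concrete, here is a rough sketch in plain Java of
job overrides applied on top of site settings that were fetched at run
time. It is only an illustration, not anything Hadoop ships: the class
LayeredConfig, the merge() method, and the idea of a "locked keys" set
that the site refuses to let jobs override are all hypothetical names
made up for this example, and how the site settings reach the client is
left out entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: layers job overrides on top of site settings. */
final class LayeredConfig {
    private LayeredConfig() {}

    /**
     * Returns a new map holding the site settings with the job's overrides
     * applied. Keys in lockedKeys (an assumption of this sketch) always keep
     * the site value, so a job can override most settings but not all.
     * Neither input map is modified.
     */
    static Map<String, String> merge(Map<String, String> siteSettings,
                                     Map<String, String> jobOverrides,
                                     Set<String> lockedKeys) {
        // Start from whatever the cluster handed back at run time.
        Map<String, String> effective = new HashMap<>(siteSettings);
        for (Map.Entry<String, String> override : jobOverrides.entrySet()) {
            // Apply the job's value unless the site has locked that key.
            if (!lockedKeys.contains(override.getKey())) {
                effective.put(override.getKey(), override.getValue());
            }
        }
        return effective;
    }
}
```

With something like this, a job that wants more mappers sets only that
one key, and everything else (directory bases and so on) comes from the
cluster rather than from whatever was baked into the JAR at build time.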

Having looked at http://issues.apache.org/jira/browse/HADOOP-3135, I
can see that it addresses a core issue: you need to know the cluster's
filesystem layout.

