hadoop-common-user mailing list archives

From Aaron Kimball <aa...@cloudera.com>
Subject Re: How does hadoop deal with hadoop-site.xml?
Date Wed, 19 Aug 2009 23:38:53 GMT
Hi Inifok,

This is a confusing aspect of Hadoop, I'm afraid.

Settings are divided into two categories: "per-job" and "per-node."
Unfortunately, which settings fall into which category isn't documented.

Some settings apply only to the node on which they are configured. For
example, if you set fs.default.name on a node to "hdfs://some.namenode:8020/",
then any FS connections you make from that node will go to some.namenode. If
a different machine in your cluster has fs.default.name set to
hdfs://other.namenode, then that machine will connect to a different
NameNode.

Another example of a per-node setting is
mapred.tasktracker.map.tasks.maximum, which tells a tasktracker the maximum
number of map tasks it should run in parallel. Each tasktracker is free to
configure this value differently, e.g., if you have a mix of quad-core and
eight-core machines. Similarly, dfs.data.dir tells a datanode where its data
directories should be kept; naturally, this can vary from machine to machine.
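To make the per-node idea concrete, here is a toy Java model. The class
NodeConfigModel and its methods are hypothetical, not part of Hadoop; the
assumption is simply that each machine's hadoop-site.xml can be pictured as a
map, and every node-level setting is looked up only in the local map:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-node settings. Assumption: each daemon (datanode,
// tasktracker, client) resolves these keys from the hadoop-site.xml on its
// own machine. NodeConfigModel is hypothetical, not part of Hadoop.
class NodeConfigModel {
    // Stand-in for the parsed contents of one machine's hadoop-site.xml.
    static Map<String, String> siteFile(String defaultFs, int maxMapTasks) {
        Map<String, String> site = new HashMap<String, String>();
        site.put("fs.default.name", defaultFs);
        site.put("mapred.tasktracker.map.tasks.maximum",
                 Integer.toString(maxMapTasks));
        return site;
    }

    // The filesystem this node connects to, read from its local file only.
    static String defaultFs(Map<String, String> localSite) {
        return localSite.get("fs.default.name");
    }

    // How many map tasks this node's tasktracker runs in parallel.
    static int mapSlots(Map<String, String> localSite) {
        return Integer.parseInt(
            localSite.get("mapred.tasktracker.map.tasks.maximum"));
    }

    public static void main(String[] args) {
        Map<String, String> quadCore = siteFile("hdfs://some.namenode:8020/", 4);
        Map<String, String> eightCore = siteFile("hdfs://some.namenode:8020/", 8);
        Map<String, String> stray = siteFile("hdfs://other.namenode", 4);

        System.out.println(mapSlots(quadCore));  // 4
        System.out.println(mapSlots(eightCore)); // 8
        System.out.println(defaultFs(stray));    // hdfs://other.namenode
    }
}
```

Note that nothing in this model merges files across machines: two nodes can
share a NameNode yet run different numbers of map slots, and a node with a
different fs.default.name simply talks to a different NameNode.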

Other settings apply to a job as a whole. These settings are determined
when you submit the job. So if you write
conf.setInt("mapred.reduce.parallel.copies", 20) in your code, that will be
the setting for the job. Settings that you don't explicitly put in your code
are drawn from the hadoop-site.xml file on the machine where the job is
submitted from. That is why, in your case, changing the value on machine A
takes effect while changing it on machine B (the JobTracker) does not.
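The per-job behavior, and the symptom in the original question, can be
sketched the same way. In this toy model (an assumption for illustration, not
Hadoop's actual code; JobConfigModel is hypothetical), the job configuration
is assembled on the submitting machine by layering the built-in defaults, then
that machine's hadoop-site.xml, then values set explicitly in the job's code,
with later layers winning. The JobTracker machine's file is never an input, so
raising the value only on machine B leaves the job at the default of 5:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of per-job settings. Assumption: the job's configuration is
// built on the submitting machine (machine A) and shipped with the job.
// JobConfigModel is hypothetical, not part of Hadoop.
class JobConfigModel {
    static Map<String, String> buildJobConf(Map<String, String> defaults,
                                            Map<String, String> submitterSite,
                                            Map<String, String> setInCode) {
        Map<String, String> job = new LinkedHashMap<String, String>(defaults);
        job.putAll(submitterSite); // machine A's hadoop-site.xml
        job.putAll(setInCode);     // e.g. conf.setInt(...) in the driver
        // Machine B's hadoop-site.xml is deliberately not a parameter.
        return job;
    }

    // Helper: a one-entry "file".
    static Map<String, String> one(String key, String value) {
        Map<String, String> m = new LinkedHashMap<String, String>();
        m.put(key, value);
        return m;
    }

    public static void main(String[] args) {
        String key = "mapred.reduce.parallel.copies";
        Map<String, String> defaults = one(key, "5");
        Map<String, String> empty = new LinkedHashMap<String, String>();

        // Only B was edited, so A contributes nothing: default wins.
        System.out.println(buildJobConf(defaults, empty, empty).get(key)); // 5
        // A's site file sets 10.
        System.out.println(buildJobConf(defaults, one(key, "10"), empty).get(key)); // 10
        // Code overrides A's site file.
        System.out.println(buildJobConf(defaults, one(key, "10"), one(key, "20")).get(key)); // 20
    }
}
```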

In general, I strongly recommend you save yourself some pain by keeping your
configuration files as identical as possible across machines :)
Good luck,
- Aaron

On Wed, Aug 19, 2009 at 7:21 AM, yang song <hadoop.inifok@gmail.com> wrote:

> Hello, everybody
>    I am puzzled about setting properties in hadoop-site.xml.
>    Suppose I submit the job from machine A, and the JobTracker runs on
> machine B, so there are two hadoop-site.xml files. Now, I increase
> "mapred.reduce.parallel.copies" (e.g. to 10) on machine B because I want
> to make the copy phase faster. However, "mapred.reduce.parallel.copies" in
> the WebUI is still 5. When I increase it on machine A, it changes. So I am
> very puzzled: why doesn't it work when I change it on B? What's more, when
> I add some properties on B, those properties do show up in the WebUI. So
> why can't I change properties through machine B? Must some properties be
> changed through A and others through B?
>    Thank you!
>    Inifok
