hadoop-common-user mailing list archives

From Matthew Foley <ma...@yahoo-inc.com>
Subject Re: Hi, I'm graduate student and I have one question, multiple hadoop install.
Date Tue, 01 Mar 2011 19:26:47 GMT
Hi Sungho,
Here is a "recipe" for running multiple nodes on a single server, posted to this list on
Sept. 15, 2010:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3C8A898C33-DC4E-418C-ADC0-5689D434B05E@yahoo-inc.com%3E

For v22 and later, the world has been split into three parts; where there was formerly HADOOP_HOME,
there is now HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, and HADOOP_MAPRED_HOME, and in the default
configuration each of them has its own "conf/" subdirectory.  However, it is acceptable to
pile all the contents of the three conf directories into a single conf directory somewhere
else (the only name conflict is "configuration.xsl" which can be shared), set an environment
variable $HADOOP_CONF_DIR to point to it, and pass that value in with the "--config" option
whenever you launch processes with bin/hadoop or bin/hdfs.
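
A minimal sketch of that merge step, assuming the three conf directories sit at their default
locations. The function name merge_conf_dirs and the destination path in the usage comment are
made up for illustration, not part of Hadoop:

```shell
#!/bin/sh
# merge_conf_dirs DEST SRC...
# Copy the contents of each SRC conf directory into DEST. Later sources
# overwrite earlier ones, which is harmless for the shared configuration.xsl.
merge_conf_dirs() {
    dest="$1"; shift
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        # Skip sources that do not exist rather than failing.
        [ -d "$src" ] || continue
        cp -R "$src"/. "$dest"/ || return 1
    done
}

# Example use (hypothetical destination path):
#   merge_conf_dirs "$HOME/hadoop-conf" \
#       "$HADOOP_COMMON_HOME/conf" "$HADOOP_HDFS_HOME/conf" "$HADOOP_MAPRED_HOME/conf"
#   export HADOOP_CONF_DIR="$HOME/hadoop-conf"
```

After that, $HADOOP_CONF_DIR is the value to pass with "--config".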

Now, the above recipe assumes you want multiple nodes from ONE cluster running on a single
server.  I suggest you start with that and get it working, so you understand the hdfs-site.xml
file and how it is used.

You seem to be asking to run multiple CLUSTERS on a single server.  I believe the same mechanism
will work (pointing different "node" invocations at different config directories), but you
will need to make several more changes in the $HADOOP_CONF_DIR/hdfs-site.xml files, to create
different namenode configurations as well as the different datanode configurations addressed
in the recipe.  Please look at the documentation for which parameters to change.

A couple of comments:
-  You probably can't run two namenodes simultaneously in the same server, unless it has a
huge amount of memory and you don't care about performance.  But you can have two different
configurations stored, and run them at different times.
-  If the ONLY difference in the two clusters is the number of datanodes, you actually don't
have to have different namenode configurations.  You can just configure 10 datanodes, and
then sometimes run only 5 of them (clearing storage in between test runs, of course, so it
doesn't look like you lost half your stored blocks!).  This is because namenodes have no configuration
for which or how many datanodes to expect; a namenode simply accepts registration from any datanode
that initiates communication with it.
-  Your statement "I can control number of datanode by change conf and restart" is therefore
not entirely correct.  Each datanode launched has to be pointed at its own config, but there
is no place in the config to define how many datanodes to launch. (This is partly because
running multiple nodes on a single server is not considered normal for a production environment,
even though it is useful for a test environment.) You may be thinking of the "slaves" file,
which is used by some launch scripts, but that is a tool to assist users in launching clusters,
not part of namenode configuration, and is also not really oriented toward launching multiple
nodes in a single server, if you read the scripts.

If you want launch scripts to help you locally launch different numbers of nodes with different
configs, you'll have to write them yourself, but they're really easy.  They just consist of
multiple lines that look like
    $HADOOP_COMMON_HOME/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script $HADOOP_HDFS_HOME/bin/hdfs start <datanode|namenode> [args]
with different values of $HADOOP_CONF_DIR for each line.

The same lines with "stop" instead of "start" will give you a well-behaved kill script.
As always, you have to start and stop each node under the appropriate user ID so that it has
the necessary read/write and I/O access permissions.
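
A rough sketch of such a script, under some assumptions that are mine rather than Hadoop's:
each node's config directory lives under $CONF_ROOT and is named nn, dn1, dn2, and so on, and a
DRY_RUN switch (on by default here) only prints the commands instead of running them. The
function names are made up for illustration.

```shell
#!/bin/sh
# Start or stop one namenode plus N datanodes on a single server,
# each daemon pointed at its own config directory.

CONF_ROOT="${CONF_ROOT:-/tmp/hadoop-confs}"   # hypothetical location of the config dirs
DRY_RUN="${DRY_RUN:-1}"                       # 1 = print commands only, 0 = execute

# daemon_cmd ACTION ROLE CONF_DIR: print the command line for one daemon.
daemon_cmd() {
    printf '%s --config %s --script %s %s %s\n' \
        "$HADOOP_COMMON_HOME/bin/hadoop-daemon.sh" "$3" \
        "$HADOOP_HDFS_HOME/bin/hdfs" "$1" "$2"
}

# run_daemon ACTION ROLE CONF_DIR: print the command, or run it when DRY_RUN=0.
run_daemon() {
    if [ "$DRY_RUN" = "1" ]; then
        daemon_cmd "$@"
    else
        "$HADOOP_COMMON_HOME/bin/hadoop-daemon.sh" --config "$3" \
            --script "$HADOOP_HDFS_HOME/bin/hdfs" "$1" "$2"
    fi
}

# launch_cluster ACTION COUNT: ACTION is start or stop, COUNT is the number of datanodes.
launch_cluster() {
    action="$1"; count="$2"
    run_daemon "$action" namenode "$CONF_ROOT/nn"
    i=1
    while [ "$i" -le "$count" ]; do
        run_daemon "$action" datanode "$CONF_ROOT/dn$i"
        i=$((i + 1))
    done
}
```

With this, "launch_cluster start 5" and "launch_cluster start 10" give the two cluster sizes
from the same set of stored configs, and "launch_cluster stop 10" shuts everything down.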

Hope this helps,
--Matt


On Mar 1, 2011, at 4:19 AM, Sungho Jeon wrote:

Hi, I'm graduate student and my major is computer science, data mining.
Is that possible that install multiple hadoop in one node?


I mean, I want to install several hadoop that have different conf.
Specifically, one hadoop has 5 datanode and other hadoop has 10 datanode.


Of course I can control number of datanode by change conf and restart.
But, without changing conf, install multiple hadoop in one node is possible?

Thanks

