hadoop-common-user mailing list archives

From sodul <s...@odul.com>
Subject Re: IP based hadoop cluster
Date Sat, 05 Oct 2013 10:43:06 GMT
The only security is what the slave/master whitelists provide (more
foolproof than attack-proof, but still useful for keeping clusters from
talking to each other accidentally).

I want to automate the deployment of hadoop clusters through Glu (from
LinkedIn) since we already use it to do single click deployments.

So far what I want to deploy, configure and start automatically without host
names or ssh is:
 - hdfs (done, except for that UI glitch)
 - mapreduce (done, looks fine)
 - hbase (almost done)
 - hive (not started)
 - sqoop (not started)
 - oozie (not started)

hdfs took me a while to figure out since I've never deployed hadoop clusters
before; mapreduce was easier, and hbase is coming along quickly.

We use a single config file per cluster, which mostly maps IP lists to roles
and includes some configuration variables. From there a script tells Glu which
binaries go on which machines, then Glu deploys, in parallel, everything that
needs to be deployed. If a new version of a binary is released, only the
machines that do not have the new binaries get redeployed. Adding/removing
hdfs/mapreduce slaves is done in a few clicks in the Glu WebUI and takes
just a few seconds (12s to deploy 3 machines last time I measured).
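
To make the idea concrete, here is a rough sketch of what the role-to-IP config and the "only redeploy what changed" step could look like. This is not our actual script and it does not call any Glu API; the config layout, the IPs, the version string and the names CLUSTER, ROLE_BINARY, desired_state and hosts_to_redeploy are all made up for illustration.

```python
# Hypothetical per-cluster config: roles mapped to IP lists, plus variables.
CLUSTER = {
    "roles": {
        "namenode": ["10.0.0.1"],
        "datanode": ["10.0.0.2", "10.0.0.3"],
        "jobtracker": ["10.0.0.1"],
        "tasktracker": ["10.0.0.2", "10.0.0.3"],
    },
    # Example version only; not taken from the original message.
    "versions": {"hadoop": "1.2.1"},
}

# Hypothetical mapping from a role to the binary package that role needs.
ROLE_BINARY = {
    "namenode": "hadoop",
    "datanode": "hadoop",
    "jobtracker": "hadoop",
    "tasktracker": "hadoop",
}


def desired_state(cluster):
    """Return {ip: {binary: version}} derived from the role map."""
    state = {}
    for role, ips in cluster["roles"].items():
        binary = ROLE_BINARY[role]
        version = cluster["versions"][binary]
        for ip in ips:
            state.setdefault(ip, {})[binary] = version
    return state


def hosts_to_redeploy(desired, installed):
    """Return the IPs whose installed binaries differ from the desired ones.

    Machines missing from `installed` are treated as having nothing deployed.
    """
    return sorted(
        ip for ip, wanted in desired.items() if installed.get(ip, {}) != wanted
    )
```

Given an inventory of what is currently installed on each IP, hosts_to_redeploy returns only the machines that are out of date or new, which is the list a deployment tool would then act on in parallel.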

View this message in context: http://hadoop.6.n7.nabble.com/IP-based-hadoop-cluster-tp70191p70241.html