hadoop-user mailing list archives

From "Marton, Elek" <...@anzix.net>
Subject Re: Hadoop "managed" setup basic question (Ambari, CDH?)
Date Tue, 26 Sep 2017 21:06:37 GMT

If you would like to do it in a more dynamic way, you can also use 
service registries/key-value stores.

For example, the configuration could be stored in Consul and the servers 
(namenode, datanode) could be started with consul-template.

When the configuration changes, the servers are refreshed and 
restarted automatically.
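
To make that step concrete, here is a hypothetical consul-template 
fragment for core-site.xml. The key path and file names are made up for 
illustration; they are not taken from any real repository:

```xml
<!-- core-site.xml.ctmpl: rendered by consul-template.
     The Consul key path below is a placeholder. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>{{ key "hadoop/core-site/fs.defaultFS" }}</value>
  </property>
</configuration>
```

It would be rendered with something like 
consul-template -template "core-site.xml.ctmpl:core-site.xml:<restart command>", 
which re-renders the file and runs the given command whenever the 
watched keys change.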


ps: I use a very similar (but much more complex) approach when I run 
Hadoop in the cloud.

1. I provision the VMs with Terraform.

2. After that I install the basic infrastructure (e.g. Consul, Nomad 
servers and Weave Scope monitoring) with Ansible. The inventory file is 
generated from the Terraform state file.
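
The inventory-generation step can be sketched as a small Python script. 
The state snippet below uses a simplified, hypothetical schema; the real 
Terraform state format is richer and version-dependent:

```python
import json

# Simplified stand-in for a Terraform state file (the real schema is
# richer and changes between Terraform versions).
STATE = json.loads("""
{
  "resources": [
    {"type": "aws_instance", "name": "namenode",
     "instances": [{"attributes": {"private_ip": "10.0.0.10"}}]},
    {"type": "aws_instance", "name": "datanode",
     "instances": [{"attributes": {"private_ip": "10.0.0.11"}},
                   {"attributes": {"private_ip": "10.0.0.12"}}]}
  ]
}
""")

def state_to_inventory(state):
    """Group instance IPs by resource name into INI-style inventory sections."""
    lines = []
    for res in state["resources"]:
        if res["type"] != "aws_instance":
            continue
        lines.append("[%s]" % res["name"])
        for inst in res["instances"]:
            lines.append(inst["attributes"]["private_ip"])
        lines.append("")
    return "\n".join(lines)

print(state_to_inventory(STATE))
```

The same idea works for any provider: walk the state, group addresses by 
role, and feed the result to ansible-playbook with -i.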

3. I start the Hadoop daemons (namenode, datanode) from Docker. 
Containers are scheduled with Nomad. (Nomad definitions: 
http://github.com/flokkr/runtime-nomad, generic docs about the 
containers: https://github.com/flokkr/flokkr)
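
As an illustration of the scheduling step, a Nomad job for a datanode 
container could look roughly like this. The image name, datacenter and 
resource figures are placeholders; see the runtime-nomad repository 
above for the real definitions:

```hcl
# Sketch only: image, datacenter and resources are placeholders.
job "hdfs-datanode" {
  datacenters = ["dc1"]
  type        = "service"

  group "datanode" {
    count = 3

    task "datanode" {
      driver = "docker"
      config {
        image        = "example/hadoop-hdfs-datanode"
        network_mode = "host"
      }
      resources {
        memory = 1024
      }
    }
  }
}
```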

4. Configuration is stored in a git repository 
(https://github.com/flokkr/configuration) in a simplified format. During 
a preprocessing step the files are rendered to their final form and 
uploaded to Consul. The format also supports profiles: for example, I 
can easily switch between HA and non-HA configurations with a single flag.
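
The profile mechanism can be sketched in a few lines of Python: a base 
configuration is merged with an optional overlay before upload. The key 
names below are illustrative only, not the real repository layout:

```python
# Base configuration and an HA overlay; profile values win on conflict.
# Key names are illustrative.
BASE = {
    "core-site/fs.defaultFS": "hdfs://namenode:9000",
    "hdfs-site/dfs.replication": "3",
}

HA_PROFILE = {
    "core-site/fs.defaultFS": "hdfs://mycluster",
    "hdfs-site/dfs.nameservices": "mycluster",
}

def render(base, profile=None):
    """Return the final key/value pairs that would be uploaded to Consul."""
    merged = dict(base)
    if profile:
        merged.update(profile)
    return merged

print(render(BASE, HA_PROFILE)["core-site/fs.defaultFS"])  # -> hdfs://mycluster
```

Switching between HA and non-HA is then just choosing which overlay (if 
any) to pass in.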

5. A consul-template-like (but simpler) script 
(https://github.com/elek/consul-launcher) is part of my Docker images 
(https://github.com/flokkr/docker-baseimage). It listens for changes in 
Consul and restarts the servers when the configuration changes.
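
The restart-on-change logic can be sketched like this. A plain dict 
stands in for Consul so the example is self-contained; the real script 
would block on Consul watches instead of polling:

```python
import hashlib

class Launcher:
    def __init__(self, fetch_config, restart):
        self.fetch_config = fetch_config  # callable returning the config text
        self.restart = restart            # callable that restarts the server
        self.last_hash = None

    def poll_once(self):
        """One iteration of the watch loop; compares a hash of the
        rendered config against the last seen value."""
        digest = hashlib.sha256(self.fetch_config().encode()).hexdigest()
        if digest != self.last_hash:
            if self.last_hash is not None:  # skip restart on the initial load
                self.restart()
            self.last_hash = digest

# Demo with an in-memory "store" instead of Consul:
store = {"config": "dfs.replication=3"}
restarts = []
launcher = Launcher(lambda: store["config"], lambda: restarts.append("restarted"))

launcher.poll_once()                    # initial load, no restart
store["config"] = "dfs.replication=2"   # simulate a config change
launcher.poll_once()                    # change detected -> restart
print(restarts)  # -> ['restarted']
```

Hashing the rendered config (rather than watching raw keys) means the 
server only restarts when its effective configuration actually changed.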

There are many small pieces, so it is most probably a more complex 
solution than what you need. But if you are familiar with the individual 
tools, it's not that hard to build a low-level (and lightning-fast) 
configuration-management/service-registry solution from existing 
devops tools.

On 09/22/2017 12:42 PM, Sanel Zukan wrote:
> Hi,
> For this amount of nodes, I'd go with automation tools like
> Ansible[1]/Puppet[2]/Rex[3]. They can install necessary packages, setup
> /etc/hosts and make per-node settings.
> Ansible has a nice playbook
> (https://github.com/analytically/hadoop-ansible) you can start with, and
> Puppet isn't short on options either (https://forge.puppet.com/tags/hadoop).
> Best,
> Sanel
> [1] https://ansible.com
> [2] https://puppet.com
> [3] https://rexify.org
> "Zaki SEc." <zakimano@gmail.com> writes:
>> [I am sorry in case this mail is sent twice, it was not intentional]
>> Hi!
>> I'm fairly new to Hadoop, but I've been browsing the documentation and
>> 'how-to'-s for some time now.
>> My question is as follows: how can one set up a cluster where the
>> nodes aren't static?
>> What I mean is, I want to be able to run a cluster of, say, 20 machines,
>> where each node has Hadoop installed and they 'recognize' each other,
>> saving me from having to manually set their hostnames and configure their
>> '/etc/hosts' files.
>> I did look into Apache Ambari, hoping that it would give me an easy
>> solution to the above problem, but it does not support Ubuntu 16.04 which I
>> have to work with, and it failed to build for various reasons.
>> I have also looked into Cloudera's CDH distribution, (the manual
>> installation) but that has the same problem - it asks me to manually
>> configure these settings for each node.
>> It seemed to me, that "Rack Awareness" could potentially solve my problem,
>> but after some reading, I had to realize that it's for a different thing
>> entirely.
>> So now it looks like I'm out of options.
>> Lately, I was wondering about writing an external script, that would update
>> the settings for each of the nodes automatically, based on one central
>> 'list', hosted on, for ex. the NameNode. While this isn't nearly on the
>> level of a real dynamic setup, it would make my job significantly easier.
>> Thanks in advance,
>> Zaki
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
> For additional commands, e-mail: user-help@hadoop.apache.org

