ambari-user mailing list archives

From Todd Snyder <>
Subject Re: A few questions regarding Ambari: Stateless Imaging and existing infrastructure
Date Mon, 04 Aug 2014 15:54:36 GMT

> 1) Stateless images and Ambari

We have just gone through the process of figuring this out, and it works
great.  We boot hundreds of nodes from a PXE server.  Our philosophy is to
work in layers: we bring up the 'platform' with PXE, then use our config
management to configure the node, then use Ambari to configure the
service.  As such, we've baked CentOS 6.5, Puppet, and the Ambari Agent
into our image, and due to some frustration, we ended up baking in the
YARN install as well.  Our DataNodes are completely stateless, except for
one folder on a data disk where we keep the Hadoop logs, so that if we have
issues causing DataNode reboots, we still have logs to review.

Everything works nicely - we use Cobbler to manage DHCP/PXE.  The DHCP
server has been 'modified' (cfg file) to provide both the Puppet master and
the Ambari server for the environment.  DHCP exit hooks configure the
initial environment, then run Puppet, which configures it further.  This
brings up the Ambari agent, which looks for 'ambari.FQDN', which is
(hopefully) the server for the local cluster, assuming it's up.  The agent
checks in, gets its configs and tada!
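For illustration, here is a minimal Python sketch of how a boot hook could point the agent at 'ambari.<domain>'. It assumes the agent reads its server name from the `hostname` key of the `[server]` section in /etc/ambari-agent/conf/ambari-agent.ini, and that the server name is simply the node's own domain prefixed with 'ambari.'. `ambari_server_for`, `point_agent_at_server` and `configure_agent` are hypothetical helpers, not the hooks described above.

```python
import configparser
import io
import socket

AGENT_INI = "/etc/ambari-agent/conf/ambari-agent.ini"  # assumed agent config path


def ambari_server_for(fqdn: str) -> str:
    """Map a node FQDN like 'dn042.cluster.example.com' to 'ambari.cluster.example.com'."""
    _, _, domain = fqdn.partition(".")  # drop the short host name, keep the domain
    if not domain:
        raise ValueError(f"{fqdn!r} has no domain part")
    return f"ambari.{domain}"


def point_agent_at_server(ini_text: str, server: str) -> str:
    """Return the agent config text with [server] hostname set to `server`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    if not parser.has_section("server"):
        parser.add_section("server")
    parser.set("server", "hostname", server)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


def configure_agent(path: str = AGENT_INI) -> None:
    """Rewrite the agent config in place for this node (run before starting the agent)."""
    with open(path) as f:
        text = f.read()
    new_text = point_agent_at_server(text, ambari_server_for(socket.getfqdn()))
    with open(path, "w") as f:
        f.write(new_text)
```

A hook would call `configure_agent()` before starting the agent.  Note that `configparser` drops comments when it rewrites the file, which matters little on an image that is regenerated at every boot.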

There are some caveats to "tada", however.  The first time you join the
datanode to the cluster (i.e., join the Ambari agent to the server), the
Ambari server will assign it its roles and bring up the service(s) assigned
(assuming you've assigned them on the server), and things are grand.
However, if you reboot the stateless server, the Ambari agent doesn't start
automatically, and it forgets what its roles were.  As such, I've written a
script that we call from rc.local (after Puppet, after the Ambari agent)
that uses the API to call out to the Ambari server and push the roles back
down to the agent.  This causes the server to push the roles to the agent,
as well as the related configs, which brings everything back up and
working.  We've rebooted hundreds of times now (across various nodes) and
the approach works well.  Apparently support for doing this automatically
is coming (having the agent check in and get its roles again).  It's been a
few weeks since I looked at all of this, so I might be mixing up
words/order of operations, apologies.  I can likely share more details if
you're interested.
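The script itself isn't included here.  As a rough Python sketch of one way to do the same thing, assuming Ambari's v1 REST API on port 8080 over plain HTTP with basic auth: a bulk PUT on the host's `host_components` collection asks the server to drive every component assigned to that host to a given state, first INSTALLED and then STARTED, polling the asynchronous request the server returns in between.  The cluster name, credentials, the terminal status names, and all helper names are assumptions, and this is not necessarily how the script described above works.

```python
import base64
import json
import time
import urllib.request


def _headers(user: str, password: str) -> dict:
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    # Ambari rejects modifying requests that lack X-Requested-By (CSRF protection).
    return {"Authorization": f"Basic {token}", "X-Requested-By": "ambari"}


def build_request(server, cluster, host, state, user, password, port=8080):
    """PUT that asks the server to move every component on `host` to `state`."""
    url = (f"http://{server}:{port}/api/v1/clusters/{cluster}"
           f"/hosts/{host}/host_components")
    body = {
        "RequestInfo": {"context": f"Restore roles on {host}: {state}"},
        "Body": {"HostRoles": {"state": state}},
    }
    return urllib.request.Request(url, data=json.dumps(body).encode(),
                                  headers=_headers(user, password), method="PUT")


def _open_json(req, timeout=30.0):
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw.strip() else None


def wait_for_request(href, user, password, poll_s=5.0, timeout_s=1800.0):
    """Poll an Ambari request resource until it reaches a terminal status (names assumed)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        reply = _open_json(urllib.request.Request(href, headers=_headers(user, password)))
        status = reply["Requests"]["request_status"]
        if status == "COMPLETED":
            return
        if status in ("FAILED", "ABORTED", "TIMEDOUT"):
            raise RuntimeError(f"{href} ended with status {status}")
        time.sleep(poll_s)
    raise TimeoutError(f"{href} still running after {timeout_s}s")


def restore_roles(server, cluster, host, user, password):
    """Re-push this host's assigned components: install/configure, then start."""
    for state in ("INSTALLED", "STARTED"):
        reply = _open_json(build_request(server, cluster, host, state, user, password))
        # A 202 Accepted reply carries an href to the async request; an empty
        # 200 reply means nothing needed to change.
        if reply and "href" in reply:
            wait_for_request(reply["href"], user, password)
```

From rc.local, after the agent is up, something like `restore_roles("ambari.example.com", "prod", socket.getfqdn(), "admin", "admin")` would run; every value in that call is a placeholder.  Waiting for INSTALLED to finish before asking for STARTED keeps the two state changes from racing each other on the server.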

> 3) Existing Hadoop

We are doing a migration between versions of Hadoop, and haven't had any
issues, particularly with Ambari.  Ambari hasn't formatted any disks or
anything like that - it sits above that level of things.  I'd suggest
testing to confirm, but in our case, we're simply rebooting the datanodes,
switching from Ubuntu to CentOS + Ambari, and it leaves all the data alone.

> 4) Ubuntu 14.04 support

We ditched Ubuntu and picked up CentOS 6.5 because of the Ambari support.
It wasn't much work, just figuring out how to make a PXE image for CentOS
vs. Ubuntu.

Happy to talk about managing large clusters.  I'm fairly new to it, but
there's not enough people talking about the platform in the Big Data
community .. everyone wants to talk about the Data :)



On Mon, Aug 4, 2014 at 11:15 AM, Martin Tippmann <> wrote:

> Hi!
> We are in the process of planning a new Hadoop 2.0 cluster. Ambari
> looks really great for this job, but while looking through the
> documentation we stumbled upon a few questions:
> 1. Stateless images and Ambari
> We are thinking about booting all machines in the cluster using PXE +
> stateless images. This means the OS image will only be in memory and
> changes to /etc/ or other files will vanish after a reboot. Is it
> possible to use Ambari in such a setup? In theory it should be enough
> to start the ambari-agent after booting the image, and the agent will
> ensure that the configuration is correct.
> The idea is to use all the HDDs in the machines for HDFS storage and
> to avoid the burden of maintaining separate OS installs.
> Provisioning the OS via automated install on the HDD is another option
> if stateless imaging is not compatible with Ambari.
> Can anyone here tell what they are using? What are the best practices?
> We will have around 140 machines.
> 2. Existing Icinga/Nagios and Ganglia
> Is it possible to use an existing install of Ganglia and Nagios for
> Ambari? We already have a smaller Hadoop cluster and have Ganglia and
> Icinga checks in place. We would like to avoid duplicate
> infrastructure and, if possible, run only one Icinga/Nagios server and
> one Ganglia instance for everything.
> 3. Existing Hadoop
> Is it possible to migrate an existing HDFS to Ambari? We have 150TB of
> data in one HDFS and would like to migrate it to Ambari, but due to the
> automated nature of the installation I'd like to ask if it is safe to do so.
> Does Ambari format the disks on the nodes while installing? Or will
> the NameNode be formatted during installation?
> 4. Ubuntu 14.04 support
> We plan on using Ubuntu 14.04 LTS for the new cluster as we are only
> using Ubuntu in the department here. Is this a bad idea? Will there be
> support in the future? From looking through the requirements it
> shouldn't be a major problem as Ambari is mostly Python and Java - but
> if it is not and will not be supported we probably have to change the
> OS.
> Thanks for any help!
> If you are already running a bigger Hadoop cluster I'd love to hear
> some advice and best-practices for managing the system. At the moment
> we plan on using xCat for provisioning the machines, Saltstack for
> configuration management and Ambari for managing the Hadoop
> configuration.
> regards
> Martin Tippmann
