ambari-user mailing list archives

From "a." <>
Subject Re: A few questions regarding Ambari: Stateless Imaging and existing infrastructure
Date Mon, 11 Aug 2014 16:02:25 GMT
Hi Todd!

I've run into a caveat getting the same setup to work, as we are looking
into a "live system", i.e. one where the OS is loaded completely into
RAM and not installed onto the hard drive before use. To be honest, I
thought this was possible from what you wrote. Right now I have only
managed to get Cobbler's installer to provision CentOS onto the
machines' hard drives, but this isn't what we wanted.

Could you clarify: Are you installing your CentOS onto the machines or
are you booting them up live? You wrote "Our Datanodes are completely
stateless" so that's where we took that from.

If you managed to get a pxe-live-boot to work with cobbler, I'd be so
happy to hear from you how you did it. I'm pretty sure it should be
doable but right now I do not know how to set cobbler up for that.

Any info you could give us would be great!

Also happy to talk to folks having the same problems as we have over
here. You are totally right, everyone wants to talk about the Data but
never about the platforms. ;)


P.S. I tried to email you directly before, but your address
toddæ was not valid. :(

On 08/04/2014 06:08 PM, Martin Tippmann wrote:
> ---------- Forwarded message ----------
> From: Todd Snyder <>
> Date: 2014-08-04 17:54 GMT+02:00
> Subject: Re: A few questions regarding Ambari: Stateless Imaging and
> existing infrastructure
> To:,
> re:
>> 1) Stateless images and Ambari
> We have just gone through the process of figuring this out, and it
> does work great.  We boot hundreds of nodes using a PXE server.  Our
> philosophy is to work in layers, so we bring up the 'platform' with
> PXE, then use our config management to configure the node, then use
> Ambari to configure the services.  As such, we've baked our image with
> CentOS 6.5, Puppet, and the Ambari agent, and due to some frustration, we
> ended up baking in the YARN install as well.  Our datanodes are
> completely stateless, except for one folder on a data disk where we
> keep the Hadoop logs, so that if we have issues causing datanode
> reboots, we still have logs to review.
> Everything works nicely - we use Cobbler to manage DHCP/PXE.  The DHCP
> server has been 'modified' (cfg file) to provide both the Puppet
> master and the Ambari server for the environment.  DHCP-exit-hooks
> configures the initial environment, then runs Puppet, which further
> configures it.  This brings up Ambari, looking for 'ambari.FQDN',
> which is (hopefully) the local cluster, assuming it's up.  Ambari
> checks in, gets its configs and tada!
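[Editor's note: a minimal sketch of what such a DHCP exit hook might look like. The file location, the `ambari.FQDN` convention, and the ambari-agent.ini path follow common CentOS 6 conventions; the specifics here are assumptions, not the actual script from the setup described above.]

```shell
#!/bin/sh
# Sketch of /etc/dhcp/dhclient-exit-hooks, sourced by dhclient-script
# on CentOS 6 after each lease event. $reason and $new_domain_name are
# standard dhclient-script variables; everything else is an assumption.

AGENT_INI="${AGENT_INI:-/etc/ambari-agent/conf/ambari-agent.ini}"

set_ambari_server() {
  # Rewrite the [server] hostname= line so the agent checks in with
  # the cluster-local Ambari server.
  sed -i "s/^hostname=.*/hostname=$1/" "$AGENT_INI"
}

if [ "${reason:-}" = "BOUND" ] || [ "${reason:-}" = "RENEW" ]; then
  # Point the agent at ambari.<domain>, per the 'ambari.FQDN' convention.
  set_ambari_server "ambari.${new_domain_name:-localdomain}"
  # Let Puppet finish configuring the node, then (re)start the agent.
  puppet agent --onetime --no-daemonize || true
  service ambari-agent restart || true
fi
```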
> There are some caveats to "tada", however.  The first time you join
> the datanode to the cluster (ie: join Ambari agent to the server),
> Ambari server will assign it its roles, bring up the service(s)
> assigned (assuming you've assigned them on the server), and things are
> grand.  However, if you reboot the stateless server, Ambari (agent)
> doesn't start automatically, and forgets what its roles were.  As
> such, I've written a script that we call from rc.local (after Puppet,
> after Ambari agent) that uses the API to call out to the Ambari
> server, and push the roles back down to the agent.  This causes the
> server to push the roles to the agent, along with the related configs,
> and everything comes back up and works.  We've rebooted hundreds of
> times now (across various nodes) and the approach works well.
> Apparently support for doing this automatically is coming (having the
> agent check in and get its roles again).  It's been a few weeks since
> I looked at all of this, so I might be mixing up words/order of
> operations, apologies.  I can likely share more details if you're
> interested.
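[Editor's note: a script along the lines of the rc.local helper described above could be sketched against the Ambari REST API roughly as follows. The server name, cluster name, credentials, and the INSTALLED/STARTED sequence are assumptions for illustration; the actual script may differ.]

```shell
#!/bin/sh
# Hypothetical re-registration helper, called from rc.local after the
# ambari-agent is up. All names below are placeholders.
AMBARI_HOST="${AMBARI_HOST:-ambari.example.com}"
CLUSTER="${CLUSTER:-hadoop}"
AUTH="${AUTH:-admin:admin}"   # replace with real credentials

ambari_api() {
  # $1 = HTTP method, $2 = path under /api/v1, $3 = optional JSON body.
  # Ambari's API requires the X-Requested-By header on modifying calls.
  url="http://${AMBARI_HOST}:8080/api/v1$2"
  if [ -n "${3:-}" ]; then
    curl -s -u "$AUTH" -H 'X-Requested-By: ambari' -X "$1" -d "$3" "$url"
  else
    curl -s -u "$AUTH" -H 'X-Requested-By: ambari' -X "$1" "$url"
  fi
}

repush_roles() {
  host="$(hostname -f 2>/dev/null || hostname)"
  # Driving this host's components to INSTALLED and then STARTED makes
  # the server re-send the role assignments and related configs.
  for state in INSTALLED STARTED; do
    ambari_api PUT "/clusters/${CLUSTER}/hosts/${host}/host_components" \
        "{\"HostRoles\": {\"state\": \"${state}\"}}"
  done
}
```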
>> 3) Existing Hadoop
> We are doing a migration between versions of Hadoop, and haven't had
> any issues, particularly with Ambari.  Ambari hasn't formatted any
> disks or anything like that - it sits above that level of things.  I'd
> suggest testing to confirm, but in our case, we're simply rebooting
> the datanodes, switching from Ubuntu to CentOS + Ambari, and it leaves
> all the data alone.
>> 4) Ubuntu 14.04 support
> We ditched Ubuntu and picked up CentOS 6.5 because of the Ambari
> support.  It wasn't much work, just figuring out how to make a PXE
> image for CentOS vs Ubuntu.
> Happy to talk about managing large clusters.  I'm fairly new to it,
> but there aren't enough people talking about the platform in the Big
> Data community... everyone wants to talk about the Data :)
> Cheers,
> t.
> On Mon, Aug 4, 2014 at 11:15 AM, Martin Tippmann
> <> wrote:
>> Hi!
>> We are in the process of planning a new Hadoop 2.0 cluster. Ambari looks
>> really great for this job, but while looking through the documentation
>> we stumbled upon a few questions:
>> 1. Stateless images and Ambari
>> We are thinking about booting all machines in the cluster using PXE +
>> stateless images. This means the OS image will only live in memory, and
>> changes to /etc/ or other files will vanish after a reboot. Is it possible
>> to use Ambari in such a setup? In theory it should be enough to start
>> the ambari-agent after booting the image, and the agent will ensure
>> that the configuration is correct.
>> The idea is to use all the HDDs in the machines for HDFS storage and
>> to avoid the maintenance burden of separate OS installs.
>> Provisioning the OS via automated install on the HDD is another option
>> if stateless imaging is not compatible with Ambari.
>> Can anyone here tell what they are using? What are the best practices?
>> We will have around 140 machines.
>> 2. Existing Icinga/Nagios and Ganglia
>> Is it possible to use an existing install of Ganglia and Nagios with
>> Ambari? We already run a smaller Hadoop cluster and have Ganglia and
>> Icinga checks in place. We would like to avoid duplicate
>> infrastructure and, if possible, run only one Icinga/Nagios server and
>> one Ganglia instance for everything.
>> 3. Existing Hadoop
>> Is it possible to migrate an existing HDFS to Ambari? We have 150TB of
>> data in one HDFS and would like to migrate it to Ambari, but due to the
>> automated nature of the installation I'd like to ask if it is safe to
>> do so. Does Ambari format the disks on the nodes while installing? Or
>> will the NameNode be formatted during installation?
>> 4. Ubuntu 14.04 support
>> We plan on using Ubuntu 14.04 LTS for the new cluster, as we are only
>> using Ubuntu in the department here. Is this a bad idea? Will there be
>> support in the future? From looking through the requirements it
>> shouldn't be a major problem, as Ambari is mostly Python and Java - but
>> if it is not and will not be supported, we will probably have to change
>> the OS.
>> Thanks for any help!
>> If you are already running a bigger Hadoop cluster, I'd love to hear
>> some advice and best practices for managing the system. At the moment
>> we plan on using xCAT for provisioning the machines, SaltStack for
>> configuration management and Ambari for managing the Hadoop
>> configuration.
>> regards
>> Martin Tippmann
