cloudstack-users mailing list archives

From Eric Green <eric.lee.gr...@gmail.com>
Subject Some things I found out installing on Centos 7
Date Wed, 02 Aug 2017 08:12:51 GMT
First, about me -- I've been administering Linux systems since 1995. No, that's not a typo -- that's 22 years. I've also worked for a firewall manufacturer in the past and designed the layer 2 VLAN support for a firewall vendor, so I know VLANs and such. I already run a fairly complex production network with multiple VLANs, multiple networks, etc., and I speak fluent Cisco CLI. In short, I'm not an amateur at this networking stuff, but figuring out how CloudStack wanted my CentOS 7 networking to be configured, and doing all the gymnastics to make it happen, consumed nearly a week, because the documentation simply isn't up to date, thorough, or accurate, at least for CentOS 7.

So anyhow, my configuration:

CloudStack 4.9.2.0 from the RPM repository at cloudstack.apt-get.eu

CentOS 7 servers with:

Two 10Gbit Ethernet ports -> bond0

A handful of VLANs:

100 -- trunked from my top-of-rack switch to my core backbone switch, layer-3 routed to my local network as 10.100.x.x and through the NAT border firewall and router to the Internet. Management.
101 -- same, but for 10.101.x.x -- public.
102 -- same, but for 10.102.x.x -- guest public (see below).
192 -- a video surveillance camera network that isn't routed anywhere, delivered via a drop from the core video surveillance POE switch to an access-mode port on my top-of-rack switch.
200 -- a 10-gig drop over to the storage network in my production racks, for accessing legacy storage. Not routed. (Legacy storage is not used for CloudStack instance or secondary storage, but it can be accessed by virtual machines being migrated to this rack.)
1000-2000 -- VLANs that exist in the top-of-rack switch on the CloudStack rack and are assigned to the trunk ports to the cloud servers, but are routed nowhere else; these are for VPCs and such. (A rough sketch of the trunk-port configuration follows this list.)
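For context, the trunk ports facing the cloud servers look roughly like this on a Cisco-style top-of-rack switch (the port number is illustrative, not my exact config, and you'd add whatever port-channel/LACP setup matches your bonding mode):

interface TenGigabitEthernet1/0/1
 description CloudStack host, bond0 member
 switchport mode trunk
 switchport trunk allowed vlan 100-102,192,200,1000-2000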

I stuck with VLANs rather than one of the SDN modules like VXLAN because a) it's the oldest mechanism and the most likely to be stable, b) it's compatible with my already-existing network hardware and networks (I wouldn't have to somehow map a VLAN onto an SDN virtual network to reach 192 or 200, or to create a public 102), and c) it's the least complex to set up and configure given my existing top-of-rack switch, which does VLANs just fine.

Okay, here's how I had to configure CentOS 7 to make it work:

enp4s[01] -> bond0 -> bond0.100 -> br100 -- I had to create two interface files, enslave those NICs to the bond0 bond, then create a bond0.100 VLAN interface on top of the bond, and finally a br100 bridge containing bond0.100, for my management network. In /etc/sysconfig/network-scripts:

# ls ifcfg-*
ifcfg-bond0 ifcfg-bond0.100 ifcfg-br100 ifcfg-enp4s0 ifcfg-enp4s1

(where 4s0 and 4s1 are my 10 gigabit Ethernets).
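For reference, a minimal sketch of what those files look like -- the addresses and the bonding mode (802.3ad/LACP here) are examples, adjust for your own network:

# ifcfg-enp4s0 (ifcfg-enp4s1 is identical except for DEVICE)
DEVICE=enp4s0
TYPE=Ethernet
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

# ifcfg-bond0 -- the bond itself, no IP address on it
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100"
ONBOOT=yes
NM_CONTROLLED=no

# ifcfg-bond0.100 -- VLAN 100 on top of the bond, enslaved to the bridge
DEVICE=bond0.100
VLAN=yes
ONBOOT=yes
BRIDGE=br100
NM_CONTROLLED=no

# ifcfg-br100 -- the management bridge, which carries the host's IP
DEVICE=br100
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.100.1.10
NETMASK=255.255.0.0
GATEWAY=10.100.0.1
NM_CONTROLLED=no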

Don't create anything else. You'll just confuse CloudStack; any other configuration of the network simply fails to work. In particular, creating br101 etc. fails, because CloudStack wants to create its own VLAN interfaces and bridges, and if you set the traffic label to br101 it will try to create a VLAN interface named br101.101 (which doesn't work, duh). Yes, I know this contradicts every single piece of advice I've seen on this list. All I know is that this is what works, while every other piece of advice I've seen for labeling the public and private guest networking fails.

When creating the networks in the GUI under Advanced networking, set bond0 as your physical network and br100 as the KVM traffic label for the Management and Storage networks, and give them addresses on VLAN 100 (assuming you're using the same network for both management and storage, which is what makes sense with my single 10Gbit pipe). But do *not* set a traffic label for the Guest or Public networks -- you will confuse the agent greatly. Let it use the default labels and it'll work: it sets up its own bond0.<tag> VLAN interface and brbond0-<tag> bridge as needed (see the sanity check below). This violates every other piece of labeling advice I've seen, but it's what actually works with this version of CloudStack and this version of CentOS when you're sending everything through a VLAN-tagged bond0.
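If you want to sanity-check what the agent built, pick a guest network, note its VLAN tag (say 1003, a made-up example), and look on the host for the pair it created:

ip -br link show bond0.1003
brctl show brbond0-1003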

A very important configuration option *not* documented in the installation documents:

secstorage.allowed.internal.sites=10.100.0.0/16

(for my particular network). 

Otherwise I couldn't upload ISO files into CloudStack from my nginx server, which points at the NFS directory full of ISO files.
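If you'd rather set it from the command line than from the GUI's Global Settings, something like this via CloudMonkey works (the CIDR is specific to my network, and the management server may need a restart before the setting takes effect):

cloudmonkey update configuration name=secstorage.allowed.internal.sites value=10.100.0.0/16
systemctl restart cloudstack-management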

---

Very important guest VM image prep *NOT* in the docs:

Be sure to install / enable / run acpid on Linux guests, otherwise "clean" shutdowns can't happen. It turns out CloudStack on KVM uses the ACPI shutdown functionality of qemu-kvm, and it probably does the same on other hypervisors too.
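On CentOS-style guests that's just the following (Debian/Ubuntu are the same idea with apt):

yum install -y acpid
systemctl enable acpid
systemctl start acpid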

---

Now on to that mysterious VLAN 102:

I created a "public" shared network on VLAN 102 for stuff I don't mind being out in the open. This is a QA lab environment, not a public cloud. So I assigned a subnet and a VLAN, ran a VLAN drop over to my main backbone layer 3 switch (and bopped up to my border firewall and told it about the new subnet too, so that we could get out to the Internet as needed), and let it go public. Gotta be a reason why we paid Cisco big bucks for all that hardware, right?

Plus it's very convenient to delegate a subdomain to the virtual router for that subnet, and
have people able to access their instances as "my-instance.cloud.mycompany.com" where "my-instance"
is the name of their instance in the GUI. It's not documented anywhere that I can find that
you can do this (delegate a subdomain to the virtual router for a guest subnet). But it works,
and it's very convenient for my QA people. 
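For the record, the delegation itself is plain DNS, nothing CloudStack-specific. In the parent zone it looks roughly like this (the names and the virtual router's address on the shared network are examples, not my real ones):

; in the mycompany.com zone file
cloud     IN  NS  cloud-vr.mycompany.com.
cloud-vr  IN  A   10.102.0.1   ; the virtual router's IP on the VLAN 102 network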

I've played with the VPC stuff. It looks quite powerful. If I were doing a customer-facing
cloud, that's how I'd do it. It's just not what our engineers need for testing our software.

---

Final thoughts:

1) The GUI is definitely in need of help. Maybe I'm just too accustomed to modern responsive RESTful UIs, but this GUI is the opposite of responsive in most places. You do something, and the display never updates with the change. Because it's not RESTful, you can't just hit the refresh button either -- that'll take you all the way back to the login screen.
2) The documentation is clearly in need of help. I have 22 years of experience with Linux and advanced networking, an already-existing complex network of multiple VLANs with multiple virtualization offerings, a top-of-rack switch with VLANs and subnets already configured through to the core backbone switch and Internet boundary router, and working networking with NFS etc. already set up on the CentOS 7 servers -- and it still took me a week of trial and error to get a working installation, even though it turns out to be ridiculously simple once you know the tricks. Clearly the tricks need to be documented. It appears that most of the documentation is oriented around XenServer, and there's nothing specific to CentOS 7 either, though the CentOS 6 documents are *almost* correct for CentOS 7.
3) Failures were mysterious. Error messages said '[Null] failed' way too often. '[Null]' what?! So I had to examine the system itself via journalctl, ip addr, the agent logs, etc. to see what clues it had left behind -- half-configured network ports and the like -- and guess at what might have gone wrong. A simple "Could not create network bridge for public network because the NIC is in use by another bridge" would have saved hours' worth of time all by itself.

That said, I looked at OpenStack -- a mess of incompatible technologies stitched together with hacks -- and waved it off as overkill for anything smaller than a Fortune 500 company or Rackspace.com, somewhere with the budget to bring in a team of consultants to hack it to their needs. Eucalyptus isn't flexible enough to do what I need to do with networks: we have a surveillance network with around 100 cameras that feeds data to the QA / R&D infrastructure, and I could find no way in Eucalyptus to expose that network to the virtual machines I wanted to have it. OpenNebula ate a friend's cloud multiple times. Not going to talk about oVirt. Nope, won't. And CloudStack does everything I need it to do.

That said, my needs are almost fulfilled by vSphere / vCenter. It's quite clear why VMware continues to exist despite the limitations of their solution; there is something to be said for bullet-proof and easy to install and manage. It's clunky and limited, but bullet-proof. As in, the only times my ESXi servers have ever gone down, *ever*, were for power failures. As in, they run for years at a time without any attention at all. And it didn't take much time to install and configure either, certainly none of the trial and error involved with CloudStack. That's hard to beat... but the hardware requirements are exacting and would have required me to invest more in hardware than I did here, the software licenses are expensive too, and I just couldn't justify that for a QA playground.

So consider me slightly annoyed but appreciative. It appears CloudStack is going to solve my needs here. We'll see.


