cloudstack-dev mailing list archives

From Pedro Roque Marques <pedro.r.marq...@gmail.com>
Subject Re: [MERGE] network-guru-orchestration into master
Date Fri, 01 Nov 2013 16:16:42 GMT
Darren,

On Oct 31, 2013, at 10:05 AM, Darren Shepherd <darren.s.shepherd@gmail.com> wrote:

> Yeah I think it would be great to talk about this at CCC.  I'm
> hesitant to further narrow down the definition of the network.  For
> example, I think OpenStack's Neutron is fundamentally flawed because
> they defined a network as a L2 segment.

OpenContrail implements a Neutron plugin. It uses the Neutron API to provide the concept of
a virtual-network. A virtual-network can be a collection of IP subnets that work as a closed
user group; by configuring a network-policy between virtual-networks, the user/admin can define
additional connectivity for the network. The same functionality can be achieved using the
AWS VPC API. We have extended the Neutron API with the concept of network-policy but have
not changed the underlying concept of a network; the 1.00 release of the software provides
only an IP service to the guest (the latest release also provides fallback bridging for non-IP
traffic). While I don't have a firm opinion on the Neutron API, it does not limit a network
to being an L2 segment.
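To make the distinction concrete, here is a minimal Python sketch of a virtual-network modeled as a set of IP subnets rather than as an L2 segment, with a network-policy granting extra connectivity between two networks. The class and method names (`VirtualNetwork`, `Topology`, `allow`, `can_communicate`) and the addresses are hypothetical illustrations, not the OpenContrail or Neutron API.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VirtualNetwork:
    """A virtual-network modeled as a set of IP subnets, not as an L2 segment."""
    name: str
    subnets: list[str] = field(default_factory=list)

    def contains(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in ipaddress.ip_network(s) for s in self.subnets)


@dataclass
class Topology:
    networks: list[VirtualNetwork]
    # Hypothetical network-policy store: unordered pairs of network names.
    policies: set[frozenset[str]] = field(default_factory=set)

    def allow(self, a: str, b: str) -> None:
        """Add a network-policy permitting traffic between networks a and b."""
        self.policies.add(frozenset((a, b)))

    def network_of(self, ip: str) -> Optional[VirtualNetwork]:
        return next((vn for vn in self.networks if vn.contains(ip)), None)

    def can_communicate(self, src: str, dst: str) -> bool:
        s, d = self.network_of(src), self.network_of(dst)
        if s is None or d is None:
            return False
        if s.name == d.name:
            return True  # closed user group: members reach each other by default
        return frozenset((s.name, d.name)) in self.policies


web = VirtualNetwork("web", ["10.1.0.0/24", "10.1.1.0/24"])  # two subnets, one network
db = VirtualNetwork("db", ["10.2.0.0/24"])
topo = Topology([web, db])
print(topo.can_communicate("10.1.0.5", "10.1.1.7"))  # True: same network, different subnets
print(topo.can_communicate("10.1.0.5", "10.2.0.9"))  # False: no policy yet
topo.allow("web", "db")
print(topo.can_communicate("10.1.0.5", "10.2.0.9"))  # True: permitted by policy
```

Note that membership is decided purely by IP prefix; nothing in the model depends on which hosts share a broadcast domain.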

> In the world of SDN, I think it's even more important to keep the
> definition of a network loose.  SDN has the capability of
> completely changing the way we look at L2 and L3.  Currently in
> networking we group things by L3 and L2 concepts as that is how
> routers and switches are laid out today.  As SDN matures and you see
> more flow oriented design it won't make sense to group things using L2
> and L3 concepts (as those become more a physical fabric technology),
> the groups become looser and thus the definition of a network
> should be loose.

I don't believe there is an accepted definition of SDN. My perspective, and the goal for OpenContrail,
is to decouple the physical network from the service provided to the "edge" (the virtual-machines
in this case). The goal is to allow the physical underlay to be designed for throughput and
high inter-connectivity (e.g. a Clos topology), while implementing the functionality traditionally
found in an aggregation switch (the L2/L3 boundary) in the host.

The logic is that to get the highest server utilization, one needs to be able to schedule a
VM (or LXC container) anywhere in the cluster; this implies much greater throughput requirements
on the network. The standard operating procedure used to be to aim for I/O locality by placing
multiple components of an application stack in the same rack. In the traditional design you can
easily find a 20:1 oversubscription ratio between server-port bandwidth and the actual throughput
of the network core.
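As a quick illustration of what these ratios mean: oversubscription is the aggregate server-facing bandwidth divided by the uplink bandwidth toward the core. The sketch below uses made-up port counts and link speeds purely for illustration; the function name `oversubscription` is hypothetical.

```python
def oversubscription(server_ports: int, server_gbps: int,
                     uplinks: int, uplink_gbps: int) -> float:
    """Aggregate server-facing bandwidth divided by uplink bandwidth toward the core."""
    return (server_ports * server_gbps) / (uplinks * uplink_gbps)


# Hypothetical rack: 40 servers at 10 Gb/s each (400 Gb/s toward the servers).
print(f"{oversubscription(40, 10, 2, 10):g}:1")  # 2 x 10G uplinks (20 Gb/s) -> 20:1
print(f"{oversubscription(40, 10, 5, 40):g}:1")  # 5 x 40G uplinks (200 Gb/s) -> 2:1
```

For the same hypothetical rack, going from 20:1 to 2:1 means ten times the uplink capacity, which is why the physical design has to change rather than just the configuration.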

Once you spread the server load around, the network requirements go up to design points like
2:1 oversubscription. This requires a different physical design for the network, and it means
there is no longer a pair of aggregation switches conveniently positioned above the rack to
implement policies that control network-to-network traffic. This is why OpenContrail tries
to implement network-to-network traffic policies in the ingress hypervisor switch and to forward
traffic directly, without requiring a VirtualRouter appliance.
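One way to picture forwarding without a VirtualRouter appliance is a per-virtual-network table in the ingress hypervisor switch: a network-policy leaks the routes of a permitted network into the source network's table, and each route points straight at the compute-node hosting the destination. This is a minimal sketch under that assumption; `IngressForwarder`, the node names, and the route-leaking model are hypothetical and do not describe the actual vrouter data structures.

```python
import ipaddress
from typing import Optional


class IngressForwarder:
    """Hypothetical per-virtual-network forwarding table in an ingress hypervisor switch."""

    def __init__(self) -> None:
        # virtual-network name -> list of (destination prefix, compute-node hosting it)
        self.routes: dict[str, list[tuple[ipaddress.IPv4Network, str]]] = {}

    def add_route(self, vn: str, prefix: str, node: str) -> None:
        self.routes.setdefault(vn, []).append((ipaddress.ip_network(prefix), node))

    def import_routes(self, into_vn: str, from_vn: str) -> None:
        """Model a policy letting into_vn reach from_vn by copying from_vn's routes."""
        for prefix, node in list(self.routes.get(from_vn, [])):
            self.add_route(into_vn, str(prefix), node)

    def lookup(self, vn: str, dst_ip: str) -> Optional[str]:
        """Longest-prefix match; None means drop at ingress (no route, no policy)."""
        addr = ipaddress.ip_address(dst_ip)
        matches = [(p, node) for p, node in self.routes.get(vn, []) if addr in p]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


fwd = IngressForwarder()
fwd.add_route("web", "10.1.0.0/24", "compute-3")
fwd.add_route("db", "10.2.0.0/24", "compute-7")
fwd.add_route("db", "10.2.0.9/32", "compute-9")  # host route for one VM
print(fwd.lookup("web", "10.2.0.9"))  # None: web has no route to db yet
fwd.import_routes(into_vn="web", from_vn="db")
print(fwd.lookup("web", "10.2.0.9"))  # compute-9 (the /32 wins)
print(fwd.lookup("web", "10.2.0.5"))  # compute-7
```

The policy decision and the forwarding decision both happen at the first hop, so no traffic has to hairpin through a central appliance. In this sketch a policy is one-directional; a symmetric policy would call `import_routes` once in each direction.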

Just to provide a less fluffy definition of the problem we are trying to solve...

> 
> Now that's not to say that a network can't provide L2 and L3
> information.  You should be able to create a network in CloudStack and
> based on the configuration you know that it is a single L2 or L3.  It
> is just that the core orchestration system can't make that fundamental
> assumption.  I'd be interested in furthering the model and maybe
> adding a concept of an L2 network such that a network guru, when
> designing a network, can define multiple l2networks and associate them
> with the generic network that was created.  That idea I'm still
> toiling with.

I'd encourage you not to think about L2 networks. I've yet to see a "cloud-ready" application
that needs anything but IP connectivity. For IP it doesn't matter what the underlying link
layer looks like... emulating Ethernet is a rat-hole. There is no point in doing so.

> 
> For example, when configuring DHCP on the systemvm.  DHCP is an
> L2-based service.

DHCP is an IP service, typically provided via a DHCP relay in the aggregation switch.
In OpenContrail, for instance, it is provided in the hypervisor switch (aka the vrouter Linux
kernel module).

>  So to configure DHCP you really need to know for each
> nic, what is the L2 its attached to and what are the VMs associated
> with that L2.  Today, since there is no first-class concept of an L2
> network, you have to look at the implied definition of L2.  For basic
> networks, the L2 is the Pod, so you need to list all VMs in that Pod.
> For guest/VPC networks, the L2 is the network object, so you need to
> list all VMs associated with the network.  It would be nice if when
> the guru designed the network, it also defined the l2networks, and
> then when a VM starts, the guru's reserve() method could associate
> the l2network to the nic.  So the nic object would have a network_id
> and a l2_network_id.

With OpenContrail, DHCP is quite simple. The Nic uuid is known by the vrouter kernel module
on the compute-node. When the DHCP request comes from the tap/vif interface, the vrouter answers
locally (it knows the relationship between the Nic, its properties and the virtual-network).
Please do not try to bring L2 into the picture. It would be very unhelpful to do so.
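A minimal sketch of answering DHCP locally, assuming the hypervisor switch keeps a table from tap/vif interface to the properties of the attached Nic. No list of VMs sharing an L2 segment is needed; the reply comes straight from that table. All names (`NicConfig`, `LocalDhcpResponder`, the field names) are hypothetical, the sketch returns a dict instead of building a real DHCP packet, and the MAC check is an assumption of the sketch rather than documented vrouter behavior.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NicConfig:
    """What the hypervisor switch is assumed to know about a Nic."""
    nic_uuid: str
    mac: str
    ip: str
    prefix_len: int
    gateway: str
    dns: str


class LocalDhcpResponder:
    """Answers DHCP requests from a tap/vif interface using local state only."""

    def __init__(self) -> None:
        self.by_interface: dict[str, NicConfig] = {}

    def plug(self, tap_name: str, nic: NicConfig) -> None:
        # Assumed to be called when the orchestrator attaches a Nic to a local tap.
        self.by_interface[tap_name] = nic

    def handle_request(self, tap_name: str, client_mac: str) -> Optional[dict]:
        nic = self.by_interface.get(tap_name)
        # Sketch assumption: ignore unknown interfaces and unexpected source MACs.
        if nic is None or nic.mac.lower() != client_mac.lower():
            return None
        net = ipaddress.ip_network(f"{nic.ip}/{nic.prefix_len}", strict=False)
        # Keys loosely named after DHCP reply contents (yiaddr, options 1, 3, 6).
        return {
            "yiaddr": nic.ip,
            "subnet_mask": str(net.netmask),
            "router": nic.gateway,
            "dns": nic.dns,
        }


dhcp = LocalDhcpResponder()
dhcp.plug("tap1", NicConfig("nic-1", "02:00:00:00:00:01", "10.1.0.5", 24,
                            "10.1.0.1", "10.1.0.2"))
print(dhcp.handle_request("tap1", "02:00:00:00:00:01"))
```

The request never leaves the compute-node, so there is no broadcast domain to define and no per-pod or per-network VM list to maintain.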

For most data-centers, the main networking objective is to get rid of L2 and its limitations.
Ethernet is really complex. It has a nice zero-config deployment for very simple networks,
but at the cost of high complexity if you are trying to do redundancy, use multiple links,
interoperate with other network devices, or scale... not to mention that all state is data-driven,
which makes it really, really hard to debug. Ethernet as a layer-1 point-to-point link is great;
not as a network.

  Pedro.