cloudstack-dev mailing list archives

From Marcus Sorensen <>
Subject Re: [MERGE] network-guru-orchestration into master
Date Sat, 02 Nov 2013 04:55:52 GMT
On Fri, Nov 1, 2013 at 10:16 AM, Pedro Roque Marques
<> wrote:
> Darren,
> On Oct 31, 2013, at 10:05 AM, Darren Shepherd <> wrote:
>> Yeah I think it would be great to talk about this at CCC.  I'm
>> hesitant to further narrow down the definition of the network.  For
>> example, I think OpenStack's Neutron is fundamentally flawed because
>> they defined a network as a L2 segment.
> OpenContrail implements a Neutron plugin. It uses the Neutron API to provide the concept
of a virtual-network. The virtual-network can be a collection of IP subnets that work as a
closed user group; by configuring a network-policy between virtual-networks the user/admin
can define additional connectivity for the network. The same functionality can be achieved
using the AWS VPC API. We have extended the Neutron API with the concept of network-policy
but have not changed the underlying concept of a network; the 1.00 release of the software provides
an IP-only service to the guest (the latest release also provides fallback bridging for non-IP
traffic). While I don't have a firm opinion on the Neutron API, it does not limit the
network to being an L2 segment.
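The closed-user-group semantics described above can be sketched in a few lines: VMs in the same virtual-network can always reach each other, and traffic between two virtual-networks is allowed only if a network-policy connects that pair. This is a toy model for illustration only; the class and method names are made up and are not the Neutron or OpenContrail API.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of virtual-networks as closed user groups: same-network
// traffic is implicitly allowed, cross-network traffic needs a policy.
public class VirtualNetworkPolicy {
    private final Set<String> policies = new HashSet<>();

    // Normalize the pair so policies are symmetric (A<->B == B<->A).
    private static String key(String a, String b) {
        return a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a;
    }

    public void addPolicy(String vnA, String vnB) {
        policies.add(key(vnA, vnB));
    }

    public boolean allowed(String srcVn, String dstVn) {
        return srcVn.equals(dstVn) || policies.contains(key(srcVn, dstVn));
    }
}
```

Note how the policy is additive: it only opens connectivity between groups that are otherwise isolated, without changing what a network is.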
>> In the world of SDN, I think it's even more important to keep the
>> definition of a network loose.  SDN has the capability of
>> completely changing the way we look at L2 and L3.  Currently in
>> networking we group things by L3 and L2 concepts because that is how
>> routers and switches are laid out today.  As SDN matures and you see
>> more flow-oriented design, it won't make sense to group things using L2
>> and L3 concepts (as those become more a physical fabric technology);
>> the groups become looser, and thus the definition of a network
>> should be loose.
> I don't believe there is an accepted definition of SDN. My perspective and the goal for
OpenContrail is to decouple the physical network from the service provided to the "edge" (the
virtual-machines in this case). The goal is to allow the physical underlay to be designed
for throughput and high inter-connectivity (e.g. CLOS topology); while implementing the functionality
traditionally found in an aggregation switch (the L2/L3 boundary) in the host.
> The logic is that to get the highest server utilization one needs to be able to schedule
a VM (or LXC) anywhere in the cluster; this implies much greater data throughput requirements.
The standard operating procedure used to be to aim for I/O locality by placing multiple components
of an application stack in the same rack. In the traditional design you can easily find a
20:1 over-subscription between server ports and the actual throughput of the network core.
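The oversubscription ratio above is just edge capacity over core capacity. The figures below are hypothetical, purely to make the 20:1 vs 2:1 arithmetic concrete; they are not from the thread.

```java
// Oversubscription = (sum of server-facing port capacity) /
//                    (capacity toward the network core).
public class Oversubscription {
    public static double ratio(int serverPorts, double portGbps,
                               int uplinks, double uplinkGbps) {
        return (serverPorts * portGbps) / (uplinks * uplinkGbps);
    }

    public static void main(String[] args) {
        // Traditional rack: 40 x 10 Gbps server ports over 2 x 10 Gbps uplinks
        System.out.println(ratio(40, 10.0, 2, 10.0));   // 20.0, i.e. 20:1
        // Spread-out scheduling pushes the design toward ~2:1,
        // which needs ten times the uplink capacity per rack
        System.out.println(ratio(40, 10.0, 20, 10.0));  // 2.0, i.e. 2:1
    }
}
```

The jump from 2 to 20 uplinks per rack is why the physical design changes: that much core capacity is what Clos-style fabrics provide.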
> Once you spread the server load around, the network requirements go up to design points
like 2:1 oversub. This requires a different physical design for the network and makes it so
that there isn't a pair of aggregation switches nicely positioned above the rack in order
to implement policies that control network-to-network traffic. This is the reason that OpenContrail
tries to implement network-to-network traffic policies in the ingress hypervisor switch and
forward traffic directly without requiring a VirtualRouter appliance.
> Just to provide a less fluffy definition of the problem we are trying to solve...
>> Now that's not to say that a network can't provide L2 and L3
>> information.  You should be able to create a network in CloudStack and
>> based on the configuration you know that it is a single L2 or L3.  It
>> is just that the core orchestration system can't make that fundamental
>> assumption.  I'd be interested in furthering the model and maybe
>> adding a concept of an L2 network, such that a network guru, when
>> designing a network, can define multiple l2networks and associate them
>> with the generic network that was created.  That idea I'm still
>> toying with.
> I'd encourage you not to think about L2 networks. I've yet to see an application that
is "cloud-ready" that needs anything but IP connectivity. For IP it doesn't matter what the
underlying data-link layer looks like... emulating Ethernet is a rat-hole. There is no point in
doing so.

That may be true in the sense that 'cloud-ready' applications are generally
just ephemeral web/application servers, but I'd just like to
point out that many folks aren't using CloudStack to provide cloud
servers, they're using it to provide traditional or hybrid
infrastructure.  Throwing out layer 2 to me seems like throwing away
the whole concept of a VPC.  Or perhaps you're just saying that it can
be emulated by managing ACLs on a per-VM basis, like security groups,
and that no applications actually need to be on the same subnet or
broadcast domain. I'm not sure that can be assumed, for example
DSR-style load balancing requires a real layer 2.

>> For example, when configuring DHCP on the systemvm.  DHCP is an
>> L2-based service.
> DHCP is an IP service. Typically provided via a DHCP relay service in the aggregation
switch. For instance in OpenContrail this is provided in the hypervisor switch (aka vrouter
linux kernel module).
>>  So to configure DHCP you really need to know, for each
>> nic, what L2 it's attached to and what VMs are associated
>> with that L2.  Today, since there is no first-class concept of an L2
>> network, you have to look at the implied definition of L2.  For basic
>> networks, the L2 is the Pod, so you need to list all VMs in that Pod.
>> For guest/VPC networks, the L2 is the network object, so you need to
>> list all VMs associated with the network.  It would be nice if, when
>> the guru designed the network, it also defined the l2networks, and
>> then when a VM starts, the guru's reserve() method could associate
>> the l2network with the nic.  So the nic object would have a network_id
>> and a l2_network_id.
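The shape of that proposal can be sketched as below. This is a hypothetical illustration of the idea, not actual CloudStack code: the class names (Network, L2Network, Nic, Guru) and the id scheme are made up, and design()/reserve() only mimic the lifecycle described above.

```java
import java.util.ArrayList;
import java.util.List;

class L2Network {
    final long id;
    L2Network(long id) { this.id = id; }
}

class Network {
    final long id;
    final List<L2Network> l2Networks = new ArrayList<>();
    Network(long id) { this.id = id; }
}

class Nic {
    final long networkId;
    final long l2NetworkId;  // filled in by the guru's reserve()
    Nic(long networkId, long l2NetworkId) {
        this.networkId = networkId;
        this.l2NetworkId = l2NetworkId;
    }
}

class Guru {
    // design(): the guru decides how many L2 segments back the network
    Network design(long networkId, int l2Count) {
        Network n = new Network(networkId);
        for (int i = 0; i < l2Count; i++) {
            n.l2Networks.add(new L2Network(networkId * 100 + i));
        }
        return n;
    }

    // reserve(): when a VM starts, pick the L2 segment for its nic,
    // so the nic carries both a network_id and an l2_network_id
    Nic reserve(Network n, int vmIndex) {
        L2Network l2 = n.l2Networks.get(vmIndex % n.l2Networks.size());
        return new Nic(n.id, l2.id);
    }
}
```

With this split, "list all VMs on my L2" becomes a query on l2_network_id rather than an implied lookup via the Pod or the network object.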
> With OpenContrail, DHCP is quite simple. The Nic uuid is known by the vrouter kernel
module on the compute-node. When the DHCP request comes in from the tap/vif interface, the vrouter
answers locally (it knows the relationship between the Nic, its properties, and the virtual-network).
Please do not try to bring L2 into the picture. It would be very unhelpful to do so.
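The mechanism Pedro describes amounts to a per-host table keyed by the tap interface. The sketch below is illustrative only and in Java for consistency with the rest of the thread; the real vrouter is a Linux kernel module, and the NicInfo fields and the "offer" string are stand-ins, not OpenContrail's actual structures or wire format.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of locally answered DHCP: the host already knows each nic's
// address assignment, so a request from a tap interface never has to
// leave the hypervisor or reach a relay/appliance.
public class LocalDhcp {
    public static class NicInfo {
        final String nicUuid;
        final String ip;
        final String gateway;
        public NicInfo(String nicUuid, String ip, String gateway) {
            this.nicUuid = nicUuid;
            this.ip = ip;
            this.gateway = gateway;
        }
    }

    private final Map<String, NicInfo> byTap = new HashMap<>();

    // Called when a VM's nic is plugged into the host switch.
    public void plug(String tap, NicInfo nic) { byTap.put(tap, nic); }

    // On a DHCP request from a tap interface, answer from local state.
    public String offer(String tap) {
        NicInfo nic = byTap.get(tap);
        if (nic == null) return null;          // unknown interface: drop
        return nic.ip + " gw " + nic.gateway;  // stand-in for a DHCPOFFER
    }
}
```

The point of the sketch is that the lookup is purely local state, which is why no L2 broadcast domain is needed to make DHCP work.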
> For most data-centers, the main networking objective is to get rid of L2 and its limitations.
Ethernet is really complex. It has nice zero-config deployment for very simple networks,
but at the cost of high complexity if you are trying to do redundancy, use multiple links,
interoperate with other network devices, scale... not to mention that all state is data-driven,
which makes it really, really hard to debug. Ethernet as a layer-1 point-to-point link is great;
not as a network.
>   Pedro.
