incubator-cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chiradeep Vittal <Chiradeep.Vit...@citrix.com>
Subject Re: Review Request: API commands to add or remove NIC from a VM
Date Mon, 28 Jan 2013 19:31:27 GMT
Inline

On 1/25/13 4:03 PM, "Brian Angus" <blangus@betterservers.com> wrote:

>My answers inline...
>
>On 01/24/2013 06:49 PM, Marcus Sorensen wrote:
>> Brian is probably the right guy to answer some of these, but I'll chime
>>in
>> on a few.
>>
>> On Thu, Jan 24, 2013 at 5:58 PM, Chiradeep Vittal <
>> Chiradeep.Vittal@citrix.com> wrote:
>>
>>> Thanks for this.
>>> A few comments:
>>> 1. These mutating operations have to be async and hence the API
>>>commands
>>> have to inherit from BaseAsyncCmd instead of BaseCmd
>>>
>>
>> We can do that
>This change has been pushed to the branch

I saw this. We have 2 choices: extend from BaseAsyncCreateCmd or
BaseAsyncCmd. I see that you have chosen the latter. If the former: you
synchronously create the nic object first and obtain an id (uuid), and
then asynchronously attach the nic. In the current implementation both the
db entity creation and the attach are in the asynchronous workflow. If the
attach fails in the backend, then there will be a nic object in the
database theoretically attached to the vm, but not really attached.
Choices:
A) change the data model to model the fact that the nic is attached or not
(and use the BaseAsyncCreateCmd approach)
B) ensure that the nic is destroyed when the attach fails.

Note that with (B) you have a potential race condition where an API client
can retrieve the user vm before the attach has succeeded/failed and then
potentially add a static nat rule.

>>
>>
>>> 2. The functional specification is rather sparse -- I.e., it is not of
>>> much use for a QA person to develop a comprehensive set of test cases.
>>>I
>>> recommend taking a look at
>>> 
>>>https://cwiki.apache.org/confluence/display/CLOUDSTACK/Design+Document+T
>>>emp
>>> late to see what more information one could put in there. For example,
>>> edge cases (remove the last nic / exceed nic limit), configuration
>>> parameters.
>I have added more details to the functional spec.  Probably still needs
>more work...

Yes, please add any information about limits, usage data, logs, failure
conditions and any OS dependencies.

>>> 3. There's todos in the code: either this code is important  and cannot
>>> ship without the TODO or is a nice to have. It is hard for someone to
>>> figure out which case it is. Regardless if the TODOs are not fixed,
>>>they
>>> should have bugs
>>>
>>
>> I'll have to revisit the code, but I know at least a few that were
>> inherited from the existing code base (things that were borrowed). Also
>>I
>> know of one or two that were just notes to myself of things to
>>investigate,
>> I thought I had removed those but I'll double check.
>Some of these have been addressed I'll work on the ones that still exist.
>>
>>
>>> 4. Questions in my mind:
>>> A. Do you update instance metadata during this operation?
>>> B. Verify IP address is not already used / in range / etc
>>>
>>
>> We don't assign the IP, we call the same code that creates nics in the
>> first place, and the IP is assigned via the same functionality that
>> currently exists. If there's a problem with using IPs that are already
>> used, it's a problem for deployVirtualMachine as well.
>>

If so, then it is a bug for both use cases. Doesn't it mean we shouldn't
fix at least one of the use cases? For example, if the network subnet is
10.1.1.0/24 and I try to create a nic with IP 75.75.75.75/32? I am
referring to the 'ipaddress' parameter in the AddNicToVMCmd.


>> E. Verify live migration after operation?
>>>
>>
>> This has been done for KVM, could be tested for Xen, but the VM
>>definition
>> looks good.
>I have tested this on XEN devcloud as well

I didn't know you could do live migration on devcloud?

>>
>>> F. Operation blocks / fails during live migration / HA / start / stop
>>>
>>
>> Brian recently changed some of the code related to this, but previously
>>the
>> VM state had to be 'stopped', not 'starting', 'stopping', 'running', or
>> 'migrating'.
>The API calls will fail if the VM is not in a Running or Stopped state.

That closes some holes. But since the async operation does not have a
state machine, it is still possible for someone to migrate / stop the VM
while the nic attach is in progress. It may not matter (and might actually
work). I would note this as a limitation (bug) and hope that Simon takes
care of this in his feature.

>>
>>
>>> G. Usage events for remove / add operations
>>>
>>
>> I'm unfamilar with usage events. The usages for new nics are tracked,
>> though.
>>
>>
>>> H. Interaction of removeNic with extant LB / PF / Static NAT/ FW rules
>>>
>>
>> This is a good one to verify, does removing a nic orphan a port
>>forwarding
>> rule, etc, or does removing the nic clean up the rule?

Unlikely that any clean up is going to happen.

>>
>>
>>> I. Interaction of addNic / updateNic with PF/Static NAT/Source NAT --
>>> e.g., the default route might remain the same inside the VM leading to
>>> head scratching as to why a specific PF/NAT rule does not work.
>>>
>>
>> This is a non-issue, in my opinion. It's similar to attaching a volume
>>and
>> then wondering why it's not formatted and mounted for us right where we
>> wanted it. People have all sorts of varying uses for multiple networks,
>>and
>> it's up to them to configure what they intend. It would be good to make
>>a
>> note in the documentation though, people should know that we don't have
>> much control over their VM's OS config and they need to treat it like
>>any
>> other server.

The FS is a good place to document this.

Thanks
--
Chiradeep


Mime
View raw message