incubator-cloudstack-dev mailing list archives

From Marcus Sorensen <shadow...@gmail.com>
Subject Re: [DISCUSS] getting rid of KVM patchdisk
Date Tue, 05 Mar 2013 18:51:44 GMT
Yes, I knew we were headed that route (API on system vm), but the
amount of work and testing there isn't insignificant. I'd agree that
an API over link-local is better than using the socket. But in the
meantime, we can use the socket to get rid of the patch disk for KVM
and easily have that in 4.2. If it's obsoleted later, it won't have
been a lot of work.
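
For illustration, here is a minimal host-side sketch of the socket
approach, assuming qemu-ga is what listens inside the system vm. It
uses the guest agent's guest-file-open, guest-file-write and
guest-file-close commands. The helper names and the guest path in the
usage note are hypothetical, and a real client would probably want to
issue guest-sync first and add timeouts.

```python
import base64
import json
import socket


def build_open_request(guest_path, mode="w"):
    """Build a guest-file-open request for a path inside the guest."""
    return {"execute": "guest-file-open",
            "arguments": {"path": guest_path, "mode": mode}}


def build_write_request(handle, data):
    """Build a guest-file-write request; the payload travels as base64."""
    return {"execute": "guest-file-write",
            "arguments": {"handle": handle,
                          "buf-b64": base64.b64encode(data).decode("ascii")}}


def build_close_request(handle):
    """Build a guest-file-close request for an open handle."""
    return {"execute": "guest-file-close", "arguments": {"handle": handle}}


def call(sock_file, request):
    """Send one JSON request and return the 'return' value of the reply.

    Assumes each reply arrives as a single newline-terminated JSON object.
    """
    sock_file.write(json.dumps(request).encode("utf-8") + b"\n")
    sock_file.flush()
    reply = json.loads(sock_file.readline().decode("utf-8"))
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply.get("return")


def push_file(socket_path, guest_path, data):
    """Write data (bytes) to guest_path through the agent's unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("rwb") as sock_file:
            handle = call(sock_file, build_open_request(guest_path))
            call(sock_file, build_write_request(handle, data))
            call(sock_file, build_close_request(handle))
```

Something like push_file("/var/lib/libvirt/qemu/v-2-VM.agent",
"/path/in/guest/cmdline", b"template=domP type=secstorage ...") would
then replace the patch disk for the cmdline file; the guest path here
is only a placeholder.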

On Tue, Mar 5, 2013 at 11:41 AM, Wido den Hollander <wido@widodh.nl> wrote:
>
>
> On 03/05/2013 07:28 PM, Edison Su wrote:
>>
>>
>>
>>> -----Original Message-----
>>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>> Sent: Monday, March 04, 2013 10:26 PM
>>> To: cloudstack-dev@incubator.apache.org
>>> Subject: Re: [DISCUSS] getting rid of KVM patchdisk
>>>
>>> I've been thinking about how to tackle this, written a little concept
>>> code, and it seems fairly straightforward to include our own little
>>> python daemon that speaks JSON via this local character device in the
>>> system vm. I'm assuming we'd start it up at the beginning of
>>> cloud-early-config.
>>>
>>> What I'm not certain of is how to get the 'cmdline' bits into the system
>>> before cloud-early-config needs them. Do we block in cloud-early-config,
>>> waiting on getting the cmdline file before continuing, and push it via
>>> StartCommand?
>>
>>
>> We put a lot of logic into init scripts inside the system vm, which
>> unnecessarily complicates system vm programming:
>> 1. Init scripts are not portable: anyone who wants to use another Linux
>> distribution as the system vm OS has to write their own init scripts.
>> 2. Init scripts are not easy to hack; they have their own dialect (how to
>> log messages, how to declare dependencies, etc.).
>> 3. Init scripts run in a limited environment (some system-wide services
>> are not available), which limits what you can do in an init script.
>>
>> Maybe we need to start working on a new system vm programming model now?
>> It would be better to just put a python daemon inside the system vm and
>> provide a RESTful API through the link-local ip address (or the private ip
>> if it's vmware); then the mgt server or hypervisor agent code can just send
>> commands to the python daemon over http instead of ssh.
>>
>> In your case, the python daemon needs to wait on the well-defined serial
>> port (e.g. /dev/virtio-ports/org.apache.cloudstack.guest.agent), get the
>> cmdline, then program the system vm itself and reboot.
>>
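
As a rough sketch of the guest side described above, the daemon could
block on the virtio port, read the cmdline, and split it into key=value
parameters. The port path is the one named in the message; the framing
(one newline-terminated line of space-separated pairs) and the function
names are assumptions made for illustration only.

```python
import io

# Port name taken from the message above; the framing is an assumption.
PORT_PATH = "/dev/virtio-ports/org.apache.cloudstack.guest.agent"


def parse_cmdline(text):
    """Split 'key=value key=value ...' into a dict, skipping bare tokens."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def read_cmdline(stream):
    """Read one newline-terminated line from a binary stream and parse it."""
    line = stream.readline()
    return parse_cmdline(line.decode("utf-8"))


def wait_for_cmdline(port_path=PORT_PATH):
    """Open the virtio port unbuffered and wait for the host to send a line."""
    with open(port_path, "rb", buffering=0) as port:
        return read_cmdline(port)


if __name__ == "__main__":
    # Stand-in for the serial port so the sketch runs anywhere.
    fake_port = io.BytesIO(b"type=secstorage zone=1 pod=1 mtu=1500\n")
    print(read_cmdline(fake_port))
```

A real daemon would likely need to retry when the read comes back empty
(for example if the host has not connected to the channel yet) before it
goes on to configure the system vm.
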
>
> This seems like a very sensible thing to do. I created Jira tickets
> about this last November.
>
> I haven't been able to look at it though, but having a Python daemon
> running that does everything on a Read-Only (!!!) filesystem would be
> awesome.
>
> The reason I mention the read-only filesystem is that it would make the
> system VMs much more resilient against SAN issues. Assume a read-only FS and
> make them stateless and store everything on a tmpfs or in memory somewhere.
>
> (Would be something cool to discuss at a Colab Conference ;))
>
> Wido
>
>
>>
>>
>>>
>>> On Mon, Mar 4, 2013 at 5:27 PM, Marcus Sorensen <shadowsor@gmail.com>
>>> wrote:
>>>>
>>>> I tested this with Rohit's systemvm from master. It works fine,
>>>> provided you install the qemu-guest-agent software and modify the
>>>> libvirt xml definition of the system vm to include something like:
>>>>
>>>>     <channel type='unix'>
>>>>        <source mode='bind' path='/var/lib/libvirt/qemu/v-2-VM.agent'/>
>>>>        <target type='virtio' name='org.qemu.guest_agent.0'/>
>>>>        <alias name='channel0'/>
>>>>        <address type='virtio-serial' controller='0' bus='0' port='1'/>
>>>>      </channel>
>>>>
>>>> Then on the host you can connect to the
>>>> /var/lib/libvirt/qemu/v-2-VM.agent unix socket and send QMP JSON to do
>>>> things like write files. We can't execute the various scripts through
>>>> it, but we also don't have to use qemu-ga; we could have our own thing
>>>> listening on the unix socket.
>>>>
>>>>
>>>>
>>>> On Mon, Mar 4, 2013 at 3:24 PM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>>>
>>>>> I think this just requires an updated system vm (the virtio-serial
>>>>> portion). I've played a bit with the old debian 2.6.32-5-686-bigmem
>>>>> one and can't get the device nodes to show up, even though the
>>>>> /boot/config shows that it has CONFIG_VIRTIO_CONSOLE=y. However, if I
>>>>> try this with a CentOS 6.3 VM, on a CentOS 6.3 or Ubuntu 12.04 KVM
>>>>> host it works. So I'm not sure what's being used for the ipv6 update,
>>>>> but we can probably make one that works. We'll need to install
>>>>> qemu-ga and start it within the systemvm as well.
>>>>>
>>>>> On Mon, Mar 4, 2013 at 12:41 PM, Edison Su <Edison.su@citrix.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Marcus Sorensen [mailto:shadowsor@gmail.com]
>>>>>>> Sent: Sunday, March 03, 2013 12:13 PM
>>>>>>> To: cloudstack-dev@incubator.apache.org
>>>>>>> Subject: [DISCUSS] getting rid of KVM patchdisk
>>>>>>>
>>>>>>> For those who don't know (this probably doesn't matter, but...),
>>>>>>> when KVM brings up a system VM, it creates a 'patchdisk' on primary
>>>>>>> storage. This patchdisk is used to pass along 1) the authorized_keys
>>>>>>> file and 2) a 'cmdline' file that describes to the systemvm startup
>>>>>>> services all of the various properties of the system vm.
>>>>>>>
>>>>>>> Example cmdline file:
>>>>>>>
>>>>>>>   template=domP type=secstorage host=172.17.10.10 port=8250
>>>>>>>   name=s-1-VM zone=1 pod=1 guid=s-1-VM
>>>>>>>   resource=com.cloud.storage.resource.NfsSecondaryStorageResource
>>>>>>>   instance=SecStorage sslcopy=true role=templateProcessor mtu=1500
>>>>>>>   eth2ip=192.168.100.170 eth2mask=255.255.255.0 gateway=192.168.100.1
>>>>>>>   public.network.device=eth2 eth0ip=169.254.1.46 eth0mask=255.255.0.0
>>>>>>>   eth1ip=172.17.10.150 eth1mask=255.255.255.0 mgmtcidr=172.17.10.0/24
>>>>>>>   localgw=172.17.10.1 private.network.device=eth1 eth3ip=172.17.10.192
>>>>>>>   eth3mask=255.255.255.0 storageip=172.17.10.192
>>>>>>>   storagenetmask=255.255.255.0 storagegateway=172.17.10.1
>>>>>>>   internaldns1=8.8.4.4 dns1=8.8.8.8
>>>>>>>
>>>>>>> This patch disk has been bugging me for awhile, as it creates a
>>>>>>> volume that isn't really tracked anywhere or known about in
>>>>>>> cloudstack's database. Up until recently these would just litter
>>>>>>> the KVM primary storages, but there's been some triage done to
>>>>>>> attempt to clean them up when the system vms go away. It's not
>>>>>>> perfect. It also can be inefficient for certain primary storage
>>>>>>> types, for example if you end up creating a bunch of 10MB luns on a
>>>>>>> SAN for these.
>>>>>>>
>>>>>>>
>>>>>>> So my question goes to those who have been working on the system vm.
>>>>>>>
>>>>>>> My first preference (aside from a full system vm redesign, perhaps
>>>>>>> something that is controlled via an API) would be to copy these up
>>>>>>> to the system vm via SCP or something. But the cloud services start
>>>>>>> so early on that this isn't possible. Next would be to inject them
>>>>>>> into the system vm's root disk before starting the server, but if
>>>>>>> we're allowing people to make their own system vms, can we count on
>>>>>>> the partitions being what we expect? Also I don't think this will
>>>>>>> work for RBD, which qemu directly connects to, with the host OS
>>>>>>> unaware of any disk.
>>>>>>>
>>>>>>>
>>>>>>> Options?
>>>>>>
>>>>>>
>>>>>> Could you take a look at the status of these projects in KVM?
>>>>>> http://wiki.qemu.org/Features/QAPI/GuestAgent
>>>>>> https://fedoraproject.org/wiki/Features/VirtioSerial
>>>>>>
>>>>>> Basically, we need a way to talk to the guest VM (sending parameters
>>>>>> to the KVM guest) after the VM is booted up. VMware and XenServer each
>>>>>> have their own way to send parameters to the guest VM through a PV
>>>>>> driver, but there was no such thing for KVM a few years ago.
