cloudstack-dev mailing list archives

From Syahrul Sazli Shaharir <sa...@nocser.net>
Subject Re: patchviasocket seems to be broken with qemu 2.3(+?)
Date Wed, 21 Dec 2016 06:35:31 GMT
On 2016-12-20 17:53, Wei ZHOU wrote:
> Hi Syahrul,
> 
> Could you upload the /var/log/cloud.log ?

Sure:-

Working router VM: http://pastebin.com/hwwk86ve

Non-working router VM: http://pastebin.com/G4nv09ab
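
A minimal Python sketch for comparing a working and a non-working log pair like the one above, assuming each paste has been saved to a local file. It takes the last line the non-working VM logged and lists what the working VM logged after the same message, i.e. the steps the stuck VM never reached. The helper names (`normalize`, `missing_after_stall`) and the timestamp pattern are illustrative assumptions; the pattern expects a syslog-style "Mon DD HH:MM:SS host " prefix, which may not match the actual log format.

```python
import re

# Assumed syslog-style prefix: "Dec 20 10:00:01 hostname ". Adjust as needed.
_PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+")


def normalize(line):
    """Drop the timestamp/host prefix so only the message text is compared."""
    return _PREFIX.sub("", line.rstrip("\n"))


def missing_after_stall(working_lines, failing_lines):
    """Return the messages the working log has after the failing log's last message.

    Returns None when the failing log's last message never appears in the
    working log, and the whole working log when the failing log is empty.
    """
    work = [normalize(l) for l in working_lines if l.strip()]
    fail = [normalize(l) for l in failing_lines if l.strip()]
    if not fail:
        return work
    last = fail[-1]
    # Search backwards so a repeated message matches its final occurrence.
    for i in range(len(work) - 1, -1, -1):
        if work[i] == last:
            return work[i + 1:]
    return None
```

Usage would be along the lines of `missing_after_stall(open("working.log"), open("failing.log"))`, where both file names are placeholders. Messages that embed PIDs or other per-boot values would need further normalization before they compare equal.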

Thanks.

> 
> -Wei
> 
> 2016-12-20 3:18 GMT+01:00 Syahrul Sazli Shaharir <sazli@nocser.net>:
> 
>> On 2016-12-19 18:10, Syahrul Sazli Shaharir wrote:
>> 
>>> On 2016-12-19 17:03, Linas Žilinskas wrote:
>>> 
>>>> From the logs it doesn't seem that the script times out. "Execution is
>>>> successful", so it manages to pass the data over the socket.
>>>> 
>>>> I guess the systemvm just doesn't configure itself for some reason.
>>>> 
>>> 
>>> You are right. I was able to enter the router VM console at some point
>>> during the timeout loops, and to capture syslog output during the
>>> loop:-
>>> 
>>> http://pastebin.com/n37aHeSa
>>> 
>> 
>> I restarted another network, and that network's router VM was able to be
>> recreated, even on the same host as the failed network (both networks
>> have exactly the same configuration; only the VLAN and subnet differ).
>> Comparing the two syslog outputs during boot shows that the problematic
>> network's router VM got stuck in self-configuration at vm_dhcp_entry.json.
>> 
>> 1. Working network router VM : http://pastebin.com/Y6zpDa6M
>> 2. Non-working network router VM : http://pastebin.com/jzfGMGQB
>> 
>> Thanks.
>> 
>> 
>> 
>>>> Also, in my personal tests, I noticed some different behaviour with
>>>> different kernels. I don't remember the specifics right now, but on some
>>>> combinations (qemu / kernel) the socket acted differently. For example,
>>>> the data was sent over the socket, but wasn't visible inside the VM.
>>>> Other times the socket would be stuck from the host side.
>>>> 
>>>> So I would suggest testing different kernels (3.x, 4.4.x, 4.8.x) or
>>>> trying to log in to the system VM to see what's happening from inside.
>>>> 
>>> 
>>> Will do this next and report the results here.
>>> 
>>> Thanks for your help! :)
>>> 
>>> 
>>> On 12/16/16 03:46, Syahrul Sazli Shaharir wrote:
>>>> 
>>>> On 2016-12-16 11:27, Syahrul Sazli Shaharir wrote:
>>>>> On Wed, 26 Oct 2016, Linas Žilinskas wrote:
>>>>> 
>>>>> So after some investigation I've found out that qemu 2.3.0 is indeed
>>>>> broken, at least the way CS uses the qemu chardev/socket.
>>>>> 
>>>>> Not sure in which specific version it broke, but it was fixed in
>>>>> 2.4.0-rc3, with the fix specifically noting that CloudStack 4.2 was
>>>>> not working.
>>>>> 
>>>>> qemu git commit: 4bf1cb03fbc43b0055af60d4ff093d6894aa4338
>>>>> 
>>>>> Also attaching the patch from that commit.
>>>>> 
>>>>> For our own purposes I've applied the patch to the qemu-kvm-ev
>>>>> package (2.3.0) and all is well.
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am facing the exact same issue on the latest CloudStack 4.9.0.1, on
>>>>> the latest CentOS 7.3.1611, with the latest qemu-kvm-ev-2.6.0-27.1.el7
>>>>> package.
>>>>> 
>>>>> The issue initially surfaced following a heartbeat-induced reset of
>>>>> all hosts, when it was on CS 4.8 @ CentOS 7.0 and stock
>>>>> qemu-kvm-1.5.3. Since then, the patchviasocket.pl/py timeouts have
>>>>> persisted for 1 out of 4 router VMs/networks, even after upgrading to
>>>>> the latest code. (I have checked the qemu-kvm-ev-2.6.0-27.1.el7 source,
>>>>> and the patched code is pretty much still intact, as per the
>>>>> 2.4.0-rc3 commit.)
>>>>> 
>>>>> Any help would be greatly appreciated.
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> (Attached are some debug logs from the host's agent.log)
>>>>> 
>>>> 
>>>> Here are the debug logs as mentioned: http://pastebin.com/yHdsMNzZ
>>>> 
>>>> Thanks.
>>>> 
>>>> --sazli
>>>>> 
>>>>> On 2016-10-20 09:59, Linas Žilinskas wrote:
>>>>> 
>>>>> Hi.
>>>>> 
>>>>> We have made an upgrade to 4.9.
>>>>> 
>>>>> Custom-built packages with our own patches, which in my mind (I'm the
>>>>> only one patching those) should not affect the issue I'll describe.
>>>>> 
>>>>> I'm not sure whether we didn't notice it before, or it's actually
>>>>> related to something in 4.9.
>>>>> 
>>>>> Basically our system VMs could not be patched via the qemu socket.
>>>>> The script simply errored out with a timeout while trying to push the
>>>>> data to the socket.
>>>>> 
>>>>> Executing it manually (with the command line from the logs) gave the
>>>>> same result. I even tried the old Perl variant, which also had the
>>>>> same result.
>>>>> 
>>>>> So finally we found out that this issue happens only on our HVs which
>>>>> run qemu 2.3.0, from the CentOS 7 virtualization special interest
>>>>> group repo. Other ones that run qemu 1.5, from the official repos, can
>>>>> patch the system VMs fine.
>>>>> 
>>>>> So I'm wondering if anyone has tested 4.9 with KVM with qemu >= 2.x?
>>>>> Maybe it's something else special in our setup, e.g. we're running the
>>>>> HVs from a preconfigured netboot image (PXE), but all of them do,
>>>>> including those with qemu 1.5, so I have no idea.
>>>>> 
>>>>> Linas Žilinskas
>>>>> Head of Development
>>>>> website <http://www.host1plus.com/>
>>>>> facebook <https://www.facebook.com/Host1Plus>
>>>>> twitter <https://twitter.com/Host1Plus>
>>>>> linkedin <https://www.linkedin.com/company/digital-energy-technologies-ltd.>
>>>>> 
>>>>> Host1Plus is a division of Digital Energy Technologies Ltd.
>>>>> 
>>>>> 26 York Street, London W1U 6PZ, United Kingdom
>>>>> 
>>>>> 
>> --
>> --sazli
>> 
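
The thread describes two distinct symptoms: the host-side push to the socket timing out, and the push succeeding while the data never shows up inside the VM. A minimal Python sketch for telling these apart from the host side follows. It is not the patchviasocket script itself; the function name `push_to_unix_socket`, the returned status strings, and the default timeout are assumptions made for illustration, and the socket is assumed to be a Unix stream socket.

```python
import socket


def push_to_unix_socket(path, payload, timeout=5.0):
    """Send payload to a Unix stream socket and report which step failed, if any."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        try:
            sock.connect(path)
        except socket.timeout:
            return "connect-timeout"
        except OSError as exc:
            # e.g. the socket file is missing or nothing is listening on it
            return "connect-error: %s" % exc
        try:
            sock.sendall(payload)
        except socket.timeout:
            # The peer stopped draining the socket, so the send buffer filled up.
            return "send-timeout"
        except OSError as exc:
            return "send-error: %s" % exc
        return "sent"
    finally:
        sock.close()
```

A call would look like `push_to_unix_socket("/path/to/vm.socket", b"...")`, where the path is a placeholder for whatever socket the agent log shows. A result of "sent" only means the host kernel accepted the bytes; whether the guest actually read them still has to be checked from inside the VM.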
