cloudstack-dev mailing list archives

From Rohit Yadav <rohit.ya...@shapeblue.com>
Subject Re: Latest Qemu KVM EV appears to be broken with ACS
Date Tue, 23 Apr 2019 10:54:07 GMT
All,


I've found and fixed an edge/security case while testing it on CentOS 6; the PR should now
be compatible with all supported KVM distros:

https://github.com/apache/cloudstack/pull/3278


The issue was that the systemvm.iso file includes an authorized_keys file from our codebase
and may overwrite the payload we send using patchviasocket or virsh qemu-guest-agent. I've
removed that unknown/default authorized_keys file in the PR.
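
For readers following along, here is a minimal sketch of how a payload file could be written through the guest agent, using the guest-file-open, guest-file-write and guest-file-close commands quoted later in this thread. It is in Python rather than the bash used in the PR, so treat it as an illustration only and not the PR's actual code; the helper names qga_command, virsh_runner and write_guest_file are hypothetical.

```python
import base64
import json
import subprocess


def qga_command(execute, arguments=None):
    """Serialize one guest agent request as a JSON string."""
    request = {"execute": execute}
    if arguments:
        request["arguments"] = arguments
    return json.dumps(request)


def virsh_runner(domain):
    """Return a callable that sends a request via `virsh qemu-agent-command`."""
    def run(request):
        result = subprocess.run(
            ["virsh", "qemu-agent-command", str(domain), request],
            capture_output=True, text=True, check=True,
        )
        return json.loads(result.stdout)
    return run


def write_guest_file(run, path, data, mode="w+"):
    """Open, write and close a file in the guest; return the byte count reported."""
    opened = run(qga_command("guest-file-open", {"path": path, "mode": mode}))
    handle = opened["return"]
    try:
        # The agent expects the file content base64-encoded in "buf-b64".
        encoded = base64.b64encode(data).decode("ascii")
        reply = run(qga_command("guest-file-write",
                                {"handle": handle, "buf-b64": encoded}))
        return reply["return"]["count"]
    finally:
        # Always release the handle, even if the write failed.
        run(qga_command("guest-file-close", {"handle": handle}))
```

For example, `write_guest_file(virsh_runner(19), "/var/cache/cloud/cmdline", payload)` would mirror the virsh calls quoted further down; the domain ID 19 and the `payload` bytes are placeholders.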


Historically, we have seen a few cases where a VR failed to start with an error related to the
execution of get_systemvm_template.sh, the script that finds the DomR version seen in logs,
failing with a non-zero exit code. That issue should now be fixed by my patch.


Regards,

Rohit Yadav

Software Architect, ShapeBlue

https://www.shapeblue.com

________________________________
From: Simon Weller <sweller@ena.com.INVALID>
Sent: Tuesday, April 23, 2019 12:59:00 AM
To: dev@cloudstack.apache.org
Subject: Re: Latest Qemu KVM EV appears to be broken with ACS

Hey Andrija,

In our case the SystemVMs were booting fine, but ACS wasn't able to inject the payload via
the socket.

-Si

________________________________
From: Andrija Panic <andrija.panic@gmail.com>
Sent: Monday, April 22, 2019 1:16 PM
To: dev
Subject: Re: Latest Qemu KVM EV appears to be broken with ACS

Hi Simon, all,

Did you try running CentOS with a newer kernel? I just hit a really strange
issue after upgrading a KVM host from stock 1.5.3 to qemu-kvm-ev 2.12 with
the stock 3.10 kernel (issues on Intel CPUs, no issues on AMD Opteron),
which was fixed by upgrading the kernel to 4.4 (ELRepo version).

In my case, SystemVMs were not able to boot and got stuck at the "booting
from hard drive" SeaBIOS message (actually any VM with VirtIO "hardware")
when using qemu-kvm-ev 2.12 (while there were no issues on stock 1.5.3).

What I could find is that there are obviously some issues when using
nested KVM on top of ESXi (or Hyper-V), which is what I'm running.
When I switched the template to an Intel emulated one, i.e. the "Windows
2016" OS type, VMs were able to boot just fine (user VMs at least).

Might be related to the original issue on this thread...

Best,
Andrija


rohit.yadav@shapeblue.com 
www.shapeblue.com
Amadeus House, Floral Street, London WC2E 9DP, UK
@shapeblue
  
 

On Thu, 18 Apr 2019 at 22:36, Sven Vogel <S.Vogel@ewerk.com> wrote:

> Hi Rohit,
>
> Thx we will test it!
>
>
>
> Sent from my iPhone
>
>
> __
>
> Sven Vogel
> Teamlead Platform
>
> EWERK RZ GmbH
> Brühl 24, D-04109 Leipzig
> P +49 341 42649 - 11
> F +49 341 42649 - 18
> S.Vogel@ewerk.com
> www.ewerk.com
>
> Managing directors:
> Dr. Erik Wende, Hendrik Schubert, Frank Richter, Gerhard Hoyer
> Registration court: Leipzig HRB 17023
>
> Certified according to:
> ISO/IEC 27001:2013
> DIN EN ISO 9001:2015
> DIN ISO/IEC 20000-1:2011
>
> EWERK-Blog | LinkedIn | Xing | Twitter | Facebook
>
> Information and offers provided by email are subject to change and non-binding.
>
> Disclaimer Privacy:
> The contents of this e-mail (including any attachments) are confidential
> and may be legally privileged. If you are not the intended recipient of
> this e-mail, any disclosure, copying, distribution or use of its contents
> is strictly prohibited, and you should please notify the sender immediately
> and then delete it (including any attachments) from your system. Thank you.
> > On 18.04.2019 at 21:44, Rohit Yadav <rohit.yadav@shapeblue.com> wrote:
> >
> > I've sent a PR that attempts to solve the issue. It is under testing but
> ready for review: https://github.com/apache/cloudstack/pull/3278
> >
> >
> > Thanks.
> >
> >
> > Regards,
> >
> > Rohit Yadav
> >
> > Software Architect, ShapeBlue
> >
> > https://www.shapeblue.com
> >
> > ________________________________
> > From: Simon Weller <sweller@ena.com.INVALID>
> > Sent: Monday, April 15, 2019 7:24:40 PM
> > To: dev@cloudstack.apache.org
> > Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
> >
> > +1 for the qemu guest agent approach.
> >
> >
> > ________________________________
> > From: Wido den Hollander <wido@widodh.nl>
> > Sent: Saturday, April 13, 2019 2:32 PM
> > To: dev@cloudstack.apache.org; Rohit Yadav
> > Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
> >
> >
> >
> >> On 4/12/19 9:33 PM, Rohit Yadav wrote:
> >> Thanks, I was already exploring a solution using the qemu guest agent
> >> since this morning. It just so happened that you also thought of the
> >> approach, and I could validate my script to work with qemu ev 2.12 by
> >> the end of my day.
> >>
> >
> > That would be great actually. The Qemu Guest Agent is a lot better to
> > use. We might want to explore that indeed. Not for now, but it is a
> > better option to talk to VMs imho.
> >
> > Wido
> >
> >> A proper fix might require some additional changes in
> >> cloud-early-config and therefore a new systemvmtemplate for
> >> 4.13.0.0/4.11.3.0, I'll start a PR on that in the following week(s).
> >>
> >> Regards.
> >>
> >> Regards,
> >> Rohit Yadav
> >>
> >> ________________________________
> >> From: Marcus <shadowsor@gmail.com>
> >> Sent: Saturday, April 13, 2019 12:31:33 AM
> >> To: dev@cloudstack.apache.org
> >> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
> >>
> >> Wow, that was fast. Good work.
> >>
> >> The script seems to work for me. There was one case where I rebooted the
> >> router and got the old link local IP somehow. I'm not sure if that was a
> >> timing issue in seeing the existing /var/cache/cloud/cmdline before the
> >> new one was written or what, but if it was a timing issue it would seem
> >> like we should already have that problem with the existing
> >> cloud-early-config.
> >>
> >> On Fri, Apr 12, 2019 at 12:24 PM Rohit Yadav <rohit.yadav@shapeblue.com>
> >> wrote:
> >>
> >>> Hi Marcus, Simon,
> >>>
> >>>
> >>> I explored two of the short term solutions and I have a working (work in
> >>> progress) script that replaces the patchviasocket script and uses the
> >>> qemu guest agent (which is installed in the 4.11+ systemvmtemplate). This
> >>> was part of a scoping exercise for solving the patching problem for qemu
> >>> 2.12+ (Ubuntu 19.04 has a 3.x version).
> >>>
> >>>
> >>> This is what I have so far; however, further testing is needed:
> >>>
> >>> https://gist.github.com/rhtyd/ddb42c4c7581c4129ca04fbb829f16cf
> >>>
> >>>
> >>> The logic is written entirely in bash:
> >>>
> >>> - Check whether we're able to contact the guest agent
> >>>
> >>> - Once we're able to connect, confirm that the I/O is not error prone
> >>>
> >>> - Then write the payload as a file (the ssh public key and cmdline
> >>>   string)
> >>>
> >>> - Then fix file permissions
> >>>
> >>> - Hope that internally cloud-early-config would detect the cmdline we
> >>>   had saved and patching would work
> >>>
> >>>
> >>> While this may work, for the long term a proper fix is needed that
> >>> should be a standard patching mechanism across all hypervisors.
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Rohit Yadav
> >>>
> >>> Software Architect, ShapeBlue
> >>>
> >>> https://www.shapeblue.com
> >>>
> >>> ________________________________
> >>> From: Marcus <shadowsor@gmail.com>
> >>> Sent: Friday, April 12, 2019 11:30:46 PM
> >>> To: dev@cloudstack.apache.org
> >>> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
> >>>
> >>> Long ago it was a disk. The problem was that these disks had to go
> >>> somewhere, a place where they could survive migrations, which didn't
> >>> work well for block based primary storage... at least for the code base
> >>> at the time. Using virtio socket was seen as a fairly standard way to
> >>> communicate temporary information to the guest, and didn't require
> >>> managing the lifecycle of a special disk.
> >>>
> >>> I believe the current problem is that the sender needs to remain
> >>> connected until the receiver has read. Maybe socat does this, but if so
> >>> we need to ensure that it is available and applied as a new RPM
> >>> dependency. In my testing, waiting on the sender side didn't 100% fix
> >>> things, or sometimes took a very long time due to the backoff algorithm
> >>> on the cloud-early-config receiver. Some tweaks to that made it more
> >>> robust, but it is still a game of trying to coordinate timing of two
> >>> services on either end. If it works though, I'm all for it.
> >>>
> >>> Just to throw another idea out there... If we want to fix this without
> >>> involving storage, I might suggest switching to the qemu-guest-agent
> >>> that now exists, with a socket and listening client already in the
> >>> system vm. This would be far more robust, I think, than our scripting
> >>> reading unix sockets without any sort of protocol or buffer control
> >>> considerations, and would likely be more robust to changes in qemu as
> >>> the guest agent is the primary target for the feature.
> >>>
> >>> We can directly write our /var/cache/cloud/cmdline from the host like
> >>> so (I'm using virsh but we could perhaps communicate with the guest
> >>> agent socket directly or via socat):
> >>>
> >>> virsh qemu-agent-command 19 '{"execute":"guest-file-open",
> >>> "arguments":{"path":"/tmp/testfile","mode":"w+"}}'
> >>> {"return":1001}
> >>>
> >>> virsh qemu-agent-command 19 '{"execute":"guest-file-write",
> >>> "arguments":{"handle":1001,"buf-b64":"Zm9vIHdhcyBoZXJlCg=="}}'
> >>> {"return":{"count":13,"eof":false}}
> >>>
> >>> virsh qemu-agent-command 19 '{"execute":"guest-file-close",
> >>> "arguments":{"handle":1001}}'
> >>> {"return":{}}
> >>>
> >>> root@r-54850-VM:~# cat /tmp/testfile
> >>> foo was here
> >>>
> >>> We are also able to detect via libvirt that the qemu guest agent is up
> >>> and ready. You can see it in the XML when you list a VM.
> >>>
> >>> We do need to keep other hypervisors in mind. This is just an option
> >>> for a fix that doesn't involve a larger redesign.
> >>>
> >>> On Fri, Apr 12, 2019 at 10:21 AM Rohit Yadav <rohit.yadav@shapeblue.com>
> >>> wrote:
> >>>
> >>>> Hi Simon,
> >>>>
> >>>>
> >>>> I'm exploring a solution for the same. I've found that the python based
> >>>> patching script fails to wait for the message to be written on the unix
> >>>> socket before the socket is closed. I reckon this could be related to
> >>>> changes in serial port device handling in qemu-ev 2.12, as the same
> >>>> mechanism used to work in past versions.
> >>>>
> >>>>
> >>>> I'm exploring/testing a solution where I replace the python based
> >>>> patching script with a bash one. Can you test the following in your
> >>>> environment (ensure socat is installed)? Just back up and replace the
> >>>> patchviasocket.py file with this:
> >>>>
> >>>> https://gist.github.com/rhtyd/aab23357fef2d8a530c0e83ec8be10c5
> >>>>
> >>>>
> >>>> The short term solution would be one of the ways to ensure patching
> >>>> works without much change in the scripts or systemvmtemplate. However,
> >>>> longer term we need to explore and standardize the patching mechanism
> >>>> across all hypervisors, for example by using a small payload via a
> >>>> config drive iso.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Rohit Yadav
> >>>>
> >>>> Software Architect, ShapeBlue
> >>>>
> >>>> https://www.shapeblue.com
> >>>>
> >>>> ________________________________
> >>>> From: Simon Weller <sweller@ena.com.INVALID>
> >>>> Sent: Friday, April 12, 2019 8:29:04 PM
> >>>> To: dev; users
> >>>> Subject: Latest Qemu KVM EV appears to be broken with ACS
> >>>>
> >>>> All,
> >>>>
> >>>> After troubleshooting a strange issue with a new lab environment
> >>>> yesterday, it appears that the patchviasocket functionality we rely on
> >>>> for key and ip injection into our router/SSVM/CPVM images is broken with
> >>>> qemu-kvm-ev-2.12.0-18.el7 (January 2019 release). This was tested on
> >>>> CentOS 7.6.
> >>>> No data is injected and this was confirmed using socat on
> >>>> /dev/vport0p1.
> >>>> qemu-kvm-ev-2.10.0-21.el7_5.7.1 works, so hopefully this will save
> >>>> someone some pain and suffering trying to figure out why the deployment
> >>>> seems broken.
> >>>>
> >>>> We're going to dig in and see if we can figure out the patches
> >>>> responsible for it breaking.
> >>>>
> >>>> -Si


--

Andrija Panić
