cloudstack-dev mailing list archives

From Rohit Yadav <rohit.ya...@shapeblue.com>
Subject Re: Latest Qemu KVM EV appears to be broken with ACS
Date Mon, 29 Apr 2019 10:18:41 GMT
Hi Jfn, we have to follow the merging guidelines, which require two LGTMs/approvals
from the reviewers and passing smoketests: https://github.com/apache/cloudstack/pull/3278


I think we're just waiting for the reviewers.


This will then also require a new systemvmtemplate when we do work towards the 4.11.3 release.


Regards,

Rohit Yadav

Software Architect, ShapeBlue

https://www.shapeblue.com

________________________________
From: Jean-Francois Nadeau <the.jfnadeau@gmail.com>
Sent: Friday, April 26, 2019 7:10:17 PM
To: dev@cloudstack.apache.org
Subject: Re: Latest Qemu KVM EV appears to be broken with ACS

Can we consider merging this fix in 4.11.3 as well? It would help those like us
who would really like to jump to the qemu-ev versions but also want to
stick to CS LTS releases.

best,

Jfn

On Tue, Apr 23, 2019 at 6:54 AM Rohit Yadav <rohit.yadav@shapeblue.com>
wrote:

> All,
>
>
> I've found and fixed an edge/security case while testing it for CentOS6,
> the PR should now be compatible with all supported KVM distros:
>
> https://github.com/apache/cloudstack/pull/3278
>
>
> The issue was that the systemvm.iso file includes an authorized_keys file
> from our codebase and may overwrite the payload we send using
> patchviasocket or virsh qemu-guest-agent. I've removed that unknown/default
> authorized_keys file in the PR.
>
>
> Historically, we had seen a few cases where a VR failed to start with an
> error related to get_systemvm_template.sh execution (failing with a
> non-zero exit code) that finds the DomR version seen in logs. That issue
> would be fixed by my patch now.
>
>
> Regards,
>
> Rohit Yadav
>
> Software Architect, ShapeBlue
>
> https://www.shapeblue.com
>
> ________________________________
> From: Simon Weller <sweller@ena.com.INVALID>
> Sent: Tuesday, April 23, 2019 12:59:00 AM
> To: dev@cloudstack.apache.org
> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
>
> Hey  Andrija,
>
> In our case the SystemVMs were booting fine, but ACS wasn't able to inject
> the payload via the socket.
>
> -Si
>
> ________________________________
> From: Andrija Panic <andrija.panic@gmail.com>
> Sent: Monday, April 22, 2019 1:16 PM
> To: dev
> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
>
> Hi Simon, all,
>
> did you try running CentOS with a newer kernel? I just got a really strange
> issue after upgrading a KVM host from stock 1.5.3 to qemu-kvm-ev 2.12 with
> stock kernel 3.10 (issues on Intel CPUs, while no issues on AMD Opteron),
> which was fixed by upgrading the kernel to 4.4 (Elrepo version).
>
> In my case the SystemVMs were not able to boot, stuck on the "booting from
> hard drive" SeaBIOS message (actually any VM with VirtIO "hardware") using
> qemu-kvm-ev 2.12 (while no issues on stock 1.5.3).
>
> What I could find is that there are obviously some issues when using
> nested KVM on top of ESXi (or Hyper-V), which is what I'm running.
> When I switched the template to an Intel emulated one, i.e. the
> "Windows 2016" OS type, VMs were able to boot just fine (user VMs at least).
>
> Might be related to original issue on this thread...
>
> Best,
> Andrija
>
>
> rohit.yadav@shapeblue.com
> www.shapeblue.com
> Amadeus House, Floral Street, London WC2E 9DP, UK
> @shapeblue
>
>
>


> On Thu, 18 Apr 2019 at 22:36, Sven Vogel <S.Vogel@ewerk.com> wrote:
>
> > Hi Rohit,
> >
> > Thx we will test it!
> >
> >
> >
> > Sent from my iPhone
> >
> >
> > __
> >
> > Sven Vogel
> > Teamlead Platform
> >
> > EWERK RZ GmbH
> > Brühl 24, D-04109 Leipzig
> > P +49 341 42649 - 11
> > F +49 341 42649 - 18
> > S.Vogel@ewerk.com
> > www.ewerk.com
> >
> > Managing Directors:
> > Dr. Erik Wende, Hendrik Schubert, Frank Richter, Gerhard Hoyer
> > Court of registration: Leipzig HRB 17023
> >
> > Certified according to:
> > ISO/IEC 27001:2013
> > DIN EN ISO 9001:2015
> > DIN ISO/IEC 20000-1:2011
> >
> > Information and offers sent by email are subject to change and non-binding.
> >
> > Disclaimer Privacy:
> > The contents of this e-mail (including any attachments) are confidential
> > and may be legally privileged. If you are not the intended recipient of
> > this e-mail, any disclosure, copying, distribution or use of its contents
> > is strictly prohibited, and you should please notify the sender immediately
> > and then delete it (including any attachments) from your system. Thank you.
> > > On 18.04.2019 at 21:44, Rohit Yadav <rohit.yadav@shapeblue.com> wrote:
> > >
> > > I've sent a PR that attempts to solve the issue. It is under testing but
> > > ready for review: https://github.com/apache/cloudstack/pull/3278
> > >
> > >
> > > Thanks.
> > >
> > >
> > > Regards,
> > >
> > > Rohit Yadav
> > >
> > > Software Architect, ShapeBlue
> > >
> > > https://www.shapeblue.com
> > >
> > > ________________________________
> > > From: Simon Weller <sweller@ena.com.INVALID>
> > > Sent: Monday, April 15, 2019 7:24:40 PM
> > > To: dev@cloudstack.apache.org
> > > Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
> > >
> > > +1 for the qemu guest agent approach.
> > >
> > >
> > > ________________________________
> > > From: Wido den Hollander <wido@widodh.nl>
> > > Sent: Saturday, April 13, 2019 2:32 PM
> > > To: dev@cloudstack.apache.org; Rohit Yadav
> > > Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
> > >
> > >
> > >
> > >> On 4/12/19 9:33 PM, Rohit Yadav wrote:
> > >> Thanks, I was already exploring a solution using qemu guest agent since
> > >> morning today. It just so happened that you also thought of the approach,
> > >> and I could validate my script to work with qemu ev 2.12 by the end of my
> > >> day.
> > >>
> > >
> > > That would be great actually. The Qemu Guest Agent is a lot better to
> > > use. We might want to explore that indeed. Not for now, but it is a
> > > better option to talk to VMs imho.
> > >
> > > Wido
> > >
> > >> A proper fix might require some additional changes in
> > >> cloud-early-config and therefore a new systemvmtemplate for
> > >> 4.13.0.0/4.11.3.0, I'll start a PR on that in the following week(s).
> > >>
> > >> Regards.
> > >>
> > >> Regards,
> > >> Rohit Yadav
> > >>
> > >> ________________________________
> > >> From: Marcus <shadowsor@gmail.com>
> > >> Sent: Saturday, April 13, 2019 12:31:33 AM
> > >> To: dev@cloudstack.apache.org
> > >> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
> > >>
> > >> Wow, that was fast. Good work.
> > >>
> > >> The script seems to work for me. There was one case where I rebooted the
> > >> router and got the old link local IP somehow. I'm not sure if that was a
> > >> timing issue in seeing the existing /var/cache/cloud/cmdline before the new
> > >> one was written or what, but if it was a timing issue it would seem like we
> > >> should already have that problem with the existing cloud-early-config.
> > >>
> > >> On Fri, Apr 12, 2019 at 12:24 PM Rohit Yadav <rohit.yadav@shapeblue.com>
> > >> wrote:
> > >>
> > >>> Hi Marcus, Simon,
> > >>>
> > >>>
> > >>> I explored two of the short term solutions and I have a working (work in
> > >>> progress) script that replaces the patchviasocket script to use the qemu
> > >>> guest agent (that is installed in the 4.11+ systemvmtemplate). This was part of
> > >>> a scoping exercise for solving the patching problem for qemu 2.12+ (Ubuntu
> > >>> 19.04 has 3.x version).
> > >>>
> > >>>
> > >>> This is what I've so far, however, further testing is needed:
> > >>>
> > >>> https://gist.github.com/rhtyd/ddb42c4c7581c4129ca04fbb829f16cf
> > >>>
> > >>>
> > >>> The logic is completely written in bash as:
> > >>>
> > >>> - Try if we're able to contact the guest agent
> > >>>
> > >>> - Once we're able to connect, confirm that the I/O is not error prone
> > >>>
> > >>> - Then write the payload as file (the ssh public key and cmdline string)
> > >>>
> > >>> - Then fix file permissions
> > >>>
> > >>> - Hope that internally cloud-early-config would detect the cmdline we had
> > >>>   saved and patching would work
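
Editorial sketch (not the gist's code): the steps listed above could be ordered roughly as follows. It is written in Python rather than bash so that it can be exercised without a hypervisor. The `send` callable is hypothetical: it stands for whatever transport delivers one request to the guest agent and returns the decoded "return" value. The `/bin/chmod` path and the default `0600` mode are assumptions as well.

```python
import base64
import time


def patch_via_agent(send, path, payload, mode="0600", attempts=5, delay=1.0):
    """Write payload to path inside the guest through a guest-agent transport.

    `send` is a hypothetical callable: it takes a request dict, delivers it to
    the agent and returns the decoded "return" value. It is assumed to raise
    OSError while the agent is unreachable.
    """
    # Step 1: poll until the agent answers a ping.
    for attempt in range(attempts):
        try:
            send({"execute": "guest-ping"})
            break
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

    # Steps 2 and 3: open, write, close. The byte count reported by the
    # agent is compared with the payload length to catch short writes.
    handle = send({"execute": "guest-file-open",
                   "arguments": {"path": path, "mode": "w+"}})
    try:
        encoded = base64.b64encode(payload).decode("ascii")
        result = send({"execute": "guest-file-write",
                       "arguments": {"handle": handle, "buf-b64": encoded}})
        if result["count"] != len(payload):
            raise OSError("short write: %d of %d bytes"
                          % (result["count"], len(payload)))
    finally:
        send({"execute": "guest-file-close", "arguments": {"handle": handle}})

    # Step 4: fix permissions by running chmod inside the guest
    # (the /bin/chmod location is an assumption about the guest image).
    send({"execute": "guest-exec",
          "arguments": {"path": "/bin/chmod", "arg": [mode, path]}})
```

The last step in the list (cloud-early-config picking up the saved cmdline) happens inside the guest and is outside the scope of this sketch.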
> > >>>
> > >>>
> > >>> While this may work, for the long term a proper fix is needed that should
> > >>> be a standard patching mechanism across all hypervisors.
> > >>>
> > >>>
> > >>> Regards,
> > >>>
> > >>> Rohit Yadav
> > >>>
> > >>> Software Architect, ShapeBlue
> > >>>
> > >>> https://www.shapeblue.com
> > >>>
> > >>> ________________________________
> > >>> From: Marcus <shadowsor@gmail.com>
> > >>> Sent: Friday, April 12, 2019 11:30:46 PM
> > >>> To: dev@cloudstack.apache.org
> > >>> Subject: Re: Latest Qemu KVM EV appears to be broken with ACS
> > >>>
> > >>> Long ago it was a disk. The problem was that these disks had to go
> > >>> somewhere, a place where they could survive migrations, which didn't work
> > >>> well for block based primary storage... at least for the code base at the
> > >>> time. Using virtio socket was seen as a fairly standard way to communicate
> > >>> temporary information to the guest, and didn't require managing the
> > >>> lifecycle of a special disk.
> > >>>
> > >>> I believe the current problem is that the sender needs to remain connected
> > >>> until the receiver has read. Maybe socat does this, but if so we need to
> > >>> ensure that it is available and applied as a new RPM dependency. In my
> > >>> testing, waiting on the sender side didn't 100% fix things, or sometimes
> > >>> took a very long time due to the backoff algorithm on the
> > >>> cloud-early-config receiver. Some tweaks to that made it more robust, but
> > >>> it is still a game of trying to coordinate timing of two services on either
> > >>> end. If it works though, I'm all for it.
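
Editorial sketch: the "sender stays connected until the receiver has read" idea can be illustrated on a plain unix stream socket as below. This is not the patchviasocket code. It assumes the receiving end closes its side once it has consumed the data, which may not match how a qemu chardev socket behaves. The function name `send_and_wait` and the 30 second default are made up for illustration.

```python
import socket


def send_and_wait(sock, payload, timeout=30.0):
    """Send payload, then block until the peer closes its end.

    Returns True if the peer closed within the timeout, False otherwise.
    """
    sock.sendall(payload)
    # Half-close: signals end-of-message but keeps the connection open,
    # so the data is not discarded by an early full close.
    sock.shutdown(socket.SHUT_WR)
    sock.settimeout(timeout)
    try:
        # recv() returns b"" only once the peer has closed its side.
        while sock.recv(4096):
            pass
        return True
    except socket.timeout:
        return False
```

As the message above notes, waiting on the sender side did not fully fix things in testing, so this shows only the general pattern.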
> > >>>
> > >>> Just to throw another idea out there... If we want to fix this without
> > >>> involving storage, I might suggest switching to the qemu-guest-agent that
> > >>> now exists, with a socket and listening client already in the system vm.
> > >>> This would be far more robust, I think, than our scripting reading unix
> > >>> sockets without any sort of protocol or buffer control considerations, and
> > >>> would likely be more robust to changes in qemu as the guest agent is the
> > >>> primary target for the feature.
> > >>>
> > >>> We can directly write our /var/cache/cloud/cmdline from the host like so
> > >>> (I'm using virsh but we could perhaps communicate with the guest agent
> > >>> socket directly or via socat):
> > >>>
> > >>> virsh qemu-agent-command 19 '{"execute":"guest-file-open",
> > >>> "arguments":{"path":"/tmp/testfile","mode":"w+"}}'
> > >>> {"return":1001}
> > >>>
> > >>> virsh qemu-agent-command 19 '{"execute":"guest-file-write",
> > >>> "arguments":{"handle":1001,"buf-b64":"Zm9vIHdhcyBoZXJlCg=="}}'
> > >>> {"return":{"count":13,"eof":false}}
> > >>>
> > >>> virsh qemu-agent-command 19 '{"execute":"guest-file-close",
> > >>> "arguments":{"handle":1001}}'
> > >>> {"return":{}}
> > >>>
> > >>> root@r-54850-VM:~# cat /tmp/testfile
> > >>> foo was here
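
Editorial sketch: the `buf-b64` value in the commands above is just the base64 form of the file contents. The snippet below rebuilds the write request and renders it as a `virsh qemu-agent-command` line. The helper names `write_request` and `virsh_agent_command` are made up for illustration. The domain id 19 and handle 1001 are taken from the example above.

```python
import base64
import json
import shlex


def write_request(handle, payload):
    """Build a guest-file-write request; the payload travels as base64."""
    return {"execute": "guest-file-write",
            "arguments": {"handle": handle,
                          "buf-b64": base64.b64encode(payload).decode("ascii")}}


def virsh_agent_command(domain, request):
    """Render one `virsh qemu-agent-command` shell line for a request dict."""
    return "virsh qemu-agent-command %s %s" % (
        shlex.quote(str(domain)), shlex.quote(json.dumps(request)))


request = write_request(1001, b"foo was here\n")
print(virsh_agent_command(19, request))
```

The encoded string matches the one used above, and the 13 bytes of "foo was here" plus the newline match the `count` the agent reported.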
> > >>>
> > >>> We are also able to detect via libvirt that the qemu guest agent is up and
> > >>> ready. You can see it in the XML when you list a VM.
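
Editorial sketch: one way to read that readiness from the domain XML is to look at the `state` attribute on the guest agent channel target. This assumes the channel uses the conventional name `org.qemu.guest_agent.0` and that the XML comes from a running domain (for example `virsh dumpxml`). The sample XML is a trimmed illustration, not output from a real system VM.

```python
import xml.etree.ElementTree as ET


def guest_agent_connected(domain_xml):
    """Return True if the guest agent channel reports state='connected'."""
    root = ET.fromstring(domain_xml)
    for target in root.findall("./devices/channel/target"):
        if target.get("name") == "org.qemu.guest_agent.0":
            return target.get("state") == "connected"
    # No guest agent channel defined at all.
    return False


SAMPLE = """
<domain type='kvm'>
  <devices>
    <channel type='unix'>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
    </channel>
  </devices>
</domain>
"""
print(guest_agent_connected(SAMPLE))
```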
> > >>>
> > >>> We do need to keep other hypervisors in mind. This is just an option for a
> > >>> fix that doesn't involve a larger redesign.
> > >>>
> > >>> On Fri, Apr 12, 2019 at 10:21 AM Rohit Yadav <rohit.yadav@shapeblue.com>
> > >>> wrote:
> > >>>
> > >>>> Hi Simon,
> > >>>>
> > >>>>
> > >>>> I'm exploring a solution for the same. I've found that the python based
> > >>>> patching script fails to wait for the message to be written on the unix
> > >>>> socket before the socket is closed. I reckon this could be related to
> > >>>> serial port device handling related changes in qemu-ev 2.12, as the same
> > >>>> mechanism used to work in past versions.
> > >>>>
> > >>>>
> > >>>> I'm exploring/testing a solution where I replace the python based patching
> > >>>> script with a bash one. Can you test the following in your environment
> > >>>> (ensure socat is installed), just backup and replace the patchviasocket.py
> > >>>> file with this:
> > >>>>
> > >>>> https://gist.github.com/rhtyd/aab23357fef2d8a530c0e83ec8be10c5
> > >>>>
> > >>>>
> > >>>> The short term solution would be one of the ways to ensure patching works
> > >>>> without much change in the scripts or systemvmtemplate. However, longer
> > >>>> term we need to explore and standardize the patching mechanism across all
> > >>>> hypervisors, for example by using a small payload via a config drive iso.
> > >>>>
> > >>>> Regards,
> > >>>>
> > >>>> Rohit Yadav
> > >>>>
> > >>>> Software Architect, ShapeBlue
> > >>>>
> > >>>> https://www.shapeblue.com
> > >>>>
> > >>>> ________________________________
> > >>>> From: Simon Weller <sweller@ena.com.INVALID>
> > >>>> Sent: Friday, April 12, 2019 8:29:04 PM
> > >>>> To: dev; users
> > >>>> Subject: Latest Qemu KVM EV appears to be broken with ACS
> > >>>>
> > >>>> All,
> > >>>>
> > >>>> After troubleshooting a strange issue with a new lab environment
> > >>>> yesterday, it appears that the patchviasocket functionality we rely on for
> > >>>> key and ip injection into our router/SSVM/CPVM images is broken with
> > >>>> qemu-kvm-ev-2.12.0-18.el7 (January 2019 release). This was tested on CentOS
> > >>>> 7.6.
> > >>>> No data is injected and this was confirmed using socat on /dev/vport0p1.
> > >>>> qemu-kvm-ev-2.10.0-21.el7_5.7.1 works, so hopefully this will save someone
> > >>>> some pain and suffering trying to figure out why the deployment seems
> > >>>> broken.
> > >>>>
> > >>>> We're going to dig in and see if we can figure out the patches responsible
> > >>>> for it breaking.
> > >>>>
> > >>>> -Si
> > >>>>
> > >>>>
> > >>>>
> >
>
>
> --
>
> Andrija Panić
>
