cloudstack-dev mailing list archives

From John Skinner <john.skin...@appcore.com>
Subject Re: CentOS KVM systemvm issue
Date Fri, 12 Sep 2014 18:23:00 GMT
Actually, I believe the kernel is the problem. The hosts are running CentOS 6, and the systemvm
is the stock Debian 7 template. This does not seem to be an issue on Ubuntu KVM hypervisors.

The fact that you are rebuilding systemvms on reboot is exactly why you are not seeing this
issue. New system VMs usually start successfully; the issue shows up when you reboot one or
start a stopped one.

The serial port does load, but I think the behavior is different after the initial boot: if you
access the system VM after rebooting it, there is nothing in /proc/cmdline, and
/var/cache/cloud/cmdline is stale and does not contain the new control network IP address.
I am still able to netcat the serial port between the hypervisor and the systemvm after it
comes up, but CloudStack will eventually force-stop the VM: since the VM never picks up the new
control network IP address, CloudStack assumes it never started.

That is why, once we wrap that while loop in a check for an empty $cmd, it works every time
after that.
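The while-loop fix discussed here (described in more detail further down the thread) can be sketched as follows. This is a minimal sketch, not the actual cloud-early-config source: it assumes the KVM branch of get_boot_params reads "cmdline:"-prefixed lines from /dev/vport0p1 into $cmd. The function name get_boot_params_kvm and the SERIAL_DEV, CMDLINE_CACHE, MAX_TRIES and RETRY_DELAY variables are hypothetical, and the retry limit is an addition to the fix as described (which simply loops until $cmd is non-empty).

```shell
#!/bin/bash
# Sketch of the "retry until $cmd is non-empty" idea from this thread.
# NOT the real cloud-early-config code: the function name and the
# SERIAL_DEV / CMDLINE_CACHE / MAX_TRIES / RETRY_DELAY variables are
# hypothetical. The retry limit is an addition to the fix as described.
SERIAL_DEV=${SERIAL_DEV:-/dev/vport0p1}
CMDLINE_CACHE=${CMDLINE_CACHE:-/var/cache/cloud/cmdline}
MAX_TRIES=${MAX_TRIES:-30}
RETRY_DELAY=${RETRY_DELAY:-1}

get_boot_params_kvm() {
  local tries=0
  cmd=""   # global on purpose: later boot steps would read $cmd
  # Outer loop: repeat the read until a "cmdline:" line has arrived.
  while [ -z "$cmd" ]; do
    # Inner loop: one pass over the serial port. The "|| [ -n ... ]"
    # keeps a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
      case $line in
        cmdline:*)
          cmd=${line#cmdline:}
          printf '%s\n' "$cmd" > "$CMDLINE_CACHE"
          ;;
      esac
    done < "$SERIAL_DEV"
    tries=$((tries + 1))
    if [ -z "$cmd" ]; then
      if [ "$tries" -ge "$MAX_TRIES" ]; then
        echo "no cmdline received on $SERIAL_DEV after $tries tries" >&2
        return 1
      fi
      sleep "$RETRY_DELAY"
    fi
  done
  return 0
}
```

Bounding the retries keeps a host that never writes to the port from hanging boot forever; set MAX_TRIES very high if you want behavior closer to the unbounded loop described in the thread.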

Change that global setting (recreate.systemvm.enabled) from true to false and try rebooting a
few routers. I guarantee you will see this issue.
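One way to confirm the stale-cmdline symptom after a reboot is to compare the control network IP cached in /var/cache/cloud/cmdline with the address CloudStack expects for that VM. A minimal sketch, assuming the cached file holds space-separated key=value boot parameters; the helper names are hypothetical, and the key you pass (whichever parameter carries the control IP in your setup) is your choice, not something taken from cloud-early-config.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of cloud-early-config.

# get_cmdline_param KEY FILE
# Print the value of KEY=value from a space-separated, kernel-style
# command line stored in FILE (first match only; empty if absent).
get_cmdline_param() {
  tr ' ' '\n' < "$2" | sed -n "s/^$1=//p" | head -n 1
}

# check_cached_param KEY EXPECTED [FILE]
# Return 0 if the cached value of KEY equals EXPECTED, 1 otherwise.
# FILE defaults to the cache path mentioned in this thread.
check_cached_param() {
  local key=$1
  local expected=$2
  local file=${3:-/var/cache/cloud/cmdline}
  local actual
  actual=$(get_cmdline_param "$key" "$file")
  if [ "$actual" = "$expected" ]; then
    echo "OK: $key=$actual"
    return 0
  fi
  echo "STALE: $key is '$actual' in $file, expected '$expected'" >&2
  return 1
}
```

For example, run check_cached_param inside the systemvm with the control-IP key and the address the management server reports; a STALE result right after a reboot would match the behavior described above.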

John Skinner
Appcore

On Sep 12, 2014, at 10:48 AM, Marcus <shadowsor@gmail.com> wrote:

> You may also want to investigate whether you are seeing a race condition
> between /dev/vport0p1 coming online and cloud-early-config running. It would
> be indicated by a log line in the systemvm's /var/log/cloud.log:
> 
> log_it "/dev/vport0p1 not loaded, perhaps guest kernel is too old."
> 
> Actually, if it has anything to do with the virtio-serial socket, that would
> probably be logged. Can you open a bug in Jira and provide the logs?
> 
> On Fri, Sep 12, 2014 at 9:36 AM, Marcus <shadowsor@gmail.com> wrote:
> 
>> Can you provide more info? Is the host running CentOS 6.x, or is your
>> systemvm? What is rebooted, the host or the router, and how is it rebooted?
>> We have what sounds like the same config (CentOS 6.x hosts, stock
>> community-provided systemvm), and are running thousands of virtual routers,
>> rebooted regularly with no issue (both hosts and virtual routers). One
>> setting we may have that you may not is that our system vms are rebuilt
>> from scratch on every reboot (recreate.systemvm.enabled=true in global
>> settings). I don't expect this to be the problem, but it might be something
>> to look at.
>> 
>> On Fri, Sep 12, 2014 at 8:49 AM, John Skinner <john.skinner@appcore.com>
>> wrote:
>> 
>>> I have found that on CloudStack 4.2+ (since we switched to using the
>>> virtio-serial socket to send data to the systemvm), cloud-early-config
>>> fails when running on CentOS 6.x. On new systemvm creation there is a high
>>> chance of success, but still a chance of failure. After the systemvm has
>>> been created, a simple reboot will cause the start to fail every time. This
>>> has been confirmed on 2 separate CloudStack 4.2 environments: one running
>>> CentOS 6.3 KVM, and another running CentOS 6.2 KVM. This can be fixed with a
>>> simple modification to the get_boot_params function in the
>>> cloud-early-config script: if you wrap the "while read line" loop inside
>>> another while loop that checks whether $cmd is an empty string, it fixes
>>> the issue.
>>> 
>>> This is a pretty nasty issue for anyone running CloudStack 4.2+ on
>>> CentOS 6.x.
>>> 
>>> John Skinner
>>> Appcore
>> 
>> 
>> 

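Marcus's suggestion about a race between /dev/vport0p1 coming online and cloud-early-config running can be probed by waiting for the device with a timeout instead of checking for it once. A minimal sketch under that assumption; it does not reproduce the script's actual check, and wait_for_serial_port, CLOUD_LOG and the log_it stand-in below are hypothetical (only the log message text comes from the thread).

```shell
#!/bin/bash
# Sketch only: wait_for_serial_port, CLOUD_LOG and this log_it are
# hypothetical stand-ins, not the real cloud-early-config code.
CLOUD_LOG=${CLOUD_LOG:-/var/log/cloud.log}

# Minimal stand-in for the script's log_it helper: append a timestamped line.
log_it() {
  echo "$(date) $*" >> "$CLOUD_LOG"
}

# wait_for_serial_port DEVICE TIMEOUT_SECONDS
# Poll once per second until DEVICE exists; log and return 1 on timeout.
wait_for_serial_port() {
  local dev=$1
  local timeout=$2
  local waited=0
  while [ ! -e "$dev" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      log_it "$dev not loaded, perhaps guest kernel is too old."
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

If the device shows up within a few seconds on a rebooted router but the one-shot check would have missed it, that points at the race rather than at the kernel.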
