cloudstack-users mailing list archives

From: Ian Young <iyo...@ratespecial.com>
Subject: Re: console proxy times out
Date: Thu, 22 May 2014 17:49:32 GMT
And this is in /var/log/messages right before that event:

May 22 10:16:07 virthost1 kernel: INFO: task qemu-kvm:2971 blocked for more
than 120 seconds.
May 22 10:16:07 virthost1 kernel:      Not tainted
2.6.32-431.11.2.el6.x86_64 #1
May 22 10:16:07 virthost1 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 22 10:16:07 virthost1 kernel: qemu-kvm      D 0000000000000002     0
 2971      1 0x00000080
May 22 10:16:07 virthost1 kernel: ffff8810724e9be8 0000000000000082
0000000000000000 ffff88106b6529d8
May 22 10:16:07 virthost1 kernel: ffff880871b3e8d8 ffff880871b3e8f0
ffffffff8100bb8e ffff8810724e9be8
May 22 10:16:07 virthost1 kernel: ffff881073525058 ffff8810724e9fd8
000000000000fbc8 ffff881073525058
May 22 10:16:07 virthost1 kernel: Call Trace:
May 22 10:16:07 virthost1 kernel: [<ffffffff8100bb8e>] ?
apic_timer_interrupt+0xe/0x20
May 22 10:16:07 virthost1 kernel: [<ffffffff810555ef>] ?
mutex_spin_on_owner+0x9f/0xc0
May 22 10:16:07 virthost1 kernel: [<ffffffff8152969e>]
__mutex_lock_slowpath+0x13e/0x180
May 22 10:16:07 virthost1 kernel: [<ffffffff8152953b>] mutex_lock+0x2b/0x50
May 22 10:16:07 virthost1 kernel: [<ffffffffa021c2cf>]
memory_access_ok+0x7f/0xc0 [vhost_net]
May 22 10:16:07 virthost1 kernel: [<ffffffffa021d89c>]
vhost_dev_ioctl+0x2ec/0xa50 [vhost_net]
May 22 10:16:07 virthost1 kernel: [<ffffffffa021c411>] ?
vhost_work_flush+0xe1/0x120 [vhost_net]
May 22 10:16:07 virthost1 kernel: [<ffffffff8122db91>] ?
avc_has_perm+0x71/0x90
May 22 10:16:07 virthost1 kernel: [<ffffffffa021f11a>]
vhost_net_ioctl+0x7a/0x5d0 [vhost_net]
May 22 10:16:07 virthost1 kernel: [<ffffffff8122f914>] ?
inode_has_perm+0x54/0xa0
May 22 10:16:07 virthost1 kernel: [<ffffffffa01a28b7>] ?
kvm_vcpu_ioctl+0x1e7/0x580 [kvm]
May 22 10:16:07 virthost1 kernel: [<ffffffff8108b14e>] ?
send_signal+0x3e/0x90
May 22 10:16:07 virthost1 kernel: [<ffffffff8119dc12>] vfs_ioctl+0x22/0xa0
May 22 10:16:07 virthost1 kernel: [<ffffffff8119ddb4>]
do_vfs_ioctl+0x84/0x580
May 22 10:16:07 virthost1 kernel: [<ffffffff8119e331>] sys_ioctl+0x81/0xa0
May 22 10:16:07 virthost1 kernel: [<ffffffff810e1e4e>] ?
__audit_syscall_exit+0x25e/0x290
May 22 10:16:07 virthost1 kernel: [<ffffffff8100b072>]
system_call_fastpath+0x16/0x1b
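
That hung task warning is just the kernel's blocked-task watchdog firing on the qemu-kvm process.  For what it's worth, the threshold can be inspected or silenced with the knob named in the message itself (silencing it obviously doesn't fix the underlying stall):

cat /proc/sys/kernel/hung_task_timeout_secs    # current threshold in seconds (120 here)
sysctl -w kernel.hung_task_timeout_secs=0      # 0 disables the warning entirely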


On Thu, May 22, 2014 at 10:39 AM, Ian Young <iyoung@ratespecial.com> wrote:

> The console proxy became unavailable again yesterday afternoon.  I could
> SSH into it via its link-local address and nothing seemed to be wrong
> inside the VM itself.  However, the qemu-kvm process for that VM was at
> almost 100% CPU.  Inside the VM, the CPU usage was minimal and the java
> process was running and listening on port 443.  So there seems to be
> something wrong with it down at the KVM/QEMU level.  It's weird how this
> keeps happening to the console proxy only and not any of the other VMs.  I
> tried to reboot it from the management UI and after about 15 minutes, it
> finally did.  Now the console proxy is working but I don't know how long it
> will last before it breaks again.  I found this in libvirtd.log, which
> corresponds with the time the console proxy rebooted:
>
> 2014-05-22 17:17:04.362+0000: 25195: info : libvirt version: 0.10.2,
> package: 29.el6_5.7 (CentOS BuildSystem <http://bugs.centos.org>,
> 2014-04-07-07:42:04, c6b9.bsys.dev.centos.org)
> 2014-05-22 17:17:04.362+0000: 25195: error : qemuMonitorIO:614 : internal
> error End of file from monitor
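>
> (For reference, the checks above were roughly the following; port 3922 and the
> id_rsa.cloud key are the stock CloudStack settings for reaching system VMs from
> a KVM host, and the link-local address and VM name are placeholders:)
>
> top -b -n1 -p $(pgrep -f 'qemu.*v-2-VM')                   # host-side CPU of the proxy's qemu-kvm process
> ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@169.254.x.x    # into the proxy from the hypervisor
> netstat -lntp | grep ':443'                                # inside the proxy: java listening on 443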
>
>
> On Wed, May 21, 2014 at 2:07 PM, Ian Young <iyoung@ratespecial.com> wrote:
>
>> I built and installed a libvirt 1.0.4 package from the Fedora source RPM.  It
>> installed fine inside a test VM but installing it on the real hypervisor
>> was a bad idea and I doubt I'll be pursuing it further.  All VMs promptly
>> stopped and this appeared in libvirtd.log:
>>
>> 2014-05-21 20:36:19.260+0000: 23567: info : libvirt version: 1.0.4,
>> package: 1.el6 (Unknown, 2014-05-21-11:36:09, redacted.com)
>> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 :
>> Module /usr/lib64/libvirt/connection-driver/libvirt_driver_network.so not
>> accessible
>> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 :
>> Module /usr/lib64/libvirt/connection-driver/libvirt_driver_storage.so not
>> accessible
>> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 :
>> Module /usr/lib64/libvirt/connection-driver/libvirt_driver_nodedev.so not
>> accessible
>> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 :
>> Module /usr/lib64/libvirt/connection-driver/libvirt_driver_secret.so not
>> accessible
>> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 :
>> Module /usr/lib64/libvirt/connection-driver/libvirt_driver_nwfilter.so not
>> accessible
>> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 :
>> Module /usr/lib64/libvirt/connection-driver/libvirt_driver_interface.so not
>> accessible
>> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 :
>> Module /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so not
>> accessible
>> 2014-05-21 20:36:19.260+0000: 23567: warning : virDriverLoadModule:72 :
>> Module /usr/lib64/libvirt/connection-driver/libvirt_driver_lxc.so not
>> accessible
>> 2014-05-21 20:36:49.471+0000: 23570: error : do_open:1220 : no connection
>> driver available for qemu:///system
>> 2014-05-21 20:36:49.472+0000: 23567: error : virNetSocketReadWire:1370 :
>> End of file while reading data: Input/output error
>> 2014-05-21 20:36:49.473+0000: 23571: error : do_open:1220 : no connection
>> driver available for lxc:///
>> 2014-05-21 20:36:49.474+0000: 23567: error : virNetSocketReadWire:1370 :
>> End of file while reading data: Input/output error
>> 2014-05-21 20:36:49.475+0000: 23568: error : do_open:1220 : no connection
>> driver available for qemu:///system
>> 2014-05-21 20:36:49.476+0000: 23567: error : virNetSocketReadWire:1370 :
>> End of file while reading data: Input/output error
>> 2014-05-21 20:36:49.678+0000: 23575: error : do_open:1220 : no connection
>> driver available for qemu:///system
>> 2014-05-21 20:36:49.678+0000: 23567: error : virNetSocketReadWire:1370 :
>> End of file while reading data: Input/output error
>> 2014-05-21 20:36:49.681+0000: 23572: error : do_open:1220 : no connection
>> driver available for qemu:///system
>> 2014-05-21 20:36:49.682+0000: 23567: error : virNetSocketReadWire:1370 :
>> End of file while reading data: Input/output error
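>>
>> (In case it helps anyone hitting the same thing: the "not accessible" warnings
>> are easy to sanity-check directly, nothing clever:)
>>
>> ls -l /usr/lib64/libvirt/connection-driver/    # are the driver .so files actually there and readable?
>> virsh -c qemu:///system version                # does a qemu connection work once the drivers load?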
>>
>>
>> On Wed, May 21, 2014 at 10:45 AM, Ian Young <iyoung@ratespecial.com> wrote:
>>
>>> I was able to get it working by following these steps:
>>>
>>> 1. stop all instances
>>> 2. service cloudstack-management stop
>>> 3. service cloudstack-agent stop
>>> 4. virsh shutdown {domain} (for each of the system VMs; see the one-liner after this list)
>>> 5. service libvirtd stop
>>> 6. umount primary and secondary
>>> 7. reboot
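>>>
>>> (For step 4, a loop over whatever virsh still reports as running — only the
>>> system VMs at that point — does the job:)
>>>
>>> for dom in $(virsh list --name); do virsh shutdown "$dom"; done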
>>>
>>> The console proxy is working again.  I expect it will probably break
>>> again in a day or two.  I have a feeling it's a result of this libvirtd
>>> bug, since I've seen the "cannot acquire state change lock" error several times.
>>>
>>> https://bugs.launchpad.net/nova/+bug/1254872
>>>
>>> I might try building my own libvirtd 1.0.3 for EL6.
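>>>
>>> (A quick way to see how often that lock error is actually hitting, assuming
>>> the stock CentOS 6 log location:)
>>>
>>> grep -c 'cannot acquire state change lock' /var/log/libvirt/libvirtd.log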
>>>
>>>
>>> On Tue, May 20, 2014 at 6:21 PM, Ian Young <iyoung@ratespecial.com> wrote:
>>>
>>>> So I got the console proxy working via HTTPS (by managing my own "
>>>> realhostip.com" DNS) last week and everything was working fine.
>>>>  Today, all of a sudden, the console proxy stopped working again.  The
>>>> browser says, "Connecting to 192-168-100-159.realhostip.com..." and
>>>> eventually times out.  I tried to restart it, but it went into a "Stopping"
>>>> state that never completed, and the Agent State was "Disconnected."  I could
>>>> not shut down the VM using virsh or with "kill -9" because libvirtd kept
>>>> saying, "cannot acquire state change lock," so I gracefully shut down the
>>>> remaining instances and rebooted the entire management server/hypervisor.
>>>>  Start over.
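>>>>
>>>> (The DNS side is easy to rule out, since the name just encodes the dashed IP;
>>>> against my own zone I'd expect:)
>>>>
>>>> dig +short 192-168-100-159.realhostip.com    # should return 192.168.100.159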
>>>>
>>>> When it came back up, the SSVM and console proxy started but the
>>>> virtual router was stopped.  I was able to manually start it from the UI.
>>>>  The console proxy still times out when I try to access it from a browser.
>>>>  I don't see any errors in the management or agent logs, just this:
>>>>
>>>> 2014-05-20 18:04:27,632 DEBUG [c.c.a.t.Request] (catalina-exec-10:null)
>>>> Seq 1-2130378876: Sending  { Cmd , MgmtId: 55157049428734, via: 1(
>>>> virthost1.redacted.com), Ver: v1, Flags: 100011,
>>>> [{"com.cloud.agent.api.GetVncPortCommand":{"id":4,"name":"r-4-VM","wait":0}}]
>>>> }
>>>> 2014-05-20 18:04:27,684 DEBUG [c.c.a.t.Request]
>>>> (AgentManager-Handler-3:null) Seq 1-2130378876: Processing:  { Ans: ,
>>>> MgmtId: 55157049428734, via: 1, Ver: v1, Flags: 10,
>>>> [{"com.cloud.agent.api.GetVncPortAnswer":{"address":"192.168.100.6","port":5902,"result":true,"wait":0}}]
>>>> }
>>>> 2014-05-20 18:04:27,684 DEBUG [c.c.a.t.Request] (catalina-exec-10:null)
>>>> Seq 1-2130378876: Received:  { Ans: , MgmtId: 55157049428734, via: 1, Ver:
>>>> v1, Flags: 10, { GetVncPortAnswer } }
>>>> 2014-05-20 18:04:27,684 DEBUG [c.c.s.ConsoleProxyServlet]
>>>> (catalina-exec-10:null) Port info 192.168.100.6
>>>> 2014-05-20 18:04:27,684 INFO  [c.c.s.ConsoleProxyServlet]
>>>> (catalina-exec-10:null) Parse host info returned from executing
>>>> GetVNCPortCommand. host info: 192.168.100.6
>>>> 2014-05-20 18:04:27,686 DEBUG [c.c.s.ConsoleProxyServlet]
>>>> (catalina-exec-10:null) Compose console url:
>>>> https://192-168-100-159.realhostip.com/ajax?token=CsPhU4m_R2ZoLIdXOtjo3y3humnQN20wt5fSPjbZOHtRh7nli7tiq0ZiWUuwCVIn7VwF6503ziEqCRejlRsVcsyQcUfemTRXlhAOpJUyRugyCuTjmbUIX3EY1cHnFMKwF8FXXZr_PgwyXGPEoOHhkdRgsyRiczbk_Unuh4KmRngATr0FPCLtqhwIMpnbLSYwpnFDz65k9lEJmK6IlXYKVpWXg2rpVEsQvaNlulrZdhMQ7qUbacn82EG43OY8nmwm1SYB8TrUFH5Btb1RHpJm9A
>>>> 2014-05-20 18:04:27,686 DEBUG [c.c.s.ConsoleProxyServlet]
>>>> (catalina-exec-10:null) the console url is ::
>>>> <html><title>r-4-VM</title><frameset><frame src="
>>>> https://192-168-100-159.realhostip.com/ajax?token=CsPhU4m_R2ZoLIdXOtjo3y3humnQN20wt5fSPjbZOHtRh7nli7tiq0ZiWUuwCVIn7VwF6503ziEqCRejlRsVcsyQcUfemTRXlhAOpJUyRugyCuTjmbUIX3EY1cHnFMKwF8FXXZr_PgwyXGPEoOHhkdRgsyRiczbk_Unuh4KmRngATr0FPCLtqhwIMpnbLSYwpnFDz65k9lEJmK6IlXYKVpWXg2rpVEsQvaNlulrZdhMQ7qUbacn82EG43OY8nmwm1SYB8TrUFH5Btb1RHpJm9A
>>>> "></frame></frameset></html>
>>>> 2014-05-20 18:04:29,216 DEBUG [c.c.a.m.AgentManagerImpl]
>>>> (AgentManager-Handler-4:null) SeqA 2-545: Processing Seq 2-545:  { Cmd ,
>>>> MgmtId: -1, via: 2, Ver: v1, Flags: 11,
>>>> [{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":2,"_loadInfo":"{\n
>>>>  \"connections\": []\n}","wait":0}}] }
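>>>>
>>>> (The GetVncPortAnswer above at least looks sane: the proxy is being told to
>>>> reach the router's console at 192.168.100.6:5902.  From inside the proxy VM
>>>> that's easy to probe, assuming nc or telnet is present there:)
>>>>
>>>> nc -zv 192.168.100.6 5902    # or: telnet 192.168.100.6 5902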
>>>>
>>>> If I try to restart the system VMs with cloudstack-sysvmadm, it says:
>>>>
>>>> Stopping and starting 1 secondary storage vm(s)...
>>>> curl: (7) couldn't connect to host
>>>> ERROR: Failed to stop secondary storage vm with id 1
>>>>
>>>> Done stopping and starting secondary storage vm(s)
>>>>
>>>> Stopping and starting 1 console proxy vm(s)...
>>>> curl: (7) couldn't connect to host
>>>> ERROR: Failed to stop console proxy vm with id 2
>>>>
>>>> Done stopping and starting console proxy vm(s) .
>>>>
>>>> Stopping and starting 1 running routing vm(s)...
>>>> curl: (7) couldn't connect to host
>>>> 2
>>>> Done restarting router(s).
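>>>>
>>>> (For reference, the usual invocation from the upgrade notes, with the db host
>>>> and password elided, is:)
>>>>
>>>> nohup cloudstack-sysvmadm -d <db host> -u cloud -p <db password> -a > sysvm.log 2>&1 &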
>>>>
>>>> I notice there are now four entries for the same management server in
>>>> the mshost table, and they are all in an "Up" state and the "removed" field
>>>> is NULL.  What's wrong with this system?
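>>>>
>>>> (For reference, that observation comes from something like:)
>>>>
>>>> mysql> SELECT id, msid, state, removed FROM cloud.mshost;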
>>>>
>>>
>>>
>>
>
