From "Adrian Sender" <>
Subject Dell R910 - Windows 2012 R2 Cloudstack Xenserver 6.5 Dom0 Kernel Panic
Date Wed, 16 Mar 2016 00:29:43 GMT
Dear List,

In the lab I am experiencing full Dom0 host crashes on CloudStack 4.5 and
XenServer 6.5 when using the Windows 2012 R2 VHD template from the Microsoft
site.

I cannot reproduce this in testing on a Dell R720 or a Sun X4470. It appears
to be specific to Windows 2012 R2 on Dell R910 hardware running BIOS 2.10.0
(08/29/2013), which is old but is the latest BIOS released for the Dell R910.
Note: the issue also happened with BIOS 2.5.0 on the Dell R910.

If you are running Dell R910 hardware in production, I strongly recommend not
deploying Windows 2012 R2 instances on this hardware.

The panic was the result of a CPU getting stuck (caught by the NMI watchdog).
Dell does not provide a newer BIOS for the R910, but errata for the Xeon
E7-4800 family exist.

(XEN) [100514.040554] Watchdog timer detects that CPU23 is stuck!
(XEN) [100514.040559] ----[ Xen-4.4.1-xs108229 x86_64 debug=n Not tainted ]----
(XEN) [100514.040561] CPU: 23
(XEN) [100514.040562] RIP: e008:[<ffff82d080129984>] _write_lock+0x4/0x50
(XEN) [100514.040570] RFLAGS: 0000000000000286 CONTEXT: hypervisor

(XEN) [100514.040625] Xen call trace:
(XEN) [100514.040627] [<ffff82d080129984>] _write_lock+0x4/0x50
(XEN) [100514.040631] [<ffff82d0801e60f3>] __get_gfn_type_access+0x183/0x200
(XEN) [100514.040635] [<ffff82d0801b866f>] hvm_hap_nested_page_fault+0xaf/0x4d0
(XEN) [100514.040638] [<ffff82d0801dc45f>] vmx_vmexit_handler+0xa8f/0x1910
(XEN) [100514.040640] [<ffff82d08012bced>] add_entry+0x4d/0xc0
(XEN) [100514.040642] [<ffff82d08011f9f0>] csched_tick+0x230/0x3d0
(XEN) [100514.040644] [<ffff82d080128dfe>] schedule+0x75e/0x7d0
(XEN) [100514.040646] [<ffff82d0801cbc42>] pt_update_irq+0x212/0x230
(XEN) [100514.040648] [<ffff82d08011f7c0>] csched_tick+0/0x3d0
(XEN) [100514.040650] [<ffff82d0801c87ef>] vlapic_has_pending_irq+0x5f/0xa0
(XEN) [100514.040652] [<ffff82d0801b4e58>] hvm_io_pending+0x28/0x50
(XEN) [100514.040653] [<ffff82d0801da1ff>] vmx_vmenter_helper+0xcf/0x1b0
(XEN) [100514.040655] [<ffff82d0801e3901>] vmx_asm_vmexit_handler+0x41/0xc0
(XEN) [100514.040655]
(XEN) [100514.040658]
(XEN) [100514.040659] ****************************************
(XEN) [100514.040659] Panic on CPU 23:
(XEN) [100514.040660] FATAL TRAP: vector = 2 (nmi)
(XEN) [100514.040661] [error_code=0000]
(XEN) [100514.040661] ****************************************
(XEN) [100514.040662]
(XEN) [100514.040663] Reboot in five seconds...
(XEN) [100514.040665] Executing kexec image on cpu23
(XEN) [100514.042497] Shot down all CPUs
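
For anyone comparing their own crash logs against this one, here is a minimal
sketch that pulls the stuck CPU number and the call-trace symbols out of a Xen
console log. It assumes the line layout shown above ("(XEN) [timestamp] ...");
the names summarize, STUCK_RE, FRAME_RE and SAMPLE are my own for illustration
and are not part of any Xen or XenServer tooling.

```python
import re

# "Watchdog timer detects that CPU23 is stuck!" -> captures the CPU number.
STUCK_RE = re.compile(r"Watchdog timer detects that CPU(\d+) is stuck")

# A call-trace frame: "(XEN) [ts] [<address>] symbol+offset/size".
# The offset may be printed as plain "0" (e.g. csched_tick+0/0x3d0).
FRAME_RE = re.compile(
    r"\(XEN\) \[\s*[\d.]+\]\s+\[<([0-9a-f]+)>\]\s+(\S+?)\+(0x[0-9a-f]+|\d+)/0x[0-9a-f]+"
)


def summarize(log_text):
    """Return (stuck_cpu, frames); frames is a list of (symbol, offset)."""
    stuck_cpu = None
    frames = []
    for line in log_text.splitlines():
        match = STUCK_RE.search(line)
        if match:
            stuck_cpu = int(match.group(1))
            continue
        match = FRAME_RE.search(line)
        if match:
            # int(..., 16) accepts both "0x183" and a bare "0".
            frames.append((match.group(2), int(match.group(3), 16)))
    return stuck_cpu, frames


# A few lines copied from the log above; the RIP line is deliberately
# included to show that it is not counted as a call-trace frame.
SAMPLE = """\
(XEN) [100514.040554] Watchdog timer detects that CPU23 is stuck!
(XEN) [100514.040562] RIP: e008:[<ffff82d080129984>] _write_lock+0x4/0x50
(XEN) [100514.040627] [<ffff82d080129984>] _write_lock+0x4/0x50
(XEN) [100514.040631] [<ffff82d0801e60f3>] __get_gfn_type_access+0x183/0x200
(XEN) [100514.040648] [<ffff82d08011f7c0>] csched_tick+0/0x3d0
"""

if __name__ == "__main__":
    cpu, frames = summarize(SAMPLE)
    print(f"stuck CPU: {cpu}")
    for symbol, offset in frames:
        print(f"  {symbol}+{offset:#x}")
```

If the top frames on your host are also _write_lock under
__get_gfn_type_access and hvm_hap_nested_page_fault, you may be looking at the
same problem, though matching symbols alone do not prove it.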

The DomU (domain ID 45 at the time of the crash, running Windows 2012 R2)
caused a HAP (hardware-assisted paging) error.

This looks very similar to erratum BP57.

Call Trace:
[ffff82d080113040] elf_core_save_regs+0/0xb0
ffff82d0801140ec kexec_crash+0x4c/0x50
ffff82d08013f080 panic+0xf0/0x120
ffff82d0801885f2 show_stack+0x162/0x1b0
ffff82d080188c28 fatal_trap+0x78/0xb0
ffff82d08017ef09 nmi_watchdog_tick+0x1d9/0x200
ffff82d0801892db do_nmi+0xeb/0x1a0
ffff82d080220874 handle_ist_exception+0x8a/0xf6

NMI interrupted Code at e008:ffff82d080129984 and Stack at 0000:ffff831033c97c50

[ffff82d080129984] _write_lock+0x4/0x50
ffff82d0801e60f3 __get_gfn_type_access+0x183/0x200
ffff82d0801b866f hvm_hap_nested_page_fault+0xaf/0x4d0
ffff82d0801dc45f vmx_vmexit_handler+0xa8f/0x1910
ffff82d08012bced add_entry+0x4d/0xc0
ffff82d08011f9f0 csched_tick+0x230/0x3d0
ffff82d080128dfe schedule+0x75e/0x7d0
ffff82d0801cbc42 pt_update_irq+0x212/0x230
ffff82d08011f7c0 csched_tick+0/0x3d0
ffff82d0801c87ef vlapic_has_pending_irq+0x5f/0xa0
ffff82d0801b4e58 hvm_io_pending+0x28/0x50
ffff82d0801da1ff vmx_vmenter_helper+0xcf/0x1b0
ffff82d0801e3901 vmx_asm_vmexit_handler+0x41/0xc0

