List moderator: feel free to delete my previous large message with attachments that is still in the moderation queue; it's now obsolete anyway.
I have found a fix/workaround for my reboot issues with Xen 4.6.3-12 + Kernel 4.9.13:
Once I finally got serial output all the way through the boot process (Xen + dom0), I found this stack trace:
[Firmware Bug]: CPU7: APIC id mismatch. Firmware: 0 APIC: 7
installing Xen timer for CPU 8
[Firmware Bug]: CPU8: APIC id mismatch. Firmware: 0 APIC: 20
smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/cpu/common.c:997!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.13-22.el7.x86_64 #1
Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
random: fast init done
task: ffff880058a8c4c0 task.stack: ffffc900400b4000
RIP: e030:[<ffffffff8103e527>] [<ffffffff8103e527>] identify_secondary_cpu+0x57/0x80
RSP: e02b:ffffc900400b7f08 EFLAGS: 00010086
RAX: 00000000ffffffe4 RBX: ffff88005d80a020 RCX: ffffffff81c5be68
RDX: 0000000000000001 RSI: 0000000000000005 RDI: 0000000000000005
RBP: ffffc900400b7f18 R08: 00000000000000cb R09: 0000000000000004
R10: 0000000000000000 R11: 0000000000000006 R12: 0000000000000008
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88005d800000(0000) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000001c07000 CR4: 0000000000042660
Stack:
 0000000000000008 0000000000000000 ffffc900400b7f28 ffffffff8104e94e
 ffffc900400b7f40 ffffffff81029925 0000000000000000 ffffc900400b7f50
 ffffffff810299a0 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffff8104e94e>] smp_store_cpu_info+0x3e/0x40
 [<ffffffff81029925>] cpu_bringup+0x35/0x90
 [<ffffffff810299a0>] cpu_bringup_and_idle+0x20/0x40
Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 bb da 00 00 00 44 89 e6 e8 24 03 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 98 87 a6 81
RIP [<ffffffff8103e527>] identify_secondary_cpu+0x57/0x80
 RSP <ffffc900400b7f08>
---[ end trace dc5563100443876e ]---
I surmised that reducing the number of dom0 vCPUs (which were unbounded) might avoid this crash.
In testing, adding "dom0_max_vcpus=4 dom0_vcpus_pin" to the GRUB_CMDLINE_XEN_DEFAULT line in /etc/default/grub and re-running grub2-mkconfig has made a system that never booted Xen 4.6.3-12 + Kernel 4.9.13 boot successfully every single time, across 5-10 test boots.
So... I don't know whether there's a race condition somewhere or something else entirely, but so far this workaround has not failed me.
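For reference, the change described above looks roughly like this. This is a sketch assuming a BIOS-booted EL7 host where the generated config lives at /boot/grub2/grub.cfg (UEFI installs keep it elsewhere, under /boot/efi/EFI/); any options already on the GRUB_CMDLINE_XEN_DEFAULT line should be kept.

```shell
# /etc/default/grub (excerpt): add the two options to whatever is already
# on this line; existing options are not shown here.
GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=4 dom0_vcpus_pin"

# Then, as root, regenerate the GRUB configuration.
# Path assumes a BIOS install; adjust for UEFI.
grub2-mkconfig -o /boot/grub2/grub.cfg
```

dom0_max_vcpus caps how many vCPUs dom0 receives, and dom0_vcpus_pin pins each dom0 vCPU to the physical CPU with the same number. After a reboot, `xl vcpu-list Domain-0` should show the reduced vCPU count and the pinning.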
Thanks, -Dave
On Fri, Apr 7, 2017 at 6:58 AM, PJ Welsh <pjwelsh at gmail.com> wrote:
I've not gotten any bites from my posting on the xen-devel mailing list. Here is the only reply to date: https://lists.xen.org/archives/html/xen-devel/2017-04/msg01069.html
According to that reply, some hypervisor messages are needed.
Does anyone know how to capture the hypervisor messages? I've already removed the rhgb and quiet options from the kernel command line.
Thanks, PJ
I spoke too soon. To get more information, please see
https://wiki.xenproject.org/wiki/Reporting_Bugs_against_Xen_Project
and
https://wiki.xenproject.org/wiki/Xen_Serial_Console
or, alternatively, at least add "vga=keep" to the Xen command line.
pjwelsh
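As a rough illustration of the serial-console setup those pages describe, the hypervisor and dom0 options might look like the excerpt below. This is a sketch assuming COM1 at 115200 8n1 is the port wired to a terminal or IPMI serial-over-LAN; the port and speed are assumptions and must match the actual hardware. Existing options on both lines should be kept.

```shell
# /etc/default/grub (excerpt). Assumes COM1 at 115200 8n1; adjust to match
# the serial port actually connected to a terminal or IPMI serial-over-LAN.

# Hypervisor: configure COM1, mirror Xen's console to serial and VGA,
# and raise hypervisor and guest log levels so all messages are printed.
GRUB_CMDLINE_XEN_DEFAULT="com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all"

# dom0 kernel: print through Xen's console device (hvc0), including
# early boot output. Keep any options already present on this line.
GRUB_CMDLINE_LINUX="console=hvc0 earlyprintk=xen"

# Then, as root, regenerate the config (BIOS path assumed; adjust for UEFI).
grub2-mkconfig -o /boot/grub2/grub.cfg
```

If dom0 does come up, `xl dmesg` also prints the hypervisor's message buffer without needing a serial cable.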