summaryrefslogtreecommitdiff
blob: 81e010ba5362f62450a362bda19ade4d63d991b1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
From 7b5155a79ea946dd513847d4e7ad2b7e6a4ebb73 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 5 Sep 2023 08:45:29 +0200
Subject: [PATCH 15/55] xen/vcpu: ignore VCPU_SSHOTTMR_future
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
When the hypervisor returns -ETIME (timeout in the past) Linux keeps
retrying to setup the timer with a higher timeout instead of
self-injecting a timer interrupt.

On boxes without any hardware assistance for logdirty we have seen HVM
Linux guests < 4.7 with 32vCPUs give up trying to setup the timer when
logdirty is enabled:

CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: xen increased min_delta_ns to 506250 nsec
CE: xen increased min_delta_ns to 759375 nsec
CE: xen increased min_delta_ns to 1000000 nsec
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
CE: Reprogramming failure. Giving up
Freezing user space processes ...
INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
Task dump for CPU 14:
swapper/14      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14
INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
Task dump for CPU 26:
swapper/26      R  running task        0     0      1 0x00000000
Call Trace:
 [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
 [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
 [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
 [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
 [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
 [<ffffffff900000d5>] ? start_cpu+0x5/0x14

Thus leading to CPU stalls and a broken system as a result.

Workaround this bogus usage by ignoring the VCPU_SSHOTTMR_future in
the hypervisor.  Old Linux versions are the only ones known to have
(wrongly) attempted to use the flag, and ignoring it is compatible
with the behavior expected by any guests setting that flag.

Note the usage of the flag has been removed from Linux by commit:

c06b6d70feb3 xen/x86: don't lose event interrupts

Which landed in Linux 4.7.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Henry Wang <Henry.Wang@arm.com> # CHANGELOG
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 19c6cbd90965b1440bd551069373d6fa3f2f365d
master date: 2023-05-03 13:36:05 +0200
---
 CHANGELOG.md              |  6 ++++++
 xen/common/domain.c       | 13 ++++++++++---
 xen/include/public/vcpu.h |  5 ++++-
 3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7f4d0f25e9..bb0eceb69a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,12 @@ Notable changes to Xen will be documented in this file.
 
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 
+## [4.17.3](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.3)
+
+### Changed
+ - Ignore VCPUOP_set_singleshot_timer's VCPU_SSHOTTMR_future flag. The only
+   known user doesn't use it properly, leading to in-guest breakage.
+
 ## [4.17.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.0) - 2022-12-12
 
 ### Changed
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 53f7e734fe..30c2279673 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1691,9 +1691,16 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( copy_from_guest(&set, arg, 1) )
             return -EFAULT;
 
-        if ( (set.flags & VCPU_SSHOTTMR_future) &&
-             (set.timeout_abs_ns < NOW()) )
-            return -ETIME;
+        if ( set.timeout_abs_ns < NOW() )
+        {
+            /*
+             * Simplify the logic if the timeout has already expired and just
+             * inject the event.
+             */
+            stop_timer(&v->singleshot_timer);
+            send_timer_event(v);
+            break;
+        }
 
         migrate_timer(&v->singleshot_timer, smp_processor_id());
         set_timer(&v->singleshot_timer, set.timeout_abs_ns);
diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
index 81a3b3a743..a836b264a9 100644
--- a/xen/include/public/vcpu.h
+++ b/xen/include/public/vcpu.h
@@ -150,7 +150,10 @@ typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
 DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
 
 /* Flags to VCPUOP_set_singleshot_timer. */
- /* Require the timeout to be in the future (return -ETIME if it's passed). */
+ /*
+  * Request the timeout to be in the future (return -ETIME if it's passed)
+  * but can be ignored by the hypervisor.
+  */
 #define _VCPU_SSHOTTMR_future (0)
 #define VCPU_SSHOTTMR_future  (1U << _VCPU_SSHOTTMR_future)
 
-- 
2.42.0