OpenWrt – Blame information for rev 1
Rev | Author | Line No. | Line |
---|---|---|---|
1 | office | 1 | From 7cd6dca3600d8d71328950216688ecd00015d1ce Mon Sep 17 00:00:00 2001 |
2 | From: Samuel Holland <samuel@sholland.org> |
||
3 | Date: Sat, 12 Jan 2019 20:17:18 -0600 |
||
4 | Subject: [PATCH] clocksource/drivers/arch_timer: Workaround for Allwinner A64 |
||
5 | timer instability |
||
6 | MIME-Version: 1.0 |
||
7 | Content-Type: text/plain; charset=UTF-8 |
||
8 | Content-Transfer-Encoding: 8bit |
||
9 | |||
10 | The Allwinner A64 SoC is known[1] to have an unstable architectural |
||
11 | timer, which manifests itself most obviously in the time jumping forward |
||
12 | a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a |
||
13 | timer frequency of 24 MHz, implying that the time went slightly backward |
||
14 | (and this was interpreted by the kernel as it jumping forward and |
||
15 | wrapping around past the epoch). |
||
16 | |||
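The 95-year figure above can be checked with a little arithmetic. The following is a minimal sketch, not part of the patch; the helper names and the 365.25-day year are assumptions made for illustration.

```c
#include <stdint.h>

#define A64_TIMER_HZ 24000000ULL /* 24 MHz, as stated in the text above */

/* Seconds spanned by 2^bits counter cycles at 24 MHz (hypothetical helper). */
static double pow2_cycles_to_seconds(unsigned int bits)
{
	return (double)(1ULL << bits) / (double)A64_TIMER_HZ;
}

/* Same span in years, assuming a 365.25-day year. */
static double pow2_cycles_to_years(unsigned int bits)
{
	return pow2_cycles_to_seconds(bits) / (365.25 * 24.0 * 3600.0);
}
```

With these assumptions, 2^56 cycles comes out at a little over 95 years, which is consistent with the reported skips.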
17 | Investigation revealed instability in the low bits of CNTVCT at the |
||
18 | point a high bit rolls over. This leads to power-of-two cycle forward |
||
19 | and backward jumps. (Testing shows that forward jumps are about twice as |
||
20 | likely as backward jumps.) Since the counter value returns to normal |
||
21 | after an indeterminate read, each "jump" really consists of both a |
||
22 | forward and backward jump from the software perspective. |
||
23 | |||
24 | Unless the kernel is trapping CNTVCT reads, a userspace program is able |
||
25 | to read the register in a loop faster than it changes. A test program |
||
26 | running on all 4 CPU cores that reported jumps larger than 100 ms was |
||
27 | run for 13.6 hours and reported the following: |
||
28 | |||
29 | Count | Event |
||
30 | -------+--------------------------- |
||
31 | 9940 | jumped backward 699ms |
||
32 | 268 | jumped backward 1398ms |
||
33 | 1 | jumped backward 2097ms |
||
34 | 16020 | jumped forward 175ms |
||
35 | 6443 | jumped forward 699ms |
||
36 | 2976 | jumped forward 1398ms |
||
37 | 9 | jumped forward 356516ms |
||
38 | 9 | jumped forward 357215ms |
||
39 | 4 | jumped forward 714430ms |
||
40 | 1 | jumped forward 3578440ms |
||
41 | |||
42 | This works out to a jump larger than 100 ms about every 5.5 seconds on |
||
43 | each CPU core. |
||
44 | |||
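The 5.5 second figure follows from the event counts in the table above. A small sketch of that calculation, with hypothetical names and the counts copied from the CNTVCT table:

```c
#include <stddef.h>
#include <stdint.h>

/* Event counts copied from the CNTVCT table above. */
static const uint32_t cntvct_jump_counts[] = {
	9940, 268, 1, 16020, 6443, 2976, 9, 9, 4, 1,
};

/* Sum all observed jump events (hypothetical helper). */
static uint32_t total_jumps(const uint32_t *counts, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += counts[i];
	return sum;
}

/* Average seconds between jumps on one core, given the run time and core count. */
static double seconds_between_jumps_per_core(uint32_t total, double hours,
					     unsigned int cores)
{
	return hours * 3600.0 * (double)cores / (double)total;
}
```

For the 13.6 hour run on 4 cores this gives roughly 5.5 seconds between large jumps per core.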
45 | The largest jump (almost an hour!) was the following sequence of reads: |
||
46 | 0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000 |
||
47 | |||
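To relate the raw register values in this sequence to the table entries, the difference between two reads can be converted to milliseconds. This is an illustrative sketch with a hypothetical helper name; it truncates rather than rounds.

```c
#include <stdint.h>

#define A64_TIMER_HZ 24000000ULL /* 24 MHz, as stated in the text */

/* Convert a counter delta in cycles to whole milliseconds (truncating). */
static uint64_t cycles_to_ms(uint64_t cycles)
{
	return cycles / (A64_TIMER_HZ / 1000);
}
```

Applied to the first two reads above (0x00000093feffffff minus 0x0000007fffffffff), this yields 3578440 ms, matching the last row of the table. A delta of 2^24 cycles likewise maps to the 699 ms entries.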
48 | Note that the middle bits don't necessarily all read as all zeroes or |
||
49 | all ones during the anomalous behavior; however the low 10 bits checked |
||
50 | by the function in this patch have never been observed with any other |
||
51 | value. |
||
52 | |||
53 | Also note that smaller jumps are much more common, with backward jumps |
||
54 | of 2048 (2^11) cycles observed over 400 times per second on each core. |
||
55 | (Of course, this is partially explained by lower bits rolling over more |
||
56 | frequently.) Any one of these could have caused the 95 year time skip. |
||
57 | |||
58 | Similar anomalies were observed while reading CNTPCT (after patching the |
||
59 | kernel to allow reads from userspace). However, the CNTPCT jumps are |
||
60 | much less frequent, and only small jumps were observed. The same program |
||
61 | as before (except now reading CNTPCT) observed after 72 hours: |
||
62 | |||
63 | Count | Event |
||
64 | -------+--------------------------- |
||
65 | 17 | jumped backward 699ms |
||
66 | 52 | jumped forward 175ms |
||
67 | 2831 | jumped forward 699ms |
||
68 | 5 | jumped forward 1398ms |
||
69 | |||
70 | Further investigation showed that the instability in CNTPCT/CNTVCT also |
||
71 | affected the respective timer's TVAL register. The following values were |
||
72 | observed immediately after writing CNTV_TVAL to 0x10000000: |
||
73 | |||
74 | CNTVCT | CNTV_TVAL | CNTV_CVAL | CNTV_TVAL Error |
||
75 | --------------------+------------+--------------------+----------------- |
||
76 | 0x000000d4a2d8bfff | 0x10003fff | 0x000000d4b2d8bfff | +0x00004000 |
||
77 | 0x000000d4a2d94000 | 0x0fffffff | 0x000000d4b2d97fff | -0x00004000 |
||
78 | 0x000000d4a2d97fff | 0x10003fff | 0x000000d4b2d97fff | +0x00004000 |
||
79 | 0x000000d4a2d9c000 | 0x0fffffff | 0x000000d4b2d9ffff | -0x00004000 |
||
80 | |||
81 | The pattern of errors in CNTV_TVAL seemed to depend on exactly which |
||
82 | value was written to it. For example, after writing 0x10101010: |
||
83 | |||
84 | CNTVCT | CNTV_TVAL | CNTV_CVAL | CNTV_TVAL Error |
||
85 | --------------------+------------+--------------------+----------------- |
||
86 | 0x000001ac3effffff | 0x1110100f | 0x000001ac4f10100f | +0x1000000 |
||
87 | 0x000001ac40000000 | 0x1010100f | 0x000001ac5110100f | -0x1000000 |
||
88 | 0x000001ac58ffffff | 0x1110100f | 0x000001ac6910100f | +0x1000000 |
||
89 | 0x000001ac66000000 | 0x1010100f | 0x000001ac7710100f | -0x1000000 |
||
90 | 0x000001ac6affffff | 0x1110100f | 0x000001ac7b10100f | +0x1000000 |
||
91 | 0x000001ac6e000000 | 0x1010100f | 0x000001ac7f10100f | -0x1000000 |
||
92 | |||
93 | I was also twice able to reproduce the issue covered by Allwinner's |
||
94 | workaround[4], that writing to TVAL sometimes fails, and both CVAL and |
||
95 | TVAL are left with entirely bogus values. One was the following values: |
||
96 | |||
97 | CNTVCT | CNTV_TVAL | CNTV_CVAL |
||
98 | --------------------+------------+-------------------------------------- |
||
99 | 0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past) |
||
100 | Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> |
||
101 | |||
102 | ======================================================================== |
||
103 | |||
104 | Because the CPU can read the CNTPCT/CNTVCT registers faster than they |
||
105 | change, performing two reads of the register and comparing the high bits |
||
106 | (like other workarounds) is not a workable solution. And because the |
||
107 | timer can jump both forward and backward, no pair of reads can |
||
108 | distinguish a good value from a bad one. The only way to guarantee a |
||
109 | good value from consecutive reads would be to read _three_ times, and |
||
110 | take the middle value only if the three values are 1) each unique and |
||
111 | 2) increasing. This takes at minimum 3 counter cycles (125 ns), or more |
||
112 | if an anomaly is detected. |
||
113 | |||
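For comparison, the three-read approach described above (which this patch does not use) could look roughly like the following userspace sketch. The function-pointer reader, the retry limit and the fake counter are assumptions made so that the example is self-contained; they are not kernel interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*counter_read_fn)(void);

/*
 * Read the counter three times and accept the middle value only if the
 * three reads are unique and strictly increasing. Returns false if no
 * acceptable triple was seen within max_tries attempts.
 */
static bool read_counter_three_way(counter_read_fn read_fn, int max_tries,
				   uint64_t *out)
{
	while (max_tries-- > 0) {
		uint64_t a = read_fn();
		uint64_t b = read_fn();
		uint64_t c = read_fn();

		if (a < b && b < c) {
			*out = b;
			return true;
		}
	}
	return false;
}

/* Stand-in for the hardware: replays a forward glitch (7ff -> fff -> 800). */
static uint64_t fake_counter_read(void)
{
	static const uint64_t seq[] = { 0x7ff, 0xfff, 0x800, 0x801, 0x802, 0x803 };
	static size_t idx;

	return seq[idx++ % (sizeof(seq) / sizeof(seq[0]))];
}
```

With the fake sequence, the first triple (7ff, fff, 800) is rejected and the second triple (801, 802, 803) yields 0x802.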
114 | However, since there is a distinct pattern to the bad values, we can |
||
115 | optimize the common case (1022/1024 of the time) to a single read by |
||
116 | simply ignoring values that match the error pattern. This still takes no |
||
117 | more than 3 cycles in the worst case, and requires much less code. As an |
||
118 | additional safety check, we still limit the loop iteration to the number |
||
119 | of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods. |
||
120 | |||
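The error-pattern test and the retry bound can be illustrated outside the kernel as follows. This is a sketch that mirrors the condition used in the patch, with GENMASK(9, 0) written out as 0x3ff; the helper and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOW_BITS_MASK	0x3ffULL	/* same bits as GENMASK(9, 0) */
#define MAX_CPU_HZ	1200000000ULL	/* 1.2 GHz, from the text above */
#define COUNTER_HZ	24000000ULL	/* 24 MHz counter */

/* CPU cycles in three counter periods: 3 * (1.2 GHz / 24 MHz). */
#define MAX_RETRIES	((int)(3 * (MAX_CPU_HZ / COUNTER_HZ)))

/* True if the low 10 bits are all ones or all zeros. */
static bool low_bits_suspect(uint64_t val)
{
	return ((val + 1) & LOW_BITS_MASK) <= 1;
}

/* Count how many of the 1024 possible low-bit patterns are rejected. */
static unsigned int count_suspect_patterns(void)
{
	unsigned int n = 0;

	for (uint64_t v = 0; v < 1024; v++)
		n += low_bits_suspect(v);
	return n;
}
```

Only 2 of the 1024 low-bit patterns are rejected, which is where the 1022/1024 single-read figure comes from, and the retry bound works out to 150.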
121 | For the TVAL registers, the simple solution is to not use them. Instead, |
||
122 | read or write the CVAL and calculate the TVAL value in software. |
||
123 | |||
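Computing TVAL in software amounts to a truncating subtraction of the counter from CVAL. The sketch below uses plain integers in place of register reads; the function names are hypothetical, and the sample values come from the tables above.

```c
#include <stdint.h>

#define COUNTER_HZ 24000000LL /* 24 MHz counter */

/* TVAL is the low 32 bits of (CVAL - counter). */
static uint32_t tval_from_cval(uint64_t cval, uint64_t counter)
{
	return (uint32_t)(cval - counter);
}

/* Writing TVAL in software: CVAL = counter + sign-extended TVAL. */
static uint64_t cval_from_tval(uint64_t counter, uint32_t tval)
{
	return counter + (uint64_t)(int64_t)(int32_t)tval;
}

/* Signed distance of CVAL from the counter in whole seconds (truncating). */
static int64_t cval_offset_seconds(uint64_t cval, uint64_t counter)
{
	return (int64_t)(cval - counter) / COUNTER_HZ;
}
```

For the first row of the 0x10000000 table, CVAL minus CNTVCT gives exactly 0x10000000, whereas the hardware TVAL read back 0x10003fff. For the bogus-value example, the offset evaluates to -615 seconds, matching the "615s in the past" annotation.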
124 | Although the manufacturer is aware of at least part of the erratum[4], |
||
125 | there is no official name for it. For now, use the kernel-internal name |
||
126 | "UNKNOWN1". |
||
127 | |||
128 | [1]: https://github.com/armbian/build/commit/a08cd6fe7ae9 |
||
129 | [2]: https://forum.armbian.com/topic/3458-a64-datetime-clock-issue/ |
||
130 | [3]: https://irclog.whitequark.org/linux-sunxi/2018-01-26 |
||
131 | [4]: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272 |
||
132 | |||
133 | Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> |
||
134 | Tested-by: Andre Przywara <andre.przywara@arm.com> |
||
135 | Signed-off-by: Samuel Holland <samuel@sholland.org> |
||
136 | Cc: stable@vger.kernel.org |
||
137 | Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> |
||
138 | --- |
||
139 | Documentation/arm64/silicon-errata.txt | 2 + |
||
140 | drivers/clocksource/Kconfig | 10 +++++ |
||
141 | drivers/clocksource/arm_arch_timer.c | 55 ++++++++++++++++++++++++++ |
||
142 | 3 files changed, 67 insertions(+) |
||
143 | |||
144 | --- a/Documentation/arm64/silicon-errata.txt |
||
145 | +++ b/Documentation/arm64/silicon-errata.txt |
||
146 | @@ -44,6 +44,8 @@ stable kernels. |
||
147 | |||
148 | | Implementor | Component | Erratum ID | Kconfig | |
||
149 | +----------------+-----------------+-----------------+-----------------------------+ |
||
150 | +| Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 | |
||
151 | +| | | | | |
||
152 | | ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | |
||
153 | | ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | |
||
154 | | ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | |
||
155 | --- a/drivers/clocksource/Kconfig |
||
156 | +++ b/drivers/clocksource/Kconfig |
||
157 | @@ -365,6 +365,16 @@ config ARM64_ERRATUM_858921 |
||
158 | The workaround will be dynamically enabled when an affected |
||
159 | core is detected. |
||
160 | |||
161 | +config SUN50I_ERRATUM_UNKNOWN1 |
||
162 | + bool "Workaround for Allwinner A64 erratum UNKNOWN1" |
||
163 | + default y |
||
164 | + depends on ARM_ARCH_TIMER && ARM64 && ARCH_SUNXI |
||
165 | + select ARM_ARCH_TIMER_OOL_WORKAROUND |
||
166 | + help |
||
167 | + This option enables a workaround for instability in the timer on |
||
168 | + the Allwinner A64 SoC. The workaround will only be active if the |
||
169 | + allwinner,erratum-unknown1 property is found in the timer node. |
||
170 | + |
||
171 | config ARM_GLOBAL_TIMER |
||
172 | bool "Support for the ARM global timer" if COMPILE_TEST |
||
173 | select TIMER_OF if OF |
||
174 | --- a/drivers/clocksource/arm_arch_timer.c |
||
175 | +++ b/drivers/clocksource/arm_arch_timer.c |
||
176 | @@ -319,6 +319,48 @@ static u64 notrace arm64_858921_read_cnt |
||
177 | } |
||
178 | #endif |
||
179 | |||
180 | +#ifdef CONFIG_SUN50I_ERRATUM_UNKNOWN1 |
||
181 | +/* |
||
182 | + * The low bits of the counter registers are indeterminate while bit 10 or |
||
183 | + * greater is rolling over. Since the counter value can jump both backward |
||
184 | + * (7ff -> 000 -> 800) and forward (7ff -> fff -> 800), ignore register values |
||
185 | + * with all ones or all zeros in the low bits. Bound the loop by the maximum |
||
186 | + * number of CPU cycles in 3 consecutive 24 MHz counter periods. |
||
187 | + */ |
||
188 | +#define __sun50i_a64_read_reg(reg) ({ \ |
||
189 | + u64 _val; \ |
||
190 | + int _retries = 150; \ |
||
191 | + \ |
||
192 | + do { \ |
||
193 | + _val = read_sysreg(reg); \ |
||
194 | + _retries--; \ |
||
195 | + } while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries); \ |
||
196 | + \ |
||
197 | + WARN_ON_ONCE(!_retries); \ |
||
198 | + _val; \ |
||
199 | +}) |
||
200 | + |
||
201 | +static u64 notrace sun50i_a64_read_cntpct_el0(void) |
||
202 | +{ |
||
203 | + return __sun50i_a64_read_reg(cntpct_el0); |
||
204 | +} |
||
205 | + |
||
206 | +static u64 notrace sun50i_a64_read_cntvct_el0(void) |
||
207 | +{ |
||
208 | + return __sun50i_a64_read_reg(cntvct_el0); |
||
209 | +} |
||
210 | + |
||
211 | +static u32 notrace sun50i_a64_read_cntp_tval_el0(void) |
||
212 | +{ |
||
213 | + return read_sysreg(cntp_cval_el0) - sun50i_a64_read_cntpct_el0(); |
||
214 | +} |
||
215 | + |
||
216 | +static u32 notrace sun50i_a64_read_cntv_tval_el0(void) |
||
217 | +{ |
||
218 | + return read_sysreg(cntv_cval_el0) - sun50i_a64_read_cntvct_el0(); |
||
219 | +} |
||
220 | +#endif |
||
221 | + |
||
222 | #ifdef CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND |
||
223 | DEFINE_PER_CPU(const struct arch_timer_erratum_workaround *, timer_unstable_counter_workaround); |
||
224 | EXPORT_SYMBOL_GPL(timer_unstable_counter_workaround); |
||
225 | @@ -408,6 +450,19 @@ static const struct arch_timer_erratum_w |
||
226 | .read_cntvct_el0 = arm64_858921_read_cntvct_el0, |
||
227 | }, |
||
228 | #endif |
||
229 | +#ifdef CONFIG_SUN50I_ERRATUM_UNKNOWN1 |
||
230 | + { |
||
231 | + .match_type = ate_match_dt, |
||
232 | + .id = "allwinner,erratum-unknown1", |
||
233 | + .desc = "Allwinner erratum UNKNOWN1", |
||
234 | + .read_cntp_tval_el0 = sun50i_a64_read_cntp_tval_el0, |
||
235 | + .read_cntv_tval_el0 = sun50i_a64_read_cntv_tval_el0, |
||
236 | + .read_cntpct_el0 = sun50i_a64_read_cntpct_el0, |
||
237 | + .read_cntvct_el0 = sun50i_a64_read_cntvct_el0, |
||
238 | + .set_next_event_phys = erratum_set_next_event_tval_phys, |
||
239 | + .set_next_event_virt = erratum_set_next_event_tval_virt, |
||
240 | + }, |
||
241 | +#endif |
||
242 | }; |
||
243 | |||
244 | typedef bool (*ate_match_fn_t)(const struct arch_timer_erratum_workaround *, |