28 Commits

Author SHA1 Message Date
Brendan Cunningham
3a9b874192 nvidia_p2p_get_pages(): Fix double-free in register-callback error path
A double-free in the rm_p2p_register_callback() error path of
nv_p2p_get_pages() causes memory corruption that leads to a kernel
panic.

Fix this by adding a separate goto for this error path that skips
freeing the already-freed memory.
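
The separate-goto pattern described above can be sketched in user-space C.
This is a minimal illustration, not the driver code: struct entry,
release_table(), register_callback(), get_pages() and both labels are
hypothetical names, and the sketch assumes the failing callee has already
released the buffer by the time it returns an error.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the driver. */
struct entry {
    int *table;               /* buffer owned by the entry */
};

static int free_count;        /* how many times the buffer was released */

static void release_table(struct entry *e)
{
    free(e->table);
    e->table = NULL;
    free_count++;
}

/*
 * Simulated callback registration. ASSUMPTION: on failure the callee
 * releases the buffer itself before returning the error.
 */
static int register_callback(struct entry *e, int fail)
{
    if (fail) {
        release_table(e);
        return -1;
    }
    return 0;
}

static int get_pages(struct entry *e, int fail_setup, int fail_register)
{
    int rc;

    free_count = 0;
    e->table = malloc(16 * sizeof(*e->table));
    if (e->table == NULL)
        return -1;

    if (fail_setup) {
        rc = -1;
        goto failed;                  /* buffer still owned here: free it */
    }

    rc = register_callback(e, fail_register);
    if (rc != 0)
        goto failed_already_released; /* callee freed it: skip the free */

    return 0;

failed:
    release_table(e);
failed_already_released:
    return rc;
}
```

Routing the registration failure to the shared failed label instead would
call release_table() a second time on the same allocation, which is the
kind of double release the commit message describes.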

The double-free can be reproduced by calling nvidia_p2p_get_pages() on
one CPU while simultaneously freeing the GPU virtual address range passed
into nvidia_p2p_get_pages() on another CPU. Reproducing it is
timing-dependent and may take multiple tries.

Booting with the 'slub_debug=FZ' kernel parameter shows the double-free:

  [  239.115091] =============================================================================
  [  239.124659] BUG kmalloc-16 (Tainted: G           OE     ): Object already free
  [  239.133011] -----------------------------------------------------------------------------

  [  239.144491] Slab 0xfffffa8bc4434140 objects=85 used=82 fp=0xffff9a3dd0d05910 flags=0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
  [  239.158997] Object 0xffff9a3dd0d05670 @offset=1648 fp=0x0000000000000000

  [  239.168766] Redzone  ffff9a3dd0d05660: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  [  239.179633] Object   ffff9a3dd0d05670: 10 00 00 00 00 00 00 00 e5 04 3f 13 96 18 8e 47  ..........?....G
  [  239.190641] Redzone  ffff9a3dd0d05680: bb bb bb bb bb bb bb bb                          ........
  [  239.200739] Padding  ffff9a3dd0d05688: 84 80 0e 00 00 00 00 00                          ........
  [  239.210938] CPU: 0 PID: 3150 Comm: hfi-sdma-test Kdump: loaded Tainted: G           OE      6.5.0-rc1+ #1
  [  239.221911] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.1029.090220201031 09/02/2020
  [  239.233948] Call Trace:
  [  239.236992]  <TASK>
  [  239.239608]  dump_stack_lvl+0x33/0x50
  [  239.244010]  object_err+0x3a/0x80
  [  239.248014]  free_debug_processing+0x265/0x360
  [  239.253392]  ? nv_p2p_get_pages+0x163/0x590 [nvidia]
  [  239.259399]  free_to_partial_list+0x80/0x280
  [  239.264478]  ? nv_p2p_get_pages+0x163/0x590 [nvidia]
  [  239.270426]  nv_p2p_get_pages+0x163/0x590 [nvidia]
  [  239.276303]  ? __pfx_remove_nvidia_pages+0x10/0x10 [hfi1]
  [  239.282692]  nvidia_p2p_get_pages+0x25/0x40 [nvidia]
  [  239.288601]  ? __pfx_remove_nvidia_pages+0x10/0x10 [hfi1]
  ...
  [  239.498990]  </TASK>
  [  239.501662] Disabling lock debugging due to kernel taint
  [  239.507828] FIX kmalloc-16: Object at 0xffff9a3dd0d05670 not freed

Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
2023-09-11 10:24:47 -04:00
Bernhard Stoeckner
a8e01be6b2
535.104.05 2023-08-22 15:09:37 +02:00
Bernhard Stoeckner
12c0739352
535.98 2023-08-08 18:28:38 +02:00
Bernhard Stoeckner
29f830f1bb
535.86.10 2023-07-31 18:17:14 +02:00
Bernhard Stoeckner
337e28efda
535.86.05 2023-07-18 16:00:22 +02:00
Andy Ritger
26458140be
535.54.03 2023-06-14 12:37:59 -07:00
Andy Ritger
eb5c7665a1
535.43.02 2023-05-30 10:11:36 -07:00
Andy Ritger
6dd092ddb7
530.41.03 2023-03-23 11:00:12 -07:00
Andy Ritger
4397463e73
530.30.02 2023-02-28 11:12:44 -08:00
Andy Ritger
e598191e8e
525.89.02 2023-02-08 10:15:15 -08:00
Maneet Singh
1dc88ff75e
525.85.12 2023-01-30 16:30:12 -08:00
Andy Ritger
811073c51e
525.85.05 2023-01-19 10:41:59 -08:00
Andy Ritger
dac2350c7f
525.78.01 2023-01-05 10:40:27 -08:00
Andy Ritger
9594cc0169
525.60.13 2022-12-05 10:49:53 -08:00
Andy Ritger
5f40a5aee5
525.60.11 2022-11-28 13:39:27 -08:00
Andy Ritger
758b4ee818
525.53 2022-11-10 08:39:33 -08:00
Andy Ritger
7c345b838b
520.56.06 2022-10-12 10:30:46 -07:00
Andy Ritger
90eb10774f
520.61.05 2022-10-10 14:59:24 -07:00
Andy Ritger
fe0728787f
515.76 2022-09-20 13:54:59 -07:00
Andy Ritger
9855350159
515.65.01 2022-08-02 08:35:13 -07:00
Joshua Ashton
28d2504766 nv-pci: Fix nullptr dereference if device was not found
Closes #57
2022-08-02 08:28:14 -07:00
Andy Ritger
94eaea9726
515.57 2022-06-28 08:00:06 -07:00
Andy Ritger
965db98552
515.48.07 2022-05-27 16:40:24 -07:00
nitepone
af26e1ea89 Remove trailing whitespace from conftest 2022-05-24 17:29:46 -07:00
nitepone
a9924b6fd3 Remove non-posix local usage from conftest 2022-05-24 17:29:46 -07:00
nitepone
e543b69fb0 Fix shellcheck errors in conftest 2022-05-24 17:29:46 -07:00
Filip Fedoryszyn
eb960e2f2a Fixed some typos 2022-05-12 22:16:20 -07:00
Andy Ritger
1739a20efc
515.43.04 2022-05-09 13:18:59 -07:00