nvidia_p2p_get_pages(): Fix double-free in register-callback error path

Double-free in rm_p2p_register_callback() error-path in
nv_p2p_get_pages() causes memory corruption that leads to a kernel
panic.

Fix this by adding a separate goto for this error path that skips
freeing the already-freed memory.

Double-free can be produced by calling nvidia_p2p_get_pages() on one CPU
while simultaneously freeing the GPU virtual address range passed into
nvidia_p2p_get_pages() on another CPU. Producing the double-free is
timing dependent and may require multiple tries.

'slub_debug=FZ' kernel boot parameter shows the double-free:

  [  239.115091] =============================================================================
  [  239.124659] BUG kmalloc-16 (Tainted: G           OE     ): Object already free
  [  239.133011] -----------------------------------------------------------------------------

  [  239.144491] Slab 0xfffffa8bc4434140 objects=85 used=82 fp=0xffff9a3dd0d05910 flags=0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
  [  239.158997] Object 0xffff9a3dd0d05670 @offset=1648 fp=0x0000000000000000

  [  239.168766] Redzone  ffff9a3dd0d05660: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  [  239.179633] Object   ffff9a3dd0d05670: 10 00 00 00 00 00 00 00 e5 04 3f 13 96 18 8e 47  ..........?....G
  [  239.190641] Redzone  ffff9a3dd0d05680: bb bb bb bb bb bb bb bb                          ........
  [  239.200739] Padding  ffff9a3dd0d05688: 84 80 0e 00 00 00 00 00                          ........
  [  239.210938] CPU: 0 PID: 3150 Comm: hfi-sdma-test Kdump: loaded Tainted: G           OE      6.5.0-rc1+ #1
  [  239.221911] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.1029.090220201031 09/02/2020
  [  239.233948] Call Trace:
  [  239.236992]  <TASK>
  [  239.239608]  dump_stack_lvl+0x33/0x50
  [  239.244010]  object_err+0x3a/0x80
  [  239.248014]  free_debug_processing+0x265/0x360
  [  239.253392]  ? nv_p2p_get_pages+0x163/0x590 [nvidia]
  [  239.259399]  free_to_partial_list+0x80/0x280
  [  239.264478]  ? nv_p2p_get_pages+0x163/0x590 [nvidia]
  [  239.270426]  nv_p2p_get_pages+0x163/0x590 [nvidia]
  [  239.276303]  ? __pfx_remove_nvidia_pages+0x10/0x10 [hfi1]
  [  239.282692]  nvidia_p2p_get_pages+0x25/0x40 [nvidia]
  [  239.288601]  ? __pfx_remove_nvidia_pages+0x10/0x10 [hfi1]
  ...
  [  239.498990]  </TASK>
  [  239.501662] Disabling lock debugging due to kernel taint
  [  239.507828] FIX kmalloc-16: Object at 0xffff9a3dd0d05670 not freed

Signed-off-by: Brendan Cunningham <bcunningham@cornelisnetworks.com>
This commit is contained in:
Brendan Cunningham 2023-09-07 15:30:52 -04:00
parent a8e01be6b2
commit 3a9b874192

View File

@ -518,7 +518,7 @@ static int nv_p2p_get_pages(
*page_table, nv_p2p_mem_info_free_callback, mem_info);
if (status != NV_OK)
{
goto failed;
goto failed_nofree;
}
}
@ -542,6 +542,7 @@ failed:
os_free_mem(rreqmb_h);
}
failed_nofree:
if (bGetPages)
{
(void)nv_p2p_put_pages(pt_type, sp, p2p_token, va_space,