Bug 273982 - x11/nvidia-driver-390: patches for user pages handling
Status: Open
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Alexey Dokuchaev
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2023-09-20 18:04 UTC by jinxiaoyong
Modified: 2023-09-21 02:03 UTC
CC List: 2 users

See Also:
bugzilla: maintainer-feedback? (danfe)
grahamperrin: maintainer-feedback? (ashafer)


Attachments
patches for x11/nvidia-driver (7.65 KB, patch)
2023-09-20 18:04 UTC, jinxiaoyong

Description jinxiaoyong 2023-09-20 18:04:00 UTC
Created attachment 245073
patches for x11/nvidia-driver

I backported some of the user pages handling code from 535.98 to the 390.154 driver. With these changes, clpeak runs under libc6-shim:

$ NVIDIA_LIB64_DIR=/compat/bookworm/lib/x86_64-linux-gnu /usr/local/bin/nv-sglrun /usr/local/bin/clpeak
shim init

Platform: NVIDIA CUDA
  Device: Quadro 600
    Driver version  : 390.154 (FreeBSD)
    Compute units   : 2
    Clock frequency : 1280 MHz

    Global memory bandwidth (GBPS)
      float   : 19.77
      float2  : 19.99
      float4  : 20.15
      float8  : 17.74
      float16 : 10.88

    Single-precision compute (GFLOPS)
      float   : 161.39
      float2  : 239.89
      float4  : 232.48
      float8  : 223.77
      float16 : 229.75

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 20.49
      double2  : 20.47
      double4  : 20.43
      double8  : 20.35
      double16 : 19.06

    Integer compute (GIOPS)
      int   : 81.53
      int2  : 81.46
      int4  : 81.55
      int8  : 81.64
      int16 : 81.51

    Integer compute Fast 24bit (GIOPS)
      int   : 81.68
      int2  : 81.66
      int4  : 81.52
      int8  : 81.68
      int16 : 81.67

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 6.46
      enqueueReadBuffer               : 6.33
      enqueueWriteBuffer non-blocking : 0.02
      enqueueReadBuffer non-blocking  : 0.02
      enqueueMapBuffer(for read)      : 6.28
        memcpy from mapped ptr        : 9.66
      enqueueUnmap(after write)       : 6.65
        memcpy to mapped ptr          : 9.69

    Kernel launch latency : 5.39 us
Comment 1 Austin Shafer 2023-09-21 02:03:10 UTC
> ++    // NvBool write = FLD_TEST_DRF(_LOCK_USER_PAGES, _FLAGS, _WRITE, _YES, flags);
> ++    // vm_prot_t prot =  write ? (VM_PROT_READ | VM_PROT_WRITE) : VM_PROT_READ;
> ++    vm_prot_t prot = VM_PROT_READ | VM_PROT_WRITE;

Ignoring the flags here and forcing read+write doesn't seem great. I'm assuming the commented-out line that computes write doesn't compile or something, and that's why you did this?
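
For illustration, here is a minimal user-space sketch of what honoring the write flag could look like, as opposed to always requesting read+write. Everything prefixed with sketch_ or SKETCH_ is a stand-in invented for this example: in the driver the protection type and constants would come from the kernel's vm headers, and the write bit would be extracted with FLD_TEST_DRF as in the commented-out line. The assumption that the write request lives in bit 0 of flags is mine and is not confirmed by the patch.

```c
#include <stdint.h>

/*
 * Stand-ins so the sketch compiles outside the kernel. In the driver these
 * would be vm_prot_t, VM_PROT_READ and VM_PROT_WRITE from the vm headers.
 */
typedef uint8_t sketch_vm_prot_t;
#define SKETCH_VM_PROT_READ   ((sketch_vm_prot_t)0x01)
#define SKETCH_VM_PROT_WRITE  ((sketch_vm_prot_t)0x02)

/*
 * ASSUMPTION: the "write" request is bit 0 of the lock-user-pages flags.
 * The real driver would test it via FLD_TEST_DRF(_LOCK_USER_PAGES, _FLAGS,
 * _WRITE, _YES, flags) rather than a hand-rolled mask.
 */
#define SKETCH_LOCK_FLAGS_WRITE  0x1u

/* Request write access only when the caller asked for it. */
static sketch_vm_prot_t
sketch_prot_from_flags(uint32_t flags)
{
    int write = (flags & SKETCH_LOCK_FLAGS_WRITE) != 0;

    if (write)
        return (sketch_vm_prot_t)(SKETCH_VM_PROT_READ | SKETCH_VM_PROT_WRITE);
    return SKETCH_VM_PROT_READ;
}
```

With this shape, a caller that passes no write flag gets a read-only request, so pages backing read-only user mappings would not be asked for write access they cannot grant.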

There are some other parts you've commented out that seem problematic. For example, not setting at->num_pages would break nv_get_num_phys_pages(), although I'm guessing that's not implemented on 390.

For the bits you've commented out, can you please explain why?