Created attachment 245073
patches for x11/nvidia-driver

I backported some of the user pages handling code from 535.98 to this 390.154 driver. With these changes, clpeak runs under libc6-shim:

    $ NVIDIA_LIB64_DIR=/compat/bookworm/lib/x86_64-linux-gnu /usr/local/bin/nv-sglrun /usr/local/bin/clpeak
    shim init

    Platform: NVIDIA CUDA
      Device: Quadro 600
        Driver version  : 390.154 (FreeBSD)
        Compute units   : 2
        Clock frequency : 1280 MHz

        Global memory bandwidth (GBPS)
          float   : 19.77
          float2  : 19.99
          float4  : 20.15
          float8  : 17.74
          float16 : 10.88

        Single-precision compute (GFLOPS)
          float   : 161.39
          float2  : 239.89
          float4  : 232.48
          float8  : 223.77
          float16 : 229.75

        No half precision support! Skipped

        Double-precision compute (GFLOPS)
          double   : 20.49
          double2  : 20.47
          double4  : 20.43
          double8  : 20.35
          double16 : 19.06

        Integer compute (GIOPS)
          int   : 81.53
          int2  : 81.46
          int4  : 81.55
          int8  : 81.64
          int16 : 81.51

        Integer compute Fast 24bit (GIOPS)
          int   : 81.68
          int2  : 81.66
          int4  : 81.52
          int8  : 81.68
          int16 : 81.67

        Transfer bandwidth (GBPS)
          enqueueWriteBuffer              : 6.46
          enqueueReadBuffer               : 6.33
          enqueueWriteBuffer non-blocking : 0.02
          enqueueReadBuffer non-blocking  : 0.02
          enqueueMapBuffer(for read)      : 6.28
            memcpy from mapped ptr        : 9.66
          enqueueUnmap(after write)       : 6.65
            memcpy to mapped ptr          : 9.69

        Kernel launch latency : 5.39 us
> ++  // NvBool write = FLD_TEST_DRF(_LOCK_USER_PAGES, _FLAGS, _WRITE, _YES, flags);
> ++  // vm_prot_t prot = write ? (VM_PROT_READ | VM_PROT_WRITE) : VM_PROT_READ;
> ++  vm_prot_t prot = VM_PROT_READ | VM_PROT_WRITE;

Ignoring the flags here and always forcing read+write doesn't seem great. I'm assuming the line that computes `write` doesn't compile or something, and that's why you did this?

There are some other parts you've commented out that seem problematic. For example, not setting at->num_pages would break nv_get_num_phys_pages(), although I'm guessing that's not implemented on 390.

For the bits you've commented out, can you please explain why?
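For reference, here is a minimal standalone sketch of what honoring the flag could look like. It is plain C so it compiles outside the kernel. The `VM_PROT_*` values mirror FreeBSD's `<vm/vm.h>`. The WRITE bit position and the helper names (`lock_flags_want_write`, `lock_user_pages_prot`, `LOCK_USER_PAGES_FLAGS_WRITE_YES`) are hypothetical stand-ins. In the driver itself, the test would be the `FLD_TEST_DRF(_LOCK_USER_PAGES, _FLAGS, _WRITE, _YES, flags)` line quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-ins so this sketch builds in userland. In the kernel, vm_prot_t
 * and VM_PROT_* come from <vm/vm.h>; the flag layout comes from NVIDIA's
 * headers. The bit position below is an assumption, not the real layout.
 */
typedef uint8_t vm_prot_t;
#define VM_PROT_READ  ((vm_prot_t)0x01)
#define VM_PROT_WRITE ((vm_prot_t)0x02)

/* Hypothetical: assume the _WRITE field is bit 0 and _YES == 1. */
#define LOCK_USER_PAGES_FLAGS_WRITE_YES 0x1u

/* Stand-in for FLD_TEST_DRF(_LOCK_USER_PAGES, _FLAGS, _WRITE, _YES, flags). */
int lock_flags_want_write(uint32_t flags)
{
    return (flags & LOCK_USER_PAGES_FLAGS_WRITE_YES) != 0;
}

/*
 * Request only the access the caller asked for: read always,
 * write only when the WRITE flag is set.
 */
vm_prot_t lock_user_pages_prot(uint32_t flags)
{
    vm_prot_t prot = VM_PROT_READ;

    if (lock_flags_want_write(flags))
        prot |= VM_PROT_WRITE;
    return prot;
}
```

With this approach, read-only pinning requests no longer demand write access to the user mapping. For example, a read-only mapping should then be lockable where a forced read+write request would be refused.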