Slightly downgrade CUDA version to fix overaggressive BFC (#22589)

* Downgrade CUDA to support 50 series without errors

* Update speed
Nicolas Mowen 2026-03-23 08:13:32 -06:00 committed by GitHub
parent d8f599c377
commit 2c9a25e678
2 changed files with 13 additions and 13 deletions

@@ -1,18 +1,18 @@
 # Nvidia ONNX Runtime GPU Support
 --extra-index-url 'https://pypi.nvidia.com'
 cython==3.0.*; platform_machine == 'x86_64'
-nvidia-cuda-cupti-cu12==12.9.79; platform_machine == 'x86_64'
-nvidia-cublas-cu12==12.9.1.*; platform_machine == 'x86_64'
-nvidia-cudnn-cu12==9.19.0.*; platform_machine == 'x86_64'
-nvidia-cufft-cu12==11.4.1.*; platform_machine == 'x86_64'
-nvidia-curand-cu12==10.3.10.*; platform_machine == 'x86_64'
-nvidia-cuda-nvcc-cu12==12.9.86; platform_machine == 'x86_64'
-nvidia-cuda-nvrtc-cu12==12.9.86; platform_machine == 'x86_64'
-nvidia-cuda-runtime-cu12==12.9.79; platform_machine == 'x86_64'
-nvidia-cusolver-cu12==11.7.5.*; platform_machine == 'x86_64'
-nvidia-cusparse-cu12==12.5.10.*; platform_machine == 'x86_64'
-nvidia-nccl-cu12==2.29.7; platform_machine == 'x86_64'
-nvidia-nvjitlink-cu12==12.9.86; platform_machine == 'x86_64'
+nvidia-cuda-cupti-cu12==12.8.90; platform_machine == 'x86_64'
+nvidia-cublas-cu12==12.8.4.1; platform_machine == 'x86_64'
+nvidia-cudnn-cu12==9.8.0.87; platform_machine == 'x86_64'
+nvidia-cufft-cu12==11.3.3.83; platform_machine == 'x86_64'
+nvidia-curand-cu12==10.3.9.90; platform_machine == 'x86_64'
+nvidia-cuda-nvcc-cu12==12.8.93; platform_machine == 'x86_64'
+nvidia-cuda-nvrtc-cu12==12.8.93; platform_machine == 'x86_64'
+nvidia-cuda-runtime-cu12==12.8.90; platform_machine == 'x86_64'
+nvidia-cusolver-cu12==11.7.3.90; platform_machine == 'x86_64'
+nvidia-cusparse-cu12==12.5.8.93; platform_machine == 'x86_64'
+nvidia-nccl-cu12==2.26.2.post1; platform_machine == 'x86_64'
+nvidia-nvjitlink-cu12==12.8.93; platform_machine == 'x86_64'
 onnx==1.16.*; platform_machine == 'x86_64'
 onnxruntime-gpu==1.24.*; platform_machine == 'x86_64'
 protobuf==3.20.3; platform_machine == 'x86_64'
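
Review note: the hunk above moves every `nvidia-cuda-*` pin from 12.9 to 12.8. A minimal sketch of a sanity check for that kind of change is shown here. It is not part of the commit. The helper names `parse_pins` and `toolkit_minor_versions` are hypothetical, and the idea that all `nvidia-cuda-*` packages should share one major.minor version is an assumption made for illustration, not something the commit states.

```python
import re

# Matches pin lines such as:
#   nvidia-cuda-nvcc-cu12==12.8.93; platform_machine == 'x86_64'
# Comment lines and option lines (e.g. --extra-index-url) do not match.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9_.-]*)==([^;\s]+)")


def parse_pins(text):
    """Return {package: version} for every `name==version` line in text."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def toolkit_minor_versions(pins, prefix="nvidia-cuda-"):
    """Collect the distinct major.minor versions among packages with the prefix.

    Assumption (not from the commit): packages sharing this prefix are
    expected to come from the same CUDA toolkit release.
    """
    return {
        ".".join(version.split(".")[:2])
        for name, version in pins.items()
        if name.startswith(prefix)
    }


# The four nvidia-cuda-* pins added by this commit, copied from the hunk.
NEW_PINS = """
nvidia-cuda-cupti-cu12==12.8.90; platform_machine == 'x86_64'
nvidia-cuda-nvcc-cu12==12.8.93; platform_machine == 'x86_64'
nvidia-cuda-nvrtc-cu12==12.8.93; platform_machine == 'x86_64'
nvidia-cuda-runtime-cu12==12.8.90; platform_machine == 'x86_64'
"""

if __name__ == "__main__":
    # More than one entry in the set would indicate a mixed-version pin list.
    print(toolkit_minor_versions(parse_pins(NEW_PINS)))
```

A check like this only compares version strings. It says nothing about whether a given combination actually works with `onnxruntime-gpu` on a particular GPU.
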

@@ -205,7 +205,7 @@ Inference is done with the `onnx` detector type. Speeds will vary greatly depend
 | GTX 1070 | s-320: 16 ms | | 320: 14 ms |
 | RTX 3050 | t-320: 8 ms s-320: 10 ms s-640: 28 ms | Nano-320: ~ 12 ms | 320: ~ 10 ms 640: ~ 16 ms |
 | RTX 3070 | t-320: 6 ms s-320: 8 ms s-640: 25 ms | Nano-320: ~ 9 ms | 320: ~ 8 ms 640: ~ 14 ms |
-| RTX 5060 Ti | t-320: 5 ms s-320: 7 ms s-640: 22 ms | Nano-320: ~ 6 ms | |
+| RTX 5060 Ti | t-320: 5 ms s-320: 7 ms s-640: 22 ms | Nano-320: ~ 4 ms | |
 | RTX A4000 | | | 320: ~ 15 ms |
 | Tesla P40 | | | 320: ~ 105 ms |