
llama.cpp fails to use the GPU: "warning: no usable GPU found, --gpu-layers option will be ignored"

Even after building llama.cpp with CUDA support, the GPU still could not be used:

./llama-server -m ../../../../../model/hf_models/qwen/qwen3-4b-q8_0.gguf  -ngl 40

The error output is as follows:

ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
warning: no usable GPU found, --gpu-layers option will be ignored
warning: one possible reason is that llama.cpp was compiled without GPU support
warning: consult docs/build.md for compilation instructions
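In this case the binary was in fact built with CUDA, so the "compiled without GPU support" warning is misleading. But if that warning does apply to you, the CUDA build steps from docs/build.md look roughly like this (a sketch; `GGML_CUDA` is the flag used by recent llama.cpp versions — check your checkout's docs/build.md):

```shell
# Configure and build llama.cpp with the CUDA backend enabled.
# Assumes CMake and the CUDA toolkit are installed.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```

After a successful CUDA build, `ggml_cuda_init` should list your devices on startup instead of printing the warnings above.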
Running nvidia-smi shows the real cause: the loaded kernel driver and the user-space NVML library are at different versions.

$ nvidia-smi 
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.144
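Since NVML itself is broken in this state, the mismatch can be confirmed without nvidia-smi. A minimal sketch that compares the kernel-side driver version with the user-space NVML library version (paths assume a Debian/Ubuntu-style library layout; adjust for your distro):

```shell
# Kernel module version, read directly from procfs.
kernel_ver=$(grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' /proc/driver/nvidia/version | head -n1)
# User-space NVML library version, taken from the shared-object file name.
lib_ver=$(ls /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*.* 2>/dev/null \
          | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1)
echo "kernel driver: ${kernel_ver:-not loaded}"
echo "NVML library:  ${lib_ver:-not found}"
# If the two versions differ, this is the mismatch NVML is complaining about.
```

This situation typically arises after a driver package upgrade: the libraries on disk are new, but the old kernel module is still loaded.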

Rebooting fixed it: after the restart, the kernel module matches the user-space NVIDIA libraries again and CUDA initializes normally.
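If a reboot is inconvenient, reloading the NVIDIA kernel modules can achieve the same realignment. A sketch, assuming nothing is currently using the GPU and that the standard module names apply to your setup:

```shell
# Stop anything holding the GPU first (X server, persistence daemon, CUDA apps).
sudo systemctl stop nvidia-persistenced 2>/dev/null || true
# Unload the stale modules (dependents first), then reload the driver.
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
sudo modprobe nvidia
nvidia-smi  # should now print the device table instead of the NVML error
```

`rmmod` will refuse to unload a module that is in use, so this fails on a machine where the GPU drives the display; in that case a reboot is the simpler option.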

./llama-server -m ../../../../../model/hf_models/qwen/qwen3-4b-q8_0.gguf  -ngl 40
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce GTX 1660 Ti, compute capability 7.5, VMM: yes
...

load_tensors: offloading 36 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 37/37 layers to GPU
load_tensors:        CUDA0 model buffer size =  4076.43 MiB
load_tensors:   CPU_Mapped model buffer size =   394.12 MiB