
Fixing the multi-GPU launch error when running llamafactory

Running llamafactory with multiple GPUs fails with the following error:

```
Traceback (most recent call last):
  File "/home/Mmm/anaconda3/envs/llama/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/data/Mmm/LLaMA-Factory/src/llamafactory/cli.py", line 130, in main
    process = subprocess.run(
  File "/home/Mmm/anaconda3/envs/llama/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['torchrun', '--nnodes', '1', '--node_rank', '0', '--nproc_per_node', '4', '--master_addr', '127.0.0.1', '--master_port', '33837', '/data/Mmm/LLaMA-Factory/src/llamafactory/launcher.py', '/data/Mmm/Params/train_2025-07-23-13-30-31/training_args.yaml']' returned non-zero exit status 1.
```
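The traceback shows what is happening: `llamafactory-cli` does not run the training loop itself, it builds a `torchrun` command and launches it via `subprocess.run(..., check=True)`, so any failure inside a worker surfaces as `CalledProcessError`. A minimal sketch of that launch pattern (the helper name and the file paths here are placeholders, not LLaMA-Factory's actual code):

```python
import subprocess

def build_torchrun_cmd(nproc, master_port, launcher, config):
    # Mirrors the command visible in the traceback above; a single-node
    # launch with one process per GPU.
    return [
        "torchrun",
        "--nnodes", "1",
        "--node_rank", "0",
        "--nproc_per_node", str(nproc),
        "--master_addr", "127.0.0.1",
        "--master_port", str(master_port),
        launcher,   # script each worker executes
        config,     # training_args.yaml passed through to the workers
    ]

cmd = build_torchrun_cmd(4, 33837, "launcher.py", "training_args.yaml")
# subprocess.run(cmd, check=True) would raise CalledProcessError on any
# non-zero worker exit status -- exactly the error shown above.
```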

Fix: add the following at the top of LLaMA-Factory/src/llamafactory/launcher.py:

```python
import os
# Pin the process to a specific GPU (here, only GPU 0)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
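The placement of this line matters: `CUDA_VISIBLE_DEVICES` is only read when the CUDA runtime initializes, so it must be set before `torch` (or any other CUDA library) is imported, which is why it goes at the very top of launcher.py. A small self-contained sketch of the behavior:

```python
import os

# Restrict this process to GPU 0 only. Setting the variable after CUDA
# has already initialized has no effect, hence "before any imports".
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Inside the process, whatever GPUs remain visible are re-indexed from
# zero: physical GPU 0 becomes cuda:0 here. With "2,3" instead, physical
# GPU 2 would be cuda:0 and GPU 3 would be cuda:1.
visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
```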

Then, before running llamafactory-cli webui in the terminal, run these two commands first:

```shell
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1
```
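Typing the exports by hand is easy to forget between sessions, so one option is a small wrapper script that sets them and then starts the web UI. The script name is hypothetical; the two variables are the ones the error message below asks for:

```shell
#!/bin/sh
# launch_webui.sh (hypothetical wrapper) -- RTX 4000 series cards do not
# support P2P or InfiniBand transports, so tell NCCL not to try them.
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1

# llamafactory-cli webui   # uncomment to actually launch the web UI
```

Alternatively, putting the two `export` lines in `~/.bashrc` makes them persistent across terminals.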

Otherwise you will get this error:

```
  File "/home/Mmm/anaconda3/envs/llama/lib/python3.10/site-packages/accelerate/state.py", line 311, in __init__
    raise NotImplementedError(
NotImplementedError: Using RTX 4000 series doesn't support faster communication broadband via P2P or IB. Please set `NCCL_P2P_DISABLE="1"` and `NCCL_IB_DISABLE="1"` or use `accelerate launch` which will do this automatically.
```

With the steps above, the error is resolved by not using multiple GPUs: training falls back to a single card instead of the failing multi-GPU launch.