> 技术文档 > 在华为昇腾服务器Ascend 300I Pro 310P芯片( 310P3)安装QWQ32B大模型以及deepseek蒸馏版!

在华为昇腾服务器Ascend 300I Pro 310P芯片( 310P3)安装QWQ32B大模型以及deepseek蒸馏版!

前提条件:服务器已安装docker
1.下载镜像: 1.0.0-300I-Duo-py311-openeuler24.03-lts
备注:官网镜像下载,需要申请,审批还得1,2天,这时你肯定想骂HW!没事,我已为您准备好了:请发私信!
申请地址: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f
在华为昇腾服务器Ascend 300I Pro 310P芯片( 310P3)安装QWQ32B大模型以及deepseek蒸馏版!
2.下载模型:魔乐社区(https://modelers.cn/models/Models_Ecosystem/QwQ-32B)

服务器上安装社区下载的比较快:

 pip install modelscope modelscope download \"Qwen/QwQ-32B\" --local_dir \"/home/models/qwq\"

注意事项:模型上传到服务器需要给于模型下config.json权限

chmod 750 config.json

3.docker 启动

注意映射的模型文件到服务器中:

docker run -it -d --net=host --shm-size=50g --privileged --name qwq-i --device=/dev/davinci_manager --device=/dev/hisi_hdc --device=/dev/devmm_svm -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro -v /usr/local/sbin:/usr/local/sbin:ro -v /home/models/qwq:/home/models/qwq:rw swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-300I-Duo-py311-openeuler24.03-lts

4.进入docker容器中(以下的操作全部是在docker容器中

编辑配置文件:

注意点:
ipAddress: 本地服务器IP
httpsEnabled : false, 关闭https
modelName:模型名称
modelWeightPath:模型路径(容器内的)
npuDeviceIds:显卡ID (根据自己情况,npu-smi info 查看)

vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
{ \"Version\" : \"1.1.0\", \"LogConfig\" : { \"logLevel\" : \"Info\", \"logFileSize\" : 20, \"logFileNum\" : 20, \"logPath\" : \"logs/mindservice.log\" }, \"ServerConfig\" : { \"ipAddress\" : \"192.168.0.203\", \"managementIpAddress\" : \"127.0.0.2\", \"port\" : 1025, \"managementPort\" : 1026, \"metricsPort\" : 1027, \"allowAllZeroIpListening\" : false, \"maxLinkNum\" : 1000, \"httpsEnabled\" : false, \"fullTextEnabled\" : false, \"tlsCaPath\" : \"security/ca/\", \"tlsCaFile\" : [\"ca.pem\"], \"tlsCert\" : \"security/certs/server.pem\", \"tlsPk\" : \"security/keys/server.key.pem\", \"tlsPkPwd\" : \"security/pass/key_pwd.txt\", \"tlsCrlPath\" : \"security/certs/\", \"tlsCrlFiles\" : [\"server_crl.pem\"], \"managementTlsCaFile\" : [\"management_ca.pem\"], \"managementTlsCert\" : \"security/certs/management/server.pem\", \"managementTlsPk\" : \"security/keys/management/server.key.pem\", \"managementTlsPkPwd\" : \"security/pass/management/key_pwd.txt\", \"managementTlsCrlPath\" : \"security/management/certs/\", \"managementTlsCrlFiles\" : [\"server_crl.pem\"], \"kmcKsfMaster\" : \"tools/pmt/master/ksfa\", \"kmcKsfStandby\" : \"tools/pmt/standby/ksfb\", \"inferMode\" : \"standard\", \"interCommTLSEnabled\" : true, \"interCommPort\" : 1121, \"interCommTlsCaPath\" : \"security/grpc/ca/\", \"interCommTlsCaFiles\" : [\"ca.pem\"], \"interCommTlsCert\" : \"security/grpc/certs/server.pem\", \"interCommPk\" : \"security/grpc/keys/server.key.pem\", \"interCommPkPwd\" : \"security/grpc/pass/key_pwd.txt\", \"interCommTlsCrlPath\" : \"security/grpc/certs/\", \"interCommTlsCrlFiles\" : [\"server_crl.pem\"], \"openAiSupport\" : \"vllm\" }, \"BackendConfig\" : { \"backendName\" : \"mindieservice_llm_engine\", \"modelInstanceNumber\" : 1, \"npuDeviceIds\" : [[0,1,2,3]], \"tokenizerProcessNumber\" : 8, \"multiNodesInferEnabled\" : false, \"multiNodesInferPort\" : 1120, \"interNodeTLSEnabled\" : true, \"interNodeTlsCaPath\" : \"security/grpc/ca/\", \"interNodeTlsCaFiles\" : [\"ca.pem\"], \"interNodeTlsCert\" : \"security/grpc/certs/server.pem\", \"interNodeTlsPk\" : \"security/grpc/keys/server.key.pem\", \"interNodeTlsPkPwd\" : \"security/grpc/pass/mindie_server_key_pwd.txt\", \"interNodeTlsCrlPath\" : \"security/grpc/certs/\", \"interNodeTlsCrlFiles\" : [\"server_crl.pem\"], \"interNodeKmcKsfMaster\" : \"tools/pmt/master/ksfa\", \"interNodeKmcKsfStandby\" : \"tools/pmt/standby/ksfb\", \"ModelDeployConfig\" : { \"maxSeqLen\" : 32580, \"maxInputTokenLen\" : 30000, \"truncation\" : false, \"ModelConfig\" : [ {  \"modelInstanceType\" : \"Standard\",  \"modelName\" : \"qwen\",  \"modelWeightPath\" : \"/home/models/qwq\",  \"worldSize\" : 4,  \"cpuMemSize\" : 5,  \"npuMemSize\" : -1,  \"backendType\" : \"atb\",  \"trustRemoteCode\" : false } ] }, \"ScheduleConfig\" : { \"templateType\" : \"Standard\", \"templateName\" : \"Standard_LLM\", \"cacheBlockSize\" : 128, \"maxPrefillBatchSize\" : 50, \"maxPrefillTokens\" : 30000, \"prefillTimeMsPerReq\" : 150, \"prefillPolicyType\" : 0, \"decodeTimeMsPerReq\" : 50, \"decodePolicyType\" : 0, \"maxBatchSize\" : 200, \"maxIterTimes\" : 4096, \"maxPreemptCount\" : 0, \"supportSelectBatch\" : false, \"maxQueueDelayMicroseconds\" : 5000 } }}

启动

cd /usr/local/Ascend/mindie/latest/mindie-service/bin
./mindieservice_daemon

看到如下界面就启动成功了!

在华为昇腾服务器Ascend 300I Pro 310P芯片( 310P3)安装QWQ32B大模型以及deepseek蒸馏版!

测试:如果防火墙没关,请放开1025端口!

sudo firewall-cmd --permanent --add-port=1025/tcpsudo firewall-cmd --reload

接口地址:
post:http://192.168.0.202:1025/v1/chat/completions

{\"model\": \"qwen\",  \"messages\": [{\"role\": \"user\", \"content\": \"你是谁\"}], \"max_tokens\": 32768, \"stream\": false  }

在华为昇腾服务器Ascend 300I Pro 310P芯片( 310P3)安装QWQ32B大模型以及deepseek蒸馏版!
显卡使用情况:达到88%
在华为昇腾服务器Ascend 300I Pro 310P芯片( 310P3)安装QWQ32B大模型以及deepseek蒸馏版!
deepseek:
310P 芯片仅支持FP16精度,并不支持BF16或INT8等数据类型,因此需要到模型权重文件中修改config.json:
和上述的操作一致:只需要将下载的模型的config.json中的 dtype改为:float16后保存
在华为昇腾服务器Ascend 300I Pro 310P芯片( 310P3)安装QWQ32B大模型以及deepseek蒸馏版!