LLaMA-Factory: Merging LoRA Adapters
flyfish
The command that merges a LoRA adapter into the base model:
```bash
llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
```
Contents of the config file (the stock example is llama3_lora_sft.yaml; here it is adapted for Qwen2.5-VL, and the actual run in Section IV uses it as examples/merge_lora/qwen2_5vl_lora_sft.yaml):
```yaml
### Note: DO NOT use quantized model or quantization_bit when merging lora adapters

### model
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct
adapter_name_or_path: saves/qwen2_5vl-7b/lora/sft
template: qwen2_vl
trust_remote_code: true

### export
export_dir: output/qwen2_5vl_lora_sft
export_size: 5
export_device: cpu  # choices: [cpu, auto]
export_legacy_format: false
```
I. Parameter Descriptions
1. The [model] section
- model_name_or_path: Path or name of the base model, e.g. Qwen/Qwen2.5-VL-7B-Instruct.
  Note: this must point to the original, unquantized weights (not a 4-bit/8-bit compressed model); otherwise the merge can fail or produce degraded precision.
- adapter_name_or_path: Path to the LoRA adapter, e.g. saves/qwen2_5vl-7b/lora/sft.
  These are the low-rank parameters produced by training; they are merged into the base model's weights.
- template: Name of the chat template the model uses (e.g. qwen2_vl). It mainly affects tokenization, in particular the formatting of multimodal (image plus text) inputs.
- trust_remote_code: Set to true to trust custom scripts or tokenizers shipped in the model repository; this is usually required for training and inference.
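As a quick sanity check that adapter_name_or_path really belongs to the base model named in model_name_or_path, you can peek at the adapter's PEFT config. A minimal sketch, assuming the standard PEFT file layout (an adapter_config.json next to the adapter weights):

```python
import json

# Standard PEFT layout: the adapter directory holds adapter_config.json
# alongside the adapter weight file.
with open("saves/qwen2_5vl-7b/lora/sft/adapter_config.json") as f:
    adapter_cfg = json.load(f)

# base_model_name_or_path should match model_name_or_path in the YAML above.
print(adapter_cfg.get("base_model_name_or_path"))
# r / lora_alpha / target_modules describe the low-rank update that will be merged.
print(adapter_cfg.get("r"), adapter_cfg.get("lora_alpha"), adapter_cfg.get("target_modules"))
```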
2. The [export] section
- export_dir: Directory where the merged full model is written, e.g. output/qwen2_5vl_lora_sft.
- export_size: Maximum size, in GB, of each saved shard. A value of 5 caps every file at 5 GB; the 7B model in this example therefore ends up as 4 shards (the log in Section IV reports "maximum size per checkpoint (5GB) ... split in 4 checkpoint shards").
  Purpose: avoid a single oversized file and ease storage and transfer, which matters most for large models (7B/13B class).
- export_device: Device used for the merge, either cpu or auto.
  - cpu: merge in CPU memory (slower but safer; avoids running out of GPU memory).
  - auto: use the GPU (when it has enough memory) to speed up the merge.
- export_legacy_format: false saves in the modern format (safetensors, e.g. model-00001-of-00004.safetensors), which current toolchains expect; true falls back to the legacy PyTorch .bin format (generally no longer recommended).
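For intuition only, the merge described by this config corresponds roughly to the following transformers + peft calls. This is a hedged sketch of the idea, not LLaMA-Factory's actual implementation (the real export also copies the tokenizer, processor, chat template, and an Ollama Modelfile, as the log in Section IV shows):

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration
from peft import PeftModel

# model_name_or_path: load the unquantized base model (kept in host memory,
# which matches export_device: cpu).
base = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype=torch.bfloat16, trust_remote_code=True
)
# adapter_name_or_path: attach the trained LoRA adapter.
model = PeftModel.from_pretrained(base, "saves/qwen2_5vl-7b/lora/sft")

# Fold the low-rank update into the base weights (the actual "merge").
merged = model.merge_and_unload()

merged.save_pretrained(
    "output/qwen2_5vl_lora_sft",  # export_dir
    max_shard_size="5GB",         # export_size: 5 -> each shard at most 5 GB
    safe_serialization=True,      # export_legacy_format: false -> .safetensors
)
```

Passing device_map="auto" to from_pretrained would be the GPU-assisted variant corresponding to export_device: auto.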
II. Notes
- Do not use a quantized model
  As the note in the config stresses, the merge must start from the original, uncompressed weights (full-precision or bf16); otherwise the LoRA parameters cannot be fused correctly into the base model.
- The merge workflow, in brief (see the transformers + peft sketch at the end of Section I):
  1. Load the unquantized base model (e.g. Qwen2.5-VL-7B-Instruct).
  2. Load the trained LoRA adapter weights.
  3. Merge the LoRA parameters into the base model's weights, producing a complete "new" model.
  4. Save the merged model as multiple shard files according to the export settings.
- What sharding (export_size) is for
  A large model (7B parameters and beyond) is unwieldy as a single file; splitting it into shards makes it easier to store, load, and transfer. In this run the merged model is written as 4 shards capped at 5 GB each, named model-00001-of-00004.safetensors and so on; a small script for inspecting the shards follows this list.
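To see exactly how the export was sharded, you can read the safetensors index file that the export writes into export_dir. A small sketch, assuming the export_dir from the config above and run after the export has finished:

```python
import json
import os

export_dir = "output/qwen2_5vl_lora_sft"

# The index maps every parameter name to the shard file that stores it.
with open(os.path.join(export_dir, "model.safetensors.index.json")) as f:
    index = json.load(f)

shards = sorted(set(index["weight_map"].values()))
print(f"{len(shards)} shard(s)")  # expected: 4 for this 7B bf16 model with 5 GB shards
for name in shards:
    size_gb = os.path.getsize(os.path.join(export_dir, name)) / 1024**3
    print(f"{name}: {size_gb:.2f} GB")
```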
III. Why Merge the LoRA Adapter
- What LoRA training does: it updates only a small number of parameters (the low-rank matrices) while the base model's weights stay frozen; the merge formula is shown below this list.
- Why merge:
  - To fuse the LoRA adapter permanently into the base model, producing a complete, self-contained set of model weights (here, the safetensors shards in output/qwen2_5vl_lora_sft).
  - Once merged, the model can be deployed for inference directly, with no extra LoRA weight file to load.
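For reference, the merge itself is just a weight addition. Using the standard LoRA formulation (not values read from this particular adapter), with frozen base weight $W_0$, low-rank factors $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, rank $r$, and scaling factor $\alpha$:

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A
$$

Inference then uses $W_{\text{merged}}$ directly, which is why no separate adapter file is needed after the merge.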
IV. Merge Command
```bash
llamafactory-cli export examples/merge_lora/qwen2_5vl_lora_sft.yaml
```
```
[2025-06-14 10:40:03,136] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO 06-14 10:40:05 [__init__.py:239] Automatically detected platform cuda.
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,236 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,236 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,236 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,236 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,236 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,236 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,237 >> loading file chat_template.jinja
[INFO|tokenization_utils_base.py:2323] 2025-06-14 10:40:08,551 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|image_processing_base.py:378] 2025-06-14 10:40:08,552 >> loading configuration file /media/user/model/Qwen/Qwen2___5-VL-7B-Instruct/preprocessor_config.json
[INFO|image_processing_base.py:378] 2025-06-14 10:40:08,553 >> loading configuration file /media/user/model/Qwen/Qwen2___5-VL-7B-Instruct/preprocessor_config.json
[WARNING|logging.py:328] 2025-06-14 10:40:08,553 >> Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
[INFO|image_processing_base.py:433] 2025-06-14 10:40:08,553 >> Image processor Qwen2VLImageProcessor { "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "temporal_patch_size": 2 }
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,554 >> loading file vocab.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,554 >> loading file merges.txt
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,554 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,554 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,554 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,554 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2058] 2025-06-14 10:40:08,554 >> loading file chat_template.jinja
[INFO|tokenization_utils_base.py:2323] 2025-06-14 10:40:08,863 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|processing_utils.py:884] 2025-06-14 10:40:09,405 >> Processor Qwen2_5_VLProcessor:
- image_processor: Qwen2VLImageProcessor { "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "patch_size": 14, "processor_class": "Qwen2_5_VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "longest_edge": 12845056, "shortest_edge": 3136 }, "temporal_patch_size": 2 }
- tokenizer: Qwen2TokenizerFast(name_or_path='/media/user/model/Qwen/Qwen2___5-VL-7B-Instruct/', vocab_size=151643, model_max_length=131072, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False, added_tokens_decoder={
  151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
  151657: AddedToken("<tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151658: AddedToken("</tool_call>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151659: AddedToken("<|fim_prefix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151660: AddedToken("<|fim_middle|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151661: AddedToken("<|fim_suffix|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151662: AddedToken("<|fim_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151663: AddedToken("<|repo_name|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
  151664: AddedToken("<|file_sep|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=False),
})
{ "processor_class": "Qwen2_5_VLProcessor" }
[INFO|configuration_utils.py:691] 2025-06-14 10:40:09,440 >> loading configuration file /media/user/model/Qwen/Qwen2___5-VL-7B-Instruct/config.json
[INFO|configuration_utils.py:765] 2025-06-14 10:40:09,442 >> Model config Qwen2_5_VLConfig { "architectures": [ "Qwen2_5_VLForConditionalGeneration" ], "attention_dropout": 0.0, "bos_token_id": 151643, "eos_token_id": 151645, "hidden_act": "silu", "hidden_size": 3584, "image_token_id": 151655, "initializer_range": 0.02, "intermediate_size": 18944, "max_position_embeddings": 128000, "max_window_layers": 28, "model_type": "qwen2_5_vl", "num_attention_heads": 28, "num_hidden_layers": 28, "num_key_value_heads": 4, "rms_norm_eps": 1e-06, "rope_scaling": { "mrope_section": [ 16, 24, 24 ], "rope_type": "default", "type": "default" }, "rope_theta": 1000000.0, "sliding_window": 32768, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.51.3", "use_cache": true, "use_sliding_window": false, "video_token_id": 151656, "vision_config": { "depth": 32, "fullatt_block_indexes": [ 7, 15, 23, 31 ], "hidden_act": "silu", "hidden_size": 1280, "in_channels": 3, "in_chans": 3, "intermediate_size": 3420, "model_type": "qwen2_5_vl", "num_heads": 16, "out_hidden_size": 3584, "patch_size": 14, "spatial_merge_size": 2, "spatial_patch_size": 14, "temporal_patch_size": 2, "tokens_per_second": 2, "window_size": 112 }, "vision_end_token_id": 151653, "vision_start_token_id": 151652, "vision_token_id": 151654, "vocab_size": 152064 }
[INFO|2025-06-14 10:40:09] llamafactory.model.model_utils.kv_cache:143 >> KV cache is enabled for faster generation.
[INFO|modeling_utils.py:1121] 2025-06-14 10:40:09,455 >> loading weights file /media/user/model/Qwen/Qwen2___5-VL-7B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:2167] 2025-06-14 10:40:09,455 >> Instantiating Qwen2_5_VLForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1142] 2025-06-14 10:40:09,457 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 }
[INFO|modeling_utils.py:2167] 2025-06-14 10:40:09,457 >> Instantiating Qwen2_5_VisionTransformerPretrainedModel model under default dtype torch.bfloat16.
Loading checkpoint shards: 100%|██████████| 5/5 [00:02
>> All model checkpoint weights were used when initializing Qwen2_5_VLForConditionalGeneration.
[INFO|modeling_utils.py:4938] 2025-06-14 10:40:12,225 >> All the weights of Qwen2_5_VLForConditionalGeneration were initialized from the model checkpoint at /media/user/model/Qwen/Qwen2___5-VL-7B-Instruct/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2_5_VLForConditionalGeneration for predictions without further training.
[INFO|configuration_utils.py:1095] 2025-06-14 10:40:12,228 >> loading configuration file /media/user/model/Qwen/Qwen2___5-VL-7B-Instruct/generation_config.json
[INFO|configuration_utils.py:1142] 2025-06-14 10:40:12,228 >> Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "repetition_penalty": 1.05, "temperature": 0.1, "top_k": 1, "top_p": 0.001 }
[INFO|2025-06-14 10:40:12] llamafactory.model.model_utils.attention:143 >> Using torch SDPA for faster training and inference.
/home/user/anaconda3/envs/llamafactory/lib/python3.12/site-packages/awq/__init__.py:21: DeprecationWarning: I have left this message as the final dev message to help you transition.
Important Notice:
- AutoAWQ is officially deprecated and will no longer be maintained.
- The last tested configuration used Torch 2.6.0 and Transformers 4.51.3.
- If future versions of Transformers break AutoAWQ compatibility, please report the issue to the Transformers project.
Alternative:
- AutoAWQ has been adopted by the vLLM Project: https://github.com/vllm-project/llm-compressor
For further inquiries, feel free to reach out:
- X: https://x.com/casper_hansen_
- LinkedIn: https://www.linkedin.com/in/casper-hansen-804005170/
  warnings.warn(_FINAL_DEV_MESSAGE, category=DeprecationWarning, stacklevel=1)
INFO ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
[INFO|2025-06-14 10:40:12] llamafactory.model.adapter:143 >> Merged 1 adapter(s).
[INFO|2025-06-14 10:40:12] llamafactory.model.adapter:143 >> Loaded adapter(s): saves/qwen2_5vl-7b/lora/sft
[INFO|2025-06-14 10:40:12] llamafactory.model.loader:143 >> all params: 8,292,166,656
[INFO|2025-06-14 10:40:12] llamafactory.train.tuner:143 >> Convert model dtype to: torch.bfloat16.
[INFO|configuration_utils.py:419] 2025-06-14 10:40:12,856 >> Configuration saved in output/qwen2_5vl_lora_sft/config.json
[INFO|configuration_utils.py:911] 2025-06-14 10:40:12,857 >> Configuration saved in output/qwen2_5vl_lora_sft/generation_config.json
[INFO|modeling_utils.py:3580] 2025-06-14 10:40:33,311 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at output/qwen2_5vl_lora_sft/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2510] 2025-06-14 10:40:33,312 >> tokenizer config file saved in output/qwen2_5vl_lora_sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2519] 2025-06-14 10:40:33,312 >> Special tokens file saved in output/qwen2_5vl_lora_sft/special_tokens_map.json
[INFO|image_processing_base.py:260] 2025-06-14 10:40:33,430 >> Image processor saved in output/qwen2_5vl_lora_sft/preprocessor_config.json
[INFO|tokenization_utils_base.py:2510] 2025-06-14 10:40:33,451 >> tokenizer config file saved in output/qwen2_5vl_lora_sft/tokenizer_config.json
[INFO|tokenization_utils_base.py:2519] 2025-06-14 10:40:33,451 >> Special tokens file saved in output/qwen2_5vl_lora_sft/special_tokens_map.json
[INFO|processing_utils.py:648] 2025-06-14 10:40:34,023 >> chat template saved in output/qwen2_5vl_lora_sft/chat_template.json
[INFO|2025-06-14 10:40:34] llamafactory.train.tuner:143 >> Ollama modelfile saved in output/qwen2_5vl_lora_sft/Modelfile
```
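After the export finishes, the directory in export_dir is a regular Hugging Face model directory and can be loaded without any LoRA adapter attached. A hedged smoke-test sketch (text-only prompt; the class and processor names are the standard transformers ones for Qwen2.5-VL and assume a recent transformers release):

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

export_dir = "output/qwen2_5vl_lora_sft"

# Load the merged model and its processor straight from export_dir.
processor = AutoProcessor.from_pretrained(export_dir)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    export_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

# Text-only round trip: build a chat prompt, generate, and decode the new tokens.
messages = [{"role": "user", "content": [{"type": "text", "text": "Describe LoRA in one sentence."}]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0])
```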