Elasticsearch Data Import and Export Methods

ES Import and Export

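Snapshot and Restore

Keep the major version the same on both sides of a snapshot and restore, for example ES 8.x to ES 8.x or ES 7.x to ES 7.x.

Plugins must not be missing on the restoring side. For example, if the cluster that took the snapshot has analysis-ik installed, the cluster that restores it must have it too; otherwise nodes become unavailable after the restore.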

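Snapshot

In Elasticsearch (ES for short), snapshot and restore are the built-in features for backing up and restoring data. The steps below perform both from the command line, mainly by calling the REST API with curl.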

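Create a non-root directory

Elasticsearch must never run as root; it normally runs as the elasticsearch user. The repository directory has to be owned by that user, otherwise errors may be reported.

mkdir -p /data/es_backup
chown -R elasticsearch:elasticsearch /data/es_backup
chmod 750 /data/es_backup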

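Edit the configuration file

Set path.repo

For security reasons, Elasticsearch requires the configuration file to list explicitly which paths may be used as snapshot repositories. The setting is path.repo. If it is not set, or if it does not cover the path you actually use, registering the repository fails with an error such as:

"reason":"[my_backup] location [/root/elasticsearch/my_backup] doesn't match any of the locations specified by path.repo because this setting is empty"

Open your Elasticsearch configuration file, elasticsearch.yml, and add the directory created in the previous step:

path.repo: ["/data/es_backup"]

You can also list several paths:

path.repo: ["/data/es_backup", "/data/es_backup01"]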

Restart the service

After changing the configuration, Elasticsearch must be restarted for the change to take effect.

# On a systemd-managed host
systemctl restart elasticsearch

or

service elasticsearch restart

Register the repository

curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/data/es_backup",
    "compress": true
  }
}'

my_backup is the repository name.
location is the actual path on your server; ES needs write permission on it.

Create a snapshot

curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true"

wait_for_completion=true makes the command wait until the snapshot has finished before returning.
snapshot_1 is the snapshot name; use the same name later when restoring.

To back up only specific indices:

curl -X PUT "localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "index1,index2"
}'

View snapshot status

curl -X GET "localhost:9200/_snapshot/my_backup/snapshot_1/_status"

The snapshot status API reports whether the snapshot has completed, how far it has progressed, and the state of each shard.

Example response:

{
  "snapshots" : [
    {
      "snapshot" : "snapshot_1",
      "repository" : "my_backup",
      "state" : "SUCCESS",
      "shards_stats" : {
        "initializing" : 0,
        "started" : 0,
        "finalizing" : 0,
        "done" : 10,
        "failed" : 0,
        "total" : 10
      },
      "stats" : {
        "incremental" : {
          "file_count" : 20,
          "size_in_bytes" : 12345678
        },
        "processed" : {
          "file_count" : 20,
          "size_in_bytes" : 12345678
        },
        "start_time_in_millis" : 1710000000000,
        "time_in_millis" : 12345
      }
    }
  ]
}

A state of SUCCESS means the snapshot has completed.
In shards_stats, done and total are the number of completed shards and the total number of shards.
While a snapshot is still running, started and finalizing are non-zero.
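
When a snapshot is started without waiting for completion, a script often needs only the state value from this response. The following is a minimal sketch rather than part of the original procedure: snapshot_state is a hypothetical helper name, and the sed pattern assumes a response shaped like the example above. Where jq is available, a JSON-aware parser would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: read a snapshot status response on stdin and
# print the value of the first "state" field found.
snapshot_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Intended use (needs a running cluster, so it is left as a comment):
#   curl -s "localhost:9200/_snapshot/my_backup/snapshot_1/_status" | snapshot_state
```

A polling loop could call this helper every few seconds and stop once it prints SUCCESS or a failure state.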

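View the status of all snapshots

curl -X GET "localhost:9200/_snapshot/<repository_name>/_all?pretty"

This shows the overall state of every snapshot in the repository, but no detailed progress.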

Snapshot progress percentage

There is no direct percentage field, but progress can be computed from done and total. For example:

done: 5, total: 10 → 50% progress
done: 10, total: 10 → 100% progress
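
The arithmetic can be wrapped in a small shell function. This is only a sketch: snapshot_progress is a hypothetical name, the two arguments are the done and total values read from shards_stats, and the result is rounded down because shell arithmetic is integer-only.

```shell
#!/bin/sh
# Hypothetical helper: print snapshot progress as a whole-number percentage.
#   $1 = shards_stats.done, $2 = shards_stats.total
snapshot_progress() {
  if [ "$2" -eq 0 ]; then
    # Avoid dividing by zero when no shards are reported yet.
    echo "0%"
    return
  fi
  echo "$(( $1 * 100 / $2 ))%"
}

snapshot_progress 5 10    # prints 50%
snapshot_progress 10 10   # prints 100%
```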

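Other related commands

List the snapshots in a repository together with their state (useful for checking on a snapshot that was started with wait_for_completion=false):
```
curl -X GET "localhost:9200/_cat/snapshots/my_backup?v"
```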

Summary

Use /_snapshot/<repository>/<snapshot>/_status to see detailed progress.
Watch done and total in shards_stats.
Progress = done / total × 100%.

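Restore

Create a non-root directory

Run these preparation steps on the cluster that restores the snapshot. The snapshot files written on the source side must be available in this directory. Elasticsearch must never run as root; it normally runs as the elasticsearch user, and the directory has to be owned by that user, otherwise errors may be reported.

mkdir -p /data/es_backup
chown -R elasticsearch:elasticsearch /data/es_backup
chmod 750 /data/es_backup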

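Edit the configuration file

Set path.repo

For security reasons, Elasticsearch requires the configuration file to list explicitly which paths may be used as snapshot repositories. The setting is path.repo. If it is not set, or if it does not cover the path you actually use, registering the repository fails with an error such as:

"reason":"[my_backup] location [/root/elasticsearch/my_backup] doesn't match any of the locations specified by path.repo because this setting is empty"

Open your Elasticsearch configuration file, elasticsearch.yml, and add the repository directory:

path.repo: ["/data/es_backup"]

You can also list several paths:

path.repo: ["/data/es_backup", "/data/es_backup01"]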

Restart the service

After changing the configuration, Elasticsearch must be restarted for the change to take effect.

# On a systemd-managed host
systemctl restart elasticsearch

or

service elasticsearch restart

Register the repository

curl -X PUT "localhost:9200/_snapshot/my_backup" -H 'Content-Type: application/json' -d'
{
  "type": "fs",
  "settings": {
    "location": "/data/es_backup",
    "compress": true
  }
}'

my_backup is the repository name.
location is the actual path on your server; ES needs write permission on it.

Restore the snapshot

curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore"

my_backup is the repository name.
snapshot_1 is the snapshot name chosen when the snapshot was created; it must match that name exactly.
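
The restore call above restores everything in the snapshot. The restore API also accepts a request body that limits the restore to some indices and renames them on the way in, which avoids clashing with an open index of the same name on the target. The sketch below only builds that body; restore_body and the restored_ prefix are illustrative names, not something the original text prescribes.

```shell
#!/bin/sh
# Hypothetical helper: print a restore request body.
#   $1 = comma-separated index names to restore
#   $2 = prefix to put in front of each restored index name
# The literal $1 in the output is the regex capture-group reference
# expected by rename_replacement, not a shell variable.
restore_body() {
  printf '{"indices":"%s","rename_pattern":"(.+)","rename_replacement":"%s$1"}\n' "$1" "$2"
}

restore_body index1,index2 restored_

# Intended use (needs a running cluster, so it is left as a comment):
#   curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore" \
#     -H 'Content-Type: application/json' \
#     -d "$(restore_body index1,index2 restored_)"
```

With these arguments, index1 from the snapshot would be restored as restored_index1.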

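Import and export with the elasticdump command

Elasticdump is a tool for migrating and backing up Elasticsearch data. It imports and exports index data and handles the individual parts of an index, such as settings, mappings and documents.

Order: import the settings first, then the mapping, and the data last.

Version compatibility: make sure the source and target Elasticsearch versions are compatible.

Performance:

A larger --limit can improve throughput but uses more memory.

When network latency is high, a smaller batch size may be more stable.

Security: do not expose passwords directly on the command line; consider environment variables or a configuration file.

Large indices: for very large indices, consider splitting the work or using Elasticsearch snapshots instead.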

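Install Elasticdump

Prerequisites

Node.js installed (v10+ recommended)
npm or yarn as the package manager

Installation

Global install:

npm install elasticdump -g

Local install:

npm install elasticdump

Install with yarn:

yarn global add elasticdump

Verify the installation

elasticdump --help

If the help text is printed, the installation succeeded.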

Basic command structure

Export command format:

elasticdump \
  --input=http://source.es.com:9200/my_index \
  --output=/path/to/my_index_mapping.json \
  --type=mapping

Import command format:

elasticdump \
  --input=/path/to/my_index_mapping.json \
  --output=http://target.es.com:9200/my_index \
  --type=mapping

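Common parameters

| Parameter | Description |
|---|---|
| --input | Input source (ES URL or file path) |
| --output | Output target (ES URL or file path) |
| --type | What to move: data, mapping, settings, analyzer, alias, template |
| --limit | Number of documents moved per batch (default: 100) |
| --size | Total number of documents to retrieve (default: -1, no limit) |
| --transform | JavaScript applied to each document before it is written; the document is available as doc |
| --overwrite | Overwrite the output file if it already exists |
| --delete | Delete documents one by one from the input as they are moved; the source index itself is not deleted |
| --headers | Extra HTTP headers for Elasticsearch requests (for example an authentication header) |
| --searchBody | Custom query body (JSON) |
| --fileSize | Split the output into files of this size (for example 10mb) |
| --retryAttempts | Number of retries after a connection failure (default: 0) |
| --retryDelay | Delay between retries in milliseconds (default: 5000) |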

Export examples

Export index settings

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_settings.json \
  --type=settings

Export index mapping

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_mapping.json \
  --type=mapping

Export index data

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_data.json \
  --type=data \
  --limit=5000

Export documents matching a query

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=query_results.json \
  --searchBody='{"query":{"range":{"timestamp":{"gte":"now-1d/d"}}}}'

Export to multiple files

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_data_part1.json \
  --fileSize=10mb \
  --type=data

Import examples

Import index settings

elasticdump \
  --input=my_index_settings.json \
  --output=http://new_host:9200/my_index \
  --type=settings

Import index mapping

elasticdump \
  --input=my_index_mapping.json \
  --output=http://new_host:9200/my_index \
  --type=mapping

Import index data

elasticdump \
  --input=my_index_data.json \
  --output=http://new_host:9200/my_index \
  --type=data \
  --limit=5000
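
Because the order matters (settings, then mapping, then data), it can help to generate the three import commands from one loop. The sketch below is a dry run that only prints the commands. import_plan is a hypothetical name, and the file naming <index>_<type>.json is an assumption taken from the export examples earlier in this section.

```shell
#!/bin/sh
# Hypothetical helper: print the elasticdump import commands for one
# index in the recommended order. Nothing is executed.
#   $1 = target ES URL, $2 = directory holding the JSON files, $3 = index name
import_plan() {
  es=$1
  dir=$2
  index=$3
  for type in settings mapping data; do
    echo "elasticdump --input=${dir}/${index}_${type}.json --output=${es}/${index} --type=${type}"
  done
}

import_plan http://new_host:9200 /path/to my_index
```

Once the printed commands look right, the output can be piped to sh to run them one after another.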

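Delete the existing index before importing

The --delete option of elasticdump does not drop the target index: it deletes documents from the input as they are moved. To start from an empty target, delete the index through the REST API first, then import:

curl -X DELETE "http://new_host:9200/my_index"
elasticdump \
  --input=my_index_data.json \
  --output=http://new_host:9200/my_index \
  --type=data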

Advanced usage

Authentication

elasticdump \
  --input=http://user:pass@localhost:9200/my_index \
  --output=my_index_data.json \
  --type=data

Or use the headers parameter:

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=my_index_data.json \
  --headers='{"Authorization": "Basic dXNlcjpwYXNz"}'
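
The Basic token in that header is just the Base64 encoding of user:pass, so it can be built from environment variables instead of being typed into a script. This is a sketch under stated assumptions: basic_auth_header, ES_USER and ES_PASS are hypothetical names, and the base64 utility is assumed to be installed. It keeps the password out of scripts and shell history, although the resulting header is still visible in the process list while elasticdump runs.

```shell
#!/bin/sh
# Hypothetical helper: build the JSON value for --headers from the
# ES_USER and ES_PASS environment variables (both names are assumptions).
basic_auth_header() {
  # tr strips the newline (and any line wrapping) that base64 may add.
  token=$(printf '%s' "${ES_USER}:${ES_PASS}" | base64 | tr -d '\n')
  printf '{"Authorization": "Basic %s"}\n' "$token"
}

# Intended use (left as a comment because it needs elasticdump and a cluster):
#   export ES_USER=... ES_PASS=...
#   elasticdump --input=http://localhost:9200/my_index \
#     --output=my_index_data.json --type=data \
#     --headers="$(basic_auth_header)"
```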

Data transformation

elasticdump \
  --input=http://localhost:9200/my_index \
  --output=transformed_data.json \
  --transform='doc._source.new_field = "value"'

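Parallel import/export

# Export to several files first
elasticdump --input=http://localhost:9200/my_index --output=my_index_data1.json --fileSize=10mb
elasticdump --input=http://localhost:9200/my_index --output=my_index_data2.json --fileSize=10mb
# Then import them in parallel
elasticdump --input=my_index_data1.json --output=http://new_host:9200/my_index &
elasticdump --input=my_index_data2.json --output=http://new_host:9200/my_index &
wait

As written, both export commands read the whole index, so each file holds the same documents. To actually divide the work, give each export its own --searchBody so that the files cover different subsets.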

Import script

#!/bin/bash
ES=http://127.0.0.1:9220
ED=/data/es_data
echo "=============================================================="
index="index_name"
echo "=============================================================="
# Types: settings, analyzer, data, mapping, alias, template
echo "elasticdump --output=$ES/$index --input=$ED/$index"
elasticdump --input=${ED}/${index}_setting.json --output=${ES}/${index} --type=settings --limit=10000
elasticdump --input=${ED}/${index}_analyzer.json --output=${ES}/${index} --type=analyzer --limit=10000
elasticdump --input=${ED}/${index}_alias.json --output=${ES}/${index} --type=alias --limit=10000
elasticdump --input=${ED}/${index}_template.json --output=${ES}/${index} --type=template --limit=10000
elasticdump --input=${ED}/${index}_mapping.json --output=${ES}/${index} --type=mapping --limit=10000
elasticdump --input=${ED}/${index}_data.json --output=${ES}/${index} --type=data --limit=10000
echo "success"

ES: address and port of the target Elasticsearch cluster.

ED: local directory that holds the exported JSON files.
index: name of the index to import.

Export script

#!/bin/bash
ES=http://127.0.0.1:9220
ED=/data/es_data
datename=$(date +%Y-%m-%d)
index=index_qa_yingshi
echo "elasticdump --input=$ES/$index --output=$ED/$index.json"
elasticdump --input=$ES/$index --output=${ED}/${index}_setting.json --type=settings --limit=10000
elasticdump --input=$ES/$index --output=${ED}/${index}_analyzer.json --type=analyzer --limit=10000
elasticdump --input=$ES/$index --output=${ED}/${index}_alias.json --type=alias --limit=10000
elasticdump --input=$ES/$index --output=${ED}/${index}_template.json --type=template --limit=10000
elasticdump --input=$ES/$index --output=${ED}/${index}_mapping.json --type=mapping --limit=10000
elasticdump --input=$ES/$index --output=${ED}/${index}_data.json --type=data --limit=10000
cd "$ED"
#tar -zcvf $index.tar.gz $index.json
#find $ED/* -type f -mtime +10 -exec rm {} \;
echo "success"
