[faiss] A C++ Library for Efficient Similarity Search and Clustering | Source Walkthrough, Build, and Installation


●Faiss

■faiss::Index

■faiss::read_index

■faiss::write_index

■faiss::IndexFlatIP

■faiss::Index::idx_t

■Complete Code Example

■Summary

●Faiss Source Walkthrough

●Building and Installing Faiss

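Faiss (Facebook AI Similarity Search) is a C++ library for efficient similarity search and clustering of dense vectors. It finds nearest neighbors quickly in large vector collections, using either exact search or approximate nearest neighbor (ANN) search.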

●Faiss

■faiss::Index

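Core base class: faiss::Index is the central abstraction in Faiss and represents a vector index. Every index type derives from it, and it provides the common interface for adding vectors, searching for nearest neighbors, and saving or loading an index.

Commonly used methods:

/**
 * Add n vectors of dimension d to the index.
 *
 * Vectors are implicitly assigned labels ntotal .. ntotal + n - 1.
 * This function slices the input vectors into chunks smaller than
 * blocksize_add and calls add_core.
 *
 * @param n  number of vectors
 * @param x  input matrix, size n * d
 */
virtual void add(idx_t n, const float* x) = 0;

/**
 * Query n vectors of dimension d against the index.
 *
 * Returns the k nearest neighbors of each query vector.
 * If there are not enough results for a query, the result arrays
 * are padded with -1.
 *
 * @param n          number of query vectors
 * @param x          query vectors, size n * d
 * @param k          number of nearest neighbors to retrieve
 * @param distances  output distances between each query and its neighbors, size n * k
 * @param labels     output labels of the nearest neighbors, size n * k
 * @param params     optional search parameters (nullptr by default)
 */
virtual void search(
        idx_t n,
        const float* x,
        idx_t k,
        float* distances,
        idx_t* labels,
        const SearchParameters* params = nullptr) const = 0;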
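The contract described in the two comments above (labels assigned sequentially on add, result slots padded with -1 on search) can be sketched without linking Faiss. ToyFlatIndex below is a hypothetical class written for this article, not a Faiss type; it scores by inner product, and padding the unused distance slots with negative infinity is this sketch's own choice rather than something the header states.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using idx_t = std::int64_t; // local alias mirroring Faiss's 64-bit label type

// Hypothetical toy index: stores raw vectors and scans all of them per query.
struct ToyFlatIndex {
    int d;                   // vector dimension
    idx_t ntotal = 0;        // number of stored vectors
    std::vector<float> data; // row-major storage, ntotal * d floats

    explicit ToyFlatIndex(int dim) : d(dim) {}

    // Append n vectors; they implicitly receive labels ntotal .. ntotal + n - 1.
    void add(idx_t n, const float* x) {
        data.insert(data.end(), x, x + n * d);
        ntotal += n;
    }

    // For each of the n queries, write the k best inner-product matches.
    // Slots that cannot be filled get label -1.
    void search(idx_t n, const float* x, idx_t k,
                float* distances, idx_t* labels) const {
        using Scored = std::pair<float, idx_t>;
        for (idx_t q = 0; q < n; ++q) {
            std::vector<Scored> scored;
            for (idx_t i = 0; i < ntotal; ++i) {
                float ip = 0.0f;
                for (int j = 0; j < d; ++j) {
                    ip += x[q * d + j] * data[i * d + j];
                }
                scored.emplace_back(ip, i);
            }
            // Highest inner product first; ties broken by the smaller label.
            std::sort(scored.begin(), scored.end(),
                      [](const Scored& a, const Scored& b) {
                          return a.first != b.first ? a.first > b.first
                                                    : a.second < b.second;
                      });
            for (idx_t r = 0; r < k; ++r) {
                bool have = r < static_cast<idx_t>(scored.size());
                distances[q * k + r] =
                        have ? scored[r].first
                             : -std::numeric_limits<float>::infinity();
                labels[q * k + r] = have ? scored[r].second : -1;
            }
        }
    }
};
```

Calling add twice shows the labels continuing from the previous ntotal, and asking for more neighbors than there are stored vectors shows the -1 padding at the end of the labels array.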

■faiss::read_index

Loads a previously saved Faiss index from a binary file on disk. The saved file can be used across platforms (Linux/Windows/macOS).

Function prototypes:

Index* read_index(const char* fname, int io_flags = 0);
Index* read_index(FILE* f, int io_flags = 0);
Index* read_index(IOReader* reader, int io_flags = 0);

fname: path to the index file;
io_flags: optional flags, for example IO_FLAG_MMAP (load through memory mapping);

The returned object is heap-allocated and should be released with delete.

Usage example:

faiss::Index* index = faiss::read_index("my_index.index");

■faiss::write_index

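Saves an index object to a file on disk. The saved file can be used across platforms (Linux/Windows/macOS).

Function prototype:

void faiss::write_index(const Index* index, const char* fname, int io_flags = 0);

index: the index object to save;
fname: path of the output file;
io_flags: optional flags (0 by default);

Usage example:

faiss::write_index(index, "my_index.index");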

■faiss::IndexFlatIP

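An exact-search index that uses the inner product as its similarity measure. It is a good fit when the vectors are already L2-normalized, because the inner product of unit vectors equals their cosine similarity.

Brute-force search: no compression or quantization; every query is compared against every stored vector.

It gives the highest accuracy and the slowest search, since the cost of one query grows linearly with the number of stored vectors N (O(N) distance computations).

Usage example:

int d = 64;                   // vector dimension
faiss::IndexFlatIP index(d);  // inner-product index over 64-dimensional vectors

In short: there is no approximation, so results are exact. Because each search is O(N), it suits small datasets and cases that require 100% recall.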
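The statement that inner product and cosine similarity coincide for normalized vectors can be checked numerically. The following is a minimal sketch using only the standard library; the helper names (inner_product, l2_normalize, cosine_similarity) are illustrative and are not Faiss functions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Plain inner product of two equally sized vectors.
float inner_product(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        s += a[i] * b[i];
    }
    return s;
}

// Scale v in place to unit L2 norm; a zero vector is left unchanged.
void l2_normalize(std::vector<float>& v) {
    float norm = std::sqrt(inner_product(v, v));
    if (norm > 0.0f) {
        for (float& x : v) {
            x /= norm;
        }
    }
}

// Cosine similarity computed directly from its definition.
float cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    return inner_product(a, b) /
            (std::sqrt(inner_product(a, a)) * std::sqrt(inner_product(b, b)));
}
```

Normalizing both the stored vectors and the query before they reach an inner-product index therefore ranks results by cosine similarity. Without normalization, vectors with larger norms are favored.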

■faiss::Index::idx_t

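This is the integer type Faiss uses for vector IDs and counts. It is a 64-bit integer (int64_t), so very large indexes can be addressed.

It appears in the parameters of add(), search(), and similar functions. Recent Faiss releases declare it at namespace scope as faiss::idx_t, and the headers quoted later in this article use the unqualified idx_t inside namespace faiss. Older releases exposed it as faiss::Index::idx_t.

Usage example:

faiss::idx_t n = 1000;  // faiss::Index::idx_t in older releases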
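One practical reason for a 64-bit type is that the output buffers of search() hold n * k entries, and that product can exceed the 32-bit range well before the index itself is large. The sketch below uses a local alias instead of the Faiss header so that it compiles on its own; result_buffer_size and fits_in_int32 are hypothetical helpers.

```cpp
#include <cstdint>
#include <limits>

using idx_t = std::int64_t; // local stand-in for Faiss's idx_t

static_assert(sizeof(idx_t) == 8, "idx_t is expected to be 64 bits wide");

// Number of elements the distances and labels buffers of search() must hold.
inline idx_t result_buffer_size(idx_t n, idx_t k) {
    return n * k;
}

// True when a count still fits in a signed 32-bit integer.
inline bool fits_in_int32(idx_t count) {
    return count <= std::numeric_limits<std::int32_t>::max();
}
```

For example, 50 million queries with k = 100 need 5 billion result slots, which does not fit in a 32-bit integer.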

■Complete Code Example

Contents of test.cpp:

#include <faiss/IndexFlat.h>
#include <faiss/index_io.h>

#include <cstdlib>   // drand48 (POSIX)
#include <iostream>
#include <vector>

int main() {
    // Vector dimension
    int d = 64;

    // Create an exact inner-product index
    faiss::IndexFlatIP index(d);

    // Prepare the database vectors (1000 of them); a flat index needs no training
    size_t nb = 1000;
    std::vector<float> xb(d * nb);
    for (size_t i = 0; i < xb.size(); i++) {
        xb[i] = drand48(); // random initialization
    }

    // Add the vectors to the index so that later search calls can scan them.
    // nb: number of vectors to add (1000 here).
    // xb.data(): pointer to the start of the row-major nb * d float matrix.
    index.add(nb, xb.data());

    // Save the index to a file
    faiss::write_index(&index, "index.faiss");

    // Load it back; the caller owns the returned pointer
    faiss::Index* loaded_index = faiss::read_index("index.faiss");

    // Build one random query vector of d = 64 floats
    std::vector<float> xq(d);
    for (int i = 0; i < d; i++) {
        xq[i] = drand48();
    }

    // Search for the top-10 nearest neighbors
    int k = 10;
    std::vector<float> distances(k);
    std::vector<faiss::idx_t> labels(k);

    // Find the k stored vectors most similar to xq.
    // Scores are written to distances, IDs to labels.
    loaded_index->search(1, xq.data(), k, distances.data(), labels.data());

    // Print the results (for IndexFlatIP a larger inner product means more similar)
    std::cout << "Top-" << k << " nearest neighbor IDs and inner products:" << std::endl;
    for (int i = 0; i < k; i++) {
        std::cout << "ID: " << labels[i] << ", inner product: " << distances[i] << std::endl;
    }

    // Release the loaded index
    delete loaded_index;
    return 0;
}

Compile and run:

g++ test.cpp -o test -lfaiss -lopenblas -fopenmp

./test


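■Summary

Class/function | Purpose
faiss::Index | Base class of all indexes; provides the common interface
faiss::read_index | Loads an index from a file
faiss::write_index | Saves an index to a file
faiss::IndexFlatIP | Exact-search index using the inner product (suited to normalized vectors)
faiss::Index::idx_t | Integer type for vector IDs and counts, normally int64_t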

●Faiss Source Walkthrough

faiss/index_io.h

/*
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

// I/O code for indexes

#ifndef FAISS_INDEX_IO_H
#define FAISS_INDEX_IO_H

#include <cstdio>
#include <string>
#include <typeinfo>
#include <vector>

/** I/O functions can read/write to a filename, a file handle or to an
 * object that abstracts the medium.
 *
 * The read functions return objects that should be deallocated with
 * delete. All references within these objects are owned by the
 * object.
 */

namespace faiss {

// Forward declarations of commonly used structures
struct Index;            // index base class
struct IndexBinary;      // binary index
struct VectorTransform;  // vector transform
struct ProductQuantizer; // product quantizer
struct IOReader;         // input stream abstraction
struct IOWriter;         // output stream abstraction
struct InvertedLists;    // inverted lists

/// I/O flag: skip the storage data of graph-based indexes
const int IO_FLAG_SKIP_STORAGE = 1;

// Write an Index object to a file or stream
void write_index(const Index* idx, const char* fname, int io_flags = 0);
void write_index(const Index* idx, FILE* f, int io_flags = 0);
void write_index(const Index* idx, IOWriter* writer, int io_flags = 0);

// Write a binary index to a file or stream
void write_index_binary(const IndexBinary* idx, const char* fname);
void write_index_binary(const IndexBinary* idx, FILE* f);
void write_index_binary(const IndexBinary* idx, IOWriter* writer);

// Flags accepted by read_index; only some index types support them
const int IO_FLAG_READ_ONLY = 2;              // read-only mode
const int IO_FLAG_ONDISK_SAME_DIR = 4;        // file paths are relative to the index file's directory
const int IO_FLAG_SKIP_IVF_DATA = 8;          // do not load IVF data into memory, keep only the list sizes
const int IO_FLAG_SKIP_PRECOMPUTE_TABLE = 16; // do not precompute the lookup table
const int IO_FLAG_PQ_SKIP_SDC_TABLE = 32;     // do not compute the SDC table; disables the PQ distance computations that rely on it
const int IO_FLAG_MMAP = IO_FLAG_SKIP_IVF_DATA | 0x646f0000; // read the data through memory mapping

// Read an Index object from a file or stream
Index* read_index(const char* fname, int io_flags = 0);
Index* read_index(FILE* f, int io_flags = 0);
Index* read_index(IOReader* reader, int io_flags = 0);

// Read a binary index from a file or stream
IndexBinary* read_index_binary(const char* fname, int io_flags = 0);
IndexBinary* read_index_binary(FILE* f, int io_flags = 0);
IndexBinary* read_index_binary(IOReader* reader, int io_flags = 0);

// Write a VectorTransform object
void write_VectorTransform(const VectorTransform* vt, const char* fname);
void write_VectorTransform(const VectorTransform* vt, IOWriter* f);

// Read a VectorTransform object from a file or stream
VectorTransform* read_VectorTransform(const char* fname);
VectorTransform* read_VectorTransform(IOReader* f);

// Write a ProductQuantizer object
void write_ProductQuantizer(const ProductQuantizer* pq, const char* fname);
void write_ProductQuantizer(const ProductQuantizer* pq, IOWriter* f);

// Read a ProductQuantizer object from a file or stream
ProductQuantizer* read_ProductQuantizer(const char* fname);
ProductQuantizer* read_ProductQuantizer(IOReader* reader);

// Write inverted lists
void write_InvertedLists(const InvertedLists* ils, IOWriter* f);

// Read inverted lists from an input stream
InvertedLists* read_InvertedLists(IOReader* reader, int io_flags = 0);

} // namespace faiss

#endif
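The I/O flags in this header are plain integer bit masks, so they can be combined with | and tested with &. The sketch below copies a few constant values from the listing so that it compiles without Faiss; has_flag is a hypothetical helper, not part of the library.

```cpp
// Values copied from the index_io.h listing; kept local so the sketch is self-contained.
constexpr int IO_FLAG_READ_ONLY = 2;
constexpr int IO_FLAG_ONDISK_SAME_DIR = 4;
constexpr int IO_FLAG_SKIP_IVF_DATA = 8;
constexpr int IO_FLAG_MMAP = IO_FLAG_SKIP_IVF_DATA | 0x646f0000;

// Hypothetical helper: true when every bit of `flag` is set in `io_flags`.
constexpr bool has_flag(int io_flags, int flag) {
    return (io_flags & flag) == flag;
}

// IO_FLAG_MMAP already contains the IO_FLAG_SKIP_IVF_DATA bit.
static_assert(has_flag(IO_FLAG_MMAP, IO_FLAG_SKIP_IVF_DATA),
              "MMAP implies SKIP_IVF_DATA");
```

As the definition of IO_FLAG_MMAP shows, requesting memory mapping also sets the IO_FLAG_SKIP_IVF_DATA bit, and further flags such as IO_FLAG_READ_ONLY can be OR-ed in before the value is passed as io_flags.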

faiss/IndexFlat.h

/*
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

// -*- c++ -*-

#ifndef INDEX_FLAT_H
#define INDEX_FLAT_H

#include <vector>

#include <faiss/IndexFlatCodes.h>

namespace faiss {

/**
 * IndexFlat stores the full vectors and performs exhaustive search.
 * It derives from IndexFlatCodes.
 */
struct IndexFlat : IndexFlatCodes {
    /**
     * Constructor
     * @param d       dimensionality of the input vectors
     * @param metric  distance metric, L2 by default
     */
    explicit IndexFlat(
            idx_t d, ///< dimensionality of the input vectors
            MetricType metric = METRIC_L2);

    /**
     * Search for the k vectors closest to each query vector
     * @param n          number of query vectors
     * @param x          query vectors, size n * d
     * @param k          number of nearest neighbors to return
     * @param distances  output distances, size n * k
     * @param labels     output labels, size n * k
     * @param params     optional search parameters
     */
    void search(
            idx_t n,
            const float* x,
            idx_t k,
            float* distances,
            idx_t* labels,
            const SearchParameters* params = nullptr) const override;

    /**
     * Search for vectors within a given radius
     * @param n       number of query vectors
     * @param x       query vectors, size n * d
     * @param radius  search radius
     * @param result  output range-search result
     * @param params  optional search parameters
     */
    void range_search(
            idx_t n,
            const float* x,
            float radius,
            RangeSearchResult* result,
            const SearchParameters* params = nullptr) const override;

    /**
     * Reconstruct the vector stored at a given position
     * @param key     index of the vector to reconstruct
     * @param recons  output reconstructed vector
     */
    void reconstruct(idx_t key, float* recons) const override;

    /**
     * Compute distances to a subset of the stored vectors
     * @param x          query vectors, size n * d
     * @param labels     indices of the vectors to compare with each query, size n * k
     * @param distances  corresponding output distances, size n * k
     */
    void compute_distance_subset(
            idx_t n,
            const float* x,
            idx_t k,
            float* distances,
            const idx_t* labels) const;

    // Pointers to the stored float data
    float* get_xb() {
        return (float*)codes.data();
    }
    const float* get_xb() const {
        return (const float*)codes.data();
    }

    IndexFlat() {} // default constructor

    /**
     * Get a distance computer
     * @return a FlatCodesDistanceComputer instance
     */
    FlatCodesDistanceComputer* get_FlatCodesDistanceComputer() const override;

    /* Standalone codec interface (just memory copies in this case) */
    void sa_encode(idx_t n, const float* x, uint8_t* bytes) const override;
    void sa_decode(idx_t n, const uint8_t* bytes, float* x) const override;
};

/**
 * IndexFlatIP is the special case of IndexFlat that uses the inner product
 * as its metric
 */
struct IndexFlatIP : IndexFlat {
    explicit IndexFlatIP(idx_t d) : IndexFlat(d, METRIC_INNER_PRODUCT) {}
    IndexFlatIP() {}
};

/**
 * IndexFlatL2 is the special case of IndexFlat that uses the L2 metric.
 * It can also cache L2 norms to speed up computation.
 */
struct IndexFlatL2 : IndexFlat {
    std::vector<float> cached_l2norms; ///< cache of L2 norms

    /**
     * Constructor
     * @param d  dimensionality of the input vectors
     */
    explicit IndexFlatL2(idx_t d) : IndexFlat(d, METRIC_L2) {}
    IndexFlatL2() {}

    /**
     * Get a distance computer (optimized to use the L2 norm cache)
     * @return a FlatCodesDistanceComputer instance
     */
    FlatCodesDistanceComputer* get_FlatCodesDistanceComputer() const override;

    void sync_l2norms();  ///< compute and cache the L2 norms
    void clear_l2norms(); ///< clear the L2 norm cache
};

/**
 * IndexFlat1D is an optimized version of IndexFlatL2 for 1D vectors
 */
struct IndexFlat1D : IndexFlatL2 {
    bool continuous_update = true; ///< is the permutation updated continuously?

    std::vector<idx_t> perm; ///< sorted database indices

    explicit IndexFlat1D(bool continuous_update = true); ///< constructor

    /**
     * If continuous updates are disabled, call this between the last add
     * and the first search
     */
    void update_permutation();

    void add(idx_t n, const float* x) override; ///< add vectors

    void reset() override; ///< reset the index

    /**
     * Search for nearest neighbors (the distances returned are L1, not L2)
     * @param n          number of query vectors
     * @param x          query vectors, size n * d
     * @param k          number of nearest neighbors to return
     * @param distances  output distances, size n * k
     * @param labels     output labels, size n * k
     * @param params     optional search parameters
     */
    void search(
            idx_t n,
            const float* x,
            idx_t k,
            float* distances,
            idx_t* labels,
            const SearchParameters* params = nullptr) const override;
};

} // namespace faiss

#endif
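A cache of norms is useful because of the identity ||q - x||² = ||q||² + ||x||² - 2⟨q, x⟩: once ||x||² is stored for every database vector, each distance needs only one inner product. The sketch below illustrates that idea with a hypothetical ToyNormCache struct. It assumes squared norms are cached and is not a copy of how IndexFlatL2 is implemented.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy store: raw vectors plus a cache of their squared L2 norms.
struct ToyNormCache {
    std::size_t d;               // vector dimension
    std::vector<float> xb;       // row-major storage
    std::vector<float> sq_norms; // cached ||x_i||^2, filled by sync_norms()

    explicit ToyNormCache(std::size_t dim) : d(dim) {}

    // Append one vector of dimension d.
    void add(const std::vector<float>& v) {
        xb.insert(xb.end(), v.begin(), v.end());
    }

    // Recompute the squared norm of every stored vector.
    void sync_norms() {
        std::size_t n = xb.size() / d;
        sq_norms.assign(n, 0.0f);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = 0; j < d; ++j) {
                sq_norms[i] += xb[i * d + j] * xb[i * d + j];
            }
        }
    }

    // Squared L2 distance via ||q||^2 + ||x_i||^2 - 2 <q, x_i>.
    // Requires sync_norms() to have been called after the last add().
    float sq_distance(const std::vector<float>& q, std::size_t i) const {
        float q_norm = 0.0f;
        float ip = 0.0f;
        for (std::size_t j = 0; j < d; ++j) {
            q_norm += q[j] * q[j];
            ip += q[j] * xb[i * d + j];
        }
        return q_norm + sq_norms[i] - 2.0f * ip;
    }
};
```

The cache has to be refreshed whenever vectors are added, which matches the header's separate sync and clear calls for the norm cache.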

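●Building and Installing Faiss

Build and install

Install the dependencies

# Install dependencies
sudo apt-get update
sudo apt-get install libomp-dev

# Install OpenBLAS
sudo apt-get install libopenblas-dev

Download the faiss source, then build and install it

# Download the source
git clone https://github.com/facebookresearch/faiss.git
cd faiss
mkdir build && cd build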

cmake .. \
    -DFAISS_ENABLE_CXX=ON \
    -DFAISS_ENABLE_PYTHON=OFF \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DFAISS_ENABLE_GPU=OFF \
    -DOpenMP_CXX_FLAGS="-fopenmp" \
    -DOpenMP_CXX_LIB_NAMES="gomp" \
    -DOpenMP_gomp_LIBRARY=/usr/lib/x86_64-linux-gnu/libgomp.so.1 \
    -DBUILD_TESTING=OFF

make -j$(nproc) && sudo make install

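Error: Could NOT find OpenMP_CXX (missing: OpenMP_gomp_LIBRARY) (found version "4.5")

Fix: pass the libgomp path to CMake explicitly with -DOpenMP_gomp_LIBRARY=/usr/lib/x86_64-linux-gnu/libgomp.so.1

The default install prefix is /usr/local/. Headers are installed under /usr/local/include/faiss/ and the library under /usr/local/lib/. With -DBUILD_SHARED_LIBS=ON as in the command above, the library file is libfaiss.so; a static build produces libfaiss.a instead.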

Cross-compiling for Android

See the related post: [Android] Cross-compiling the faiss library | Troubleshooting (CSDN blog)

That concludes this article.
