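HNSW
This is a wrapper around hnswlib that loads a CAGRA index as an immutable HNSW index. The loaded HNSW index is only compatible within cuVS and can be searched using the wrapper functions.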
#include <cuvs/neighbors/hnsw.h>
Index search parameters
-
typedef struct cuvsHnswSearchParams *cuvsHnswSearchParams_t
-
cuvsError_t cuvsHnswSearchParamsCreate(cuvsHnswSearchParams_t *params)
Allocate HNSW search params and populate them with default values.
- Parameters:
params – [in] cuvsHnswSearchParams_t to allocate
- Returns:
cuvsError_t
-
cuvsError_t cuvsHnswSearchParamsDestroy(cuvsHnswSearchParams_t params)
De-allocate HNSW search params.
- Parameters:
params – [in] cuvsHnswSearchParams_t to de-allocate
- Returns:
cuvsError_t
-
struct cuvsHnswSearchParams
- #include <hnsw.h>
Index
-
typedef cuvsHnswIndex *cuvsHnswIndex_t
-
cuvsError_t cuvsHnswIndexCreate(cuvsHnswIndex_t *index)
Allocate an HNSW index.
- Parameters:
index – [in] cuvsHnswIndex_t to allocate
- Returns:
cuvsError_t
-
cuvsError_t cuvsHnswIndexDestroy(cuvsHnswIndex_t index)
De-allocate an HNSW index.
- Parameters:
index – [in] cuvsHnswIndex_t to de-allocate
-
struct cuvsHnswIndex
- #include <hnsw.h>
Struct to hold the address of cuvs::neighbors::Hnsw::index and its active trained dtype.
Index extend parameters
-
typedef struct cuvsHnswExtendParams *cuvsHnswExtendParams_t
-
cuvsError_t cuvsHnswExtendParamsCreate(cuvsHnswExtendParams_t *params)
Allocate HNSW extend params and populate them with default values.
- Parameters:
params – [in] cuvsHnswExtendParams_t to allocate
- Returns:
cuvsError_t
-
cuvsError_t cuvsHnswExtendParamsDestroy(cuvsHnswExtendParams_t params)
De-allocate HNSW extend params.
- Parameters:
params – [in] cuvsHnswExtendParams_t to de-allocate
- Returns:
cuvsError_t
Index extend
-
cuvsError_t cuvsHnswExtend(cuvsResources_t res, cuvsHnswExtendParams_t params, DLManagedTensor *additional_dataset, cuvsHnswIndex_t index)
Add new vectors to an HNSW index. NOTE: The HNSW index can only be extended when the hierarchy is CPU when converting from a CAGRA index.

    #include <cuvs/core/c_api.h>
    #include <cuvs/neighbors/cagra.h>
    #include <cuvs/neighbors/hnsw.h>

    // Create cuvsResources_t
    cuvsResources_t res;
    cuvsError_t res_create_status = cuvsResourcesCreate(&res);

    // create an index with `cuvsCagraBuild`

    // Convert the CAGRA index to an HNSW index
    // (per the note above, `hnsw_params` must select the CPU hierarchy
    // for the converted index to be extendable)
    cuvsHnswIndex_t hnsw_index;
    cuvsHnswIndexCreate(&hnsw_index);
    cuvsHnswIndexParams_t hnsw_params;
    cuvsHnswIndexParamsCreate(&hnsw_params);
    cuvsHnswFromCagra(res, hnsw_params, cagra_index, hnsw_index);

    // Extend the HNSW index with additional vectors
    // Assume a populated `DLManagedTensor` type here
    DLManagedTensor additional_dataset;
    cuvsHnswExtendParams_t extend_params;
    cuvsHnswExtendParamsCreate(&extend_params);
    cuvsHnswExtend(res, extend_params, &additional_dataset, hnsw_index);

    // de-allocate `hnsw_params`, `hnsw_index`, `extend_params` and `res`
    cuvsError_t hnsw_params_destroy_status = cuvsHnswIndexParamsDestroy(hnsw_params);
    cuvsError_t hnsw_index_destroy_status = cuvsHnswIndexDestroy(hnsw_index);
    cuvsError_t extend_params_destroy_status = cuvsHnswExtendParamsDestroy(extend_params);
    cuvsError_t res_destroy_status = cuvsResourcesDestroy(res);

- Parameters:
res – [in] cuvsResources_t opaque C handle
params – [in] cuvsHnswExtendParams_t used to extend the Hnsw index
additional_dataset – [in] DLManagedTensor* additional dataset to extend the index
index – [inout] cuvsHnswIndex_t to extend
- Returns:
cuvsError_t
Index loading
-
cuvsError_t cuvsHnswFromCagra(cuvsResources_t res, cuvsHnswIndexParams_t params, cuvsCagraIndex_t cagra_index, cuvsHnswIndex_t hnsw_index)
Convert a CAGRA index to an HNSW index. NOTE: The behavior depends on the hierarchy:
NONE: This method uses the filesystem to write the CAGRA index to /tmp/<random_number>.bin, reads it back as an hnswlib index, and then deletes the temporary file. The returned index is immutable and can only be searched by the hnswlib wrapper in cuVS, as the format is not compatible with the original hnswlib.
CPU: The returned index is mutable and can be extended with additional vectors. The serialized index is also compatible with the original hnswlib library.

    #include <cuvs/core/c_api.h>
    #include <cuvs/neighbors/cagra.h>
    #include <cuvs/neighbors/hnsw.h>

    // Create cuvsResources_t
    cuvsResources_t res;
    cuvsError_t res_create_status = cuvsResourcesCreate(&res);

    // create a CAGRA index with `cuvsCagraBuild`

    // Convert the CAGRA index to an HNSW index
    cuvsHnswIndex_t hnsw_index;
    cuvsHnswIndexCreate(&hnsw_index);
    cuvsHnswIndexParams_t hnsw_params;
    cuvsHnswIndexParamsCreate(&hnsw_params);
    cuvsHnswFromCagra(res, hnsw_params, cagra_index, hnsw_index);

    // de-allocate `hnsw_params`, `hnsw_index` and `res`
    cuvsError_t hnsw_params_destroy_status = cuvsHnswIndexParamsDestroy(hnsw_params);
    cuvsError_t hnsw_index_destroy_status = cuvsHnswIndexDestroy(hnsw_index);
    cuvsError_t res_destroy_status = cuvsResourcesDestroy(res);

- Parameters:
res – [in] cuvsResources_t opaque C handle
params – [in] cuvsHnswIndexParams_t used to load the Hnsw index
cagra_index – [in] cuvsCagraIndex_t to convert to an HNSW index
hnsw_index – [out] cuvsHnswIndex_t to return the HNSW index
- Returns:
cuvsError_t
-
cuvsError_t cuvsHnswFromCagraWithDataset(cuvsResources_t res, cuvsHnswIndexParams_t params, cuvsCagraIndex_t cagra_index, cuvsHnswIndex_t hnsw_index, DLManagedTensor *dataset_tensor)
Index search
-
cuvsError_t cuvsHnswSearch(cuvsResources_t res, cuvsHnswSearchParams_t params, cuvsHnswIndex_t index, DLManagedTensor *queries, DLManagedTensor *neighbors, DLManagedTensor *distances)

    #include <cuvs/core/c_api.h>
    #include <cuvs/neighbors/hnsw.h>

    // Create cuvsResources_t
    cuvsResources_t res;
    cuvsError_t res_create_status = cuvsResourcesCreate(&res);

    // Assume a populated `DLManagedTensor` type here
    DLManagedTensor dataset;
    DLManagedTensor queries;
    DLManagedTensor neighbors;
    DLManagedTensor distances;

    // Create default search params
    cuvsHnswSearchParams_t params;
    cuvsError_t params_create_status = cuvsHnswSearchParamsCreate(&params);

    // Search the `index` built using `cuvsHnswFromCagra`
    cuvsError_t search_status = cuvsHnswSearch(res, params, index, &queries, &neighbors, &distances);

    // de-allocate `params` and `res`
    cuvsError_t params_destroy_status = cuvsHnswSearchParamsDestroy(params);
    cuvsError_t res_destroy_status = cuvsResourcesDestroy(res);

Search an HNSW index with a DLManagedTensor whose underlying DLDeviceType is kDLCPU, kDLCUDAHost, or kDLCUDAManaged. Note that the HNSW index must have been built with the same data type as queries, i.e. index.dtype.code == queries.dl_tensor.dtype.code.
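The two preconditions above can be checked before calling cuvsHnswSearch. The following is a minimal, standalone sketch rather than cuVS code: the EX_DL_* constants are local stand-ins assumed to mirror the numeric values of DLPack's DLDeviceType enum (so the snippet compiles without DLPack), and ex_device_type_ok and ex_can_search are hypothetical helper names. With DLPack available, one would compare queries.dl_tensor.device.device_type against kDLCPU, kDLCUDAHost and kDLCUDAManaged directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for DLPack's DLDeviceType values (assumed to match
 * dlpack.h); real code should use kDLCPU, kDLCUDAHost, ... directly. */
enum {
  EX_DL_CPU          = 1,  /* kDLCPU */
  EX_DL_CUDA         = 2,  /* kDLCUDA: not one of the accepted device types */
  EX_DL_CUDA_HOST    = 3,  /* kDLCUDAHost */
  EX_DL_CUDA_MANAGED = 13  /* kDLCUDAManaged */
};

/* Hypothetical helper: true if the device type is one of the three that the
 * search accepts. */
bool ex_device_type_ok(int device_type)
{
  return device_type == EX_DL_CPU || device_type == EX_DL_CUDA_HOST ||
         device_type == EX_DL_CUDA_MANAGED;
}

/* Hypothetical helper combining both preconditions: an accepted device type,
 * and an index built with the same dtype code as the queries. */
bool ex_can_search(int queries_device_type, uint8_t index_dtype_code, uint8_t queries_dtype_code)
{
  return ex_device_type_ok(queries_device_type) && index_dtype_code == queries_dtype_code;
}
```

For instance, ex_can_search(EX_DL_CUDA, 2, 2) is false even though the dtype codes agree, because kDLCUDA is not among the accepted device types.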
Supported input types:
queries:
a. kDLDataType.code == kDLFloat and kDLDataType.bits = 32
b. kDLDataType.code == kDLInt and kDLDataType.bits = 8
c. kDLDataType.code == kDLUInt and kDLDataType.bits = 8
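The accepted queries types listed above amount to a small (code, bits) lookup. A minimal sketch, standalone and not part of cuVS: the EX_CODE_* constants are local stand-ins assumed to mirror DLPack's DLDataTypeCode values, and ex_queries_dtype_supported is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for DLPack's DLDataTypeCode values (assumed to match
 * dlpack.h); real code should use kDLInt, kDLUInt and kDLFloat directly. */
enum {
  EX_CODE_INT   = 0,  /* kDLInt */
  EX_CODE_UINT  = 1,  /* kDLUInt */
  EX_CODE_FLOAT = 2   /* kDLFloat */
};

/* Hypothetical helper: true if (code, bits) is one of the three queries
 * types listed above (float32, int8, uint8). */
bool ex_queries_dtype_supported(uint8_t code, uint8_t bits)
{
  switch (code) {
    case EX_CODE_FLOAT: return bits == 32;  /* a. float32 */
    case EX_CODE_INT:   return bits == 8;   /* b. int8 */
    case EX_CODE_UINT:  return bits == 8;   /* c. uint8 */
    default:            return false;       /* any other type code */
  }
}
```

A caller could run this on queries.dl_tensor.dtype before searching and return early with its own error if the type is not in the list.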