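The original cuSpatial C++ API (libcuspatial) was designed to depend on RAPIDS libcudf and to use its core data types, especially cudf::column. For users who do not use libcudf or other RAPIDS APIs, the libcudf dependency can be a significant barrier to adopting libcuspatial. libcudf is a very large library, and building it takes a long time.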

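Therefore, the core of cuSpatial is now implemented as a standalone C++ API that does not depend on libcudf. It is a header-only template API based on iterators and ranges. This has a number of advantages.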

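With a header-only API, users can include and build only the parts they actually use.
With a templated API, a variety of basic data types can be supported flexibly, such as float and double for positional data and integers of different sizes for indices.
By templating on iterator types, cuSpatial algorithms can be fused with transformations of the input data by using "fancy" iterators, such as transform iterators and counting iterators.
Memory resources only need to be part of the APIs that allocate temporary intermediate storage. Output storage is allocated outside the API, and an output iterator is passed as an argument.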
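
The fusion that fancy iterators enable can be illustrated on the host with standard C++ alone. The sketch below is not cuSpatial or Thrust code: transform_iterator, degrees_to_radians, and sum_in_radians are hypothetical names written for this example, and the iterator is deliberately minimal (input-iterator operations only). It shows an algorithm consuming values that are converted on the fly, so no intermediate array of converted values is ever allocated. In device code, Thrust's thrust::make_transform_iterator and thrust::make_counting_iterator fill the same role.

```cpp
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical functor: converts an angle from degrees to radians.
struct degrees_to_radians {
  double operator()(double degrees) const { return degrees * 3.14159265358979323846 / 180.0; }
};

// Hypothetical minimal "fancy" iterator: applies `f` every time it is dereferenced,
// so the transformed values are never materialized in memory.
template <class It, class F>
class transform_iterator {
 public:
  using iterator_category = std::input_iterator_tag;
  using value_type =
    typename std::decay<decltype(std::declval<F const&>()(*std::declval<It const&>()))>::type;
  using difference_type = std::ptrdiff_t;
  using pointer         = void;
  using reference       = value_type;

  transform_iterator(It it, F f) : it_(it), f_(f) {}

  reference operator*() const { return f_(*it_); }
  transform_iterator& operator++() { ++it_; return *this; }
  transform_iterator operator++(int) { auto old = *this; ++it_; return old; }

  friend bool operator==(transform_iterator const& a, transform_iterator const& b)
  {
    return a.it_ == b.it_;
  }
  friend bool operator!=(transform_iterator const& a, transform_iterator const& b)
  {
    return !(a == b);
  }

 private:
  It it_;
  F f_;
};

// Sums angles stored in degrees; the conversion to radians is fused into the reduction.
inline double sum_in_radians(std::vector<double> const& degrees)
{
  using iter = transform_iterator<std::vector<double>::const_iterator, degrees_to_radians>;
  iter first(degrees.cbegin(), degrees_to_radians{});
  iter last(degrees.cend(), degrees_to_radians{});
  return std::accumulate(first, last, 0.0);
}
```

Because the algorithm is templated on the iterator type, the same reduction works unchanged whether it is handed raw storage or a transforming iterator.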

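The main disadvantages of this type of API are:

Header-only APIs can increase the compile time of code that depends on them.
Some users (specifically the cuSpatial Python API) may prefer a cuDF-based API.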

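The good news is that both of these problems can be avoided for users of the column-based API by maintaining the existing libcudf-based C++ API as a layer on top of the header-only libcuspatial API.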

Example API

The following is an example of the iterator-based API for cuspatial::haversine_distance. (See below for a discussion of the API documentation.)

template <class LonLatItA,
          class LonLatItB,
          class OutputIt,
          class Location = typename std::iterator_traits<LonLatItA>::value_type,
          class T        = typename Location::value_type>
OutputIt haversine_distance(LonLatItA a_lonlat_first,
                            LonLatItA a_lonlat_last,
                            LonLatItB b_lonlat_first,
                            OutputIt distance_first,
                            T const radius               = EARTH_RADIUS_KM,
                            rmm::cuda_stream_view stream = rmm::cuda_stream_default);

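There are a few key points to notice.

The API is very similar to STL algorithms such as std::transform.
All array inputs and outputs are iterator type templates.
Longitude/latitude data is passed as an array of structures, using the cuspatial::vec_2d type (include/cuspatial/vec_2d.hpp). This is enforced with a static_assert in the function body (discussed later).
The Location type is a template that defaults to the value_type of the input iterators.
The floating-point type is a template (T) that defaults to the value_type of Location.
The iterator types for the two input ranges (A and B) are distinct templates. This is crucial for composing with fancy iterators, where the types of A and B may differ.
The input and output ranges in the example API are of equal size, so only the beginning and end of range A are provided (a_lonlat_first and a_lonlat_last). This mirrors STL APIs.
The API returns an iterator to the element past the last element written to the output. This is inspired by std::transform, even though, as with transform, many uses of haversine_distance will not need the returned iterator.
All APIs that run CUDA device code (including Thrust algorithms) or allocate memory take a CUDA stream on which to execute the device code and allocate memory.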
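
The STL convention that this API mirrors can be seen on the host with std::transform itself. The sketch below is plain standard C++ rather than cuSpatial code, and add_ranges is a hypothetical helper written only for this illustration. It shows the two conventions called out above: only the first input range carries an explicit end, and the returned iterator points one past the last element written.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Adds two ranges element by element and reports how many outputs were written.
// `b` is assumed to hold at least a.size() elements, just as the second input
// range of std::transform is assumed to be as long as the first.
inline std::size_t add_ranges(std::vector<int> const& a,
                              std::vector<int> const& b,
                              std::vector<int>& out)
{
  out.resize(a.size());
  // Only range A has an explicit end; range B and the output are given by their start.
  auto out_last = std::transform(a.begin(), a.end(), b.begin(), out.begin(), std::plus<int>{});
  // The returned iterator is one past the last element written.
  return static_cast<std::size_t>(out_last - out.begin());
}
```

Returning the end of the output range lets a caller chain further writes into the same buffer, although many callers can simply ignore it.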

Example Documentation

The following is the (Doxygen) documentation for the above cuspatial::haversine_distance.

/**
 * @brief Compute haversine distances between points in set A to the corresponding points in set B.
 *
 * Computes N haversine distances, where N is `std::distance(a_lonlat_first, a_lonlat_last)`.
 * The distance for each `a_lonlat[i]` and `b_lonlat[i]` point pair is assigned to
 * `distance_first[i]`. `distance_first` must be an iterator to output storage allocated for N
 * distances.
 *
 * Computed distances will have the same units as `radius`.
 *
 * https://en.wikipedia.org/wiki/Haversine_formula
 *
 * @param[in]  a_lonlat_first: beginning of range of (longitude, latitude) locations in set A
 * @param[in]  a_lonlat_last: end of range of (longitude, latitude) locations in set A
 * @param[in]  b_lonlat_first: beginning of range of (longitude, latitude) locations in set B
 * @param[out] distance_first: beginning of output range of haversine distances
 * @param[in]  radius: radius of the sphere on which the points reside. default: 6371.0
 *            (approximate radius of Earth in km)
 * @param[in]  stream: The CUDA stream on which to perform computations and allocate memory.
 *
 * @tparam LonLatItA Iterator to input location set A. Must meet the requirements of
 * [LegacyRandomAccessIterator][LinkLRAI] and be device-accessible.
 * @tparam LonLatItB Iterator to input location set B. Must meet the requirements of
 * [LegacyRandomAccessIterator][LinkLRAI] and be device-accessible.
 * @tparam OutputIt Output iterator. Must meet the requirements of
 * [LegacyRandomAccessIterator][LinkLRAI] and be device-accessible.
 * @tparam Location The `value_type` of `LonLatItA` and `LonLatItB`. Must be `cuspatial::vec_2d<T>`.
 * @tparam T The underlying coordinate type. Must be a floating-point type.
 *
 * @pre `a_lonlat_first` may equal `distance_first`, but the range `[a_lonlat_first, a_lonlat_last)`
 * shall not overlap the range `[distance_first, distance_first + (a_lonlat_last - a_lonlat_first))`
 * otherwise.
 * @pre `b_lonlat_first` may equal `distance_first`, but the range
 * `[b_lonlat_first, b_lonlat_first + (a_lonlat_last - a_lonlat_first))` shall not overlap the range
 * `[distance_first, distance_first + (a_lonlat_last - a_lonlat_first))` otherwise.
 * @pre All iterators must have the same `Location` type, with the same underlying floating-point
 * coordinate type (e.g. `cuspatial::vec_2d<float>`).
 *
 * @return Output iterator to the element past the last distance computed.
 *
 * [LinkLRAI]: https://cppreference.cn/w/cpp/named_req/RandomAccessIterator
 * "LegacyRandomAccessIterator"
 */

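Key points:

Precisely and succinctly documents what the API computes, and provides references.
All parameters and all template parameters are documented.
States the C++ standard iterator concepts that must be implemented, and that iterators must be device-accessible.
Documents requirements as preconditions using @pre.
Uses preconditions to explicitly document which input ranges are allowed to overlap.
Documents the units of any inputs or outputs that have them.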

cuSpatial libcudf-based C++ API (legacy API)

This is the existing API, unchanged by the refactoring. Here is the existing cuspatial::haversine_distance:

std::unique_ptr<cudf::column> haversine_distance(
  cudf::column_view const& a_lon,
  cudf::column_view const& a_lat,
  cudf::column_view const& b_lon,
  cudf::column_view const& b_lat,
  double const radius                 = EARTH_RADIUS_KM,
  rmm::mr::device_memory_resource* mr = rmm::mr::get_current_device_resource());

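Key points:

All input data are cudf::column_view. This is a type-erased container, so the data type must be determined at run time.
All inputs are arrays of scalars. Longitude and latitude are separate.
The output is a returned unique_ptr<cudf::column>.
The output is allocated inside the function using the passed memory resource.
The public API does not take a stream. There is a detail version of the API that takes a stream. This follows libcudf practice and may change in the future.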
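
To make the cost of type erasure concrete, here is a minimal host-only sketch of run-time type dispatch. It does not use libcudf: type_id, column_view_sketch, and both sum overloads are hypothetical stand-ins invented for this example. The column carries only a type tag and an untyped pointer, so the wrapper has to inspect the tag at run time before it can call the statically typed, iterator-based implementation.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>

// Hypothetical type tag and type-erased column; not the libcudf types.
enum class type_id { FLOAT32, FLOAT64 };

struct column_view_sketch {
  void const* data;
  std::size_t size;
  type_id type;

  // Typed access is only valid when T matches the run-time type tag.
  template <class T>
  T const* begin() const { return static_cast<T const*>(data); }
  template <class T>
  T const* end() const { return begin<T>() + size; }
};

// Statically typed implementation, analogous to a header-only iterator-based API.
template <class InputIt>
double sum(InputIt first, InputIt last)
{
  return std::accumulate(first, last, 0.0);
}

// Column-based wrapper: inspect the tag, then call the matching instantiation.
inline double sum(column_view_sketch const& col)
{
  switch (col.type) {
    case type_id::FLOAT32: return sum(col.begin<float>(), col.end<float>());
    case type_id::FLOAT64: return sum(col.begin<double>(), col.end<double>());
  }
  throw std::invalid_argument("unsupported column type");
}
```

Every supported element type needs its own branch in the wrapper, whereas the templated overload needs no dispatch code at all.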

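File Structure

libcuspatial APIs should be defined in header files in the cpp/include/cuspatial/ directory. API header files should be named after the API. In the example, haversine.hpp defines the cuspatial::haversine_distance API.

The implementation must also be in a header, but it should be in the cuspatial/detail directory. The implementation should be included from the API definition file, at the end of the file. Example:

... // API declarations above this point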

#include <cuspatial/detail/haversine.hpp>

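Namespaces

Public APIs are in the cuspatial namespace. Note that the header-only API and the libcudf-based API can live in the same namespace because they are not ambiguous (their parameters are very different).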
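
The following host-only sketch illustrates why the two kinds of API can share one namespace. The namespace sketch, the column struct, and both add overloads are hypothetical and stand in for the real types. The iterator-based overload takes four arguments and the column-based overload takes two, so overload resolution can never confuse them, and the column-based overload can be written as a thin layer over the iterator-based one.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a column type; not cudf::column_view.
struct column {
  std::vector<double> values;
};

// Iterator-based overload: three input iterators and one output iterator.
template <class InputItA, class InputItB, class OutputIt>
OutputIt add(InputItA a_first, InputItA a_last, InputItB b_first, OutputIt out_first)
{
  return std::transform(a_first, a_last, b_first, out_first, std::plus<double>{});
}

// Column-based overload in the same namespace: allocates the output and
// forwards to the iterator-based overload above.
inline column add(column const& a, column const& b)
{
  column result{std::vector<double>(a.values.size())};
  add(a.values.begin(), a.values.end(), b.values.begin(), result.values.begin());
  return result;
}

}  // namespace sketch
```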

Implementations of the header-only API should be in the cuspatial::detail namespace.

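Implementation

The main implementation should be in detail headers.

Header-only API Implementation

Because it is a statically typed API, the header-only implementation can be much simpler than the libcudf-based API, which requires run-time type dispatch. For haversine_distance, it amounts to a few static assertions and a run-time expectation check, followed by a call to thrust::transform with a custom transform functor.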

template <class LonLatItA, class LonLatItB, class OutputIt, class Location, class T>
OutputIt haversine_distance(LonLatItA a_lonlat_first,
                            LonLatItA a_lonlat_last,
                            LonLatItB b_lonlat_first,
                            OutputIt distance_first,
                            T const radius,
                            rmm::cuda_stream_view stream)
{
  // Compile-time check: both input ranges must hold vec_2d<T> elements.
  static_assert(is_same<vec_2d<T>,
                        cuspatial::iterator_value_type<LonLatItA>,
                        cuspatial::iterator_value_type<LonLatItB>>(),
                "Inputs must be cuspatial::vec_2d");
  // Compile-time check: radius, coordinates, and outputs share one floating-point type.
  static_assert(
    is_same_floating_point<T,
                           typename cuspatial::iterator_vec_base_type<LonLatItA>,
                           typename cuspatial::iterator_value_type<OutputIt>>(),
    "All iterator types and radius must have the same floating-point coordinate value type.");

  // Run-time check on the argument value.
  CUSPATIAL_EXPECTS(radius > 0, "radius must be positive.");

  return thrust::transform(rmm::exec_policy(stream),
                           a_lonlat_first,
                           a_lonlat_last,
                           b_lonlat_first,
                           distance_first,
                           detail::haversine_distance_functor<T>(radius));
}

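Note that we use static_assert to assert that the types of the iterator inputs match the documented expectations. We also perform a run-time check that the radius is positive. Finally, we simply call thrust::transform, passing it an instance of haversine_distance_functor, a function object that takes two vec_2d<T> inputs and implements the haversine distance formula.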
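
The same structure can be tried out on the host without CUDA. The sketch below is an analogue written for illustration, not the cuSpatial implementation: vec_2d, haversine_functor, and haversine_distance_host are hypothetical stand-ins, std::transform replaces thrust::transform, an exception replaces CUSPATIAL_EXPECTS, and the default radius of 6371 is simply the approximate Earth radius in km quoted in the documentation above. Coordinates are assumed to be (longitude, latitude) in degrees.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for cuspatial::vec_2d: x is longitude, y is latitude, in degrees.
template <class T>
struct vec_2d {
  using value_type = T;
  T x;
  T y;
};

// Function object that stores the sphere radius and applies the haversine formula.
template <class T>
struct haversine_functor {
  T radius;

  T operator()(vec_2d<T> a, vec_2d<T> b) const
  {
    T const deg_to_rad    = static_cast<T>(3.14159265358979323846 / 180.0);
    T const a_lat         = a.y * deg_to_rad;
    T const b_lat         = b.y * deg_to_rad;
    T const sin_half_dlat = std::sin((b_lat - a_lat) / 2);
    T const sin_half_dlon = std::sin((b.x - a.x) * deg_to_rad / 2);
    T const h             = sin_half_dlat * sin_half_dlat +
                std::cos(a_lat) * std::cos(b_lat) * sin_half_dlon * sin_half_dlon;
    // Clamp to 1 so rounding error cannot push the asin argument out of range.
    return 2 * radius * std::asin(std::min(static_cast<T>(1), std::sqrt(h)));
  }
};

template <class LonLatItA,
          class LonLatItB,
          class OutputIt,
          class Location = typename std::iterator_traits<LonLatItA>::value_type,
          class T        = typename Location::value_type>
OutputIt haversine_distance_host(LonLatItA a_first,
                                 LonLatItA a_last,
                                 LonLatItB b_first,
                                 OutputIt out_first,
                                 T radius = static_cast<T>(6371.0))
{
  // Compile-time checks on the element types of both input ranges.
  static_assert(
    std::is_same<vec_2d<T>, Location>::value &&
      std::is_same<vec_2d<T>, typename std::iterator_traits<LonLatItB>::value_type>::value,
    "Inputs must be vec_2d<T>");
  static_assert(std::is_floating_point<T>::value, "T must be a floating-point type");

  // Run-time check on the argument value.
  if (!(radius > 0)) { throw std::invalid_argument("radius must be positive."); }

  return std::transform(a_first, a_last, b_first, out_first, haversine_functor<T>{radius});
}
```

On a unit sphere, two points on the equator that are 90 degrees of longitude apart are a quarter of a great circle apart, so the expected distance is pi / 2.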

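libcudf-Based API Implementation

The essence of the refactoring is to make the libcudf-based API a wrapper around the header-only API. This mostly consists of replacing the business logic implementation in the type-dispatch functor with a call to the header-only API. We also need to convert the separate latitude and longitude inputs into vec_2d<T> structs. This is easily done using the cuspatial::make_vec_2d_iterator utility provided in type_utils.hpp.

So, to refactor the libcudf-based API, we remove the following code.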

auto input_tuple = thrust::make_tuple(thrust::make_constant_iterator(static_cast<T>(radius)),
                                      a_lon.begin<T>(),
                                      a_lat.begin<T>(),
                                      b_lon.begin<T>(),
                                      b_lat.begin<T>());

auto input_iter = thrust::make_zip_iterator(input_tuple);

thrust::transform(rmm::exec_policy(stream),
                  input_iter,
                  input_iter + result->size(),
                  result->mutable_view().begin<T>(),
                  [] __device__(auto inputs) {
                    return calculate_haversine_distance(thrust::get<0>(inputs),
                                                        thrust::get<1>(inputs),
                                                        thrust::get<2>(inputs),
                                                        thrust::get<3>(inputs),
                                                        thrust::get<4>(inputs));
                  });

And replace it with the following code.

auto lonlat_a = cuspatial::make_vec_2d_iterator(a_lon.begin<T>(), a_lat.begin<T>());
auto lonlat_b = cuspatial::make_vec_2d_iterator(b_lon.begin<T>(), b_lat.begin<T>());

cuspatial::haversine_distance(lonlat_a,
                              lonlat_a + a_lon.size(),
                              lonlat_b,
                              static_cast<cudf::mutable_column_view>(*result).begin<T>(),
                              T{radius},
                              stream);
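
To show what an iterator of this kind does, here is a host-only sketch of one that presents two separate coordinate arrays as a single sequence of vec_2d<T> values. It is not the cuSpatial implementation of make_vec_2d_iterator: vec_2d, lonlat_iterator, and make_lonlat_iterator are hypothetical names, and only the handful of operations needed for the illustration are provided. The point is that the structs are assembled on dereference, so the column data is never copied into an array of structures.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical stand-in for cuspatial::vec_2d.
template <class T>
struct vec_2d {
  using value_type = T;
  T x;
  T y;
};

// Hypothetical iterator over two parallel arrays; dereferencing builds a vec_2d<T>.
template <class T>
class lonlat_iterator {
 public:
  using iterator_category = std::input_iterator_tag;
  using value_type        = vec_2d<T>;
  using difference_type   = std::ptrdiff_t;
  using pointer           = void;
  using reference         = vec_2d<T>;

  lonlat_iterator(T const* lon, T const* lat) : lon_(lon), lat_(lat) {}

  reference operator*() const { return {*lon_, *lat_}; }
  reference operator[](difference_type i) const { return {lon_[i], lat_[i]}; }

  lonlat_iterator& operator++()
  {
    ++lon_;
    ++lat_;
    return *this;
  }
  lonlat_iterator operator++(int)
  {
    auto old = *this;
    ++*this;
    return old;
  }
  lonlat_iterator operator+(difference_type n) const { return {lon_ + n, lat_ + n}; }

  friend bool operator==(lonlat_iterator const& a, lonlat_iterator const& b)
  {
    return a.lon_ == b.lon_ && a.lat_ == b.lat_;
  }
  friend bool operator!=(lonlat_iterator const& a, lonlat_iterator const& b) { return !(a == b); }

 private:
  T const* lon_;
  T const* lat_;
};

// Convenience factory so that T is deduced from the pointer arguments.
template <class T>
lonlat_iterator<T> make_lonlat_iterator(T const* lon, T const* lat)
{
  return {lon, lat};
}
```

As in the replacement code above, the end of the range is obtained by adding the element count to the begin iterator.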

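Testing

Existing libcudf-based API tests can mostly be left unchanged. New tests should be added to exercise the header-only API separately, in case the libcudf-based API is removed.

Note that tests, like the header-only API, should not depend on libcudf or libcudf_test. The cuDF-based API made the mistake of depending on libcudf_test, which sometimes breaks cuSpatial when libcudf_test changes.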