The interface for host-based UDF implementation for groupby aggregation context.

#include <host_udf.hpp>


Public Member Functions
virtual std::unique_ptr< column >	get_empty_output (rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const =0
	Get the output when the input values column is empty.

virtual std::unique_ptr< column >	operator() (rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const =0
	Perform the main groupby computation for the host-based UDF.

Public Member Functions inherited from cudf::host_udf_base
virtual	~host_udf_base ()=default
	Default destructor.

virtual std::size_t	do_hash () const
	Computes the hash value of the instance.

virtual bool	is_equal (host_udf_base const &other) const =0
	Compares two instances of the derived class for equality.

virtual std::unique_ptr< host_udf_base >	clone () const =0
	Clones the instance.

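Protected Member Functions
column_view	get_input_values () const
	Accessor to the input values column.

column_view	get_grouped_values () const
	Accessor to the input values grouped according to the input keys, with the values within each group kept in their original order.

column_view	get_sorted_grouped_values () const
	Accessor to the input values grouped according to the input keys and sorted within each group.

size_type	get_num_groups () const
	Accessor to the number of groups (i.e., the number of distinct keys).

device_span< size_type const >	get_group_offsets () const
	Accessor to the offsets separating groups.

device_span< size_type const >	get_group_labels () const
	Accessor to the group labels (which are the same as the group indices).

column_view	compute_aggregation (std::unique_ptr< aggregation > other_agg) const
	Compute a built-in groupby aggregation and access its result.

Friends
struct	groupby::detail::aggregate_result_functor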

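Detailed Description

The interface for host-based UDF implementation for groupby aggregation context.

A host-based UDF implementation for groupby must derive from this class. In addition to implementing the virtual functions declared in the base class host_udf_base, such a derived class must also define the function get_empty_output(), which returns the result when the input is empty, and operator(), which performs its groupby operation.

During execution, the derived class can access internal data provided by the libcudf groupby framework through a set of get* accessors, and can also invoke other built-in groupby aggregations through the compute_aggregation function.

Note: The derived class can only perform sort-based groupby aggregations. Hash-based groupby aggregations require more complex data structures and are not yet supported.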
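To make the accessors concrete, the sketch below models on the host, with plain std::vector, the data that a sort-based groupby exposes. It is a hypothetical CPU-only illustration, not the libcudf implementation: the struct grouped_data and the function sort_based_groupby are invented names, and it assumes integer keys and values, no nulls, and group offsets holding num_groups + 1 entries. libcudf computes the equivalent data on the GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side model of what a sort-based groupby exposes to a UDF.
struct grouped_data {
  std::vector<int> grouped_values;  // models get_grouped_values()
  std::vector<int> group_offsets;   // models get_group_offsets(), num_groups + 1 entries
  std::vector<int> group_labels;    // models get_group_labels(), one label per row
  int num_groups = 0;               // models get_num_groups()
};

grouped_data sort_based_groupby(std::vector<int> const& keys, std::vector<int> const& values)
{
  assert(keys.size() == values.size());
  // A stable sort of row indices by key keeps rows of equal keys in their original order.
  std::vector<std::size_t> order(keys.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

  grouped_data out;
  out.group_offsets.push_back(0);
  for (std::size_t i = 0; i < order.size(); ++i) {
    // A new group starts wherever the sorted key changes.
    if (i > 0 && keys[order[i]] != keys[order[i - 1]]) {
      out.group_offsets.push_back(static_cast<int>(i));
    }
    out.grouped_values.push_back(values[order[i]]);
    // The label of a row is the index of the group currently being filled.
    out.group_labels.push_back(static_cast<int>(out.group_offsets.size()) - 1);
  }
  // Close the last group with the total row count (only if there are rows).
  if (!order.empty()) { out.group_offsets.push_back(static_cast<int>(order.size())); }
  out.num_groups = static_cast<int>(out.group_offsets.size()) - 1;
  return out;
}
```

For keys {2, 1, 2, 1, 3} and values {10, 20, 30, 40, 50}, this model yields grouped values {20, 40, 10, 30, 50}, offsets {0, 2, 4, 5}, labels {0, 0, 1, 1, 2}, and three groups.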

Example

struct my_udf_aggregation : cudf::groupby_host_udf {
  my_udf_aggregation() = default;

  [[nodiscard]] std::unique_ptr<cudf::column> get_empty_output(
    rmm::cuda_stream_view stream,
    rmm::device_async_resource_ref mr) const override
  {
    // Return a column corresponding to the result when the input values column is empty.
  }

  [[nodiscard]] std::unique_ptr<cudf::column> operator()(
    rmm::cuda_stream_view stream,
    rmm::device_async_resource_ref mr) const override
  {
    // Perform the UDF computation using the input data and return the result.
  }

  [[nodiscard]] bool is_equal(cudf::host_udf_base const& other) const override
  {
    // Check whether the other object is also an instance of this class.
    // If there are internal state variables, they may also need to be checked for equality.
    return dynamic_cast<my_udf_aggregation const*>(&other) != nullptr;
  }

  [[nodiscard]] std::unique_ptr<cudf::host_udf_base> clone() const override
  {
    return std::make_unique<my_udf_aggregation>();
  }
};

Definition at line 267 of file host_udf.hpp.

Member Function Documentation

◆ compute_aggregation()

column_view cudf::groupby_host_udf::compute_aggregation ( std::unique_ptr< aggregation > other_agg ) const

inline protected

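Compute a built-in groupby aggregation and access its result.

This allows the derived class to invoke any other built-in groupby aggregation on the same input values column and to access its output for use in its own computation.

Parameters

other_agg	An arbitrary built-in groupby aggregation

Returns: A column_view object corresponding to the output result of the given aggregation

Definition at line 410 of file host_udf.hpp.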
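A typical use is to build a custom result out of several built-in ones. The host-side sketch below is a hypothetical model of that pattern: the vectors sums and counts stand in for the per-group results that a derived class might obtain by calling compute_aggregation with built-in sum and count aggregations, and mean_from_builtins is an invented name. In libcudf the results arrive as column_view objects in device memory, and the combination step would run on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: `sums` and `counts` stand in for the per-group outputs of
// built-in sum and count aggregations (one entry per group, in group order).
// The UDF combines them into its own result, here the per-group mean.
std::vector<double> mean_from_builtins(std::vector<double> const& sums,
                                       std::vector<int> const& counts)
{
  assert(sums.size() == counts.size());
  std::vector<double> means;
  means.reserve(sums.size());
  for (std::size_t g = 0; g < sums.size(); ++g) {
    assert(counts[g] > 0);  // groups from a sort-based groupby are never empty here
    means.push_back(sums[g] / counts[g]);
  }
  return means;
}
```

Reusing built-in results this way avoids re-implementing reductions that libcudf already provides.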

◆ get_empty_output()

virtual std::unique_ptr< column > cudf::groupby_host_udf::get_empty_output ( rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr ) const

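pure virtual

Get the output when the input values column is empty.

libcudf calls this function when the input values column is empty. In that situation, libcudf tries to generate the output directly, without unnecessarily computing intermediate data.

Parameters

stream	The CUDA stream to use for any kernel launches
mr	Device memory resource to use for any allocations

Returns: The output result of the aggregation when the input values column is empty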
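The short-circuit described above can be pictured with a small host-side model. Everything in this sketch is hypothetical (model_host_udf, count_udf and run are invented names, the stream and memory-resource parameters are omitted, and keys are assumed to be already sorted): the dispatcher returns get_empty_output() for empty input and only derives group offsets when there is data to group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of the UDF interface (not the libcudf types).
struct model_host_udf {
  virtual ~model_host_udf() = default;
  virtual std::vector<long> get_empty_output() const = 0;
  virtual std::vector<long> operator()(std::vector<long> const& grouped_values,
                                       std::vector<int> const& offsets) const = 0;
};

// Example UDF: number of rows per group.
struct count_udf : model_host_udf {
  std::vector<long> get_empty_output() const override { return {}; }  // zero groups
  std::vector<long> operator()(std::vector<long> const&,
                               std::vector<int> const& offsets) const override
  {
    std::vector<long> out;
    for (std::size_t g = 0; g + 1 < offsets.size(); ++g) {
      out.push_back(offsets[g + 1] - offsets[g]);
    }
    return out;
  }
};

// Assumes keys are already sorted; offsets are derived only for non-empty input.
std::vector<long> run(model_host_udf const& udf,
                      std::vector<int> const& sorted_keys,
                      std::vector<long> const& values)
{
  assert(sorted_keys.size() == values.size());
  if (values.empty()) { return udf.get_empty_output(); }  // skip all grouping work
  std::vector<int> offsets{0};
  for (std::size_t i = 1; i < sorted_keys.size(); ++i) {
    if (sorted_keys[i] != sorted_keys[i - 1]) { offsets.push_back(static_cast<int>(i)); }
  }
  offsets.push_back(static_cast<int>(sorted_keys.size()));
  return udf(values, offsets);
}
```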

◆ get_group_labels()

device_span<size_type const> cudf::groupby_host_udf::get_group_labels ( ) const

inline protected

Accessor to the group labels (which are the same as the group indices).

Returns: The group labels.

Definition at line 395 of file host_udf.hpp.
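Group labels and group offsets describe the same partition of the grouped rows: every row in the range [offsets[g], offsets[g + 1]) carries label g. The hypothetical helper below (labels_from_offsets is an invented name) expands offsets into labels on the host, assuming the offsets start at 0 and hold num_groups + 1 entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expands group offsets (num_groups + 1 entries) into one label per row:
// each row in [offsets[g], offsets[g + 1]) receives label g.
std::vector<int> labels_from_offsets(std::vector<int> const& offsets)
{
  assert(!offsets.empty());
  std::vector<int> labels;
  labels.reserve(static_cast<std::size_t>(offsets.back()));
  for (std::size_t g = 0; g + 1 < offsets.size(); ++g) {
    for (int i = offsets[g]; i < offsets[g + 1]; ++i) {
      labels.push_back(static_cast<int>(g));
    }
  }
  return labels;
}
```

Labels are convenient for per-row work (each row can look up its group directly), while offsets suit per-group work over contiguous ranges.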

◆ get_group_offsets()

device_span<size_type const> cudf::groupby_host_udf::get_group_offsets ( ) const

inline protected

Accessor to the offsets separating groups.

Returns: The group offsets.

Definition at line 384 of file host_udf.hpp.
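The offsets let a UDF visit each group as a contiguous range of the grouped values. This host-side sketch assumes, as a modeling choice, that offsets hold num_groups + 1 entries; group_sizes and group_sums are hypothetical names showing a per-group count and a segmented sum over those ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Group g occupies rows [offsets[g], offsets[g + 1]) of the grouped values,
// so its size is the difference of adjacent offsets.
std::vector<int> group_sizes(std::vector<int> const& offsets)
{
  std::vector<int> sizes;
  for (std::size_t g = 0; g + 1 < offsets.size(); ++g) {
    sizes.push_back(offsets[g + 1] - offsets[g]);
  }
  return sizes;
}

// Segmented sum: one output per group, reading only that group's range.
std::vector<long> group_sums(std::vector<long> const& grouped_values,
                             std::vector<int> const& offsets)
{
  assert(offsets.empty() || static_cast<std::size_t>(offsets.back()) <= grouped_values.size());
  std::vector<long> sums;
  for (std::size_t g = 0; g + 1 < offsets.size(); ++g) {
    long s = 0;
    for (int i = offsets[g]; i < offsets[g + 1]; ++i) {
      s += grouped_values[static_cast<std::size_t>(i)];
    }
    sums.push_back(s);
  }
  return sums;
}
```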

◆ get_grouped_values()

column_view cudf::groupby_host_udf::get_grouped_values ( ) const

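inline protected

Accessor to the input values grouped according to the input keys, with the values within each group kept in their original order.

Returns: The grouped values column.

Definition at line 350 of file host_udf.hpp.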

◆ get_input_values()

column_view cudf::groupby_host_udf::get_input_values ( ) const

inline protected

Accessor to the input values column.

Returns: The input values column.

Definition at line 338 of file host_udf.hpp.

◆ get_num_groups()

size_type cudf::groupby_host_udf::get_num_groups ( ) const

inline protected

Accessor to the number of groups (i.e., the number of distinct keys).

Returns: The number of groups.

Definition at line 373 of file host_udf.hpp.

◆ get_sorted_grouped_values()

column_view cudf::groupby_host_udf::get_sorted_grouped_values ( ) const

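inline protected

Accessor to the input values grouped according to the input keys and sorted within each group.

Returns: The sorted grouped values column.

Definition at line 362 of file host_udf.hpp.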
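Sorting within each group leaves the group boundaries unchanged and reorders only the values inside each range, which makes order statistics straightforward. The host-side sketch below is hypothetical (sort_within_groups and group_medians are invented names; offsets are assumed to hold num_groups + 1 entries and groups to be non-empty). It derives the sorted grouped values from the grouped values and then computes a per-group median.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts each segment [offsets[g], offsets[g + 1]) independently; the group
// boundaries do not move, only the order of values inside each group.
std::vector<double> sort_within_groups(std::vector<double> grouped,
                                       std::vector<int> const& offsets)
{
  for (std::size_t g = 0; g + 1 < offsets.size(); ++g) {
    std::sort(grouped.begin() + offsets[g], grouped.begin() + offsets[g + 1]);
  }
  return grouped;
}

// Per-group median, read directly from the sorted segments.
std::vector<double> group_medians(std::vector<double> const& sorted_grouped,
                                  std::vector<int> const& offsets)
{
  std::vector<double> out;
  for (std::size_t g = 0; g + 1 < offsets.size(); ++g) {
    int const begin = offsets[g];
    int const size  = offsets[g + 1] - begin;
    assert(size > 0);
    std::size_t const mid = static_cast<std::size_t>(begin + size / 2);
    out.push_back(size % 2 == 1 ? sorted_grouped[mid]
                                : (sorted_grouped[mid - 1] + sorted_grouped[mid]) / 2.0);
  }
  return out;
}
```

For grouped values {5, 1, 3, 8, 2} with offsets {0, 3, 5}, the sorted grouped values are {1, 3, 5, 2, 8} and the medians are {3, 5}.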

◆ operator()()

virtual std::unique_ptr< column > cudf::groupby_host_udf::operator() ( rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr ) const

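pure virtual

Perform the main groupby computation for the host-based UDF.

Parameters

stream	The CUDA stream to use for any kernel launches
mr	Device memory resource to use for any allocations

Returns: The output result of the aggregation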
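The body of operator() typically loops over groups using the data exposed by the accessors. As a hypothetical host-side model (group_range is an invented name; in libcudf the inputs would come from get_grouped_values() and get_group_offsets() and the work would run on the device), the sketch below computes a per-group range, max minus min, producing one output row per group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical UDF body: per-group (max - min), one output row per group.
// Assumes offsets hold num_groups + 1 entries and every group is non-empty.
std::vector<int> group_range(std::vector<int> const& grouped_values,
                             std::vector<int> const& offsets)
{
  assert(!offsets.empty());
  std::vector<int> out;
  out.reserve(offsets.size() - 1);
  for (std::size_t g = 0; g + 1 < offsets.size(); ++g) {
    auto const first = grouped_values.begin() + offsets[g];
    auto const last  = grouped_values.begin() + offsets[g + 1];
    assert(first != last);
    auto const [lo, hi] = std::minmax_element(first, last);
    out.push_back(*hi - *lo);
  }
  return out;
}
```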

The documentation for this struct was generated from the following file:

host_udf.hpp
