Copy Scatter

group Scattering

Functions

std::unique_ptr<table> scatter(table_view const &source, column_view const &scatter_map, table_view const &target, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

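Scatters the rows of the source table into a copy of the target table according to a scatter map.

Scatters values from the source table into the target table out-of-place, returning a "destination table". The scatter is performed according to a scatter map such that row scatter_map[i] of the destination table gets row i of the source table. All other rows of the destination table equal the corresponding rows of the target table.

The number of columns in source must match the number of columns in target, and their corresponding data types must be the same.

If the same index appears more than once in the scatter map, the result is undefined.

If any value in scatter_map is outside the interval [-n, n), where n is the number of rows in the target table, the behavior is undefined.

A negative value i in scatter_map is interpreted as i+n, where n is the number of rows in the target table.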
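As a rough illustration of these rules, the following C++ sketch models the table overload on the CPU. It is not the libcudf implementation or API: the Column/Table aliases and the scatter_rows function are hypothetical, columns are plain std::vector<int> without nulls, and data-type checks are not modeled. It copies target, maps negative indices to i+n, and uses assert for the out-of-range indices that this section leaves undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical CPU model, not libcudf: a table is a list of equally sized
// int columns (column-major). Nulls and per-column data types are not modeled.
using Column = std::vector<int>;
using Table = std::vector<Column>;

// Returns a copy of target in which row scatter_map[i] is replaced by row i of
// source. The target argument itself is left unchanged (out-of-place).
Table scatter_rows(Table const& source, std::vector<int> const& scatter_map,
                   Table const& target) {
  if (source.size() != target.size()) {
    throw std::invalid_argument("source and target column counts differ");
  }
  std::size_t const source_rows = source.empty() ? 0 : source[0].size();
  // Follows the parameter description: the map may not be larger than source.
  if (scatter_map.size() > source_rows) {
    throw std::invalid_argument("scatter_map has more elements than source rows");
  }
  int const n = target.empty() ? 0 : static_cast<int>(target[0].size());
  Table result = target;
  for (std::size_t i = 0; i < scatter_map.size(); ++i) {
    int idx = scatter_map[i];
    // Indices outside [-n, n) are undefined behavior in the docs; assert here.
    assert(idx >= -n && idx < n);
    if (idx < 0) {
      idx += n;  // a negative index i is interpreted as i + n
    }
    // Duplicate indices are undefined in the docs; here the last write wins.
    for (std::size_t c = 0; c < result.size(); ++c) {
      result[c][static_cast<std::size_t>(idx)] = source[c][i];
    }
  }
  return result;
}
```

With a one-column source {10, 20}, scatter_map {2, -1}, and target {1, 2, 3, 4}, the sketch returns {1, 2, 10, 20}, because -1 is read as 3. The "last write wins" behavior for duplicates is only a property of this sequential model; a parallel GPU implementation need not match it.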

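Throws:

std::invalid_argument – if the number of columns in source does not match the number of columns in target
std::invalid_argument – if the number of rows in source does not match the number of elements in scatter_map
cudf::data_type_error – if the data types of the source and target columns do not match
std::invalid_argument – if scatter_map contains null values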

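Parameters:

source – The input columns containing values to be scattered into the target columns
scatter_map – A non-nullable column of integral indices that maps the rows in the source table to rows in the target table. Its size must be equal to or less than the number of elements in the source columns.
target – The set of columns into which values from the source table are to be scattered
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table's device memory

Returns:

Result of scattering values from source to target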

std::unique_ptr<table> scatter(std::vector<std::reference_wrapper<scalar const>> const &source, column_view const &indices, table_view const &target, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

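Scatters a row of scalar values into a copy of the target table according to a scatter map.

Scatters values from the source row into the target table out-of-place, returning a "destination table". The scatter is performed according to indices such that row indices[i] of the destination table is replaced by the source row. All other rows of the destination table equal the corresponding rows of the target table.

The number of elements in source must match the number of columns in target, and their corresponding data types must be the same.

If the same index appears more than once in indices, the result is undefined.

If any value in indices is outside the interval [-n, n), where n is the number of rows in the target table, the behavior is undefined.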
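A similar hypothetical CPU sketch for the scalar overload follows. The function name scatter_scalar_row and the int-only representation are assumptions, not libcudf API, as is the wrapping of negative indices: this section states the [-n, n) range but, unlike the table overload, does not spell out the i+n rule, so the sketch assumes the same convention.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical CPU model, not libcudf: a table is a list of equally sized
// int columns, and the "row of scalars" holds one int per column.
using Column = std::vector<int>;
using Table = std::vector<Column>;

// Returns a copy of target in which every row listed in indices is replaced
// by the values in row (row[c] goes to column c).
Table scatter_scalar_row(std::vector<int> const& row,
                         std::vector<int> const& indices, Table const& target) {
  if (row.size() != target.size()) {
    throw std::invalid_argument("scalar count differs from target column count");
  }
  int const n = target.empty() ? 0 : static_cast<int>(target[0].size());
  Table result = target;
  for (int idx : indices) {
    // Indices outside [-n, n) are undefined behavior in the docs; assert here.
    assert(idx >= -n && idx < n);
    if (idx < 0) {
      idx += n;  // assumed to wrap like the table overload's scatter_map
    }
    for (std::size_t c = 0; c < result.size(); ++c) {
      result[c][static_cast<std::size_t>(idx)] = row[c];
    }
  }
  return result;
}
```

For target columns {1, 2, 3} and {4, 5, 6}, scalars {0, 9}, and indices {0, -1}, the sketch returns columns {0, 2, 0} and {9, 5, 9}.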

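Throws:

std::invalid_argument – if the number of scalars does not match the number of columns in target
std::invalid_argument – if indices contains null values
cudf::data_type_error – if the data types of the scalars and target columns do not match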

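Parameters:

source – The input scalars containing values to be scattered into the target columns
indices – A non-nullable column of integral indices that indicates the rows in the target table to be replaced by source
target – The set of columns into which values from source are to be scattered
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table's device memory

Returns:

Result of scattering values from source to target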

std::unique_ptr<table> boolean_mask_scatter(table_view const &input, table_view const &target, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

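Scatters rows of the input table to rows of the output table corresponding to true values in a boolean mask.

The ith row of input is written to the output table at the location of the ith true value in boolean_mask. All other rows in the output equal the same rows in target.

The number of true values in boolean_mask must be less than or equal to the number of rows in input. Where the boolean mask is true, the corresponding value in target is updated with the value from the corresponding input column; otherwise it is left unchanged.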

Example:
input:        {{1, 5, 6, 8, 9}}
boolean_mask: {true, false, false, false, true, true, false, true, true, false}
target:       {{   2,     2,     3,     4,    4,     7,    7,    7,    8,    10}}

output:       {{   1,     2,     3,     4,    5,     6,    7,    8,    9,    10}}
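The mask-driven placement can be sketched on the CPU as follows, again as a hypothetical model rather than the libcudf code. It assumes int columns without nulls and a std::vector<bool> standing in for the boolean_mask column; the function name boolean_mask_scatter_rows is illustrative only. It walks the mask once, consuming the next input row at each true position, and on the example above it produces {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Column = std::vector<int>;
using Table = std::vector<Column>;

// Hypothetical CPU model, not libcudf. The k-th true position in mask receives
// row k of input; rows where mask is false are copied from target.
Table boolean_mask_scatter_rows(Table const& input, Table const& target,
                                std::vector<bool> const& mask) {
  if (input.size() != target.size()) {
    throw std::invalid_argument("input and target column counts differ");
  }
  std::size_t const n = target.empty() ? 0 : target[0].size();
  if (mask.size() != n) {
    throw std::invalid_argument("mask size differs from target row count");
  }
  std::size_t const input_rows = input.empty() ? 0 : input[0].size();
  auto const trues =
      static_cast<std::size_t>(std::count(mask.begin(), mask.end(), true));
  if (trues > input_rows) {
    throw std::invalid_argument("mask has more true values than input rows");
  }
  Table result = target;
  std::size_t next = 0;  // index of the next input row to write
  for (std::size_t r = 0; r < n; ++r) {
    if (!mask[r]) {
      continue;  // false: keep the target row
    }
    for (std::size_t c = 0; c < result.size(); ++c) {
      result[c][r] = input[c][next];
    }
    ++next;
  }
  assert(next == trues);  // every true position consumed one input row
  return result;
}
```

The single pass with a running counter follows the definition directly: the ith true value receives the ith input row.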

Throws:

std::invalid_argument – if input.num_columns() != target.num_columns()
cudf::data_type_error – if any ith input_column type != ith target_column type
cudf::data_type_error – if boolean_mask.type() != bool
std::invalid_argument – if boolean_mask.size() != target.num_rows()
std::invalid_argument – if the number of true values in boolean_mask > input.num_rows()

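Parameters:

input – table_view (set of dense columns) to scatter
target – table_view to modify with scattered values from input
boolean_mask – column_view that acts as the boolean mask
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table's device memory

Returns:

A table produced by scattering input into target as per boolean_mask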

std::unique_ptr<table> boolean_mask_scatter(std::vector<std::reference_wrapper<scalar const>> const &input, table_view const &target, column_view const &boolean_mask, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

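Scatters scalar values to rows of the output table corresponding to true values in a boolean mask.

The ith scalar in input is written to the ith column of the output table at the location of every true value in boolean_mask. All other rows in the output equal the same rows in target.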

Example:
input:        {11}
boolean_mask: {true, false, false, false, true, true, false, true, true, false}
target:       {{   2,     2,     3,     4,    4,     7,    7,    7,    8,    10}}

output:       {{  11,     2,     3,     4,   11,    11,    7,   11,   11,    10}}
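For the scalar overload no running counter is needed, since every true position receives the same values. The hypothetical sketch below (the function name boolean_mask_scatter_values and the int-only columns are assumptions, not libcudf API) reproduces the example above, returning {11, 2, 3, 4, 11, 11, 7, 11, 11, 10}.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Column = std::vector<int>;
using Table = std::vector<Column>;

// Hypothetical CPU model, not libcudf: values holds one int "scalar" per
// column; every row where mask is true is overwritten with those values.
Table boolean_mask_scatter_values(std::vector<int> const& values,
                                  Table const& target,
                                  std::vector<bool> const& mask) {
  if (values.size() != target.size()) {
    throw std::invalid_argument("scalar count differs from target column count");
  }
  std::size_t const n = target.empty() ? 0 : target[0].size();
  if (mask.size() != n) {
    throw std::invalid_argument("mask size differs from target row count");
  }
  Table result = target;
  for (std::size_t c = 0; c < result.size(); ++c) {
    assert(result[c].size() == n);  // the model assumes equally sized columns
    for (std::size_t r = 0; r < n; ++r) {
      if (mask[r]) {
        result[c][r] = values[c];  // scalar c fills column c at true rows
      }
    }
  }
  return result;
}
```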

Throws:

std::invalid_argument – if input.size() != target.num_columns()
cudf::data_type_error – if any ith input scalar type != ith target_column type
cudf::data_type_error – if boolean_mask.type() != bool
std::invalid_argument – if boolean_mask.size() != target.num_rows()

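Parameters:

input – scalars to scatter
target – table_view to modify with scattered values from input
boolean_mask – column_view that acts as the boolean mask
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned table's device memory

Returns:

A table produced by scattering input into target as per boolean_mask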
