Column Search

Search group

Functions

std::unique_ptr<column> lower_bound(table_view const &haystack, table_view const &needles, std::vector<order> const &column_order, std::vector<null_order> const &null_precedence, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

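Find the smallest indices in a sorted table at which values should be inserted to maintain order.

For each row in needles, find the first index in haystack at which the row could be inserted while keeping haystack sorted.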

Example:

 Single Column:
     idx        0   1   2   3   4
  haystack = { 10, 20, 20, 30, 50 }
  needles  = { 20 }
  result   = {  1 }

 Multi Column:
     idx          0    1    2    3    4
  haystack = {{  10,  20,  20,  20,  20 },
              { 5.0,  .5,  .5,  .7,  .7 },
              {  90,  77,  78,  61,  61 }}
  needles  = {{ 20 },
              { .7 },
              { 61 }}
  result   = {   3 }

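Parameters:

haystack – The table containing the search space
needles – The values whose insertion locations in the search space are to be found
column_order – Vector of the sort order of each column
null_precedence – Vector of null_precedence enums, one per column
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column's device memory

Returns:

A non-nullable column of elements containing the insertion points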
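
The multi-column example above can be reproduced on the host as a rough sketch. This is not the libcudf implementation: it assumes every column is sorted ascending and contains no nulls, it models a table row as a `std::tuple`, and `row` and `host_lower_bound` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical host-side stand-in for one row of a three-column table.
// std::tuple compares lexicographically, column by column.
using row = std::tuple<int, double, int>;

// For each needle, return the first index in the sorted haystack at which the
// needle could be inserted without breaking ascending lexicographic order.
// Assumption: all columns ascending, no nulls.
std::vector<std::int32_t> host_lower_bound(std::vector<row> const& haystack,
                                           std::vector<row> const& needles)
{
  std::vector<std::int32_t> result;
  result.reserve(needles.size());
  for (auto const& needle : needles) {
    auto const it = std::lower_bound(haystack.begin(), haystack.end(), needle);
    result.push_back(static_cast<std::int32_t>(it - haystack.begin()));
  }
  return result;
}
```

With the five haystack rows from the multi-column example and the needle row (20, .7, 61), this sketch returns index 3, the position of the first row that does not compare less than the needle.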

std::unique_ptr<column> upper_bound(table_view const &haystack, table_view const &needles, std::vector<order> const &column_order, std::vector<null_order> const &null_precedence, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

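Find the largest indices in a sorted table at which values should be inserted to maintain order.

For each row in needles, find the last index in haystack at which the row could be inserted while keeping haystack sorted.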

Example:

 Single Column:
     idx        0   1   2   3   4
  haystack = { 10, 20, 20, 30, 50 }
  needles  = { 20 }
  result   = {  3 }

 Multi Column:
     idx          0    1    2    3    4
  haystack = {{  10,  20,  20,  20,  20 },
              { 5.0,  .5,  .5,  .7,  .7 },
              {  90,  77,  78,  61,  61 }}
  needles  = {{ 20 },
              { .7 },
              { 61 }}
  result   = {   5 }

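Parameters:

haystack – The table containing the search space
needles – The values whose insertion locations in the search space are to be found
column_order – Vector of the sort order of each column
null_precedence – Vector of null_precedence enums, one per column
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column's device memory

Returns:

A non-nullable column of elements containing the insertion points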
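
The single-column case can be sketched on the host with `std::upper_bound`. This is only an illustration of the semantics, not the libcudf implementation: it assumes one ascending integer column with no nulls, and `host_upper_bound` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// For each needle, return the last index in the sorted haystack at which the
// needle could be inserted without breaking ascending order, i.e. the index
// just past the final element that is not greater than the needle.
// Assumption: a single ascending column with no nulls.
std::vector<std::int32_t> host_upper_bound(std::vector<int> const& haystack,
                                           std::vector<int> const& needles)
{
  std::vector<std::int32_t> result;
  result.reserve(needles.size());
  for (int const needle : needles) {
    auto const it = std::upper_bound(haystack.begin(), haystack.end(), needle);
    result.push_back(static_cast<std::int32_t>(it - haystack.begin()));
  }
  return result;
}
```

For the haystack { 10, 20, 20, 30, 50 } and the needle 20, the sketch returns 3. A needle larger than every element returns the haystack size, and the difference between the upper and lower bound of a needle is the number of haystack elements equal to it.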

bool contains(column_view const &haystack, scalar const &needle, rmm::cuda_stream_view stream = cudf::get_default_stream())

Check whether the given needle value exists in the haystack column.

Single Column:
 idx           0   1   2   3   4
 haystack = { 10, 20, 20, 30, 50 }
 needle   = { 20 }
 result   = true

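Throws:

cudf::logic_error – If haystack.type() != needle.type().

Parameters:

haystack – The column containing the search space
needle – The scalar value to check for existence in the search space
stream – CUDA stream used for device memory operations and kernel launches

Returns:

true if the given needle value exists in the haystack column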
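
As a minimal host-side sketch of the same check, the following uses a linear scan with `std::find`. It is not how libcudf performs the search on the device; it assumes an integer column with no nulls, and `host_contains_scalar` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Return true if needle appears anywhere in haystack.
// A linear scan does not require the haystack to be sorted.
// Assumption: integer values, no nulls.
bool host_contains_scalar(std::vector<int> const& haystack, int needle)
{
  return std::find(haystack.begin(), haystack.end(), needle) != haystack.end();
}
```

Called with the haystack { 10, 20, 20, 30, 50 }, the sketch returns true for the needle 20 and false for a value such as 25 that is absent.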

std::unique_ptr<column> contains(column_view const &haystack, column_view const &needles, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())

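Check whether each of the given needles values exists in the haystack column.

The returned column has type BOOL and the same size and null mask as the input needles column. That is, any null row in the needles column produces a null row in the output column.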

haystack = { 10, 20, 30, 40, 50 }
needles  = { 20, 40, 60, 80 }
result   = { true, true, false, false }

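Throws:

cudf::logic_error – If haystack.type() != needles.type()

Parameters:

haystack – The column containing the search space
needles – A column of values to check for existence in the search space
stream – CUDA stream used for device memory operations and kernel launches
mr – Device memory resource used to allocate the returned column's device memory

Returns:

A BOOL column indicating whether each element in needles exists in the search space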
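
The null handling described above can be sketched on the host by modelling nullable rows with `std::optional` (C++17). This is an illustration under stated assumptions, not the libcudf implementation: the haystack is assumed to hold no nulls, the hash-set lookup is a choice made for this sketch, and `host_contains_column` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>
#include <unordered_set>
#include <vector>

// For each needle, report whether it occurs in haystack. A null needle
// (std::nullopt) yields a null output row, so the result has the same size
// and the same null positions as needles.
// Assumption: the haystack contains no nulls.
std::vector<std::optional<bool>> host_contains_column(
  std::vector<int> const& haystack,
  std::vector<std::optional<int>> const& needles)
{
  std::unordered_set<int> const lookup(haystack.begin(), haystack.end());

  std::vector<std::optional<bool>> result;
  result.reserve(needles.size());
  for (auto const& needle : needles) {
    if (!needle.has_value()) {
      result.push_back(std::nullopt);  // propagate the null row
    } else {
      result.push_back(lookup.count(*needle) > 0);
    }
  }
  return result;
}
```

With the haystack { 10, 20, 30, 40, 50 } and needles { 20, 40, null, 60 }, the sketch produces { true, true, null, false }.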
