Functions
std::unique_ptr< column >	cudf::dictionary::encode (column_view const &column, data_type indices_type=data_type{type_id::INT32}, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Construct a dictionary column by dictionary-encoding an existing column.

std::unique_ptr< column >	cudf::dictionary::decode (dictionary_column_view const &dictionary_column, rmm::cuda_stream_view stream=cudf::get_default_stream(), rmm::device_async_resource_ref mr=cudf::get_current_device_resource_ref())
	Create a new column by gathering the keys of the given dictionary column, using that column's indices.

Detailed Description

Function Documentation

std::unique_ptr<column> cudf::dictionary::decode	(	dictionary_column_view const &	dictionary_column,
		rmm::cuda_stream_view	stream = `cudf::get_default_stream()`,
		rmm::device_async_resource_ref	mr = `cudf::get_current_device_resource_ref()`
	)

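Create a new column by gathering the keys of the given dictionary column, using that column's indices. Each output row i holds keys[indices[i]], so the result has one row per index.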

d1 = {["a", "c", "d"], [2, 0, 1, 0]}
s = decode(d1)
s is now ["d", "a", "c", "a"]
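
The gather step in the example above can be sketched on the host with plain standard-library containers. This is only an illustration of the semantics, not libcudf's implementation: `host_dictionary` and `host_decode` are hypothetical names introduced here, and the real function works on device memory, honors the null mask, and takes the `stream` and `mr` arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a dictionary column:
// a keys vector plus one index per row. Not a libcudf type.
template <typename T>
struct host_dictionary {
  std::vector<T> keys;
  std::vector<std::int32_t> indices;
};

// Gather: output row i is keys[indices[i]].
template <typename T>
std::vector<T> host_decode(host_dictionary<T> const& d)
{
  std::vector<T> out;
  out.reserve(d.indices.size());
  for (std::int32_t idx : d.indices) {
    // Every index must point at an existing key.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < d.keys.size());
    out.push_back(d.keys[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

Calling `host_decode` on a `host_dictionary<std::string>` with keys `{"a", "c", "d"}` and indices `{2, 0, 1, 0}` walks the indices in order and looks each one up in the keys.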

Parameters

std::unique_ptr<column> cudf::dictionary::encode	(	column_view const &	column,
		data_type	indices_type = `data_type{type_id::INT32}`,
		rmm::cuda_stream_view	stream = `cudf::get_default_stream()`,
		rmm::device_async_resource_ref	mr = `cudf::get_current_device_resource_ref()`
	)

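Construct a dictionary column by dictionary-encoding an existing column.

The output column is of DICTIONARY type. Its keys column holds non-null, unique values in a strict total order: keys[i] is ordered before keys[i+1] for all i in [0, n-1), where n is the number of keys.

The output column has a child indices column of integer type with the same size as the input column.

The null mask and null count are copied from the input column to the output column.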

Exceptions

cudf::logic_error	if the indices type is not a signed integer type
cudf::logic_error	if the column to encode is already of DICTIONARY type

c = [429, 111, 213, 111, 213, 429, 213]
d = encode(c)
d now has keys [111, 213, 429] and indices [2, 0, 1, 0, 1, 2, 1]
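
A host-side sketch of the encoding described above: sort and de-duplicate the input to get the keys, then binary-search each row to get its index. It assumes an `int` column with no nulls. `host_encoded` and `host_encode` are hypothetical names for illustration only, and the sort-and-search approach is one simple way to reproduce the documented result, not necessarily how libcudf computes it on the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side result type: sorted unique keys plus one
// INT32 index per input row. Not a libcudf type.
struct host_encoded {
  std::vector<int> keys;
  std::vector<std::int32_t> indices;
};

host_encoded host_encode(std::vector<int> const& column)
{
  host_encoded out;

  // Keys: the distinct input values in strictly increasing order.
  out.keys = column;
  std::sort(out.keys.begin(), out.keys.end());
  out.keys.erase(std::unique(out.keys.begin(), out.keys.end()), out.keys.end());

  // Indices: the position of each input value within the sorted keys.
  out.indices.reserve(column.size());
  for (int value : column) {
    auto it = std::lower_bound(out.keys.begin(), out.keys.end(), value);
    assert(it != out.keys.end() && *it == value);  // every value is a key
    out.indices.push_back(static_cast<std::int32_t>(it - out.keys.begin()));
  }
  return out;
}
```

Because the keys are sorted and unique, the position returned by `std::lower_bound` identifies each value uniquely, and the indices vector always has the same length as the input.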

Parameters