聚合滚动#

group 滚动窗口

类型定义

using range_window_type = std::variant<unbounded, current_row, bounded_closed, bounded_open>#: 基于范围的滚动窗口端点的类型。

函数

std::pair<std::unique_ptr<column>, std::unique_ptr<column>> make_range_windows(table_view const &group_keys, column_view const &orderby, order order, null_order null_order, range_window_type preceding, range_window_type following, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

根据窗口范围规范构造前置和后置列。

参数：

group_keys – 定义组的已排序键的可能为空的表。
orderby – 定义窗口范围的列。必须已排序。如果 group_keys 不为空，则必须按组排序。
order – orderby 列的排序顺序。
null_order – 已排序的 orderby 列中的空值排序顺序。如果 group_keys 不为空，则按组应用。
preceding – 前置窗口的类型。
following – 后置窗口的类型。
stream – 用于设备内存操作和内核启动的 CUDA 流
mr – 用于分配返回列的设备内存的设备内存资源

返回：

前置和后置列对，它们定义了每行的窗口边界，适合传递给 rolling_window 函数。

std::unique_ptr<column> rolling_window(column_view const &input, size_type preceding_window, size_type following_window, size_type min_periods, rolling_aggregation const &agg, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

对列中的值应用固定大小的滚动窗口函数。

此函数对输入列中每个元素 i 周围窗口中的值进行聚合，并在观察值不足时使元素 i 的位掩码无效。窗口大小是静态的（对每个元素都相同）。这与 Pandas 的 DataFrame.rolling API 相匹配，但有一些显著差异：

代替中心标志，它使用两部分窗口以允许更灵活的窗口。总窗口大小 = preceding_window + following_window。元素 i 使用元素 [i-preceding_window+1, i+following_window] 进行窗口计算。
此函数不为不满足最小观察值数量的输出行存储 NA/NaN，而是更新列的有效位掩码以指示哪些元素有效。

关于返回列类型的说明

计数聚合的返回列始终具有 INT32 类型。
VARIANCE/STD 聚合的返回列始终具有 FLOAT64 类型。
所有其他运算符返回与输入类型相同的列。因此，建议在进行滚动 MEAN 计算之前，将整数列类型（尤其是低精度整数）转换为 FLOAT32 或 FLOAT64。

注意

输入端点附近的窗口会自动被限制在边界内。

参数：

input – [in] 输入列
preceding_window – [in] 向后方向的静态滚动窗口大小
following_window – [in] 向前方向的静态滚动窗口大小
min_periods – [in] 窗口中需要有值的最小观察值数量，否则元素 i 为 null。
agg – [in] 滚动窗口聚合类型（SUM, MAX, MIN 等）
stream – [in] 用于设备内存操作和内核启动的 CUDA 流
mr – [in] 用于分配返回列的设备内存的设备内存资源

返回：

一个可为空的输出列，包含滚动窗口结果

std::unique_ptr<column> rolling_window(column_view const &input, column_view const &default_outputs, size_type preceding_window, size_type following_window, size_type min_periods, rolling_aggregation const &agg, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

对列中的值应用固定大小的滚动窗口函数。

代替中心标志，它使用两部分窗口以允许更灵活的窗口。总窗口大小 = preceding_window + following_window。元素 i 使用元素 [i-preceding_window+1, i+following_window] 进行窗口计算。
此函数不为不满足最小观察值数量的输出行存储 NA/NaN，而是更新列的有效位掩码以指示哪些元素有效。

关于返回列类型的说明

计数聚合的返回列始终具有 INT32 类型。
VARIANCE/STD 聚合的返回列始终具有 FLOAT64 类型。
所有其他运算符返回与输入类型相同的列。因此，建议在进行滚动 MEAN 计算之前，将整数列类型（尤其是低精度整数）转换为 FLOAT32 或 FLOAT64。

注意

输入端点附近的窗口会自动被限制在边界内。

参数：

input – [in] 输入列
preceding_window – [in] 向后方向的静态滚动窗口大小
following_window – [in] 向前方向的静态滚动窗口大小
min_periods – [in] 窗口中需要有值的最小观察值数量，否则元素 i 为 null。
agg – [in] 滚动窗口聚合类型（SUM, MAX, MIN 等）
stream – [in] 用于设备内存操作和内核启动的 CUDA 流
mr – [in] 用于分配返回列的设备内存的设备内存资源
default_outputs – 每行的默认值列，将作为返回值而非 null。用于 LEAD()/LAG()，如果行偏移量超出列边界。

返回：

一个可为空的输出列，包含滚动窗口结果

std::unique_ptr<column> grouped_rolling_window(table_view const &group_keys, column_view const &input, size_type preceding_window, size_type following_window, size_type min_periods, rolling_aggregation const &aggr, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

对列中的值应用分组感知、固定大小的滚动窗口函数。

类似于 rolling_window()，此函数对指定 input 列中每个元素周围窗口中的值进行聚合。它与 rolling_window() 的不同之处在于，input 列的元素被分组到不同的组中（例如 groupby 的结果）。窗口聚合不能跨越组边界。对于 input 的第 i 行，组由 group_keys 下的列的对应值（即第 i 个值）确定。

注意：此方法要求行已按 group_key 值预排序。

Example: Consider a user-sales dataset, where the rows look as follows:
{ "user_id", sales_amt, day }

The `grouped_rolling_window()` method enables windowing queries such as grouping a dataset by
`user_id`, and summing up the `sales_amt` column over a window of 3 rows (2 preceding (including
current row), 1 row following).

In this example,
   1. `group_keys == [ user_id ]`
   2. `input == sales_amt`
The data are grouped by `user_id`, and ordered by `day`-string. The aggregation
(SUM) is then calculated for a window of 3 values around (and including) each row.

For the following input:

 [ // user,  sales_amt
   { "user1",   10      },
   { "user2",   20      },
   { "user1",   20      },
   { "user1",   10      },
   { "user2",   30      },
   { "user2",   80      },
   { "user1",   50      },
   { "user1",   60      },
   { "user2",   40      }
 ]

Partitioning (grouping) by `user_id` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):

   [ 10,  20,  10,  50,  60,  20,  30,  80,  40 ]
     <-------user1-------->|<------user2------->

The SUM aggregation is applied with 1 preceding and 1 following
row, with a minimum of 1 period. The aggregation window is thus 3 rows wide,
yielding the following column:

   [ 30, 40,  80, 120, 110,  50, 130, 150, 120 ]

Note: The SUMs calculated at the group boundaries (i.e. indices 0, 4, 5, and 8)
consider only 2 values each, in spite of the window-size being 3.
Each aggregation operation cannot cross group boundaries.

当 op == COUNT 时，返回列始终具有 INT32 类型。所有其他运算符返回与输入类型相同的列。因此，建议在进行滚动 MEAN 计算之前，将整数列类型（尤其是低精度整数）转换为 FLOAT32 或 FLOAT64。

注意：preceding_window 和 following_window 可能具有负值。这将产生当前行可能完全不包含在内的窗口。例如，考虑一个窗口定义为 (preceding=3, following=-1)。这会产生一个窗口，其范围从当前行 *之前的* 2 行（即 3-1）到当前行 *之前的* 1 行。对于上面的示例，行 #3 的窗口是

[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ] <--窗口--> ^ | current_row

类似地，preceding 可能具有负值，表示窗口在当前行之后的位置开始。它与 following 的语义略有不同，因为 preceding 包含当前行。因此

preceding=1 => 窗口从当前行开始。
preceding=0 => 窗口从当前行之后 1 位开始。
preceding=-1 => 窗口从当前行之后 2 位开始。依此类推。

参数：

group_keys – [in] （预排序的）分组列
input – [in] 输入列（待聚合）
preceding_window – [in] 向后方向的静态滚动窗口大小（正值表示），或向前方向（负值表示）
following_window – [in] 向前方向的静态滚动窗口大小（正值表示），或向后方向（负值表示）
min_periods – [in] 窗口中需要有值的最小观察值数量，否则元素 i 为 null。
aggr – [in] 滚动窗口聚合类型（SUM, MAX, MIN 等）
stream – [in] 用于设备内存操作和内核启动的 CUDA 流
mr – [in] 用于分配返回列的设备内存的设备内存资源

返回：

一个可为空的输出列，包含滚动窗口结果

std::unique_ptr<column> grouped_rolling_window(table_view const &group_keys, column_view const &input, window_bounds preceding_window, window_bounds following_window, size_type min_periods, rolling_aggregation const &aggr, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

对列中的值应用分组感知、固定大小的滚动窗口函数。

注意：此方法要求行已按 group_key 值预排序。

Example: Consider a user-sales dataset, where the rows look as follows:
{ "user_id", sales_amt, day }

The `grouped_rolling_window()` method enables windowing queries such as grouping a dataset by
`user_id`, and summing up the `sales_amt` column over a window of 3 rows (2 preceding (including
current row), 1 row following).

In this example,
   1. `group_keys == [ user_id ]`
   2. `input == sales_amt`
The data are grouped by `user_id`, and ordered by `day`-string. The aggregation
(SUM) is then calculated for a window of 3 values around (and including) each row.

For the following input:

 [ // user,  sales_amt
   { "user1",   10      },
   { "user2",   20      },
   { "user1",   20      },
   { "user1",   10      },
   { "user2",   30      },
   { "user2",   80      },
   { "user1",   50      },
   { "user1",   60      },
   { "user2",   40      }
 ]

Partitioning (grouping) by `user_id` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):

   [ 10,  20,  10,  50,  60,  20,  30,  80,  40 ]
     <-------user1-------->|<------user2------->

The SUM aggregation is applied with 1 preceding and 1 following
row, with a minimum of 1 period. The aggregation window is thus 3 rows wide,
yielding the following column:

   [ 30, 40,  80, 120, 110,  50, 130, 150, 120 ]

Note: The SUMs calculated at the group boundaries (i.e. indices 0, 4, 5, and 8)
consider only 2 values each, in spite of the window-size being 3.
Each aggregation operation cannot cross group boundaries.

[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ] <--窗口--> ^ | current_row

类似地，preceding 可能具有负值，表示窗口在当前行之后的位置开始。它与 following 的语义略有不同，因为 preceding 包含当前行。因此

preceding=1 => 窗口从当前行开始。
preceding=0 => 窗口从当前行之后 1 位开始。
preceding=-1 => 窗口从当前行之后 2 位开始。依此类推。

参数：

group_keys – [in] （预排序的）分组列
input – [in] 输入列（待聚合）
preceding_window – [in] 向后方向的静态滚动窗口大小（正值表示），或向前方向（负值表示）
following_window – [in] 向前方向的静态滚动窗口大小（正值表示），或向后方向（负值表示）
min_periods – [in] 窗口中需要有值的最小观察值数量，否则元素 i 为 null。
aggr – [in] 滚动窗口聚合类型（SUM, MAX, MIN 等）
stream – [in] 用于设备内存操作和内核启动的 CUDA 流
mr – [in] 用于分配返回列的设备内存的设备内存资源

返回：

一个可为空的输出列，包含滚动窗口结果

std::unique_ptr<column> grouped_rolling_window(table_view const &group_keys, column_view const &input, column_view const &default_outputs, size_type preceding_window, size_type following_window, size_type min_periods, rolling_aggregation const &aggr, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

对列中的值应用分组感知、固定大小的滚动窗口函数。

注意：此方法要求行已按 group_key 值预排序。

Example: Consider a user-sales dataset, where the rows look as follows:
{ "user_id", sales_amt, day }

The `grouped_rolling_window()` method enables windowing queries such as grouping a dataset by
`user_id`, and summing up the `sales_amt` column over a window of 3 rows (2 preceding (including
current row), 1 row following).

In this example,
   1. `group_keys == [ user_id ]`
   2. `input == sales_amt`
The data are grouped by `user_id`, and ordered by `day`-string. The aggregation
(SUM) is then calculated for a window of 3 values around (and including) each row.

For the following input:

 [ // user,  sales_amt
   { "user1",   10      },
   { "user2",   20      },
   { "user1",   20      },
   { "user1",   10      },
   { "user2",   30      },
   { "user2",   80      },
   { "user1",   50      },
   { "user1",   60      },
   { "user2",   40      }
 ]

Partitioning (grouping) by `user_id` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):

   [ 10,  20,  10,  50,  60,  20,  30,  80,  40 ]
     <-------user1-------->|<------user2------->

The SUM aggregation is applied with 1 preceding and 1 following
row, with a minimum of 1 period. The aggregation window is thus 3 rows wide,
yielding the following column:

   [ 30, 40,  80, 120, 110,  50, 130, 150, 120 ]

Note: The SUMs calculated at the group boundaries (i.e. indices 0, 4, 5, and 8)
consider only 2 values each, in spite of the window-size being 3.
Each aggregation operation cannot cross group boundaries.

[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ] <--窗口--> ^ | current_row

类似地，preceding 可能具有负值，表示窗口在当前行之后的位置开始。它与 following 的语义略有不同，因为 preceding 包含当前行。因此

preceding=1 => 窗口从当前行开始。
preceding=0 => 窗口从当前行之后 1 位开始。
preceding=-1 => 窗口从当前行之后 2 位开始。依此类推。

参数：

group_keys – [in] （预排序的）分组列
input – [in] 输入列（待聚合）
preceding_window – [in] 向后方向的静态滚动窗口大小（正值表示），或向前方向（负值表示）
following_window – [in] 向前方向的静态滚动窗口大小（正值表示），或向后方向（负值表示）
min_periods – [in] 窗口中需要有值的最小观察值数量，否则元素 i 为 null。
aggr – [in] 滚动窗口聚合类型（SUM, MAX, MIN 等）
stream – [in] 用于设备内存操作和内核启动的 CUDA 流
mr – [in] 用于分配返回列的设备内存的设备内存资源
default_outputs – 每行的默认值列，将作为返回值而非 null。用于 LEAD()/LAG()，如果行偏移量超出列或组边界。

返回：

一个可为空的输出列，包含滚动窗口结果

std::unique_ptr<column> grouped_rolling_window(table_view const &group_keys, column_view const &input, column_view const &default_outputs, window_bounds preceding_window, window_bounds following_window, size_type min_periods, rolling_aggregation const &aggr, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

对列中的值应用分组感知、固定大小的滚动窗口函数。

注意：此方法要求行已按 group_key 值预排序。

Example: Consider a user-sales dataset, where the rows look as follows:
{ "user_id", sales_amt, day }

The `grouped_rolling_window()` method enables windowing queries such as grouping a dataset by
`user_id`, and summing up the `sales_amt` column over a window of 3 rows (2 preceding (including
current row), 1 row following).

In this example,
   1. `group_keys == [ user_id ]`
   2. `input == sales_amt`
The data are grouped by `user_id`, and ordered by `day`-string. The aggregation
(SUM) is then calculated for a window of 3 values around (and including) each row.

For the following input:

 [ // user,  sales_amt
   { "user1",   10      },
   { "user2",   20      },
   { "user1",   20      },
   { "user1",   10      },
   { "user2",   30      },
   { "user2",   80      },
   { "user1",   50      },
   { "user1",   60      },
   { "user2",   40      }
 ]

Partitioning (grouping) by `user_id` yields the following `sales_amt` vector
(with 2 groups, one for each distinct `user_id`):

   [ 10,  20,  10,  50,  60,  20,  30,  80,  40 ]
     <-------user1-------->|<------user2------->

The SUM aggregation is applied with 1 preceding and 1 following
row, with a minimum of 1 period. The aggregation window is thus 3 rows wide,
yielding the following column:

   [ 30, 40,  80, 120, 110,  50, 130, 150, 120 ]

Note: The SUMs calculated at the group boundaries (i.e. indices 0, 4, 5, and 8)
consider only 2 values each, in spite of the window-size being 3.
Each aggregation operation cannot cross group boundaries.

[ 10, 20, 10, 50, 60, 20, 30, 80, 40 ] <--窗口--> ^ | current_row

类似地，preceding 可能具有负值，表示窗口在当前行之后的位置开始。它与 following 的语义略有不同，因为 preceding 包含当前行。因此

preceding=1 => 窗口从当前行开始。
preceding=0 => 窗口从当前行之后 1 位开始。
preceding=-1 => 窗口从当前行之后 2 位开始。依此类推。

参数：

group_keys – [in] （预排序的）分组列
input – [in] 输入列（待聚合）
preceding_window – [in] 向后方向的静态滚动窗口大小（正值表示），或向前方向（负值表示）
following_window – [in] 向前方向的静态滚动窗口大小（正值表示），或向后方向（负值表示）
min_periods – [in] 窗口中需要有值的最小观察值数量，否则元素 i 为 null。
aggr – [in] 滚动窗口聚合类型（SUM, MAX, MIN 等）
stream – [in] 用于设备内存操作和内核启动的 CUDA 流
mr – [in] 用于分配返回列的设备内存的设备内存资源
default_outputs – 每行的默认值列，将作为返回值而非 null。用于 LEAD()/LAG()，如果行偏移量超出列或组边界。

返回：

一个可为空的输出列，包含滚动窗口结果

std::unique_ptr<column> grouped_range_rolling_window(table_view const &group_keys, column_view const &orderby_column, cudf::order const &order, column_view const &input, range_window_bounds const &preceding, range_window_bounds const &following, size_type min_periods, rolling_aggregation const &aggr, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

对列中的值应用分组感知、基于值范围的滚动窗口函数。

此函数对指定 input 列中每个元素周围窗口中的行进行聚合。窗口基于有序 orderby 列的值，以及代表有序列值包含范围的 preceding 和 following 标量值来确定。

input 列的元素被分组到不同的组中（例如 groupby 的结果），由 group_keys 下的列的对应值确定。窗口聚合不能跨越组边界。
在组内，所有行按 orderby 列排序后，索引为 i 的行的聚合窗口确定如下：a) 如果 orderby 是 ASCENDING，行 i 的聚合窗口包括索引为 j 的所有 input 行，使得
```
(orderby[i] - preceding) <= orderby[j] <= orderby[i] + following
```
b) 如果 orderby 是 DESCENDING，行 i 的聚合窗口包括索引为 j 的所有 input 行，使得
```
(orderby[i] + preceding) >= orderby[j] >= orderby[i] - following
```

注意：此方法要求行已按组键和 orderby 列值预排序。

窗口间隔指定为适合 orderby 列的标量值。目前，仅支持以下 orderby 列类型和范围类型的组合：

如果 orderby 列是 TIMESTAMP 类型，则 preceding / following 窗口以相同分辨率的 DURATION 标量值指定。例如，对于 TIMESTAMP_SECONDS 类型的 orderby 列，间隔只能是 DURATION_SECONDS。不能使用更高分辨率（例如 DURATION_NANOSECONDS）或更低分辨率（例如 DURATION_DAYS）的持续时间。
如果 orderby 列是整数类型（例如 INT32），则 preceding / following 应为完全相同的类型 (INT32)。

Example: Consider a motor-racing statistics dataset, containing the following columns:
  1. driver_name:   (STRING) Name of the car driver
  2. num_overtakes: (INT32)  Number of times the driver overtook another car in a lap
  3. lap_number:    (INT32)  The number of the lap

The `group_range_rolling_window()` function allows one to calculate the total number of overtakes
each driver made within any 3 lap window of each entry:
  1. Group/partition the dataset by `driver_id` (This is the group_keys argument.)
  2. Sort each group by the `lap_number` (i.e. This is the orderby_column.)
  3. Calculate the SUM(num_overtakes) over a window (preceding=1, following=1)

For the following input:

 [ // driver_name,  num_overtakes,  lap_number
   {   "bottas",        1,            1        },
   {   "hamilton",      2,            1        },
   {   "bottas",        2,            2        },
   {   "bottas",        1,            3        },
   {   "hamilton",      3,            1        },
   {   "hamilton",      8,            2        },
   {   "bottas",        5,            7        },
   {   "bottas",        6,            8        },
   {   "hamilton",      4,            4        }
 ]

Partitioning (grouping) by `driver_name`, and ordering by `lap_number` yields the following
`num_overtakes` vector (with 2 groups, one for each distinct `driver_name`):

lap_number:      [ 1,  2,  3,  7,  8,   1,  1,   2,  4 ]
num_overtakes:   [ 1,  2,  1,  5,  6,   2,  3,   8,  4 ]
                   <-----bottas------>|<----hamilton--->

The SUM aggregation is applied, with 1 preceding, and 1 following, with a minimum of 1
period. The aggregation window is thus 3 (laps) wide, yielding the following output column:

 Results:        [ 3,  4,  3,  11, 11,  13, 13,  13,  4 ]

注意：参与每个窗口的行数可能因组内索引、日期戳和 min_periods 而异。例如

results[0] 考虑 2 个值，因为它位于其组的开头，并且没有前置值。
results[5] 考虑 3 个值，尽管它位于其组的开头。根据其 orderby_column 值，它必须包含 2 个后置值。

每个聚合操作不能跨越组边界。

返回列的类型取决于输入列类型 T 和聚合类型

COUNT 返回 INT32 列
MIN/MAX 返回 T 列
SUM 返回 T 的提升类型。对 INT32 求和会产生 INT64。
MEAN 返回 FLOAT64 列
COLLECT 返回 LIST<T> 类型的列。

LEAD/LAG/ROW_NUMBER 对于范围查询是未定义的。

参数：

group_keys – [in] （预排序的）分组列
orderby_column – [in] （预排序的）用于范围比较的排序依据列
order – [in] 排序依据列的排序顺序 (ASCENDING/DESCENDING)
input – [in] 输入列（待聚合）
preceding – [in] 向后方向的间隔值
following – [in] 向前方向的间隔值
min_periods – [in] 窗口中需要有值的最小观察值数量，否则元素 i 为 null。
aggr – [in] 滚动窗口聚合类型（SUM, MAX, MIN 等）
stream – [in] 用于设备内存操作和内核启动的 CUDA 流
mr – [in] 用于分配返回列的设备内存的设备内存资源

返回：

一个可为空的输出列，包含滚动窗口结果

std::unique_ptr<column> rolling_window(column_view const &input, column_view const &preceding_window, column_view const &following_window, size_type min_periods, rolling_aggregation const &agg, rmm::cuda_stream_view stream = cudf::get_default_stream(), rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref())#

对列中的值应用可变大小的滚动窗口函数。

此函数对输入列中每个元素 i 周围窗口中的值进行聚合，并在观察值不足时使元素 i 的位掩码无效。窗口大小是动态的（随每个元素变化）。这与 Pandas 的 DataFrame.rolling API 相匹配，但有一些显著差异：

代替中心标志，它使用两部分窗口以允许更灵活的窗口。总窗口大小 = preceding_window + following_window。元素 i 使用元素 [i-preceding_window+1, i+following_window] 进行窗口计算。
此函数不为不满足最小观察值数量的输出行存储 NA/NaN，而是更新列的有效位掩码以指示哪些元素有效。
支持动态滚动窗口，即可以使用附加数组指定每个元素的窗口大小。

计数聚合的返回列始终具有 INT32 类型。所有其他运算符返回与输入类型相同的列。因此，建议在进行滚动 MEAN 计算之前，将整数列类型（尤其是低精度整数）转换为 FLOAT32 或 FLOAT64。

注意

preceding_window 和 following_window 中的所有条目必须产生位于 input 边界内的窗口范围。也就是说，对于所有 i，要求由区间 [i - preceding_window[i] + 1, ..., i + following_window[i] + 1) 定义的行集合是 [0, input.size()) 的子集。

抛出异常：

cudf::logic_error – 如果窗口列类型不是 INT32。

参数：

input – [in] 输入列
preceding_window – [in] 一个非空值的 INT32 列，表示向前方向的窗口大小。preceding_window[i] 指定元素 i 的前置窗口大小。
following_window – [in] 一个非空值的 INT32 列，表示向后方向的窗口大小。following_window[i] 指定元素 i 的后置窗口大小。
min_periods – [in] 窗口中需要有值的最小观察值数量，否则元素 i 为 null。
agg – [in] 滚动窗口聚合类型（sum, max, min 等）
stream – [in] 用于设备内存操作和内核启动的 CUDA 流
mr – [in] 用于分配返回列的设备内存的设备内存资源

返回：

一个可为空的输出列，包含滚动窗口结果

struct range_window_bounds#

#include <range_window_bounds.hpp>

用于窗口边界大小的抽象，与 grouped_range_rolling_window() 一起使用。

类似于 grouped_rolling_window() 中的 window_bounds，range_window_bounds 表示与 grouped_range_rolling_window() 一起使用的窗口边界。窗口可以指定为以下之一：

固定宽度的数字标量值。例如：a) DURATION_DAYS 标量，用于 TIMESTAMP_DAYS orderby 列 b) INT32 标量，用于 INT32 orderby 列
“unbounded”，表示边界延伸到组中的第一行/最后一行。
“current row”，表示边界在组中与当前行值匹配的第一行/最后一行结束。

公共类型

enum class extent_type : int32_t#

range_window_bounds 的类型。

值

enumerator CURRENT_ROW#

enumerator BOUNDED#: 边界定义为与当前行匹配的第一行/最后一行。

enumerator UNBOUNDED#

边界延伸到整个组中的第一行/最后一行。

边界定义为在与当前行指定范围内的第一行/最后一行。

公共函数

inline bool is_current_row() const#

窗口是否绑定到当前行。

返回：: true 如果窗口绑定到当前行
返回：: false 如果窗口未绑定到当前行

inline bool is_unbounded() const#

窗口是否是无界的。

返回：: true 如果窗口是无界的
返回：: false 如果窗口是有限边界的

inline scalar const &range_scalar() const#

返回边界的基础标量值。

返回：: 边界的基础标量值

range_window_bounds(range_window_bounds const&) = default#: 复制构造函数。

公共静态函数

static range_window_bounds get(scalar const &boundary, rmm::cuda_stream_view stream = cudf::get_default_stream())#

工厂方法，用于构造有界窗口边界。

参数：

boundary – 有限窗口边界
stream – 用于设备内存操作和内核启动的 CUDA 流

返回：

一个有界窗口边界对象

static range_window_bounds current_row(data_type type, rmm::cuda_stream_view stream = cudf::get_default_stream())#

构造一个窗口边界的工厂方法，该边界限制为当前行值。

参数：

type – 窗口边界的数据类型
stream – 用于设备内存操作和内核启动的 CUDA 流

返回：

一个“当前行”窗口边界对象

static range_window_bounds unbounded(data_type type, rmm::cuda_stream_view stream = cudf::get_default_stream())#

构造一个无界窗口边界的工厂方法。

参数：

type – 窗口边界的数据类型
stream – 用于设备内存操作和内核启动的 CUDA 流

返回：

一个无界窗口边界对象

struct bounded_closed#

#include <rolling.hpp>

有界闭合滚动窗口的强类型包装器。

此窗口的端点包含在内。

参数 delta:: 与当前行的标量差值。必须有效，否则行为未定义。如果标量表示浮点类型，则该值既不能是 inf 也不能是 nan，否则行为未定义。

公共函数

inline cudf::scalar const *delta() const noexcept#

返回行差值标量的指针。

返回：: 指向标量的指针，非空。

公共成员

cudf::scalar const &delta_#: 窗口中与当前行的差值。必须有效，否则行为未定义。

struct bounded_open#

#include <rolling.hpp>

有界开放滚动窗口的强类型包装器。

此窗口的端点排除在外。

参数 delta:: 与当前行的标量差值。必须有效，否则行为未定义。如果标量表示浮点类型，则该值既不能是 inf 也不能是 nan，否则行为未定义。

公共函数

inline cudf::scalar const *delta() const noexcept#

返回行差值标量的指针。

返回：: 指向标量的指针，非空。

公共成员

cudf::scalar const &delta_#: 窗口中与当前行的差值。必须有效，否则行为未定义。类似地，如果差值是浮点类型，则该值既不能是 inf 也不能是 nan，否则行为未定义。

struct unbounded#

#include <rolling.hpp>

无界滚动窗口的强类型包装器。

此窗口运行到当前行组的开始/结束。

公共函数

inline constexpr cudf::scalar const *delta() const noexcept#

返回一个空行差值。

返回：: nullptr

struct current_row#

#include <rolling.hpp>

current_row 滚动窗口的强类型包装器。

此窗口包含与当前行相等的所有行。

公共函数

inline constexpr cudf::scalar const *delta() const noexcept#

返回一个空行差值。

返回：: nullptr

struct window_bounds#

#include <rolling.hpp>

窗口边界大小的抽象。

公共函数

inline bool is_unbounded() const#

此 window_bounds 是否为无界。

返回：: 如果窗口边界无界，则为 true。
返回：: 如果窗口边界具有有限行边界，则为 false。

inline size_type value() const#

获取此 window_bounds 的行边界。

返回：: 行边界值（以天或行为单位）

公共静态函数

static inline window_bounds get(size_type value)#

构造有界窗口边界。

参数：: value – 有限窗口边界（以天或行为单位）
返回：: 一个窗口边界

static inline window_bounds unbounded()#

构造无界窗口边界。

返回：: window_bounds

聚合滚动#

本页