DataFrame#

Constructor#

DataFrame([data, index, columns, dtype, ...])

A GPU DataFrame object.

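Attributes and underlying data#

Axes

`DataFrame.axes`	Return a list representing the axes of the DataFrame.
`DataFrame.index`	Get the labels for the rows.
`DataFrame.columns`	Return a tuple of columns.

`DataFrame.dtypes`	Return the dtypes in this object.
`DataFrame.info`([verbose, buf, max_cols, ...])	Print a concise summary of a DataFrame.
`DataFrame.select_dtypes`([include, exclude])	Return a subset of the DataFrame's columns based on the column dtypes.
`DataFrame.values`	Return a CuPy representation of the DataFrame.
`DataFrame.ndim`	Dimension of the data.
`DataFrame.size`	Return the number of elements in the underlying data.
`DataFrame.shape`	Return a tuple representing the dimensionality of the DataFrame.
`DataFrame.memory_usage`([index, deep])	Return the memory usage of the object.
`DataFrame.empty`	Indicate whether the DataFrame or Series is empty.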
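
The shape-related attributes above can be read directly from a small frame. This is a minimal sketch, not an authoritative recipe: the `xdf` alias and the fallback to pandas when cuDF is not installed are assumptions made so the snippet runs on a machine without a GPU, and it uses only attributes that exist under the same names in both libraries.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption): same attribute names

df = xdf.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

n_rows, n_cols = df.shape        # rows, columns
n_dims = df.ndim                 # a DataFrame is always 2-dimensional
n_elements = df.size             # rows * columns
column_names = list(df.columns)
dtype_names = {name: str(dtype) for name, dtype in df.dtypes.items()}
is_empty = df.empty
```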

Conversion#

`DataFrame.astype`(dtype[, copy, errors])	Cast the object to the given dtype.
`DataFrame.convert_dtypes`([infer_objects, ...])	Convert columns to the best possible nullable dtypes.
`DataFrame.copy`([deep])	Make a copy of this object's indices and data.
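
As an illustration of `astype` and `copy`, the sketch below casts a column and shows that a deep copy is unaffected by later changes to the original. The `xdf` alias and the pandas fallback are assumptions for running without a GPU.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame({"a": [1, 2, 3]})

as_float = df.astype("float32")   # new frame, original dtype untouched
snapshot = df.copy(deep=True)     # independent copy of index and data

df["a"] = df["a"] * 10            # modify the original only
```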

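Indexing, iteration#

`DataFrame.head`([n])	Return the first n rows.
`DataFrame.at`	Alias for DataFrame.loc; provided for compatibility with Pandas.
`DataFrame.iat`	Alias for DataFrame.iloc; provided for compatibility with Pandas.
`DataFrame.loc`	Select rows and columns by label or boolean mask.
`DataFrame.iloc`	Select values by position.
`DataFrame.insert`(loc, column, value[, ...])	Add a column to the DataFrame at the index specified by loc.
`DataFrame.__iter__`()
`DataFrame.items`()	Iterate over (column name, Series) pairs.
`DataFrame.keys`()	Get the columns.
`DataFrame.iterrows`()	Iteration is unsupported.
`DataFrame.itertuples`([index, name])	Iteration is unsupported.
`DataFrame.pop`(item)	Return a column and drop it from the DataFrame.
`DataFrame.tail`([n])	Return the last n rows as a new DataFrame or Series.
`DataFrame.isin`(values)	Whether each element in the DataFrame is contained in values.
`DataFrame.squeeze`([axis])	Squeeze 1-dimensional axis objects into scalars.
`DataFrame.where`(cond[, other, inplace, ...])	Replace values where the condition is False.
`DataFrame.mask`(cond[, other, inplace, axis, ...])	Replace values where the condition is True.
`DataFrame.query`(expr[, local_dict])	Query with a boolean expression, using Numba to compile a GPU kernel.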
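
The label-based, position-based, and mask-based selectors listed above might be used as follows. This is a sketch under stated assumptions: the `xdf` alias and the pandas fallback exist only so the code runs without a GPU, and the sample data is made up.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],
)

by_label = df.loc[["x", "y"], ["b"]]   # rows "x" and "y", column "b"
by_position = df.iloc[0:2]             # first two rows by position
filtered = df.loc[df["a"] > 2]         # boolean mask on column "a"
first_two = df.head(2)
```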

Binary operator functions#

`DataFrame.add`(other[, axis, level, fill_value])	Get the element-wise addition of DataFrame or Series and other (binary operator add).
`DataFrame.sub`(other[, axis, level, fill_value])	Get the element-wise subtraction of DataFrame or Series and other (binary operator sub).
`DataFrame.subtract`(other[, axis, level, ...])	Get the element-wise subtraction of DataFrame or Series and other (binary operator sub).
`DataFrame.mul`(other[, axis, level, fill_value])	Get the element-wise multiplication of DataFrame or Series and other (binary operator mul).
`DataFrame.multiply`(other[, axis, level, ...])	Get the element-wise multiplication of DataFrame or Series and other (binary operator mul).
`DataFrame.truediv`(other[, axis, level, ...])	Get the element-wise floating division of DataFrame or Series and other (binary operator truediv).
`DataFrame.div`(other[, axis, level, fill_value])	Get the element-wise floating division of DataFrame or Series and other (binary operator truediv).
`DataFrame.divide`(other[, axis, level, ...])	Get the element-wise floating division of DataFrame or Series and other (binary operator truediv).
`DataFrame.floordiv`(other[, axis, level, ...])	Get the element-wise integer division of DataFrame or Series and other (binary operator floordiv).
`DataFrame.mod`(other[, axis, level, fill_value])	Get the element-wise modulo of DataFrame or Series and other (binary operator mod).
`DataFrame.pow`(other[, axis, level, fill_value])	Get the element-wise exponential power of DataFrame or Series and other (binary operator pow).
`DataFrame.dot`(other[, reflect])	Get the dot product of the frame and other (binary operator dot).
`DataFrame.radd`(other[, axis, level, fill_value])	Get the element-wise addition of DataFrame or Series and other (binary operator radd).
`DataFrame.rsub`(other[, axis, level, fill_value])	Get the element-wise subtraction of DataFrame or Series and other (binary operator rsub).
`DataFrame.rmul`(other[, axis, level, fill_value])	Get the element-wise multiplication of DataFrame or Series and other (binary operator rmul).
`DataFrame.rdiv`(other[, axis, level, fill_value])	Get the element-wise floating division of DataFrame or Series and other (binary operator rtruediv).
`DataFrame.rtruediv`(other[, axis, level, ...])	Get the element-wise floating division of DataFrame or Series and other (binary operator rtruediv).
`DataFrame.rfloordiv`(other[, axis, level, ...])	Get the element-wise integer division of DataFrame or Series and other (binary operator rfloordiv).
`DataFrame.rmod`(other[, axis, level, fill_value])	Get the element-wise modulo of DataFrame or Series and other (binary operator rmod).
`DataFrame.rpow`(other[, axis, level, fill_value])	Get the element-wise exponential power of DataFrame or Series and other (binary operator rpow).
`DataFrame.round`([decimals, how])	Round to a variable number of decimal places.
`DataFrame.lt`(other[, axis, level, fill_value])	Get the element-wise less than of DataFrame or Series and other (binary operator lt).
`DataFrame.gt`(other[, axis, level, fill_value])	Get the element-wise greater than of DataFrame or Series and other (binary operator gt).
`DataFrame.le`(other[, axis, level, fill_value])	Get the element-wise less than or equal to of DataFrame or Series and other (binary operator le).
`DataFrame.ge`(other[, axis, level, fill_value])	Get the element-wise greater than or equal to of DataFrame or Series and other (binary operator ge).
`DataFrame.ne`(other[, axis, level, fill_value])	Get the element-wise not equal to of DataFrame or Series and other (binary operator ne).
`DataFrame.eq`(other[, axis, level, fill_value])	Get the element-wise equal to of DataFrame or Series and other (binary operator eq).
`DataFrame.product`([axis, skipna, ...])	Return the product of the values in the DataFrame.
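
To make the difference between a forward operator, its reflected `r*` counterpart, and a comparison concrete, here is a small sketch with a scalar operand. The `xdf` alias and the pandas fallback are assumptions for running without a GPU.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

plus_ten = df.add(10)    # element-wise df + 10
ten_minus = df.rsub(10)  # reflected operand order: 10 - df
is_big = df.gt(2)        # element-wise df > 2, boolean result
```

The reflected variants matter when the operation is not commutative: `df.sub(10)` computes `df - 10`, while `df.rsub(10)` computes `10 - df`.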

Function application, GroupBy & window#

`DataFrame.agg`(aggs[, axis])	Aggregate using one or more operations over the specified axis.
`DataFrame.apply`(func[, axis, raw, ...])	Apply a function along an axis of the DataFrame.
`DataFrame.applymap`(func[, na_action])	Apply a function to a DataFrame element-wise.
`DataFrame.apply_chunks`(func, incols, outcols)	Transform user-specified chunks using the user-provided function.
`DataFrame.apply_rows`(func, incols, outcols, ...)	Apply a row-wise user-defined function.
`DataFrame.groupby`([by, axis, level, ...])	Group using a mapper or by a Series of columns.
`DataFrame.map`(func[, na_action])	Apply a function to a DataFrame element-wise.
`DataFrame.pipe`(func, *args, **kwargs)	Apply `func(self, *args, **kwargs)`.
`DataFrame.rolling`(window[, min_periods, ...])	Rolling window calculations.
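
A grouped aggregation and a `pipe` call could look like the sketch below. The helper `keep_above` is hypothetical and exists only to show how `pipe` forwards extra positional arguments; the `xdf` alias and the pandas fallback are likewise assumptions for running without a GPU.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame({"key": ["x", "y", "x", "y"], "val": [1, 2, 3, 4]})

# One row per distinct key, with "val" summed within each group.
totals = df.groupby("key").agg({"val": "sum"})


def keep_above(frame, threshold):
    """Hypothetical helper: keep rows whose "val" exceeds threshold."""
    return frame[frame["val"] > threshold]


# Equivalent to keep_above(df, 2).
large = df.pipe(keep_above, 2)
```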

Computations / descriptive stats#

`DataFrame.abs`()	Return a Series/DataFrame with the absolute numeric value of each element.
`DataFrame.all`([axis, bool_only, skipna])	Return whether all elements are True in the DataFrame.
`DataFrame.any`([axis, bool_only, skipna])	Return whether any element is True in the DataFrame.
`DataFrame.clip`([lower, upper, axis, inplace])	Trim values at the input threshold(s).
`DataFrame.corr`([method, min_periods, ...])	Compute the correlation matrix of a DataFrame.
`DataFrame.count`([axis, numeric_only])	Count `non-NA` cells for each column or row.
`DataFrame.cov`([min_periods, ddof, numeric_only])	Compute the covariance matrix of a DataFrame.
`DataFrame.cummax`([axis])	Return the cumulative maximum of the IndexedFrame.
`DataFrame.cummin`([axis])	Return the cumulative minimum of the IndexedFrame.
`DataFrame.cumprod`([axis])	Return the cumulative product of the IndexedFrame.
`DataFrame.cumsum`([axis])	Return the cumulative sum of the IndexedFrame.
`DataFrame.describe`([percentiles, include, ...])	Generate descriptive statistics.
`DataFrame.diff`([periods, axis])	First discrete difference of elements.
`DataFrame.eval`(expr[, inplace])	Evaluate a string describing operations on DataFrame columns.
`DataFrame.ewm`([com, span, halflife, alpha, ...])	Provide exponentially weighted (EW) functions.
`DataFrame.kurt`([axis, skipna, numeric_only])	Return Fisher's unbiased kurtosis of a sample.
`DataFrame.kurtosis`([axis, skipna, numeric_only])	Return Fisher's unbiased kurtosis of a sample.
`DataFrame.max`([axis, skipna, numeric_only])	Return the maximum of the values in the DataFrame.
`DataFrame.mean`([axis, skipna, numeric_only])	Return the mean of the values for the requested axis.
`DataFrame.median`([axis, skipna, numeric_only])	Return the median of the values for the requested axis.
`DataFrame.min`([axis, skipna, numeric_only])	Return the minimum of the values in the DataFrame.
`DataFrame.mode`([axis, numeric_only, dropna])	Get the mode(s) of each element along the selected axis.
`DataFrame.pct_change`([periods, fill_method, ...])	Calculate the percent change between sequential elements in the DataFrame.
`DataFrame.prod`([axis, skipna, numeric_only, ...])	Return the product of the values in the DataFrame.
`DataFrame.product`([axis, skipna, ...])	Return the product of the values in the DataFrame.
`DataFrame.quantile`([q, axis, numeric_only, ...])	Return values at the given quantile.
`DataFrame.rank`([axis, method, numeric_only, ...])	Compute numerical data ranks (1 through n) along the axis.
`DataFrame.round`([decimals, how])	Round to a variable number of decimal places.
`DataFrame.scale`()	Scale values to the range [0, 1] in float64.
`DataFrame.skew`([axis, skipna, numeric_only])	Return the unbiased Fisher-Pearson skew of a sample.
`DataFrame.sum`([axis, skipna, numeric_only, ...])	Return the sum of the values in the DataFrame.
`DataFrame.std`([axis, skipna, ddof, numeric_only])	Return the sample standard deviation of the DataFrame.
`DataFrame.var`([axis, skipna, ddof, numeric_only])	Return the unbiased variance of the DataFrame.
`DataFrame.nunique`([axis, dropna])	Count the number of distinct elements along the specified axis.
`DataFrame.value_counts`([subset, normalize, ...])	Return a Series containing counts of unique rows in the DataFrame.
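
Column-wise reductions such as `mean` and `std` return one value per column, while cumulative functions such as `cumsum` keep the original length. A minimal sketch, assuming the `xdf` alias and a pandas fallback so it runs without a GPU:

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 6.0, 8.0]})

col_means = df.mean()    # Series indexed by column name
col_std = df.std()       # sample standard deviation (ddof=1 by default)
running = df.cumsum()    # same shape as df, running totals per column
```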

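Reindexing / selection / label manipulation#

`DataFrame.add_prefix`(prefix[, axis])	Prefix labels with the string prefix.
`DataFrame.add_suffix`(suffix[, axis])	Suffix labels with the string suffix.
`DataFrame.drop`([labels, axis, index, ...])	Drop the specified labels from rows or columns.
`DataFrame.drop_duplicates`([subset, keep, ...])	Return a DataFrame with duplicate rows removed.
`DataFrame.duplicated`([subset, keep])	Return a boolean Series denoting duplicate rows.
`DataFrame.equals`(other)	Test whether two objects contain the same elements.
`DataFrame.first`(offset)	Select the initial periods of time series data based on a date offset.
`DataFrame.head`([n])	Return the first n rows.
`DataFrame.last`(offset)	Select the final periods of time series data based on a date offset.
`DataFrame.reindex`([labels, index, columns, ...])	Conform the DataFrame to a new index.
`DataFrame.rename`([mapper, index, columns, ...])	Alter column and index labels.
`DataFrame.reset_index`([level, drop, ...])	Reset the index of the DataFrame, or a level of it.
`DataFrame.sample`([n, frac, replace, ...])	Return a random sample of items from an axis of the object.
`DataFrame.searchsorted`(values[, side, ...])	Find the indices where elements should be inserted to maintain order.
`DataFrame.set_index`(keys[, drop, append, ...])	Return a new DataFrame with a new index.
`DataFrame.repeat`(repeats[, axis])	Repeat elements consecutively.
`DataFrame.tail`([n])	Return the last n rows as a new DataFrame or Series.
`DataFrame.take`(indices[, axis])	Return a new frame containing the rows specified by indices.
`DataFrame.tile`(count)	Repeat the rows count times to form a new frame.
`DataFrame.truncate`([before, after, axis, copy])	Truncate a Series or DataFrame before and after some index value.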
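
Several of the label operations above return a new frame and can be chained step by step. The following sketch uses made-up data; the `xdf` alias and the pandas fallback are assumptions for running without a GPU.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame({"id": [3, 1, 2, 2], "val": [30, 10, 20, 20]})

deduped = df.drop_duplicates()                      # drops the repeated (2, 20) row
renamed = deduped.rename(columns={"val": "value"})  # relabel a column
indexed = renamed.set_index("id")                   # "id" moves into the index
restored = indexed.reset_index()                    # "id" becomes a column again
without_value = restored.drop(columns=["value"])    # remove a column by label
```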

Missing data handling#

`DataFrame.backfill`([value, axis, inplace, limit])	Synonym for `Series.fillna()` with `method='bfill'`.
`DataFrame.bfill`([value, axis, inplace, ...])	Synonym for `Series.fillna()` with `method='bfill'`.
`DataFrame.dropna`([axis, how, thresh, ...])	Drop rows (or columns) containing nulls from a Column.
`DataFrame.ffill`([value, axis, inplace, ...])	Synonym for `Series.fillna()` with `method='ffill'`.
`DataFrame.fillna`([value, method, axis, ...])	Fill null values with `value` or the specified `method`.
`DataFrame.interpolate`([method, axis, limit, ...])	Interpolate data values between some points.
`DataFrame.isna`()	Identify missing values.
`DataFrame.isnull`()	Identify missing values.
`DataFrame.nans_to_nulls`()	Convert nans (if any) to nulls.
`DataFrame.notna`()	Identify non-missing values.
`DataFrame.notnull`()	Identify non-missing values.
`DataFrame.pad`([value, axis, inplace, limit])	Synonym for `Series.fillna()` with `method='ffill'`.
`DataFrame.replace`([to_replace, value, ...])	Replace values given in `to_replace` with `value`.
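
The sketch below shows how detection, filling, and dropping of missing values relate to each other on a small frame. The `xdf` alias and the pandas fallback are assumptions for running without a GPU; note that pandas stores the missing floats as NaN, whereas cuDF stores them as nulls, so only the calls common to both are used.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame({"a": [1.0, None, 3.0], "b": [None, 5.0, 6.0]})

missing_per_column = df.isna().sum()  # count of missing values per column
filled = df.fillna(0.0)               # replace every missing value with 0.0
complete_rows = df.dropna()           # keep rows with no missing value
forward = df.ffill()                  # carry the last valid value forward
```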

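Reshaping, sorting, transposing#

`DataFrame.argsort`([by, axis, kind, order, ...])	Return the integer indices that would sort the Series values.
`DataFrame.interleave_columns`()	Interleave the Series columns of a table into a single column.
`DataFrame.partition_by_hash`(columns, nparts)	Partition the DataFrame by the hashed value of the data in columns.
`DataFrame.pivot`(*, columns[, index, values])	Return a reshaped DataFrame organized by the given index and column values.
`DataFrame.pivot_table`([values, index, ...])	Create a spreadsheet-style pivot table as a DataFrame.
`DataFrame.scatter_by_map`(map_index[, ...])	Scatter to a list of DataFrames.
`DataFrame.sort_values`(by[, axis, ascending, ...])	Sort by the values along either axis.
`DataFrame.sort_index`([axis, level, ...])	Sort the object by labels (along an axis).
`DataFrame.nlargest`(n, columns[, keep])	Return the first n rows ordered by columns in descending order.
`DataFrame.nsmallest`(n, columns[, keep])	Return the first n rows ordered by columns in ascending order.
`DataFrame.swaplevel`([i, j, axis])	Swap level i with level j.
`DataFrame.stack`([level, dropna, future_stack])	Stack the prescribed level(s) from columns to index.
`DataFrame.unstack`([level, fill_value, sort])	Pivot one or more levels of the (necessarily hierarchical) index labels.
`DataFrame.melt`([id_vars, value_vars, ...])	Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.
`DataFrame.explode`(column[, ignore_index])	Transform each element of a list-like to a row, replicating index values.
`DataFrame.to_struct`([name])	Return a struct Series composed of the columns of the DataFrame.
`DataFrame.T`	Transpose index and columns.
`DataFrame.transpose`()	Transpose index and columns.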
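
Sorting and wide-to-long reshaping might be combined as in this sketch. The data is made up, and the `xdf` alias and the pandas fallback are assumptions for running without a GPU.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame({"id": [1, 2, 3], "x": [30, 10, 20], "y": [1, 2, 3]})

ordered = df.sort_values("x")    # ascending by column "x"
top_two = df.nlargest(2, "x")    # the two rows with the largest "x"

# Wide to long: one row per (id, variable) pair.
long_form = df.melt(id_vars=["id"], value_vars=["x", "y"])
```

With three rows and two value columns, `melt` yields six rows with the columns `id`, `variable`, and `value`.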

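Combining / comparing / joining / merging#

`DataFrame.assign`(**kwargs)	Assign columns to the DataFrame from keyword arguments.
`DataFrame.join`(other[, on, how, lsuffix, ...])	Join columns with another DataFrame on the index or on a key column.
`DataFrame.merge`(right[, how, on, left_on, ...])	Merge GPU DataFrame objects by performing a database-style join operation by columns or indexes.
`DataFrame.update`(other[, join, overwrite, ...])	Modify a DataFrame in place using non-NA values from another DataFrame.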
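
The database-style join performed by `merge` depends on the `how` argument. This sketch contrasts an inner and an outer join on a shared key and then derives a column with `assign`; the `xdf` alias and the pandas fallback are assumptions for running without a GPU. The checks rely on counts and sums rather than row order, since the order of merged rows is not assumed here.

```python
try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

left = xdf.DataFrame({"key": [1, 2, 3], "l": [10, 20, 30]})
right = xdf.DataFrame({"key": [2, 3, 4], "r": [200, 300, 400]})

inner = left.merge(right, on="key", how="inner")  # keys present in both: 2, 3
outer = left.merge(right, on="key", how="outer")  # keys present in either: 1..4

# Add a derived column without modifying `inner`.
with_total = inner.assign(total=inner["l"] + inner["r"])
```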

Serialization / IO / conversion#

`DataFrame.deserialize`(header, frames)	Generate an object from a serialized representation.
`DataFrame.device_deserialize`(header, frames)	Perform device-side deserialization tasks.
`DataFrame.device_serialize`()	Serialize data and metadata associated with device memory.
`DataFrame.from_arrow`(table)	Convert from a PyArrow Table to a DataFrame.
`DataFrame.from_dict`(data[, orient, dtype, ...])	Construct a DataFrame from a dict of array-likes or dicts.
`DataFrame.from_pandas`(dataframe[, nan_as_null])	Convert from a Pandas DataFrame.
`DataFrame.from_records`(data[, index, ...])	Convert a structured or record ndarray to a DataFrame.
`DataFrame.hash_values`([method, seed])	Compute the hash of values in this column.
`DataFrame.host_deserialize`(header, frames)	Perform device-side deserialization tasks.
`DataFrame.host_serialize`()	Serialize data and metadata associated with host memory.
`DataFrame.serialize`()	Generate an equivalent serializable representation of the object.
`DataFrame.to_arrow`([preserve_index])	Convert to a PyArrow Table.
`DataFrame.to_dict`([orient, into, index])	Convert the DataFrame to a dictionary.
`DataFrame.to_dlpack`()	Convert a cuDF object into a DLPack tensor.
`DataFrame.to_parquet`(path[, engine, ...])	Write a DataFrame to the parquet format.
`DataFrame.to_csv`([path_or_buf, sep, na_rep, ...])	Write a DataFrame to the csv file format.
`DataFrame.to_cupy`([dtype, copy, na_value])	Convert the Frame to a CuPy array.
`DataFrame.to_hdf`(path_or_buf, key, *args, ...)	Write the contained data to an HDF5 file using HDFStore.
`DataFrame.to_json`([path_or_buf])	Convert the cuDF object to a JSON string.
`DataFrame.to_numpy`([dtype, copy, na_value])	Convert the Frame to a NumPy array.
`DataFrame.to_pandas`(*[, nullable, arrow_type])	Convert to a Pandas DataFrame.
`DataFrame.to_feather`(path, *args, **kwargs)	Write a DataFrame to the feather format.
`DataFrame.to_records`([index, column_dtypes, ...])	Convert to a NumPy recarray.
`DataFrame.to_string`()	Convert to a string.
`DataFrame.values`	Return a CuPy representation of the DataFrame.
`DataFrame.values_host`	Return a NumPy representation of the data.
`DataFrame.to_pylibcudf`([copy])	Convert this DataFrame to a pylibcudf.Table.
`DataFrame.from_pylibcudf`(table, metadata)	Create a DataFrame from a pylibcudf.Table.
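
As one example of the IO methods, a frame can be written to CSV text in memory and read back. This is a sketch only: it assumes that `to_csv` returns a string when no path is given and that the module-level `read_csv` accepts a file-like object, and the `xdf` alias and the pandas fallback are assumptions for running without a GPU.

```python
import io

try:
    import cudf as xdf  # GPU path
except ImportError:
    import pandas as xdf  # CPU stand-in (assumption)

df = xdf.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# No path given, so the CSV content comes back as a string.
csv_text = df.to_csv(index=False)

# Parse the same text back into a new frame.
restored = xdf.read_csv(io.StringIO(csv_text))
```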