cudf 25.04.00 documentation
CharacterNormalizer

Constructor

CharacterNormalizer(do_lower, special_tokens)

A normalizer object used to normalize input text.

CharacterNormalizer.normalize(text)
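The two entries above can be combined into a short usage sketch. It assumes the class is imported from `cudf.core.character_normalizer`, that `do_lower` is a boolean requesting lowercasing, that `special_tokens` accepts an empty string Series meaning "no special tokens", and that `normalize` takes and returns a `cudf.Series` of strings. The helper name `normalize_on_gpu` and the sample strings are illustrative only and are not part of cuDF. Running it requires a working cuDF installation and a supported GPU.

```python
def normalize_on_gpu(texts, do_lower=True):
    """Normalize a list of Python strings with cuDF's CharacterNormalizer.

    Hypothetical helper for illustration; it is not part of cuDF.
    """
    # Imported lazily so this module still loads on machines without cuDF.
    import cudf
    from cudf.core.character_normalizer import CharacterNormalizer

    # Assumption: an empty string Series means "no special tokens".
    no_special_tokens = cudf.Series([], dtype="object")
    normalizer = CharacterNormalizer(do_lower, no_special_tokens)

    # normalize() is assumed to take and return a cudf.Series of strings.
    normalized = normalizer.normalize(cudf.Series(texts))

    # Copy the result back to the host as a plain Python list.
    return normalized.to_pandas().tolist()


if __name__ == "__main__":
    try:
        print(normalize_on_gpu(["héllo, \tworld", "ĂĆCĖÑTED", "$99"]))
    except ImportError:
        print("cudf is not available; skipping the GPU example")
```

Because the normalizer is a separate object, it can be built once and reused across many Series, which avoids repeating the setup for each call.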


© Copyright 2018-2025, NVIDIA Corporation.
