cuSpatial Python 用户指南#

cuSpatial 是一个 GPU 加速的 Python 空间数据分析库,包括距离和
轨迹计算、空间数据索引和空间连接操作。cuSpatial 的
Python API 提供了一个易于访问的接口,用于由 CUDA 启用的 GPU 加速的高性能空间算法。

目录#

本指南提供了 cuSpatial 所有 Python API 组件的示例。
以下列表链接到每个小节。

安装 cuSpatial#

阅读 RAPIDS 快速入门指南 了解更多关于安装所有 RAPIDS 库(包括 cuSpatial)的信息。

如果您正在使用配备 CUDA 的 GPU 并已安装 CUDA 的系统上工作,请取消注释以下
单元格并安装 cuSpatial
[ ]:
# !conda create -n rapids-25.04 -c rapidsai -c conda-forge -c nvidia \
#     cuspatial=25.04 python=3.9 cudatoolkit=11.5
有关创建 RAPIDS 环境的其他选项,例如 docker 或从源代码构建,请参见

如果您希望为 cuSpatial 做出贡献,您应该使用出色的 rapids-compose 进行源代码构建。

GPU 加速内存布局#

cuSpatial 使用 GeoArrow 缓冲区,这是一种适用于几何数据的 GPU 友好数据格式,非常
适合大规模并行编程。有关将您的数据导入 cuSpatial 的最快方法,请参见 [I/O](#Input-/-Output)。
数据导入 cuSpatial。GeoArrow 扩展了 PyArrow 绑定,并引入了几种适用于
几何应用的类型。GeoArrow 支持 ListArrays,用于 PointsMultiPoints
LineStringsMultiLineStringsPolygonsMultiPolygons。使用 Arrow DenseArray
GeoArrow 存储异构类型的 Features。几何对象及其元数据的 DataFrame
可以以类似于 GeoPandas.GeoSeries 的方法加载和转换。
[1]:
# Imports used throughout this notebook.
import cuspatial
import cudf
import cupy
import geopandas
import os
import pandas as pd
import numpy as np
from shapely.geometry import *
from shapely import wkt
[2]:
# For deterministic result
np.random.seed(0)
cupy.random.seed(0)

输入 / 输出#

将 features 加载到 cuSpatial 中的主要方法是使用 cuspatial.from_geopandas

也可以直接使用支持 __array_interface__ 的任何 Python 缓冲区创建 feature 几何图形,用于坐标及其 feature 偏移量。
__array_interface__ for coordinates and their feature offsets.

cuspatial.from_geopandas#

将数据导入 cuSpatial 最简单的方法是通过 cuspatial.from_geopandas

[3]:
host_dataframe = geopandas.read_file('https://naturalearth.s3.amazonaws.com/110m_cultural/ne_110m_admin_0_countries.zip')
host_dataframe = host_dataframe.set_crs(4326)
gpu_dataframe = cuspatial.from_geopandas(host_dataframe)
print(gpu_dataframe.head())
        featurecla  scalerank  LABELRANK                   SOVEREIGNT SOV_A3  \
0  Admin-0 country          1          6                         Fiji    FJI
1  Admin-0 country          1          3  United Republic of Tanzania    TZA
2  Admin-0 country          1          7               Western Sahara    SAH
3  Admin-0 country          1          2                       Canada    CAN
4  Admin-0 country          1          2     United States of America    US1

   ADM0_DIF  LEVEL               TYPE TLC                        ADMIN  ...  \
0         0      2  Sovereign country   1                         Fiji  ...
1         0      2  Sovereign country   1  United Republic of Tanzania  ...
2         0      2      Indeterminate   1               Western Sahara  ...
3         0      2  Sovereign country   1                       Canada  ...
4         1      2            Country   1     United States of America  ...

      FCLASS_TR     FCLASS_ID     FCLASS_PL FCLASS_GR  FCLASS_IT  \
0          None          None          None      None       None
1          None          None          None      None       None
2  Unrecognized  Unrecognized  Unrecognized      None       None
3          None          None          None      None       None
4          None          None          None      None       None

      FCLASS_NL FCLASS_SE  FCLASS_BD FCLASS_UA  \
0          None      None       None      None
1          None      None       None      None
2  Unrecognized      None       None      None
3          None      None       None      None
4          None      None       None      None

                                            geometry
0  MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ...
1  POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3...
2  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3  MULTIPOLYGON (((-122.84 49, -122.97421 49.0025...
4  MULTIPOLYGON (((-122.84 49, -120 49, -117.0312...

[5 rows x 169 columns]
(GPU)

Geopandas 和 cuDF 集成#

cuSpatial GeoDataFramecudf Seriescuspatial.GeoSeries "geometry" 对象的集合。
这两种类型的 series 都存储在 GPU 上,并且 GeoSeries 内部使用 GeoArrow 数据布局表示。
cuSpatial 最重要的特性之一是它与 cuDF 高度集成。
您可以在 cuSpatial 非 feature 列上使用任何 cuDF 操作,并且大多数操作也适用于
包含 geometry 列。减少或整理 DataFrame 行数的操作,
例如 groupby,目前尚不支持。
[4]:
gpu_dataframe = cuspatial.from_geopandas(host_dataframe)
continents_dataframe = gpu_dataframe.sort_values("NAME")
print(continents_dataframe.head())
          featurecla  scalerank  LABELRANK   SOVEREIGNT SOV_A3  ADM0_DIF  \
103  Admin-0 country          1          3  Afghanistan    AFG         0
125  Admin-0 country          1          6      Albania    ALB         0
82   Admin-0 country          1          3      Algeria    DZA         0
74   Admin-0 country          1          3       Angola    AGO         0
159  Admin-0 country          1          4   Antarctica    ATA         0

     LEVEL               TYPE TLC        ADMIN  ... FCLASS_TR  FCLASS_ID  \
103      2  Sovereign country   1  Afghanistan  ...      None       None
125      2  Sovereign country   1      Albania  ...      None       None
82       2  Sovereign country   1      Algeria  ...      None       None
74       2  Sovereign country   1       Angola  ...      None       None
159      2      Indeterminate   1   Antarctica  ...      None       None

    FCLASS_PL FCLASS_GR  FCLASS_IT FCLASS_NL FCLASS_SE  FCLASS_BD FCLASS_UA  \
103      None      None       None      None      None       None      None
125      None      None       None      None      None       None      None
82       None      None       None      None      None       None      None
74       None      None       None      None      None       None      None
159      None      None       None      None      None       None      None

                                              geometry
103  POLYGON ((66.51861 37.36278, 67.07578 37.35614...
125  POLYGON ((21.02004 40.84273, 20.99999 40.58, 2...
82   POLYGON ((-8.6844 27.39574, -8.66512 27.58948,...
74   MULTIPOLYGON (((12.99552 -4.7811, 12.63161 -4....
159  MULTIPOLYGON (((-48.66062 -78.04702, -48.1514 ...

[5 rows x 169 columns]
(GPU)

您还可以在 GPU 支持的 cuspatial.GeoDataFrame 和 CPU 支持的
geopandas.GeoDataFrame 之间使用 from_geopandasto_geopandas 进行转换,使您能够
利用任何原生的 GeoPandas 操作。但请注意,GeoPandas 运行在
CPU 上,因此性能不如 cuSpatial 操作。以下
示例显示了按名称字母顺序排序的 DataFrame 中第一个项目关联的 Polygon
alphabetically by name.
[5]:
gpu_dataframe = cuspatial.from_geopandas(host_dataframe)
sorted_dataframe = gpu_dataframe.sort_values("NAME")
sorted_dataframe = sorted_dataframe.to_geopandas()
sorted_dataframe['geometry'].iloc[0]
[5]:
../../_images/user_guide_cuspatial_api_examples_14_0.svg

轨迹#

轨迹是 LineStringLineString 中每个点的时间采样相结合。
使用 cuspatial.trajectory.derive_trajectories 对轨迹数据集进行分组并按时间排序。

2180fa31159f45978f8de2ea5911c5c1

cuspatial.derive_trajectories#

[6]:
# 1m random trajectory samples
ids = cupy.random.randint(1, 400, 1000000)
timestamps = cupy.random.random(1000000)*1000000
xy= cupy.random.random(2000000)
trajs = cuspatial.GeoSeries.from_points_xy(xy)
sorted_trajectories, trajectory_offsets = \
    cuspatial.core.trajectory.derive_trajectories(ids, trajs, timestamps)
# sorted_trajectories is a DataFrame containing all trajectory samples
# sorted first by `object_id` and then by `timestamp`.
print(sorted_trajectories.head())
# trajectory_offsets is a Series containing the start position of each
# trajectory in sorted_trajectories.
print(trajectory_offsets)
   object_id         x         y               timestamp
0          1  0.680146  0.874341 1970-01-01 00:00:00.125
1          1  0.843522  0.044402 1970-01-01 00:00:00.834
2          1  0.837039  0.351025 1970-01-01 00:00:01.335
3          1  0.946184  0.479038 1970-01-01 00:00:01.791
4          1  0.117322  0.182117 1970-01-01 00:00:02.474
0           0
1        2455
2        4899
3        7422
4        9924
        ...
394    987408
395    989891
396    992428
397    994975
398    997448
Length: 399, dtype: int32
derive_trajectoriesobject_id,然后按 timestamp 对轨迹进行排序,并返回一个
元组,其中第一个索引位置包含排序后的轨迹数据帧,第二个索引位置包含定义每条轨迹起点和终点的偏移量
缓冲区。

cuspatial.trajectory_distances_and_speeds#

使用 trajectory_distance_and_speed 计算总的行驶距离(以米为单位)和
一组轨迹的速度,其格式与 derive_trajectories 返回的结果相同。
[7]:
trajs = cuspatial.GeoSeries.from_points_xy(
    sorted_trajectories[["x", "y"]].interleave_columns()
)
d_and_s = cuspatial.core.trajectory.trajectory_distances_and_speeds(
  len(cudf.Series(ids).unique()),
  sorted_trajectories['object_id'],
  trajs,
  sorted_trajectories['timestamp']
)
print(d_and_s.head())
                   distance        speed
trajectory_id
0              1.278996e+06  1280.320089
1              1.267179e+06  1268.370390
2              1.294437e+06  1295.905261
3              1.323413e+06  1323.956714
4              1.309590e+06  1311.561012
最后,计算遵循上述两个示例格式的轨迹的边界框
examples

边界框#

计算 n 个多边形或线串的边界框

0241c1476a564c0b8ea063b98ce903cd

cuspatial.trajectory_bounding_boxes#

trajectory_bounding_boxes 可以直接使用 derive_trajectories 返回的值。
其参数是输入对象的数量、这些对象的偏移量以及 x 和 y 点缓冲区。
[7]:
bounding_boxes = cuspatial.core.trajectory.trajectory_bounding_boxes(
  len(cudf.Series(ids, dtype="int32").unique()),
  sorted_trajectories['object_id'],
  trajs
)
print(bounding_boxes.head())
      x_min     y_min     x_max     y_max
0  0.000361  0.000170  0.999582  0.999485
1  0.000184  0.000647  0.999939  0.999884
2  0.000461  0.001395  0.999938  0.999297
3  0.000093  0.000073  0.999819  0.999544
4  0.000105  0.000112  0.999952  0.999013

cuspatial.polygon_bounding_boxes#

polygon_bounding_boxes 支持更复杂的几何对象,例如具有多个环的 Polygon`。
part_offsetring_offset 的组合允许该函数仅使用
外部环来计算边界框。
[8]:
single_polygons = cuspatial.from_geopandas(
    host_dataframe['geometry'][host_dataframe['geometry'].type == "Polygon"]
)
bounding_box_polygons = cuspatial.core.spatial.bounding.polygon_bounding_boxes(
    single_polygons
)
print(bounding_box_polygons.head())
        minx       miny       maxx       maxy
0  29.339998 -11.720938  40.316590  -0.950000
1 -17.063423  20.999752  -8.665124  27.656426
2  46.466446  40.662325  87.359970  55.385250
3  55.928917  37.144994  73.055417  45.586804
4  12.182337 -13.257227  31.174149   5.256088

cuspatial.linestring_bounding_boxes#

同样,我们可以将轨迹视为 Linestrings,并从
上面的轨迹计算中更通用地计算相同的边界框
[9]:
lines = cuspatial.GeoSeries.from_linestrings_xy(
    trajs.points.xy, trajectory_offsets, cupy.arange(len(trajectory_offsets))
)
trajectory_bounding_boxes = cuspatial.core.spatial.bounding.linestring_bounding_boxes(
    lines, 0.0001
)
print(trajectory_bounding_boxes.head())
       minx      miny      maxx      maxy
0  0.000261  0.000070  0.999682  0.999585
1  0.000084  0.000547  1.000039  0.999984
2  0.000361  0.001295  1.000038  0.999397
3 -0.000007 -0.000027  0.999919  0.999644
4  0.000005  0.000012  1.000052  0.999113

投影#

cuSpatial 提供了一个简单的正弦经度/纬度到笛卡尔坐标变换。
此函数需要一个原点来确定 lonlat 输入的缩放参数。

cuspatial.sinusoidal_projection#

以下单元格将阿富汗的经纬度坐标转换为
以公里为单位的笛卡尔坐标,以该国中心为中心,适合绘图和显示。
[10]:
gpu_dataframe = cuspatial.from_geopandas(host_dataframe)
afghanistan = gpu_dataframe['geometry'][gpu_dataframe['NAME'] == 'Afghanistan']
points = cuspatial.GeoSeries.from_points_xy(afghanistan.polygons.xy)
projected = cuspatial.sinusoidal_projection(
    afghanistan.polygons.x.mean(),
    afghanistan.polygons.y.mean(),
    points
)
print(projected.head())
0     POINT (112.174 -281.59)
1     POINT (62.152 -280.852)
2     POINT (-5.573 -257.391)
3    POINT (-33.071 -243.849)
4     POINT (-98.002 -279.54)
dtype: geometry

距离#

cuSpatial 提供了一系列不断增长的距离计算函数。并行距离函数主要有两种形式:
pairwise(成对),计算每对相应输入几何图形之间的距离;
和 all-pairs(所有对),计算输入几何图形笛卡尔积的每个元素之间的距离(对于 A 中的每个输入几何图形,计算其与 B 中每个输入几何图形之间的距离)。
of input geometries (for each input geometry in A, compute the distance from A to each input geometry in B).”
cuSpatial 中包含两个 pairwise 距离函数:haversinepairwise_linestring
还提供了 hausdorff 聚类距离算法,计算其单个输入的笛卡尔积的 hausdorff
距离。

cuspatial.directed_hausdorff_distance#

从一个空间到另一个空间的定向 Hausdorff 距离是第一个空间中任意一点到第二个空间中最近点之间所有距离中的最大值。这对于轨迹之间的相似性度量特别有用。
between any point in the first space to the closet point in the second. This is especially useful
as a similarity metric between trajectories.

f82edd5be968455385ad10a67b8fedd5

Hausdorff 距离

[11]:
coordinates = sorted_trajectories[['x', 'y']].interleave_columns()
spaces = cuspatial.GeoSeries.from_multipoints_xy(
    coordinates, trajectory_offsets
)
hausdorff_distances = cuspatial.core.spatial.distance.directed_hausdorff_distance(
    spaces
)
print(hausdorff_distances.head())
        0         1         2         3         4         5         6    \
0  0.000000  0.034755  0.031989  0.031959  0.031873  0.038674  0.029961
1  0.030328  0.000000  0.038672  0.032086  0.031049  0.032170  0.032275
2  0.027640  0.030539  0.000000  0.036737  0.033055  0.043447  0.028812
3  0.031497  0.033380  0.035224  0.000000  0.032581  0.035484  0.030339
4  0.031079  0.032256  0.035731  0.039084  0.000000  0.036416  0.031369

        7         8         9    ...       388       389       390       391  \
0  0.029117  0.040962  0.033259  ...  0.031614  0.036447  0.035548  0.028233
1  0.030215  0.034443  0.032998  ...  0.030594  0.035665  0.031473  0.031916
2  0.031807  0.039269  0.033250  ...  0.031998  0.033636  0.034646  0.032615
3  0.034792  0.045755  0.031810  ...  0.033623  0.031359  0.034923  0.032287
4  0.030388  0.033751  0.034029  ...  0.030705  0.040339  0.034328  0.029027

        392       393       394       395       396       397
0  0.034176  0.030057  0.033863  0.031111  0.034590  0.033850
1  0.037483  0.033489  0.041403  0.029784  0.035374  0.038179
2  0.036681  0.030642  0.038432  0.032481  0.034810  0.036695
3  0.032808  0.029771  0.040891  0.030802  0.032279  0.038443
4  0.035645  0.027703  0.037529  0.029356  0.031260  0.035501

[5 rows x 398 columns]

cuspatial.haversine_distance#

Haversine 距离是经度和纬度对之间的大圆距离。cuSpatial 使用 lon/lat 顺序,以更好地反映大圆坐标的笛卡尔坐标:x/y
uses the lon/lat ordering to better reflect the cartesian coordinates of great circle
coordinates: x/y`.

c157799b1aef45c98403066035e8077e

[12]:
gpu_dataframe = cuspatial.from_geopandas(host_dataframe)
polygons_first = gpu_dataframe['geometry'][0:10]
polygons_second = gpu_dataframe['geometry'][10:30]

points_first = polygons_first.polygons.xy[0:1000]
points_second = polygons_second.polygons.xy[0:1000]

first = cuspatial.GeoSeries.from_points_xy(points_first)
second = cuspatial.GeoSeries.from_points_xy(points_second)

# The number of coordinates in two sets of polygons vary, so
# we'll just compare the first set of 1000 values here.
distances_in_meters = cuspatial.haversine_distance(
    first, second
)
cudf.Series(distances_in_meters).head()
[12]:
0    9959.695143
1    9803.166859
2    9876.857085
3    9925.097106
4    9927.268486
Name: None, dtype: float64

上述方法将 GeoPandas 数据从 CPU 内存读取到 GPU 内存中,然后由 cuSpatial 进行处理。如果数据已经在一个 cuDF GPU 数据帧中,您可以使用下面的方法快速计算 Haversine 距离。这通过将所有处理保留在 GPU 上来最大化速度,并且在处理大型数据集时非常有用。

[13]:
# Generate data to be used to create a cuDF dataframe.
# The data to be processed by Haversine MUST be a Float.
a = {"latitude":[17.1167, 17.1333, 25.333, 25.255, 24.433, 24.262, 35.317, 34.21, 34.566, 31.5, 36.7167, 30.5667, 28.05, 22.8, 35.7297, 36.97, 36.78, 36.8, 36.8, 36.72],
     "longitude": [-61.7833, -61.7833, 55.517, 55.364, 54.651, 55.609, 69.017, 62.228, 69.212, 65.85, 3.25, 2.8667, 9.6331, 5.4331, 0.65, 7.79, 3.07, 3.03, 3.04, 4.05]}
df = cudf.DataFrame(data=a)

# Create cuSpatial GeoSeries from cuDF Dataframe
cuGeoSeries = cuspatial.GeoSeries.from_points_xy(df[['longitude', 'latitude']].interleave_columns())

# Create Comparator cuSpatial GeoSeries from a comparator point
df['atlanta_lat'] = 33.7490
df['atlanta_lng'] = -84.3880
atlGeoSeries = cuspatial.GeoSeries.from_points_xy(df[['atlanta_lat', 'atlanta_lng']].interleave_columns())

# Calculate Haversine Distance of cuDF dataframe to comparator point
df['atlanta_dist'] = cuspatial.haversine_distance(cuGeoSeries, atlGeoSeries)
print(df.head())
   latitude  longitude  atlanta_lat  atlanta_lng  atlanta_dist
0   17.1167   -61.7833       33.749      -84.388  11961.556540
1   17.1333   -61.7833       33.749      -84.388  11963.392729
2   25.3330    55.5170       33.749      -84.388  12243.126130
3   25.2550    55.3640       33.749      -84.388  12233.867463
4   24.4330    54.6510       33.749      -84.388  12139.822218

Pairwise distance#

pairwise_linestring_distance 计算长度为 n 的 Linestrings GeoSeries 与相应长度为 n 的 Linestrings GeoSeries 之间的距离。它返回该对中第一个线串中任意点到第二个线串中最近的线段或点之间的最小距离。
length n and a corresponding GeoSeries of Linestrings of n length. It returns the
minimum distance from any point in the first linestring of the pair to the nearest segment
or point within the second Linestring of the pair.

输入接受一对 geoseries 作为线串数组的输入序列。

下面的示例使用 naturalearth_lowres 中的多边形,并将其视为线串。
第一个示例计算所有多边形与其自身之间的距离,而第二个示例计算前 50 个多边形与后 50 个多边形之间的距离。
example computes the distance between the first 50 polygons and the second 50 polygons.

cuspatial.pairwise_linestring_distance#

[14]:
gpu_boundaries = cuspatial.from_geopandas(host_dataframe.geometry.boundary)
zeros = cuspatial.pairwise_linestring_distance(
    gpu_boundaries[0:50],
    gpu_boundaries[0:50]
)
print(zeros.head())
lines1 = gpu_boundaries[0:50]
lines2 = gpu_boundaries[50:100]
distances = cuspatial.pairwise_linestring_distance(
    lines1, lines2
)
print(distances.head())
0    0.0
1    0.0
2    0.0
3    0.0
4    0.0
dtype: float64
0    152.200610
1     44.076445
2      2.417269
3     44.197151
4     75.821029
dtype: float64
pairwise_point_linestring_distance 计算点对和
线串对之间的距离。它也可以用于将多边形视为线串的情况。在以下
示例中,计算了国家中心到其边界的最小距离。

cuspatial.pairwise_point_linestring_distance#

使用 WGS 84 伪墨卡托投影,距离以米为单位。

[15]:
# Convert input dataframe to Pseudo-Mercator projection.
host_dataframe3857 = host_dataframe.to_crs(3857)
polygons = host_dataframe3857[host_dataframe3857['geometry'].type == "Polygon"]
gpu_polygons = cuspatial.from_geopandas(polygons)
# Extract mean_x and mean_y from each country
mean_x = [gpu_polygons['geometry'].iloc[[ix]].polygons.x.mean() for ix in range(len(gpu_polygons))]
mean_y = [gpu_polygons['geometry'].iloc[[ix]].polygons.y.mean() for ix in range(len(gpu_polygons))]
# Convert mean_x/mean_y values into Points for use in API.
points = cuspatial.GeoSeries([Point(point) for point in zip(mean_x, mean_y)])
# Convert Polygons into Linestrings for use in API.
linestring_df = cuspatial.from_geopandas(geopandas.geoseries.GeoSeries(
    [MultiLineString(mapping(polygons['geometry'].iloc[ix])["coordinates"]) for ix in range(len(polygons))]
))
gpu_polygons['border_distance'] = cuspatial.pairwise_point_linestring_distance(
    points, linestring_df
)
print(gpu_polygons.head())
         featurecla  scalerank  LABELRANK                        SOVEREIGNT  \
1   Admin-0 country          1          3       United Republic of Tanzania
2   Admin-0 country          1          7                    Western Sahara
5   Admin-0 country          1          3                        Kazakhstan
6   Admin-0 country          1          3                        Uzbekistan
11  Admin-0 country          1          2  Democratic Republic of the Congo

   SOV_A3  ADM0_DIF  LEVEL               TYPE TLC  \
1     TZA         0      2  Sovereign country   1
2     SAH         0      2      Indeterminate   1
5     KA1         1      1        Sovereignty   1
6     UZB         0      2  Sovereign country   1
11    COD         0      2  Sovereign country   1

                               ADMIN  ...     FCLASS_ID     FCLASS_PL  \
1        United Republic of Tanzania  ...          None          None
2                     Western Sahara  ...  Unrecognized  Unrecognized
5                         Kazakhstan  ...          None          None
6                         Uzbekistan  ...          None          None
11  Democratic Republic of the Congo  ...          None          None

   FCLASS_GR FCLASS_IT     FCLASS_NL FCLASS_SE FCLASS_BD  FCLASS_UA  \
1       None      None          None      None      None       None
2       None      None  Unrecognized      None      None       None
5       None      None          None      None      None       None
6       None      None          None      None      None       None
11      None      None          None      None      None       None

                                             geometry border_distance
1   POLYGON ((3774143.866 -105758.362, 3792946.708...     8047.288391
2   POLYGON ((-964649.018 3205725.605, -964597.245...   593137.492497
5   POLYGON ((9724867.413 6311418.173, 9640131.701...    37091.213890
6   POLYGON ((6230350.563 5057973.384, 6225978.591...   278633.467299
11  POLYGON ((3266113.592 -501451.658, 3286149.877...    35812.988244

[5 rows x 170 columns]
(GPU)

cuspatial.pairwise_point_polygon_distance#

使用 WGS 84 伪墨卡托投影,距离以米为单位。

[16]:
countries = host_dataframe

cities = geopandas.read_file('https://naturalearth.s3.amazonaws.com/110m_cultural/ne_110m_populated_places_simple.zip')
cities = cities.to_crs(3857)

gpu_cities = cuspatial.from_geopandas(cities)
gpu_countries = cuspatial.from_geopandas(countries)
dist = cuspatial.pairwise_point_polygon_distance(
    gpu_cities.geometry[:len(gpu_countries)], gpu_countries.geometry
)

gpu_countries["distance_from"] = cities.name
gpu_countries["distance"] = dist

print(gpu_countries.head())
        featurecla  scalerank  LABELRANK                   SOVEREIGNT SOV_A3  \
0  Admin-0 country          1          6                         Fiji    FJI
1  Admin-0 country          1          3  United Republic of Tanzania    TZA
2  Admin-0 country          1          7               Western Sahara    SAH
3  Admin-0 country          1          2                       Canada    CAN
4  Admin-0 country          1          2     United States of America    US1

   ADM0_DIF  LEVEL               TYPE TLC                        ADMIN  ...  \
0         0      2  Sovereign country   1                         Fiji  ...
1         0      2  Sovereign country   1  United Republic of Tanzania  ...
2         0      2      Indeterminate   1               Western Sahara  ...
3         0      2  Sovereign country   1                       Canada  ...
4         1      2            Country   1     United States of America  ...

      FCLASS_PL  FCLASS_GR FCLASS_IT     FCLASS_NL  FCLASS_SE FCLASS_BD  \
0          None       None      None          None       None      None
1          None       None      None          None       None      None
2  Unrecognized       None      None  Unrecognized       None      None
3          None       None      None          None       None      None
4          None       None      None          None       None      None

  FCLASS_UA                                           geometry distance_from  \
0      None  MULTIPOLYGON (((180 -16.06713, 180 -16.55522, ...  Vatican City
1      None  POLYGON ((33.90371 -0.95, 34.07262 -1.05982, 3...    San Marino
2      None  POLYGON ((-8.66559 27.65643, -8.66512 27.58948...         Vaduz
3      None  MULTIPOLYGON (((-122.84 49, -122.97421 49.0025...       Lobamba
4      None  MULTIPOLYGON (((-122.84 49, -120 49, -117.0312...    Luxembourg

       distance
0  5.329915e+06
1  5.628613e+06
2  6.057264e+06
3  4.626961e+06
4  6.415631e+06

[5 rows x 171 columns]
(GPU)

cuspatial.pairwise_linestring_polygon_distance#

使用 WGS 84 伪墨卡托投影,距离以米为单位。

[17]:
# all driveways within 2km range of central park, nyc

if not os.path.exists("./streets_3857.csv"):
    import osmnx as ox
    graph = ox.graph_from_point((40.769361, -73.977655), dist=2000, network_type="drive")
    nodes, streets = ox.graph_to_gdfs(graph)
    streets = streets.to_crs(3857)
    streets = streets.reset_index(drop=True)
    streets.index.name = "index"
    streets[["name", "geometry"]].to_csv("streets_3857.csv")

# The data is under notebooks/streets_3857.csv
streets = pd.read_csv("./streets_3857.csv", index_col="index")
streets.geometry = streets.geometry.apply(wkt.loads)
streets = geopandas.GeoDataFrame(streets)
streets.head()
[17]:
名称 几何图形
索引
0 哥伦布大道 LINESTRING (-8234860.077 4980333.535, -8234863...
1 西 80 街 LINESTRING (-8235173.854 4980508.442, -8235160...
2 阿姆斯特丹大道 LINESTRING (-8235173.854 4980508.442, -8235168...
3 西 80 街 LINESTRING (-8235369.475 4980617.398, -8235347...
4 百老汇 LINESTRING (-8235369.475 4980617.398, -8235373...
[19]:
# The polygon of the Empire State Building

if not os.path.exists("./esb_3857.csv"):
    import osmnx as ox
    esb = ox.features.features_from_place('Empire State Building, New York', tags={"building": True})
    esb = esb.to_crs(3857)
    esb = esb.geometry.reset_index(drop=True)
    esb.index.name = "index"
    esb.to_csv("esb_3857.csv")

# The data is under notebooks/esb_3857.csv
esb = pd.read_csv("./esb_3857.csv", index_col="index")
esb.geometry = esb.geometry.apply(wkt.loads)
esb = geopandas.GeoDataFrame(esb)
esb = pd.concat([esb.iloc[0:1]] * len(streets))
esb.head()
/raid/jlamb/miniforge/envs/cuspatial-dev/lib/python3.11/site-packages/osmnx/features.py:294: DeprecationWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  polygon = gdf_place["geometry"].unary_union
[19]:
几何图形
索引
0 POLYGON ((-8236139.639 4975314.625, -8235990.3...
0 POLYGON ((-8236139.639 4975314.625, -8235990.3...
0 POLYGON ((-8236139.639 4975314.625, -8235990.3...
0 POLYGON ((-8236139.639 4975314.625, -8235990.3...
0 POLYGON ((-8236139.639 4975314.625, -8235990.3...
[20]:
# Straight line distance between the driveways to the Empire State Building
gpu_streets = cuspatial.from_geopandas(streets.geometry)
gpu_esb = cuspatial.from_geopandas(esb.geometry)

dist = cuspatial.pairwise_linestring_polygon_distance(gpu_streets, gpu_esb).rename("dist")
pd.concat([streets["name"].reset_index(drop=True), dist.to_pandas()], axis=1)
[20]:
名称 距离
0 哥伦布大道 4993.583717
1 西 80 街 5103.472213
2 阿姆斯特丹大道 5208.373183
3 西 80 街 5275.851781
4 百老汇 5178.999774
... ... ...
1862 西 82 街 5411.762092
1863 百老汇 5476.345847
1864 西 84 街 5613.403002
1865 西 75 街 4750.092380
1866 百老汇 4638.894939

1867 行 × 2 列

cuspatial.pairwise_polygon_distance#

使用 WGS 84 伪墨卡托投影,距离以米为单位。

[21]:
african_countries = gpu_countries[gpu_countries.CONTINENT == "Africa"].sort_values("POP_EST", ascending=False)
asian_countries = gpu_countries[gpu_countries.CONTINENT == "Asia"].sort_values("POP_EST", ascending=False)
[22]:
# Straight line distance between the top 10 most populated countries in Asia and Africa
population_top10_africa = african_countries[:10].reset_index(drop=True)
population_top10_asia = asian_countries[:10].reset_index(drop=True)
dist = cuspatial.pairwise_polygon_distance(
    population_top10_africa.geometry, population_top10_asia.geometry)

cudf.concat([
    population_top10_africa["NAME"].rename("Africa"),
    population_top10_asia["NAME"].rename("Asia"),
    dist.rename("dist")], axis=1
)
[22]:
非洲 亚洲 距离
0 尼日利亚 中国 64.966620
1 埃塞俄比亚 印度 25.598868
2 埃及 印度尼西亚 60.717434
3 刚果民主共和国 巴基斯坦 37.489668
4 南非 孟加拉国 72.860545
5 坦桑尼亚 日本 97.872886
6 肯尼亚 菲律宾 75.450451
7 乌干达 越南 69.827567
8 阿尔及利亚 土耳其 17.927419
9 苏丹 伊朗 13.990335

过滤#

过滤模块包含 points_in_spatial_window,该函数从一组点中仅返回落在由四个边界坐标 min_xmax_xmin_ymax_y 定义的空间窗口内的点。以下示例仅找到落在所有多边形平均值 1 个标准差内的多边形点。

66f3baa121b9458e8ac57b4da445bbe6

cuspatial.points_in_spatial_window#

[23]:
gpu_dataframe = cuspatial.from_geopandas(host_dataframe)
geometry = gpu_dataframe['geometry']
points = cuspatial.GeoSeries.from_points_xy(geometry.polygons.xy)
mean_x, std_x = (geometry.polygons.x.mean(), geometry.polygons.x.std())
mean_y, std_y = (geometry.polygons.y.mean(), geometry.polygons.y.std())
avg_points = cuspatial.points_in_spatial_window(
    points,
    mean_x - std_x,
    mean_x + std_x,
    mean_y - std_y,
    mean_y + std_y
)
print(avg_points.head())
0       POINT (33.90371 -0.95)
1    POINT (34.07262 -1.05982)
2    POINT (37.69869 -3.09699)
3     POINT (37.7669 -3.67712)
4    POINT (39.20222 -4.67677)
dtype: geometry

通过仔细分组,可以重建落在范围内的原始完整多边形。

集合操作#

线串交集#

cuSpatial 提供了一个线串-线串交集算法,用于计算两个线串之间的重叠几何图形。该 API 还返回每个返回几何图形的 ID,以帮助用户追溯源几何图形。

[24]:
from cuspatial.core.binops.intersection import pairwise_linestring_intersection

usa_boundary = cuspatial.from_geopandas(host_dataframe[host_dataframe.NAME == "United States of America"].geometry.boundary)
canada_boundary = cuspatial.from_geopandas(host_dataframe[host_dataframe.NAME == "Canada"].geometry.boundary)

list_offsets, geometries, look_back_ids = pairwise_linestring_intersection(usa_boundary, canada_boundary)
[25]:
# The first integer series shows that the result contains 1 row (since we only have 1 pair of linestrings as input).
# This row contains 144 geometires.
list_offsets
[25]:
<cudf.core.column.numerical.NumericalColumn object at 0x7f525c27a570>
[
  0,
  142
]
dtype: int32
[31]:
# The second element is a geoseries that contains the intersecting geometries, with 144 rows, including points and linestrings.
geometries
[31]:
0                            POINT (-130.53611 54.80275)
1                            POINT (-130.53611 54.80278)
2                            POINT (-130.53611 54.80275)
3                                 POINT (-129.98 55.285)
4                            POINT (-130.53611 54.80278)
                             ...
137                  LINESTRING (-120 49, -117.03121 49)
138                     LINESTRING (-122.84 49, -120 49)
139               LINESTRING (-117.03121 49, -107.05 49)
140    LINESTRING (-83.89077 46.11693, -83.61613 46.1...
141    LINESTRING (-82.69009 41.67511, -82.43928 41.6...
Length: 142, dtype: geometry
[26]:
# The third element is a dataframe that contains IDs to the input segments and linestrings, 4 for each result row.
# Each represents ids to lhs, rhs linestring and segment ids.
look_back_ids
[26]:
lhs_linestring_id lhs_segment_id rhs_linestring_id rhs_segment_id
0 [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, ... [18, 16, 18, 15, 17, 137, 14, 16, 13, 15, 14, ... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [9, 10, 10, 11, 11, 28, 12, 12, 13, 13, 14, 15...

空间连接#

cuSpatial 提供了许多函数来促进高性能空间连接,包括无索引和四叉树索引的点在多边形内查询以及四叉树索引的点到最近线串查询。

空间连接的 API 尚不完全与 GeoPandas 匹配,但了解 cuSpatial 数据格式后,您可以对大量点和 32 个或更少的多边形调用 cuspatial.point_in_polygon,或者对大量点和多边形调用 cuspatial.quadtree_point_in_polygon

无索引的点在多边形内连接#

cuspatial.point_in_polygon#

[27]:
single_polygons = host_dataframe[host_dataframe['geometry'].type == "Polygon"]
gpu_dataframe = cuspatial.from_geopandas(single_polygons)
x_points = (cupy.random.random(10000000) - 0.5) * 360
y_points = (cupy.random.random(10000000) - 0.5) * 180
xy = cudf.DataFrame({"x": x_points, "y": y_points}).interleave_columns()
points = cuspatial.GeoSeries.from_points_xy(xy)

short_dataframe = gpu_dataframe.iloc[0:31]
geometry = short_dataframe['geometry']

points_in_polygon = cuspatial.point_in_polygon(
    points, geometry
)
sum_of_points_in_polygons_0_to_31 = points_in_polygon.sum()
sum_of_points_in_polygons_0_to_31.head()
[27]:
1     11896
2      1268
5     50835
6      7792
11    29318
dtype: int64
cuSpatial 包含另一个连接算法 quadtree_point_in_polygon,它使用索引
四叉树进行更快的计算。quadtree_point_in_polygon 还支持数量
仅受内存限制的多边形。

四叉树索引#

索引模块用于创建空间四叉树。使用

cuspatial.quadtree_on_points(
    points,
    x_min,
    x_max,
    y_min,
    y_max,
    scale,
    max_depth,
    max_size
)
来创建由 join 模块中的 quadtree_point_in_polygon
函数使用的四叉树对象。
该函数使用一组点和一个用户定义的边界框来构建索引四叉树。
请务必适当调整参数,对于更大的数据集使用更大的
参数值。
scale:一个缩放函数,它从由 {x_min, y_min} 定义的原点增加点空间的尺寸。这可以增加生成
origin defined by {x_min, y_min}`. This can increase the likelihood of generating
分隔良好的四叉区域的可能性。
max_depth:为了使四叉树有效索引点,其深度必须与点数量的大小成对数比例。四叉树的每一层包含 4 个四叉区域。用于索引的可用四叉区域数量 \(q\) 等于 \(q = 4^{d}\),其中 \(d\)max_depth 参数。对于输入大小为 10m 个点且 max_depth = 7,最有效地打包到四叉树叶子中的点数量将是 \(\frac{10^6}{4^7}\)
quad tree contains 4 quads. The number of available quads \(q\) for indexing is then
equal to \(q = 4^{d}\) where \(d\) is the max_depth parameter. With an input size
of 10m points and max_depth = 7`, \(\frac{10^6}{4^7}\) points will be most
efficiently packed into the leaves of the quad tree.

max_size:内部节点在分裂成四个叶节点之前允许的最大点数。随着四叉树的生成,添加点时会创建一个包含可用索引点的叶节点。如果该叶节点中的点数超过 max_size,则该叶节点将被细分,添加四个新的叶节点,并从叶节点集合中移除原始节点。在大多数数据集中,将此数字设置为上述最佳叶节点大小计算结果的显著一部分可能会优化性能。考虑 \(10,000,000 / 4^7 / 4 = 153\)

cuspatial.quadtree_on_points#

[28]:
x_points = (cupy.random.random(10000000) - 0.5) * 360
y_points = (cupy.random.random(10000000) - 0.5) * 180
xy = cudf.DataFrame({"x": x_points, "y": y_points}).interleave_columns()
points = cuspatial.GeoSeries.from_points_xy(xy)

scale = 5
max_depth = 7
max_size = 125
point_indices, quadtree = cuspatial.quadtree_on_points(points,
                                                       x_points.min(),
                                                       x_points.max(),
                                                       y_points.min(),
                                                       y_points.max(),
                                                       scale,
                                                       max_depth,
                                                       max_size)
print(point_indices.head())
print(quadtree.head())
0     1507
1     1726
2     4242
3     7371
4    11341
dtype: uint32
   key  level  is_internal_node  length  offset
0    0      0              True       4       2
1    1      0              True       2       6
2    0      1              True       4       8
3    1      1              True       4      12
4    2      1              True       2      16

索引空间连接#

四叉树空间索引(point_indicesquadtree)被 quadtree_point_in_polygonquadtree_point_to_nearest_linestring 用于加速大型空间连接。quadtree_point_in_polygon 依赖于此处使用以下函数计算的一些中间结果。

cuspatial.join_quadtree_and_bounding_boxes#

cuspatial.quadtree_point_in_polygon#

[29]:
polygons = gpu_dataframe['geometry']

poly_bboxes = cuspatial.polygon_bounding_boxes(
    polygons
)
intersections = cuspatial.join_quadtree_and_bounding_boxes(
    quadtree,
    poly_bboxes,
    polygons.polygons.x.min(),
    polygons.polygons.x.max(),
    polygons.polygons.y.min(),
    polygons.polygons.y.max(),
    scale,
    max_depth
)
polygons_and_points = cuspatial.quadtree_point_in_polygon(
    intersections,
    quadtree,
    point_indices,
    points,
    polygons
)
print(polygons_and_points.head())
Empty DataFrame
Columns: [polygon_index, point_index]
Index: []
您可以在上面看到多边形 270 对应于前 5 个点。为了将其映射回原始数据帧的特定行,
必须将各个多边形映射回其原始的 MultiPolygon 行。
这留作练习。

cuspatial.quadtree_point_to_nearest_linestring#

cuspatial.quadtree_point_to_nearest_linestring 可用于从另一组混合几何图形中找到距离一组点最近的 Polygon 或 Linestring。
nearest to a set of points from another set of mixed geometries.
[30]:
gpu_countries = cuspatial.from_geopandas(countries[countries['geometry'].type == "Polygon"])
gpu_cities = cuspatial.from_geopandas(cities[cities['geometry'].type == 'Point'])
[31]:
polygons = gpu_countries['geometry'].polygons

boundaries = cuspatial.GeoSeries.from_linestrings_xy(
    cudf.DataFrame({"x": polygons.x, "y": polygons.y}).interleave_columns(),
    polygons.ring_offset,
    cupy.arange(len(polygons.ring_offset))
)

point_indices, quadtree = cuspatial.quadtree_on_points(gpu_cities['geometry'],
                                                       polygons.x.min(),
                                                       polygons.x.max(),
                                                       polygons.y.min(),
                                                       polygons.y.max(),
                                                       scale,
                                                       max_depth,
                                                       max_size)
poly_bboxes = cuspatial.linestring_bounding_boxes(
    boundaries,
    2.0
)
intersections = cuspatial.join_quadtree_and_bounding_boxes(
    quadtree,
    poly_bboxes,
    polygons.x.min(),
    polygons.x.max(),
    polygons.y.min(),
    polygons.y.max(),
    scale,
    max_depth
)
result = cuspatial.quadtree_point_to_nearest_linestring(
    intersections,
    quadtree,
    point_indices,
    gpu_cities['geometry'],
    boundaries
)
print(result.head())
Empty DataFrame
Columns: [point_index, linestring_index, distance]
Index: []

图片经维基百科知识共享许可使用

[ ]: