Hardware Resources and Testing

Resource Allocation

The CTest resource allocation framework allows tests to specify which hardware resources they need, and allows projects to specify what specific local/machine resources are available. Combined, these ensure that tests are told which specific resources they should use, and that over-subscription won't occur no matter the requested level of test parallelism.

For a test to use CTest resource allocation, the following components are needed.

  • A JSON per-machine resource specification file

  • CTEST_RESOURCE_SPEC_FILE pointing to the JSON file

  • Each add_test() recording what resources it requires via test properties

  • Each test reading the relevant environment variables to determine what specific resources it should use

These requirements are demanding and require significant infrastructure setup by every project. In addition, the CTest resource allocation specification is deliberately loose, allowing it to represent arbitrary requirements such as CPUs, GPUs, and ASICs.
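To make the last requirement concrete: when a resource spec is active, CTest communicates each test's allocation through environment variables. A test that requested one GPU might see values roughly like the following (the exact ids and slot counts depend on the spec file and the test's requirements):

```
CTEST_RESOURCE_GROUP_COUNT=1
CTEST_RESOURCE_GROUP_0=gpus
CTEST_RESOURCE_GROUP_0_GPUS=id:0,slots:10
```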

rapids_test

To help RAPIDS projects utilize all GPUs on a machine when running tests, the rapids-cmake project offers a suite of commands that streamline the process. These commands simplify GPU detection, setting up the resource specification file, specifying test requirements, and setting the active CUDA GPU.

Machine GPU Detection

A key component of CTest resource allocation is accurately representing the hardware that exists on the developer's machine. The rapids_test_init() function performs system introspection to determine the number of GPUs on the current machine and generates a resource allocation JSON file representing those GPUs.

include(${CMAKE_BINARY_DIR}/RAPIDS.cmake)

include(rapids-test)

enable_testing()
rapids_test_init()

The CTest resource allocation specification isn't limited to representing each GPU as a single unit. Instead, it allows the JSON file to specify the capacity (slots) of each GPU. In the case of rapids-cmake, we always represent each GPU as having 100 slots, allowing projects to think in terms of total percentages when computing requirements.
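As a sketch, the file generated for a machine with two GPUs would look roughly like this; the exact contents are produced by rapids_test_init() and may differ:

```json
{
  "version": {"major": 1, "minor": 0},
  "local": [{
    "gpus": [
      {"id": "0", "slots": 100},
      {"id": "1", "slots": 100}
    ]
  }]
}
```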

Specifying Test GPU Requirements

As discussed above, each CMake test needs to specify the GPU resources it requires so that CTest can properly partition the GPUs given the requested CTest parallel level. The easiest route for developers is to use rapids_test_add(), which wraps each test execution in a wrapper script that sets the CUDA visible devices so the test only sees the allocated devices.

For example, below we have three tests: two can run concurrently on the same GPU, while the third requires a full GPU. This specification allows all three tests to run in parallel when the machine has 2 or more GPUs, with no modification to the tests!

include(rapids-test)

enable_testing()
rapids_test_init()

add_executable( cuda_test test.cu )
rapids_test_add(NAME test_small_alloc COMMAND cuda_test 50 GPUS 1 PERCENT 10)
rapids_test_add(NAME test_medium_alloc COMMAND cuda_test 100 GPUS 1 PERCENT 20)
rapids_test_add(NAME test_very_large_alloc COMMAND cuda_test 10000 GPUS 1)

Multi-GPU Tests

The rapids_test_add() command also supports tests that require multiple GPU bindings. In that case you will need to request two (or more) GPUs with a full allocation, like this:

include(rapids-test)

enable_testing()
rapids_test_init()

add_executable( cuda_test test.cu )
rapids_test_add(NAME multi_gpu COMMAND cuda_test GPUS 3)

Due to how CTest performs allocations, if you need distinct GPUs you must request a percentage of 51% or higher. Otherwise, it is possible for multiple allocations to be placed on the same GPU.
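For instance, a hypothetical test that needs two physically distinct GPUs could request 51 percent of each, so that no single 100-slot GPU can satisfy both allocations:

```cmake
include(rapids-test)

enable_testing()
rapids_test_init()

add_executable( cuda_test test.cu )
# 51% + 51% > 100 slots, so CTest cannot place both
# allocations on the same physical GPU.
rapids_test_add(NAME distinct_gpus COMMAND cuda_test GPUS 2 PERCENT 51)
```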

When the rapids-cmake Test Wrapper Is Insufficient

At times the wrapper-script approach is insufficient, usually because the test already uses an existing test wrapper.

As stated above, each CMake test still needs to specify its required GPU resources so that CTest can properly partition the GPUs given the requested CTest parallel level. But in these cases, the tests themselves need to parse the CTest environment variables to extract which GPUs they should run on.

On the CMake side, you can use rapids_test_gpu_requirements() to specify the requirements:

include(rapids-test)

enable_testing()
rapids_test_init()

add_executable( cuda_test test.cu )
target_link_libraries( cuda_test PRIVATE RAPIDS::test )

add_test(NAME test_small_alloc COMMAND cuda_test 50)
rapids_test_gpu_requirements(test_small_alloc GPUS 1 PERCENT 10)

Now, in your C++ code, you need to parse the relevant CTEST_RESOURCE_GROUP environment variables. To simplify this process, here is some helper C++ code that does the heavy lifting for you:

/*
 * Copyright (c) 2022-2023, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#pragma once

#include <cuda_runtime_api.h>
#include <vector>

namespace rapids_cmake {

/*
 * Represents a GPU Allocation provided by a CTest resource specification.
 *
 * The `device_id` maps to the CUDA gpu id required by `cudaSetDevice`.
 * The slots represent the percentage of the GPU that this test will use.
 * Primarily used by CTest to ensure proper load balancing of tests.
 */
struct GPUAllocation {
  int device_id;
  int slots;
};

/*
 * Returns true when a CTest resource specification has been specified.
 *
 * Since the vast majority of tests should execute without a CTest resource
 * spec (e.g. when executed manually by a developer), callers of `rapids_cmake`
 * should first ensure that a CTest resource spec file has been provided before
 * trying to query/bind to the allocation.
 *
 * ```cxx
 *   if (rapids_cmake::using_resources()) {
 *     rapids_cmake::bind_to_first_gpu();
 *   }
 * ```
 */
bool using_resources();

/*
 * Returns all GPUAllocations allocated for a test
 *
 * To support multi-GPU tests the CTest resource specification allows a
 * test to request multiple GPUs. As CUDA only allows binding to a
 * single GPU at any time, this API allows tests to know what CUDA
 * devices they should bind to.
 *
 * Note: The `device_id` of each allocation might not be unique.
 * If a test says it needs 50% of two GPUs, it could be allocated
 * the same physical GPU. If a test needs distinct / unique devices
 * it must request 51%+ of a device.
 *
 * Note: rapids_cmake does no caching, so this query should be cached
 * instead of called multiple times.
 */
std::vector<GPUAllocation> full_allocation();

/*
 * Have CUDA bind to a given GPUAllocation
 *
 * Have CUDA bind to the `device_id` specified in the CTest
 * GPU allocation
 *
 * Note: Return value is the cudaError_t of `cudaSetDevice`
 */
cudaError_t bind_to_gpu(GPUAllocation const& alloc);

/*
 * Convenience method to bind to the first GPU that CTest has allocated
 * Provided as most RAPIDS tests only require a single GPU
 *
 * Will return `false` if no GPUs have been allocated, or if setting
 * the CUDA device failed for any reason.
 */
bool bind_to_first_gpu();

}  // namespace rapids_cmake
/*
 * Copyright (c) 2022-2023, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

#include <rapids_cmake_ctest_allocation.hpp>

#include <cuda_runtime_api.h>

#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <string>
#include <string_view>

namespace rapids_cmake {

namespace {
GPUAllocation noGPUAllocation() { return GPUAllocation{-1, -1}; }

GPUAllocation parseCTestAllocation(std::string_view env_variable)
{
  // `env_variable` views a null-terminated std::string, so `data()` is safe here
  const char* env_value = std::getenv(env_variable.data());
  if (env_value == nullptr) { return noGPUAllocation(); }
  std::string gpu_resources{env_value};

  // need to handle the variable not having some
  // of the requested components

  // The string looks like "id:<number>,slots:<number>"
  auto id_start   = gpu_resources.find("id:") + 3;
  auto id_end     = gpu_resources.find(",");
  auto slot_start = gpu_resources.find("slots:") + 6;

  auto id    = gpu_resources.substr(id_start, id_end - id_start);
  auto slots = gpu_resources.substr(slot_start);

  return GPUAllocation{std::stoi(id), std::stoi(slots)};
}

std::vector<GPUAllocation> determineGPUAllocations()
{
  std::vector<GPUAllocation> allocations;
  const auto* resource_count = std::getenv("CTEST_RESOURCE_GROUP_COUNT");
  if (!resource_count) {
    allocations.emplace_back(noGPUAllocation());
    return allocations;
  }

  const auto resource_max = std::stoi(resource_count);
  for (int index = 0; index < resource_max; ++index) {
    std::string group_env = "CTEST_RESOURCE_GROUP_" + std::to_string(index);
    std::string resource_group{std::getenv(group_env.c_str())};
    std::transform(resource_group.begin(), resource_group.end(), resource_group.begin(), ::toupper);

    if (resource_group == "GPUS") {
      auto resource_env = group_env + "_" + resource_group;
      auto&& allocation = parseCTestAllocation(resource_env);
      allocations.emplace_back(allocation);
    }
  }

  return allocations;
}
}  // namespace

bool using_resources()
{
  const auto* resource_count = std::getenv("CTEST_RESOURCE_GROUP_COUNT");
  return resource_count != nullptr;
}

std::vector<GPUAllocation> full_allocation() { return determineGPUAllocations(); }

cudaError_t bind_to_gpu(GPUAllocation const& alloc) { return cudaSetDevice(alloc.device_id); }

bool bind_to_first_gpu()
{
  if (using_resources()) {
    std::vector<GPUAllocation> allocs = determineGPUAllocations();
    return (bind_to_gpu(allocs[0]) == cudaSuccess);
  }
  return false;
}

}  // namespace rapids_cmake
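To make the string handling above concrete, here is a hypothetical, CUDA-free sketch equivalent to the parsing done in parseCTestAllocation. The names ParsedAllocation and parse_gpu_resource are illustrative only, not part of the rapids_cmake API:

```cpp
#include <string>

// The value of a variable such as CTEST_RESOURCE_GROUP_0_GPUS
// has the form "id:<number>,slots:<number>".
struct ParsedAllocation {
  int device_id;
  int slots;
};

// Extract the device id and slot count from a single resource entry.
ParsedAllocation parse_gpu_resource(std::string const& gpu_resources)
{
  auto id_start   = gpu_resources.find("id:") + 3;      // skip past "id:"
  auto id_end     = gpu_resources.find(',');            // id ends at the comma
  auto slot_start = gpu_resources.find("slots:") + 6;   // skip past "slots:"

  auto id    = gpu_resources.substr(id_start, id_end - id_start);
  auto slots = gpu_resources.substr(slot_start);
  return ParsedAllocation{std::stoi(id), std::stoi(slots)};
}
```

For example, `parse_gpu_resource("id:0,slots:50")` yields a device id of 0 and 50 slots, i.e. half of one GPU under the 100-slot convention used by rapids-cmake.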