Hardware Resources and Testing
Resource Allocation
CTest's resource allocation framework allows tests to specify which hardware resources they require, and allows projects to specify the specific local/machine resources available. Together, these ensure that tests are told which specific resources they should use, and that no oversubscription occurs regardless of the requested level of test parallelism.
For a test to use CTest resource allocation, the following components are required:
- A per-machine resource specification file in JSON format
- CTEST_RESOURCE_SPEC_FILE pointing at that JSON file
- Each add_test() recording the resources it requires via test properties
- Each test reading the relevant environment variables to determine which specific resources it should use
These are significant requirements, demanding a fair amount of infrastructure setup for each project. In addition, the CTest resource allocation specification is deliberately loose, allowing it to represent arbitrary requirements such as CPUs, GPUs, and ASICs.
rapids_test
To help RAPIDS projects make use of all the GPUs on a machine when running tests, the rapids-cmake
project provides a suite of commands that simplify the process. These commands streamline GPU detection, setting up the resource specification file, specifying test requirements, and setting the active CUDA GPU.
Machine GPU Detection
The key component of CTest resource allocation is accurately representing the hardware that exists on the developer's machine. The rapids_test_init()
function performs system introspection to determine the number of GPUs on the current machine and generates a resource allocation JSON file representing those GPUs.
include(${CMAKE_BINARY_DIR}/RAPIDS.cmake)
include(rapids-test)
enable_testing()
rapids_test_init()
The CTest resource allocation specification is not limited to representing each GPU as a single unit. Instead, it allows the JSON file to specify a capacity (slots) for each GPU. In the case of rapids-cmake, we always represent each GPU as having 100 slots, allowing projects to think in terms of total percentages when computing requirements.
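For reference, a resource specification for a machine with two GPUs, each with 100 slots, might look like the following. This is a sketch of the CTest resource-spec JSON format; the exact file rapids_test_init() writes may differ in details:

```json
{
  "version": {"major": 1, "minor": 0},
  "local": [
    {
      "gpus": [
        {"id": "0", "slots": 100},
        {"id": "1", "slots": 100}
      ]
    }
  ]
}
```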
Specifying Test GPU Requirements
As discussed above, each CMake test needs to specify the GPU resources it requires so that CTest can properly partition the GPUs at any given CTest parallel level. The easiest route for developers is to use rapids_test_add()
, which wraps each test execution in a wrapper script that sets the CUDA visible devices so that each test only sees its allocated devices.
For example, below we have three tests: two can run concurrently on the same GPU, and one needs a full GPU to itself. This specification allows all three tests to run in parallel on a machine with 2 or more GPUs, with no modification to the tests!
include(rapids-test)
enable_testing()
rapids_test_init()
add_executable( cuda_test test.cu )
rapids_test_add(NAME test_small_alloc COMMAND cuda_test 50 GPUS 1 PERCENT 10)
rapids_test_add(NAME test_medium_alloc COMMAND cuda_test 100 GPUS 1 PERCENT 20)
rapids_test_add(NAME test_very_large_alloc COMMAND cuda_test 10000 GPUS 1)
Multi-GPU Tests
The rapids_test_add()
command also supports tests that require binding to multiple GPUs. In that case you will need to request two (or more) GPUs with a full allocation, like this:
include(rapids-test)
enable_testing()
rapids_test_init()
add_executable( cuda_test test.cu )
rapids_test_add(NAME multi_gpu COMMAND cuda_test GPUS 3)
Due to how CTest performs allocation, if you need distinct GPUs you must request 51% or more of each. Otherwise, multiple allocations may be placed on the same GPU.
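For example, a hypothetical test that needs two distinct physical GPUs could be declared as follows, following the same pattern as the examples above:

```cmake
include(rapids-test)
enable_testing()
rapids_test_init()
add_executable( cuda_test test.cu )
# Requesting 51% of each of two GPUs guarantees that CTest cannot
# place both allocations on the same physical device.
rapids_test_add(NAME distinct_gpus COMMAND cuda_test GPUS 2 PERCENT 51)
```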
When the rapids-cmake Test Wrappers Are Insufficient
At times the wrapper-script approach is insufficient, usually because an existing test wrapper is already in use.
As discussed above, each CMake test still needs to specify the GPU resources it requires so that CTest can properly partition the GPUs at any given CTest parallel level. In these cases, however, the tests themselves need to parse the CTest environment variables to extract which GPUs they should run on.
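Concretely, for a test that requested one GPU at 50%, the environment CTest constructs looks along these lines. The variable names come from CTest's resource allocation documentation; the actual id and slots values depend on the allocation:

```
CTEST_RESOURCE_GROUP_COUNT=1
CTEST_RESOURCE_GROUP_0=gpus
CTEST_RESOURCE_GROUP_0_GPUS=id:0,slots:50
```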
On the CMake side, you can use rapids_test_gpu_requirements()
to specify the requirements:
include(rapids-test)
enable_testing()
rapids_test_init()
add_executable( cuda_test test.cu )
target_link_libraries( cuda_test PRIVATE RAPIDS::test )
add_test(NAME test_small_alloc COMMAND cuda_test 50)
rapids_test_gpu_requirements(test_small_alloc GPUS 1 PERCENT 10)
Now, in the C++ code, you need to parse the relevant CTEST_RESOURCE_GROUP
environment variables. To simplify this process, here is some helper C++ code that does the heavy lifting for you:
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include <cuda_runtime_api.h>
#include <vector>
namespace rapids_cmake {
/*
* Represents a GPU Allocation provided by a CTest resource specification.
*
* The `device_id` maps to the CUDA gpu id required by `cudaSetDevice`.
* The slots represent the percentage of the GPU that this test will use.
* Primarily used by CTest to ensure proper load balancing of tests.
*/
struct GPUAllocation {
int device_id;
int slots;
};
/*
* Returns true when a CTest resource specification has been specified.
*
* Since the vast majority of tests should execute without a CTest resource
* spec (e.g. when executed manually by a developer), callers of `rapids_cmake`
* should first ensure that a CTest resource spec file has been provided before
* trying to query/bind to the allocation.
*
* ```cxx
* if (rapids_cmake::using_resources()) {
* rapids_cmake::bind_to_first_gpu();
* }
* ```
*/
bool using_resources();
/*
* Returns all GPUAllocations allocated for a test
*
* To support multi-GPU tests the CTest resource specification allows a
* test to request multiple GPUs. As CUDA only allows binding to a
* single GPU at any time, this API allows tests to know what CUDA
* devices they should bind to.
*
* Note: The `device_id` of each allocation might not be unique.
* If a test says it needs 50% of two GPUs, it could be allocated
* the same physical GPU. If a test needs distinct / unique devices
* it must request 51%+ of a device.
*
* Note: rapids_cmake does no caching, so this query should be cached
* instead of called multiple times.
*/
std::vector<GPUAllocation> full_allocation();
/*
* Have CUDA bind to a given GPUAllocation
*
* Have CUDA bind to the `device_id` specified in the CTest
* GPU allocation
*
* Note: Return value is the cudaError_t of `cudaSetDevice`
*/
cudaError_t bind_to_gpu(GPUAllocation const& alloc);
/*
* Convenience method to bind to the first GPU that CTest has allocated
* Provided as most RAPIDS tests only require a single GPU
*
* Will return `false` if no GPUs have been allocated, or if setting
* the CUDA device failed for any reason.
*/
bool bind_to_first_gpu();
} // namespace rapids_cmake
/*
* Copyright (c) 2022-2023, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <rapids_cmake_ctest_allocation.hpp>
#include <cuda_runtime_api.h>
#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <string>
#include <string_view>
namespace rapids_cmake {
namespace {
GPUAllocation noGPUAllocation() { return GPUAllocation{-1, -1}; }
GPUAllocation parseCTestAllocation(std::string_view env_variable)
{
std::string gpu_resources{std::getenv(env_variable.data())};
// need to handle parseCTestAllocation variable being empty
// need to handle parseCTestAllocation variable not having some
// of the requested components
// The string looks like "id:<number>,slots:<number>"
auto id_start = gpu_resources.find("id:") + 3;
auto id_end = gpu_resources.find(",");
auto slot_start = gpu_resources.find("slots:") + 6;
auto id = gpu_resources.substr(id_start, id_end - id_start);
auto slots = gpu_resources.substr(slot_start);
return GPUAllocation{std::stoi(id), std::stoi(slots)};
}
std::vector<GPUAllocation> determineGPUAllocations()
{
std::vector<GPUAllocation> allocations;
const auto* resource_count = std::getenv("CTEST_RESOURCE_GROUP_COUNT");
if (!resource_count) {
allocations.emplace_back(noGPUAllocation());
return allocations;
}
const auto resource_max = std::stoi(resource_count);
for (int index = 0; index < resource_max; ++index) {
std::string group_env = "CTEST_RESOURCE_GROUP_" + std::to_string(index);
std::string resource_group{std::getenv(group_env.c_str())};
std::transform(resource_group.begin(), resource_group.end(), resource_group.begin(), ::toupper);
if (resource_group == "GPUS") {
auto resource_env = group_env + "_" + resource_group;
auto&& allocation = parseCTestAllocation(resource_env);
allocations.emplace_back(allocation);
}
}
return allocations;
}
} // namespace
bool using_resources()
{
const auto* resource_count = std::getenv("CTEST_RESOURCE_GROUP_COUNT");
return resource_count != nullptr;
}
std::vector<GPUAllocation> full_allocation() { return determineGPUAllocations(); }
cudaError_t bind_to_gpu(GPUAllocation const& alloc) { return cudaSetDevice(alloc.device_id); }
bool bind_to_first_gpu()
{
if (using_resources()) {
std::vector<GPUAllocation> allocs = determineGPUAllocations();
return (bind_to_gpu(allocs[0]) == cudaSuccess);
}
return false;
}
} // namespace rapids_cmake