KServe#

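KServe is a standard model inference platform built for Kubernetes. It provides a consistent interface for multiple machine learning frameworks. On this page, we show you how to deploy RAPIDS models using KServe.

Note

These instructions were tested against KServe v0.10 running on Kubernetes v1.21.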

Setting up a Kubernetes cluster with GPU access#

First, set up a Kubernetes cluster with access to NVIDIA GPUs. Visit the Cloud section for guidance.

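Installing KServe#

Visit Getting Started with KServe to install KServe in your Kubernetes cluster. If you are just starting out, we recommend the "Quickstart" script (quick_install.sh) provided on that page. If you are setting up a production-grade system, follow the directions in the Administration Guide instead.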

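Setting up the first InferenceService#

Once KServe is installed, visit First InferenceService to quickly set up a first inference endpoint. (The example uses a support vector machine from scikit-learn to classify the Iris dataset.) Follow all the steps carefully and make sure everything works. In particular, you should be able to submit inference requests using cURL.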

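Setting up an InferenceService with Triton-FIL#

The FIL backend for Triton Inference Server (Triton-FIL for short) is an inference runtime optimized for many kinds of tree-based models, including XGBoost, LightGBM, scikit-learn, and cuML RandomForest. We can use Triton-FIL together with KServe to serve any tree-based model.

The following manifest sets up an inference endpoint using Triton-FIL: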

# triton-fil.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: triton-fil
spec:
  predictor:
    triton:
      storageUri: gs://path-to-gcloud-storage-bucket/model-directory
      runtimeVersion: 22.12-py3
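
If you generate manifests from a script rather than writing YAML by hand, the same manifest can be expressed as JSON, which kubectl also accepts. The following is a minimal sketch; build_inference_service is a hypothetical helper introduced here for illustration and is not part of KServe.

```python
import json


def build_inference_service(name, storage_uri, runtime_version):
    """Return the InferenceService manifest shown above as a plain dict.

    Hypothetical helper: the field names mirror triton-fil.yaml exactly.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "triton": {
                    "storageUri": storage_uri,
                    "runtimeVersion": runtime_version,
                }
            }
        },
    }


manifest = build_inference_service(
    name="triton-fil",
    storage_uri="gs://path-to-gcloud-storage-bucket/model-directory",
    runtime_version="22.12-py3",
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

The printed JSON can be saved to a file and applied the same way as the YAML manifest.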

where model-directory is set up with the following hierarchy:

model-directory/
\__ model/
   \__ config.pbtxt
   \__ 1/
      \__ [model file goes here]
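
The layout above can also be created from a script. The sketch below builds it in a temporary directory using only the standard library; stage_model, my_model.bin, and model.bin are hypothetical names chosen for illustration, and the one-line config text is a placeholder rather than a complete configuration.

```python
import shutil
import tempfile
from pathlib import Path


def stage_model(model_file, root, target_name, config_text):
    """Create root/model-directory/model/ with config.pbtxt and 1/<target_name>."""
    model_dir = Path(root) / "model-directory" / "model"
    version_dir = model_dir / "1"
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(config_text)
    # Copy the trained model into the version directory under its final name.
    shutil.copy(model_file, version_dir / target_name)
    return model_dir


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "my_model.bin"  # stand-in for a real model file
    src.write_text("placeholder")
    model_dir = stage_model(src, tmp, "model.bin", 'backend: "fil"\n')
    # Collect the staged paths relative to the temporary root.
    staged = sorted(p.relative_to(tmp).as_posix() for p in model_dir.rglob("*"))
    print(staged)
```

After staging, the model-directory folder still has to be uploaded to the location named in storageUri.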

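where config.pbtxt contains the configuration for the Triton-FIL backend. A typical config.pbtxt is given below, with explanations interspersed as # comments. Before use, make sure to remove the # comments and fill in the blanks.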

backend: "fil"
max_batch_size: 32768
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ ___ ]   # Number of features (columns) in the training data
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]

instance_group [{ kind: KIND_AUTO }]
    # Triton-FIL will intelligently choose between CPU and GPU

parameters [
  {
    key: "model_type"
    value: { string_value: "_____" }
      # Can be "xgboost", "xgboost_json", "lightgbm", or "treelite_checkpoint"
      # See subsections for examples
  },
  {
    key: "output_class"
    value: { string_value: "____" }
      # true (if classifier), or false (if regressor)
  },
  {
    key: "threshold"
    value: { string_value: "0.5" }
      # Threshold for predicting the positive class in a binary classifier
  }
]

dynamic_batching {}
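
Since the template has to be stripped of comments and filled in before use, one option is to render it from a script. The following is a minimal sketch that assumes only the fields shown above; render_config and ALLOWED_MODEL_TYPES are hypothetical names, and the allowed values are the ones listed in the template comments.

```python
from string import Template

# Comment-free version of the template above, with $placeholders for the blanks.
CONFIG_TEMPLATE = Template('''backend: "fil"
max_batch_size: 32768
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ $num_features ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]

instance_group [{ kind: KIND_AUTO }]

parameters [
  {
    key: "model_type"
    value: { string_value: "$model_type" }
  },
  {
    key: "output_class"
    value: { string_value: "$output_class" }
  },
  {
    key: "threshold"
    value: { string_value: "$threshold" }
  }
]

dynamic_batching {}
''')

ALLOWED_MODEL_TYPES = {"xgboost", "xgboost_json", "lightgbm", "treelite_checkpoint"}


def render_config(num_features, model_type, is_classifier, threshold=0.5):
    """Fill in the blanks of the config.pbtxt template."""
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"unsupported model_type: {model_type!r}")
    return CONFIG_TEMPLATE.substitute(
        num_features=int(num_features),
        model_type=model_type,
        output_class="true" if is_classifier else "false",
        threshold=threshold,
    )


config_text = render_config(num_features=6, model_type="xgboost_json", is_classifier=True)
print(config_text)
```

Rejecting unknown model_type values early makes a typo show up when the file is generated rather than when the endpoint fails to load the model.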

We show concrete examples below. But first, some general notes:

The payload JSON differs from the one in the First InferenceService example:

{
  "inputs" : [
    {
      "name" : "input__0",
      "shape" : [ 1, 6 ],
      "datatype" : "FP32",
      "data" : [0, 0, 0, 0, 0, 0]
    }
  ],
  "outputs" : [
    {
      "name" : "output__0",
      "parameters" : { "classification" : 2 }
    }
  ]
}
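
A payload of this shape can be built from a list of feature rows. The sketch below is one way to do it, assuming the tensor names input__0 and output__0 from the config above; build_v2_payload is a hypothetical helper, and the classification value simply mirrors the one in the payload above.

```python
import json


def build_v2_payload(rows, input_name="input__0", output_name="output__0",
                     classification=2):
    """Build a v2 inference payload from a list of equal-length feature rows."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of features")
    # Flatten the rows in row-major order; "shape" records the 2-D layout.
    flat = [float(value) for row in rows for value in row]
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [n_rows, n_cols],
                "datatype": "FP32",
                "data": flat,
            }
        ],
        "outputs": [
            {
                "name": output_name,
                "parameters": {"classification": classification},
            }
        ],
    }


payload = build_v2_payload([[0, 0, 0, 0, 0, 0]])
print(json.dumps(payload, indent=2))
```

The second entry of shape must match the number of features given in dims in config.pbtxt.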

Triton-FIL uses version 2 of the KServe protocol, so make sure to use the v2 URL when sending inference requests:

$ INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
$ INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
$ SERVICE_HOSTNAME=$(kubectl get inferenceservice <endpoint name> -n kserve-test \
  -o jsonpath='{.status.url}' | cut -d "/" -f 3)

$ curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/<endpoint name>/infer" \
  -d @./payload.json
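
The same request can be sent from Python. The following sketch uses only the standard library and mirrors the cURL call above; build_infer_request is a hypothetical helper, and the host, port, and hostname values are placeholders standing in for the values returned by the kubectl commands.

```python
import json
import urllib.request


def build_infer_request(ingress_host, ingress_port, service_hostname,
                        endpoint_name, payload):
    """Build a POST request for the v2 infer URL, with the Host header set."""
    url = f"http://{ingress_host}:{ingress_port}/v2/models/{endpoint_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Host": service_hostname, "Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "inputs": [
        {"name": "input__0", "shape": [1, 6], "datatype": "FP32",
         "data": [0, 0, 0, 0, 0, 0]}
    ],
    "outputs": [{"name": "output__0", "parameters": {"classification": 2}}],
}

# Placeholder values; substitute INGRESS_HOST, INGRESS_PORT and SERVICE_HOSTNAME.
req = build_infer_request(
    "203.0.113.10", 80, "triton-fil.kserve-test.example.com", "triton-fil", payload
)
print(req.get_method(), req.full_url)

# To send the request against a live cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it makes it possible to inspect the URL and headers before any traffic reaches the cluster.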

XGBoost#

To deploy an XGBoost model, save it in the JSON format:

import xgboost as xgb

clf = xgb.XGBClassifier(...)
clf.fit(X, y)
clf.save_model("my_xgboost_model.json")  # Note the .json extension

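Rename the model file to xgboost.json, as this is the convention used by Triton-FIL. After moving the model file into the model directory, the directory should look like this: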

model-directory/
\__ model/
   \__ config.pbtxt
   \__ 1/
      \__ xgboost.json

In config.pbtxt, set model_type="xgboost_json".

cuML RandomForest#

To deploy a cuML random forest model, save it as a Treelite checkpoint file:

from cuml.ensemble import RandomForestClassifier as cumlRandomForestClassifier

clf = cumlRandomForestClassifier(...)
clf.fit(X, y)
clf.convert_to_treelite_model().to_treelite_checkpoint("./checkpoint.tl")

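Make sure the checkpoint file is named checkpoint.tl (as in the snippet above), as this is the convention used by Triton-FIL. After moving the model file into the model directory, the directory should look like this: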

model-directory/
\__ model/
   \__ config.pbtxt
   \__ 1/
      \__ checkpoint.tl

Configuring Triton-FIL#

Triton-FIL offers many configuration options, and we have shown only a few of them here. Visit FIL Backend Model Configuration to see the rest.