NVIDIA confidential NIM deployment

Mon, 01 Jan 0001 00:00:00 +0000

This example adapts an NVIDIA NIM inference deployment on Kubernetes to run with Confidential Containers. This particular scenario targets one AMD SEV-SNP Kubernetes worker node with NVIDIA GPU confidential computing support. The same NIM deployment pattern can be adapted to Intel TDX nodes, but the reference values and attestation policy must be generated for TDX rather than SNP. Those TDX-specific steps are out of scope for this exercise.

NVIDIA NIM is a set of inference microservices that package foundation models as containers with optimized runtimes and HTTP APIs for GPU infrastructure. This example starts with a plain NIM Pod manifest for the nvcr.io/nim/meta/llama-3.1-8b-instruct:1.13.1 image, which serves the Meta Llama 3.1 8B Instruct model through a chat completions API. The optional baseline step runs that manifest with the non-confidential kata-qemu-nvidia-gpu runtime class and queries its health, model list, and chat completion endpoints on port 8000. The confidential scenario uses the kata-qemu-nvidia-gpu-snp runtime class which moves the Pod into a confidential VM, but the change alone is not sufficient: A secure deployment also needs Trustee’s Key Broker Service (KBS), guest pull, Attestation Agent (AA) and Confidential Data Hub (CDH) configuration, sealed secrets, image signature policy, a generated Kata agent policy, trusted storage, and a KBS policy that approves the expected CPU, GPU, and initdata evidence. The checkpoints below add those pieces one at a time.

Nim on Confidential Containers

NVIDIA confidential NIM deployment