This is part 3 of a series of blog articles on the subject of using GPUs with VMware vSphere.
Part 1 of this series presents an overview of the various options for using GPUs on vSphere
Part 2 describes the DirectPath I/O (Passthrough) mechanism for GPUs
Part 3 gives details on setting up the NVIDIA Virtual GPU (vGPU) technology for GPUs on vSphere
Part 4 explores the setup for the Bitfusion Flexdirect method of using GPUs
In this article, we describe the NVIDIA vGPU (formerly “Grid”) method for using GPU devices on vSphere. The focus in this blog is on the use of GPUs for compute workloads (such as for machine learning, deep learning and high performance computing applications) and we are not looking at GPU usage for virtual desktop infrastructure (VDI) here.
The method of GPU usage on vSphere described here makes use of the products within the NVIDIA vGPU family. This vGPU family includes the “NVIDIA Virtual ComputeServer”, (VCS) and the “NVIDIA Quadro Virtual Datacenter Workstation” (vDWS) products for GPU access and management on vSphere, as well as other products. Here, we use the term “NVIDIA vGPU” as a synonym for the software product you choose from the vGPU family of products. NVIDIA recommends the vCS software product for machine learning and AI workloads, whereas vDWS was used for that purpose before vCS appeared on the market. These are licensed software products from NVIDIA.
Figure 1: Parts of the NVIDIA vGPU product shown in the ESXi Hypervisor and in virtual machines
Figure 1 shows the relationship of the parts of the NVIDIA vGPU product to each other in the overall vSphere and virtual machine architecture.
The NVIDIA vGPU software includes two separate components:
Using the NVIDIA vGPU technology with vSphere allows you to choose between dedicating a full GPU device to one virtual machine or to allow partial sharing of a GPU device by more than one virtual machine.
The reasons for choosing this NVIDIAvGPU option are
The released versions of the NVIDIA vGPU Manager and guest VM drivers that you install must be compatible. For all the versions of the software, versions of vSphere and the hardware versions, consult the current NVIDIA Release Notes. At the time of this writing, the NVIDIA vGPU release notes are located here.
The vSphere ESXi host server-specific part of the setup process is described first here. In order to set up the NVIDIA vGPU environment you will need:
VMware recommends that you choose vSphere version 6.7 for this work. Choosing vSphere 6.7 update 1 will allow you to use the vMotion feature along with your GPU-enabled VM’s. If you choose to use vSphere 6.5 then ensure you are on update 1 before proceeding.
Carefully review the pre-requisites and other details in the NVIDIA vGPU Software User Guide document
The host server part of the NVIDIA vGPU installation process makes use of a “vib install” technique that is used in vSphere for installing drivers into the ESXi hypervisor itself. For more information on using vSphere VIBs, you should check this material
The NVIDIA vGPU Manager is contained in the VIB package that is downloaded from NVIDIA’s website. The package can be found by searching for the NVIDIA Quadro Virtual DataCenter Workstation (or vDWS) products on the NVIDIA site.
To install the NVIDIA vGPU Manager software into the vSphere ESXi hypervisor follow the procedure below.
A GPU card can be configured in one of two modes: vSGA (shared virtual graphics) and vGPU. The NVIDIA card should be configured with vGPU mode. This is specifically for use of the GPU in compute workloads, such as in machine learning or high performance computing applications.
Access the ESXi host server either using the ESXi shell or through SSH. You will need to enable SSH access using the ESXi management console as SSH is disabled by default.To enable vGPU mode on the ESXi host, use the command line to execute this command:
# esxcli graphics host set –-default-type SharedPassthru
You may also get to this setting through the vSphere Client by choosing your host server and using the navigation
“Configure -> Hardware -> Graphics -> Host Graphics tab -> Edit”
A server reboot is required once the setting has been changed. The settings should appear as shown in Figure 2 below.
Figure 2: Edit Host Graphics screen for a Host Server in the vSphere Client
To check that the settings have taken using the command line, type
# esxcli graphics host get
This command should produce output as follows: