Verifying Installation
To verify whether the expected hardware is working with the i915 driver, check the display hardware connected to your system:
hwinfo --display
On SLES, if hwinfo is installed in /usr/sbin and not in the default user path, run it using the following command:
/usr/sbin/hwinfo --display
Example output for Intel® Data Center GPU Max 1550 (device ID 0x0BD5). The Driver and Driver Status lines confirm that i915 is bound to the device and active:
51: PCI 8c00.0: 0380 Display controller
[Created at pci.386]
Unique ID: JefI.QAjErpDk4H4
Parent ID: juVd.xbjkZcxCQYD
SysFS ID: /devices/pci0000:89/0000:89:02.0/0000:8a:00.0/0000:8b:01.0/0000:8c:00.0
SysFS BusID: 0000:8c:00.0
Hardware Class: graphics card
Model: "Intel Display controller"
Vendor: pci 0x8086 "Intel Corporation"
Device: pci 0x0bd5
SubVendor: pci 0x8086 "Intel Corporation"
SubDevice: pci 0x0000
Revision: 0x2f
Driver: "i915"
Driver Modules: "i915"
Memory Range: 0x23fe7e000000-0x23fe7fffffff (ro,non-prefetchable)
Memory Range: 0x236000000000-0x237fffffffff (ro,non-prefetchable)
IRQ: 138 (447 events)
Module Alias: "pci:v00008086d00000BD5sv00008086sd00000000bc03sc80i00"
Driver Info #0:
Driver Status: i915 is active
Driver Activation Cmd: "modprobe i915"
Config Status: cfg=new, avail=yes, need=no, active=unknown
Attached to: #26 (PCI bridge)
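The driver check above can also be scripted. The following is a minimal sketch, not part of the official tooling: the function name gpu_drivers is hypothetical, and it assumes the hwinfo output keeps the "SysFS BusID:" and "Driver:" line layout shown in the example.

```shell
#!/bin/sh
# Read `hwinfo --display` output on stdin and print one "BusID driver"
# pair per display device. gpu_drivers is a hypothetical helper name.
gpu_drivers() {
    awk '
        # Remember the bus address of the device currently being described.
        /SysFS BusID:/ { bus = $3 }
        # Match only the plain "Driver:" line, not "Driver Modules:",
        # "Driver Status:" or "Driver Activation Cmd:".
        /^[ \t]*Driver:/ {
            gsub(/"/, "", $2)   # strip the quotes around the driver name
            print bus, $2
        }
    '
}

# Usage (requires hwinfo):
#   hwinfo --display | gpu_drivers
```

For the example output above, this prints 0000:8c:00.0 i915. Any other driver name, or no output at all, suggests that the device is not bound to i915.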
Diagnosing the installed GPU using the XPU Manager
The Intel® XPU Manager (Intel® XPUM) tool helps with system administration, GPU monitoring, diagnostics, and configuration for Intel Data Center GPUs. You can use it in full-featured mode with a RESTful API as well as via the simplified XPU System Management Interface (XPU-SMI) tool. The following examples present commands that can help you get more information about your GPU installation.
Getting information about the available GPU
$ xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Name: Intel(R) Data Center GPU Flex 170 |
| | Vendor Name: Intel(R) Corporation |
| | UUID: 00000000-0000-0000-6769-df256e271362 |
| | PCI BDF Address: 0000:4d:00.0 |
| | DRM Device: /dev/dri/card1 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
Getting information about the available GPU, including installed driver and firmware versions
$ sudo xpu-smi discovery -d 0
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Type: GPU |
| | Device Name: Intel(R) Data Center GPU Flex 170 |
| | Vendor Name: Intel(R) Corporation |
| | UUID: 00000000-0000-0000-6769-df256e271362 |
| | Serial Number: LQAC13401787 |
| | Core Clock Rate: 2050 MHz |
| | Stepping: C0 |
| | |
| | Driver Version: I915_23.4.15_PSB_230307.15 |
| | Kernel Version: 5.15.0-47-generic |
| | GFX Firmware Name: GFX |
| | GFX Firmware Version: DG02_1.3267 |
| | GFX Firmware Status: normal |
| | GFX Data Firmware Name: GFX_DATA |
| | GFX Data Firmware Version: 0x46b |
| | GFX PSC Firmware Name: GFX_PSCBIN |
| | GFX PSC Firmware Version: |
| | AMC Firmware Name: AMC |
| | AMC Firmware Version: |
| | |
| | PCI BDF Address: 0000:4d:00.0 |
| | PCI Slot: J37 - Riser 1, Slot 1 |
| | PCIe Generation: 4 |
| | PCIe Max Link Width: 16 |
| | OAM Socket ID: |
| | |
| | Memory Physical Size: 14248.00 MiB |
| | Max Mem Alloc Size: 4095.99 MiB |
| | ECC State: enabled |
| | Number of Memory Channels: 2 |
| | Memory Bus Width: 128 |
| | Max Hardware Contexts: 65536 |
| | Max Command Queue Priority: 0 |
| | |
| | Number of EUs: 512 |
| | Number of Tiles: 1 |
| | Number of Slices: 1 |
| | Number of Sub Slices per Slice: 32 |
| | Number of Threads per EU: 8 |
| | Physical EU SIMD Width: 8 |
| | Number of Media Engines: 2 |
| | Number of Media Enhancement Engines: 2 |
| | |
| | Number of Xe Link ports: |
| | Max Tx/Rx Speed per Xe Link port: |
| | Number of Lanes per Xe Link port: |
+-----------+--------------------------------------------------------------------------------------+
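When only one or two values from the discovery table are needed (for example the serial number or the GFX firmware version), the table can be filtered in a script. The sketch below is illustrative only: xpum_field is a hypothetical helper name, and it assumes the two-column table layout shown above, with each field written as "Label: value" in the second column.

```shell
#!/bin/sh
# Read `xpu-smi discovery -d N` table output on stdin and print the value
# of one field. $1 is the label exactly as it appears in the table,
# e.g. "Serial Number". xpum_field is a hypothetical helper name.
xpum_field() {
    awk -v label="$1" -F'|' '
        NF >= 3 {
            cell = $3
            sub(/^[ \t]+/, "", cell)      # trim leading padding
            sub(/[ \t]+$/, "", cell)      # trim trailing padding
            prefix = label ":"
            # Require the label at the start of the cell so that
            # "GFX Firmware Version" does not match "GFX Data Firmware Version".
            if (index(cell, prefix) == 1) {
                value = substr(cell, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)
                print value
            }
        }
    '
}

# Usage (requires xpu-smi):
#   sudo xpu-smi discovery -d 0 | xpum_field "GFX Firmware Version"
```

For the example table above, the field "GFX Firmware Version" yields DG02_1.3267. Fields that the table leaves blank, such as "AMC Firmware Version", yield an empty line.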
Enabling GPU telemetry
$ sudo xpu-smi stats -d 0
+-----------------------------+--------------------------------------------------------------------+
| Device ID | 0 |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%) | 0 |
| EU Array Active (%) | |
| EU Array Stall (%) | |
| EU Array Idle (%) | |
| | |
| Compute Engine Util (%) | 0; Engine 0: 0, Engine 1: 0, Engine 2: 0, Engine 3: 0 |
| Render Engine Util (%) | 0; Engine 0: 0 |
| Media Engine Util (%) | 0 |
| Decoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Encoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Copy Engine Util (%) | 0; Engine 0: 0 |
| Media EM Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| 3D Engine Util (%) | |
+-----------------------------+--------------------------------------------------------------------+
| Reset | |
| Programming Errors | |
| Driver Errors | |
| Cache Errors Correctable | |
| Cache Errors Uncorrectable | |
| Mem Errors Correctable | |
| Mem Errors Uncorrectable | |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W) | 44 |
| GPU Frequency (MHz) | 2050 |
| GPU Core Temperature (C) | 40 |
| GPU Memory Temperature (C) | |
| GPU Memory Read (kB/s) | 1346 |
| GPU Memory Write (kB/s) | 286 |
| GPU Memory Bandwidth (%) | 0 |
| GPU Memory Used (MiB) | 26 |
| Xe Link Throughput (kB/s) | |
+-----------------------------+--------------------------------------------------------------------+
For more information on Intel® XPUM, see the Intel® XPUM overview or the XPU System Management Interface user guide.
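A single metric from the telemetry table shown earlier can be pulled out and compared against a limit, for example in a periodic health check. This is a sketch under stated assumptions: stat_value and metric_above are hypothetical helper names, the table layout is assumed to match the xpu-smi stats example, and the limit is chosen by the caller because the documentation does not define thresholds.

```shell
#!/bin/sh
# Read `xpu-smi stats -d N` output on stdin and print the value of the row
# whose first column equals $1, e.g. "GPU Power (W)".
stat_value() {
    awk -v name="$1" -F'|' '
        NF >= 3 {
            key = $2; val = $3
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim padding around the label
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim padding around the value
            if (key == name) print val
        }
    '
}

# Exit 0 if metric $1 is above limit $2, 1 if it is not, and 2 if the
# metric is blank or not a plain number (as for unreported rows).
# ASSUMPTION: the limit is the caller's own choice, not an Intel value.
metric_above() {
    value=$(stat_value "$1")
    case $value in
        '' | *[!0-9.]*) return 2 ;;
    esac
    awk -v v="$value" -v l="$2" 'BEGIN { exit !((v + 0) > (l + 0)) }'
}

# Usage (requires xpu-smi):
#   sudo xpu-smi stats -d 0 | metric_above "GPU Core Temperature (C)" 85 \
#       && echo "temperature above limit"
```

Rows that report per-engine values, such as "Compute Engine Util (%)", are not plain numbers, so metric_above returns 2 for them.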
Smoke testing the compute stack
Use the following command to smoke test the compute stack:
clinfo | head -n 5
Running the same command without head displays a multi-page summary of GPGPU compute capabilities.
Example output
Number of platforms 1
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 3.0
Platform Profile FULL_PROFILE
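The smoke test can be turned into a pass/fail check by reading the platform count from the first line of the output. The following sketch assumes the "Number of platforms" line format shown above. The function names opencl_platform_count and opencl_ready are hypothetical.

```shell
#!/bin/sh
# Read `clinfo` output on stdin and print the platform count taken from
# the first "Number of platforms" line.
opencl_platform_count() {
    awk '/^Number of platforms/ { print $NF; found = 1; exit }
         END { if (!found) print 0 }'
}

# Exit 0 when at least one OpenCL platform is reported on stdin.
opencl_ready() {
    count=$(opencl_platform_count)
    [ "${count:-0}" -gt 0 ] 2>/dev/null
}

# Usage (requires clinfo):
#   clinfo | opencl_ready && echo "compute stack OK"
```

A count of 0, or output without a "Number of platforms" line, is treated as a failed smoke test.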
Smoke testing the media stack
Use the following command to smoke test the media stack for the Data Center GPU Flex series:
vainfo
Intel® Data Center GPU Max Series does not include codec capabilities, so vainfo is expected to list only a minimal set of entry points. Intel® Data Center GPU Flex Series and client GPUs provide hardware codecs, so vainfo is expected to list many entry points. See the following examples for both GPU series.
Example output
Intel® Data Center GPU Max Series:
vainfo: VA-API version: 1.18 (libva 2.17.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.1.4 (12e141d)
vainfo: Supported profile and entrypoints
VAProfileNone : VAEntrypointVideoProc
VAProfileNone : VAEntrypointStats
Intel® Data Center GPU Flex Series:
vainfo: VA-API version: 1.18 (libva 2.17.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.1.4 (12e141d)
vainfo: Supported profile and entrypoints
VAProfileNone : VAEntrypointVideoProc
VAProfileNone : VAEntrypointStats
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSliceLP
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSliceLP
VAProfileJPEGBaseline : VAEntrypointVLD
VAProfileJPEGBaseline : VAEntrypointEncPicture
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSliceLP
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileHEVCMain10 : VAEntrypointEncSliceLP
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointEncSliceLP
VAProfileVP9Profile1 : VAEntrypointVLD
VAProfileVP9Profile1 : VAEntrypointEncSliceLP
VAProfileVP9Profile2 : VAEntrypointVLD
VAProfileVP9Profile2 : VAEntrypointEncSliceLP
VAProfileVP9Profile3 : VAEntrypointVLD
VAProfileVP9Profile3 : VAEntrypointEncSliceLP
VAProfileHEVCMain12 : VAEntrypointVLD
VAProfileHEVCMain422_10 : VAEntrypointVLD
VAProfileHEVCMain422_12 : VAEntrypointVLD
VAProfileHEVCMain444 : VAEntrypointVLD
VAProfileHEVCMain444 : VAEntrypointEncSliceLP
VAProfileHEVCMain444_10 : VAEntrypointVLD
VAProfileHEVCMain444_10 : VAEntrypointEncSliceLP
VAProfileHEVCMain444_12 : VAEntrypointVLD
VAProfileHEVCSccMain : VAEntrypointVLD
VAProfileHEVCSccMain : VAEntrypointEncSliceLP
VAProfileHEVCSccMain10 : VAEntrypointVLD
VAProfileHEVCSccMain10 : VAEntrypointEncSliceLP
VAProfileHEVCSccMain444 : VAEntrypointVLD
VAProfileHEVCSccMain444 : VAEntrypointEncSliceLP
VAProfileAV1Profile0 : VAEntrypointVLD
VAProfileAV1Profile0 : VAEntrypointEncSliceLP
VAProfileHEVCSccMain444_10 : VAEntrypointVLD
VAProfileHEVCSccMain444_10 : VAEntrypointEncSliceLP
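To tell the two cases apart without reading the whole list, the decode and encode entry points can be counted. This is a minimal sketch that assumes the "profile : entrypoint" line format shown above. It counts VAEntrypointVLD lines as decode and any VAEntrypointEnc* line as encode. The function name va_entrypoint_summary is hypothetical.

```shell
#!/bin/sh
# Read `vainfo` output on stdin and print how many decode (VLD) and
# encode (Enc*) entry points it lists.
va_entrypoint_summary() {
    awk '
        /VAEntrypointVLD/ { dec++ }
        /VAEntrypointEnc/ { enc++ }   # EncSlice, EncSliceLP, EncPicture
        END { printf "decode=%d encode=%d\n", dec, enc }
    '
}

# Usage (requires vainfo):
#   vainfo 2>/dev/null | va_entrypoint_summary
```

For the Max Series example above, this prints decode=0 encode=0, which matches the expectation that only VAProfileNone entry points are present. A Flex Series device should report non-zero counts for both.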
Verifying the usage of the Vendor Specific Extended Capabilities (VSEC) module
To access the full range of Intel® Data Center GPU Max Series telemetry features, you need to use the intel_vsec module instead of intel_pmt. The intel_vsec module supports Max telemetry features while intel_pmt focuses on CPU telemetry.
To check whether the VSEC change is needed, review the output of the xpu-smi discovery -d 0 command. If the serial number is shown as unknown, there may be a VSEC issue affecting the device. In that case, follow this procedure to check and change the kernel driver module in use.
Use the following command to check whether the intel_vsec module is loaded and bound to a PCI device.
for d in 8086:09A7 8086:4F93 8086:4F95; do sudo lspci -k -d $d; done
The correct output should look like the following example:
05:00.0 Memory controller: Intel Corporation Device 09A7
Kernel driver in use: intel-vsec
Kernel modules: intel_vsec
If intel_pmt is used as a kernel driver instead of intel-vsec, proceed to the next steps to change the kernel driver.
Install the driverctl tool using the command for your distribution.
Red Hat Enterprise Linux:
sudo dnf install driverctl
SUSE Linux Enterprise Server 15 (a driverctl package is not available, so install it from the driverctl repository instead):
git clone https://gitlab.com/driverctl/driverctl.git
cd driverctl
sudo make install
Ubuntu:
sudo apt install driverctl
Check which device the intel-pmt module is linked to.
sudo driverctl list-devices | grep -iE "pmt"
The expected output is similar to the following, although the device address on your system may differ from 0000:8e:00.1:
0000:8e:00.1 intel-pmt
Override the default driver binding using the device address retrieved from your system:
sudo driverctl set-override 0000:8e:00.1 "intel_vsec"
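The check and the override in this procedure can be sketched as two small filters. This is an illustration only: the function names are hypothetical, the first filter assumes lspci -k prints "Kernel driver in use:" on its own line, and the second assumes driverctl list-devices prints "address driver" pairs as in the example above. The second filter only prints the override commands so that they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Read `lspci -k` output on stdin and print the driver named on each
# "Kernel driver in use:" line.
kernel_driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Read `driverctl list-devices` output on stdin and print (without running)
# one set-override command per device bound to intel-pmt or intel_pmt.
vsec_override_commands() {
    awk '$2 ~ /^intel[-_]pmt$/ {
        printf "sudo driverctl set-override %s intel_vsec\n", $1
    }'
}

# Usage (requires lspci and driverctl):
#   sudo lspci -k -d 8086:09A7 | kernel_driver_in_use
#   sudo driverctl list-devices | vsec_override_commands
```

If the first filter prints intel-vsec, no change is needed. Otherwise, review the commands printed by the second filter and run them manually.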
Verifying Integrated Firmware Image (IFWI)
Use the Intel® XPUM tool to flash IFWI onto a Flex or Max GPU.
Check the GFX firmware version for each GPU.
sudo xpu-smi discovery -d 0
sudo xpu-smi discovery -d 1
Check the latest firmware version for your hardware from your Intel or OEM portal and compare it with the version currently installed on your device. If the latest firmware version is newer than the one on your device, install the new firmware.
sudo xpu-smi updatefw -d 0 -t GFX -f /home/intel/ATS_M75_128_B0_PVT_ES_017_gfx_fwupdate_SOC2.bin -y
sudo xpu-smi updatefw -d 0 -t GFX_PSCBIN -f /home/test/PVC_Tuscany_oam_cbb_otf_53G_220803.pscbin
sudo xpu-smi updatefw -d 0 -t GFX -f /home/test/PVC.Fwupdate_Prod_2023.WW26.3_Tuscany_Pcie.bin
Update firmware options.
sudo xpu-smi updatefw
Update GPU firmware
Usage: xpu-smi updatefw [Options]
  xpu-smi updatefw -d [deviceId] -t GFX -f [imageFilePath]
  xpu-smi updatefw -d [pciBdfAddress] -t GFX -f [imageFilePath]
Options:
  -h,--help        Print this help message and exit
  -j,--json        Print result in JSON format
  -d,--device      The device ID or PCI BDF address
  -t,--type        The firmware name. Valid options: GFX, GFX_DATA, GFX_CODE_DATA, GFX_PSCBIN, AMC. AMC firmware update just works on Intel M50CYP server (BMC firmware version is 2.82 or newer) and Supermicro SYS-620C-TN12R server (BMC firmware version is 11.01 or newer).
  -f,--file        The firmware image file path on this server
  -u,--username    Username used to authenticate for host redfish access
  -p,--password    Password used to authenticate for host redfish access
  -y,--assumeyes   Assume that the answer to any question which would be asked is yes
  --force          Force GFX firmware update. This parameter only works for GFX firmware.
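The comparison step in this procedure (installed GFX firmware version against the latest available one) could be scripted roughly as follows. This sketch rests on an assumption that is not confirmed by the documentation: that GFX firmware versions have the form PREFIX_MAJOR.MINOR, as in DG02_1.3267, and that versions with the same prefix order numerically. The function name gfx_fw_is_older is hypothetical. Confirm the version scheme with your Intel or OEM portal before relying on it.

```shell
#!/bin/sh
# Exit 0 if the installed version ($1) is older than the candidate ($2),
# 1 if it is the same or newer, and 2 if the two strings cannot be compared.
# ASSUMPTION: versions look like PREFIX_MAJOR.MINOR (e.g. DG02_1.3267),
# and only versions with an identical prefix are comparable.
gfx_fw_is_older() {
    installed=$1
    candidate=$2
    # Different prefixes (text before the first "_") are not comparable.
    [ "${installed%%_*}" = "${candidate%%_*}" ] || return 2
    i=${installed#*_}
    c=${candidate#*_}
    # Both numeric parts must be exactly MAJOR.MINOR with digits only.
    for v in "$i" "$c"; do
        case $v in
            *[!0-9.]* | *.*.* | .* | *. | '') return 2 ;;
            *.*) ;;
            *) return 2 ;;
        esac
    done
    if [ "${i%%.*}" -ne "${c%%.*}" ]; then
        [ "${i%%.*}" -lt "${c%%.*}" ]     # compare major numbers
    else
        [ "${i#*.}" -lt "${c#*.}" ]       # same major: compare minor numbers
    fi
}

# Usage, with the installed version read from xpu-smi discovery output and
# a candidate version taken from the vendor portal (value is illustrative):
#   gfx_fw_is_older "DG02_1.3267" "DG02_1.3270" && echo "update available"
```

A return code of 2 means that the script cannot decide, in which case the versions should be compared manually.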