LAMMPS Workload
Follow this procedure to run the LAMMPS molecular dynamics simulator.
Operating system: Ubuntu* 22.04
Hardware: Intel® Data Center GPU Max Series
Software: Intel® oneAPI Base Toolkit, Intel® oneAPI HPC Toolkit
Time to complete: 30 minutes
For more information, see the LAMMPS documentation.
Check whether the driver stack is installed.
$ xpu-smi discovery
The command should return at least one Intel® Data Center GPU Max device.
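If xpu-smi itself is missing, the command fails before it can list any device. A minimal sketch of a guarded check follows; the helper name have_cmd is hypothetical and only wraps the standard `command -v` lookup.

```shell
# Return success if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd xpu-smi; then
    # Same discovery call as in the step above.
    xpu-smi discovery
else
    echo "xpu-smi not found on PATH" >&2
fi
```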
Check whether the oneAPI toolkit is installed.
$ apt list intel-basekit intel-hpckit
Expected output:
Listing... Done
intel-basekit/all,now 2023.2.0-49384 amd64 [installed]
intel-hpckit/all,now 2023.2.0-49438 amd64 [installed]
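The same check can be scripted by looking for the [installed] marker in the apt list output. The sketch below assumes the output format shown above; the function name pkg_installed is hypothetical, and version numbers on your system may differ.

```shell
# Succeed if the named package is marked installed in `apt list` output read from stdin.
pkg_installed() {
    grep -q "^$1/.*\[installed"
}

# Run the check only where apt is available.
if command -v apt >/dev/null 2>&1; then
    listing=$(apt list intel-basekit intel-hpckit 2>/dev/null || true)
    for pkg in intel-basekit intel-hpckit; do
        if printf '%s\n' "$listing" | pkg_installed "$pkg"; then
            echo "$pkg: installed"
        else
            echo "$pkg: missing"
        fi
    done
fi
```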
If you have not previously configured your environment, install the Ubuntu 22.04 graphics driver. See dgpu-docs for details.
Note
Installation requires access to the Intel package repositories, such as https://repositories.intel.com and https://apt.repos.intel.com. If your proxy is configured through environment variables such as http_proxy or https_proxy, adjust the following steps accordingly, for example by adding -E (preserve environment) to sudo commands.
If you have not previously configured your environment, enable access to the Intel repository that serves the oneAPI packages, then install the oneAPI Base Toolkit and HPC Toolkit for Ubuntu 22.04.
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
  | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install -y intel-basekit intel-hpckit
Install build dependencies.
sudo apt install -y python3 python3-pip git build-essential cmake
pip3 install mako pyyaml
Build LAMMPS.
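# Fetch the Compute Aggregation Layer (CAL) and LAMMPS sources
git clone https://github.com/intel/compute-aggregation-layer.git cal
git clone https://www.github.com/lammps/lammps lammps -b develop --depth 1
# Load the oneAPI environment and enable the required LAMMPS packages
cd lammps/src
source /opt/intel/oneapi/setvars.sh
make yes-asphere yes-kspace yes-manybody yes-misc
make yes-molecule yes-rigid yes-dpd-basic yes-gpu
cd ../..
# Build CAL and add its build directory to PATH
cd cal
mkdir build; cd build
cmake ..
make -j
export PATH=`pwd`:$PATH
cd ../..
# Build the LAMMPS GPU library, then the oneapi LAMMPS target
cd lammps/lib/gpu
make -f Makefile.oneapi -j
cd ../../src
make oneapi -j
cd ../..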
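Before moving on, it can help to confirm that the build produced the two executables the run step relies on. The sketch below assumes they land at lammps/src/lmp_oneapi and cal/build/calrun, which is inferred from the paths used in the run commands rather than stated explicitly; check_exec is a hypothetical helper name.

```shell
# Report whether a path exists and is executable; return non-zero if not.
check_exec() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Paths are relative to the directory where both repositories were cloned.
check_exec lammps/src/lmp_oneapi || true
check_exec cal/build/calrun || true
```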
Run LAMMPS.
cd cal/build
export PATH=`pwd`:$PATH
cd ../../lammps/src/INTEL/TEST/
#ONE-TIME RESTART FILE GENERATION FOR LIQUID CRYSTAL BENCHMARK
mpirun --bootstrap ssh -np 72 ../../lmp_oneapi -in in.lc_generate_restart -log none
#Environment Setup - NEO MASTER 026515 or later
export I_MPI_FABRICS=shm
export KMP_AFFINITY="granularity=core,scatter"
export CAL_ASYNC_CALLS=1
#Run Liquid Crystal Benchmark (FOM is timesteps/sec)
I_MPI_PIN_ORDER=bunch KMP_BLOCKTIME=1000 OMP_NUM_THREADS=4 I_MPI_PIN_DOMAIN=8:compact calrun mpirun \
  --bootstrap ssh -np 16 ../../lmp_oneapi -v N off -in in.intel.lc -log none -pk gpu 2 -sf gpu
The following example shows output from a system with Intel® Data Center GPU Max 1550:
--------------------------------------------------------------------------
- Using acceleration for gayberne:
- with 8 proc(s) per device.
- with 4 thread(s) per proc.
- with OpenCL Parameters for: INTEL_GPU (500)
- Horizontal vector operations: ENABLED
- Shared memory system: No
--------------------------------------------------------------------------
Platform: Intel(R) Corporation Intel(R) OpenCL Graphics OpenCL 3.0
Device 0: Intel(R) Data Center GPU Max 1550, 448 CUs, 61 GB, 1.6 GHZ (Mixed Precision)
Device 1: Intel(R) Data Center GPU Max 1550, 448 CUs, 1.6 GHZ (Mixed Precision)
--------------------------------------------------------------------------
Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.
Initializing Devices 0-1 on core 4...Done.
Initializing Devices 0-1 on core 5...Done.
Initializing Devices 0-1 on core 6...Done.
Initializing Devices 0-1 on core 7...Done.
Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
Unit style : lj
Current step : 10
Time step : 0.002
Per MPI rank memory allocation (min/avg/max) = 36.96 | 36.96 | 36.96 Mbytes
Step Temp E_pair E_mol TotEng Press
10 1.9986478 -0.38874135 0 2.6092246 7.3216782
100 1.9838512 -0.37184965 0 2.6039215 7.3625427
200 1.9866877 -0.37227117 0 2.6077547 7.3606555
300 1.9861461 -0.35463178 0 2.6245817 7.4090507
400 1.9953463 -0.37023171 0 2.622782 7.3629617
500 2.0061948 -0.38123724 0 2.6280492 7.3411057
600 1.9888506 -0.37910347 0 2.6041667 7.3414455
700 2.0001109 -0.37690831 0 2.6232523 7.3446158
800 2.0084155 -0.38964792 0 2.6229695 7.3212755
850 2.0003566 -0.3703234 0 2.6302058 7.3599885
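The "8 proc(s) per device" and "4 thread(s) per proc" lines in the output header follow from the launch parameters of the benchmark command. A short sketch of that arithmetic, using the values from the run step (the variable names are illustrative only):

```shell
np=16          # MPI ranks, from "-np 16"
gpus=2         # GPU devices, from "-pk gpu 2"
omp_threads=4  # threads per rank, from OMP_NUM_THREADS=4

# Ranks are shared evenly across the devices.
procs_per_device=$((np / gpus))
# Total OpenMP threads across all ranks.
total_threads=$((np * omp_threads))

echo "procs per device: $procs_per_device"
echo "threads per proc: $omp_threads"
echo "total CPU threads: $total_threads"
```

If you change -np or the device count in -pk gpu, the header values should change accordingly.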