LAMMPS Workload

Follow this procedure to run the LAMMPS molecular dynamics simulator.

Operating system: Ubuntu* 22.04
Hardware: Intel® Data Center GPU Max Series
Software: Intel® oneAPI Base Toolkit, Intel® oneAPI HPC Toolkit
Time to complete: 30 minutes

For more information, see the LAMMPS documentation.

Check whether the driver stack is installed.
```
$ xpu-smi discovery
```
The command should return at least one Intel® Data Center GPU Max device.
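
To make this check scriptable, the device lines in the discovery output can be counted. The following is a minimal sketch. It assumes each GPU is reported on a `Device Name:` line containing the product string `Data Center GPU Max`. The helper name `count_max_gpus` is hypothetical and is not part of xpu-smi.

```shell
# count_max_gpus: hypothetical helper, not part of xpu-smi.
# Reads discovery output on stdin and prints the number of lines that name
# a Data Center GPU Max device (assumes one "Device Name:" line per device).
count_max_gpus() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the exit status clean.
  grep -c 'Device Name:.*Data Center GPU Max' || true
}

# Run the real check only when xpu-smi is available on this machine.
if command -v xpu-smi >/dev/null 2>&1; then
  n=$(xpu-smi discovery | count_max_gpus)
  if [ "$n" -ge 1 ]; then
    echo "Found $n Intel Data Center GPU Max device(s)."
  else
    echo "No Intel Data Center GPU Max device found; check the driver stack." >&2
  fi
fi
```

If the output format on your system differs, adjust the pattern passed to `grep`.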

Check whether the oneAPI toolkits are installed.

$ apt list intel-basekit intel-hpckit

Expected output:

Listing... Done
intel-basekit/all,now 2023.2.0-49384 amd64 [installed]
intel-hpckit/all,now 2023.2.0-49438 amd64 [installed]
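
The same check can be automated by testing that both package lines carry an installed status. The following is a minimal sketch. The helper name `toolkits_installed` is hypothetical, and it assumes the `apt list` line format shown above.

```shell
# toolkits_installed: hypothetical helper.
# Reads `apt list` output on stdin and succeeds only if both intel-basekit
# and intel-hpckit appear with a status tag that starts with "[installed".
toolkits_installed() {
  awk '
    /^intel-basekit\// && /\[installed/ { base = 1 }
    /^intel-hpckit\//  && /\[installed/ { hpc = 1 }
    END { exit !(base && hpc) }
  '
}
```

A possible invocation is `apt list intel-basekit intel-hpckit 2>/dev/null | toolkits_installed && echo "oneAPI toolkits present"`. Version numbers are deliberately not compared, so newer toolkit releases also pass.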

If you have not previously configured your environment, install the Ubuntu 22.04 graphics driver. See dgpu-docs for details.

Note

Access to package repositories, such as https://repositories.intel.com and https://apt.repos.intel.com, is required for installation. If your proxy is configured through environment variables such as http_proxy or https_proxy, the following steps need small modifications, such as adding -E (preserve environment) to sudo commands.
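
As an illustration of the proxy adjustment, the sketch below picks `sudo -E` only when a proxy variable is set. The helper name `sudo_prefix` is hypothetical. Whether `-E` actually preserves the variables also depends on the local sudoers policy, which this sketch does not check.

```shell
# sudo_prefix: hypothetical helper that prints "sudo -E" when http_proxy or
# https_proxy is set in the current environment, and plain "sudo" otherwise.
sudo_prefix() {
  if [ -n "${http_proxy:-}${https_proxy:-}" ]; then
    echo "sudo -E"
  else
    echo "sudo"
  fi
}

# Example: print the update command that would apply on this machine.
echo "$(sudo_prefix) apt update"
```

Commands that run without sudo, such as the wget download of the signing key, read the proxy variables from your own environment and need no change.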

If you have not previously configured your environment, enable access to the Intel repository that serves the oneAPI packages, then install the oneAPI Base Toolkit and HPC Toolkit for Ubuntu 22.04.

wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
  | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update

sudo apt install -y intel-basekit intel-hpckit

Install build dependencies.

sudo apt install -y python3 python3-pip git build-essential cmake
pip3 install mako pyyaml

Build LAMMPS.

git clone https://github.com/intel/compute-aggregation-layer.git cal
git clone https://www.github.com/lammps/lammps lammps -b develop --depth 1
cd lammps/src
source /opt/intel/oneapi/setvars.sh
make yes-asphere yes-kspace yes-manybody yes-misc 
make yes-molecule yes-rigid yes-dpd-basic yes-gpu
cd ../..

cd cal
mkdir -p build && cd build
cmake ..
make -j
export PATH="$(pwd):$PATH"
cd ../..
cd lammps/lib/gpu
make -f Makefile.oneapi -j
cd ../../src
make oneapi -j 
cd ../..
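
Before moving on, it can help to confirm that the build produced the executable used in the run step. The following is a minimal sketch. The helper name `check_lmp_binary` is hypothetical, and the path `lammps/src/lmp_oneapi` is inferred from the `make oneapi` target and the run commands in this guide.

```shell
# check_lmp_binary: hypothetical helper. Succeeds if the path given as $1 is
# an executable regular file; otherwise prints a hint on stderr and fails.
check_lmp_binary() {
  if [ -f "$1" ] && [ -x "$1" ]; then
    echo "OK: $1"
  else
    echo "Missing or not executable: $1" >&2
    return 1
  fi
}
```

From the directory that contains the `lammps` checkout, a possible invocation is `check_lmp_binary lammps/src/lmp_oneapi`.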

Run LAMMPS.

cd cal/build
export PATH="$(pwd):$PATH"
cd ../../lammps/src/INTEL/TEST/
# One-time restart file generation for the liquid crystal benchmark
mpirun --bootstrap ssh -np 72 ../../lmp_oneapi -in in.lc_generate_restart -log none
# Environment setup - NEO MASTER 026515 or later
export I_MPI_FABRICS=shm
export KMP_AFFINITY="granularity=core,scatter"
export CAL_ASYNC_CALLS=1

# Run the liquid crystal benchmark (the figure of merit, FOM, is timesteps/sec)
I_MPI_PIN_ORDER=bunch KMP_BLOCKTIME=1000 OMP_NUM_THREADS=4 I_MPI_PIN_DOMAIN=8:compact calrun mpirun \
  --bootstrap ssh -np 16 ../../lmp_oneapi -v N off -in in.intel.lc -log none -pk gpu 2 -sf gpu
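
The command above launches 16 MPI ranks with 4 OpenMP threads each and asks the GPU package for 2 devices (`-pk gpu 2`). The sketch below works through that arithmetic. The helper name `layout` is hypothetical, and rejecting an uneven split is a choice made in this sketch, not a LAMMPS requirement stated in this guide.

```shell
# layout: hypothetical helper that prints how MPI ranks and OpenMP threads
# divide across GPUs for a given run configuration.
#   $1 = MPI ranks (-np), $2 = GPUs (-pk gpu N), $3 = OMP_NUM_THREADS
layout() {
  if [ $(( $1 % $2 )) -ne 0 ]; then
    echo "ranks ($1) do not divide evenly across $2 GPU(s)" >&2
    return 1
  fi
  echo "ranks_per_gpu=$(( $1 / $2 )) total_threads=$(( $1 * $3 ))"
}

layout 16 2 4   # prints: ranks_per_gpu=8 total_threads=64
```

The result of 8 ranks per GPU agrees with the "8 proc(s) per device" and "4 thread(s) per proc" lines in the sample output. When you change `-np`, `-pk gpu`, or `OMP_NUM_THREADS`, rerun the calculation to confirm the layout still fits your node.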

The following example shows output from a run on an Intel® Data Center GPU Max 1550:

--------------------------------------------------------------------------
- Using acceleration for gayberne:
-  with 8 proc(s) per device.
-  with 4 thread(s) per proc.
-  with OpenCL Parameters for: INTEL_GPU (500)
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Platform: Intel(R) Corporation Intel(R) OpenCL Graphics OpenCL 3.0
Device 0: Intel(R) Data Center GPU Max 1550, 448 CUs, 61 GB, 1.6 GHZ (Mixed Precision)
Device 1: Intel(R) Data Center GPU Max 1550, 448 CUs, 1.6 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.
Initializing Devices 0-1 on core 4...Done.
Initializing Devices 0-1 on core 5...Done.
Initializing Devices 0-1 on core 6...Done.
Initializing Devices 0-1 on core 7...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 10
  Time step     : 0.002
Per MPI rank memory allocation (min/avg/max) = 36.96 | 36.96 | 36.96 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press
        10   1.9986478     -0.38874135     0              2.6092246      7.3216782
       100   1.9838512     -0.37184965     0              2.6039215      7.3625427
       200   1.9866877     -0.37227117     0              2.6077547      7.3606555
       300   1.9861461     -0.35463178     0              2.6245817      7.4090507
       400   1.9953463     -0.37023171     0              2.622782       7.3629617
       500   2.0061948     -0.38123724     0              2.6280492      7.3411057
       600   1.9888506     -0.37910347     0              2.6041667      7.3414455
       700   2.0001109     -0.37690831     0              2.6232523      7.3446158
       800   2.0084155     -0.38964792     0              2.6229695      7.3212755
       850   2.0003566     -0.3703234      0              2.6302058      7.3599885
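
Because the figure of merit is timesteps/sec, it is convenient to pull that number out of the run output. The following is a minimal sketch. It assumes the run ends with the usual LAMMPS `Performance:` summary line, which reports a value followed by `timesteps/s`; that line is not part of the excerpt above. The helper name `extract_fom` and the file name `run.log` are hypothetical.

```shell
# extract_fom: hypothetical helper. Reads LAMMPS output on stdin and prints
# the number that precedes the "timesteps/s" token on each "Performance:" line.
extract_fom() {
  awk '/^Performance:/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^timesteps\/s/) print $(i - 1)
  }'
}
```

Because the benchmark runs with `-log none`, the output goes only to the terminal. One option is to append `2>&1 | tee run.log` to the benchmark command and then run `extract_fom < run.log`.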