LAMMPS Workload

Follow this procedure to run the LAMMPS molecular dynamics simulator.

  • Operating system: Ubuntu* 22.04

  • Hardware: Intel® Data Center GPU Max Series

  • Software: Intel® oneAPI Base Toolkit and Intel® oneAPI HPC Toolkit

  • Time to complete: 30 minutes
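The operating-system prerequisite can be checked before you start. The following is a minimal sketch, assuming the standard /etc/os-release file and its ID and VERSION_ID fields; the function name is_ubuntu_2204 is hypothetical and not part of any Intel or LAMMPS tooling.

```shell
# Hypothetical helper: succeed only on Ubuntu 22.04.
# Reads an os-release style file (default /etc/os-release); the path is a
# parameter so the check can also be run against a copy of the file.
is_ubuntu_2204() {
  local os_release="${1:-/etc/os-release}"
  # Return 2 when the file cannot be read, to distinguish it from "wrong OS".
  [ -r "$os_release" ] || return 2
  # os-release is shell-sourceable; source it in a subshell so ID and
  # VERSION_ID do not leak into the calling shell.
  ( . "$os_release" && [ "$ID" = "ubuntu" ] && [ "$VERSION_ID" = "22.04" ] )
}
```

For example: is_ubuntu_2204 && echo "Ubuntu 22.04 detected"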

For more information, see the LAMMPS documentation.

  1. Check whether the driver stack is installed.

    $ xpu-smi discovery
    

    The command should return at least one Intel® Data Center GPU Max device.

  2. Check whether the oneAPI toolkit is installed.

    $ apt list intel-basekit intel-hpckit
    

    Expected output:

    Listing... Done
    intel-basekit/all,now 2023.2.0-49384 amd64 [installed]
    intel-hpckit/all,now 2023.2.0-49438 amd64 [installed]
    
  3. If you have not previously configured your environment, install the Intel GPU driver for Ubuntu 22.04. See dgpu-docs for details.

    Note

    Installation requires access to Intel package repositories, such as https://repositories.intel.com and https://apt.repos.intel.com. If you reach them through a proxy configured with environment variables such as http_proxy or https_proxy, adjust the following steps accordingly, for example by adding -E (preserve environment) to sudo commands.

  4. If you have not previously configured your environment, enable access to the Intel repository that serves the oneAPI packages, and install the oneAPI Base Toolkit and HPC Toolkit for Ubuntu 22.04.

    # Download Intel's signing key and store it in dearmored form for APT
    wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB \
      | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
    # Register the oneAPI repository, signed by that key
    echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
    sudo apt update
    
    sudo apt install -y intel-basekit intel-hpckit
    
  5. Install build dependencies.

    sudo apt install -y python3 python3-pip git build-essential cmake
    pip3 install mako pyyaml
    
  6. Build LAMMPS.

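    # Fetch the Compute Aggregation Layer (CAL) and the LAMMPS develop branch
    git clone https://github.com/intel/compute-aggregation-layer.git cal
    git clone https://www.github.com/lammps/lammps lammps -b develop --depth 1
    cd lammps/src
    source /opt/intel/oneapi/setvars.sh
    # Enable the LAMMPS packages required here, including the GPU package
    make yes-asphere yes-kspace yes-manybody yes-misc
    make yes-molecule yes-rigid yes-dpd-basic yes-gpu
    cd ../..

    # Build CAL and put its build directory on PATH (step 7 runs calrun from it)
    cd cal
    mkdir build && cd build
    cmake ..
    make -j
    export PATH="$(pwd):$PATH"
    cd ../..
    # Build the GPU library, then the LAMMPS executable lmp_oneapi
    cd lammps/lib/gpu
    make -f Makefile.oneapi -j
    cd ../../src
    make oneapi -j
    cd ../..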
    
  7. Run LAMMPS.

    cd cal/build
    export PATH="$(pwd):$PATH"
    cd ../../lammps/src/INTEL/TEST/
    # One-time restart file generation for the liquid crystal benchmark
    mpirun --bootstrap ssh -np 72 ../../lmp_oneapi -in in.lc_generate_restart -log none
    # Environment setup (NEO master 026515 or later)
    export I_MPI_FABRICS=shm
    export KMP_AFFINITY="granularity=core,scatter"
    export CAL_ASYNC_CALLS=1

    # Run the liquid crystal benchmark (the FOM is timesteps/sec)
    I_MPI_PIN_ORDER=bunch KMP_BLOCKTIME=1000 OMP_NUM_THREADS=4 I_MPI_PIN_DOMAIN=8:compact calrun mpirun \
      --bootstrap ssh -np 16 ../../lmp_oneapi -v N off -in in.intel.lc -log none -pk gpu 2 -sf gpu
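The run command in step 7 shares 16 MPI ranks (-np 16) across the two devices selected with -pk gpu 2, and OMP_NUM_THREADS=4 gives each rank four OpenMP threads. The sample output that follows reports the matching "8 proc(s) per device" and "4 thread(s) per proc". A small sketch of that arithmetic, useful when adapting the command to a different rank or device count; describe_layout is a hypothetical helper, not part of LAMMPS.

```shell
# Hypothetical helper: print the layout implied by -np <ranks>,
# -pk gpu <devices>, and OMP_NUM_THREADS=<threads>.
describe_layout() {
  local ranks="$1" gpus="$2" threads="$3"
  if [ "$gpus" -le 0 ]; then
    echo "error: device count must be positive" >&2
    return 1
  fi
  # Flag an uneven split so the imbalance between devices is noticed.
  if [ $((ranks % gpus)) -ne 0 ]; then
    echo "warning: $ranks ranks do not divide evenly over $gpus devices" >&2
  fi
  echo "ranks per device: $((ranks / gpus))"
  echo "OpenMP threads in total: $((ranks * threads))"
}
```

For the command above, describe_layout 16 2 4 prints "ranks per device: 8" and "OpenMP threads in total: 64".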
    
    

The following example shows output from an Intel® Data Center GPU Max 1550:

--------------------------------------------------------------------------
- Using acceleration for gayberne:
-  with 8 proc(s) per device.
-  with 4 thread(s) per proc.
-  with OpenCL Parameters for: INTEL_GPU (500)
-  Horizontal vector operations: ENABLED
-  Shared memory system: No
--------------------------------------------------------------------------
Platform: Intel(R) Corporation Intel(R) OpenCL Graphics OpenCL 3.0
Device 0: Intel(R) Data Center GPU Max 1550, 448 CUs, 61 GB, 1.6 GHZ (Mixed Precision)
Device 1: Intel(R) Data Center GPU Max 1550, 448 CUs, 1.6 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing Device and compiling on process 0...Done.
Initializing Devices 0-1 on core 0...Done.
Initializing Devices 0-1 on core 1...Done.
Initializing Devices 0-1 on core 2...Done.
Initializing Devices 0-1 on core 3...Done.
Initializing Devices 0-1 on core 4...Done.
Initializing Devices 0-1 on core 5...Done.
Initializing Devices 0-1 on core 6...Done.
Initializing Devices 0-1 on core 7...Done.

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 10
  Time step     : 0.002
Per MPI rank memory allocation (min/avg/max) = 36.96 | 36.96 | 36.96 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press
        10   1.9986478     -0.38874135     0              2.6092246      7.3216782
       100   1.9838512     -0.37184965     0              2.6039215      7.3625427
       200   1.9866877     -0.37227117     0              2.6077547      7.3606555
       300   1.9861461     -0.35463178     0              2.6245817      7.4090507
       400   1.9953463     -0.37023171     0              2.622782       7.3629617
       500   2.0061948     -0.38123724     0              2.6280492      7.3411057
       600   1.9888506     -0.37910347     0              2.6041667      7.3414455
       700   2.0001109     -0.37690831     0              2.6232523      7.3446158
       800   2.0084155     -0.38964792     0              2.6229695      7.3212755
       850   2.0003566     -0.3703234      0              2.6302058      7.3599885
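The benchmark comment in step 7 names timesteps/sec as the figure of merit. The sample output above stops at step 850, before the end-of-run summary. The sketch below assumes the "Performance:" line that LAMMPS normally prints after a run (of the form "Performance: <x> tau/day, <y> timesteps/s, ...") and extracts the value that precedes timesteps/s. The function name extract_fom is hypothetical.

```shell
# Hypothetical helper: print the timesteps/s value from captured LAMMPS
# output (file arguments or stdin). Assumes the "Performance:" summary line;
# if several runs are present, the last one wins. Exits 1 if none is found.
extract_fom() {
  awk '/^Performance:/ {
         for (i = 2; i <= NF; i++)
           if ($i ~ /^timesteps\/s,?$/) fom = $(i - 1)
       }
       END {
         if (fom == "") exit 1
         print fom
       }' "$@"
}
```

One way to use it is to append 2>&1 | tee lc.out to the step 7 run command (lc.out is an arbitrary file name) and then call extract_fom lc.out.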
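The installation check in step 2 can be scripted by looking for the [installed] marker in the apt list output. A minimal sketch; all_installed is a hypothetical helper, and the line format it matches is the one shown in the expected output of step 2.

```shell
# Hypothetical helper: read `apt list` output on stdin and report, for each
# package named as an argument, whether it is marked [installed].
# Returns non-zero if any package is missing.
all_installed() {
  local listing pkg missing=0
  listing=$(cat)
  for pkg in "$@"; do
    # Matches lines such as: intel-basekit/all,now 2023.2.0-49384 amd64 [installed]
    if printf '%s\n' "$listing" | grep -q "^${pkg}/.*\[installed"; then
      echo "ok: $pkg"
    else
      echo "missing: $pkg"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example: apt list intel-basekit intel-hpckit 2>/dev/null | all_installed intel-basekit intel-hpckit. apt may print a warning about its command-line interface on stderr, which 2>/dev/null hides.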
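When a proxy is configured through http_proxy or https_proxy, the installation steps need sudo -E so that commands run through sudo still see those variables. A minimal sketch of a hypothetical helper that picks the prefix; note that the sudoers policy on a given system may refuse -E.

```shell
# Hypothetical helper: print "sudo -E" when proxy variables are set,
# otherwise plain "sudo", so apt and wget under sudo keep the proxy settings.
sudo_cmd() {
  if [ -n "${http_proxy:-}${https_proxy:-}${HTTP_PROXY:-}${HTTPS_PROXY:-}" ]; then
    echo "sudo -E"
  else
    echo "sudo"
  fi
}
```

For example: $(sudo_cmd) apt update. The command substitution is left unquoted on purpose so that "sudo -E" splits into two words.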
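After step 4, apt update can verify the oneAPI repository only if the keyring named in signed-by exists. Because tee creates the keyring file even if the download in the wget pipeline fails, the file can exist but be empty. A minimal sketch that checks this for a given source list; check_signed_by is a hypothetical helper.

```shell
# Hypothetical helper: extract the signed-by keyring path from an APT source
# list file and confirm that the keyring exists and is non-empty.
check_signed_by() {
  local list_file="$1" keyring
  # Capture the path after "signed-by=" up to the next "]" or space.
  keyring=$(sed -n 's/.*signed-by=\([^] ]*\).*/\1/p' "$list_file" | head -n 1)
  if [ -z "$keyring" ]; then
    echo "no signed-by option in $list_file"
    return 1
  fi
  if [ ! -s "$keyring" ]; then
    echo "keyring missing or empty: $keyring"
    return 1
  fi
  echo "ok: $keyring"
}
```

For example: check_signed_by /etc/apt/sources.list.d/oneAPI.list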
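Steps 1 and 5 depend on several command-line tools being on PATH: xpu-smi for the driver check, and git, cmake, make, python3, and pip3 for the build. A minimal sketch of a hypothetical helper that lists any that are missing.

```shell
# Hypothetical helper: print each named tool that is not on PATH and
# return non-zero if any is missing.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example: require_tools xpu-smi git cmake make python3 pip3. The Python packages from step 5 can be confirmed with python3 -c 'import mako, yaml' (the pyyaml package is imported as yaml).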
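Step 7 relies on two outputs of step 6: the calrun launcher, which the PATH setup in step 7 expects in cal/build, and the lmp_oneapi executable in lammps/src. A minimal sketch that checks both, assuming the directory layout created in step 6 (cal and lammps cloned side by side); check_build_outputs is a hypothetical helper.

```shell
# Hypothetical helper: confirm that the binaries used in step 7 exist and
# are executable. The argument is the directory that holds cal/ and lammps/.
check_build_outputs() {
  local root="${1:-.}" f missing=0
  for f in "$root/cal/build/calrun" "$root/lammps/src/lmp_oneapi"; do
    if [ -x "$f" ]; then
      echo "ok: $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, from the directory where the repositories were cloned: check_build_outputs .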