# Configure MPI for Distributed Computing
This section covers how to configure ADCME for MPI functionality.
## Configure the MPI Backend
The first step is to configure your MPI backend. There are many choices depending on your operating system: for example, Windows has Microsoft MPI, while OpenMPI and Intel MPI are available on most Linux distributions. If you want to use your own MPI backend, you need to locate the MPI libraries, header files, and executable (e.g., `mpirun`). You then need to build ADCME with the following environment variables:
- `MPI_C_LIBRARIES`: the path to the MPI shared library. On Windows, it may be `C:\Program Files (x86)\Microsoft SDKs\MPI\Lib\x64\msmpi.lib`; on Unix systems, it may be `/opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/release/libmpi.so`. Note that the variable must point to the shared library itself, not just its directory.
- `MPI_INCLUDE_PATH`: the directory where `mpi.h` is located, for example, `C:\Program Files (x86)\Microsoft SDKs\MPI\Include` on Windows, or `/opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/include/` on a Unix system.
The simplest way is to set these environment variables in your shell startup file. For example, on Linux, add the following lines to the `~/.bashrc` file:
```shell
export MPI_C_LIBRARIES=/opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpi.so
export MPI_INCLUDE_PATH=/opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/include/
alias mpirun=/opt/ohpc/pub/compiler/intel-18/compilers_and_libraries_2018.2.199/linux/mpi/intel64/bin/mpirun
```
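Before building ADCME, it can save time to verify that these variables actually point at existing files. The following is a minimal shell sketch (the fallback paths after `:-` are placeholders, not real installation paths):

```shell
# Verify that the MPI environment variables point at real files/directories.
# The fallback values after ":-" are placeholders only.
check_path () {
    # $1 = variable name, $2 = its current value
    if [ -e "$2" ]; then
        echo "$1 OK: $2"
    else
        echo "$1 NOT FOUND: $2"
    fi
}

check_path MPI_C_LIBRARIES  "${MPI_C_LIBRARIES:-/path/to/libmpi.so}"
check_path MPI_INCLUDE_PATH "${MPI_INCLUDE_PATH:-/path/to/include}"
```

If either line reports `NOT FOUND`, fix the corresponding `export` before proceeding.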
In case you do not have an MPI backend, ADCME provides a convenient way to install one by compiling OpenMPI from source. Just run:
```julia
using ADCME
install_openmpi()
```
This installs an OpenMPI library for you. Note that this functionality does not work on Windows and has only been tested on Linux.
## Build MPI Libraries
Configuring the backend alone does not yet enable MPI in ADCME. To enable MPI support, you need to recompile the built-in custom operators:
```julia
using ADCME
ADCME.precompile(true)
```
At this point, you will be able to use MPI features.
## Build MPI Custom Operators
You can also build MPI-enabled custom operators by calling:
```julia
using ADCME
customop(with_mpi=true)
```
In this case, extra lines are added to `CMakeLists.txt` to set up the MPI dependencies:
```cmake
IF(DEFINED ENV{MPI_C_LIBRARIES})
  set(MPI_INCLUDE_PATH $ENV{MPI_INCLUDE_PATH})
  set(MPI_C_LIBRARIES $ENV{MPI_C_LIBRARIES})
  message("MPI_INCLUDE_PATH = ${MPI_INCLUDE_PATH}")
  message("MPI_C_LIBRARIES = ${MPI_C_LIBRARIES}")
  include_directories(${MPI_INCLUDE_PATH})
ELSE()
  message("MPI_INCLUDE_PATH and/or MPI_C_LIBRARIES is not set. MPI operators are not compiled.")
ENDIF()
```
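Independently of the CMake build, you can check that the configured header and library work together by compiling a trivial MPI program by hand. This is a sketch only: `mpi_check.c` is an arbitrary throwaway file name, and the commented compile line assumes a C compiler such as `gcc` is available.

```shell
# Write a minimal MPI "hello world" program to a scratch file.
cat > mpi_check.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello from rank %d\n", rank);
    MPI_Finalize();
    return 0;
}
EOF

# Then compile and run it against the configured paths, e.g.:
#   gcc mpi_check.c -I"$MPI_INCLUDE_PATH" "$MPI_C_LIBRARIES" -o mpi_check
#   mpirun -n 2 ./mpi_check
```

If the compile or run fails, the environment variables point at an inconsistent or broken MPI installation.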
## Running MPI Applications with Slurm
To run MPI applications with Slurm, the following command is useful:
```shell
sbatch -n 4 -c 8 mpirun -n 4 julia app.jl
```
This specifies 4 tasks, each using 8 cores. You can also replace `sbatch` with `salloc`.
To diagnose the application, you can also let `mpirun` print out the rank binding information. For example, in OpenMPI:
```shell
sbatch -n 4 -c 8 mpirun --report-bindings -n 4 julia app.jl
```
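Equivalently, the resource request can live in a batch script instead of on the command line. The following is a sketch; `app.jl` and the script name `job.sh` are placeholders you would adapt to your cluster:

```shell
#!/bin/bash
#SBATCH -n 4        # 4 MPI tasks
#SBATCH -c 8        # 8 cores per task

mpirun --report-bindings -n 4 julia app.jl
```

Submit it with `sbatch job.sh`.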