# API Reference

- `FluxMPI.DistributedDataContainer`
- `FluxMPI.DistributedOptimizer`
- `FluxMPI.Iallreduce!`
- `FluxMPI.Ibcast!`
- `FluxMPI.Init`
- `FluxMPI.Initialized`
- `FluxMPI.allreduce!`
- `FluxMPI.allreduce_gradients`
- `FluxMPI.bcast!`
- `FluxMPI.disable_cudampi_support`
- `FluxMPI.fluxmpi_print`
- `FluxMPI.fluxmpi_println`
- `FluxMPI.local_rank`
- `FluxMPI.reduce!`
- `FluxMPI.synchronize!`
- `FluxMPI.total_workers`
## Data Helpers

### FluxMPI.DistributedDataContainer — Type

```julia
DistributedDataContainer(data)
```

`data` must be compatible with the MLUtils interface. The returned container is itself compatible with the MLUtils interface and is used to partition the dataset across the available processes.
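For instance, a minimal sketch (the dataset shape, batch size, and `DataLoader` usage are illustrative, not part of the API):

```julia
using FluxMPI, MLUtils

FluxMPI.Init()

# Hypothetical dataset: 512 observations with 10 features each.
data = rand(Float32, 10, 512)

# Each process only iterates over its own shard of the data.
dcontainer = DistributedDataContainer(data)

# The container works with MLUtils.DataLoader like any other dataset.
loader = DataLoader(dcontainer; batchsize=16, shuffle=true)
```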
## General Functions

### FluxMPI.Init — Function

```julia
Init(; gpu_devices::Union{Nothing,Vector{Int}}=nothing, verbose::Bool=false)
```

Set up FluxMPI. If GPUs are available and CUDA is functional, each rank is assigned a GPU in a round-robin fashion. If you call this function, there is no need to call `MPI.Init` first.
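A typical entry point, e.g. for a script launched with MPI.jl's `mpiexecjl -n 4 julia script.jl` (a minimal sketch):

```julia
using FluxMPI

FluxMPI.Init(; verbose=true)
@assert FluxMPI.Initialized()

fluxmpi_println("rank ", local_rank(), " of ", total_workers(), " is ready")
```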
### FluxMPI.Initialized — Function

```julia
Initialized()
```

Has FluxMPI been initialized?
### FluxMPI.fluxmpi_print — Function

```julia
fluxmpi_print(args...; kwargs...)
```

Adds rank and size information to the printed statement.

### FluxMPI.fluxmpi_println — Function

```julia
fluxmpi_println(args...; kwargs...)
```

Adds rank and size information to the printed statement.
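On a multi-process run, a sketch like the following tags each line with the calling rank rather than producing interleaved anonymous output:

```julia
using FluxMPI

FluxMPI.Init()

# Every rank prints its own prefixed line, e.g. during training loops.
fluxmpi_println("starting epoch ", 1)
```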
### FluxMPI.local_rank — Function

```julia
local_rank()
```

Get the rank of the current process.

### FluxMPI.total_workers — Function

```julia
total_workers()
```

Get the total number of workers.
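These two functions are the usual building blocks for rank-dependent logic, as in this sketch:

```julia
using FluxMPI

FluxMPI.Init()

# Restrict one-off work such as logging or checkpointing to rank 0.
if local_rank() == 0
    println("running with ", total_workers(), " workers")
end
```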
## MPIExtensions: Blocking Communication Wrappers

### FluxMPI.allreduce! — Function

```julia
allreduce!(v, op, comm)
```

Perform `MPI.Allreduce!`, ensuring that CuArrays are safely transferred to the CPU if CUDA-aware MPI is unavailable or disabled.
### FluxMPI.bcast! — Function

```julia
bcast!(v, root, comm)
```

Perform `MPI.Bcast!` from `root`, ensuring that CuArrays are safely transferred to the CPU if CUDA-aware MPI is unavailable or disabled.
### FluxMPI.reduce! — Function

```julia
reduce!(v, op, comm)
```

Perform `MPI.Reduce!`, ensuring that CuArrays are safely transferred to the CPU if CUDA-aware MPI is unavailable or disabled.
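A sketch of the blocking wrappers (every rank must make the matching call; `MPI.COMM_WORLD` comes from MPI.jl):

```julia
using FluxMPI, MPI

FluxMPI.Init()

v = fill(Float32(local_rank()), 4)

# Elementwise sum across all ranks; CuArrays are staged through the
# CPU automatically when CUDA-aware MPI is unavailable or disabled.
FluxMPI.allreduce!(v, +, MPI.COMM_WORLD)

fluxmpi_println("sum of ranks: ", v[1])
```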
## MPIExtensions: Non-Blocking Communication

### FluxMPI.Iallreduce! — Function

```julia
Iallreduce!(sendbuf, recvbuf, op, comm)
Iallreduce!(sendrecvbuf, op, comm)
```

Performs a non-blocking elementwise reduction using the operator `op` on the buffer `sendbuf`. `sendbuf` can also be a scalar, in which case `recvbuf` will be a value of the same type.

`recvbuf` and an `MPI_Request` object are returned. The value in `recvbuf` is only valid after the request has been completed (`MPI.Wait!`).

**Warning:** OpenMPI does not support `Iallreduce!` with CUDA (see the corresponding upstream issue).
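The request-based pattern lets communication overlap with computation; a sketch (buffer sizes are illustrative):

```julia
using FluxMPI, MPI

FluxMPI.Init()

sendbuf = ones(Float32, 8)
recvbuf = similar(sendbuf)

# Start the reduction; it proceeds in the background.
recvbuf, req = FluxMPI.Iallreduce!(sendbuf, recvbuf, +, MPI.COMM_WORLD)

# ... do unrelated work while the reduction is in flight ...

MPI.Wait!(req)  # recvbuf is only valid after this returns
```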
### FluxMPI.Ibcast! — Function

```julia
Ibcast!(buf, root, comm)
```

Non-blocking broadcast of the buffer `buf` from `root` to all processes in `comm`.

`buf` and an `MPI_Request` object are returned. The value in `buf` is only valid after the request has been completed (`MPI.Wait!`).
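`Ibcast!` follows the same request-based pattern; a sketch with rank 0 as root:

```julia
using FluxMPI, MPI

FluxMPI.Init()

buf = local_rank() == 0 ? collect(Float32, 1:4) : zeros(Float32, 4)

buf, req = FluxMPI.Ibcast!(buf, 0, MPI.COMM_WORLD)
MPI.Wait!(req)  # buf now holds rank 0's data on every process
```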
## Optimization

### FluxMPI.DistributedOptimizer — Type

```julia
DistributedOptimizer(optimizer)
```

Wrap `optimizer` in a `DistributedOptimizer`. Before updating the parameters, this sums the gradients across the processes using a non-blocking Allreduce.

**Arguments**

- `optimizer`: An optimizer compatible with the Optimisers.jl package

**Note:** Remember to scale the loss function by `1 / total_workers()` to ensure that the gradients are correctly averaged.
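A sketch with Optimisers.jl (the parameters and gradients are placeholders for whatever your model and AD backend produce):

```julia
using FluxMPI, Optimisers

FluxMPI.Init()

ps = (; weight=randn(Float32, 4, 4), bias=zeros(Float32, 4))
ps = FluxMPI.synchronize!(ps; root_rank=0)  # all ranks start from rank 0's values

opt = DistributedOptimizer(Optimisers.Adam(1.0f-3))
st_opt = Optimisers.setup(opt, ps)

# gs stands in for gradients from your AD of choice; scale the loss by
# 1 / total_workers() so the summed gradients are correctly averaged.
gs = (; weight=randn(Float32, 4, 4), bias=randn(Float32, 4))

st_opt, ps = Optimisers.update(st_opt, ps, gs)
```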
### FluxMPI.allreduce_gradients — Function

```julia
allreduce_gradients(gs::NamedTuple; on_gpu::Bool=CUDA.functional())
```

Allreduce the gradients. This uses a non-blocking API, which is efficient for large containers of multiple parameter arrays.

**Arguments**

- `gs`: A `NamedTuple` of gradients

**Keyword Arguments**

- `on_gpu`: Specify whether the gradients are on the GPU. Defaults to `CUDA.functional()`

**Returns**

- `Allreduce`d `NamedTuple` of gradients
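As an alternative to `DistributedOptimizer`, gradients can be reduced explicitly before an ordinary optimizer step; a sketch:

```julia
using FluxMPI

FluxMPI.Init()

# Placeholder gradients; in practice these come from your AD backend.
gs = (; weight=randn(Float32, 4, 4), bias=randn(Float32, 4))

# Each array in gs is reduced across all ranks via non-blocking allreduces.
gs = FluxMPI.allreduce_gradients(gs)
```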
## Synchronization

### FluxMPI.synchronize! — Function

```julia
synchronize!(x; root_rank::Integer=0)
```

Synchronize `x` across all processes.

**Note:** This function is not in-place for CuArrays when MPI is not CUDA-aware.
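Typically called once after parameter initialization so that every rank starts from identical values; a sketch:

```julia
using FluxMPI

FluxMPI.Init()

ps = (; weight=randn(Float32, 4, 4), bias=zeros(Float32, 4))

# Note the assignment: the call is not guaranteed to be in-place
# (e.g. for CuArrays without CUDA-aware MPI).
ps = FluxMPI.synchronize!(ps; root_rank=0)
```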
## Configuration

### FluxMPI.disable_cudampi_support — Function

```julia
disable_cudampi_support(; disable=true)
```

Disable CUDA MPI support. The Julia session needs to be restarted for this to take effect.
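The restart requirement suggests the setting is persisted as a package-level preference; a sketch of the intended workflow:

```julia
using FluxMPI

# Route all communication through the CPU even if CUDA-aware MPI is
# detected; restart the Julia session for this to take effect.
FluxMPI.disable_cudampi_support()
```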