API Reference
Data Helpers
FluxMPI.DistributedDataContainer — Type.
DistributedDataContainer(data)
data must be compatible with the MLUtils.jl interface. The returned container is itself compatible with the MLUtils.jl interface and partitions the dataset across the available processes.
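A minimal sketch of sharding a dataset across ranks; the plain array here stands in for any dataset supporting the MLUtils.jl interface:

```julia
using FluxMPI, MLUtils

FluxMPI.Init()

# 128 samples with 10 features each (observations along the last dimension).
data = rand(Float32, 10, 128)

# Each rank only sees its own shard of the 128 samples.
ddata = DistributedDataContainer(data)

# The container plugs into the usual MLUtils tooling, e.g. DataLoader.
for x in DataLoader(ddata; batchsize=8)
    # `x` is a batch drawn from this rank's shard.
end
```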
General Functions
FluxMPI.Init — Function.
Init(; gpu_devices::Union{Nothing,Vector{Int}}=nothing, verbose::Bool=false)
Set up FluxMPI. If GPUs are available and CUDA is functional, each rank is assigned a GPU in a round-robin fashion.
If this function is used, there is no need to call MPI.Init first.
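A minimal sketch; launch under MPI (e.g. with mpiexecjl) to get multiple ranks:

```julia
using FluxMPI

# Also initializes MPI, so MPI.Init() is not needed separately. With a
# functional CUDA setup, each rank is bound to a GPU round-robin.
FluxMPI.Init(; verbose=true)

@assert FluxMPI.Initialized()
```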
FluxMPI.Initialized — Function.
Initialized()
Has FluxMPI been initialized?
FluxMPI.fluxmpi_print — Function.
fluxmpi_print(args...; kwargs...)
Add the rank and world-size information to the printed statement.
FluxMPI.fluxmpi_println — Function.
fluxmpi_println(args...; kwargs...)
Add the rank and world-size information to the printed statement and terminate it with a newline.
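A sketch using fluxmpi_println; fluxmpi_print behaves the same but does not append a newline (the exact prefix layout is an implementation detail):

```julia
using FluxMPI

FluxMPI.Init()

# Every rank prints a line tagged with its rank and the world size,
# e.g. roughly `[0/2] starting epoch 1` on rank 0 of 2.
fluxmpi_println("starting epoch ", 1)
```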
FluxMPI.local_rank — Function.
local_rank()
Get the rank of the process.
FluxMPI.total_workers — Function.
total_workers()
Get the total number of workers.
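These two are useful for rank-dependent logic, e.g. restricting logging or checkpointing to a single process (a sketch):

```julia
using FluxMPI

FluxMPI.Init()

rank = local_rank()        # 0-based MPI rank of this process
nworkers = total_workers() # number of processes in the communicator

# Only the first rank logs.
if rank == 0
    fluxmpi_println("training with ", nworkers, " workers")
end
```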
MPIExtensions: Blocking Communication Wrappers
FluxMPI.allreduce! — Function.
allreduce!(v, op, comm)
Perform MPI.Allreduce! ensuring that CuArrays are safely transferred to the CPU if CUDA-aware MPI is unavailable/disabled.
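A sketch of an in-place elementwise sum across all ranks:

```julia
using FluxMPI, MPI

FluxMPI.Init()

v = ones(Float32, 4)

# After the call, every rank holds the elementwise sum, i.e. each entry
# equals total_workers() here.
FluxMPI.allreduce!(v, +, MPI.COMM_WORLD)
```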
FluxMPI.bcast! — Function.
bcast!(v, root, comm)
Perform MPI.Bcast! ensuring that CuArrays are safely transferred to the CPU if CUDA-aware MPI is unavailable/disabled. (A usage sketch covering bcast! together with reduce! follows the reduce! entry below.)
FluxMPI.reduce! — Function.
reduce!(v, op, comm)
Perform MPI.Reduce! ensuring that CuArrays are safely transferred to the CPU if CUDA-aware MPI is unavailable/disabled.
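A combined sketch of bcast! and reduce!, assuming (as in MPI.Bcast!) that bcast!'s second argument is the root rank and that reduce! accumulates its result on rank 0, MPI.jl's default root:

```julia
using FluxMPI, MPI

FluxMPI.Init()

# Broadcast rank 0's buffer to every process.
v = local_rank() == 0 ? collect(1.0f0:4.0f0) : zeros(Float32, 4)
FluxMPI.bcast!(v, 0, MPI.COMM_WORLD)

# Sum `w` elementwise across ranks; the result is valid on the root rank.
w = ones(Float32, 4)
FluxMPI.reduce!(w, +, MPI.COMM_WORLD)
```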
MPIExtensions: Non-Blocking Communication
FluxMPI.Iallreduce! — Function.
Iallreduce!(sendbuf, recvbuf, op, comm)
Iallreduce!(sendrecvbuf, op, comm)
Performs a non-blocking elementwise reduction using the operator op on the buffer sendbuf. sendbuf can also be a scalar, in which case recvbuf will be a value of the same type.
recvbuf and an MPI_Request object are returned. The value in recvbuf is valid only after the request has been completed (see MPI.Wait!).
Warning
OpenMPI does not support Iallreduce! with CUDA buffers.
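A sketch of overlapping the reduction with other work, using the single-buffer form:

```julia
using FluxMPI, MPI

FluxMPI.Init()

v = ones(Float32, 4)
v, req = FluxMPI.Iallreduce!(v, +, MPI.COMM_WORLD)

# ... do other work while the reduction is in flight ...

MPI.Wait!(req)  # `v` holds the elementwise sum only after this point
```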
FluxMPI.Ibcast! — Function.
Ibcast!(buf, root, comm)
Non-blocking broadcast of the buffer buf to all processes in comm.
buf and an MPI_Request object are returned. The value in buf is valid only after the request has been completed (see MPI.Wait!).
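Analogously, a sketch of a non-blocking broadcast from rank 0:

```julia
using FluxMPI, MPI

FluxMPI.Init()

buf = local_rank() == 0 ? collect(1.0f0:4.0f0) : zeros(Float32, 4)
buf, req = FluxMPI.Ibcast!(buf, 0, MPI.COMM_WORLD)

MPI.Wait!(req)  # every rank now holds rank 0's data
```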
Optimization
FluxMPI.DistributedOptimizer — Type.
DistributedOptimizer(optimizer)
Wrap the optimizer in a DistributedOptimizer. Before updating the parameters, this sums the gradients across the processes using a non-blocking Allreduce.
Arguments
optimizer: An optimizer compatible with the Optimisers.jl package
Note
Remember to scale the loss function by 1 / total_workers() to ensure that the gradients are averaged correctly.
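A sketch of the intended wiring with Optimisers.jl; the parameter NamedTuple and the commented gradient step are placeholders for a real model setup (e.g. Lux.jl with Zygote.jl):

```julia
using FluxMPI, Optimisers

FluxMPI.Init()

ps = (; weight=rand(Float32, 4, 4), bias=zeros(Float32, 4))
ps = FluxMPI.synchronize!(ps; root_rank=0)  # identical initial weights

opt = DistributedOptimizer(Adam(0.001f0))
st_opt = Optimisers.setup(opt, ps)

# In the training loop, scale the loss by 1 / total_workers() so the
# summed gradients come out averaged, e.g. with Zygote:
#   gs = gradient(p -> loss(p) / total_workers(), ps)[1]
#   st_opt, ps = Optimisers.update(st_opt, ps, gs)
```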
FluxMPI.allreduce_gradients — Function.
allreduce_gradients(gs::NamedTuple; on_gpu::Bool=CUDA.functional())
Allreduce the gradients. This uses a non-blocking API, which is efficient for large containers with multiple parameter arrays.
Arguments
gs: A NamedTuple of gradients
Keyword Arguments
on_gpu: Specify whether the gradients are on the GPU. Defaults to CUDA.functional()
Returns
Allreduced NamedTuple of gradients
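A sketch with a toy gradient container; in practice gs comes from a gradient call (e.g. Zygote):

```julia
using FluxMPI

FluxMPI.Init()

gs = (; weight=ones(Float32, 4, 4), bias=ones(Float32, 4))

# Each array in the NamedTuple is summed across ranks.
gs = FluxMPI.allreduce_gradients(gs; on_gpu=false)
```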
Synchronization
FluxMPI.synchronize! — Function.
synchronize!(x; root_rank::Integer=0)
Synchronize x across all processes.
Note
This function is not in-place for CuArrays when MPI is not CUDA-aware.
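A sketch that makes every rank start from rank 0's parameter values:

```julia
using FluxMPI

FluxMPI.Init()

# Each rank starts with different random values...
ps = (; weight=rand(Float32, 4, 4), bias=rand(Float32, 4))

# ...and afterwards all ranks hold rank 0's copy.
ps = FluxMPI.synchronize!(ps; root_rank=0)
```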
Configuration
FluxMPI.disable_cudampi_support — Function.
disable_cudampi_support(; disable=true)
Disable CUDA-aware MPI support. The Julia session needs to be restarted for this to take effect.
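For example:

```julia
using FluxMPI

# Persists the setting; restart Julia for it to take effect.
FluxMPI.disable_cudampi_support()
```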