SWIFT: Using task-based parallelism, fully asynchronous communication, and graph partition-based domain decomposition for strong scaling on more than 100000 cores.

PASC16, EPFL, Lausanne, Switzerland (2016)

Abstract

We present a new open-source cosmological code, called \swift, designed to solve the equations of hydrodynamics using a particle-based approach (Smoothed Particle Hydrodynamics) on hybrid shared / distributed-memory architectures. \swift was designed from the bottom up to provide excellent {\em strong scaling} on both commodity clusters (Tier-2 systems) and Top-100 supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches:

\begin{itemize}

\item \textbf{Task-based parallelism} for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores.

\item \textbf{Graph-based domain decomposition}, which uses the task graph to decompose the simulation domain such that the {\em work}, rather than just the {\em data} as in most partitioning schemes, is distributed equally across all nodes.

\item \textbf{Fully dynamic and asynchronous communication}, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring tasks that rely on data from other nodes until it arrives.

\end{itemize}
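
The first and third approaches can be illustrated with a minimal, single-process sketch. It is not the \swift implementation: the names \texttt{Task}, \texttt{run\_tasks} and \texttt{build\_example}, the task names, and the no-op work functions are all hypothetical, and the ``send'' and ``recv'' tasks are stand-ins for real inter-node communication. The sketch assumes an acyclic dependency graph and shows only the scheduling idea: each task counts its unresolved dependencies, idle threads pick up whatever is ready, and a task that needs remote data stays deferred until the matching receive task has completed.

```python
import threading
from collections import deque


class Task:
    """A unit of work plus the tasks it unlocks (hypothetical structure)."""

    def __init__(self, name, work=lambda: None):
        self.name = name
        self.work = work      # callable executed when the task runs
        self.wait = 0         # number of unresolved dependencies
        self.unlocks = []     # tasks that depend on this one

    def add_unlock(self, other):
        """Declare that `other` may only run after this task has finished."""
        self.unlocks.append(other)
        other.wait += 1


def run_tasks(tasks, n_threads=4):
    """Run every task once, respecting dependencies; return completion order.

    Assumes the dependency graph is acyclic. The wait counters are consumed,
    so a given set of Task objects can only be run once.
    """
    cond = threading.Condition()
    ready = deque(t for t in tasks if t.wait == 0)
    remaining = len(tasks)
    order = []

    def worker():
        nonlocal remaining
        while True:
            with cond:
                # Sleep until a task is ready or all work is finished.
                while not ready and remaining > 0:
                    cond.wait()
                if remaining == 0:
                    return
                task = ready.popleft()
            task.work()  # executed outside the lock
            with cond:
                order.append(task.name)
                remaining -= 1
                # Release dependents whose last dependency just resolved.
                for t in task.unlocks:
                    t.wait -= 1
                    if t.wait == 0:
                        ready.append(t)
                cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return order


def build_example():
    """Small graph in which communication is just another kind of task."""
    density_local = Task("density_local")    # needs only local particles
    send = Task("send")                      # stand-in: ship local data out
    recv = Task("recv")                      # stand-in: remote data arrives
    density_remote = Task("density_remote")  # needs the received data
    force = Task("force")                    # needs both density results

    recv.add_unlock(density_remote)
    density_local.add_unlock(force)
    density_remote.add_unlock(force)
    return [density_local, send, recv, density_remote, force]
```

Calling \texttt{run\_tasks(build\_example())} returns the task names in completion order. The order may differ between runs, but \texttt{recv} always precedes \texttt{density\_remote}, and both density tasks always precede \texttt{force}; local work is never held up waiting for communication it does not depend on.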

To use these approaches, the code had to be rewritten from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60\% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures.
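
To put this figure in context, assume the usual definition of strong-scaling parallel efficiency relative to a baseline run; the symbols $N_0$, $N$, $T$, $E$ and $S$ are introduced here for illustration only. For a fixed problem size with wall-clock time $T(N)$ on $N$ cores,
\[
E(N) = \frac{N_0\, T(N_0)}{N\, T(N)},
\qquad
S(N) = \frac{T(N_0)}{T(N)} = E(N)\, \frac{N}{N_0} .
\]
Under this assumption, $N / N_0 = 512$ and $E > 0.6$ correspond to a speed-up of $S > 0.6 \times 512 \approx 307$ over the baseline run.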