Speculatively Exploiting Cross-Invocation Parallelism
Abstract
Automatic parallelization has shown promise in producing
scalable multi-threaded programs for multi-core architectures.
Most existing automatic techniques parallelize independent
loops and insert global synchronization between
loop invocations. For programs with many loop invocations,
frequent synchronization often becomes the performance
bottleneck. Some techniques exploit cross-invocation
parallelism to overcome this problem. Using static analysis,
they partition iterations among threads to avoid cross-thread
dependences. However, this approach may fail if
dependence pattern information is not available at compile
time. To address this limitation, this work proposes
SpecCross, the first automatic parallelization technique to
exploit cross-invocation parallelism using speculation. With
speculation, iterations from different loop invocations can
execute concurrently, and the program synchronizes only on
misspeculation. This allows SpecCross to adapt to dependence
patterns that only manifest on particular inputs at
runtime. Evaluation on eight programs shows that SpecCross
achieves a geomean speedup of 3.43× over parallel
execution without cross-invocation parallelization.
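
To make the contrast between barrier-based and speculative execution concrete, the following is a minimal sketch under stated assumptions, not the SpecCross runtime itself. It models each loop invocation as a list of array cells, one touched per iteration, and assigns iterations to threads round-robin. It then counts how many global barriers a conventional parallelization would insert and how many synchronizations a speculative scheme would need if it synchronized only when a cell is touched by different threads in different invocations. The function name `count_synchronizations`, the round-robin partitioning, and the conflict rule are all hypothetical choices made for illustration. A real system would also have to detect the conflict at runtime and recover from it, which this sequential model does not attempt.

```python
from typing import Dict, List, Tuple


def count_synchronizations(
    invocations: List[List[int]], num_threads: int
) -> Tuple[int, int]:
    """Return (barriers, speculative_syncs) for a sequence of loop invocations.

    invocations[e][i] is the array cell touched by iteration i of invocation e.
    Assumptions (illustrative only):
      * each invocation is an independent loop, so a cell is touched by at
        most one iteration per invocation;
      * iteration i always runs on thread i % num_threads.
    """
    # Conventional scheme: one global barrier between consecutive invocations.
    barriers = max(len(invocations) - 1, 0)

    speculative_syncs = 0
    # Thread that last touched each cell since the most recent synchronization.
    last_thread: Dict[int, int] = {}

    for cells in invocations:
        # A cross-thread, cross-invocation dependence exists if some cell was
        # last touched by a different thread in an earlier invocation.
        conflict = any(
            cell in last_thread and last_thread[cell] != i % num_threads
            for i, cell in enumerate(cells)
        )
        if conflict:
            # Modeled misspeculation: synchronize once, which orders all
            # earlier work before this invocation.
            speculative_syncs += 1
            last_thread.clear()
        for i, cell in enumerate(cells):
            last_thread[cell] = i % num_threads

    return barriers, speculative_syncs


if __name__ == "__main__":
    # Same access pattern every invocation: no cross-thread dependences.
    regular = [[0, 1, 2, 3]] * 3
    # The pattern changes in the third invocation, as an input-dependent
    # index array might cause at runtime.
    irregular = [[0, 1, 2, 3], [0, 1, 2, 3], [1, 0, 3, 2], [1, 0, 3, 2]]
    print(count_synchronizations(regular, num_threads=2))
    print(count_synchronizations(irregular, num_threads=2))
```

In this model the regular pattern needs no speculative synchronization, while the barrier-based scheme still pays for a barrier after every invocation. The irregular pattern synchronizes only at the invocation where the access pattern actually changes.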
