Optimizing Distributed Boundary Exchanges for Benchmarks, Solvers and Sparse Matrix Operations