Test 3 -- Allpairs

This test runs ping-pong between every pair of hosts, using only one rank per host at a time. The traffic is not all simultaneous, but does overlap somewhat otherwise the test could take a very long time to run.

Each rank is paired with self+1, then self+2, then self+3, etc. The "blocksize" is a control for how many ranks do this simultaneously. At any point in time rank x, x+blocksize, x+2*blocksize etc are initiating ping-pong operations. The default blocksize is ~sqrt(nranks).

Visually an example of the order the data is collected on an 8-host cluster is roughly as follows:

h7 |          4  3  2  1  x
h6 |       4  3  2  1  x
h5 |    4  3  2  1  x
h4 | 4  3  2  1  x
h3 | 3  2  1  x
h2 | 2  1  x              3
h1 | 1  x              3  2
h0 | x              3  2  1
---|------------------------
   | h0 h1 h2 h3 h4 h5 h6 h7
The diagonal row of "1"s for example shows the first set of data that's collected, where each rank performs ping-pong with self+1. Due to the blocksize which would be 2 in the above example, that data isn't all collected simultaneously though. Instead in each block of two ranks "h0 h1" "h2 h3" etc, one of them is initiating a pingpong first, then the other.

The purpose behind testing multiple rank-pairs between the same two hosts is to see if any of the paths between those hosts are particularly bad. The graph shows the performance of the worst path for each host pair. If the worst path is greatly worse than the average path for a particular host pair that will be noted in the job's stdout.