Some timing results for the Giltner 346 Linux cluster
Recently the network switch in Giltner 346 was changed from 10BASE-T to
100BASE-T Ethernet. Theoretically this gives a maximum data transfer
rate of 100 megabits/second (12.5 megabytes/second). I used the performance
tests that come with the MPICH implementation of MPI to run some timing tests
on the network.
The files used in testing are in ~hpf/Libraries/mpich/examples/perftest.
Here are the results I consider most important:
NOTE: These tests are preliminary. The most important tests will come from
running an actual computational program and measuring the speedup. I will
do this for all the routines that I develop this summer.
Point-to-point communication speed
The point-to-point communication speed test mpptest showed no
major difference between the different send/receive types (synchronous,
asynchronous, persistent, etc.). I also timed the connection from gauss
to each of the pc346g-xx machines and found no significant variation
in speed, which is of course good.
The communication speed was about 9.3 megabytes/second, or
about 70 megabits/second. This is about 70 percent of the maximum rate, which
sounds very reasonable for an Ethernet-based network. Here is the graph of interest:
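
The conversion between megabytes and megabits, and the resulting fraction of the nominal link rate, can be checked with a few lines of Python. This is only a sketch of the arithmetic using the figures quoted above; the helper names are made up for illustration and are not part of the MPICH test suite. The exact conversion comes out slightly higher than the rounded "about 70" quoted in the text.

```python
# Figures are taken from the text; helper names are illustrative only.
BITS_PER_BYTE = 8


def mbytes_to_mbits(mbytes_per_s):
    """Convert a rate in megabytes/second to megabits/second."""
    return mbytes_per_s * BITS_PER_BYTE


def link_efficiency(measured_mbits_per_s, nominal_mbits_per_s):
    """Fraction of the nominal link rate that was actually achieved."""
    return measured_mbits_per_s / nominal_mbits_per_s


# Measured point-to-point rate (megabytes/second) and nominal
# 100 megabit/second link rate, both as quoted in the text.
measured = mbytes_to_mbits(9.3)
efficiency = link_efficiency(measured, 100.0)

print(f"{measured:.1f} megabits/s, {efficiency:.0%} of the nominal rate")
```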
Collective communication speed
The collective communication speed tests were not nearly as impressive.
I tested both reductions (a gather-type operation) and broadcasts (a
scatter-type operation) using the program goptest. The tests showed no
significant differences between topologies, which is implausible, so I suspect
the testing program always used a tree-type communication topology.
For large broadcasts, the speed was approximately constant at about
2.5 megabytes/second, or about 20 megabits/second. This is about a third of
the maximum point-to-point speed, which seems reasonable since the slowdown
should be of order log2(np)=log2(10), or roughly 3.3. The broadcast time for
small messages (~1Kb) was approximately 0.3 ms, independent of message size,
which shows that network latency dominates for small messages.
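
As a rough check of the log2(np) argument, the expected slowdown can be compared with the ratio of the two measured rates. This is only a sketch of the estimate under the assumption that a tree-type broadcast costs about log2(np) point-to-point steps; it uses the numbers quoted above and the variable names are illustrative.

```python
import math

# Values quoted in the text.
np_procs = 10      # number of participating processes
p2p_rate = 9.3     # measured point-to-point rate, megabytes/second
bcast_rate = 2.5   # measured large-broadcast rate, megabytes/second

# Assumption: a tree-type broadcast needs about log2(np) sequential steps,
# so the effective rate should drop by roughly that factor.
expected_slowdown = math.log2(np_procs)
measured_slowdown = p2p_rate / bcast_rate

print(f"expected slowdown ~ {expected_slowdown:.2f}")
print(f"measured slowdown ~ {measured_slowdown:.2f}")
```

The two factors are of the same order, which is all the estimate claims.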
However, reductions were extremely slow compared to broadcasts, which
is puzzling. The speed of reduction was about 0.1 megabytes/second, or at
most about 0.8 megabits/second, which is a percent or so of the maximum
point-to-point speed. This was true even for small message sizes, which is
inconsistent with the behavior seen for broadcasts. I am currently trying to
track down the reasons behind this behavior with other testing programs.
Here is the graph of interest:
Scaling of collective communication speed
I also tested the scaling of reductions and broadcasts for a tree topology,
and the results were reasonable, with an approximately linear (or logarithmic
in the case of reductions) scaling of the speed with the number of
participating processors. Here is a graph showing the results, which I leave
to your interpretation:
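
For comparison with the measured scaling, here is a minimal sketch of what an idealized tree-type broadcast would predict. It assumes a binomial tree in which every process that already holds the data forwards it to one new process per round; this is an assumption for illustration, not a description of what goptest actually does, and the function name is hypothetical.

```python
def tree_broadcast_rounds(np_procs):
    """Count the rounds needed until all processes hold the data,
    assuming each holder forwards to one new process per round."""
    have = 1      # only the root holds the data initially
    rounds = 0
    while have < np_procs:
        # Every current holder sends once, so the count at most doubles.
        have = min(2 * have, np_procs)
        rounds += 1
    return rounds


# Tabulate the number of rounds for 1 to 10 participating processes.
for n in range(1, 11):
    print(f"np = {n:2d}: {tree_broadcast_rounds(n)} round(s)")
```

Under this model the number of rounds grows like the ceiling of log2(np), so the time per collective operation should grow only logarithmically with the number of processors.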
