Some timing results for the Giltner 346 Linux cluster

Recently the network switch in Giltner 346 was changed from a 10BASE-T to a 100BASE-T Ethernet connection. In theory this gives a maximum data transfer rate of 100 megabits/second (12.5 megabytes/second). I used the performance tests that come with the MPICH implementation of MPI to run some timing tests on the network. The files used in testing are in ~hpf/Libraries/mpich/examples/perftest. Here are some of the results I consider important:

NOTE: These tests are preliminary. The most important tests come from running an actual computational program and measuring the speedup. I will do this for all the routines that I develop this summer.

Point-to-point communication speed

The point-to-point communication speed test mpptest showed no major difference between the different send/receive types (synchronous, asynchronous, persistent, etc.). I also timed the network links from gauss to each of the pc346g-xx machines and found no significant variations in speed, which is of course good.
The communication speed was about 9.3 megabytes/second, or roughly 74 megabits/second. This is a little over 70 percent of the maximal rate, which sounds very reasonable for an Ethernet-based network. Here is the graph of interest:


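The unit conversion behind the figures above is easy to get wrong, so here is a minimal Python sketch of it. The function names are hypothetical, and it assumes decimal prefixes, 8 bits per byte, and a nominal link rate of 100 megabits/second, ignoring any protocol overhead.

```python
# Convert a measured throughput from megabytes/second to megabits/second
# and express it as a fraction of the nominal link rate.
# Assumptions: decimal prefixes, 8 bits per byte, no protocol overhead.

BITS_PER_BYTE = 8


def megabytes_to_megabits(mbytes_per_s: float) -> float:
    """Return the rate in megabits/second for a rate given in megabytes/second."""
    return mbytes_per_s * BITS_PER_BYTE


def link_utilization(mbytes_per_s: float, link_mbits_per_s: float = 100.0) -> float:
    """Return the measured rate as a fraction of the nominal link rate."""
    return megabytes_to_megabits(mbytes_per_s) / link_mbits_per_s


if __name__ == "__main__":
    measured = 9.3  # megabytes/second, the point-to-point figure quoted above
    print(f"{megabytes_to_megabits(measured):.1f} megabits/second")
    print(f"{link_utilization(measured):.0%} of a 100 megabit/second link")
```

For the measured 9.3 megabytes/second this gives about 74 megabits/second, in line with the roughly 70 percent of the maximal rate quoted above.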
Collective communication speed

The collective communication speed tests were not nearly as impressive. I tested both reductions (a many-to-one operation that combines data from all processes) and broadcasts (a one-to-many operation) using the program goptest. The tests showed no significant differences between different topologies, which is implausible, so I think the testing program always used a tree-type communication topology.


For large broadcasts, the speed was approximately constant at about 2.5 megabytes/second, or about 20 megabits/second. This is about a third of the maximum point-to-point speed, which seems reasonable since the decrease in speed should be of order log2(np) = log2(10), or roughly 3.3. The broadcast time for small messages (about 1 kilobyte) was approximately 0.3 ms, independent of message size, which shows that network latency dominates for small messages.
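The log2(np) estimate can be checked with a small sketch. It assumes a simple tree broadcast in which every process that already holds the message forwards the whole message to one new process per round, with each round running at the point-to-point rate. This is an illustrative model with hypothetical function names, not a description of what goptest or MPICH actually does.

```python
# Rough model of a tree broadcast: the number of processes holding the
# message doubles each round until everyone has it.
# Assumption: each round transfers the full message at the point-to-point
# rate, with no pipelining and no contention on the switch.
import math


def tree_broadcast_rounds(num_procs: int) -> int:
    """Count the rounds needed until all num_procs processes hold the message."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    holders = 1  # only the root has the message at the start
    rounds = 0
    while holders < num_procs:
        holders = min(2 * holders, num_procs)  # each holder sends to one new process
        rounds += 1
    return rounds


def estimated_broadcast_rate(p2p_rate: float, num_procs: int) -> float:
    """Estimate the effective broadcast rate as the p2p rate divided by the rounds."""
    rounds = tree_broadcast_rounds(num_procs)
    return p2p_rate if rounds == 0 else p2p_rate / rounds


if __name__ == "__main__":
    p2p = 9.3  # megabytes/second, measured point-to-point rate
    np_procs = 10
    print("rounds:", tree_broadcast_rounds(np_procs))
    print("estimate (whole rounds):", estimated_broadcast_rate(p2p, np_procs))
    print("estimate (log2):", p2p / math.log2(np_procs))
```

For 10 processes the model needs 4 rounds, giving about 2.3 megabytes/second, while dividing by log2(10) gives about 2.8 megabytes/second. The measured 2.5 megabytes/second falls between the two estimates.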


However, reductions were extremely slow compared to broadcasts, which is puzzling. The speed of reduction was about 0.1 megabytes/second, or at most about 0.8 megabits/second, which is a percent or so of the maximum point-to-point speed. This was true even for small message sizes, which is inconsistent with the behavior seen for broadcasts. I am currently trying to find the reasons behind this behavior with other testing programs.
Here is the graph of interest:


Scaling of collective communication speed

I also tested the scaling of reductions and broadcasts for a tree topology, and the results were reasonable: the speed scaled approximately linearly (or logarithmically, in the case of reductions) with the number of participating processors. Here is a graph showing the results; I leave the interpretation to you: