MPICH implementation of MPI
MPI is the standard message-passing interface for high-performance computing.
I have installed and successfully tested the most widely used public-domain
implementation, Argonne's MPICH. You can learn more about the library from
the above linked websites. For the purposes of our Linux cluster in Giltner
346, the most important decision to make when using MPICH is which
communications device to use for job startup.
I have installed two devices, chp4 and chp4_mpd:
- The chp4 device uses remote login via ssh to start the jobs on the PCs,
with a special chp4_servs daemon to speed up the process. This device is
very stable and requires little work, but it is very slow in starting the
jobs. I recommend you don't use it unless you have a reason. To initialize
the jobs, first check whether all the machines are responding:
    tstmachines -v
If that goes well, start the chp4 secure servers by typing
mpistart_ch_p4.
Compiling programs with this device is done by:
    lf95 $RM346 -o $HPF_EXE/file.exe file.f90 $HPF_MPI_ch_p4
where, as usual, the capitalized option HPF_MPI_ch_p4 links the shared
libraries:
    HPF_MPI_ch_p4 = "-I$HPF_MODULES/MPI -L/usr/i386-redhat-linux/lib -L/usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66 -lc -lgcc -lMPICH_ch_p4 -lmpichf -lMPICHF90"
There is a corresponding statically linked option with lowercase letters,
HPF_mpi_ch_p4, but use it only if you must, since the executables will be
huge and job startup will take longer. The shared libraries work well with
this device because the initialization script .tcshrc in /home/hpf defines
the LD_LIBRARY_PATH variable to point to the right directories, and this
file is read even at remote logins via ssh.
Jobs compiled with this device can be run via:
    mpirun_ch_p4 -np number_of_processors executable.exe options_to_executable
A very important thing to notice is that this will always have gauss as
node 0 in the MPI context!
The program that is supposed to shut down the servers on the PCs,
mpiend_ch_p4, does not work in this release of MPICH.
- The chp4_mpd device is the preferred one to use, but it is still
experimental and has some problems, which is why I decided to keep the old
chp4 device as well. Documentation for the device can be found in the MPICH
user guide. In short, mpd is a daemon, like the ftp, telnet, or ssh daemons,
that is used specifically for starting MPI jobs. There is therefore no
remote login when starting jobs, so job startup takes less than a second.
The device starts the mpd daemon on all the nodes via ssh and connects the
daemons in an mpd ring. This is done either via the shortcut command
    mpistart_mpd_gauss number_of_PCs
if you want gauss to be included in the ring as node 0 (NOTE: the number of
PCs does not include gauss!), or via
    mpistart_mpd number_of_PCs
if you want only the PCs to be in the ring (PC #10 will be node 0).
After the daemons are started, check them by executing the command
mpdtrace, which prints all the members of the ring with their left and
right neighbours. Also run
    mpdringtest 10
to test the ring connectivity. The wonderful thing about this device is
that it propagates signals to all the PCs, so you can now suspend a
parallel job simply by pressing Ctrl-Z! This is still experimental stuff,
though.
One of the problems with this device is that the ring sometimes gets
disconnected when you mess up (or when someone sits down at a PC and logs
into NT!!!), and then you have to kill the daemons manually via killmpds
and start again.
Compiling programs with this device is done by:
    lf95 $RM346 -o $HPF_EXE/file.exe file.f90 $HPF_MPI_mpd
where, as usual, the capitalized option HPF_MPI_mpd links the shared
libraries, and the corresponding lowercase HPF_mpi_mpd links static
libraries (not recommended).
Note that this device is the default, so you can just omit the _mpd part.
If you use both devices, though, it's wise to indicate which one you mean.
The two devices are not compatible with each other.
To run a job with shared libraries and this device, you must propagate the
LD_LIBRARY_PATH manually via:
    mpirun_mpd -np number_of_processors executable.exe options $SLIB
where SLIB contains the proper commands and is defined, as usual, in ~/.HPF.
After you are done, you can kill the mpds via mpiend_mpd.
NOTE: The mpd device is under active development, and a new release with
many nifty features is coming.
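The mpd workflow described above (start the ring, check it, run the job, shut the daemons down) can be collected into a small wrapper. The following is only a minimal sketch, assuming a POSIX shell script kept separate from the tcsh setup, that the site shortcuts mpistart_mpd_gauss, mpdtrace, mpdringtest, mpirun_mpd and mpiend_mpd are on the PATH, and that SLIB has been exported to the environment. The names run, mpd_session and DRY_RUN are hypothetical and are not part of MPICH or of the local setup; by default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of one mpd session using only the commands named in the text.
# DRY_RUN is a hypothetical switch: 1 (default) prints commands, 0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Print the command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: mpd_session NUMBER_OF_PCS NPROCS EXECUTABLE [OPTIONS...]
mpd_session() {
    npcs=$1
    nprocs=$2
    exe=$3
    shift 3

    # Start the ring with gauss as node 0 (NUMBER_OF_PCS excludes gauss).
    run mpistart_mpd_gauss "$npcs" || return 1
    # List the ring members, then test ring connectivity.
    run mpdtrace || return 1
    run mpdringtest 10 || return 1

    # $SLIB (assumed exported from ~/.HPF) propagates LD_LIBRARY_PATH;
    # it is left unquoted on purpose so it can expand to several words.
    run mpirun_mpd -np "$nprocs" "$exe" "$@" ${SLIB:-}
    status=$?

    # Shut the daemons down even if the job itself failed.
    run mpiend_mpd
    return "$status"
}
```

For instance, calling mpd_session 4 5 ./pi.exe (pi.exe being a placeholder name) lists the five commands for a ring of four PCs plus gauss; setting DRY_RUN=0 would execute them instead. If one of the ring checks fails, the function stops early and the daemons still have to be cleaned up by hand with killmpds, as noted above.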
A simple example that calculates pi using MPI can be found here in Fortran
77 or here in Fortran 90. Learning how to program with MPI takes a lot of
time, but the points above about job startup and compilation are very
important even when using the HPF compiler Adaptor.
I should point out that the chp4_mpd device has problems with flushing
the input/output under Fortran, so you should put a
    CALL FLUSH(6)
after writing to the I/O buffer for stdout.
A final point of great interest in message-passing programs is profiling
tools. Since I have not yet run any big programs that needed profiling
or extensive debugging, I have not looked into these yet (some, like nupshot,
come with MPICH, others are free on the internet, and yet others are
commercial). Adaptor only has an option to generate traces for the
commercial Vampir tool.
The Adaptor compiler is the heart of my computational venture on the
Giltner 346 cluster. It is a public-domain HPF compiler that translates HPF
programs into SPMD programs that use the Adaptor DALIB library and MPI. The
compiler is not perfect and focuses too much on irregular computations (the
research interests of its authors); however, it is still a wonderful tool
that has lots to offer at very satisfactory performance. I don't have time
to comment on the compiler here, and lots of good documentation can be
found at the Adaptor homepage. Support is weak, though, since its authors
are busy people who don't like answering questions.
HPF programs are most easily compiled with the driver gmdhpf, which I
renamed to just hpf, with the options "-v -keep" included. This driver
first invokes the translator fadapt to translate the HPF program into
Fortran 90 with calls to DALIB and MPI. It then compiles this file using
lf95 and finally links it together with all the libraries, again using lf95
(which in turn invokes the Linux ld). The default options for this process
are set in either ~/.HPF or ~/Adaptor/.adprc. If you want to override
these, you can do the translation/compilation steps separately using the
many shortcuts declared in ~/.HPF.
The usual invocation is:
    hpf -o gmdhpf_options executable source.hpf -Wa"options_for_fadapt" -Wf"options_for_lf95" -Wl"linker_options"
You can look at the intermediate translation file source.f to see whether
the file was translated as you expected. Although this is a tough job, it
is an essential part of dealing with the many quirks HPF compilers have.
The remaining main options that one is likely to use for gmdhpf are:
| Option | Purpose |
|--------|---------|
| -c | Compile only |
| -1 | Compile specifically for a single-processor run (no parallelization added) |
The main options for the translator fadapt are:
| Option | Purpose |
|--------|---------|
| -HPF_SUBSET, -HPF_BASE | Set the language to restricted HPF versions. The default is -HPF_ADP |
| -G | Keep all intermediate files for debugging purposes |
| -w, -noinfo | Suppress warning and info messages |
| -I<directory> | Specify the directory(ies) where module .ext files and includes are. Use : to separate multiple directories. |
| -interface, -call | Generate a .h interface file or a call graph |
| -safety <0\|1\|2>, -nostrict | Runtime and compile-time argument checking (on gauss, safety is set to 0 by default!) |
| -sp, -dp | Default precision is single or double |
| -[no]auto | Enable/disable automatic parallelization |
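Options from the table above reach fadapt through the driver's -Wa"..." pass-through. The sketch below shows one way to wrap that, assuming a POSIX shell and that hpf is reachable as an ordinary command (if it is only a tcsh alias, the script would need the underlying gmdhpf call instead). The names hpf_build, run and DRY_RUN are hypothetical, and the argument order simply follows the usual invocation shown earlier with no extra driver options.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch: 1 (default) prints commands, 0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Print the command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: hpf_build SOURCE.hpf EXECUTABLE ["FADAPT OPTIONS"]
hpf_build() {
    src=$1
    exe=$2
    fadapt_opts=${3:-}

    if [ -n "$fadapt_opts" ]; then
        # -Wa and its option string travel as ONE argument, as in -Wa"...".
        run hpf -o "$exe" "$src" "-Wa$fadapt_opts"
    else
        run hpf -o "$exe" "$src"
    fi
}
```

As an illustration, hpf_build pi.hpf pi.exe "-safety 2 -dp" (file names are placeholders) requests full argument checking and double precision from the translator. Note that the dry-run printout shows the -Wa argument without its surrounding quotes, so it is a trace of what would run rather than a line to paste back into a shell.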
The executables made by hpf are MPI chp4_mpd shared-library executables and
are thus executed via:
    mpirun -np number_of_processors executable.exe runtime_options $SLIB
The interesting runtime options for the Adaptor executables are:
| Option | Purpose |
|--------|---------|
| n1xn2xn3... | Run the executable on the given processor topology shape; on PC clusters it's best to use the default linear topology |
| -call | Print calling statistics |
| -redist | Print redistribution statistics (with safety set to 2 in fadapt) |
| -comm | Print communication statistics |
| -time | Print timing statistics |
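To make the runtime options concrete, here is a minimal sketch of a launcher that appends statistics flags from the table to the mpirun line given earlier. It assumes a POSIX shell, an mpd ring that is already running, and SLIB exported to the environment; adaptor_run, run and DRY_RUN are hypothetical names, and by default the command is only printed.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch: 1 (default) prints commands, 0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Print the command line in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: adaptor_run NPROCS EXECUTABLE [RUNTIME_OPTIONS...]
adaptor_run() {
    nprocs=$1
    exe=$2
    shift 2
    # Runtime options go after the executable; $SLIB comes last and is
    # left unquoted so that it can expand to several words.
    run mpirun -np "$nprocs" "$exe" "$@" ${SLIB:-}
}
```

A call such as adaptor_run 4 ./pi.exe -comm -time (pi.exe is a placeholder) asks for communication and timing statistics on four processors. Remember that -redist only reports anything if the program was translated with safety set to 2 in fadapt.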
If you are interested in learning more about HPF and this compiler,
please let me know, and I will try to organize a small lab in Giltner in
which we can try different things together. As I get more experienced with
Adaptor, I will post more information on these webpages. For now, take a
look at the example for calculating pi recast in a form suitable for
Adaptor.
The Basic Linear Algebra Communication Subprograms (BLACS) are a collection
of routines that are meant to replace MPI in scientific codes. They are
built on MPI and ultimately use it for the communication, but they are
quite a lot easier to use, since you pass arrays between the processors
instead of messages. Bookkeeping chores like message IDs are also partially
gone.
I have tested this library only superficially, and it seemed to work. Its
integration into HPF programs is difficult, though, and I may work on making
a suitable interface if it turns out I need BLACS for my project. A related
library is PBLAS, the parallel version of BLAS, but it, like other
interesting libraries within the ScaLAPACK project, is even more difficult
to use inside HPF programs. For now, take a look at a trivial example that
uses BLACS. The file is compiled using the usual syntax:
    lf95 $RM346 -o $HPF_EXE/file.exe file.f $HPF_BLACS
NOTE: More parallel libraries are coming soon, and by now you get the gist
of how I set up the system. For now, though, I am turning my focus to
Adaptor and HPF.