Convex Network Optimization

Implementation in Adaptor High Performance Fortran
Aleksandar Donev, working with Dr. Phil Duxbury

Graphs and Networks

A graph G=(V, E) can be thought of as a collection of vertices (here-after nodes) V={i, i=1..n}, where n=|V| is the number of nodes, connected via directed or undirected edges (arcs), E={e(i,j) | i, j in V}, where m=|E| is the number of arcs in the graph and e(i,j) denotes an arc that connects node i (its head) and node is j (tail). For our purposes, a network N=(G, C) is a graph G with a real-valued cost functional C={c(i, j)} associated with the edges E. In particular, let c(i, j)(f(i,j)) be the cost incurred when flow f(i,j) is pushed through the arc e(i,j).

A few examples of directed and undirected graphs
Fig 1. A few examples of directed and undirected graphs

A textbook example is a collection of cities with a collection of roads connecting them, in which case the cost would be associated with the maintainance of roads/pollution etc. (notice that the cost becomes infinite if the capacity of an arc is exceeded--a huge trafic jam!). Normally, in most situations we want to minimize the cost, so we next look at:

Min-cost Network Flow Problem

Let us consider a network with given cost functions on each arc and try to maintain a flow throgh it with certain properties, as our needs dictate, under minimal cost. A crucial property of such network optimization problems is that they are constrained because of conservation of flow at each node--the same amount of flow should be incident in and out of all nodes,
Di=Se(i,j)f(i,j)-Se(j,i)f(i,j)+bi=0
where bi is the supply (if positive) or demand (negative) of node i. We can express this in matrix form by compressing the flows for all arcs and supplies/demands for all nodes into vectors f and b respectively, Af=b, where A is the node-arc incidence matrix of the graph, which has +1 in row i and -1 in row j for the column corresponding to arc e(i,j).
Every flow vector f that satisfies conservation of flow is called feasible. The convex separable min-cost network flow problem is a classical optimization problem which consists of finding the flow f for which the total cost g(f)=Se(j,i)c(i,j)(f(i,j)) is minimal over all feasible flows. Also, the cost is a convex (likely non-linear) function of the flow:
min g(f) | Af=b		(1)
This problem is the subject of this long-term research project.

Disordered Physical Systems Modeled as Networks

Consider a porous material, such as a rock above an oil well, through which we want to push fluid--create flow from a source S to a sink T by puting the fluid under preasure--applying potential. We can model the maze of channels in the porous material and their intersections as a network. In most physics research, we use grid-like networks, usually randomly diluted hypercube nearest-neighbour lattices to model bulk materials:
A two dimensional grid-based network
Fig 1. A two dimensional grid-based network

A similar situation appears when we study superconducting materials made of grains, in which case the potential is the applied voltage, and the current is the created flow. Both a narrow channel in a porous material and a superconducting grain exhibit flow-potential characteristics which have a critical potential or flow around which highly non-linear transitions happen:

Highly non-linear potential-flow characteristics of certain conducting elements
Fig 2. Highly non-linear potential-flow characteristics of certain conducting elements

A fundamental characteristic of many real physical systems in CMP is that they are disordered--each channel in the porous material or grain in the superconductor has randomly distributed properties, with well defined statistical (thermodynamical) average distributions. In order to study such systems numerically, very large samples need to be simulated and complex calculations carried out. The aim of this work is to develop parallel computational tools for convex network optimization problems, with special emphasis on the min-cost flow problem. The work is not finished yet, but some promising preliminary investigations have been done.

Numerical Algorithm

Mathematicians would recognize eq. 1 as a convex mathematical programming problem. But the special underlying network structure of the problem makes it possible to solve it much more efficiently than in the ordinary case, for networks with the size of several million sites or more, depending on the available memory.
Well known algorithms exist for the cases when the cost functional is piecewise linear and when the flow is integer; The general case of real valued flows and costs and strongly non-linear cost functionals has been studyied extensively, but there is no "optimal" general-purpose algorithm. The convex min-cost network optimization comes under the umbrella of the field of convex network optimization.

We decided to implement an algorithm known as the Truncated Dual Newton Method. It uses the method of Lagrange multipliers and convex-function theory. At its computational core lies the solution of the following Newton system of linear equations in x:
Hx=-G  --->  (AH*AT)x=-(ATG*-b)		(2)
where as before A is the node-arc incidence matrix of the network, which contains in it all the information about the connectivity of the network, and H* and G* are the so called conjugate Hessian and gradient and contain all the information about the cost function. A more formal description can be found in this applied mathematics writeup.

Computational Algorithm, Implementation and Scaling Properties

The implementation of the algorithm is coded in an extension of Fortran for parallel computers known as High Performance Fortran, using the public domain compiler Adaptor. The parallel hardware is a Linux Beowulf cluster of 11 Pentium II 350 MHz PC's in Giltner Hall 346. I have set up an extensive set of parallel programming tools and libraries on the cluster in the past year or so, and this summer was my first chance to try and use these in a really interesting and important project.
When solving the network optimization problem on this cluster, we distribute the network over several processors and let each processor handle its portion, ocasionally performing communication between the processors using the Ethernet switch that connects the cluster. Details of how this is done go beyond the scope of this presentation, but the basic idea should be clear? I will use coloring to denote ownership of the data on certain processors:
The initial graph connectivity and distribution
Fig 3. Initial labeling and distribution of the graph.
The example was executed on 3 processors and was a 2D grid of size 7x7 with 70% of the arcs present

The first step in the algorithm is to generate the above network and randomly remove some of the edges with a given dilution probability d. But because the dilution is random some of the nodes/edges will be left unconnected to other nodes. Obviously, flow circulation through such arcs is not related to the flow through other arcs in the network. Therefore, the next step is to extract a single connected component of the graph and use this as our network. This component will be what mathematicians call a pure network.
Extraction of the connected components in parallel is done with an algorithm known as a connected-component labeling algorithm. I have implemented one such algorithm successfully this summer. Take a look at the detailed overview for details. For the example in figure 3, it will extract the following graph to be our network:
The graph labeling after the connected component extraction
Fig 4. The graph labeling after the connected component extraction

Further complications appear when the lattice is very diluted (more formally, close to the percolation point), in which case there may be many dangling arcs and dangling loops. Here is an actual network with a source and sink plate at the percolation point:
Close to the percolation point there are many dangling components
Fig 5. Dangling components close to the percolation point
The source and sink plates (left-right) are fully connected and shown and two backbone flow carrying paths between them and the dangling components are highlighted

A good algorithm would first remove such dangling components. Identifying these is not an easy problem, though it has been well studied by computer scientists. It can be reduced to a depth-first search or the problem of identifying cycles and bypartite matchings in graphs. I may work on implementing this step if time permits.

Several other graph algorithms can be applyed at this stage to help us in selecting what amount of flow to push from S to T (for constant-current ST sources) or what potential to apply between S and T (for constant-potential ST plates). These are the max-flow algorithm and the shortest-path algorithms and apply to the case of a superconductor and a porous material respectively. Do you want to hear more (or better yet, do I have any time left?).

After the ST connected component has been extracted and the currents/voltages set, we can use the Dual Netwton Method to find the min-cost flow. This has not been implemented in full yet. However, the computationally intensive portion of it (have you heard of the 90/10% rule in scientific programming) is solving the system:
(AH*AT)x=-(ATG*-b)		(2)
We decided to use Preconditioned Conjugate Gradient to do this. The main portion of this iterative procedure is repeated multiplication of a vector with the matrix AH*AT, which has been implemented very efficietly using the Adaptor HPF_HALO_LIBRARY.The non-preconditioned CG algorithm has also been implemented in a preliminary version and the slides to follow will illustrate its behaviour and scaling computational properties. Here is a plot illustrating the convergence behaviour of CG for this coefficient matrix:
Convergence of CG for the coefficient matrix <I><TT>AH<SUP>*</SUP>A<SUP>T</SUP></TT></I>
Fig 5. Convergence of CG for the coefficient matrix AH*AT
An almost positive-semidefinite and a nice positive-definite case are shown. Diagonal preconditioning seems to help some in the former case, but not much in general.


Untill a separate web-page is developed, here are some timing and scaling results for the parallel conjugate-gradient HPF implementation. Comments will be given later on:
  1. In 2D:
  2. In 3D:
Much needs to be done though to reduce the overhead of HPF synchronization points, implement CG variations that cluster inner products and most of all implement efficient preconditioning techniques. Me and Dr. Duxbury are in the process of considering and possibly implementing a novel preconditioning routine based on minimal spanning trees of the graph. In fact, the graph algorithm which we are looking at is well studied in the computer science community as the problem of updating a minimal spanning forest combined with network simplex price/flow computations on generalized networks. Parallel implementation will be difficult, but the challenge makes it all the more interesting.