Connected Components of a Graph

Implementation in Adaptor High Performance Fortran
Aleksandar Donev, working under Dr. Phil Duxbury

Algorithm Description

Problem Formulation

A graph G=(N, E) can be thought of as a collection of points (here-after nodes) N={i, i=1..n}, where n=|N| is the number of nodes, connected via directed or undirected links (arcs), E={e(i, j) for some i and j in N}, where m=|E| is the number of arcs in the graph and e(i, j) denotes an arc whose head is i and tail is j. We can this represent a graph efficiently by specifying n and m and giving an array HT(m,2) giving the heads and tails of each arc:
	HT(e(i, j), :)=[i, j]
For example, assume that we have randomly generated the following graph:

Distribution of the graph

We will use coloring to denote ownership of the data on certain processors. The example from figure 1 was executed on 3 processors and was a 2D grid of size 7x7 with 70% of the arcs present. Processors P1-P3 will have red, green and blue color correspondingly, and the color of each node/arc will denote the processor that is its owner. Once again, here is the graph that I will use as an example throughout this section:

[hpf@gauss hpf-2DNet]$ mpirun -np 3 Test2D-Debug.x $SLIB
 Enter the length L and the dilution:
 7,.7
 Enter the number of multiplications and connected components labelings:
 0,1
# Here are node_lbound and node_ubound for different processors:
 The distribution of nodes among processors is:
   Procs:    1   2   3
   From :    1  18  35
   To   :   17  34  49
 Initial connectivity parenthood: 
The initial graph connectivity and distribution
Fig 5. Initial labeling of the graph

The connected components labeling algorithm consists of assigning each node i a label c(i) such that two nodes have the same label if and only if there is a path in the graph connecting the two nodes. Our purpose at the end will be to select only for those arcs and nodes that belong to a selected (usually the one containing the source or sink in network optimization problems) connected component. Mathematically, we want to extract a single (large) connected component C from the diluted graph:
G <-- ({e(i, j), c(i)=c(j)=C}, {i, c(i)=C})
For our specific example, the program will produce the output:
Final extracted largest connected component
Fig 2. Final extracted largest connected component

Parallel PRAM Algorithms for Connected-Components Labeling

The algorithms build the labeling of the graph by building a spanning forest of the graph. Namely, each connected component will be represented via a tree in the forest, with one node chosen as the root of the tree. To convert this spanning forest into a proper labeling, we simply collapse all the trees into stars---trees of height 1. See figure 3 for an illustration.
Tree collapse transforms trees into stars
Fig 3. A tree collapses into a star--TreeCollapse()

We represent the trees in the forest in the usuall way, via parenthood relations. So, let us make the array p (n), which stores the parent of each node, and p (node) = node for root nodes node. Thus tree collapse can be represented via:
Procedure TreeCollapse (N)
	For all nodes i in N do in parallel
            	p (i) <-- FindRoot (i) 
	End do
End procedure TreeCollapse
where we find the root of the tree a node belongs to by traversing the tree upward until a root node is found:
Procedure child = FindRoot (i)
	child = i		
	# Concurrent read:
	Do while ( p (child) != child )	child <-- p (child)                
End procedure FindRoot
Finding the root FindRoot (node) has a cost O(H) , where H is the depth of node in the tree. Therefore, it is beneficial to have trees of small height (i.e. almost stars).
Now, let us assume that two trees need to be merged together into a bigger tree because there is an arc e(i, j) connecting one of the nodes i from the first tree to another node j in the second tree. This can be done easily by hooking one of the roots to any node in the other tree. Again, to save on cost and minimize the height of the parenthood trees, it is best to hook one of the roots to the other root. See figure 4 for an illustration.
Two trees merge together
Fig 4. Hooking two trees together--MergeTrees(e(4, 9))

There are various ways to minimize the height of the trees formed by tree merging. The simplest one to understand (though not most memory efficient) is to always hook a shorter tree to a taller tree (this gurantees that no tree will have height larger than ln2(n)). For this purpose we make another array, h (n), and store in it the height of the tree underneath a given node (so leaf nodes in the tree have minimal height 0, while roots have the maximal height label). When a shorter tree merges with a bigger tree the heights of the two trees do not change. When two trees of equal height merge, the taller tree gets higher by 1 unit.
So we have the following tree merging procedure:
Procedure MergeTrees (e(i, j))
	r{i} = FindRoot (i)
	r{j} = FindRoot (j)
	# Find the taller tree and make it a parent of the shorter child:
	If ( h (r{i}) > h (r{j}) ) then 
		parent <-- r{i}  ;  child <-- r{j}		
	Else if ( h (r{i}) < h (r{j}) ) then
		parent <-- r{j}  ;  child <-- r{i}
	Else
		# In case of a tie, we can chose the smaller label to be the parent:
		parent <-- min (i, j)  ;  child <-- max (i, j)
		# Concurrent read and write:
		h (parent) <-- h (parent) + 1		
	End if
	# Now merge the trees (concurrent write):
	p (child) <-- parent               
End procedure MergeTrees
To shorten the trees in the spanning forest one can periodically perform pointer jumping, which simply means that we rehook a node higher up in its parenthood tree:
Procedure PointerJumping (N) 	
	For all nodes i in N do in parallel	
		Concurrent read and write:
		p (i) <-- p (p (i))                
	End do
End procedure PointerJumping
Edge condensation is a process of passing arcs higher up the parenthood tree, useful when some of the nodes become inactive and their arcs need to be passed to other nodes:
Procedure EdgeCondensation (E)	
	For all arcs e(i, j) in E do in parallel	
		Concurrent read:
		e(i, j) <-- e(p (i), p (j))               
	End do
End procedure EdgeCondensation
Now we can summarize an efficient parallel algorithm for computing the connected components of a graph in parallel that accepts a graph G=(N, E) as input:
Procedure BuildConnectedComponents (G)
	# Initially let each node be its own parent and mark all arcs as alive:
	For all nodes i in N do in parallel
		p (i) <-- i
	Mark all arcs as alive
	Mark all nodes as non-leaf
	Repeat until there are no alive arcs
		For all alive arcs e(i, j) in E do in parallel
			If certain conditions are true
				Merge the trees of nodes i and j
				Kill arc e(i,j)
			End if
		End for
		Optionally
			Mark leaf nodes and condense their arcs			
			Perform pointer jumping on non-leaf nodes
			and / or
			Condense the alive arcs 
			and / or 
			Remove duplicate alive arcs
		End optionally
	End repeat
	TreeCollapse () # Not neccessary if invoked above
End procedure BuildConnectedComponents
The main step in any algorithm is MergeTrees. If executed for all arcs, it will guarantee a correct connected component labeling so long as trees are merged acyclically (acyclicity of the parenthood relations is required in any tree). This can be guaranteed by using some acyclic property when choosing which tree merges with which, such as comparing the heights of the two trees or comparing the values of the labels of the roots of the two trees.

There are two main approaches to the above algorithm.
  1. In the first approach, if there are no additional certain conditions imposed on the merging, then all arcs will be used and killed in the first iteration. However, this requires traversing the trees to find their roots, which is a logarithmic operation O (ln (n)). The total running time of such an approach is O (m ln2 (n)).
  2. List Item Two In the second approach, trees are not always merged. The most popular choice is to only merge stars to other trees, or merging only stars and performing full tree collapse on each iteration so that all trees are either isolated vertices or stars. In this approach, each iteration in to the Repeat loop is of cost O (m), but then there are likely to be O (ln (n)) iterations until all arcs are dead. So the cost is still a logarithmic factor worse than the best serial algorithms for sparse graphs--O (n).
We will employ the first approach because of its simplicity:
Procedure BuildConnectedComponents-PRAM (G)
	# Initially let each node be its own parent:
	For all nodes i in N do in parallel
		p (i) <-- i
	For all arcs e(i, j) in E do in parallel
		MergeTrees (e(i, j))
		Optionally perform some kind of pointer jumping on i and j
	End for
	TreeCollapse ()
End procedure BuildConnectedComponents-PRAM
In the initial implementation we will not use pointer jumping. This is because the height property of trees can not be maintained under such operations and a O (m ln2 (n)) bound can not be guaranteed for some pathological graphs.

A Simple Distributed Relaxation Connected-Components Algorithm

The above algorithm were well suited for PRAM CRCW (Parallel Random Access Memory with Concurrent Read and Concurrent Write capability) models. This model can not be directly implemented even on most of todays SMP's, because of the concurrent read & write capability. HPF is not a suitable language for implementing such runtime communication and/or job distribution irregularities. So we need to look at another distributed-memory connected-component labeling algorithm.
The simplest kind of algorithm is a relaxation algorithm known as local diffusion [4]. Instead of the previous parenthood label p (node) we will use the label c (node) just to indicate that there are no parenthood trees being created but a labeling of the nodes is built right away. The algorithm can be expressed a few lines:
Procedure BuildConnectedComponents-LocalDiffusion (G) 
	# Initially let each node be its own parent:
	For all nodes i in N do in parallel
		p (i) <-- i
	# Local diffusion relaxation:
	Repeat until convergence to a stationary point
		For all nodes i in N do in parallel	
			c (i) <-- min (c (i), { c (j) for all e (i, j) in E} )
		End for
	End repeat
End procedure BuildConnectedComponents-LocalDiffusion
In words, set the label of each node to the minimum of its own label and all of its neighbouring nodes, and iterate until convergence to a stationary labeling of the nodes. We can easily implement the above algorithm using the heads & tails arrays and avoiding the use of a temporary by speeding up the relaxation:
Procedure BuildConnectedComponents-Relax (G) 
	For all nodes i in N do in parallel
		p (i) <-- i
	Repeat until convergence to a stationary point
		# Relaxation step:
		For all arcs e(i, j) in E do in parallel
			c (i), c (j) <-- min ( c (i), c (j) )
		End for
	End repeat
End procedure BuildConnectedComponents-Relax
The relaxation step has to be repeated as many times as the (chemical) length of the biggest percolation cluster in the network, which can be of the order of the size of the system [4]. So this algorithm is not very efficient. Figure 5 shows the absolute magnitude of the label mismatch M across all arcs during the course of the relaxation for two points, one above, one at, and one below the percolation point:
# Total mismatch over all arcs:
M <-- Sum of |c (i) - c (j)| over all arcs e(i, j)
Convergence behaviour of the local diffusion relaxation algorithm
Fig 4. Convergence behaviour of the local diffusion relaxation algorithm
The grid graph was 2D of size [ 1000 x 1000 ]

A Mixed Approach

The optimal distributed-memory connected-components algorithm on most machines will consist of two phases: In summary, here is the full distributed connected-components algorithm:
Procedure BuildConnectedComponents-Distributed (G)
	# Select the non-remote arcs (Pi is the processor owning node i)
	E(l) <-- {e(i, j) | (Pi = Pj)} 	
	# Local phase:	
	BuildConnectedComponents-PRAM(G(l)=(E(l), N))
	# Mark the nodes that are not parents (in TreeCollapse()):
	N(np) <-- {i | p (i) != i} 
	# Select only the remote arcs:
	E(g) <-- E \ E(l) 
	# Condense the remote arcs:
	EdgeCondensation (E(g))
	Remove duplicate entries from (E(g))
	# Global phase
	BuildConnectedComponents-Relax ({G(g)=(E(g), N))
	# Update of global labels:
	PointerJumping (N(np))
End procedure BuildConnectedComponents-Distributed