Tarjan's Strongly Connected Components Algorithm
Files: sc.c, sc.h
If a set of vertices within a graph are "strongly connected", then their is a
path between any vertex pair (vs, vf) where vs is any starting vertex and vf
is any finishing vertex. That is, each vertex is reachable from every other
vertex. Tarjan's strongly connected (SC) component algorithm determines which
vertices in a directed graph are strongly connected. The algorithm proceeds in
a depth first search manner. It is described below:
SC(v)
{
N[v] = c; /* Mark v visited by assigning it a visit number. */
L[v] = c; /* Low-link initially equal to visit number. */
c++;
push v onto the stack;
for each w in OUT(v) {
if N[w] == UNDEFINED { /* N[w] == UNDEFINED means w is unvisited. */
Add edge (v, w) to T;
DFS(w);
L[v] = min(L[v], L[w]); /* Low-link number can propagate upward.
*/
}
else if w is on the stack {
L[v] = min(L[v], N[w]);
}
}
/* Check if SC component found. */
if L[v] == N[v] {
pop vertices off stack down to v;
/* These vertices make up an SC component. */
}
}
main_program {
T := EMPTY; /* T is the tree made up of traversed edges. */
c : = 0; /* c is the counter for visit numbers. */
for each vertex, v, in the graph,
N[v] = UNDEFINED /* Mark v "unvisited". */
SC(v0); /* v0 is the starting vertex. */
}
As the DFS proceeds it assigns a DFS (or visit) number to each vertex as it is
visited. The algorithm also maintains a low-link number, which is initially
assigned the same value as the visit number when a vertex is visited for the
first time. A stack of vertices is maintained, and when a vertex is visited
for the first time it is placed on the stack. We describe Tarjan's algorithm
by considering the imaginary DFS tree that is built up as the algorithm
proceeds.
As the DFS tree is built vertices, v, are assigned visit numbers N[v] and
low-link numbers L[v]. There are two ways that it is possible for the low-link
number to be updated as DFS proceeds. Suppose the algorithm is at vertex v in
the graph. The first kind of update is when the algorithm follows an edge to a
vertex w in OUT(v) that was previously visited and is still in the stack. In
this case, L[v] is updated to the smaller of L[v] and N[w]. The second kind of
update is for L[w] to propagate up the DFS tree and overwrite L[v], since after
the DFS from w is completed L[v] is updated to the smaller of L[v] and L[w].
After the DFS from a vertex, v, is complete, the algorithm checks to see if
L[v] and N[v] are still equal (recall that the low-link and visit numbers were
initially the same). If L[v] is equal to N[v] then vertices are popped off the
stack, down to and including vertex v. A set of vertices popped off the stack
together make up an SC component. After the DFS terminates at the starting
vertex, all SC components have been found.
If the the SC components are condensed into single vertices with edges between
them, we have an acyclic graph. A side-effect of Tarjan's SC component
algorithm is that the SC components pop off the stack in topological order.
----------
In the source files sc.c and sc.h, SC components are represented using arrays
of vertex numbers. The structure type for SC component results is given below:
typedef struct sc_result {
int size, n_sets;
int *vertices;
int *sets_s, *sets_f;
} sc_result_t;
Field sets_n stores the number of SC components found in
the search. Field size specifies the array sizes. The SC components are
stored using three arrays:
- sets_s[i] gives the starting position in the vertices[] array of SC
component i.
- sets_f[i] gives the finishing position in the vertices[] array of SC
component i.
- vertices[] is used for storing the vertex numbers of vertices in the SC
components.
For example if there are three SC components the vertices in each are stored as
follows:
- SC0: vertices[sets_s[0]] ... vertices[sets_f[0]].
- SC1: vertices[sets_s[1]] ... vertices[sets_f[1]].
- SC2: vertices[sets_s[2]] ... vertices[sets_f[2]].
Note that array entries sets[3] onward are set to UNDEFINED.