
Intro to Graph Algorithms

Introduction

A graph is an abstraction representing some objects and the connections between pairs of objects. In its simplest form, a graph G = (V, E) has just two pieces:

- A set V of vertices (also called nodes), representing the objects.
- A set E of edges, where each edge is a pair of vertices representing a connection between them.

More generally, a graph contains information about what pairs of objects are related to each other in some way.

Let us first define some terminology that will make talking about graphs less cumbersome. If G has an edge e = (u, v), we say u and v are adjacent. We also say u is a neighbor of v and v is a neighbor of u. The edge e is also referred to as being incident on u (and v, as well).

The degree of a node v (denoted deg(v)) is the number of other nodes adjacent to it, or in other words, the number of neighbors it has. Equivalently, deg(v) is the number of edges incident on v.

We define a path as a sequence of nodes such that any two consecutive nodes in the sequence are adjacent in G. Two nodes u and v are then reachable from each other if there is a path that starts at u and ends at v (or vice versa).

As with most sufficiently complex mathematical notions, there is an infinite black hole of graph theory terminology, much of which we will see in the future. For now, however, these terms are sufficient for making basic claims about graphs. The following is one of the most important facts about graphs, especially for the purpose of runtime analysis.

The Handshaking Lemma. For any graph G, the sum of the degrees of every node is twice the number of edges. That is,

∑_{v ∈ V} deg(v) = 2|E|

Intuition. It is more useful to consider the edge-incidence definition of degree here. First, observe that every edge is incident on exactly two vertices. Thus, in the sum of degrees, any given edge contributes 1 to exactly two deg(v) terms. Therefore, we would expect this sum to equal twice the number of edges.
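This double-counting argument is easy to check numerically. Below is a quick sanity check on a small, hand-picked example graph (the graph itself is illustrative, not one from the text):

```python
# A small example graph: 4 vertices, 4 edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
nodes = range(4)

# deg(v) = number of edges incident on v.
deg = {v: sum(1 for (a, b) in edges if v in (a, b)) for v in nodes}

# Handshaking lemma: the degree sum is exactly twice the edge count.
assert sum(deg.values()) == 2 * len(edges)
```

Every edge is counted once in each of its two endpoints' degrees, which is why the assertion holds for any edge list.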

Proof. We can formalize this intuitive idea.

TODO!

Representations

We have discussed the mathematical idea of a graph, but it remains to define the abstract data type for use in our code:

- Vertex iteration: iterate over every vertex v ∈ V.
- Edge iteration: iterate over every edge e ∈ E.
- Edge detection: for two vertices u and v, determine whether (u, v) ∈ E.
- Neighbor iteration: for a specified u, iterate over every v ∈ V where (u, v) ∈ E.

There are two common ways of implementing this data type.

Adjacency List

The most common way of representing a graph is by associating each vertex with a list of its neighbors, as shown in Figure 2.

Figure 2: A graph (left) and its adjacency list representation (right).
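A minimal sketch of this representation, using a dictionary mapping each vertex to a list of its neighbors (the vertex labels here are illustrative, not the graph from Figure 2):

```python
# Adjacency list: each vertex maps to the list of its neighbors.
adj = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["c"],
}

def has_edge(u, v):
    # Edge detection scans u's neighbor list: O(deg(u)) time.
    return v in adj[u]

def neighbors(u):
    # Neighbor iteration is a direct traversal of u's list.
    return adj[u]

assert has_edge("a", "b") and not has_edge("b", "d")
```

Note that neighbor iteration costs only deg(u) steps here, which is the main strength of this representation.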

Adjacency Matrix

One way to interpret a graph is that it is simply a binary relation on the vertices – that is, any two vertices are either adjacent or they aren’t adjacent. This leads to a natural programmatic representation – for every pair of vertices, keep track of whether they are adjacent or not. We can use any data type that acts as a two-place boolean predicate, but the most obvious choice is a two-dimensional array. True to the relation interpretation, we could use a two-dimensional boolean array, but for reasons that will make sense later, it is more typical to use an integer array and store a 1 for adjacency and 0 for non-adjacency. This array is known as an adjacency matrix.

Figure 3: A graph (left) and its adjacency matrix representation (right).

Adjacency matrices perform one operation very quickly: to detect whether two vertices are connected by an edge, we can simply test whether the adjacency matrix has a 1 in the corresponding position. This can be done with (approximately) two pointer dereferences, so this operation takes Θ(1) time.
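The same contrast can be seen in code. Below is a sketch using a nested list as the matrix for a hypothetical 4-vertex graph (not the graph from Figure 3):

```python
# 0/1 adjacency matrix for a 4-vertex graph.
# Row i, column j holds 1 exactly when (v_i, v_j) is an edge.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

# Edge detection is a single indexed lookup: Θ(1).
assert adj[0][1] == 1   # v_0 and v_1 are adjacent
assert adj[3][1] == 0   # v_3 and v_1 are not

# Neighbor iteration must scan an entire row: Θ(n).
neighbors_of_2 = [j for j, bit in enumerate(adj[2]) if bit]
assert neighbors_of_2 == [0, 1, 3]
```

The row scan is the price paid for constant-time edge detection, which is the trade-off discussed next.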

Unfortunately, unlike the case of the adjacency list, we can no longer quickly find neighbors. The adjacency matrix in Figure 3 demonstrates this. For a vertex v_i, we have no choice but to search the entire i-th row (or column) and check which entries have value 1. Neighbor iteration thus takes Θ(n) time. For the worst possible graphs, this is no worse than an adjacency list (since the degree of every vertex might be in Θ(n)), but many graphs have far fewer edges than the worst case. A graph is called sparse if its number of edges is small compared to the maximum possible, and dense if its number of edges is closer to the maximum. Sparse graphs typically benefit greatly from the adjacency list representation, since vertex degrees are small compared to the number of vertices.

Searching

So far, we have only answered local questions about graphs, such as:

- Are two given vertices u and v adjacent?
- Which vertices are the neighbors of a given vertex v?

We have not yet answered questions about higher-level, global properties of graphs. Here is an example:

The reachability problem: Given a graph G and two vertices u and v, determine whether there is a path between u and v using edges in G.

It is easy to see why this is a practically useful problem to have a solution for.

The most common way to solve the reachability problem is with a searching algorithm. There are many different ways to search a graph, but almost all of them are basic variations of the following high level algorithm.

We start with some specified vertex v_s (for start). We maintain a partition of the vertices V into two sets, which I will call X (for explored) and U (for unexplored). As the names imply, X consists of vertices which we are sure are reachable from v_s, while U consists of vertices for which this reachability is unknown. Call an edge (x, u) a fringe edge if x ∈ X and u ∈ U. Every iteration, we find a fringe edge and use it to increase the size of X, and repeat this until there are no more fringe edges. Algorithm 1 shows very high-level pseudocode for such an algorithm.
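The loop above can be sketched directly in code. This is a deliberately naive rendering (function and variable names are my own): each iteration scans the whole edge list for a fringe edge, which is exactly the straightforward strategy analyzed below.

```python
def search(vertices, edges, v_s):
    """Grow the explored set X from v_s until no fringe edge remains."""
    X = {v_s}  # explored: vertices known to be reachable from v_s
    while True:
        # Find any edge with exactly one endpoint in X: a Θ(m) scan.
        fringe = next(((a, b) for (a, b) in edges
                       if (a in X) != (b in X)), None)
        if fringe is None:
            return X  # no fringe edges left: X is final
        a, b = fringe
        X.add(b if a in X else a)  # add the unexplored endpoint

# Vertex 3 has no edges, so it is unreachable from 0.
assert search({0, 1, 2, 3}, [(0, 1), (1, 2)], 0) == {0, 1, 2}
```

Since edges are stored in one direction only, the test `(a in X) != (b in X)` treats an edge as a fringe edge whichever endpoint happens to be explored.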

We have left out a lot of details here, but let’s first prove that this high-level algorithm accomplishes our goal before we try to flesh it out.

Theorem. After Algorithm 1 terminates, X contains exactly the vertices in G which are reachable from v_s.

Intuition. The basic idea is that at all times, there is a path from v_s to every x ∈ X. So, in order to add a new vertex to X, we need to find a path to it from v_s. If we have a fringe edge (x, u), we can take the path from v_s to x and append u to get a path from v_s to u. Thus, it is safe to add u to X. Every time we do this, we increase the size of X by one, and as long as the graph is finite, we will eventually run out of vertices to add.

Proof. We proceed by induction on the shortest-path distance from v_s.

TODO!

We can’t analyze the runtime of this algorithm yet, as we have not specified all of its details. How do we actually find one of these fringe edges in the first place? It’s easy to check whether any particular edge is a fringe edge, so the most straightforward approach is to check all of the edges.

There are at most n iterations (in the case where we add all vertices to X), and FindFringe takes Θ(m) time, so the entire algorithm takes Θ(nm) time.

But in every iteration, we only add one vertex to X, so the fringe does not change dramatically. It seems like we’re wasting a lot of time by checking all of the edges every time when most of them haven’t changed to or from being fringe edges. Indeed, we can do much better than this.

Breadth-first Search

Depth-first Search

TODO!

Path Reconstruction

TODO!

Minimum Spanning Trees

Theorem (The Minimum Spanning Tree Lemma). Suppose T is a minimum spanning tree of a graph G. Let e = (u, v) be an edge in G \ T. Then the following are true:

- Adding e to T creates exactly one cycle C, consisting of e together with the unique path in T between u and v.
- e is a maximum-weight edge on C.

Figure 5: A spanning tree of a graph, with a non-tree edge and the cycle guaranteed by the minimum spanning tree lemma.
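The cycle from the lemma can be found programmatically: walk the unique tree path between the endpoints of the non-tree edge, then close it with the edge itself. Below is a sketch on a hand-picked weighted example (the graph, weights, and names are my own, not from Figure 5):

```python
# An MST T of the 4-vertex graph G whose edges are the three tree edges
# below plus the non-tree edge (0, 3) of weight 5.
tree_edges = {(0, 1): 1, (1, 2): 2, (2, 3): 1}
non_tree_edge, w = (0, 3), 5  # an edge of G \ T

# Build an adjacency map for the tree.
adj = {}
for (u, v) in tree_edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

def tree_path(u, v, seen=None):
    """Return the unique path from u to v inside the tree."""
    seen = seen or {u}
    if u == v:
        return [u]
    for nxt in adj[u]:
        if nxt not in seen:
            rest = tree_path(nxt, v, seen | {nxt})
            if rest:
                return [u] + rest
    return None

path = tree_path(*non_tree_edge)
assert path == [0, 1, 2, 3]  # the tree path closed into a cycle by (0, 3)

# Since T is a minimum spanning tree, the non-tree edge is a
# maximum-weight edge on that cycle.
path_weights = [tree_edges.get((a, b)) or tree_edges[(b, a)]
                for a, b in zip(path, path[1:])]
assert w >= max(path_weights)
```

Swapping the non-tree edge for any lighter edge on this cycle would produce a lighter spanning tree, which is why the weight comparison must hold when T is minimum.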