A graph is an abstraction representing some objects and connections between pairs of objects. In its simplest form, a graph $G = (V, E)$ has just two pieces: a set $V$ of vertices (the objects) and a set $E$ of edges (the connections), where each edge is an unordered pair of vertices.
More generally, a graph contains information about what pairs of objects are related to each other in some way.
Let us first define some terminology that will make talking about graphs less cumbersome. If $G$ has an edge $\{u, v\}$, we say $u$ and $v$ are adjacent. We also say $u$ is a neighbor of $v$ and $v$ is a neighbor of $u$. The edge is also referred to as being incident on $u$ (and $v$, as well).
The degree of a node $v$ (denoted $\deg(v)$) is the number of other nodes adjacent to it, or in other words, the number of neighbors it has. Equivalently, $\deg(v)$ is the number of edges incident on $v$.
We define a path as a sequence of nodes $v_1, v_2, \ldots, v_k$ such that any two consecutive nodes in the sequence are adjacent in $G$. Two nodes $u$ and $v$ are then reachable from each other if there is a path that starts at $u$ and ends at $v$ (or vice versa).
As with most sufficiently complex mathematical notions, there is an infinite black hole of graph theory terminology, much of which we will see in the future. For now, however, these terms are sufficient for making basic claims about graphs. The following is one of the most important facts about graphs, especially for the purpose of runtime analysis.
The Handshaking Lemma. For any graph $G = (V, E)$, the sum of the degrees of every node is twice the number of edges. That is,
$$\sum_{v \in V} \deg(v) = 2|E|.$$
Intuition. It is more useful to consider the edge-incidence definition of degree here. First, observe that every edge is incident on exactly two vertices. Thus, in our sum of degrees, any given edge contributes exactly 1 to each of exactly two terms. Therefore, we would expect this sum to be equal to twice the number of edges.
Proof. We can formalize this intuitive idea.
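As a quick sanity check of the lemma (not a substitute for the formal proof), we can tally the degree contributions of every edge on a small example graph. The graph below is an arbitrary one chosen purely for illustration:

```python
# Empirical check of the Handshaking Lemma on a small example graph.
# Vertices are 0..3; each edge is an unordered pair (u, v).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
num_vertices = 4

degree = [0] * num_vertices
for u, v in edges:
    # Each edge is incident on exactly two vertices, so it
    # contributes 1 to exactly two terms of the degree sum.
    degree[u] += 1
    degree[v] += 1

# Sum of degrees equals twice the number of edges.
assert sum(degree) == 2 * len(edges)
```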
We have discussed the mathematical idea of a graph, but it remains to define the abstract data type for use in our code:
| Operation | Description |
|---|---|
| Vertex iteration | Iterate over every vertex $v \in V$. |
| Edge iteration | Iterate over every edge $\{u, v\} \in E$. |
| Edge detection | For two vertices $u$ and $v$, determine whether $\{u, v\} \in E$. |
| Neighbor iteration | For some specified $v$, iterate over every $u$ where $\{u, v\} \in E$. |
There are two common ways of implementing this data type.
The most common way of representing a graph is by associating each vertex with a list of its neighbors, as shown in Figure 2.
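For concreteness, here is one possible sketch of an adjacency-list graph in Python supporting the four operations from the table above. The class and method names are illustrative choices, not fixed by the text:

```python
# A minimal adjacency-list representation: each vertex is associated
# with a list of its neighbors.
class AdjacencyListGraph:
    def __init__(self, num_vertices):
        self.neighbors = [[] for _ in range(num_vertices)]

    def add_edge(self, u, v):
        # Undirected: record each endpoint as the other's neighbor.
        self.neighbors[u].append(v)
        self.neighbors[v].append(u)

    def vertices(self):          # vertex iteration
        return range(len(self.neighbors))

    def edges(self):             # edge iteration (each edge yielded once)
        for u in self.vertices():
            for v in self.neighbors[u]:
                if u < v:
                    yield (u, v)

    def has_edge(self, u, v):    # edge detection: scans u's list, O(deg(u))
        return v in self.neighbors[u]

    def neighbors_of(self, v):   # neighbor iteration: O(deg(v))
        return iter(self.neighbors[v])
```

Note that neighbor iteration touches only $\deg(v)$ entries, which is the main advantage of this representation.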
One way to interpret a graph is that it is simply a binary relation on the vertices – that is, any two vertices are either adjacent or they aren’t adjacent. This leads to a natural programmatic representation – for every pair of vertices, keep track of whether they are adjacent or not. We can use any data type that acts as a two-place boolean predicate, but the most obvious choice is a two-dimensional array. True to the relation interpretation, we could use a two-dimensional boolean array, but for reasons that will make sense later, it is more typical to use an integer array and store a 1 for adjacency and a 0 for non-adjacency. This array is known as an adjacency matrix.
Adjacency matrices perform one operation very quickly: to detect whether two vertices are connected by an edge, we can simply test whether the adjacency matrix has a 1 in the corresponding position. This can be done with (approximately) two pointer dereferences, so this operation takes $O(1)$ time.
Unfortunately, unlike the case of the adjacency list, we can no longer quickly find neighbors. The adjacency matrix in Figure 3 demonstrates this. For a vertex $v$, we have no choice but to search the entire $v$-th row (or column) and check which entries have value 1. Neighbor iteration thus takes $\Theta(|V|)$ time. For the worst possible graphs, this is no worse than an adjacency list (since the degree of every vertex might be in $\Theta(|V|)$), but many graphs have far fewer edges than the worst case. A graph is called sparse if its number of edges is small compared to the maximum possible. It is dense if its number of edges is closer to the maximum. Sparse graphs typically benefit a lot from the adjacency list representation, since vertex degrees are small compared to the number of vertices.
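To make the tradeoff concrete, here is a minimal adjacency-matrix sketch (again with illustrative names): edge detection is just two array indexings, while neighbor iteration must scan a full row:

```python
# A minimal adjacency-matrix representation: matrix[u][v] is 1 if
# {u, v} is an edge and 0 otherwise.
class AdjacencyMatrixGraph:
    def __init__(self, num_vertices):
        self.matrix = [[0] * num_vertices for _ in range(num_vertices)]

    def add_edge(self, u, v):
        # Undirected: the matrix is symmetric.
        self.matrix[u][v] = 1
        self.matrix[v][u] = 1

    def has_edge(self, u, v):    # O(1): two indexing operations
        return self.matrix[u][v] == 1

    def neighbors_of(self, v):   # Theta(|V|): must scan the entire v-th row
        return [u for u, bit in enumerate(self.matrix[v]) if bit == 1]
```

For a sparse graph, `neighbors_of` spends most of its time scanning 0 entries, which is exactly the waste the adjacency list avoids.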
So far, we have only answered local questions about graphs – whether two vertices are adjacent, or who a vertex's neighbors are – but not yet questions about higher level, global properties of graphs. Here is an example:
The reachability problem: Given a graph $G$ and two vertices $s$ and $t$, determine whether there is a path between $s$ and $t$ using edges in $G$.
It is easy to see why this is a practically useful problem to have a solution for.
The most common way to solve the reachability problem is with a searching algorithm. There are many different ways to search a graph, but almost all of them are basic variations of the following high level algorithm.
We start with some specified vertex $s$ (for start). We maintain a partition of the vertices into two sets, which I will call $X$ (for explored) and $U$ (for unexplored). As the names imply, $X$ consists of vertices which we are sure are reachable from $s$, while $U$ consists of vertices for which this reachability is unknown. Call an edge $\{u, v\}$ a fringe edge if $u \in X$ and $v \in U$. In every iteration, we find a fringe edge and use it to increase the size of $X$, and we repeat this until there are no more fringe edges. Algorithm 1 shows very high level pseudocode for such an algorithm.
We have left out a lot of details here, but let’s first prove that this high-level algorithm accomplishes our goal before we try to flesh it out.
Theorem. After Algorithm 1 terminates, $X$ contains exactly the vertices in $G$ which are reachable from $s$.
Intuition. The basic idea is that at all times, there is a path from $s$ to any $v \in X$. So, in order to add a new vertex to $X$, we need to find a path to it from $s$. If we have a fringe edge $\{u, v\}$ with $u \in X$, we can take the path from $s$ to $u$ and append $v$ to get a path from $s$ to $v$. Thus, it is safe to add $v$ to $X$. Every time we do this, we increase the size of $X$ by one, and as long as the graph is finite, we will eventually run out of nodes in $U$ to add.
Proof. We proceed by induction on the shortest-path distance from $s$.
We can’t analyze the runtime of this algorithm yet as we have not specified all of its details. How do we actually find one of these fringe edges in the first place? It’s easy to check whether any particular edge is a fringe edge, so the most straightforward approach is checking all of the edges.
There are at most $|V|$ iterations (in the case that we add all vertices to $X$), and FindFringe takes $O(|E|)$ time, so the entire algorithm takes $O(|V| \cdot |E|)$ time.
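This straightforward version can be transcribed directly. The function and variable names below are illustrative, and edges are taken as a plain list of pairs; the inner loop plays the role of FindFringe by scanning every edge:

```python
# Naive search: repeatedly scan all edges for a fringe edge.
# At most |V| iterations, each doing an O(|E|) scan, for O(|V|*|E|) total.
def reachable_from(s, edges):
    explored = {s}                      # the set X of vertices known reachable from s
    while True:
        fringe = None
        for u, v in edges:              # FindFringe: check every edge
            # A fringe edge has exactly one endpoint in the explored set.
            if (u in explored) != (v in explored):
                fringe = (u, v)
                break
        if fringe is None:              # no fringe edges remain: done
            return explored
        u, v = fringe
        explored.add(v if u in explored else u)
```

To answer a reachability query for $s$ and $t$, one would simply test whether $t$ is in the returned set.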
But in every iteration, we’re only adding one vertex to $X$, so the fringe is not changing dramatically. It seems like we’re wasting a lot of time by checking all of the edges every time when most of them haven’t changed to or from being fringe edges. Indeed, we can do much better than this.
Theorem (The Minimum Spanning Tree Lemma). Suppose $T$ is a minimum spanning tree of a graph $G$. Let $e$ be an edge in $T$. Then the following are true: