How to compute simplicial homology in Python

mathematics

Fri, Mar 25, 22

The last two weeks or so, I spent some time learning about homological algebra in algebraic topology. But I'm one of those guys who can't really familiarize with a concept until they write code for it, so I had to learn how to compute (real) homology algorithmically in the simple context of simplicial complexes. It turns out to be somewhat manageable once you understand a few tricks, and today I'm going to show you how I did it. At the end of this article, you'll be able to specify a simplicial complex as a set of k-simplices and automatically compute the boundary operators and bases of its homology spaces.

We will consider homology with real coefficients. If you’re already familiar with the subject, you know that some important topological information is lost by doing so, as homology with coefficients in a field ignores torsion effects. It is however simpler to understand. I may write a follow-up article to show how to compute homology with integer coefficients at some point. Most of what I’m going to discuss comes from this excellent lecture on computational algebraic topology by Michael Robinson, who works as an associate professor at the American University. This awesome guy also has a Github repo with computational homology implementations. I’m pretty sure his stuff runs faster than mine, but I wanted to implement something a bit more readable.

Disclaimer: I am not a mathematician, although I very much enjoy mathematics, so I’ll probably be doing some handwaving here and there. If you’re interested in learning computational algebraic topology the right way, find some online lecture like the one I mentioned, or read a reference handbook on the subject (I can recommend [1] which is a very intelligently put and cleverly illustrated overview of applied topology).

Terminology and definitions

We’ll need to give some quick definitions to set the ideas, before we do some code. You can skip this part if you already know what simplicial homology is and you’re eager to get your hands dirty.

Simplicial complex

A simplicial complex can be viewed as a generalization of a graph. It is a set of geometric elements called simplices (singular: simplex) that is closed under inclusion. They can approximate topological spaces (up to homeomorphism), and are used in algebraic topology (along with other kinds of complexes) to make certain calculations possible.

Definition : Given a set

V

of elements called vertices (singular: vertex), an abstract simplicial complex

X (V)

is a set of subsets of

V

such that:

If $v \in V$ then ${v} \in X (V)$ .
If $σ \in X (V)$ and $τ \subseteq σ$ , then $τ \in X (V)$ .

σ

is called a simplex and

τ

is a face of

σ

. Pairs of vertices are called edges, and an arbitrary simplex containing

k + 1

vertices is called a k-simplex.

Definition : The dimension of a simplex

σ

is its cardinality minus one:

\dim σ = | σ | - 1

, so a k-simplex is of dimension k. If

τ \subset σ

, we say that

τ

is a face of

σ

of codimension

n

n = codim τ = \dim σ - \dim τ

In what follows, if we don’t explicitly specify the codimension, we will assume that it is 1. So a face of a k-simplex $σ$ is understood as a (k-1)-simplex, where some vertex of $σ$ was deleted.

Now all the terminology here is borrowed from geometry for a reason: any abstract simplicial complex, which is a combinatorial object in nature, can be mapped to its geometric counterpart embedded in $R^{d}$ (called a simplicial complex) via an operation called geometric realization. Then to each vertex will correspond a point, to each edge a line segment, to each 2-simplex a triangle, and to each k-simplex a standard k-simplex of $R^{d}$ . Because of this unambiguous relation between an abstract simplicial complex and a simplicial complex, I may conflate the two terms.

An example simplicial complex. Here, FG is a face of the 2-simplex FGH. If we consider that the tetrahedron ABCD is filled, it is itself a 3-simplex, and ABC is one of its faces. The triangle CDE is not filled, so it is not a 2-simplex and CE is not a face.

Simplicial complexes have a wide range of applications in data science. Maybe you want to characterize the underlying topology of a communication / information network using something like a Čech complex or a Vietoris-Rips complex (see [5]). Or you’re studying a brain connectome (see [6]). In practice, the simplicial complexes one encounters in research are huge, and computational methods are desired in order to study them. One interesting measurement you can do which gives relevant topological information is simplicial homology.

Simplicial homology is actually a sequence of measurements $H_{k}$ each consisting of the k-cycles of the complex, up to an equivalence relation. In order to define simplicial homology more rigorously, we will need to specify some additional structure.

Additional structure on simplicial complexes

k-chain spaces

Let $X$ be an abstract simplicial complex, and consider the free vector spaces $C_{k} (X, F)$ spanned by its k-simplices. The elements of $C_{k} (X, F)$ are called k-chains, and consist of finite formal sums of k-simplices, with coefficients in a field $F$ (we will assume that $F = R$ in this article and henceforth omit to mention the field):

\begin{matrix} (1) & C_{k} (X) ≜ {span}_{R} {σ ∣ σ \in X} = {\sum_{i} a_{i} σ_{i} ∣ a_{i} \in R, σ_{i} \in X} \end{matrix}

Notice that the dimension of $C_{k} (X)$ is the number of k-simplices in $X$ , as they form a basis of $C_{k} (X)$ . If we have $d$ such k-simplices, then we have $C_{k} (X) ≅ R^{d}$ (as any real vector space of dimension $d$ is isomorphic $R^{d}$ ). So instead of thinking of the k-chains as formal sums, I find it more useful to imagine that we have a correspondence between each k-simplex and a true-to-god basis column vector of $R^{d}$ , and k-chains are just arbitrary column vectors in $R^{d}$ .

Without loss of generality, we can consider that abstract simplicial complexes are specified in terms of ordered lists of k-simplices. The order is whatever we like, as long as we are consistent. $C_{k} (X)$ is then taken to be the vector space whose basis is the ordered list of k-simplices in $X$ . Specifically, if $σ_{i} \in C_{k} (X)$ is the $i$ -th element in the ordered list, we assign it the familiar unit basis vector ${\hat{e}}_{i}$ , whose components are all zeros, except a one at index $i$ .

Boundary operators

Now we can define the linear maps $\partial_{k}$ between these spaces, called the boundary operators. $\partial_{k} : C_{k} (X) \to C_{k - 1} (X)$ takes any linear combination of k-simplices, and outputs a linear combination of faces of these simplices. Let $\partial_{k}$ act on a k-chain. By linearity of $\partial_{k}$ we have:

\begin{matrix} (2) & \partial_{k} \sum_{i} a_{i} σ_{i} = \sum_{i} a_{i} \partial_{k} (σ_{i}) \end{matrix}

So it is sufficient to specify the action of $\partial_{k}$ on a single simplex. Let $σ = [v_{0}, \dots, v_{k}]$ be a k-simplex. I’ll use the square brackets to denote that the vertices are ordered. Then:

\begin{matrix} (3) & \partial_{k} [v_{0}, \dots, v_{k}] ≜ \sum_{i}^{k} (- 1)^{i} [v_{0}, \dots, \hat{v_{i}}, \dots, v_{k}] \end{matrix}

Here the hat $\hat{v_{i}}$ denotes that the $i$ -th vertex is omitted. So in each term, we delete the $i$ -th vertex, and the parity is affected, so we end up with an alternating sum of faces of the original simplex.

If we consider this example simplicial complex, applying a boundary operator on the 2-simplex $[B C D]$ shown in orange gives the 1-chain $[C D] - [B D] + [B C]$ . Interpreting the signs as orientations of the faces, we see that the action of this operator is to take a k-simplex to its (oriented) boundary, hence the name.

In $C_{k} (X)$ , the elements of $Im \partial_{k + 1}$ are called boundaries, and the elements of $Ker \partial_{k}$ are called cycles. A cycle is thus a k-chain whose boundary is zero. The boundary operators are crafted in such a way that the boundary of a boundary is always zero (every boundary is a cycle):

Theorem :

\begin{matrix} (4) & (\partial_{k - 1} \circ \partial_{k}) σ = 0, \forall σ \in C_{k} (X), \forall k \end{matrix}

Proof [click to expand]

3

and by linearity of

\partial_{k - 1}

we have:

\begin{matrix} (5) & \begin{aligned} (\partial_{k - 1} \circ \partial_{k}) [v_{0}, \dots, v_{k}] & = \sum_{i = 0}^{k} (- 1)^{i} \partial_{k - 1} [v_{0}, \dots, \hat{v_{i}}, \dots, v_{k}] \\ = \sum_{i = 0}^{k} (- 1)^{i} [\sum_{j = 0}^{i - 1} (- 1)^{j} [v_{0}, \dots, \hat{v_{j}}, \dots, \hat{v_{i}}, \dots, v_{k}] + \sum_{j = i}^{k} (- 1)^{j + 1} [v_{0}, \dots, \hat{v_{i}}, \dots, \hat{v_{j}}, \dots, v_{k}]] & = 0 \end{aligned} \end{matrix}

We split the sum about index

i

. After

\hat{v_{i}}

which is not there, all the vertices are shifted to the left, so we pick up an extra minus sign (hence the

(- 1)^{j + 1}

). The two inner sums give rise to similar terms of opposite sign, so the whole bracket vanishes.

A chain complex

Notice that the collection of k-chains and boundary maps form a graded sequence of the form:

\begin{matrix} (6) & \dots \overset{\partial_{k + 2}}{\to} C_{k + 1} (X) \overset{\partial_{k + 1}}{\to} C_{k} (X) \overset{\partial_{k}}{\to} C_{k - 1} (X) \overset{\partial_{k - 1}}{\to} \dots \end{matrix}

And by $4$ we have that the image of $\partial_{k + 1}$ is always contained within the kernel of the next map $\partial_{k}$ . Such a structure is called a chain complex in algebra. Now, whenever we have a chain complex, we can ask ourselves to which extent $Im \partial_{k + 1}$ is exactly equal to $Ker \partial_{k}$ . This idea is formalized by the notion of homology, as we will see, but before we can get to it, let’s define what a generic chain complex is first.

Chain complex

The pattern we identified earlier, where one comes up with a sequence of morphisms between spaces, such that the composition of two consecutive morphisms always gives the zero map, is in fact very recurring in abstract algebra. This setup is so common in fact, that there is a name for it: chain complex.

Definition : Let

{C_{k}}

represent an ordered list of spaces (of the same abelian category), with a decreasing index

k

. For each consecutive pairs of spaces, let

ϕ_{k} : C_{k} \to C_{k - 1}

be a morphism:

\begin{matrix} (7) & \dots \overset{ϕ_{k + 2}}{\to} C_{k + 1} \overset{ϕ_{k + 1}}{\to} C_{k} \overset{ϕ_{k}}{\to} C_{k - 1} \overset{ϕ_{k - 1}}{\to} \dots \end{matrix}

Such a sequence is called a chain complex whenever

Im ϕ_{k} \subseteq Ker ϕ_{k - 1}, \forall k

. We will denote this structure by

(C_{∙}, ϕ_{∙})

A chain complex with its innards exposed. Here $Z_{k} = Ker \partial_{k}$ and $B_{k} = Im \partial_{k + 1}$ .

Definition : If the previous inclusion is a strict equality for all

k

, we say that the chain complex is exact or that we have an exact sequence (of morphisms).

So any morphism of an exact chain complex will always map a space onto the kernel of the next morphism.

Remark : Notice that if we have the following exact sequence:

\begin{matrix} (8) & 0 \to A \overset{f}{\to} B \overset{g}{\to} C \to 0 \end{matrix}

Then:

$f$ is injective: by exactness, $Ker f$ is the image of the leftmost map, which is $0$ .
$g$ is surjective: the kernel of the rightmost map is the whole $C$ , and by exactness, $Im g = C$ .

Such a sequence is called a short exact sequence.

Homology

Not all chain complexes are exact, that’s a fact. Given an arbitrary chain complex, we would like to quantify how it fails to be exact, and that’s the purpose of homology. In mathematics, when we want to measure by how much a given property fails to be respected by a collection of objects, we usually form a quotient of that collection, in order to mod out the elements that respect this property. We’re left with a quotient space whose elements represent in some sense the many ways to fail respecting this property. I’m not going to properly define quotient spaces here, I assume you already know that much if your goal is to compute homology spaces.

Definition : The k-th homology of a chain complex of the form

7

is defined as:

\begin{matrix} (9) & H_{k} (C_{k}) ≜ Ker ϕ_{k} / Im ϕ_{k + 1} \end{matrix}

Assuming our $C_{k}$ are vector spaces (which is the case for our simplicial k-chain spaces), by definition of the quotient, the k-th homology space is the following space of cosets:

\begin{matrix} (10) & H_{k} (C_{k}) = {{x + Im ϕ_{k + 1}} ∣ \forall x \in Ker ϕ_{k}} \end{matrix}

So if $Ker ϕ_{k} = Im ϕ_{k + 1} \forall k$ , which happens when the sequence is exact, then $x + Im ϕ_{k + 1}$ will span the whole kernel:

\begin{matrix} (11) & x + Im ϕ_{k + 1} = x + Ker ϕ_{k} = Ker ϕ_{k} \end{matrix}

So there is only one coset, therefore the quotient consists of a single element, which means it is trivial. Conversely, we can also show that if all homology spaces are trivial, then the chain complex is exact. So saying that a chain complex is exact or that it has all trivial homology spaces are equivalent statements. This goes to show that homology really quantifies the obstruction to exactness. Moreover, because we have an indexed sequence of homology spaces, it tells us exactly where this obstruction occurs.

Simplicial homology

Let’s go back to our simplicial complex $X$ , its k-chain spaces $C_{k} (X)$ and its boundary maps $\partial_{k}$ , and observe that $(C_{∙} (X), \partial_{∙})$ really is a chain complex under the previous definitions. This means that we can compute homology on it.

Definition : The k-th simplicial homology of

(C_{∙} (X), \partial_{∙})

is defined as:

\begin{matrix} (12) & H_{k} (X) ≜ Ker \partial_{k} / Im \partial_{k + 1} \end{matrix}

So we take cycles, and mod out boundaries. The combinatorial interpretation of simplicial homology is somewhat complicated to wrap one’s head around, and I find the geometric interpretation easier to grasp. In the geometric realization, cycles generalize the notion of a cycle in a graph: they represent a sequence of k-dimensional simplices that loop around somehow (and the signs can be interpreted as their respective orientations). And if such a cycle loops around an existing (k+1)-simplex, then it is the boundary of that simplex. But it is possible that the cycle loops around nothing, when its elements are not the faces of a higher dimensional simplex. In this case, we have a hole. The point is that in an arbitrary (abstract) simplicial complex not all cycles are boundaries, and simplicial homology quantifies the obstruction of cycles to be boundaries. So in some sense, it detects holes in the geometric realization.

In this example, $[A B C]$ is not a 2-simplex. Then, the 1-cycle $[B C] - [A C] + [A B]$ is not the boundary of a 2-simplex, so it identifies a hole in the complex, and corresponds to the unique basis vector in $H_{1}$ .

The dimension of $H_{k} (X)$ can be interpreted as the number of k-dimensional holes in the simplicial complex. $\dim H_{0} (X)$ in particular, can be understood as the number of path connected components in $X$ .

Computing the boundary operators

In this part, we are going to explicitly compute boundary operators within the context of a toy example, then we’ll derive some Python code to do this automatically.

Manual computation

Because the $\partial_{k}$ are linear maps between vector spaces, they can be expressed as matrices. Their expression will depend on the underlying simplicial complex, but we can use $3$ to cook up a procedure to build them. Let’s consider an example. Let $X$ be the following abstract simplicial complex:

\begin{matrix} (13) & X = [\emptyset, [A], [B], [C], [D], [A B], [A C], [B C], [B D], [C D], [B C D]] \end{matrix}

A geometric realization would be two triangles sharing an edge, one triangle being filled and the other hollow:

$\text{[math]}$

We have one 2-simplex, five edges and four vertices. So the associated chain complex is of the form:

\begin{matrix} (14) & 0 \overset{\partial_{3}}{\to} R \overset{\partial_{2}}{\to} R^{5} \overset{\partial_{1}}{\to} R^{4} \overset{\partial_{0}}{\to} 0 \end{matrix}

Note that as there is no 3-simplex, $C_{3} (X)$ is trivial and $\partial_{3}$ is a zero map. Also, we can always complete the chain complex on the right with another zero map $\partial_{0}$ and a trivial space: as we don’t suppose exactness, we don’t force surjectivity on $\partial_{1}$ by doing so. $\partial_{3}$ can be seen as a $(1 \times 0)$ matrix and $\partial_{0}$ as a $(0 \times 4)$ matrix. I know it’s weird, but interestingly enough, the numpy library we are going to use can handle these kinds of edge cases seamlessly.

Only $\partial_{2}$ and $\partial_{1}$ are non-trivial here. Let’s do $\partial_{2}$ first. We have a single 2-simplex $[B C D]$ that is mapped to a linear combination of five edges. By $3$ we have:

\begin{matrix} (15) & \begin{aligned} \partial_{2} [B C D] & = [C D] - [B D] + [B C] \\ = 0 \cdot [A B] + 0 \cdot [A C] + [C D] - [B D] + [B C] \end{aligned} \end{matrix}

This linear combination of basis vectors of $C_{1} (X)$ can be written in column form:

\begin{matrix} (16) & \partial_{2} = (\begin{array}{cc} [B C D] \\ [A B] & 0 \\ [A C] & 0 \\ [B C] & 1 \\ [B D] & - 1 \\ [C D] & 1 \end{array}) \end{matrix}

Here we annotated the rows and columns of our matrix so we can keep track of where simplices are mapped to. Same spiel for $\partial_{1}$ , we just need to consider the action of $\partial_{1}$ on each edge and concatenate the columns we obtain. So $\partial_{1} [A B] = [B] - [A]$ , $\partial_{1} [A C] = [C] - [A]$ and so on. We arrive at this expression for $\partial_{1}$ :

\begin{matrix} (17) & \partial_{1} = (\begin{array}{cccccc} [A B] & [A C] & [B C] & [B D] & [C D] \\ [A] & - 1 & - 1 & 0 & 0 & 0 \\ [B] & 1 & 0 & - 1 & - 1 & 0 \\ [C] & 0 & 1 & 1 & 0 & - 1 \\ [D] & 0 & 0 & 0 & 1 & 1 \end{array}) \end{matrix}

We can check by matrix multiplication that $\partial_{1} \partial_{2}$ is a zero matrix.

Implementation

Suppose that our abstract simplicial complex is specified as a list of strings. Each string element can be a single character for a vertex, two characters for an edge and so on. Our previous example would look like this:

X = ["A", "B", "C", "D", "AB", "AC", "BC", "BD", "CD", "BCD"]

Note that we don’t need to specify the empty set, the algorithm does not care about it. Let’s write a boundary() function which will end up producing the full list of boundary operators. The first thing we would like to do, is to group the simplices by dimension. During this step we will also make sure that the simplices are in lexicographic order, that’s the ordering convention I used in the example without mentioning it.

def boundary(complex):
    maxdim = len(max(complex, key=len))
    simplices = [sorted([spx for spx in complex if len(spx)==i]) for i in range(1,maxdim+1)]

So we first look for the maximal simplex dimension. The max() function is used with parameter key=len to find the string of maximal length within the input array. Then for each possible string length, we form sublists of simplices of the same length. The sorted() function ensures a lexicographic ordering of these simplices within each sublist. If we were to print out the simplices list for X at this stage, we would get:

[['A', 'B', 'C', 'D'], ['AB', 'AC', 'BC', 'BD', 'CD'], ['BCD']]

These sublists represent the respective bases of all the $C_{k}$ spaces. Then we want to iterate over all the consecutive pairs of such sublists. For a given pair $(C_{k}, C_{k + 1})$ we will iterate over all the simplices $σ_{m}$ of $C_{k + 1}$ , and we will extract the faces of $σ_{m}$ thanks to a function get_faces() that we’ll specify later on. Now we just need to check which simplex $τ_{l}$ of $C_{k}$ is a face of $σ_{m}$ . If a given $τ_{l}$ is the $i$ -th face of $σ_{m}$ , then the corresponding matrix coefficient is set to the parity of $i$ : $(\partial_{k + 1})_{l, m} = (- 1)^{i}$ . Otherwise we set this coefficient to zero:

def boundary(complex):
    maxdim = len(max(complex, key=len))
    simplices = [sorted([spx for spx in complex if len(spx)==i]) for i in range(1,maxdim+1)]

    # Iterate over consecutive groups (dim k and k+1)
    bnd = []
    for spx_k, spx_kp1 in zip(simplices, simplices[1:]):
        mtx = []
        for sigma in spx_kp1:
            faces = get_faces(sigma)
            mtx.append([get_coeff(spx, faces) for spx in spx_k])
        bnd.append(np.array(mtx).T)

    return bnd

The list comprehension [get_coeff(spx, faces) for spx in spx_k] will form a row vector containing the coefficients $(\partial_{k + 1})_{l, m}$ for a given $m$ , and such vectors are concatenated in mtx for all $m$ . So we need to transpose the matrix mtx before we push it in the bnd list.

The helper functions are defined as follows:

def get_faces(lst):
    return [lst[:i] + lst[i+1:] for i in range(len(lst))]


def get_coeff(simplex, faces):
    if simplex in faces:
        idx = faces.index(simplex)
        return 1 if idx%2==0 else -1
    else:
        return 0

The function get_faces() takes a string input and returns all the possible substrings obtained by deleting a character at index $i$ , thus forming an indexed list of faces. get_coeff() takes a simplex and a list of faces, and returns either the index parity if the input simplex is found in the faces list, or zero otherwise.

We can check our boundary() function against the example abstract simplicial complex X and print out the boundary operators it returns:

print(boundary(X))

[array([[-1, -1,  0,  0,  0],
       [ 1,  0, -1, -1,  0],
       [ 0,  1,  1,  0, -1],
       [ 0,  0,  0,  1,  1]]),
array([[ 0],
       [ 0],
       [ 1],
       [-1],
       [ 1]])]

We recognize the matrices of $\partial_{1}$ and $\partial_{2}$ that we computed manually before! Now, we can do some numerical linear algebra wizardry on these to extract bases of our homology spaces.

Aside: Computing (co)kernels and (co)images

For what comes next, we will need to be able to compute numerically the bases of the kernels and cokernels of arbitrary matrices. For those who read a few articles on this blog, you’ll know by now that we love singular value decomposition, like, a lot. It turns out, once again, they’re the tool for the job. Let’s write a quick definition, and recapitulate what an SVD does.

Singular values and Singular Value Decomposition

Definition : Given an

m \times n

matrix

M

σ

is a singular value of M if and only if

\exists u \in R^{m}, v \in R^{n}

such that:

\begin{matrix} (18) & {\begin{cases} M v = σ u \\ M^{*} u = σ v \end{cases} \end{matrix}

Then,

u

(resp.

v

) is called a left (resp. right)-singular vector of

M

Definition : Given an

m \times n

matrix

M

, a singular value decomposition of

M

will produce three matrices

U

Σ

and

V

such that:

\begin{matrix} (19) & M = U Σ V^{*} \end{matrix}

$Σ = diag {σ_{i}}$ is an $m \times n$ diagonal matrix, and the singular values $σ_{i}$ are in descending order.
$U$ is an $m \times m$ unitary matrix.
$V$ is an $n \times n$ unitary matrix.

$U$ and $V$ being unitary, we know that:

The columns $u_{1}, \dots, u_{m}$ of $U$ yield an orthonormal basis of $R^{m}$ (the output space)
The columns $v_{1}, \dots, v_{n}$ of $V$ yield an orthonormal basis of $R^{n}$ (the input space)

Then, $M$ can be thought of as sending a basis vector $v_{i}$ to a scaled basis vector $σ_{i} u_{i}$ , by definition of a singular value:

\begin{matrix} (20) & M v_{i} = σ_{i} u_{i} \end{matrix}

The vectors $v_{i}$ corresponding to null singular values are sent to zero, so they span the kernel of $M$ .
Similarly, the vectors $u_{i}$ corresponding to non-zero singular values span the image of $M$ .
Note that the cokernel and image are in direct sum: $Im M \oplus Coker M = R^{m}$ .
And so are the kernel and coimage: $Ker M \oplus Coim M = R^{n}$ .

Assuming descending order of the singular values, we have that $σ_{i} = 0 \forall i > r$ , then the following results hold:

\begin{array}{r} (21) & Ker M = \underset{i > r}{span} {v_{i}} \\ (22) & Im M = \underset{i \leq r}{span} {u_{i}} \\ (23) & Coker M = \underset{i > r}{span} {u_{i}} \\ (24) & Coim M = \underset{i \leq r}{span} {v_{i}} \end{array}

We’re only going to need $21$ and $23$ , but I’m stating all these results in one place because I found out that they are either not wildly known or trivial enough that very few people ever care to mention them. [3] touches upon it.

Implementation

So to find a basis for the kernel of a matrix, according to $21$ , we compute its SVD thanks to np.linalg.svd(), and extract the rows of $V^{*}$ corresponding to null singular values:

def kernel(A, tol=1e-5):
    _, s, vh = np.linalg.svd(A)
    singular = np.zeros(vh.shape[0], dtype=float)
    singular[:s.size] = s
    null_space = np.compress(singular <= tol, vh, axis=0)
    return null_space.T

The array s returned by np.linalg.svd() is one-dimensional, and contains only the singular values big enough so their floating point representation is non-zero. But it could happen that some of these should actually be interpreted as null singular values. So we put them in a bigger array singular of size the number of rows of vh, padding with zeros, and we use the np.compress() function to extract the rows of vh where the values of singular are close enough to zero (using an input tolerance parameter). The output of np.compress() is a matrix whose rows are the ones we selected. We transpose this matrix before we return it, so in the end this function outputs a matrix whose columns span the kernel of the input matrix. You’ll see later on why we do this, rather than returning a list of vectors.

As for the cokernel, we do a similar manipulation following our previous result $23$ :

def cokernel(A, tol=1e-5):
    u, s, _ = np.linalg.svd(A)
    singular = np.zeros(u.shape[1], dtype=float)
    singular[:s.size] = s
    return np.compress(singular <= tol, u, axis=1)

The code above is more or less a copy paste of this implementation by Michael Robinson and Chris Capraro. I did come up with something similar on my own, but I found this version more elegant as Python code goes.

Computing homology

There are multiple algorithms out there to compute homology. I should mention the great article [7] (do check out this guy’s blog) which shows a way to compute homology dimensions directly by simultaneous row reduction on consecutive boundary operators and pivot counting.

What we will do instead, is to instantiate $H_{k} (X)$ as a subspace of $C_{k} (X)$ , and compute a basis (a set of generators) of this subspace. We’re looking for these basis vectors that are inside $Ker \partial_{k}$ and outside $Im \partial_{k + 1}$ . We could quite happily find a basis of $Ker \partial_{k}$ , and for each vector in this basis check that it is not in $Im \partial_{k + 1}$ . But we’re going to do much better, and compute all the homology generators in one go.

The endofunctor trick

This method was presented by Michael Robinson in his lecture on computational algebraic topology on Youtube, and it’s gold. Let’s consider the chain complex around $C_{k} (X)$ , and unfold the diagram a bit so all of the subspaces of interest appear explicitly:

$\text{[math]}$

There is a natural inclusion $κ : Ker \partial_{k} ↪ C_{k} (X)$ that maps kernel elements to where they’re found in $C_{k} (X)$ . Now, by exactness, we have that $Im \partial_{k + 1} \subseteq Ker \partial_{k}$ , so surely, there exists a map $ψ : C_{k + 1} (X) \to Ker \partial_{k}$ such that $κ \circ ψ = \partial_{k + 1}$ . $H_{k} (X)$ is itself a subspace of $Ker \partial_{k}$ , and so we have the inclusion $ξ : H_{k} (X) ↪ Ker \partial_{k}$ .

To see why the cokernel of $ψ$ is in fact the k-th homology space, we can consider the quotient definition of the cokernel (take the target space, mod out the image), and we get $Coker ψ = Ker \partial_{k} / Im ψ$ . But $Im ψ$ and $Im \partial_{k + 1}$ are the same thing, therefore $Coker ψ$ is really $H_{k} (X)$ by $12$ . Another way to see it is that the basis vectors of $H_{k} (X)$ are outside $Im ψ$ , so by complementarity they lie inside $Coker ψ$ .

A few practical observations concerning these new maps:

$κ$ is simply a transition matrix, if we think about it. Its columns are the basis vectors of $Ker \partial_{k}$ : applying such a matrix to any conformant vector will produce an arbitrary kernel element as a linear combination of basis vectors of the kernel.
Because $κ ψ = \partial_{k + 1}$ , we can solve for $ψ$ numerically thanks to the np.linalg.lstsq() function.
$ξ$ is another transition matrix whose column space is $Coker ψ$ .
The product $κ ξ$ is a transition matrix to the homology subspace, so its columns are the basis of $H_{k} (X)$ .

Because our previously defined kernel() and cokernel() functions cast the respective basis vectors in transition matrix form, they can be used to compute the matrices of $κ$ and $ξ$ directly, and the columns of the product $κ ξ$ will be the homology generators we’re looking for!

Implementation

Recall that from our previous boundary() function, we get a list of matrices, one for each boundary operator. We now define a homology() function that takes in this list of matrices, and outputs a list of homology spaces specified by their respective bases. The list of boundary operators shall be augmented by two additional zero maps of the appropriate size (I mentioned that numpy can handle zero-size matrices). We will iterate over all consecutive pairs of boundary operators in the augmented list, and each time compute the product $κ ξ$ previously defined:

def homology(boundary_ops, tol=1e-5):
    # Insert zero maps
    mm = boundary_ops[-1].shape[1]
    nn = boundary_ops[0].shape[0]
    boundary_ops.insert(0, np.ones(shape=(0, nn)))
    boundary_ops.append(np.ones(shape=(mm, 0)))

    H = []
    for del_k, del_kp1 in zip(boundary_ops, boundary_ops[1:]):
        kappa = kernel(del_k, tol)
        # Solve for psi
        psi, _, _, _ = np.linalg.lstsq(kappa, del_kp1, rcond=None)
        # Compute homology
        ksi = cokernel(psi, tol)
        H.append(np.dot(kappa, ksi))

    return H

To be a bit provocative here, in the end, computing a homology takes four lines of code.

Computing the Betti numbers

Once we have the bases of each $H_{k} (X)$ , we can easily compute their dimensions:

def betti(H):
    return [basis.shape[1] for basis in H]

These are known as the Betti numbers $b_{k} = \dim H_{k} (X)$ , and they count the numbers of k-holes in the simplicial complex.

Putting it all together

Let’s now write a proper main() function that will read a Json file containing various simplicial complexes accompanied by a short description, so we can experiment with this a bit more easily:

def main():
    with open('./complexes.json') as data_file:
        data = json.load(data_file)

    for attr, obj in data.items():
        print(f'[{attr}]')
        print(obj['description'])
        bnd = boundary(obj['complex'])
        H = homology(bnd)
        b = betti(H)
        print(f'Betti numbers: {b}')

And here’s how the example simplicial complex is encoded in the Json file:

{
    "C1": {
        "description": "Two triangles sharing an edge, one of them being hollow, the other filled",
        "complex": ["A", "B", "C", "D", "AB", "AC", "BC", "BD", "CD", "BCD"]
    }
}

Running the main() function should now output:

[C1]
Two triangles sharing an edge, one of them being hollow, the other filled
Betti numbers: [1, 1, 0]

That’s it! One connected component, a single 1-hole (the hollow triangle), and no 2-hole. And for the grand finale, we are going to compute the homology of a 2-torus. [2] mentions an efficient triangulation of a two dimensional torus, from which I could build a simplicial complex:

$\text{[math]}$

Here is the Json data:

{
    "T2": {
        "description": "A triangulated 2-torus",
        "complex": ["A","B","C","D","E","F","G",
                    "AB","AC","AD","AE","AF","AG",
                    "BC","BD","BE","BF","BG",
                    "CD","CE","CF","CG",
                    "DE","DF","DG",
                    "EF","EG",
                    "FG",
                    "ABD","ABF","ACD","ACG","AEF","AEG",
                    "BCE","BCG","BDE","BFG",
                    "CDF","CEF",
                    "DEG","DFG"]
    }
}

And our program now outputs:

[T2]
A triangulated 2-torus
Betti numbers: [1, 2, 1]

Which is precisely the Betti numbers for the 2-torus: one connected component, two circular 1-holes, and one 2-hole (the volume inside the torus). Imagine doing that by hand…

Conclusion

Simplicial complexes appear naturally when one forms the Vietoris-Rips complex of a point cloud in $R^{d}$ . Computational homology can be used in this context to better understand the topology of the point cloud, with applications in machine learning, data analysis, signal processing… In practice, the more robust persistent homology is calculated, and the underlying $C_{k} (X)$ are no longer vector spaces, but modules over a polynomial ring, which makes both the mathematics and the algorithms way more involved (see [4]). I hope however that the humble presentation I gave here can serve as a starting point, if you’re interested in studying cutting edge methods later on.

Acknowledgements

Many thanks to nervous_lemma for the proofreading help!

Sources

[1] R. Ghrist, “Elementary Applied Topology”, ed. 1.0, Createspace, 2014
[2] J. Michael Boardman, “The Torus Triangulated”
[3] Roman, Steven. Advanced Linear Algebra. 3rd ed, Springer, 2007, p.444
[4] Zomorodian, A., Carlsson, G. Computing Persistent Homology. Discrete Comput Geom 33, 249–274 (2005).
[5] M. E. Aktas et al. Persistence homology of networks: methods and applications. Applied Network Science (2019).
[6] Giusti, C., Ghrist, R. & Bassett, D.S. Two’s company, three (or more) is a simplex. J Comput Neurosci 41, 1–14 (2016).
[7] j2kun, Computing Homology

The full source code can be found here.

The comment section requires the Utterances cookie in order to work properly. If you want to see people's comments or post a comment yourself, please enable the Utterances cookie here.