Surface flatness and mathematical optimization

mathematics / engineering
The problem of estimating "how flat" a surface is comes up quite often in industry. A machined surface can be subjected to a series of local height measurements using a CNC height probe, for example, giving rise to a point cloud from which the flatness information can be extracted. We are going to see how the principal component analysis method arises naturally when the flatness estimation problem is formulated as an optimization problem.

The problem

Input

We start with a point cloud of size $N$ in 3D space; each height measurement corresponds to a point $\mathbf{x}_i$. The $z$ component of $\mathbf{x}_i$, denoted $x_{i2}$, is the actual height value, and the $x$ and $y$ components, denoted respectively $x_{i0}$ and $x_{i1}$, give the horizontal position on the surface. Let $X$ denote the collection of points $\mathbf{x}_i$. $X$ can be represented as a regular data matrix, with each point in the cloud written as a row:

$$X = \begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_{N-1} \end{bmatrix} = \begin{bmatrix} x_{00} & x_{01} & x_{02} \\ x_{10} & x_{11} & x_{12} \\ \vdots & \vdots & \vdots \\ x_{N-1,0} & x_{N-1,1} & x_{N-1,2} \end{bmatrix} \tag{1}$$

We suppose that $X$ is centered, meaning that the barycenter of the point cloud is at the origin. If that is not the case, we can always subtract the barycenter $\bar{\mathbf{x}}$ from each point of the cloud:

$$X \leftarrow X - \bar{\mathbf{x}} \tag{2}$$
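If the points are stored one per row in a NumPy array, as they will be in the implementation further down, this centering step is a one-liner. Here is a tiny sketch with hypothetical values:

import numpy as np

# Hypothetical 4-point cloud, one point per row
X = np.array([[0.0, 0.0, 1.2],
              [1.0, 0.0, 0.9],
              [0.0, 1.0, 1.1],
              [1.0, 1.0, 1.0]])

X = X - np.mean(X, axis=0)   # subtract the barycenter from every point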

An optimization problem

The RMS plane

A good estimation method for the surface flatness is to compute a best fit plane $P$ (aka the RMS plane) that optimally fits the point cloud $X$. Then, the maximum deviations above and below this plane are calculated, defining two additional upper and lower planes (named respectively $P_U$ and $P_L$), both parallel to the RMS plane, such that the whole point cloud is contained between them. The distance between the upper and lower planes is taken as the flatness measurement. This makes sense, right? The smaller the deviation from the best fit plane, the flatter the surface. So lower scores mean flatter.

Distance to an arbitrary plane

Let $P$ be an arbitrary plane with unit normal $\hat{\mathbf{n}} = [n_0\ n_1\ n_2]^\top$. Let $\mathbf{n} = [\hat{\mathbf{n}}^\top\ n_3]^\top$ denote the normal vector in homogeneous coordinates. The plane's equation is then:

$$P:\quad n_0 x + n_1 y + n_2 z + n_3 = 0 \tag{3}$$

Then, for each $\mathbf{x}_i \in X$ we can evaluate a signed distance $D_i$ to the plane $P$. Because the plane gets an orientation thanks to its normal, by convention, when $D_i > 0$ the point $\mathbf{x}_i$ is said to be above the plane, and when $D_i < 0$ the point is below. To evaluate $D_i$, it is handier to have the points in homogeneous coordinates too:

$$\mathbf{x}_i \mapsto \mathbf{y}_i = \begin{bmatrix} \mathbf{x}_i \\ 1 \end{bmatrix} \tag{4}$$

With this we can write the signed distance as:

$$D_i = \mathbf{n} \cdot \mathbf{y}_i = n_0 x_{i0} + n_1 x_{i1} + n_2 x_{i2} + n_3 \tag{5}$$

Note that when we set $D_i = 0$ we recover the plane's equation (3). It all makes sense: if a point has zero distance to the plane, then it is on the plane, and as such, must be a solution of the plane equation.
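To make this concrete, here is a small sketch (hypothetical numbers, with NumPy) that evaluates the signed distances of a whole cloud at once by appending the homogeneous coordinate:

import numpy as np

# Hypothetical plane z = 2: unit normal +z and offset n3 = -2
n_hat = np.array([0.0, 0.0, 1.0])
n = np.append(n_hat, -2.0)                    # [n0, n1, n2, n3]

X = np.array([[0.0, 0.0, 2.5],
              [1.0, 1.0, 1.5]])
Y = np.hstack([X, np.ones((X.shape[0], 1))])  # points in homogeneous coordinates
D = Y @ n                                     # signed distances, eq. (5)
# D = [0.5, -0.5]: the first point is above the plane, the second below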

Problem formulation

By hypothesis, our data is centered, implying that the best fit plane always contains the origin. This translates to $n_3$ being null, which means that the subspace of interest spans the first three indices only, and we can forget about homogeneous coordinates (which I only mentioned for the sake of completeness). If you can't make this assumption for whatever reason, you will need to fall back to a homogeneous representation. Then, the distance to the plane is simply:

$$D_i = \hat{\mathbf{n}} \cdot \mathbf{x}_i \tag{6}$$

Optimizing the plane is equivalent to optimizing its normal. The idea is to find the normal $\hat{\mathbf{n}}$ to the plane $P$ that minimizes the sum of the squared distances over all points. Why the square? Because it behaves nicely under differentiation, as opposed to an absolute value for instance. We have the additional constraint that the normal must remain a unit vector. So the problem reads:

$$\text{Solve: } \underset{\hat{\mathbf{n}}}{\operatorname{argmin}} \sum_{i=0}^{N-1} D_i^2 \qquad \text{subject to: } \|\hat{\mathbf{n}}\| = 1 \tag{7}$$

Solution

Solution using linear algebra

In order to take the normalization constraint into account, we are going to use a Lagrange multiplier. Now let’s write a Lagrangian for this problem:

$$\mathcal{L} = \sum_{i=0}^{N-1} (\hat{\mathbf{n}} \cdot \mathbf{x}_i)^2 - \lambda \left( n_0^2 + n_1^2 + n_2^2 - 1 \right) \tag{8}$$

Here, $\lambda$ is the Lagrange multiplier, and what follows it is the normalization constraint function, which is always equal to zero when the normal vector is normalized. The minus sign in front of $\lambda$ is useful later on, but is merely a convention. Solving (7) is then equivalent to solving:

$$\frac{\partial \mathcal{L}}{\partial n_j} = 0, \qquad \forall j \le 2 \tag{9}$$

Substituting (8) into this, noting that $\frac{\partial}{\partial n_j}(\hat{\mathbf{n}} \cdot \mathbf{x}_i)^2 = 2\,(\hat{\mathbf{n}} \cdot \mathbf{x}_i)\, x_{ij}$, and dividing both sides by 2, we obtain the following system of equations:

$$\sum_{i=0}^{N-1} (\hat{\mathbf{n}} \cdot \mathbf{x}_i)\, x_{ij} - \lambda n_j = 0, \qquad \forall j \le 2 \tag{10}$$

A covariance matrix $C$ is hidden in the summation. Without changing the solutions (it only rescales $\lambda$), we can multiply it by a $\frac{1}{N}$ normalization factor for better numerical stability:

$$C = \frac{1}{N} X^\top X = \frac{1}{N} \sum_{i=0}^{N-1} \begin{bmatrix} x_{i0}^2 & x_{i0}x_{i1} & x_{i0}x_{i2} \\ x_{i1}x_{i0} & x_{i1}^2 & x_{i1}x_{i2} \\ x_{i2}x_{i0} & x_{i2}x_{i1} & x_{i2}^2 \end{bmatrix} \tag{11}$$

Then, (10) becomes:

$$C\hat{\mathbf{n}} - \lambda\hat{\mathbf{n}} = 0 \iff (C - \lambda I)\,\hat{\mathbf{n}} = 0 \tag{12}$$

So this is in fact an eigenvalue problem! The solutions to (12) are the eigenvectors of the covariance matrix, associated with the eigenvalues $\lambda$. To formalize this, we are looking for an eigendecomposition of $C$, i.e. finding $W$ and $\Lambda$ such that:

$$C = W \Lambda W^\top \tag{13}$$

With $W$ a $3 \times 3$ matrix whose columns are the eigenvectors, and $\Lambda$ a $3 \times 3$ diagonal matrix whose entries are the eigenvalues. This can always be done: $C$ is symmetric, so it is diagonalizable, and its eigenvectors can be chosen orthonormal, which makes $W$ orthogonal.

Which eigenvector should we pick? Plugging a unit eigenvector back into the objective gives $\sum_i D_i^2 = N\,\hat{\mathbf{n}}^\top C\,\hat{\mathbf{n}} = N\lambda$, so the optimal solution for us is the eigenvector $\hat{\mathbf{n}}$ associated with the smallest eigenvalue $\lambda$ (remember the minus sign in front of $\lambda$? it is what makes $\lambda$ appear directly as the eigenvalue in (12)). All we need to do to find the normal of $P$ is therefore to compute $C$, take its eigendecomposition, and select the column of $W$ that has the same index as the lowest value in $\Lambda$.
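As a minimal sketch of this covariance route (assuming `X` is the centered N×3 data matrix from before), this is only a few lines of NumPy:

def best_fit_normal_eig(X):
    # Covariance matrix of the centered cloud, eq. (11)
    C = (X.T @ X) / X.shape[0]
    # eigh handles symmetric matrices and returns eigenvalues in
    # ascending order, so the first eigenvector is our normal
    eigvals, W = np.linalg.eigh(C)
    return W[:, 0]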

A better approach

Let’s take a step back and contemplate what we’re doing here. The algorithm that consists of computing the eigendecomposition of a covariance matrix $C$ associated with a (centered) data matrix $X$ is known as a Principal Component Analysis, or PCA for short. Its purpose is to find the directions of greatest variance of a dataset. These directions (known as the principal components) are the eigenvectors of $C$, and the bigger the variance along a principal component, the bigger the associated eigenvalue.

Because our dataset is made of points that are very close to the best fit plane $P$, the points will show the greatest variance along the axes of the local basis of $P$. Complementarily, they will show the lowest variance in the direction of the normal $\hat{\mathbf{n}}$ to $P$. This justifies intuitively the previous section's conclusion.

Now, for practical reasons, we don't usually form the covariance matrix explicitly: squaring the data in $X^\top X$ squares its condition number and amplifies numerical errors. It is more favorable from this standpoint to compute the compact Singular Value Decomposition (SVD) of $X$ directly, which can be done efficiently on a computer:

$$X = U \Sigma V^\top \tag{14}$$

with $U$ an $N \times 3$ semi-unitary matrix, $\Sigma$ a square diagonal $3 \times 3$ matrix containing the singular values $\sigma_j$, and $V$ a $3 \times 3$ unitary matrix (because our points have dimension 3, and assuming $N > 3$). Substituting (14) into (11) gives:

$$C = \frac{1}{N} X^\top X = \frac{1}{N} V \Sigma U^\top U \Sigma V^\top = \frac{V \Sigma^2 V^\top}{N} \tag{15}$$

because $U^\top U = I$ as $U$ is semi-unitary, and $\Sigma$ is square diagonal. Since $\Sigma = \operatorname{diag}\{\sigma_j\}$, comparison with (13) shows that the singular values are just proportional to the square roots of our eigenvalues:

$$\sigma_j = \sqrt{N \lambda_j}, \qquad \forall j \le 2 \tag{16}$$

Also, the right-singular vectors (the columns of $V$) are exactly the eigenvectors from the eigendecomposition of $C$. So the process stays the same: we find the right-singular vector in $V$ associated with the smallest singular value in $\Sigma$.
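A quick numerical sanity check of (16), assuming `X` is any centered N×3 NumPy array:

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                        # center the cloud

C = (X.T @ X) / X.shape[0]
eigvals = np.linalg.eigvalsh(C)               # ascending eigenvalues of C
sigmas = np.linalg.svd(X, compute_uv=False)   # descending singular values of X

# sigma_j^2 / N should match the eigenvalues (up to ordering)
print(np.allclose(np.sort(sigmas**2 / X.shape[0]), eigvals))  # True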

I had a computer vision teacher who liked to say, with the eyes of a crazy man, that an SVD is the proper way to rip open a matrix and see what's inside! For people in these fields it often comes as second nature to perform an SVD by reflex on whatever data matrix they encounter, and that would have been the right move here.

The algorithm

So it turns out the algorithm is quite simple:

  • Compute the SVD of the data matrix: $X = U\Sigma V^\top$
  • Find the index of the smallest singular value in $\Sigma$: $j_{\min} = \operatorname{argmin}_j \{\sigma_j\}$
  • Set the best fit plane normal $\hat{\mathbf{n}}$ to the corresponding right-singular vector: $\hat{\mathbf{n}} = V_{j_{\min}}$
  • Find the extremal signed distances to the best fit plane:
    • $d_{\min} = \min_i (\hat{\mathbf{n}} \cdot \mathbf{x}_i)$
    • $d_{\max} = \max_i (\hat{\mathbf{n}} \cdot \mathbf{x}_i)$
  • Obtain the flatness estimation: $F = d_{\max} - d_{\min}$
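Condensed into NumPy, the whole procedure fits in a handful of lines (a minimal sketch; a fuller example implementation follows in the next section):

def flatness(X):
    # X: centered N x 3 point cloud, one point per row
    _, sigma, Vh = np.linalg.svd(X, full_matrices=False)
    n_hat = Vh[np.argmin(sigma)]   # right-singular vector of the smallest sigma
    D = X @ n_hat                  # signed distances to the best fit plane
    return D.max() - D.min()       # flatness score F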

Implementation

Generating point clouds for a test

If you don't have a CNC height probe at home to play with, you can still generate some points procedurally. To obtain a point cloud that is randomly distributed around a plane $P$, knowing the normal $\hat{\mathbf{n}}$ to $P$ is good, but insufficient. We must also choose a local basis $B = \{\hat{\mathbf{u}}, \hat{\mathbf{v}}\}$ of $P$ such that $\hat{\mathbf{u}} \perp \hat{\mathbf{v}} \perp \hat{\mathbf{n}}$. Because of this orthogonality constraint we have:

$$\hat{\mathbf{u}} \cdot \hat{\mathbf{n}} = u_0 n_0 + u_1 n_1 + u_2 n_2 = 0 \tag{17}$$

By choosing $u_0 = 1$ and $u_2 = 0$ (which assumes $n_1 \neq 0$), it follows after normalization that:

$$\hat{\mathbf{u}} = \frac{1}{\sqrt{1 + (n_0 / n_1)^2}} \begin{bmatrix} 1 \\ -n_0 / n_1 \\ 0 \end{bmatrix} \tag{18}$$

And $\hat{\mathbf{v}}$ is obtained by cross product: $\hat{\mathbf{v}} = \hat{\mathbf{n}} \times \hat{\mathbf{u}}$.

Let $(s, t)$ be the coordinates with respect to the basis $B$. Then the points of $P$ are generated by:

$$\mathbf{r}(s, t) = \mathbf{r}_0 + s\,\hat{\mathbf{u}} + t\,\hat{\mathbf{v}}, \quad \text{with } \mathbf{r}_0 \text{ an arbitrary translation} \tag{19}$$

Now, let $S$ and $T$ be uniform random variables, and $B$ a random variable with an arbitrary distribution. Then the following discrete stochastic process will model our point cloud around the plane:

$$\mathbf{x}_i = \mathbf{r}_0 + S_i\,\hat{\mathbf{u}} + T_i\,\hat{\mathbf{v}} + B_i\,\hat{\mathbf{n}} \tag{20}$$

The distribution of $B$ controls the flatness, as $B$ generates the random coefficients in front of the normal vector, which encode by how much the point $\mathbf{x}_i$ deviates from the plane. It is also possible to add a non-linear term to the coefficient of $\hat{\mathbf{n}}$ to model slight surface deformations. For example:

$$\mathbf{x}_i = \mathbf{r}_0 + S_i\,\hat{\mathbf{u}} + T_i\,\hat{\mathbf{v}} + (B_i + \alpha S_i T_i)\,\hat{\mathbf{n}} \tag{21}$$

will exhibit a hyperbolic warp. This may come in handy when assessing the algorithm's robustness.
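For instance, a hypothetical variant of the `generate_data` helper defined in the next section could accept a warp coefficient `alpha` (my name for $\alpha$) and add the $\alpha S_i T_i$ term of (21):

def generate_warped_data(num_points, noise_amp, alpha, basis):
    u, v, n = basis
    s = np.random.uniform(-10, 10, num_points)
    t = np.random.uniform(-10, 10, num_points)
    b = noise_amp * np.random.uniform(-1, 1, num_points)
    # Same construction as eq. (20), plus the alpha*s*t warp term of eq. (21)
    w = b + alpha * s * t
    X = s[:, np.newaxis]*u + t[:, np.newaxis]*v + w[:, np.newaxis]*n
    return X - np.mean(X, axis=0)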

Example Python implementation

First, let’s import Numpy and Matplotlib:

import matplotlib.pyplot as plt
import numpy as np

Now, we add a function to generate a random orthogonal basis. The first two vectors returned are in the plane, and the last one is the plane’s normal:

def random_basis():
    # Unit plane normal; the raw components lie in [0, 1), so n0[1] is
    # non-zero almost surely, as required by eq. (18)
    n0 = np.random.rand(3)
    n0 = n0 / np.sqrt(np.sum(n0**2))
    # Plane basis, following eq. (18) and the cross product
    u0 = np.array([1, -n0[0]/n0[1], 0], dtype='float')
    u0 = u0 / np.sqrt(np.sum(u0**2))
    v0 = np.cross(n0, u0)

    return (u0, v0, n0)

Given a basis, we can generate our point cloud. Let's choose the $s$ and $t$ coordinates uniformly in $[-10, 10]$. The deviation along the normal axis is scaled by a scalar factor, and for now, a uniform distribution is used. Then we need to center the data by subtracting the cloud barycenter from all points:

def generate_data(num_points: int, noise_amp: float, basis):
    u, v, n = basis
    # Random coordinate in the plane
    s = np.random.uniform(-10, 10, num_points)
    t = np.random.uniform(-10, 10, num_points)
    # Random deviation along the normal axis
    b = noise_amp * np.random.uniform(-1, 1, num_points)
    # Generate points
    X = s[:, np.newaxis]*u + t[:, np.newaxis]*v + b[:, np.newaxis]*n
    # Center data
    m = np.mean(X, axis=0)
    X = X - m

    return X

Now, we write the function that takes the point cloud as input and spits out the best fit normal vector $\hat{\mathbf{n}}$:

def best_fit_normal(X):
    # Perform the compact SVD (full_matrices=False, so U is N x 3)
    U, Sigma, Vh = np.linalg.svd(X, full_matrices=False)
    # Find index of smallest singular value
    jmin = np.argmin(Sigma)

    # Return the right-singular vector at that index
    # (the rows of Vh are the columns of V)
    return Vh[jmin]

We also need a function to compute the extremal distances to the best fit plane:

def max_dist(X, normal):
    # Dot product of every point in X with the normal
    D = np.sum(normal*X, axis=1)

    return (np.min(D), np.max(D))

And this helper function will allow us to plot the planes more easily:

def plot_plane(ax, origin, normal, color, alpha):
    x = np.linspace(-13, 13, 2)
    y = np.linspace(-13, 13, 2)
    px, py = np.meshgrid(x, y)
    # Plane z as a function of x, y; assumes the plane is not vertical
    # (normal[2] != 0), which holds for our randomly generated normals
    pz = - (px * normal[0] + py * normal[1]) / normal[2]
    ax.plot_surface(px + origin[0], py + origin[1],
                    pz + origin[2], color=color, alpha=alpha)

Let’s put it all together:

def main():
    basis = random_basis()
    X = generate_data(100, 2, basis)
    n_opt = best_fit_normal(X)
    # The SVD leaves the sign of n_opt arbitrary; flip it so it points
    # the same way as the generating normal before comparing the two
    if np.dot(n_opt, basis[2]) < 0:
        n_opt = -n_opt
    dmin, dmax = max_dist(X, n_opt)
    F = dmax - dmin
    n_rms = np.linalg.norm(n_opt - basis[2])

    print(f'flatness score: {F}')
    print(f'norm RMS error: {n_rms}')

    # Plot the points, best fit plane and upper and lower planes
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    plot_plane(ax, [0, 0, 0], n_opt, "red", 0.5)
    plot_plane(ax, dmin*n_opt, n_opt, "orange", 0.5)
    plot_plane(ax, dmax*n_opt, n_opt, "orange", 0.5)
    ax.scatter(X[:, 0], X[:, 1], X[:, 2])
    plt.show()


if __name__ == '__main__':
    main()

Example output:

flatness score: 4.709666038694197
norm RMS error: 0.05532841814392836

The norm RMS error is the Euclidean distance between the normal vector $\hat{\mathbf{n}}_0$ that was used to generate the point cloud and the normal vector $\hat{\mathbf{n}}$ that was optimized by the algorithm. The two should match pretty closely as long as the noise amplitude isn't too high.

The point cloud and the planes. The best fit plane is displayed in red, and the upper and lower planes in orange.

The same point cloud from another viewpoint.

Here I’m deliberately choosing a big noise amplitude so the different planes can be seen, but in a real world scenario, your points should be much closer to P.

Limitations

This implementation is just a starting point. In a real-world scenario you will have additional problems to solve. For example, there will be outliers: points in the cloud that are way out in the distance because your height probe bumped into a bread crumb, or for any other more realistic reason. Maybe you could run an outlier detection pre-pass on $X$ using a k-NN score, or an LOF? Perhaps you want to characterize the surface deformation instead, in which case fitting a plane is no longer ideal…
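As a very rough sketch of one possible pre-pass (a simple residual-based filter rather than a proper k-NN or LOF score; the threshold of 3 median absolute deviations is an arbitrary choice):

def remove_outliers(X, k=3.0):
    # Fit once and look at the signed distances to the plane...
    n = best_fit_normal(X)
    D = X @ n
    # ...then drop points whose residual deviates by more than k
    # median absolute deviations from the median residual
    med = np.median(D)
    mad = np.median(np.abs(D - med))
    keep = np.abs(D - med) <= k * mad
    # Re-center the filtered cloud before re-fitting
    X = X[keep]
    return X - np.mean(X, axis=0)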

Anyway, I hope you enjoyed this trip in the land of Linear Algebra!



