The problem of estimating "how flat" a surface is comes up quite often in industry. A machined surface can be subjected to a series of local height measurements, using a CNC height probe for example, giving rise to a point cloud from which the flatness information can be extracted. We are going to see how the principal component analysis method arises naturally when the flatness estimation problem is formulated as an optimization problem.
The problem
Input
We start with a point cloud of size $N$ in 3D space; each height measurement corresponds to a point $p_i \in \mathbb{R}^3$. The $z$ component of $p_i$, denoted by $z_i$, is the actual height value, and the $x$ and $y$ components, denoted respectively by $x_i$ and $y_i$, are the horizontal position on the surface. Let $X$ denote the collection of points $\{p_i\}_{i=1}^{N}$. $X$ can be represented as a regular $N \times 3$ data matrix, with each point in the cloud written as a row:

$$X = \begin{pmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ \vdots & \vdots & \vdots \\ x_N & y_N & z_N \end{pmatrix}$$
We suppose that $X$ is centered, meaning that the barycenter of the point cloud is at the origin. If that is not the case, we can always subtract the barycenter from each point of the cloud:

$$p_i \leftarrow p_i - \frac{1}{N} \sum_{j=1}^{N} p_j$$
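In NumPy terms, centering is a single broadcasted subtraction; here is a minimal sketch (the placeholder array stands in for the data matrix defined above):

import numpy as np

X = np.random.rand(100, 3)                 # placeholder (N, 3) point cloud
barycenter = np.mean(X, axis=0)            # mean of each column
X_centered = X - barycenter                # subtract it from every row
# The centered cloud's barycenter is now (numerically) zero
assert np.allclose(np.mean(X_centered, axis=0), 0.0)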
An optimization problem
The RMS plane
A good estimation method for the surface flatness is to compute a best fit plane (aka RMS plane) $P$ that optimally fits the point cloud $X$. Then, the maximum deviation is calculated above and below this plane, creating two additional upper and lower planes (named respectively $P_u$ and $P_l$), both parallel to the RMS plane. The whole point cloud is comprised between them. The distance between the upper and lower planes is taken as the flatness measurement. This makes sense, right? The smaller the deviation from the best fit plane, the flatter the surface. So lower scores mean flatter.
Distance to an arbitrary plane
Let $P$ be an arbitrary plane with unit normal $\mathbf{n} = (n_x, n_y, n_z)^T$. Let $\tilde{\mathbf{n}} = (n_x, n_y, n_z, d)^T$ denote the normal vector in homogeneous coordinates. The plane's equation is then:

$$n_x x + n_y y + n_z z + d = 0$$
Then, for each $p_i$ we can evaluate a signed distance $d_i$ to the plane $P$. Because the plane gets an orientation thanks to its normal, by convention, when $d_i > 0$ the point is said to be above the plane, and when $d_i < 0$ the point is below. To evaluate $d_i$, it is handier to have the points in homogeneous coordinates too:

$$\tilde{p}_i = (x_i, y_i, z_i, 1)^T$$
With this we can write the signed distance as:

$$d_i = \tilde{\mathbf{n}} \cdot \tilde{p}_i = n_x x_i + n_y y_i + n_z z_i + d$$
Note that when we set $d_i = 0$ we recover the plane's equation $\tilde{\mathbf{n}} \cdot \tilde{p} = 0$. It all makes sense: if a point has zero distance to the plane, then it is on the plane, and as such, must be a solution of the plane equation.
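As a small aside, the homogeneous form makes the signed distance a plain 4-component dot product. A minimal sketch (the plane and point values are made up for illustration):

import numpy as np

# Hypothetical plane z = 2: unit normal (0, 0, 1) and offset d = -2
n = np.array([0.0, 0.0, 1.0])
d = -2.0
n_tilde = np.append(n, d)              # (n_x, n_y, n_z, d)

p = np.array([1.0, 5.0, 3.0])          # a point sitting 1 unit above the plane
p_tilde = np.append(p, 1.0)            # (x, y, z, 1)

signed_distance = np.dot(n_tilde, p_tilde)   # gives 1.0: the point is above the plane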
Problem formulation
By hypothesis, our data is centered, implying that the best fit plane always contains the origin. This translates to $d$ being null, which means that the subspace of interest spans the first three coordinates only, and we can forget about homogeneous coordinates (which I only mentioned for the sake of completeness). If you can't make this assumption for whatever reason, you will need to fall back to a homogeneous representation. Then, the distance to the plane is simply:

$$d_i = \mathbf{n} \cdot p_i$$
Optimizing the plane is equivalent to optimizing its normal. The idea is to find the normal $\mathbf{n}$ to the plane that minimizes the sum of the squared distances over all points. Why the square? Because it behaves nicely under differentiation, as opposed to an absolute value for instance. We have the additional constraint that the normal must remain a unit vector. So the problem writes:

$$\min_{\mathbf{n}} \sum_{i=1}^{N} (\mathbf{n} \cdot p_i)^2 \quad \text{subject to} \quad \|\mathbf{n}\| = 1$$
Solution
Solution using linear algebra
In order to take the normalization constraint into account, we are going to use a Lagrange multiplier. Now let's write a Lagrangian for this problem:

$$\mathcal{L}(\mathbf{n}, \lambda) = \sum_{i=1}^{N} (\mathbf{n} \cdot p_i)^2 - \lambda \left(\mathbf{n}^T \mathbf{n} - 1\right)$$
Here, $\lambda$ is the Lagrange multiplier, and what comes after it is the normalization constraint function, which is always equal to zero when the normal vector is normalized. The minus sign in front of $\lambda$ is useful later on, but is merely a convention. Solving the problem is then equivalent to solving:

$$\nabla_{\mathbf{n}} \mathcal{L} = 0$$
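Spelling out the gradient of each term (a routine expansion, written out for completeness):

$$\nabla_{\mathbf{n}} \sum_{i=1}^{N} (\mathbf{n} \cdot p_i)^2 = 2 \sum_{i=1}^{N} (\mathbf{n} \cdot p_i)\, p_i \qquad \text{and} \qquad \nabla_{\mathbf{n}} \left[ \lambda \left( \mathbf{n}^T \mathbf{n} - 1 \right) \right] = 2 \lambda\, \mathbf{n}$$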
Substituting these into $\nabla_{\mathbf{n}} \mathcal{L} = 0$ and dividing both sides by 2, we have the following system of equations:

$$\sum_{i=1}^{N} (\mathbf{n} \cdot p_i)\, p_i = \lambda \mathbf{n} \quad \Longleftrightarrow \quad X^T X\, \mathbf{n} = \lambda \mathbf{n}$$
A covariance matrix is hidden in the summation. Without loss of generality, we can multiply it by a normalization factor for better stability:

$$C = \frac{1}{N} X^T X$$
Then, the system of equations becomes:

$$C \mathbf{n} = \lambda \mathbf{n}$$

where the normalization factor has been absorbed into $\lambda$.
So this is in fact an eigenvalue problem! The solutions to $C \mathbf{n} = \lambda \mathbf{n}$ are the eigenvectors of the covariance matrix, associated with the eigenvalues $\lambda$. To formalize this, we are looking for an eigendecomposition of $C$, i.e. finding $Q$ and $\Lambda$ such that:

$$C = Q \Lambda Q^{-1}$$
with $Q$ a matrix whose columns are the eigenvectors, and $\Lambda$ a diagonal matrix whose values are the eigenvalues. This can always be done, because $C$ is symmetric, so it is diagonalizable.
In our context, the critical point we're looking for must minimize the sum of squared distances (the Lagrangian's critical point is a saddle point). For a unit eigenvector $\mathbf{n}$, the objective evaluates to $\mathbf{n}^T X^T X \mathbf{n} = N\lambda$, so the optimal solution for us is the eigenvector corresponding to the smallest eigenvalue (remember the negative sign in front of $\lambda$?). So all we need to do to find the normal of $P$ is to compute $C$, its eigendecomposition, and select the column of $Q$ that has the same index as the lowest value in $\Lambda$.
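To make this concrete, here is a minimal NumPy sketch of the covariance-plus-eigendecomposition route described above (it assumes X is the centered N x 3 data matrix; np.linalg.eigh is used because C is symmetric):

import numpy as np

def best_fit_normal_eig(X):
    # Covariance matrix of the centered data (3x3, symmetric)
    C = (X.T @ X) / X.shape[0]
    # eigh returns the eigenvalues of a symmetric matrix in ascending order
    eigvals, eigvecs = np.linalg.eigh(C)
    # The plane normal is the eigenvector of the smallest eigenvalue
    return eigvecs[:, 0]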
A better approach
Let's take a step back and contemplate what we're doing here. The algorithm that consists of computing the eigendecomposition of the covariance matrix associated with a (centered) data matrix is known as Principal Component Analysis, or PCA for short. Its purpose is to find the directions of greatest variance of a dataset. These directions (known as the principal components) are the eigenvectors of $C$, and the greater the variance along a principal component, the greater the associated eigenvalue.
Because our dataset consists of points that are very close to the best fit plane $P$, the points will show the greatest variance along the two axes of the local basis of $P$. Complementarily, they will show the lowest variance in the direction of the normal to $P$. This justifies intuitively the previous section's final paragraph.
Now, for practical reasons, we don't usually work with the covariance matrix directly, because forming $X^T X$ squares the condition number and introduces numerical errors. It is more favorable from this standpoint to compute the compact Singular Value Decomposition (SVD) of $X$ directly, which can be done efficiently on a computer:

$$X = U \Sigma V^T$$
with $U$ an $N \times 3$ semi-unitary matrix, $\Sigma$ a $3 \times 3$ square diagonal matrix containing the singular values $\sigma_j$, and $V$ a $3 \times 3$ unitary matrix (because our points have dimension 3, and assuming $N \geq 3$). Substituting into $C = \frac{1}{N} X^T X$ gives:

$$C = \frac{1}{N} V \Sigma U^T U \Sigma V^T = \frac{1}{N} V \Sigma^2 V^T$$
because $U^T U = I$ as $U$ is semi-unitary, and $\Sigma^T = \Sigma$ as it is square diagonal. Since $V^{-1} = V^T$, comparison with $C = Q \Lambda Q^{-1}$ shows that the singular values are just proportional to the square roots of our eigenvalues:

$$\sigma_j = \sqrt{N \lambda_j}$$
Also, the right-singular vectors of $X$ (the columns of $V$) are exactly the eigenvectors from the eigendecomposition of $C$. So the process stays the same: we find the right-singular vector in $V$ associated with the smallest singular value in $\Sigma$.
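If you want to convince yourself of this correspondence numerically, a throwaway check like the following does the job (the factor $N$ comes from the $1/N$ normalization of $C$):

import numpy as np

N = 500
X = np.random.randn(N, 3)
X = X - np.mean(X, axis=0)                   # center the cloud

C = (X.T @ X) / N
eigvals = np.linalg.eigvalsh(C)              # eigenvalues, ascending order

sigma = np.linalg.svd(X, compute_uv=False)   # singular values, descending order

# sigma_j = sqrt(N * lambda_j) once both are sorted the same way
assert np.allclose(np.sort(sigma), np.sqrt(N * eigvals))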
I had a computer vision teacher who liked to say, with the eyes of a crazy man, that an SVD is the proper way to rip open a matrix and see what's inside! It often comes as second nature for people in these fields to perform an SVD by reflex on whatever data matrix they encounter, and that would have been the right move here.
The algorithm
So it turns out the algorithm is quite simple:
1. Compute the SVD of the data matrix: $X = U \Sigma V^T$
2. Find the index of the smallest singular value in $\Sigma$: $j_{\min} = \operatorname{arg\,min}_j \sigma_j$
3. Set the best fit plane normal to the corresponding right-singular vector: $\mathbf{n} = V_{:,\, j_{\min}}$
4. Find the extremal distances to the best fit plane: $d_{\min} = \min_i\, \mathbf{n} \cdot p_i$, $\quad d_{\max} = \max_i\, \mathbf{n} \cdot p_i$
5. Obtain the flatness estimation: $F = d_{\max} - d_{\min}$
Implementation
Generating point clouds for a test
If you don't have a CNC height probe at home to play with, you can still generate some points procedurally. To obtain a point cloud that is randomly distributed around a plane $P$, knowing the normal $\mathbf{n}$ to $P$ is good, but insufficient. We must choose a local basis $(\mathbf{u}, \mathbf{v})$ of $P$ such that $\mathbf{u} \perp \mathbf{v} \perp \mathbf{n}$. Because of this orthogonality constraint we have:

$$\mathbf{u} \cdot \mathbf{n} = u_x n_x + u_y n_y + u_z n_z = 0$$
By choosing $u_x = 1$ and $u_z = 0$, it follows that:

$$\mathbf{u} = \left(1,\ -\frac{n_x}{n_y},\ 0\right)^T$$

which we then normalize to unit length.
And $\mathbf{v}$ is obtained by cross-product: $\mathbf{v} = \mathbf{n} \times \mathbf{u}$.
Let $(s, t)$ be the coordinates with respect to the basis $(\mathbf{u}, \mathbf{v})$. Then the points of $P$ are generated by:

$$p(s, t) = s\,\mathbf{u} + t\,\mathbf{v}$$
Now, let $S$ and $T$ be uniform random variables, and $B$ a random variable of arbitrary distribution. Then the following discrete stochastic process will model our point cloud around the plane:

$$p_i = s_i\,\mathbf{u} + t_i\,\mathbf{v} + b_i\,\mathbf{n}, \quad s_i \sim S,\ t_i \sim T,\ b_i \sim B$$
The distribution of $B$ controls the flatness, as it generates the random coefficients in front of the normal vector, which encode by how much each point deviates from the plane. It is also possible to consider an additional non-linear term in the coefficient of $\mathbf{n}$ to model slight surface deformations. For example:

$$p_i = s_i\,\mathbf{u} + t_i\,\mathbf{v} + (b_i + \varepsilon\, s_i t_i)\,\mathbf{n}$$

will exhibit a hyperbolic warp. This may come in handy when assessing the algorithm's robustness.
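For instance, a variant of the generate_data function shown in the implementation below could add such a term; the cross-term and its amplitude here are my own choice of a saddle-shaped (hyperbolic) deformation:

def generate_warped_data(num_points, noise_amp, warp_amp, basis):
    u, v, n = basis
    # In-plane coordinates and flatness noise, as before
    s = np.random.uniform(-10, 10, num_points)
    t = np.random.uniform(-10, 10, num_points)
    b = noise_amp * np.random.uniform(-1, 1, num_points)
    # Non-linear cross-term along the normal: a saddle-shaped warp
    w = warp_amp * s * t
    X = s[:, np.newaxis] * u + t[:, np.newaxis] * v + (b + w)[:, np.newaxis] * n
    # Center the data
    return X - np.mean(X, axis=0)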
Example Python implementation
First, let’s import Numpy and Matplotlib:
import matplotlib.pyplot as plt
import numpy as np
Now, we add a function to generate a random orthogonal basis. The first two vectors returned are in the plane, and the last one is the plane’s normal:
def random_basis():
    # Unit plane normal
    n0 = np.random.rand(3)
    n0 = n0 / np.sqrt(np.sum(n0**2))
    # Plane basis
    u0 = np.array([1, -n0[0] / n0[1], 0], dtype='float')
    u0 = u0 / np.sqrt(np.sum(u0**2))
    v0 = np.cross(n0, u0)
    return (u0, v0, n0)
Given a basis, we can generate our point cloud. Let's choose the $s$ and $t$ coordinates uniformly in $[-10, 10]$. The deviation along the normal axis is scaled by a scalar factor, and for now, a uniform distribution is used. Then we need to center the data by subtracting the cloud barycenter from all points:
def generate_data(num_points: int, noise_amp: float, basis):
    u, v, n = basis
    # Random coordinates in the plane
    s = np.random.uniform(-10, 10, num_points)
    t = np.random.uniform(-10, 10, num_points)
    # Random deviation along the normal axis
    b = noise_amp * np.random.uniform(-1, 1, num_points)
    # Generate points
    X = s[:, np.newaxis] * u + t[:, np.newaxis] * v + b[:, np.newaxis] * n
    # Center data
    m = np.mean(X, axis=0)
    X = X - m
    return X
Now, we write the function that takes the point cloud $X$ as input, and spits out the best fit normal vector $\mathbf{n}$:
def best_fit_normal(X):
    # Perform SVD
    U, Sigma, Vh = np.linalg.svd(X, full_matrices=True)
    # Find index of smallest singular value
    jmin = np.argmin(Sigma)
    # Return the right-singular vector at that index
    return Vh[jmin]
We also need a function to compute the extremal distances to the best fit plane:
def max_dist(X, normal):
    # Dot product of every point in X with the normal
    D = np.sum(normal * X, axis=1)
    return (np.min(D), np.max(D))
And this helper function will allow us to plot the planes more easily:
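Its original body isn't reproduced here, so below is a minimal stand-in with the same signature as the calls in the main function further down (an axes object, a point on the plane, the plane normal, a color and an alpha). It builds a small quad from an in-plane basis derived from the normal, the same trick used in random_basis (and it therefore also assumes the normal's y component is non-zero):

def plot_plane(ax, point, normal, color, alpha):
    point = np.asarray(point, dtype='float')
    normal = np.asarray(normal, dtype='float')
    # In-plane basis (u, v), same construction as in random_basis
    u = np.array([1.0, -normal[0] / normal[1], 0.0])
    u = u / np.sqrt(np.sum(u**2))
    v = np.cross(normal, u)
    # Lift a small grid of in-plane coordinates to 3D and draw it as a surface
    s, t = np.meshgrid(np.linspace(-12, 12, 2), np.linspace(-12, 12, 2))
    pts = point + s[..., np.newaxis] * u + t[..., np.newaxis] * v
    ax.plot_surface(pts[..., 0], pts[..., 1], pts[..., 2], color=color, alpha=alpha)

With this stand-in (or the original helper), a main function ties everything together: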
def main():
    basis = random_basis()
    X = generate_data(100, 2, basis)
    n_opt = best_fit_normal(X)
    # The sign of the SVD normal is arbitrary: flip it if it points away from
    # the normal used to generate the data, so the two can be compared
    if np.dot(n_opt, basis[2]) < 0:
        n_opt = -n_opt
    dmin, dmax = max_dist(X, n_opt)
    F = dmax - dmin
    n_rms = np.linalg.norm(n_opt - basis[2])
    print(f'flatness score: {F}')
    print(f'norm RMS error: {n_rms}')
    # Plot the points, best fit plane and upper and lower planes
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')
    plot_plane(ax, [0, 0, 0], n_opt, "red", 0.5)
    plot_plane(ax, dmin * n_opt, n_opt, "orange", 0.5)
    plot_plane(ax, dmax * n_opt, n_opt, "orange", 0.5)
    ax.scatter(X[:, 0], X[:, 1], X[:, 2])
    plt.show()

if __name__ == '__main__':
    main()
The norm RMS error is the Euclidean distance between the normal vector that was used to generate the point cloud, and the normal vector that was optimized by the algorithm. They should match pretty closely as long as the noise amplitude isn’t too high.
Here I'm deliberately choosing a big noise amplitude so the different planes can be seen, but in a real world scenario, your points should be much closer to $P$.
Limitations
This implementation is just a starting point. In a real world scenario you will have additional problems to solve. For example, there will be outliers, points in the cloud that are way out in the distance, because your height probe bumped into a bread crumb or for any other more realistic reason. Maybe you could use an outlier detection pre-pass on $X$ using a k-NN score, or a Local Outlier Factor (LOF)? Perhaps you want to characterize the surface deformation, in which case fitting a plane is no longer ideal…
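As an illustration of such a pre-pass, here is a rough sketch using scikit-learn's Local Outlier Factor; the function name and the n_neighbors value are my own choices and would need tuning for a real probe density:

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def remove_outliers(X, n_neighbors=20):
    # LOF labels inliers with +1 and outliers with -1
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    labels = lof.fit_predict(X)
    X_clean = X[labels == 1]
    # Re-center, since removing points moves the barycenter
    return X_clean - np.mean(X_clean, axis=0)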
Anyway, I hope you enjoyed this trip in the land of Linear Algebra!