Shape modelling with Gaussian processes and kernels

The goal in this tutorial is to learn how to build a continuous Gaussian process from generic kernels and to experiment with different types of kernels and their effects.

In case you would like to learn more about the syntax of the Scala programming language, have a look at the Scala Cheat Sheet. This document is accessible under:

Documents -> Scala Cheat Sheet

Note: for optimal performance and to keep the memory footprint low, we recommend to restart Scalismo Lab at the beginning of every tutorial.

Modelling deformations using Gaussian processes:

As we previously saw, in shape modelling we are interested in studying variations of deformation fields defined over both continuous and discrete domains. Gaussian processes being mathematical tools allowing to study normal distributions over both discrete and continuous functions, they are the perfect tool to model shape deformations.

Creating a continuous Gaussian Process:

Similar to the Multivariate Normal Distribution (MVN), a continuous Gaussian Process is entirely defined by 2 components, the mean and the covariance. The only difference being that these 2 components are now defined over a continuous domain since we wish to model function variations at any set of points we choose (and not just at the points at which we collected the samples, as is the case for a regular MVN).

In order to experiment on GPs, let's load a mesh on which to operate:

val reference = MeshIO.readMesh(new File("datasets/lowResPaola.stl")).get
The mean:

All the deformations sampled from the Gaussian process will be normally distributed around this mean, which itself is a deformation field.

Thinking in terms of shapes, the mean function is the deformation field that deforms our reference mesh into the mean shape.

Without any prior knowledge about our shape space, we can for example define a zero mean function returning, for every point, a null deformation vector.

val zeroMean = VectorField(RealSpace[_3D], (pt:Point[_3D]) => Vector(0,0,0))
Question: what would the mean shape be in this case?
Answer: The mean shape being the reference shape deformed with the mean deformation field, it would in this case be the reference shape.
The covariance function:

Given that we wish to evaluate covariances at any sample of points we choose, the notion of covariance is no longer given by a discrete matrix but by a covariance function also referred to as kernel.

Depending on the points on which we wish to evaluate our sampled functions, such a kernel function guarantees that any matrix, taken by evaluating pairwise kernel values, is Positive Semi-Definite and is therefore a valid covariance matrix to be used in a Multivariate Normal Distribution.

Intuition for the kernel function:

An intuitive way of understanding kernel functions, in the context of Gaussian processes, is to regard them as functions that determine the similarity between function values at 2 points of the domain, based on the similarity between the 2 points.

In the context of shape modelling, the kernel function therefore specifies the similarity between the deformation vectors defined at 2 points x and x' of our reference shape, based on how similar the kernel judges x and x' to be.

Many kernel functions are possible when modelling with Gaussian processes. Below, we will experiment with one of these: the Squared Exponential Kernel, also known as the Gaussian Kernel.

Gaussian Kernel or Squared Exponential Kernel:

This is perhaps the most common and intuitive kernel.

According to this formula, the similarity between the function values should be at highest when the associated points are near and decreases exponentially the further they are.

In the formula above, where both domain and codomain values are scalar, two parameters allow us to tune the kernel. An intuitive interpretation for these is:

1- l allows to scale distances in the domain of the input points. This allows to tune how far inter-dependencies should span between the points of our shapes.

2- s is a scaling factor that emphasizes the returned similarity. This generally affects the variance of the output values (codomain) and results in more pronounced deformations when increased.

We will shortly see the effect of these parameters further below.

Exercise: let's extend our kernel to multivariate inputs. Implement such a Gaussian kernel function that, when given two 3-dimensional points, returns a scalar similarity value. You can do this by replacing the squared point differences above with a squared Euclidean distance between the points. Hint: use the math.exp method for the exponential and norm2 method of Scalismo vectors to compute the squared norm.
val s :Double = 10.0
val l :Double = 10.0 
def myGaussian(p1: Point[_3D], p2: Point[_3D]) : Double = ???     
def myGaussian(p1: Point[_3D], p2: Point[_3D]) : Double = s * scala.math.exp((-1f / (l*l)) * (p2-p1).norm2.toFloat)
// myGaussian: (p1: scalismo.geometry.Point[scalismo.geometry._3D], p2: scalismo.geometry.Point[scalismo.geometry._3D])Double

This Gaussian kernel is in fact already implemented in Scalismo:

val gaussKer : PDKernel[_3D] = GaussianKernel[_3D](l) * s 

The Scalismo PDKernel class is the class of Positive Definite kernels that, when given two D-dimensional points, return a scalar similarity value.

Similarly to other functions in Scalismo, a PDKernel is generally defined over a domain and implements a few basic algebraic operations (as the scalar multiplication used above):

println("kernel domain : " + gaussKer.domain) // defined everywhere
val scaledGauss = gaussKer * 3.0 // scaling
val linearCombination = scaledGauss + gaussKer * 1.5 // sum 
val kernelMult = scaledGauss * gaussKer // multiplication with other kernel

All of these operations return a valid positive definite kernel, or a PDKernel in Scalismo as you can see in the result pane.

Exercise: compare your returned values for the Gaussian kernel to those of our implementation. In case you got stuck, check out our implementation: StandardKernels.scala
myGaussian(Point(0,0,0), Point(1,1,1)) - gaussKer(Point(0,0,0), Point(1,1,1))
// res3: Double = 0.0
Exercise: click 2 (not too far) landmark points on the reference mesh and evaluate their Gaussian kernel. Now increase the l parameter and try again. What does this tell you about the l parameter?
addLandmarksTo(Seq(Landmark("A",reference.point(PointId(113)))), "reference")

addLandmarksTo(Seq(Landmark("B",reference.point(PointId(121)))), "reference")

val pts = getLandmarksOf("reference") => l.point)
// pts: Seq[scalismo.geometry.Point[scalismo.geometry._3D]] = Vector(Point3D(-32.21122,-5.6394296,94.52778), Point3D(-32.07104,-15.139886,93.13328))

gaussKer(pts(0), pts(1))
// res6: Double = 3.976315478413732

val gaussKer2 : PDKernel[_3D] = GaussianKernel[_3D](l* 1.5) * s 
// gaussKer2: scalismo.kernels.PDKernel[scalismo.geometry._3D] = scalismo.kernels.PDKernel$$anon$3@775fdac

gaussKer2(pts(0), pts(1))
// res7: Double = 6.6373005174739

As mentioned above, this demonstrates that the l parameter allows to tune how far inter-dependencies should span between the points of our shapes.
Matrix-valued kernels:

As mentioned above, the kernel determines how similar the function values should be, based on the similarity between the points at which the kernel is evaluated.

Given that the functions we model are vector-valued (3-dimensional deformation vectors), the similarity between the output values needs to be more than just a scalar.

In this case, our similarity should indicate how each dimension of the first output is similar to each dimension of the second output.

We therefore have a matrix-valued similarity that would be a 3-by-3 matrix in the case of our 3-dimensional deformation fields.

Without any data it is rather difficult to determine how the different dimensions of our deformations should correlate.

Therefore a straightforward way to build such a matrix-valued kernel is to transform a scalar-valued kernel between the 2 points into a diagonal matrix-valued one.

Exercise: complete the code below, and multiply a scalar-valued Gaussian kernel by the identity matrix to obtain a diagonal matrix-valued similarity:
def matrixValuedKernel(pt1: Point[_3D], pt2: Point[_3D]) : SquareMatrix[_3D] = {  
  val identity = SquareMatrix.eye[_3D]
def matrixValuedKernel(pt1: Point[_3D], pt2: Point[_3D]) : SquareMatrix[_3D] = {  
  val identity = SquareMatrix.eye[_3D]
  identity * gaussKer(pt1, pt2)
// matrixValuedKernel: (pt1: scalismo.geometry.Point[scalismo.geometry._3D], pt2: scalismo.geometry.Point[scalismo.geometry._3D])scalismo.geometry.SquareMatrix[scalismo.geometry._3D]

When using such a diagonal kernel, 2 similar points (x1,y1,z1) and (x2,y2,z2) will have corresponding deformation vectors (dx1, dy1, dz1) and (dx2, dy2, dz2) with similar dx1 , dx2, similar dy1, dy2 and similar dz1, dz2. There will be however no similarity encoded between dx1 and dy2 or between dx1 and dz2, etc ...

Scalismo already implements such a method for transforming any scalar-valued kernel into such a diagonal matrix-valued one, using the DiagonalKernel class:

val matrixVauedGaussian : MatrixValuedPDKernel[_3D, _3D] = DiagonalKernel[_3D](gaussKer)
matrixVauedGaussian(reference.point(PointId(0)), reference.point(PointId(1)))

The DiagonalKernel performs exactly the same operation as the one you implemented above. Given a scalar-valued kernel, it multiplies it by an identity matrix to obtain the matrix-valued kernel.

Notice here the returned Scalismo type MatrixValuedPDKernel[_3D, _3D] indicating both the input and output dimensionality.

The MatrixValuedPDKernel class implements the same algebraic operations as the scalar-valued one:

println("matrix-valued kernel domain : " + matrixVauedGaussian.domain) // defined everywhere
val scaledMatGauss = matrixVauedGaussian * 3.0 // scaling
val linearCombination = scaledMatGauss+ matrixVauedGaussian * 1.5 // sum 
val kernelMult = scaledMatGauss * matrixVauedGaussian // multiplication with other kernel

Also in this case, all the results of the operations are valid kernels.

Building the GP :

Now that we (finally :) ) have our mean and covariance functions, we can build a Gaussian process as follows:

val gp : GaussianProcess[_3D, _3D] = GaussianProcess(zeroMean, matrixVauedGaussian)

We can now sample deformations from our Gaussian process at any desired set of points. Below we choose the points to be those of the reference mesh:

val sample = gp.sampleAtPoints(reference)
show(sample, "gaussianKernelGP_sample")

As you can see, this is now an instance (or a random sample function) from the Gaussian Process evaluated at the points we indicated; in this case on the reference mesh points.

Let us now visualize the effect of this deformation field on the reference mesh:

def transformFromDefField(defField : DiscreteVectorField[_3D, _3D]) =   (pt : Point[_3D]) => {

show(reference.transform(transformFromDefField(sample)), "warped_reference")
Exercise: reiterate the necessary steps for building a GP in the code below such that you can easily (and repeatedly) sample deformations.
val l : Double = ???
val s : Double = ???
val l : Double = 40.0
// l: Double = 40.0

val s : Double = 10.0
// s: Double = 10.0
val scalarValuedKernel = GaussianKernel[_3D](l) * s 
val matrixValuedKernel = DiagonalKernel[_3D](scalarValuedKernel)
val gp = GaussianProcess(zeroMean, matrixValuedKernel)

val sample =  gp.sampleAtPoints(reference)
show(sample, "gaussianKernelGP_sample")
val transform =  transformFromDefField(sample)
show(reference.transform(transformFromDefField(sample)), "warped_reference")
Exercise: change the values of both parameters l and s such that you get, satisfactory, smooth deformations that warp the reference in a nice way.
Exercise: now replace the mean function with a new function (could be an interpolated sample for example). Do the properties of the sampled deformations change? What about the resulting shapes?
val newMean = sample.interpolateNearestNeighbor
// newMean: scalismo.common.VectorField[scalismo.geometry._3D,scalismo.geometry._3D] = <function1>

val gp = GaussianProcess(newMean, matrixValuedKernel)
// gp: scalismo.statisticalmodel.GaussianProcess[scalismo.geometry._3D,scalismo.geometry._3D] = scalismo.statisticalmodel.GaussianProcess@2ef83f9e



val sample =  gp.sampleAtPoints(reference)
// sample: scalismo.common.DiscreteVectorField[scalismo.geometry._3D,scalismo.geometry._3D] = <function1>

show(sample, "gaussianKernelGP_sample")

val transform =  transformFromDefField(sample)
// transform: scalismo.geometry.Point[scalismo.geometry._3D] => scalismo.geometry.Point[scalismo.geometry._3D] = <function1>

show(reference.transform(transformFromDefField(sample)), "warped_reference")

Given that the properties (e.g. smoothness) of the sampled deformation fields are dictated by the covariance function, 
these properties remain the same but are simply added to the new mean instead. 

In this particular case, this manifests in more pronounced deformations as the mean is no longer null.
See the "Sampling from a Shape Model" article for more details.

Other kernels and kernel combinations

Many other kernels exist beside the Gaussian kernel. These kernels attribute different similarities to the same domain points and therefore give rise to differently behaving deformations.

The power of the Gaussian process framework is that it is completely agnostic to the kernel function being used. You can strongly modify the behavior of the skeleton code above by simply plugging a different kernel function.

Implementing new kernels:

Implementing a new kernel can be easily done by extending the PDKernel class.As an example, let us implement a simple linear kernel

case class LinearKernel() extends PDKernel[_3D] {
  override def domain = RealSpace[_3D]
  override def k(x: Point[_3D], y: Point[_3D]) = {
     x.toVector dot y.toVector

val linearPDKernel = new LinearKernel() * 0.01

As you can see here, extending the PDKernel class requires implementing two methods:

1- domain: that specifies the continuous domain on which the kernel is defined. In this case we chose it to be the Real space (everywhere).

2- k: the actual similarity function of the kernel. In the case of the linear kernel, this is simply the dot product of the 2 points.

Exercise: plug this kernel into the GP skeleton above and sample from it. How do the deformation fields look?
// linear Kernel 

val matrixValuedKernel = DiagonalKernel[_3D](linearPDKernel)
// matrixValuedKernel: scalismo.kernels.DiagonalKernel[scalismo.geometry._3D] = IsotropicDiagonalKernel(scalismo.kernels.PDKernel$$anon$3@50e0eae3)

val gp = GaussianProcess(zeroMean, matrixValuedKernel)
// gp: scalismo.statisticalmodel.GaussianProcess[scalismo.geometry._3D,scalismo.geometry._3D] = scalismo.statisticalmodel.GaussianProcess@6049eb17

val sample =  gp.sampleAtPoints(reference)
// sample: scalismo.common.DiscreteVectorField[scalismo.geometry._3D,scalismo.geometry._3D] = <function1>

show(sample, "linearKernelGP_sample")

val trans =  transformFromDefField(sample)
// trans: scalismo.geometry.Point[scalismo.geometry._3D] => scalismo.geometry.Point[scalismo.geometry._3D] = <function1>

show(reference.transform(trans), "warped_reference_linear")

Symmetrizing a kernel

Quite often, the shapes that we aim to model exhibit a symmetry. This is particularly valid in the case of faces. Therefore when modelling over such shapes, one would want deformation fields that yield symmetric shapes.

Once we obtained a kernel yielding the type of deformations we desire, it is possible to symmetrize the resulting deformation fields by applying the formula below:

where xm is the symmetric point to x around the YZ plane.

The resulting kernel will preserve the same smoothness properties of the deformation fields while adding the symmetry around the YZ plane.

Let's turn it into code:

case class XmirroredKernel(ker : PDKernel[_3D]) extends PDKernel[_3D] {
  override def domain = RealSpace[_3D]
  override def k(x: Point[_3D], y: Point[_3D]) = ker(Point(x(0) * -1f ,x(1), x(2)), y)

def SymmetrizeKernel(ker : PDKernel[_3D]) : MatrixValuedPDKernel[_3D,_3D] = {
   val xmirrored = XmirroredKernel(ker)
   val k1 = DiagonalKernel(ker) 
   val k2 = DiagonalKernel(xmirrored * -1f, xmirrored, xmirrored)  
   k1 + k2

val simLinear = SymmetrizeKernel(linearPDKernel)

Similar to the formula, we started by defining a scalar kernel that flips the first coordinate of the first point (mirror around the YZ plane) before evaluating the similarity.

The final kernel is then the sum of a diagonal kernel, based on the given kernel, and the diagonalized xmirrored kernel where the first entry of the matrix-valued kernel is flipped.

Let's now plug and visualize its effect on the linear kernel:

val gp = GaussianProcess(zeroMean, simLinear)
val sample =  gp.sampleAtPoints(reference)
show(sample, "symmetricLinear_sample")
val transform =  transformFromDefField(sample)
show(reference.transform(transformFromDefField(sample)), "warped_reference_Symmetriclinear")
Exercise: symmetrize a smooth Gaussian kernel and visualize both its sampled deformations and their effect on the reference mesh.
val simGauss =  SymmetrizeKernel(GaussianKernel[_3D](40) * 10)
// simGauss: scalismo.kernels.MatrixValuedPDKernel[scalismo.geometry._3D,scalismo.geometry._3D] = scalismo.kernels.MatrixValuedPDKernel$$anon$5@1e756b8f

val gp = GaussianProcess(zeroMean, simGauss)
// gp: scalismo.statisticalmodel.GaussianProcess[scalismo.geometry._3D,scalismo.geometry._3D] = scalismo.statisticalmodel.GaussianProcess@7ce65754

val sample =  gp.sampleAtPoints(reference)
// sample: scalismo.common.DiscreteVectorField[scalismo.geometry._3D,scalismo.geometry._3D] = <function1>

show(sample, "symmetricGaussian_sample")

val trans =  transformFromDefField(sample)
// trans: scalismo.geometry.Point[scalismo.geometry._3D] => scalismo.geometry.Point[scalismo.geometry._3D] = <function1>

show(reference.transform(trans), "warped_reference_SymmetricGaussian")

Sample covariances are just another kernel:

As you previously saw, a Statistical Shape Model (SSM) in Scalismo ships with a Gaussian process that is discretized over the points of the reference mesh.

The sample covariance matrix that is used in such a GP can also be seen as a kernel that can be combined with other kernels and plugged in the GP framework again.

To demonstrate this, let us first load such a SSM model:

val pcaModel = StatismoIO.readStatismoMeshModel(new File("datasets/lowresModel.h5")).get

The GP of this model is governed by a fixed, i.e discrete, covariance function or kernel:

val discreteCov : DiscreteMatrixValuedPDKernel[_3D, _3D]=

In order to be able to combine and use such a discrete covariance with other continuous kernels into the GP framework, an interpolation step is required:

val gpSSM =
val SSMKernel = gpSSM.cov
Exercise: plug this kernel in the skeleton code above and verify that it yields proper face deformations (might be slow)
val gp = GaussianProcess(zeroMean, SSMKernel) // The deformations will be weaker here as we do not use the mean of the model!
// gp: scalismo.statisticalmodel.GaussianProcess[scalismo.geometry._3D,scalismo.geometry._3D] = scalismo.statisticalmodel.GaussianProcess@713de257

val sample =  gp.sampleAtPoints(reference)
// sample: scalismo.common.DiscreteVectorField[scalismo.geometry._3D,scalismo.geometry._3D] = <function1>

show(sample, "SSMKernelGP_sample")

val trans =  transformFromDefField(sample)
// trans: scalismo.geometry.Point[scalismo.geometry._3D] => scalismo.geometry.Point[scalismo.geometry._3D] = <function1>

show(reference.transform(trans), "warped_reference_SSM")

This interpolated kernel can now be combined with other kernels and plugged into the GP framework again:

val augmentedSSMKernel = SSMKernel + simLinear

Changepoint kernel:

Another kernel combination you previously saw consists in dividing the spatial domains where each kernel is active. A typical way of doing this is the changepoint kernel:

case class ChangePointKernel(ker1 :MatrixValuedPDKernel[_3D, _3D], ker2 :MatrixValuedPDKernel[_3D, _3D]) extends MatrixValuedPDKernel[_3D, _3D] {

  def s(p: Point[_3D]) =  1f / (1f + math.exp(-p(0)))
  def k(x: Point[_3D], y: Point[_3D]) = {
      val sx = s(x)
      val sy = s(y)
      ker1(x,y) * sx * sy + ker2(x,y) * (1-sx) * (1-sy)
  override def domain = RealSpace[_3D]

Let's visualize its effect on the sample covariance and symmetric linear kernels:

val chgptKer = ChangePointKernel(SSMKernel, simLinear)
val gp = GaussianProcess(zeroMean, chgptKer)
val sample =  gp.sampleAtPoints(reference)
show(sample, "ChangePointKernelGP_sample")

As you can see each kernel is now active only on one half of the face.

Sampling on high resolution meshes:

As you probably noticed, the mesh we used above for sampling from the crafted Gaussian Processes is a low-resolution representation of Paola.

As you also know by now, in order to be able to sample deformations on a higher resolution mesh, we need a finite rank representation of the Gaussian Process.

To obtain such a representation in Scalismo, we can use the approximateGP method of the LowRankGaussianProcess object (depending on your computer, the computations below might take a minute or two):

val paola = MeshIO.readMesh(new File("datasets/higherResPaola.stl")).get
show(paola, "paola")

val sampler = RandomMeshSampler3D(
  numberOfPoints = 300, 
  seed = 42)

val simGauss = SymmetrizeKernel(GaussianKernel[_3D](40) * 10)
val gp = GaussianProcess(zeroMean, simGauss)

val lowRankGP = LowRankGaussianProcess.approximateGP(
  numBasisFunctions = 100)

Here, we started by loading a higher resolution mesh of Paola's face.

Then, we crafted a Gaussian Process generating symmetric deformation fields by symmetrizing a Gaussian kernel. Finally, we preformed a finite rank decomposition of the GP, retaining only the 100 most prominent basis functions.

Using this low rank Gaussian process, we can now sample symmetric deformation fields defined over all the points of the high resolution mesh:

val  defField = lowRankGP.sampleAtPoints(paola)
show(defField, "higherRes Deformation")

Finally, we can build a Statistical Mesh Model making use of this finite rank Gaussian Process to deform the high resolution mesh of Paola:

val  model = StatisticalMeshModel(paola, lowRankGP) 
show(model, "SymmetricFaces")
Exercise: use the GUI to sample a few faces from the crafted model. Feel free then to build other StatisticalMeshModel(s) making use of different kernels and kernel parameters (for example a kernel combining a symmetrized Gaussian with the sample covariances of the model stored in model.h5 file).