# Building a shape model from data

The goal in this tutorial is to learn how to build a Statistical Shape Model from data in correspondence and assess the importance of rigid alignment while doing so.

In case you would like to learn more about the syntax of the Scala programming language, have a look at the Scala Cheat Sheet. This document is accessible under:

Documents -> Scala Cheat Sheet

###### Additional note: this particular tutorial might be memory-intensive. If possible, increase the memory allocated to the JVM when launching Scalismo Lab:
`java -Xmx4g -jar scalismoLab.jar`

Let us load a dataset of faces based on which we would like to model shape variation:

```
val files = new File("datasets/nonAlignedFaces/").listFiles
val dataset = files.map{ f => MeshIO.readMesh(f).get }
(0 until dataset.size).foreach{ i => show(dataset(i), "face_" + i) }
```

Two things are worth noticing about this dataset:

1 - The shapes are not aligned, as you can see from the multitude of noses :)

2 - The shapes are in correspondence: for every point on one of the face meshes (corner of an eye, tip of the nose, ...), one can identify the corresponding point on every other mesh. This correspondence is encoded by assigning the same point identifier to corresponding points on all meshes.
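This notion of correspondence via shared identifiers can be illustrated with plain Scala collections (a toy sketch, not Scalismo code: each "mesh" is just an indexed sequence of made-up points, and the same index denotes the same anatomical location):

```scala
// Toy illustration (not Scalismo code): two "meshes" stored as sequences of
// 3D points, where the same index refers to the same anatomical location.
type Point3 = (Double, Double, Double)

val meshA: IndexedSeq[Point3] = IndexedSeq((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
val meshB: IndexedSeq[Point3] = IndexedSeq((5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (5.0, 1.0, 0.0))

// The point with id 1 (the "tip of the nose", say) on meshA ...
val ptOnA = meshA(1)
// ... corresponds to the point with the same id on meshB:
val ptOnB = meshB(1)
```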

##### Exercise: verify that the meshes are indeed in correspondence by displaying a few corresponding points.
```
val ptId = PointId(8156)
// ptId: scalismo.common.PointId = PointId(8156)

val pts = dataset.map(mesh => mesh.point(ptId))
// pts: Array[scalismo.geometry.Point[scalismo.geometry._3D]] = Array(Point3D(17.899899,2.2916002,138.78868), Point3D(72.836655,2.5988302,126.0999), Point3D(-0.352555,6.08609,127.84401), Point3D(49.839844,2.536,121.116196), Point3D(33.927753,6.2804303,129.24889))

show(pts.toIndexedSeq, "corresponding?")
```

#### Rigidly aligning the data:

As we saw previously, in order to study shape variations, we need to eliminate variations due to relative spatial displacement of the shapes (rotation and translation).

We can achieve this by selecting a reference first and then aligning the rest of the dataset to the reference:

```
val reference = dataset.head
val toAlign : IndexedSeq[TriangleMesh] = dataset.tail

val pointIds = IndexedSeq(2214, 6341, 10008, 14129, 8156, 47775)
val refLandmarks = pointIds.map{ id => Landmark("L_" + id, reference.point(PointId(id))) }

val alignedSet = toAlign.map { mesh =>
  val landmarks = pointIds.map{ id => Landmark("L_" + id, mesh.point(PointId(id))) }
  val rigidTrans = LandmarkRegistration.rigid3DLandmarkRegistration(landmarks, refLandmarks)
  mesh.transform(rigidTrans)
}
```

Here, we started by splitting our dataset into a reference, which we chose to be the first element of our mesh list (the head of the list), and the remaining meshes, which need to be aligned to that reference.

Then, given that our dataset is in correspondence, we specify a set of point identifiers that we use to locate corresponding points on the reference mesh and on each of the meshes to be aligned to the reference.

After locating the landmark positions on the reference, we iterate on each remaining data item, identify the corresponding landmark points and then rigidly align the mesh to the reference.
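The rigid registration step solves for the rotation and translation that best superpose the two landmark sets in a least-squares sense. The translation component alone can be sketched in plain Scala (a toy illustration with made-up landmarks, not the Scalismo implementation; the rotation estimate, e.g. via the Kabsch algorithm, is omitted):

```scala
// Toy sketch (not Scalismo code): the translation component of a rigid
// landmark alignment matches the centroids of the two landmark sets.
// A full rigid registration would additionally estimate a rotation
// (e.g. with the Kabsch algorithm); that part is omitted here.
case class Vec3(x: Double, y: Double, z: Double) {
  def +(o: Vec3) = Vec3(x + o.x, y + o.y, z + o.z)
  def -(o: Vec3) = Vec3(x - o.x, y - o.y, z - o.z)
  def /(s: Double) = Vec3(x / s, y / s, z / s)
}

def centroid(pts: Seq[Vec3]): Vec3 = pts.reduce(_ + _) / pts.size

// made-up landmark positions: the "mesh" is the reference shifted by (10, 5, 0)
val refLms  = Seq(Vec3(0, 0, 0), Vec3(2, 0, 0), Vec3(0, 2, 0), Vec3(2, 2, 0))
val meshLms = Seq(Vec3(10, 5, 0), Vec3(12, 5, 0), Vec3(10, 7, 0), Vec3(12, 7, 0))

// translation bringing the mesh landmarks onto the reference landmarks
val t = centroid(refLms) - centroid(meshLms)
val aligned = meshLms.map(_ + t)
```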

Now, `alignedSet`, an `IndexedSeq` of triangle meshes, contains the faces aligned to the reference mesh.

### Building a discrete Gaussian process from data

Now that we have our set of meshes that are in correspondence and aligned to our reference, we can turn the dataset into a set of deformation fields, as we saw previously:

```
val defFields : IndexedSeq[DiscreteVectorField[_3D, _3D]] = alignedSet.map{ m =>
  val deformationVectors = reference.pointIds.map{ id : PointId =>
    m.point(id) - reference.point(id)
  }.toIndexedSeq

  DiscreteVectorField(reference, deformationVectors)
}
```

Finally, we compute a discrete Gaussian process from the deformation fields above:

```
val continuousFields = defFields.map(f => f.interpolateNearestNeighbor)
val gp = DiscreteLowRankGaussianProcess.createUsingPCA(reference, continuousFields)
```

Here, we performed a Principal Component Analysis (PCA) to compute the returned `DiscreteLowRankGaussianProcess`, which is defined over the reference mesh.

This Gaussian process now models a normal distribution of discrete vector fields that are defined over the points of the reference mesh.

The mean of the obtained process is in this case the mean of all the deformations observed from the data.

The covariance function, or kernel, of this Gaussian process is the sample covariance: for any two points of the reference mesh, it returns the matrix of covariances between the deformation vectors observed at these points in the data.
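Both quantities can be sketched in plain Scala on a toy dataset (an illustration of the formulas, not the Scalismo implementation; the `deformations` values are made up):

```scala
// Toy sketch (plain Scala, not Scalismo): the empirical mean and covariance
// computed by PCA on deformation fields.
// deformations(i)(p) = 3D deformation vector at mesh point p, for example i
val deformations: IndexedSeq[IndexedSeq[Array[Double]]] = IndexedSeq(
  IndexedSeq(Array(1.0, 0.0, 0.0), Array(2.0, 0.0, 0.0)),
  IndexedSeq(Array(3.0, 0.0, 0.0), Array(4.0, 0.0, 0.0))
)
val n = deformations.size

// mean deformation at point p: average of the observed vectors
def meanAt(p: Int): Array[Double] =
  (0 until 3).map(d => deformations.map(_(p)(d)).sum / n).toArray

// sample covariance between the deformations at points p and q: the 3x3
// matrix of covariances between the components of the observed vectors
def covAt(p: Int, q: Int): Array[Array[Double]] = {
  val mp = meanAt(p); val mq = meanAt(q)
  Array.tabulate(3, 3) { (a, b) =>
    deformations.map(ex => (ex(p)(a) - mp(a)) * (ex(q)(b) - mq(b))).sum / (n - 1)
  }
}
```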

##### Exercise: display the mean deformation field of the returned Gaussian Process.
`show(gp.mean, "meanDef")`
##### Exercise: sample and display a few deformation fields from this GP.
`show(gp.sample, "sampleDef")`
##### Exercise: using the GP's cov method, evaluate the sample covariance between two close points on the right cheek first, and a point on the nose and one on the cheek second. What does the data tell you?
```
gp.cov(PointId(27136), PointId(729))
// res7: scalismo.geometry.SquareMatrix[scalismo.geometry._3D] =
// [[26.04408,6.333006,20.163792]
// [5.921082,5.7802024,12.329612]
// [19.182106,11.124177,31.499052]]

gp.cov(PointId(7565), PointId(729))
// res8: scalismo.geometry.SquareMatrix[scalismo.geometry._3D] =
// [[0.123047136,0.5420395,0.9213879]
// [12.631436,3.1701255,9.51926]
// [-2.8634527,-3.6559103,-8.400954]]

/*
The matrices obtained when evaluating the covariance between two points of the
mesh indicate how similar the sampled deformation vectors defined over these
points should be.

When comparing the entries of both matrices, you should see that the first one
has much higher values, especially on the diagonal. A sampled deformation
vector at a cheek point will therefore in general be much more similar to the
vector at another cheek point than to the one at a point on the nose.
*/
```

Now that we have a Gaussian process yielding deformation fields and a reference mesh to warp using these fields, we can build our `StatisticalMeshModel` as follows:

```
val model = StatisticalMeshModel(reference, gp.interpolateNearestNeighbor)
show(model, "model")
(0 until 5).foreach{ i => remove("face_" + i) } // cleanup
```

The obtained shape model now exhibits the variations seen in the provided data.
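Under the hood, such a low-rank model generates shape variations as the mean plus a linear combination of the principal components, weighted by standard normal coefficients. A minimal plain-Scala sketch of this sampling rule (with a made-up mean, basis, and variances, not the Scalismo implementation):

```scala
// Toy sketch (not Scalismo): how a low-rank Gaussian process produces
// samples. With eigenpairs (lambda_i, phi_i) from PCA, a sample is
//   mean + sum_i alpha_i * sqrt(lambda_i) * phi_i,  with alpha_i ~ N(0, 1).
val mean = Array(0.0, 0.0, 0.0, 0.0)
// two orthonormal (made-up) basis vectors and their variances lambda_i
val basis = Array(Array(1.0, 0.0, 0.0, 0.0), Array(0.0, 1.0, 0.0, 0.0))
val variances = Array(4.0, 1.0)

def sample(alphas: Array[Double]): Array[Double] =
  mean.indices.map { j =>
    mean(j) + basis.indices.map(i => alphas(i) * math.sqrt(variances(i)) * basis(i)(j)).sum
  }.toArray

// alphas = (0, 0) reproduces the mean; alphas = (1, 0) moves one standard
// deviation along the first principal component
val meanSample  = sample(Array(0.0, 0.0))
val firstPcStep = sample(Array(1.0, 0.0))
```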

### Building a model directly from files

Performing all the operations above every time we want to build a PCA model from a set of files containing meshes in correspondence can be tedious. Therefore, Scalismo provides a more convenient implementation via the `DataCollection` data structure:

`val dc = DataCollection.fromMeshDirectory(reference, new File("datasets/nonAlignedFaces/"))._1.get`

The `DataCollection` class in Scalismo groups together a dataset of meshes in correspondence, in order to make collective operations on such sets easier.

In the code above, we created a DataCollection from all the meshes in correspondence that are stored in the indicated directory.

When creating the collection, we need to indicate the reference mesh. In the case of dc, we chose the reference to be the reference mesh we previously defined:

`dc.reference == reference`

In addition to the reference, the data collection holds for each data item the transformation to apply to the reference to obtain the item:

```
val item0 : DataItem[_3D] = dc.dataItems(0)
val transform : Transformation[_3D] = item0.transformation
```

Note that this transformation simply moves each reference point by its associated deformation vector, according to the vector field that deforms the reference mesh into the item's mesh.
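A toy sketch of this point-wise rule in plain Scala (hypothetical points and vectors, not the Scalismo API):

```scala
// Toy sketch (not Scalismo code): the transformation stored for a data item
// adds, to each reference point, the deformation vector that maps the
// reference onto that item.
case class Pt(x: Double, y: Double, z: Double)
case class Vc(x: Double, y: Double, z: Double)

// made-up reference points and the deformation field for one data item
val referencePoints = IndexedSeq(Pt(0, 0, 0), Pt(1, 0, 0))
val deformation     = IndexedSeq(Vc(0.5, 0, 0), Vc(0, 1, 0))

// the item transformation: reference point + its deformation vector
def transform(id: Int): Pt = {
  val p = referencePoints(id); val v = deformation(id)
  Pt(p.x + v.x, p.y + v.y, p.z + v.z)
}

val itemPoints = referencePoints.indices.map(transform)
```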

Now that we have our data collection, we can build a shape model as follows:

```
val modelNonAligned = StatisticalMeshModel.createUsingPCA(dc).get
show(modelNonAligned, "nonAligned")
```

Here again, a PCA is performed based on the deformation fields retrieved from the data in correspondence.

Notice that, in this case, we built a model from meshes that are in correspondence but not rigidly aligned.