Principles of Operation: Camera Frame Stacking 3D

3D Imaging via Camera Frame Stacks and Stitching

A standard microscope objective has several key specifications, such as its lateral magnification, numerical aperture, working distance, and so forth. The parameter of central interest here, primarily determined by the numerical aperture, is the depth-of-field (DOF). In object space (the space in which the sample being examined resides), the depth-of-field is defined as the sum of the distances in front of and behind the best-focus plane over which the focus of the sample features is still acceptably sharp. Any surface features closer or farther away than this, i.e. anything that falls outside of the DOF, look fuzzy or completely out of focus. The DOF of an objective is like a vertical window over which you can see things clearly, and you can shift this window's position to control what is in and out of focus by moving the microscope head up and down on its vertical stage. This basic operational fact is the essence of how the 3D camera frame stacking technique works.

For example, let's say you are looking at a grain of sand resting on a microscope slide and you want to know the overall height and width of the grain. A snapshot of the microscope image gives direct information about the xy dimensions of the grain, but no information about z. Z information can be determined manually in three basic steps: 1) Focus the microscope view on the glass slide and note the z stage position; call this height Zb. 2) Move the microscope head up to bring the top of the grain of sand into focus; call this height Zt. 3) The height difference Zt - Zb then gives the height of the grain of sand.

If this procedure is automated with a computer and a motorized z stage, so that a series of camera images is taken at some uniform height spacing (typically every 2 to 10 µm) across the total height of the object, then a "camera frame stack" is obtained which the computer can analyze frame-by-frame to find where the in-focus pixels are located in each camera frame. In this way an x,y,z position value is assigned to each pixel covering the object of interest, and a 3D image is obtained.
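
To make the automated version concrete, the acquisition loop can be sketched in a few lines of Python. The stage and camera objects and their move_z() and grab_frame() methods below are hypothetical placeholders for whatever motion-control and camera interfaces are actually in use; this is only a sketch of the logic, not a specific instrument's API.

    import numpy as np

    def acquire_frame_stack(stage, camera, z_start_um, z_end_um, step_um=10.0):
        """Move the objective through a range of z positions and grab one
        camera frame at each step, returning the frames and their heights."""
        z_positions_um = np.arange(z_start_um, z_end_um + step_um, step_um)
        frames = []
        for z in z_positions_um:
            stage.move_z(z)                      # hypothetical motorized-stage call
            frames.append(camera.grab_frame())   # hypothetical camera call; one 2D grayscale image
        # Result: a (num_frames, rows, cols) stack plus the height of each frame
        return np.stack(frames), z_positions_um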

The computer algorithm must have some metric for determining in which camera frame of the stack a given x,y pixel is in sharpest focus. This metric is usually a number calculated by applying a square kernel over a small sub-region of the image centered on x,y. The simplest example of this is the 3x3 "edge-detection" Laplacian kernel:

     0  -1   0
    -1   4  -1
     0  -1   0
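
As an illustrative sketch of how such a focus metric can be computed (assuming NumPy and SciPy; a 7x7 evaluation window, the size used for Figure 2 below, is taken as the default), one common approach is to convolve the frame with the Laplacian kernel and then sum the squared edge response over a small window centered on each pixel:

    import numpy as np
    from scipy.ndimage import convolve, uniform_filter

    LAPLACIAN = np.array([[ 0, -1,  0],
                          [-1,  4, -1],
                          [ 0, -1,  0]], dtype=float)

    def focus_metric(frame, window=7):
        """Per-pixel focus metric for one camera frame: the squared Laplacian
        (edge) response summed over a window x window neighborhood.
        Larger values indicate sharper local detail."""
        edges = convolve(frame.astype(float), LAPLACIAN, mode="nearest")
        # uniform_filter gives the local mean; scaling by window**2 gives the local sum
        return uniform_filter(edges**2, size=window) * window**2

The window size here is the same parameter discussed later in connection with lateral resolution: a larger window smooths out camera noise but spreads each surface feature's contribution over more neighboring pixels.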

Below is a real-world example of the 3D Camera Frame Stack technique to illustrate how it works. Figure 1 shows one quadrant of the surface of a 0.125 inch (3.175 mm) diameter Torlon ball bearing, as viewed through a Nikon MM60 microscope with a 4x objective. Notice that with this objective/sample combination only a small portion of the surface is in focus in the camera view at a given objective z position. A camera frame stack was acquired to span this bearing's (hemispherical) z range, with 170 frames recorded at 10 µm intervals.

Figure 1 A Torlon ball bearing surface as viewed through a microscope with a 4x objective. The highlighted arc is the only portion of the view that is in focus.

A graph of the computed focus signal at one of the pixels, as a function of frame number within the stack, is shown in Figure 2. For this pixel the best focus occurred in camera frame 77. The frames in the stack were obtained at 10 µm intervals, so a z height of 770 µm would be assigned to this pixel. The noise in the focus signal, clearly evident before and after the signal peak, is mostly due to bit noise in the camera's image data; edge-detection algorithms are sensitive to bit noise.

Figure 2 The focus signal computed over a 7x7 pixel area of the surface, through the 170-frame stack.
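
Under the same assumptions as the sketches above, converting the stack of focus-metric frames into a height map is then a per-pixel peak search. In the sketch below, focus_volume is a (num_frames, rows, cols) array obtained by applying a focus metric such as the one above to every frame, and z_positions_um holds the height at which each frame was recorded:

    import numpy as np

    def height_map_from_focus_volume(focus_volume, z_positions_um):
        """Assign to each pixel the height of the frame in which its focus
        metric peaks, e.g. a peak at frame 77 with 10 um spacing -> 770 um."""
        best_frame = np.argmax(focus_volume, axis=0)   # index of sharpest frame, per pixel
        return z_positions_um[best_frame]              # (rows, cols) height map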

Figures 3 and 4 show the result of processing all the pixels in the frame stack to generate a final 3D view of the bearing surface. Figure 4 was created using a presentation style that accentuates the step detail in the calculated surface; this stepping is not a true reflection of the actual bearing surface. The surface steps are an artifact produced by the lateral and vertical resolution limitations of this imaging technique.

The vertical resolution limitation is set by the vertical spacing between the camera frames in the stack. If the frame spacing is 10 µm, then the z steps in the image data will correspondingly be 10 µm. The minimum practical spacing for the frames is determined by the DOF of the objective employed. A tradeoff must be made: using an objective with a higher magnification and larger numerical aperture will decrease the DOF of the camera view, allowing a tighter frame spacing to be used, but at the expense of reducing the surface area that can be imaged in one camera frame and increasing the time needed to record and process the frame data. Note that it is possible to increase the z resolution somewhat by interpolating between the frame data points. In practice, camera data bit noise restricts the accuracy of frame interpolation, and only about a factor of 2 improvement in resolution is typically achieved.
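
One simple way to interpolate between frames (a sketch of a common approach, not necessarily the exact method used here) is to fit a parabola through the peak focus value and its two neighboring frames, which yields a sub-frame estimate of the best-focus height:

    import numpy as np

    def subframe_peak_um(focus_signal, step_um=10.0):
        """Refine the best-focus height with a three-point parabolic fit
        around the peak of a single pixel's focus signal."""
        i = int(np.argmax(focus_signal))
        if i == 0 or i == len(focus_signal) - 1:
            return i * step_um                           # peak at either end: no refinement possible
        y0, y1, y2 = focus_signal[i - 1], focus_signal[i], focus_signal[i + 1]
        denom = y0 - 2.0 * y1 + y2
        offset = 0.0 if denom == 0 else 0.5 * (y0 - y2) / denom   # fractional frame offset
        return (i + offset) * step_um

As noted above, camera bit noise limits how much this kind of refinement actually helps, with roughly a factor of 2 improvement being typical.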

The lateral resolution limitation of this technique (aside from the objective’s inherent lateral resolution limit) is determined by the size of the kernel used in the focus signal calculations. Any high-contrast feature on the sample surface tends to be ‘spread out’, contributing to the focus signals of many adjacent pixels. Again, there is a tradeoff to be made: a larger kernel produces a better signal-to-noise ratio, but at the expense of reducing the lateral resolution of the final 3D image.

Figure 3 Top view of the computed 3D surface of the bearing.

Figure 4 Light-shaded view of the same data presented in Figure 3.


To successfully apply the Camera Frame Stack 3D technique there are other factors which need to be considered, including 1) brightfield vs. darkfield illumination, 2) illumination source uniformity, and 3) specular vs. diffuse surface reflectivity. It is also essential to have a good image stitching algorithm to complement the frame stack processing algorithm, because one of the strengths of this technique is that it can be extended over a very large surface area: several overlapping sub-images of an object larger than the FOV of the objective are acquired and then stitched together to create a final large-scale image. In general, any resolution limitations present in the sub-images scale down in importance as the size of the composite image is increased. An example of image stitching is given below in Figure 5.

Figure 5 This is a composite view of a brass ball surface created from 64 overlapping sub-images. The closest part of the sphere is dented.
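
As an illustrative sketch of the registration step in such stitching (one simple approach among many; the specific algorithm used for Figure 5 is not described here), the translational offset between two overlapping sub-images can be estimated by phase correlation, after which the tiles can be placed onto a common large-scale grid and blended:

    import numpy as np

    def estimate_offset(tile_a, tile_b):
        """Estimate the (row, col) shift that best aligns tile_b with tile_a,
        using phase correlation of two equally sized overlapping sub-images."""
        A = np.fft.fft2(tile_a - tile_a.mean())
        B = np.fft.fft2(tile_b - tile_b.mean())
        cross_power = A * np.conj(B)
        cross_power /= np.abs(cross_power) + 1e-12       # keep phase information only
        corr = np.fft.ifft2(cross_power).real
        peak = np.unravel_index(np.argmax(corr), corr.shape)
        # Convert the peak location into a signed shift (accounts for FFT wrap-around)
        return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))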