## Abstract

Mutual-information maximization is one of the most popular algorithms for automatic image registration. However, many implementation issues have not been evaluated in a single, coherent context. **Methods:** Twenty-one registrations between MR and SPECT brain images (8 patients) were achieved by mutual-information maximization with different implementation strategies. The results of a popular strategy were chosen as the standard. All other results were compared with the standard, and the statistics of misregistrations were computed. The registration speed, accuracy, precision, and success rate were assessed. **Results:** Compared with trilinear interpolation, nearest-neighbor interpolation slightly sped the registration process, but with a lower success rate. The number of bins used to estimate the probability density function (pdf) affects the speed and robustness. Using fewer bins yielded a less robust registration. Adaptively changing the number of bins increased the registration speed and robustness. Simplex optimization increased the registration speed considerably, with a slightly degraded success rate. Simplex optimization with adaptive bin strategy improved the success rate and further decreased the registration time. Multiresolution optimization yielded a better success rate, with little effect on the accuracy and precision of registration. An increase in the number of resolution levels increased the success rate. Multisampling optimization also improved the success rate, but the results were less accurate and precise than those obtained with multiresolution optimization, with an increase in the number of levels decreasing the performance. Segmentation affected the registration speed and success rate. Because segmentation is problem specific, the effects were not conclusive. **Conclusion:** Different implementation strategies considerably affect the performance of automatic image registration by mutual-information maximization. On the basis of the experimental findings, we suggest that the best implementation strategy would include trilinear interpolation, adaptive change of the number of bins when estimating pdf, and exploitation of a simplex optimization algorithm with a multiresolution scheme.

Image registration and fusion of medical images are increasingly useful in research and patient care (1–3). Accurately associating data points on different images is required for diagnosis, treatment planning, and detection of changes. Many registration methodologies have been developed, reviewed (4), and evaluated (5–7).

From a practical point of view, automatic registration is desirable because its results are objective and because quantitative analysis is possible. Among the various automatic registration methods, maximization of mutual information of voxel intensities (8,9) is one of the most popular for multi- as well as single-modality image registration. The method is robust and accurate and does not require segmentation and preprocessing (5–7).

Mutual information is an information-theoretic concept. It quantifies the similarity between 2 random variables A and B and is defined by:
where p_{AB}(a, b) is the joint probability density function (pdf) and p_{A}(a) and p_{B}(b) are the marginal pdf’s. In the context of image registration, random variable A is the voxel value in the reference image and random variable B is the voxel value in the floating image. A general flowchart of the implementation of mutual-information registration is given in Figure 1, where I_{1} and I_{2} are reference and floating images respectively. When the registration parameters are applied to the floating image, interpolation is generally needed such that the voxels are mapped to grid points. To compute the mutual information, one needs to estimate the pdf. To decide if a registration is optimal and, if it is not, how to update the registration parameters, one needs to use some optimization scheme.

Many implementation issues are involved in mutual-information image registration, such as interpolation methods, estimation of marginal and joint pdf’s, optimization algorithms, multiresolution and multisampling schemes, and segmentation options. Although the reports (8–13) dealing with mutual-information registration discuss some aspects of these issues, not all of these parameters have been evaluated in a single, coherent context. This study evaluated the use of different implementation strategies on 21 MR and SPECT brain image registrations, and we determined the best implementation strategy.

## MATERIALS AND METHODS

### Images

This study involved 8 patients who underwent brain MRI and nuclear transmission and emission scanning. The pertinent information on image files is in Table 1. The emission images using ^{201}Tl were excluded from this study because of their poor readability. Thus, 21 image pairs were available to be registered. All nuclear images were registered to their MRI counterparts.

### Image Registration

All image pairs were registered by mutual-information maximization. We did not try to hand-register the images first. The source code of commercial software (Image Volume Registration; Philips Medical Systems, Cleveland, OH) was used as the base and was modified to incorporate different implementation strategies. The computer used was an AlphaStation (Digital Equipment Corp., Maynard, MA) with 192 MB of memory. The operating system was UNIX (version 4.0D; Digital).

All registration transformations were restricted to the rigid-body type. The registration parameter is a 6-dimensional vector, (θ_{x,} θ_{y,} θ_{z,} t_{x}, t_{y}, t_{z}), where θ_{x}, θ_{y}, and θ_{z} are rotation angles in degrees around the *x-*, *y-*, and *z-*axes, respectively, and t_{x}, t_{y}, and t_{z} are translation offsets in millimeters along the *x-*, *y-*, and *z-*axes, respectively.

### Interpolation

When the registration transform was applied to the nuclear image, the original grid in the nuclear images was transformed to a generalized nongrid coordinate system. The registration process needed to access the voxel values at grids of the transformed image. Thus, interpolation had to be used.

There are different interpolation methods. Those most popular are nearest neighbor and trilinear. Maes et al. (9) reported a trilinear partial-volume distribution method that is not an interpolation per se. In this study, we used nearest-neighbor and trilinear interpolations and compared their impact on registration.

### Histogram

To compute the mutual information, one must have the marginal and joint pdf’s of voxel values and their pairs. Two methods of estimating those pdf’s have been published. Wells et al. (8) used a Parzen windowing method, which is time consuming. Maes et al. (9) used a histogram method, which is popular and used by many others (8–13). Because the latter approach is quick and easy to implement, we adopted it for this study.

To estimate the histograms, one maps the voxel values to an array or a sequence of bins. The number of bins is an important parameter. In our study, we used the same number of bins, n, for both images when estimating the marginal pdf’s and used n × n when estimating the joint pdf. Values tested for n were 16, 32, and 64. The voxel values were linearly mapped to bins. Nonlinear mapping was not tried.

Multiresolution and multisampling optimization was used to avoid local minima. Suppose the image size is 16 × 16 × 16. There then are, at most, 16 × 16 × 16 gray-value pairs (when all voxels overlap). If 64 × 64 bins are used to estimate the joint pdf, then on average, there is only 1 voxel pair per bin. As a result, the statistical error in the joint pdf estimation would render the result meaningless. Thus, a small number of bins is desirable. As the following results show, however, a small number of bins can yield less robust registration, implying that a large number of bins is required. To solve this paradox, we adaptively changed the number of bins. The number of bins was heuristically set to the value of the resolution; that is, if the resolution is 32, the number of bins is 32 (and 32 × 32 for the joint pdf). We did not try to optimize the number of bins at each resolution.

### Optimization

The optimal registration is found by an iterative process, and iterative optimization is an important part of a registration method. Different optimization algorithms have been used previously: Powell, simplex, steepest decent, conjugate gradient, quasi-Newton, Levenberg-Marquardt least squares, and simulated annealing, among others (8–14). Because Powell and simplex are the most frequently used algorithms in this context, we evaluated only these. In the simplex algorithm, a simplex in hyperspace is deformed in a well-defined manner to enclose the minima (14). The Powell algorithm, a direction-set method in multidimensions, loops on a set of directions and finds the minima along each direction. Brent’s method in 1-dimensional search was used in our implementation (14). The direction matrix was initialized to a unitary matrix, consisting of the vector (θ_{x}, θ_{y}, θ_{z}, t_{x}, t_{y}, t_{z}). The order of these parameters was fixed, and the termination condition for both algorithms was the same (tolerance = 0.001).

### Multiresolution and Subsampling

Optimization methods cannot guarantee that a global optimal value will be found. The value can easily be trapped to a local minimum. To find a true global optimal value, simulated annealing or genetic algorithms can be applied. The speed of these algorithms limits their application to 3-dimensional image registration. In practice, an approach using multiresolution and subsampling proves to be helpful (10).

In both multiresolution and subsampling, the idea is to register the coarse images first and then to use the result as the starting point for fine-image registration. For multiresolution, the coarse image is obtained by averaging the voxels in the sampled neighborhood. For subsampling, the coarse image is obtained simply by sampling the original input image.

The number of levels (i.e., the number of resolutions or subsamplings) in multiresolution and subsampling and the resolutions or sampling frequencies in each level can influence the registration performance. In this study, we fixed the sizes of images at 8 × 8 × 8, 16 × 16 × 16, 32 × 32 × 32, or 64 × 64 × 64 for both multiresolution and subsampling. When the number of levels was 4, images of all above-mentioned sizes were used, from coarse images to fine images. When the number of levels was 3, 2, or 1, only the latter 3 finer images, 2 finer images, or 1 finer image, respectively, was considered.

### Analysis of Registration Results

For these retrospective registrations, the ground truths are unknown and we cannot assess the absolute accuracy of different implementation strategies. To assess the influence of implementation parameters on registration results and compare the registration performance under different conditions, we established an artificial standard. We defined a standard implementation as one that uses trilinear interpolation, a fixed histogram bin (64 × 64) to estimate the marginal as well as joint histogram, and the Powell optimization algorithm, with a multiresolution strategy (resolutions change from 8 to 16 to 32 to 64). This implementation is typical of those used by other researchers (except that we used more levels of resolution) (10).

The misregistration parameter was defined as the difference between the registration parameter and the standard. A study has shown that a trained clinician can detect differences from the registration parameter of 4° in the *x-* and *y-*rotation angles, 2° in the *z-*rotation angle, 2 mm in the *x-* and *y-*translations, and 3 mm in the *z-*translation (10). If all misregistration parameters were within these detection thresholds, the registration was regarded as a success. If any misregistration parameter was outside this detection threshold, the registration results were visually assessed, because if the standard registration parameter is on the borderline of a failed registration and the current registration is on the opposite borderline, the difference between them can be as large as twice the detection threshold. If the visual inspection identified the registration as being good, it was treated as a success.

The mean and SD of the misregistration parameters of successful registrations were then calculated. If the mean was close to zero, the result was regarded as accurate. If the SD was small, the result was regarded as precise. A robust implementation gave an accurate and precise registration with a high success rate.

The accuracy of registration can also be evaluated by the absolute position error of selected regions or points. This error is known to comprise the error in translation offsets and the error caused by rotation angles (5). Thus, we did not report the absolute position error for this study.

The average run time of each implementation was compared with the standard implementation. If the other implementation ran faster than the standard, we defined a speedup as: otherwise, we defined a slowdown as:

## RESULTS

### Standard Implementation

The standard implementation achieved 20 successful registrations (95%) as judged by visual inspection, with an average run time of 170 s. A typical registration is shown in Figure 2. The angle parameter ranged from −24° to 0°, and the translation parameter ranged from −12 to 54 mm. We replaced the failed registration with the registrations given by the adaptive bin scheme that was judged to be successful by visual inspection. As the following results show, the adaptive bin scheme gave registrations very close to those of the standard, in terms of the extent of the difference and the SD of the difference. If the failed registration had been a success, the registration results would have been close to those given by the adaptive bin scheme. Thus, this replacement did not have an adverse effect on our evaluations; that is, what was replaced was reliable.

The registration achieved through mutual-information image registration has been reported to have a subvoxel accuracy (5). Visual inspection of the results of standard registration was satisfactory. The results of this standard implementation were also consistent when different initial registrations were introduced. Thus, we assume that the standard registration is close to the unknown truth.

### Interpolation

To evaluate the effects of nearest-neighbor interpolation, we used that interpolation and kept all other implementation parameters the same as for the standard implementation. The results showed that this implementation failed to register 5 image pairs, with a success rate of only 76%. The mean and SD of misregistration parameters are shown in Table 2. The average run time for this implementation was 161 s, which amounts to a speedup of 6%.

Comparing the data in Table 2 with those in Tables 3–6, we find that the implementation using nearest-neighbor interpolation has a large SD. It is generally believed that nearest-neighbor interpolation can speed up the registration because less computation is involved in the interpolation. Our data do not support this belief. It is true that each iteration cycle takes less time if nearest-neighbor interpolation is used, but the registration time is determined by the number of iterations and the time per iteration. The implementation using nearest-neighbor interpolation is likely to take more iterations. Because the nearest-neighbor interpolation is insensitive up to 1 voxel, there was concern that subvoxel registration accuracy could not be achieved. Our data indicate that although this implementation yields a result with a large SD, the registration result, compared with that obtained using the standard implementation, still achieved subvoxel accuracy (the error from translation offsets was 1.24 mm on average).

### Bin Size

In the standard implementation, we changed the number of bins to 16 × 16 and 32 × 32. The number of bins was also changed adaptively. The statistics of that implementation are tabulated in Table 3.

When the number of bins was small, less computation was involved. However, this did not necessarily speed up registration, as the data in Table 3 reveal. When the bin was 16 × 16, the registration slowed by 5%, from 170 to 178 s. When the bin was 32 × 32, the speed improved by 11%. The success rate for the latter case was reduced. When the number of bins was adaptively changed, the speedup was 13%.

When the number of bins was smaller, the mean and SD of the misregistration parameters were larger, indicating that a smaller number of bins may degrade registration accuracy and precision. The adaptive scheme registered all image pairs better than did the standard implementation, and the results were accurate and precise.

### Optimization Algorithm

The Powell direction-set method and the simplex downhill method were the optimization algorithms used for comparison, and the results are shown in Table 4. The result of simplex optimization with a fixed number of bins was close to that of the standard but failed to register 2 image pairs. The speedup was 91%.

When we added the adaptive bin scheme to simplex optimization, all image pairs were successfully registered, and the registration speed was improved further (speedup, 110%). However, the mean and SD of δθ_{x} and δt_{y} increased. The misregistrations of the 2 cases that failed by the simplex-only method were large. Nevertheless, the registration quality was good by visual inspection.

### Multiresolution and Subsampling Strategy

The implementation using different levels of resolution or sampling were studied, and the statistics of the misregistration parameters are reported in Table 5. For multiresolution, when the number of resolution levels was small, the success rate decreased and the processing time increased. However, accuracy and precision were practically independent of the number of levels. The registration speed was related to the number of levels. The registration slowdowns were 32%, 9%, and 6% for 1, 2, and 3 levels, respectively. The standard implementation (with 4 levels) was the fastest.

For subsampling, the success rate was maximized when the number of sampling frequencies (subsampling levels) was 2. Further increasing the number of sampling frequencies did not improve the success rate. In fact, the success rate decreased when the number of sampling frequencies was 3 and 4, probably because the interdependency among neighboring voxels decreased. The means and SDs of δt_{y} and δt_{z} were large, indicating that the subsampling strategy was not accurate and precise and thus was not ideal. The registration slowdowns were 27%, 16%, 24%, and 26% for 1, 2, 3, and 4 sampling frequencies, respectively. On average, the multiresolution approach was faster than the subsampling approach.

The multiresolution and subsampling registrations were not identical for 1 resolution level and 1 subsampling level. This occurs because the voxels in the high-resolution MR image are averaged in the multiresolution technique but are directly sampled in the subsampling technique. The nuclear image in either case was the same.

In the multiresolution registration, the neighboring voxels were averaged. Averaging was slower than pure sampling but helped the overall registration performance.

### Segmentation Effects

A meaningful segmentation may depend on the image content and the properties of the voxel-value histograms (13). A fixed segmentation of this kind would not be applicable to all images. Nevertheless, our segmentation was based on the percentile of voxel values. If the voxel value was less than some percentage of the maximum voxel value (background noise, for example), it was clamped. If the voxel value was larger than some percentage of the maximum voxel value (hot spots in SPECT and PET, for instance), it was clamped. We clamped only 1 end, although low-end and high-end clamping can be applied simultaneously.

The statistics of misregistration parameters are shown in Table 6. The data reveal that clamping the low-end voxels had a more dramatic effect on the registration performance. For example, when we clamped voxels whose values were <10% of the highest voxel value, the success rate was reduced. For <5% and <10% low-end clamping, the SDs of some registration parameters increased, indicating that different parameters have different sensitivities to low-end clamping. Clamping the high-end voxels had almost no effect on the registration performance (if more voxels had been clamped, the performance would have been degraded). The difference in the influence of low-end and high-end voxels was caused by the difference in the population of low-end and high-end voxels.

## CONCLUSION

The performance of mutual-information maximization for registration of MR and SPECT brain images has been evaluated using different implementation strategies, including interpolation methods, histogram bin strategies, optimization algorithms, multiresolution or subsampling schemes, and segmentation options. Our results indicate that different implementation strategies have a disparate performance in terms of speed, accuracy, precision, and success rate. On the basis of this study, we suggest that the best implementation will use trilinear interpolation, adaptively changing the number of bins when estimating pdf, using the simplex optimization algorithm, and exploiting a multiresolution strategy with as many levels as possible. Segmentation can be combined with the registration process, which has to be problem specific. Although this recommendation is based on the study of MR and SPECT images, we hope it will also be applicable to the registration of images from other modalities.

## Acknowledgments

The authors thank Dr. Michael Hartshorne, who provided the image files used in this study. Thanks are also due to the anonymous referees for helpful and constructive comments and suggestions and to Tom Brendza for proofreading the manuscript.

## Footnotes

Received May 10, 2001; revision accepted Oct. 15, 2001.

For correspondence or reprints contact: Yang-Ming Zhu, PhD, Nuclear Medicine Division, Philips Medical Systems, 595 Miner Rd., Cleveland, OH 44143.

E-mail: yang-ming.zhu{at}cle.philips.com