## Abstract

Images of the inside of the human body can be obtained noninvasively using tomographic acquisition and processing techniques. In particular, these techniques are commonly used to obtain images of a γ-emitter distribution after its administration in the human body. The reconstructed images are obtained given a set of their projections, acquired using rotating gamma cameras. A general overview of analytic and iterative methods of reconstruction in SPECT is presented with a special focus on filtered backprojection (FBP), conjugate gradient, maximum likelihood expectation maximization, and maximum a posteriori expectation maximization algorithms. The FBP algorithm is faster than iterative algorithms, with the latter providing a framework for accurately modeling the emission and detection processes.

The basic principle of nuclear medicine imaging is the following: a γ-emitter–labeled pharmaceutical is administered to a subject and an external device, the gamma camera, detects the radioactivity stemming from the body, from 1 or several angles of view. The image obtained at 1 angle of view is the projection of the 3-dimensional (3D) distribution onto the 2-dimensional (2D) detector plane. Because of the projection operation, no information regarding the depth at which disintegrations occur is available; moreover, activities stemming from separate structures may overlap each other on the detector plane, and the contrast may be low. With only 1 projection image, it is impossible to determine the activity distribution because an infinite number of distributions can yield the same projection. It is as difficult as finding 2 values knowing only their sum. However, the overlap observed in the projections depends on the relative positions of the detector and of the structures inside the body. So, more information about the relative positions can be obtained by acquiring projections over a large number of angles of view around the subject. The basic idea of SPECT is to obtain, as accurately as possible, an image of the γ-emitter distribution in any slice of the body, using projections of this image acquired by a rotating gamma camera from several angles of view.

Although many different algorithms for SPECT exist (1), this article presents the basics of the more widely used algorithms: filtered backprojection (FBP), conjugate gradient (CG), maximum likelihood expectation maximization (MLEM), and maximum a posteriori expectation maximization (MAPEM) algorithms. It is not an exhaustive presentation of reconstruction algorithms and, in particular, it does not deal with the 2D Fourier reconstruction (2,3), the algorithms using linograms (4,5), Chebyshev polynomials (6), nonlinear backprojection (7), or those used when projections are acquired with a pinhole collimator (8). We shall not specifically address the reconstruction with fanbeam collimators, because in this case the projection data are usually rebinned to yield parallel projections (9) and, hence, the same reconstruction algorithm can be used for both sets of projections. Also, physical phenomena such as attenuation, scattering, and so forth are not under consideration here. Presentations of image reconstruction, without mathematics, can be read before reading this article (10,11). The appendices at the end of this article contain some important equations, very useful to better understand the principles described in the body of the text. In-depth, mathematic presentations of principles of CT can be found (1,12,13).

## PRESENTATION

The 3D dataset of γ-emitter distribution is obtained in SPECT by piling up many slices usually reconstructed independently. For this reason, we will consider in the following the reconstruction of 1 slice only. Data acquisition is schematically illustrated in Figure 1. The detector rotates around the object of interest and allows one to observe the pattern of γ-emission in the field of view for many angles. We define *g*(*s*, θ) as the number of scintillations detected at any location *s* along the detector when the detector head is at an angular position θ. We also define the quantity *f*(*x*, *y*) as the estimated number of photons emitted at any point (*x*, *y*) of the transverse slice in the field of view. This unknown quantity is assumed to be proportional to the tracer concentration we are interested in. The function *g* is the projection of *f* onto the crystal as allowed by the collimator. This means that *g*(*s*, θ) is the sum of the radioactive counts recorded in any time interval at point *s* when the detector is at angle θ. The collimator defines the kind of projection (e.g., an orthogonal projection for parallel collimators) and determines the direction of the incident photon for any scintillation in the crystal. The difficulty is that photons emitted at different depths, but along the same direction, can potentially produce scintillations at the same location in the crystal; thus, the distance between the emission site and the scintillation site is unfortunately unknown. As a consequence, the amount of information brought by only 1 projection is insufficient to obtain an image of the tracer distribution in the organ of interest, because a large number of radioactive distributions can generate the same pattern in a single projection. Fortunately, the number of possible solutions can be reduced by acquiring projections for many distinct angular positions of the detector.
Moreover, as the number of projections increases, the possible solutions are increasingly alike.

At the end of the acquisition process, each point of the detector, for each angular position, contains the number of scintillations, or counts. A common representation for the projection data *g* corresponding to a slice is the sinogram (Fig. 2). A sinogram is a 2D image, in which the horizontal axis represents the count location on the detector, and the vertical axis corresponds to the angular position of the detector. The corresponding number of counts is assigned to each point of the sinogram. The problem is: Given a sinogram *g*, what is the distribution of radioactivity *f* in the slice of interest? We shall first describe the projection process when a parallel collimator is used and then see how an estimate of the activity distribution in the slice can be found using the FBP, CG, MLEM, and MAPEM algorithms.

## PROJECTION OPERATOR

### Geometric Considerations

The collimator allows only photons whose direction is parallel to the axis of its holes to be potentially detected. For each detector angle θ, and for each location *s* on the detector, this direction is defined by a line *D*′ whose equation is (Appendix 1):
Eq. 1: *x* cos θ + *y* sin θ = *s*

Let us first consider the projection operation, giving the number of counts detected in any point of the detector line *g*(*s*, θ) as a function of the number of counts *f*(*x*, *y*) emitted in any point of the field of view (here, we consider the ideal case, without noise, attenuation, scattering, numerical approximations, and so forth).

### Radon Transform

Mathematically, the projection operator can be defined by the Radon transform (13,14). The Radon transform *g*(*s*, θ) of a function *f*(*x*, *y*) is the line integral of the values of *f*(*x*, *y*) along the line inclined at an angle θ from the *x*-axis at a distance *s* from the origin:
Eq. 2: *g*(*s*, θ) = ∫_{D′} *f*(*x*, *y*) d*u*

Because an integral is basically a sum of values, the value *g*(*s*, θ) is the sum of the values *f*(*x*, *y*) along *D*′. For this reason, *g*(*s*, θ) is called a ray-sum. The variable *u* defines the location of the points to be summed (the points along *D*′).

Because computers can only deal with a finite number of elements in the detector and in the field of view, *g*(*s*, θ) and *f*(*x*, *y*) are in practice functions of discrete variables—that is, the variables *s*, θ, *x*, and *y* have a finite number of possible values. In this presentation, the elements of the slice are the pixels, and each point of measurement on the detector, for each projection angle, is called a bin. So the number of bins equals the number of points of measurement multiplied by the number of angles. Figure 3 presents an example of a discrete projection for a 3 × 3 image at 2 angles. The result is a sinogram with 2 rows (corresponding to 2 projection angles) and 3 columns (corresponding to 3 points of measurement on the detector), so there are 6 bins.

From a mathematic point of view, the sets of values in the sinogram and the reconstructed slice can be considered as matrices or vectors. A matrix is a set of values organized in rows and columns; a vector is a set of values in 1 column. Readers unfamiliar with matrices should pay special attention to the easy concept of matrix product defined in Appendix 2, because it is a fundamental tool to understand image reconstruction. In Table 1 is listed the notation we will use throughout this presentation.

In the following, we will consider the sinogram and the slice as 2 vectors; the vectors are formed by stacking the rows of the matrices. In these vectors, the location of a bin is known by its index *i* and the location of a pixel by its index *j*. It can be shown that the vector *g* is the matrix product of matrix *A* and vector *f* (Appendix 3). The value *g*_{i} of any bin is a weighted sum of the *m* pixel values in the image:
Eq. 3: *g*_{i} = ∑_{j=1}^{m} *a*_{ij}*f*_{j}

This is the discrete formula for the projection operation, and the essential point here is that the discrete projection operation can be defined as a matrix product. This product can be concisely written as *g* = *Af*, where *A* is called the forward projection operator (the adjective “forward” is usually omitted). In other words, the projection operation allows one to find the sinogram given the slice. An element of the matrix *A* is located in this matrix according to its row and its column. For example, the element *a*_{ij} is located at the *i*th row and *j*th column of the matrix. The point here is that any *a*_{ij} can be seen as a weighting factor representing the contribution of pixel *j* to the number of counts detected in bin *i*, or as the probability that a photon emitted from pixel *j* is detected in bin *i*. The use of zeros or ones in the matrix *A* can be interpreted as the binary choice “a given pixel contributes or does not contribute to the number of counts detected in any bin.” However, to get a more realistic model, we want to be able to modulate this contribution. To do so, the *a*_{ij}s may not be necessarily equal to 0 or 1 but can be any real values between 0 and 1. As we shall see below, the correct determination of the *a*_{ij}s is important because the iterative algorithms use the projection operation. The values are carefully chosen to take into account the geometry of acquisition and, more precisely, the fact that a variable fraction of a pixel contributes to the counts detected in a given bin, depending on their relative positions and the angle of acquisition. Moreover, the camera response (that especially depends on the collimator) and physical phenomena such as attenuation and scatter can be efficiently modeled by choosing the appropriate values for the *a*_{ij}s. The capability to account for physical phenomena and for a nonideal imaging system is one of the advantages of iterative approaches compared with the FBP method.
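As an illustration, the matrix product *g* = *Af* can be sketched numerically (a hypothetical 2 × 2 image projected at 2 angles onto a 2-bin detector, with each *a*_{ij} simply set to 0 or 1):

```python
import numpy as np

# Hypothetical 2 x 2 image f (rows stacked into a vector of m = 4 pixels),
# projected at 2 angles (0 and 90 degrees) onto a 2-bin detector (n = 4 bins).
f = np.array([3.0, 1.0,
              2.0, 4.0])

# Row i of A holds the weights a_ij for bin i; in this idealized model a
# pixel either fully contributes to a bin (1) or not at all (0).
A = np.array([
    [1, 1, 0, 0],   # bin 1: sum of the top row       (angle 0)
    [0, 0, 1, 1],   # bin 2: sum of the bottom row    (angle 0)
    [1, 0, 1, 0],   # bin 3: sum of the left column   (angle 90)
    [0, 1, 0, 1],   # bin 4: sum of the right column  (angle 90)
], dtype=float)

g = A @ f           # the discrete projection operation
print(g)            # [4. 6. 5. 5.]
```

A more realistic model would only change the contents of *A* (fractional weights, attenuation, camera response); the projection operation itself remains a matrix product.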

Theoretically, direct methods exist to solve the system *g* = *Af*. A method called direct inversion consists in the determination of the inverse of *A*, denoted *A*^{−1}, because *f* = *A*^{−1}*g*. This method has several flaws: (a) It is computationally intensive for current computers, even for 64 × 64 images; (b) *A*^{−1} may not exist; (c) *A*^{−1} may not be unique; (d) *A* may be ill-conditioned (i.e., small changes in the data *g* may produce large differences in the result *f*, for example, when *g* is noisy).

Unfortunately, in practice, the matrix *A* is ill-conditioned, because of the noise in the projections. Thus, the direct methods are not widely employed. We shall present now the algorithms commonly used in SPECT: the FBP, CG, MLEM, and MAPEM algorithms.

## ANALYTIC RECONSTRUCTION METHODS

### Backprojection Operator

Associated with the projection, the backprojection operation is defined as:
Eq. 4: *b*(*x*, *y*) = ∫_{0}^{π} *g*(*s*, θ) dθ

Backprojection represents the accumulation of the ray-sums of all the rays that pass through any point *M*(*x*, *y*). Remember that for a given point *M* and for a given projection angle θ, *s* is given by Equation 1. Applying backprojection to projection data is called the summation algorithm. In ideal conditions (in particular, without attenuation), the projections acquired at angles between π radians (180°) and 2π radians (360°) do not provide new information, because they are simply mirror images of the projections acquired at angles between 0 and π. This is why the orbit can here be limited to π radians. This redundancy of information is illustrated in Figure 2, where the lower half of the sinogram is identical to the upper half after symmetry along the vertical axis (inverting the left and the right).

Replacing the integral in Equation 4 by a sum, the implementation for a discrete backprojection is given by:
Eq. 5: b̃(*x*, *y*) = ∑_{k=1}^{p} *g*(*s*_{k}, θ_{k})Δθ

where *p* is the number of projections acquired over π radians, θ_{k} is the *k*th angular position of the detector, *s*_{k} is the location along the detector given by Equation 1, and Δθ is the angular step between 2 successive projections (Δθ = π/*p*). In the following, for the sake of simplicity, we will use 1/*p* as the angular step and will ignore the factor π. Equation 5 can be interpreted as follows: The goal is to find b̃(*x*, *y*), the result of the backprojection at point *M*(*x*, *y*). For each angle θ_{k}, using Equation 1, find the location *s*_{k} (on the detector) that is the projection location of point *M*(*x*, *y*). Add the value *g*(*s*_{k}, θ_{k}) to the current value of point *M*(*x*, *y*) (the initial value should be 0). Repeat this process for all angles, and divide the sum by the number of projection angles. A simplified example of backprojection can be seen in Figure 4. Distinct images can yield the same projection(s), but only for a limited set of angles; if 2 images are distinct, then one can find 1 or more angles at which their projections are different (Fig. 5). An infinite number of projections is theoretically required to perfectly reconstruct an object. When the number of projections is small relative to the matrix size, a particular artifact may be visible: the star (or streak) artifact. An illustration of this artifact is presented in Figure 6. It can be reduced by increasing the number of acquired projections, or by interpolation (15).
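The summation algorithm can be sketched as follows (a pixel-driven, nearest-bin version; assumed geometry: detector bins centered on the image, rotation about the image center):

```python
import numpy as np

def backproject(sinogram, thetas, size):
    """Summation algorithm, pixel-driven: for each pixel M(x, y) and each
    angle theta_k, compute the projection location s_k from Equation 1,
    take the ray-sum in the nearest bin, accumulate, and finally divide
    by the number of angles p."""
    b = np.zeros((size, size))
    center = (size - 1) / 2.0
    n_bins = sinogram.shape[1]
    for k, theta in enumerate(thetas):
        c, s = np.cos(theta), np.sin(theta)
        for iy in range(size):
            for ix in range(size):
                x, y = ix - center, iy - center
                s_k = x * c + y * s                        # Equation 1
                b_idx = int(round(s_k + (n_bins - 1) / 2.0))
                if 0 <= b_idx < n_bins:
                    b[iy, ix] += sinogram[k, b_idx]
    return b / len(thetas)

# A point source at the center projects to the central bin at every angle;
# backprojecting spreads each ray-sum along its whole ray, producing the
# characteristic star pattern around a blurred maximum at the center.
thetas = np.linspace(0, np.pi, 8, endpoint=False)
sino = np.zeros((8, 5))
sino[:, 2] = 1.0
b = backproject(sino, thetas, 5)
```

In practice the bin value would be interpolated between the 2 closest bins rather than taken from the nearest one.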

The contribution of a given pixel to a given bin is variable, because it depends on their relative positions and the angle of acquisition. This has to be taken into account during the backprojection. At least 3 practical solutions exist: With the ray-driven method (16), the value in each bin (i.e., the ray-sum) is added to the image at pixels the ray goes through, but the addition is weighted according to the ray length in each pixel (Fig. 7A). With the pixel-driven approach, the projection site of the center of each pixel is calculated; the ray-sum value at this particular projection site is determined by interpolation using the values in the closest bins (Fig. 7B). A third approach is implemented in 3 steps: A line of the sinogram is replicated into the columns of a matrix (i.e., the line is backprojected). Then, this matrix is rotated about the center of rotation of the imaging system. Finally, this matrix is added to the matrix of the image under reconstruction (17).

The main point of this part is that the backprojection operation is not the inverse of the projection operation. This means that applying backprojection to the ray-sums *g*(*s*, θ) does not yield *f*(*x*, *y*) but a blurred *f*(*x*, *y*) (Figs. 8 and 9). This problem is due to the fact that during the backprojection process, each bin value is attributed to all pixels that project onto that bin and not only to the pixel(s) the activity is coming from (because the source location is unknown). In the following, we shall see how to reduce this blur.

### Backprojection Filtering and FBP

It is important to understand the concept of spatial frequency to grasp the role of filtering in image reconstruction. The frequencies of an image are similar to the frequencies of a sound. A high frequency in a sound is heard when its amplitude varies quickly over a given period. A high frequency in an image is seen when its amplitude in any direction varies quickly over a given distance (and that is why the frequencies in an image are called spatial frequencies). A checkerboard with 4 large squares or with 64 small squares is an example of an image with low or high spatial frequencies, respectively. Any image will usually include several frequencies, when there are large and small objects in it. Using a mathematic tool called the Fourier transform, an image can be split up into several components, each corresponding to a given frequency. For the sake of simplicity, we shall consider here only 2 components, the low- and high-frequency components. If we add the 2 components, we will get the original image back. Let us now imagine that the image is rebuilt by adding the 2 components, but suppose that, before the addition, we change the relative weights of the 2 components: For example, we may divide the low-frequency component by a factor of 5 to decrease the blur that is typically present in the low-frequency components. Thus, the image has been filtered to reduce the amplitude of the low-frequency component (the same way as you reduce the bass of your hi-fi using its graphic equalizer), and the result is an image in which the edges are better defined (Fig. 10). Coming back to our problem of image reconstruction, a first solution to decrease the blur is to backproject the projection dataset (using the summation algorithm) and then to filter the result: This is the backprojection filtering (BPF) algorithm. In practice, the image is split up into a large number of frequency components, and a filter defines the weight assigned to each of the components.

With the more commonly used FBP algorithm, the order of filtering and backprojection is reversed: Each line of the sinogram is filtered first, and the filtered projections are backprojected (18). This is mathematically expressed as:
Eq. 6: *f*(*x*, *y*) = ∫_{0}^{π} ĝ(*s*, θ) dθ

where ĝ(*s*, θ) is *g*(*s*, θ) filtered with the ramp filter, which gives each frequency component a weight proportional to its frequency. Notice that with this approach, the amplitude of the low-frequency components is reduced in the sinogram, row by row, before the backprojection. The action of filtering in FBP can be understood as follows: After filtering, the projections may contain negative values; the blur is reduced because negative and positive values cancel each other in the vicinity of the edges of the image, so that the edges are sharpened (19) (Fig. 11). Therefore, in the reconstructed image the high-frequency components corresponding to the sharp edges in the image are enhanced, but unfortunately much of the image noise is present at these frequencies. This problem can be partially solved by replacing the ramp filter with a filter that also reduces the weight of the highest-frequency components (Fig. 12). As always, there are trade-offs involved; the reduction in noise comes at the cost of a loss of spatial resolution. More details on image filtering in nuclear medicine can be found elsewhere (18–20).
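The ramp filtering of one sinogram row can be sketched with a discrete Fourier transform (a bare sketch; practical filters combine the ramp with a low-pass window to limit noise amplification):

```python
import numpy as np

def ramp_filter(projection):
    """Filter one sinogram row: transform to the frequency domain, weight
    each component by the absolute value of its spatial frequency (the
    ramp), and transform back."""
    freqs = np.fft.fftfreq(projection.size)      # spatial frequencies
    return np.real(np.fft.ifft(np.fft.fft(projection) * np.abs(freqs)))

row = np.array([0.0, 1.0, 2.0, 4.0, 2.0, 1.0, 0.0, 0.0])
filtered = ramp_filter(row)
# The ramp gives zero weight to the zero-frequency component, so the
# filtered row has zero mean and necessarily contains negative values
# around the peak; these cancel the blur during backprojection.
```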

## ITERATIVE RECONSTRUCTION METHODS

### General Concept of Iterative Methods

We are interested in finding a vector *f* that is a solution of *g* = *Af*. The principle of the iterative algorithms is to find a solution by successive estimates. The projections corresponding to the current estimate are compared with the measured projections. The result of the comparison is used to modify the current estimate, thereby creating a new estimate. The algorithms differ in the way the measured and estimated projections are compared and the kind of correction applied to the current estimate. The process is initiated by arbitrarily creating a first estimate—for example, a uniform image initialized to 0 or 1 (depending on whether the correction is carried out in the form of an addition or a multiplication).

For illustration purposes, we shall start with the additive form of the algebraic reconstruction technique (ART) as an example of an iterative algorithm (21). The iterative process is given by:
Eq. 7: *f*_{j}^{(k+1)} = *f*_{j}^{(k)} + (*g*_{i} − ∑_{j=1}^{N} *f*_{j}^{(k)})/*N*

where *f*_{j}^{(k)} and *f*_{j}^{(k+1)} are the current and the new estimates, respectively; *N* is the number of pixels along ray *i*; ∑_{j=1}^{N} *f*_{j}^{(k)} is the sum of counts in the *N* pixels along ray *i*, for the *k*th iteration; and *g*_{i} is the measured number of counts for ray *i*.

Observing Equation 7, we see that (a) the new estimate is found by adding a correction term to the current estimate and (b) the comparison method consists in the subtraction of the estimated projections from the measured projections.

Notice that when the projections of the current estimate are close to the measured projections, the correction term is close to zero. This algorithm is applied to a 2 × 2 image (Fig. 13). The solution is found by successively applying Equation 7 to the measured projections. The algorithms described below work basically the same way.
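A sketch of this process (with hypothetical measured ray-sums; the 4 rays are the 2 rows and 2 columns of a 2 × 2 image, matching 2 projection angles):

```python
import numpy as np

# rays[i] lists the pixel indices along ray i; g[i] is the measured ray-sum.
rays = [[0, 1], [2, 3], [0, 2], [1, 3]]          # 2 rows, then 2 columns
g = np.array([4.0, 6.0, 5.0, 5.0])               # projections of [3 1; 2 4]

f = np.zeros(4)                                  # initial estimate
for _ in range(20):                              # passes over all rays
    for i, ray in enumerate(rays):
        residual = g[i] - f[ray].sum()           # measured minus estimated
        f[ray] += residual / len(ray)            # spread the correction evenly
print(f)                                         # [2. 2. 3. 3.]
```

Note that the algorithm settles on [2, 2, 3, 3], not on the image [3, 1, 2, 4] that generated the data: with only 2 angles, many images share these ray-sums (Fig. 5).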

In the following, the CG algorithm (18,22) and the MLEM algorithm (23–27) are discussed. Both are optimization methods—that is, they find the best estimate for the solution fitting a given criterion; in the CG algorithm, the criterion is the minimization of the difference between *g* and *Af*, whereas in the MLEM algorithm, the criterion is the maximization of the likelihood of the reconstructed image.

### Gradient and CG Algorithms

Let us imagine that we have a huge number of images. For each of these images, we compute their projections, and we compare these projections to the measured projections by computing their difference. For some images this difference is large, for other images it is small. So if we plot this difference as a function of the values in the images, we can visualize the plot as a mountainous terrain. The visualization is easier if we first assume that an image has only 2 pixels, because in this case we can create a 3D plot, where the 2 horizontal axes are used to indicate the value of each pixel, and the vertical axis is used to plot the difference. However, the reasoning still holds for a larger number of pixels in the images, although the visualization is much more difficult because the number of dimensions of the plot is equal to the number of pixels in the images. The difference is minimal at the point of the lowest altitude. Our goal is to find the location of that point, because the coordinates of that point are the pixel values of the solution image. In the following, we present a method to find this point.

Actually, computing the difference for thousands of images would be a tedious and lengthy process. Again, we use the analogy with the mountainous terrain to devise a more efficient method: Basically, from a starting point on the mountain, we look for a point of lower altitude, we jump to this point, and this process is repeated until the point with the lowest altitude is reached. The problem is to optimize the direction and the length of the steps, so that the lowest point is reached in a minimal number of steps. Let us imagine the contours—that is, the curved lines joining the points with the same altitude. The gradient lines are curved lines perpendicular to the contours and along which the slope is steepest. The most intuitive way to find, as quickly as possible, the point with the lowest altitude is to follow the steepest descent, whose direction is opposite to the gradient. So, from an arbitrary location, we step in that direction, and the step length can be optimally chosen so that we stop before we are about to ascend (when the direction is parallel to a contour). Then, the gradient found for the new location indicates the new direction to follow, the new step length is found, and the process is repeated until the lowest location is found (22) (Fig. 14). This algorithm is called the gradient algorithm, or steepest-descent algorithm.

Mathematically, this algorithm iteratively searches for *f* using the equation:
Eq. 8: *f*^{(k+1)} = *f*^{(k)} + α^{(k)}*p*^{(k)}

This means that the new estimate *f*^{(k+1)} is equal to the previous estimate *f*^{(k)} plus a vector *p*^{(k)} indicating the new direction (chosen to be opposite to the local gradient, and therefore directed toward the steepest descent), weighted by a coefficient α^{(k)} representing the step length. This coefficient can be chosen to optimize the convergence of the process.

However, the direction given by the gradient method is only locally the best, and the gradient algorithm is not very efficient (i.e., a large number of iterations is usually required). The more efficient conjugate gradient algorithm is basically identical, but it uses a different vector *p*^{(k)}. In this case, the direction of descent is not opposite to the gradient at the current location, but opposite to a combination of this gradient with the gradient found at the previous location. In both gradient and CG algorithms, the correction is additive (each new estimate is found by adding a quantity to the current estimate). Notice that these algorithms may generate negative values in the reconstructed image.

In theory, the conjugate gradient algorithm applied to *g* = *Af* converges in a number of iterations equal to the number *m* of unknowns of the system (i.e., the number of pixels here). Actually, because of rounding errors in the determination of the conjugate directions, the solution is not reached in *m* iterations. However, it can be shown that the convergence rate of reconstruction (reflected by the number of iterations required to reach a given estimate of the solution) can be increased by using *C*^{−1}*g* = *C*^{−1}*Af* instead of *g* = *Af*, where *C*^{−1} is a judiciously chosen matrix. In this case, *A* is said to be preconditioned (28,29).
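A minimal CG sketch, applied here to the normal equations *A*^{T}*Af* = *A*^{T}*g* so that the system matrix is symmetric (hypothetical data; no preconditioning):

```python
import numpy as np

def cg_reconstruct(A, g, n_iter):
    """Conjugate gradient on the normal equations A^T A f = A^T g.
    Each new direction p combines the current (negative) gradient with
    the previous direction; alpha is the optimal step length."""
    AtA, Atg = A.T @ A, A.T @ g
    f = np.zeros(A.shape[1])
    r = Atg - AtA @ f                 # residual = minus the gradient
    p = r.copy()                      # first step: steepest descent
    for _ in range(n_iter):
        if r @ r < 1e-12:             # converged
            break
        Ap = AtA @ p
        alpha = (r @ r) / (p @ Ap)    # optimal step length
        f = f + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p   # conjugate direction
        r = r_new
    return f

A = np.array([[1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
g = np.array([4.0, 6.0, 5.0, 5.0])
f = cg_reconstruct(A, g, 10)          # A @ f reproduces g
```

Note that the reconstructed values are not constrained to be nonnegative, as mentioned above.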

### MLEM Algorithm

Again, one wants to solve *g* = *Af*, where *g* is the vector of values in the sinogram, *A* is a given matrix, and *f* is the unknown vector of pixel values in the image to be reconstructed. Measurements are subject to variations due to the Poisson probabilistic phenomena of the radioactive disintegrations. As a consequence, a dataset *g* corresponds to a particular measurement; when the probabilistic phenomena are not taken into account, *f* is the particular solution corresponding to that particular measurement *g*. The goal of the MLEM algorithm is to find a “general” solution as the best estimate for *f*: the mean number of radioactive disintegrations f̄ in the image that can produce the sinogram *g* with the highest likelihood (23–27). This can be done using the Poisson law that allows one to predict the probability of a realized number of detected counts, given the mean number of disintegrations. Thus, each iteration of the algorithm is divided into 2 steps: in the expectation step (E step), the formula expressing the likelihood of any reconstructed image given the measured data is formed, and in the maximization step (M step), the image that has the greatest likelihood to give the measured data is found (26).

This leads to the MLEM algorithm as described by Lange and Carson (24) (Appendix 4):
Eq. 9: f̄_{j}^{(k+1)} = (f̄_{j}^{(k)}/∑_{i=1}^{n} *a*_{ij}) ∑_{i=1}^{n} [*g*_{i}/(∑_{j′=1}^{m} *a*_{ij′}f̄_{j′}^{(k)})]*a*_{ij}

The first image f̄^{(0)} can be a uniform disk enclosed in the field of view (30) or an image obtained by FBP (in this case the negative values, if any, must be set to zero or some small positive value). Notice that because the first image is positive (negative values do not make sense) and because each new value is found by multiplying the current value by a positive factor, any f̄^{(k)} cannot have negative values and any values set initially to zero will remain zero.

The EM algorithm can be seen as a set of successive projections/backprojections (31): The factor *g*_{i}/(∑_{j′=1}^{m} *a*_{ij′}f̄_{j′}^{(k)}) is the ratio of the measured number of counts to the current estimate of the mean number of counts in bin *i*. ∑_{i=1}^{n} (*g*_{i}/∑_{j′=1}^{m} *a*_{ij′}f̄_{j′}^{(k)})*a*_{ij} is the backprojection of this ratio for pixel *j*. Equation 9, which is to be applied pixel by pixel, can be extended to the whole image and interpreted as:
Eq. 10: new estimate = current estimate × normalized backprojection of (measured projections/projections of current estimate)

In other words, at each iteration *k*, a current estimate of the image is available. Using a system model (which may include attenuation and blur), it is possible to simulate what projections of this current estimate should look like. The measured projections are then compared with the simulated projections of the current estimate, and the ratio between the measured and simulated projections is used to modify the current estimate to produce an updated (and hopefully more accurate) estimate, which becomes iteration *k* + 1. This process is then repeated many times. The MLEM algorithm converges slowly and may require 50–200 iterations.
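This update translates into a short multiplicative loop (a sketch on the same hypothetical 2 × 2 system as above; a realistic *A* would carry fractional weights):

```python
import numpy as np

def mlem(A, g, n_iter):
    """MLEM: project the current estimate, take the ratio
    measured/estimated in every bin, backproject the ratios, normalize
    by the pixel sensitivities sum_i a_ij, and update multiplicatively."""
    f = np.ones(A.shape[1])                  # positive initial image
    sens = A.sum(axis=0)                     # sum_i a_ij for each pixel j
    for _ in range(n_iter):
        est = A @ f                          # projections of the estimate
        ratio = g / np.maximum(est, 1e-12)   # measured / estimated, per bin
        f = f * (A.T @ ratio) / sens         # backproject and normalize
    return f

A = np.array([[1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
g = np.array([4.0, 6.0, 5.0, 5.0])
f = mlem(A, g, 100)                          # stays nonnegative throughout
```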

The ordered-subsets expectation maximization (OSEM) algorithm has been proposed by Hudson and Larkin (32) to accelerate the reconstruction process using the MLEM algorithm. With this method, the set of projections is divided into subsets (or blocks). For example, if there were 64 projections (acquired at 64 angles about the patient), they might be divided into 16 subsets, each containing 4 projections. It is useful for each subset to contain projections equally distributed about the patient, to help promote convergence of the algorithm. The MLEM algorithm is then applied to each subset in turn, as a subiteration. The first full iteration is complete when all subsets have been processed. Use of 16 subsets in OSEM, as in this example, would accelerate convergence by nearly a factor of 16 compared with standard MLEM and thus considerably shorten the computing time needed for reconstruction.
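A sketch of OSEM on the same hypothetical system, with the bins grouped by projection angle into 2 subsets:

```python
import numpy as np

def osem(A, g, subsets, n_iter):
    """OSEM: the MLEM update is applied to one subset of bins at a time
    (a subiteration); one full iteration uses every subset once."""
    f = np.ones(A.shape[1])
    for _ in range(n_iter):
        for sub in subsets:
            As, gs = A[sub], g[sub]
            est = As @ f
            ratio = gs / np.maximum(est, 1e-12)
            f = f * (As.T @ ratio) / np.maximum(As.sum(axis=0), 1e-12)
    return f

A = np.array([[1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
g = np.array([4.0, 6.0, 5.0, 5.0])
subsets = [[0, 1], [2, 3]]            # bins of angle 0, then bins of angle 90
f = osem(A, g, subsets, 10)
```

Each subiteration updates the whole image from only part of the data, which is what produces the speedup over plain MLEM.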

### Maximum A Posteriori (MAP) Algorithms

Actually, the reconstructed images obtained using the MLEM algorithm tend to become noisy as the number of iterations increases, because noisy reconstructed images may yield projections that are very close to the measured noisy projections. Thus, the criterion “estimated projections have to be as close as possible to measured projections” is not the best criterion to obtain subjectively high-quality images. A better criterion may be “(a) estimated projections have to be as close as possible to measured projections AND (b) the reconstructed images have to be not too noisy.”

The introduction of prior knowledge as a constraint that may favor convergence of the EM process is called regularization. The prior, based on an assumption of what the true image is, is usually chosen to penalize noisy images. The goal is now to find f̄ so that requirements (a) and (b) above are simultaneously satisfied (Appendix 5). The maximization leads to an iterative scheme called the one-step-late (OSL) algorithm, described by Green (31):
Eq. 11: f̄_{j}^{(k+1)} = (f̄_{j}^{(k)}/[∑_{i=1}^{n} *a*_{ij} + β(∂/∂f̄_{j})*U*(f̄_{j}^{(k)})]) ∑_{i=1}^{n} [*g*_{i}/(∑_{j′=1}^{m} *a*_{ij′}f̄_{j′}^{(k)})]*a*_{ij}

where (∂/∂f̄_{j})*U*(f̄_{j}^{(k)}) (the prior term) is the derivative of a function *U*, called an energy function and chosen to enforce smoothing, and β is a value chosen to modulate the importance of the prior. This modified EM algorithm is called the MAPEM algorithm using the OSL approach.

To understand how the MAPEM algorithm favors smooth images, we shall use as an example the quadratic prior (33). The energy function *U* in this case is defined as:
Eq. 12: *U*(f̄_{j}) = ∑_{b∈N_{j}} (f̄_{j} − f̄_{b})²

where *N*_{j} is the set of pixels in the neighborhood of pixel *j*.

Observe that the only difference between the MLEM and MAPEM algorithms resides in the additional term β(∂/∂f̄_{j}^{(k)})*U*(f̄_{j}^{(k)}) in the denominator (compare Eqs. 9 and 11); if all pixels in the neighborhood of pixel *j* have the same value (i.e., the image in this neighborhood is smooth), then β(∂/∂f̄_{j}^{(k)})*U*(f̄_{j}^{(k)}) = 0 and the new value is the same as when the MLEM algorithm is used. If the value of pixel *j* is on average higher than that of its neighbors, then the derivative of the energy function is positive, the denominator is larger than ∑_{i=1}^{n} *a*_{ij}, and the new value f̄_{j}^{(k+1)} is forced to be lower than it would be if the MLEM algorithm were used. Conversely, if the value of pixel *j* is on average lower than that of its neighbors, the denominator is smaller than ∑_{i=1}^{n} *a*_{ij}, and the new value f̄_{j}^{(k+1)} is forced to be higher than it would be if the MLEM algorithm were used.
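The OSL update with the quadratic prior can be sketched as follows (hypothetical data; `neighbors[j]` lists *N*_{j}, and β = 0 reduces exactly to MLEM):

```python
import numpy as np

def mapem_osl(A, g, beta, n_iter, neighbors):
    """One-step-late MAPEM: the MLEM denominator sum_i a_ij is augmented
    by beta times the derivative of the energy function U, evaluated at
    the current estimate (hence 'one step late')."""
    f = np.ones(A.shape[1])
    sens = A.sum(axis=0)
    for _ in range(n_iter):
        est = A @ f
        ratio = g / np.maximum(est, 1e-12)
        # derivative of U(f_j) = sum over b in N_j of (f_j - f_b)^2
        dU = np.array([2.0 * sum(f[j] - f[b] for b in neighbors[j])
                       for j in range(f.size)])
        f = f * (A.T @ ratio) / (sens + beta * dU)
    return f

A = np.array([[1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
g = np.array([4.0, 6.0, 5.0, 5.0])
nb = [[1, 2], [0, 3], [0, 3], [1, 2]]   # 4-connected neighbors, 2 x 2 grid

f_ml = mapem_osl(A, g, 0.0, 200, nb)    # beta = 0: plain MLEM
f_map = mapem_osl(A, g, 0.1, 200, nb)   # beta > 0: smoother image
```

With β = 0.1 the spread between the highest and lowest pixel values is reduced compared with the MLEM result, at the cost of projections that no longer match the data exactly.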

First, note that the denominator can be negative, and, in this case, the new pixel value may become negative, which does not make sense when this value is a number of photons. Negative values can be avoided by keeping the weight of the prior β low enough or by using an algorithm that avoids the OSL trick—for example, the interior point algorithm of Byrne (34) or the algorithm using surrogate functions of De Pierro (35).

Second, the quadratic prior forces the image to be smooth across edges as well, and this leads to a blurring of the image. To overcome this problem, many other priors have been proposed. When information about the edges is not available (for example, from anatomic images), the basic assumption is that a very large difference between neighboring pixel values likely corresponds to an edge; in that case, the smoothing term β(∂/∂f̄_{j}^{(k)})*U*(f̄_{j}^{(k)}) has to be kept as close to zero as possible. One way to keep the smoothing term low when the difference f̄_{j} − f̄_{b} becomes large is to use a nonlinear function as the potential function, so that this term does not grow proportionally to the pixel differences (31,36–38). Another possibility is to compare the current pixel value with the median value of the neighborhood (39). If anatomic information is available (e.g., from CT or MRI), then a simple way to avoid edge smoothing is to consider for a neighborhood only the pixels that belong to the same anatomic region as the current pixel. However, because edges within scintigraphic and MRI or CT images may differ, more sophisticated approaches have been described (40–42).
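As one concrete illustration of such a nonlinear potential (a Huber-type function, chosen here as an example; the cited references describe several alternatives), the derivative of the quadratic potential grows without bound with the pixel difference, whereas the Huber-type derivative saturates at a threshold δ, so large differences, which likely sit on an edge, are smoothed far less:

```python
import numpy as np

def quadratic_dpot(t):
    """Derivative of the quadratic potential t**2 / 2.
    Grows proportionally to the pixel difference t, so edges
    (large t) are heavily smoothed."""
    return t

def huber_dpot(t, delta=1.0):
    """Derivative of a Huber-type potential (illustrative choice):
    equal to t for |t| <= delta, then saturated at +/- delta, so
    the smoothing term stays small across edges."""
    return np.clip(t, -delta, delta)
```

For a pixel difference of 10, the quadratic term contributes 10 to the correction, while the Huber-type term with δ = 1 contributes only 1.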

## CONCLUSION

Three algorithms are presented here: the FBP, CG, and MLEM algorithms. The first applications of FBP (43,44) and of iterative algorithms (45) were both described in the 1960s, but for a long time, and despite the advantages of iterative algorithms, the FBP algorithm was preferred because it was computationally faster (46). However, in clinical practice, the FBP algorithm has several flaws: a smoothing filter is usually required to reduce the noise, resulting in a loss of resolution, and an optimal filter has to be chosen (47,48) to provide the best trade-off between image noise and image resolution (although iterative algorithms may also require optimization—for example, by choosing the value of the β coefficient). The MAPEM algorithm can give better image quality than the FBP algorithm, as shown in a simulation study (49). Moreover, the streak artifact observed when a region of the body is highly radioactive relative to its neighborhood (e.g., the bladder) has been shown to be intense with FBP but strongly reduced when OSEM is used in bone SPECT studies (50). Given the computational power of modern computers, iterative reconstruction algorithms have become feasible for routine clinical use. Practical implementations of the FBP, CG, and other algorithms can be found in the Donner Algorithms for Reconstruction Tomography package, which includes a detailed user’s manual and program sources in Fortran and can be freely downloaded from http://cfi.lbl.gov/cfi_software.html. Also freely available is the Kansas University Image Processing (KUIM) package (http://www.ittc.ukans.edu/∼jgauch/research/kuim/html/), which provides C source code for many image processing tools—in particular, projection and backprojection. The commercial software packages IDL (Research Systems, Inc., Boulder, CO) and Matlab (The MathWorks, Inc., Natick, MA) also include projection and backprojection functions.

## APPENDIX 1

The goal of the calculations below is to find, in the field of view, the set of points *M*(*x*, *y*) that project perpendicularly on *D* at P (Fig. 15). Using the usual trigonometric formulas, we have:
Eq. A1.1 and
Eq. A1.2 Then, combining these 2 equations to eliminate *x*_{I} and *y*_{I}, we obtain:
Eq. A1.3 or equivalently
Eq. A1.4

The line *D*′ defined by the equation *s* = *x* cos θ+*y* sin θ perpendicularly projects on *D* at P. Thus, this line *D*′ is the set of points *M*(*x*, *y*) in the field of view whose photons can be detected at a distance *s* from the middle of the detector when the detector is at an angle θ.

## APPENDIX 2

Let us consider 2 matrices *B*(*n*, *m*) (i.e., with *n* rows and *m* columns) and *C*(*m*, *l*). The matrix product of *B* and *C* is the matrix *D* such that the value of any element *d*_{ij} (at the intersection of the *i*th row and *j*th column) of *D* is *d*_{ij} = *b*_{i1}*c*_{1j}+*b*_{i2}*c*_{2j}+…+*b*_{im}*c*_{mj} = ∑_{k=1}^{m} *b*_{ik}*c*_{kj}.

Example: if *B* = [[1, 2], [3, 4]] and *C* = [[5, 6], [7, 8]], then *D* = *BC* = [[1×5+2×7, 1×6+2×8], [3×5+4×7, 3×6+4×8]] = [[19, 22], [43, 50]].
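The definition above translates directly into code; this short Python sketch computes the product of 2 small matrices represented as lists of rows:

```python
def mat_mult(B, C):
    """Matrix product D = BC, with d_ij = sum_k b_ik * c_kj.
    B has n rows and m columns; C has m rows and l columns."""
    n, m, l = len(B), len(C), len(C[0])
    # The number of columns of B must equal the number of rows of C.
    assert all(len(row) == m for row in B), "inner dimensions must match"
    return [[sum(B[i][k] * C[k][j] for k in range(m)) for j in range(l)]
            for i in range(n)]
```

For example, `mat_mult([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`.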

## APPENDIX 3

In the example presented in Figure 3, we have the following relationships between values in the slice and in the sinogram:
Eq. A3.1 This set of equations can be equivalently expressed as a matrix product:
Eq. A3.2
The number of rows and columns of *A* equals the number of bins and pixels, respectively. The values in matrix *A* are appropriately chosen so that, using the definition of matrix product (Appendix 2), it can be verified that the product of *A* by *f* yields *g*_{1}, *g*_{2} … *g*_{6} as presented in Equation A3.1.
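As an illustration of Equation A3.2, the sketch below projects a 2 × 2 slice onto 2 views of 2 bins each. The slice values and system matrix are made up for this example (the actual matrix *A* of Figure 3 is not reproduced here): the first view sums along rows, the second along columns.

```python
import numpy as np

# Hypothetical 2x2 slice flattened to f = (f1, f2, f3, f4).
f = np.array([2.0, 0.0, 1.0, 3.0])

# Hypothetical system matrix: one row per bin, one column per pixel.
A = np.array([
    [1, 1, 0, 0],   # bin 1: top row of the slice
    [0, 0, 1, 1],   # bin 2: bottom row
    [1, 0, 1, 0],   # bin 3: left column
    [0, 1, 0, 1],   # bin 4: right column
])

g = A @ f           # projections g = A f: [2., 4., 3., 3.]
```

Each element of *g* is one projection bin, obtained exactly as in the matrix product of Appendix 2.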

## APPENDIX 4

The following is a simplified description of the MLEM algorithm as presented by Shepp and Vardi (23) and by Lange and Carson (24). In these works, the numbers of both the emitted and the detected disintegrations are assumed to be Poisson random variables.

Let us consider f̄_{j}, the mean number of disintegrations in pixel *j*, and the element *a*_{ij} of the matrix *A*, the probability that bin *i* detects a photon emitted from pixel *j*. The mean number of photons emitted from pixel *j* and detected by bin *i* is *a*_{ij}f̄_{j}. The mean number of photons ḡ_{i} detected by bin *i* is the sum of the mean numbers of photons emitted from each pixel:
Eq. A4.1 It can be shown that the number of photons emitted from the *m* pixels and detected by bin *i* is a Poisson variable. Thus, the probability of detecting *g*_{i} photons is given by:
Eq. A4.2 where ! denotes the factorial operator, defined for any positive integer *n* as *n*! = *n* × (*n* − 1) × … × 1, with 0! = 1. For example, if the mean number of photons detected in bin *i* is ḡ_{i} = 3, then the probability of detecting zero, 1, or 5 photons is, respectively, *P*(0) = (*e*^{−3}3^{0})/0! ≈ 0.050, *P*(1) = (*e*^{−3}3^{1})/1! ≈ 0.149, and *P*(5) = (*e*^{−3}3^{5})/5! ≈ 0.101. The maximal probability is reached when the number of counts equals the mean number, that is, when *g*_{i} = ḡ_{i}; here *P*(3) = (*e*^{−3}3^{3})/3! ≈ 0.224.
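The probabilities in this example can be checked with a few lines of Python implementing the Poisson law of Equation A4.2:

```python
from math import exp, factorial

def poisson(k, mean):
    """Probability of detecting k photons when the mean count is `mean`
    (Eq. A4.2): P(k) = exp(-mean) * mean**k / k!"""
    return exp(-mean) * mean ** k / factorial(k)

# Mean count of 3 photons in bin i, as in the text:
# poisson(0, 3) ~ 0.050, poisson(1, 3) ~ 0.149,
# poisson(5, 3) ~ 0.101, and the maximum poisson(3, 3) ~ 0.224
```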

The *n* Poisson variables *g*_{i} are independent, and the conditional probability *P*(*g*|f̄) of observing the vector *g* when the emission map is f̄ is the product of the individual probabilities *P*(*g*_{i}). The likelihood function *L*(f̄) is given by:
Eq. A4.3

The highest value of the likelihood *L*(f̄) is found by setting its derivative to zero. To simplify the maximization, one usually works with *l*(f̄) = ln (*L*(f̄)), where ln denotes the natural logarithm. Remembering that for any strictly positive real values *x*_{1}, *x*_{2} … *x*_{n} we have:

(1) ln (*x*_{1}*x*_{2} … *x*_{n}) = ln (*x*_{1}) + ln (*x*_{2}) + … + ln (*x*_{n}),

(2) ln (*e*^{x1}) = *x*_{1},

(3) ln (1/*x*_{1}) = −ln (*x*_{1}),

Equation A4.3 becomes:
Eq. A4.4 and using Equation A4.1 to introduce f̄_{j}, we obtain:
Eq. A4.5 This equation, the log-likelihood function, is of fundamental importance in the MLEM algorithm, because it allows one to calculate the probability of observing a projection dataset for any mean image f̄. Obviously, we want the image that has the highest probability of yielding *g*, so the essential point here is that the vector f̄ for which *l*(f̄) is maximal is considered the best estimate of the solution.

It can be shown that *l*(f̄) has 1 and only 1 maximum (26). This maximum is found when the derivative of *l*(f̄) is zero:
Eq. A4.6 One can also write:
Eq. A4.7 that is
Eq. A4.8 This leads to the iterative scheme of the MLEM algorithm as described by Lange and Carson (24):
Eq. A4.9
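The iterative scheme of Equation A4.9 can be sketched in a few lines of Python; the toy system matrix and iteration count in the usage below are illustrative assumptions, not part of the article:

```python
import numpy as np

def mlem(A, g, n_iter=100):
    """MLEM iterations of Eq. A4.9:
    f_j <- (f_j / sum_i a_ij) * sum_i a_ij * g_i / (A f)_i

    A : (n_bins, n_pixels) system matrix, a[i, j] = P(bin i | pixel j)
    g : (n_bins,) measured projections
    """
    f = np.ones(A.shape[1])        # uniform, strictly positive start
    sens = A.sum(axis=0)           # sum_i a_ij for each pixel j
    for _ in range(n_iter):
        # Backproject the ratio of measured to estimated projections.
        f = f / sens * (A.T @ (g / (A @ f)))
    return f
```

With noise-free projections the estimate converges toward the true activity map; with noisy data the likelihood still increases at each iteration, but the image grows noisier, which motivates the MAPEM regularization described in the main text.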

## APPENDIX 5

The prior knowledge of what the reconstructed image should be is introduced in the EM algorithm using Bayes’ theorem. It states that:
Eq. A5.1 where *P*(*g*|f̄) is the likelihood function as defined in Equation A4.3. *P*(f̄), the prior function, defines the a priori knowledge of the image; *P*(*g*) is the a priori probability distribution of the measurements; *P*(f̄|*g*) is the a posteriori probability distribution of the image.

Taking the logarithm of both sides of Equation A5.1 yields:
Eq. A5.2 A common Bayesian prior to enforce local smoothness is the Gibbs prior, with *P*(f̄)=*Ce*^{−βU(f̄)}, where *U* is an energy function of f̄, β is the weight of the prior, and *C* is a normalizing constant. Using Equation A4.3, we obtain:
Eq. A5.3 where *K* = ln *C* − ln *P*(*g*) is a constant independent of f̄.

Again, the derivative of *P*(f̄|*g*) is used to find the image f̄ maximizing *P*(f̄|*g*):
Eq. A5.4 The OSL algorithm uses the current image when calculating the energy. With this simplification, *U*(f̄_{j}) is replaced by *U*(f̄_{j}^{(k)}) in Equation A5.4.

## Acknowledgments

The author is grateful to Jacques Sau for the invaluable discussions about this manuscript. The author also thanks Michael A. King and Stephen J. Glick for their assistance in improving this manuscript.

## Footnotes

Received Jan. 14, 2002; revision accepted Jun. 10, 2002.

For correspondence or reprints contact: Philippe P. Bruyant, PhD, Division of Nuclear Medicine, University of Massachusetts Medical School, 55 Lake Ave. N., Worcester, MA 01655.

E-mail: Philippe.Bruyant@umassmed.edu

↵* NOTE: FOR CE CREDIT, YOU CAN ACCESS THIS ACTIVITY THROUGH THE SNM WEB SITE (http://www.snm.org/education/ce_online.html) THROUGH OCTOBER 2003.