## Abstract

A parametric image of myocardial perfusion (mL/min/g) is a quantitative image generated by fitting a tracer kinetic model to dynamic ^{13}N-ammonia PET data on a pixel-by-pixel basis. There are several methods for such parameter estimation problems, including weighted nonlinear regression (WNLR) and a fast linearizing method known as Patlak analysis. Previous work showed that sigmoidal networks can be used for parameter estimation of mono- and biexponential models. The method used in this study is a hybrid of WNLR and sigmoidal networks called nonlinear regression estimation (NRE). The purpose of the study is to compare NRE with WNLR and Patlak analysis for parametric imaging of perfusion in the canine heart by ^{13}N-ammonia PET. **Methods:** A simulation study measured the statistical performance of NRE, WNLR, and Patlak analysis for a probabilistic model of time–activity curves. Four canine subjects were injected with 740 MBq ^{13}N-ammonia and scanned dynamically. Images were reconstructed with filtered backprojection and resliced into short-axis cuts. Parametric images of a single midventricular plane per subject were generated by NRE, WNLR, and Patlak analysis. Small regions of interest (ROIs) were drawn on each parametric image (8 ROIs per subject for a total of 32). **Results:** For the simulation study, the median absolute value of the relative error for a perfusion value of 1.0 mL/min/g was 16.6% for NRE, 17.9% for WNLR, 19.5% for Patlak analysis, and 14.5% for an optimal WNLR method (computable by simulation only). All methods are unbiased conditioned on a wide range of perfusion values. For the canine studies, the least squares line fits comparing NRE (*y*) and Patlak analysis (*z*) with WNLR (*x*) for all 32 ROIs were *y* = 1.02*x* − 0.028 and *z* = 0.90x + 0.019, respectively. Both NRE and Patlak analysis generate 128 × 128 parametric images in seconds. **Conclusion:** The statistical performance of NRE is competitive with WNLR and superior to Patlak analysis for parametric imaging of myocardial perfusion. NRE is a fast nonlinear alternative to Patlak analysis and other fast linearizing methods for parametric imaging. NRE should be applicable to many other tracers and tracer kinetic models.

- parametric imaging
^{13}N-ammonia PET- sigmoidal networks
- artificial neural networks
- function estimation
- nonlinear regression

Estimation of the parameters of tracer kinetic models is increasingly important in clinical and research PET studies. In parametric imaging, a tracer kinetic model is fit to time–activity curve data on a pixel-by-pixel basis. The result is a quantitative image of the parameters of a physiologic or biochemical model. An important application of parametric imaging is myocardial perfusion imaging. ^{13}N-ammonia PET is widely used for the estimation of myocardial perfusion (1,2). Parke-Davis has sponsored a multicenter, 300-patient trial in which a compartmental model is being fit to dynamic ^{13}N-ammonia data in the resting and stressed heart.

Numerical methods for parametric imaging differ in statistical performance, computational cost, and generality. Weighted nonlinear regression (WNLR) approaches are the most general. Although the basic WNLR approach offers good statistical performance, it derives from approximations that introduce error (3). Many refined WNLR approaches have also been studied (4–7). However, WNLR methods are iterative and time-consuming to compute.

Patlak analysis is a type of fast linearizing method. These methods differ considerably, but all use integration and linear regression to speed solution (8–11). Linearizing approaches are accurate and fast for a variety of problems (1,2,12,13), but they still have drawbacks. Though some linearizing methods handle all linear compartmental models (11), none handles nonlinear tracer kinetic models (except for linearizing perturbation experiments). Also, linearizing methods often do not estimate the microparameters directly but require model-specific nonlinear algebra to recover them from the macroparameters. For these reasons, it would be desirable to have a fast, nonlinear method that learns parametric imaging of the microparameters from simulated data.

Previous work from this group (14,15) and others (16,17) showed that sigmoidal networks can be used for estimation of the parameters of compartmental models. Sigmoidal networks are a type of nonlinear function estimator. Function estimators are empirical tools from statistics, connectionism, and fuzzy systems that learn nonlinear input-output mappings from data (18,19). They have been used successfully for most major engineering problems including control, prediction, filtering, pattern recognition, and system identification. Whereas WNLR methods for parametric imaging approximate the maximum likelihood (ML) estimate, function estimation methods approximate conditional expectation. Here, the approaches are combined to approximate the ML estimate orders of magnitude faster than is otherwise possible. This hybrid method is called nonlinear regression estimation (NRE) because it uses a function estimator trained on optimal WNLR estimates.

The goal of this study is to compare the statistical performance and computational cost of NRE with WNLR and with Patlak analysis for parametric imaging of myocardial perfusion by ^{13}N-ammonia PET. A simulation study estimated the performance of all 3 methods for ischemic, resting, and hyperemic myocardium. Parametric images of 4 canine subjects with resting and ischemic myocardium were computed by NRE, WNLR, and Patlak analysis. The results of this study show that NRE is an accurate, very fast method for perfusion imaging and has the potential to be an accurate, very fast, nonlinear method for parametric imaging with many tracers and tracer kinetic models.

## MATERIALS AND METHODS

### Simulation of Time–Activity Data

This section describes simulation of time–activity curve data used for training NRE and testing all estimators. A 2-compartment model for the tracer kinetics of ^{13}N-ammonia (20,21) was used. The distribution of tracer is described by a continuous-time, linear time-invariant (LTI) system:
Eq. 1
Eq. 2
where the state variables are Q_{f}(*t*), the free activity in a relatively fast compartment (Bq/g), and Q_{s}(*t*), the metabolically trapped activity in a relatively slow compartment (Bq/g). The input to the model is C_{p}(*t*), the activity in the perfusing coronary vasculature (Bq/mL), which was assumed to be equal to the left ventricular blood activity. The parameters of the model are *F*, the myocardial perfusion (mL/min/g); *f*, the spillover from blood to tissue (which is unitless); *V _{f}*, the (actual) volume of distribution; and

*k*, the return rate constant. In practice, the volume of distribution and return rate constant are fixed to known values (

_{r}*V*= 0.8 mL/g and

_{f}*k*= 0 for these experiments).

_{r}The model of the input function is a modified version of the model of Thompson et al. (22). The modification is an asymptotic recirculation term that yields the model
Eq. 3
where C_{p}(*t*) is the activity in the vasculature (Bq/mL); α (unitless), β (s), and C_{max} (Bq/mL) are parameters of the original Thompson model; C_{0} (Bq/mL) and τ (s) are parameters of the modified model; and t_{0} (s) is the appearance time. The model is 0 before time t_{0}. In practice, the appearance time is fixed to a known value (5 s). Also, the time constant τ is automatically scaled so that the recirculation term reaches a maximum when the original model falls off to 5% of its peak. Therefore, there are only 4 independent input function parameters: α, β, C_{max}, and C_{0}.

Sampling errors were modeled as follows. The input function samples are temporal averages of the input function model:
Eq. 4
where **m̄** denotes a sequence of model output for *i* = 1..*m* − 1, where *m* is the number of time intervals. The effect of integration is significant for the input function model because we use left ventricular PET data rather than arterial sampling data. The blood time–activity curve is sparsely sampled in the typical PET protocol, but this is largely unavoidable. By contrast, the tissue time–activity curve is usually well sampled and piecewise linear over the observation intervals. In this case, ideal samples are equivalent to temporal averages, and the midpoint approximation applies
Eq. 5
where ȳ denotes a sequence of model output over the observation intervals and y(.) denotes Equation 2 evaluated at 1 time point.

There are several sources of random error in reconstructed PET time–activity curves. We assumed that noise is additive, independent, and gaussian with zero mean and with variance directly proportional to signal (1,3,7). Several noise models have been proposed for time–activity curve data. Other noise models are SD proportional to variance (also called constant coefficient of variation) (4,23) and noise equivalent counts (24).

Table 1 presents the ranges of 8 parameters used for simulation: 4 parameters of the blood time–activity curve signal model, 2 parameters of the tissue time–activity curve signal model, 1 parameter of the blood time–activity curve noise model, and 1 of the tissue time–activity curve noise model. The probability density for perfusion is an exponentially decaying density between the limit values. All other parameters have uniform density. The parameter ranges of the blood time–activity curve model were estimated from a set of 14 canine blood time–activity curves taken from studies at University of California, Los Angeles; none of these blood time–activity curves was from the 4 canine study subjects. A typical constrained regression strategy was used to fit the 14 blood time–activity curves.

To simulate time–activity curves, 8 parameter values were drawn from a standard linear congruential pseudorandom number generator with a period much larger than the sample size. Using these parameters, 100 ideal samples of the blood time–activity curve model were computed, 1 sample per second over the range 1–100 s. These were used as input to a discrete time system, the state transition matrix (STM) solution of a continuous-time LTI system with first-order hold (FOH) input. An FOH input is a theoretical function that reconstructs discrete samples of the input function. The STM solution is exact in the case that the input function is actually FOH. In practice, the STM solution is nearly exact if the discrete input samples are fast, noiseless, and ideal. That is the case in this study. Other general solutions to LTI systems (time-domain and frequency-domain) have similar restrictions. The output of the STM solution was 100 ideal samples of the 2-compartment model. The sampling scheme for experimental PET data is a set of 10 time intervals of 10 s each between 0 and 100 s. To convert simulated data to this sampling scheme, the input samples were integrated with a trapezoid rule whereas the output samples were handled with the midpoint approximation. Gaussian noise with zero mean and variance proportional to signal was added to both the output and integrated input signals. Simulated time–activity curves with negative values caused by the noise model were discarded, because these time–activity curves do not model the negative values that sometimes occur in reconstructed time–activity curves. (Conditioning on positivity of the time–activity curves introduces only a very small bias.) Figure 1 is an example of a simulated pair of blood and tissue time–activity curves.

### Weighted Nonlinear Regression

The most common approach to parametric imaging is likelihood inference. For independent gaussian errors, the ML estimate is a minimum of the log-likelihood or weighted residual sum of squares:
where ŷ_{i} are the elements of **ŷ**, which is the PET tissue time–activity curve; *ȳ _{i}*

**(p)**are elements of

**ȳ**

**(p)**, which is the model output;

**p**denotes the vector of parameters; ς

_{i}is the SD of

*ŷ*and ∥ ∥ is a weighted 2-norm. WNLR is the process of computing Equation 6 by an iterative method for optimization or nonlinear equations.

_{i}For Equation 6 to define the ML estimate, the ς_{i} values and C_{p}(*t*) must be known. But in practice, these are replaced with estimates. Because of the proportional variance noise model, the *ŷ*_{i} values are an estimate of the ς_{i} values up to a constant that does not affect WNLR. To reconstruct C_{p}(*t*), it was assumed that PET blood time–activity curves are fast, noiseless, ideal samples (3). The blood time–activity curves were linearly interpolated and then used as input to the STM solution with FOH. This implies the midpoint approximation, which does not strictly hold. The error introduced into WNLR by estimates for the ς_{i} values and C_{p}(*t*) is significant, but the method is still a good one. This is the standard WNLR method for parametric imaging. More advanced methods are considered in the Discussion.

WNLR for parametric imaging may be ill posed: The solution may not be unique (multiple local optima) or may not exist (nonconvergence or instability). Only the latter problem was experienced during this study, though the former may also arise in parameter estimation (14). Nonconvergence necessitates that prior knowledge be used. During this study, parameter ranges were constrained with simple bounds. In likelihood inference, these are deterministic constraints on the ML estimate. In Bayesian inference, they are uniform priors. Because the same bounds were used for estimation as for Monte Carlo simulation of the forward model (Table 1), WNLR approximates the maximum a posteriori probability estimate. For all WNLR, perfusion and spillover were estimated using a Levenberg-Marquardt method with simple bounds through active sets (25).

### Nonlinear Regression Estimation

Methods for function estimation or statistical learning have arisen in the fields of nonparametric statistics, neural networks, and fuzzy systems. Sigmoidal networks, nonparametric kernel estimators, and support vector machines are closely related methods that use a univariate nonlinear transformation (usually to a higher dimension) followed by linear regression on the basis functions (26–31). In the deterministic case, such methods approximate nonlinear functions. In the probabilistic case, they estimate functions from noisy data. In theory, if a function approximator with a set of input random variables is brought into mean-squared correspondence with an output random variable, then it approximates conditional expectation (18) Eq. 7

Conditional expectation is the mean-squared optimal estimator (32). Fuzzy systems also approximate conditional expectation (19).

In this study, sigmoidal networks were used:
Eq. 8
where i is the input, w and b are parameters, g(.) is a univariate sigmoidal function, and *s* is the number of basis functions. The superscript denotes the elements of a vector, and the subscript denotes the elements of an ordered set. Sigmoidal networks are shown graphically in Figure 2. Sigmoidal networks are often called feedforward artificial neural networks.

To approximate Equation 7 for parametric imaging of perfusion, a sigmoidal network would be trained with simulated time–activity curve data for inputs and known perfusion values for outputs using a mean squared error criterion. But instead of using the known perfusion values as network outputs, an optimal WNLR estimate was precalculated as follows. Whereas real-world data analysis requires estimates for C_{p}(*t*) and the ς_{i} values, simulation allows WNLR with the known values for C_{p}(*t*) and the ς_{i} values. This is very close to an optimal estimate. The only difficulty in learning the optimal WNLR estimate is that it is conditioned on the known ς_{i} values and C_{p}(*t*) in addition to the observed data. But the sigmoidal network compensates by approximating conditional expectation to yield
Eq. 9
where ĉ is the observed blood time–activity curve. The use of a function estimator to learn an optimal WNLR estimate is NRE, a fully nonlinear method for parameter estimation.

To compute the optimal WNLR estimate to be used for sigmoidal network training, an important modification to Equations 1 and 2 was used. The logarithm of perfusion (log-perfusion) was considered, rather than perfusion directly. To simulate time–activity curves, a uniform density was used over log-perfusion; this is an exponentially decaying density for perfusion (Table 1). Instead of estimating perfusion directly, log-perfusion and its variance were estimated with optimal WNLR.

To train a sigmoidal network for NRE, a training set of 20,000 pairs of blood time–activity curves and tissue time–activity curves was simulated. The optimal WNLR estimate of log-perfusion was computed for each pair. Ten samples of the blood time–activity curve and 10 samples of the tissue time–activity curve were the inputs to a sigmoidal network, and the optimal WNLR estimate of log-perfusion was the output. The network used a logistic transfer function and a linear output function (linear regression on the basis functions). To estimate network weights, a simple cross-validation strategy was used, in which 3 sets of samples were simulated: training, validation, and test samples. The latter 2 samples each consisted of 25% of the number of training samples (18). Using a conjugate gradient method, the weighted mean squared error (WMSE) was optimized between the network and the log-perfusion training data; the log-perfusion variance estimates were the weights. Convergence was defined as a decrease in the training set and an increase in the validation set for 20 consecutive iterations (additional convergence criteria were defined but did not result in termination). After convergence, the WMSE of the test set was computed as a final check against overfitting. The number of basis functions was chosen pragmatically. The initial estimate was 20 basis functions, the number of inputs (18). The number was sequentially increased by 4, monitoring the WMSE of the test set.

### Canine and Simulation Studies

For the simulation study, 2,000 pairs of blood and tissue time–activity curves were simulated for each of 7 perfusion values: 0.25, 0.50, 0.75, 1.00, 2.00, 3.00, 4.00 (mL/min/g). The other parameters were as in Table 1. Perfusion for each of the 7 sets of 2,000 samples was estimated by WNLR, NRE, Patlak analysis, and optimal WNLR. For Patlak analysis, the Patlak transform was used followed by unweighted linear regression; leading zeros and 1 early data point were removed, as is necessary for the method.

In the canine studies, ^{13}N-ammonia (740 MBq) was administered to anesthetized animals with 1 ligated vessel. The protocol for handling the subjects has been described (33). The PET data were reconstructed with filtered backprojection and resliced to achieve short-axis cuts. For each subject, the blood time–activity curve was obtained from an ROI in the left ventricular chamber and averaged over several planes. A parametric image was generated for a single midventricular short-axis cut per subject. After calculating parametric images, 8 small circular ROIs (29 pixels) were drawn for each subject, and the spatial average perfusion for each ROI was computed to yield a total of 32 ROI measurements of perfusion. No partial-volume correction was applied, because this correction can be applied to any parametric imaging method without modification. No metabolite correction or decay correction is needed for data to 100 s (34).

The numerical code is in Matlab 5.3 (The Mathworks Inc., Natick, MA). The platform is a 233-MHz Pentium II system with 128 MB of memory.

## RESULTS

### Simulation study

Table 2 presents the results of the simulation study conditioned on a wide range of perfusion values and marginal over all other parameters in the model. All methods are unbiased (by *t* tests) for all entries in Table 2. The median absolute value of the relative error is a nonparametric measure of total error due to both bias and variance. The ideal WNLR estimator provides a benchmark for performance. The bias and spread of the WNLR and Patlak estimators are similar to those of other reports (1,2).

### Canine Studies

Figure 3 compares parametric imaging by NRE, WNLR, and Patlak analysis in 1 canine subject. Visually, the NRE perfusion images are quite similar to the WNLR images for all 4 subjects. Figure 4 compares WNLR with NRE and Patlak analysis for 32 ROIs (8 ROIs per subject). The slope between Patlak analysis and WNLR is similar to that previously reported (2).

WNLR required about 2 h to process a 128 × 128 parametric image (between 5,000 and 7,000 time–activity curves are processed after thresholding the sum activity image). Patlak analysis required 15 s and NRE required 3 s. Initially, sigmoidal network training for NRE required about 4 h and terminated in 400 iterations with a WMSE of 0.012 for a final value of 52 basis functions. Simulation of time–activity curves for training and calculation of optimal WNLR estimates required 8 h for 20,000 time–activity curves. The 12-h training process need only be done once, after which NRE generates parametric images in 3 s.

## DISCUSSION

The simulation studies and canine studies show that the statistical performance of NRE is competitive with WNLR. NRE is accurate and stable because it combines the optimal WNLR method with the ability of the sigmoidal network to approximate conditional expectation. Many other refined WNLR methods exist, including those that use input function models (4,5,35), wavelet domain filters (36), and multivariate statistics (e.g., principal components (37), factor analysis (38), and independent components (39)). Feng et al. (6) validated a double modeling approach; O’Sullivan and Saha (7) used ridge regression. Like these methods, NRE achieves good statistical performance and can be combined with any refined method. However, NRE is the only nonlinear method that also speeds evaluation by orders of magnitude.

For fast parametric imaging, linearizing methods are usually used. The simulation and canine studies show that the statistical performance of NRE is superior to Patlak analysis, a well-validated linearizing method for perfusion imaging with ^{13}N-ammonia (2). Feng’s generalized linear least squares (GLLS) (11) is an accurate and fast linearizing method that is well validated for perfusion imaging (1). Other linearizing methods include weighted integration (9) and linear least squares (10). Linearizing methods compare favorably with WNLR for parametric imaging of cerebral metabolic rate of glucose (12) and cerebral blood flow (13). Thus, the statistical performance of linearizing methods can be quite good. The results of this study suggest that the performance of NRE may be competitive with any method.

The NRE approach has advantages over traditional WNLR. Because the optimal WNLR estimate is very stable, multiple parameters can be estimated accurately under noisy conditions. Also, the use of a sigmoidal network to approximate optimal WNLR speeds parametric imaging by orders of magnitude (once the sigmoidal network is trained). In principle, NRE is more complex than a typical sigmoidal network approach, but it has several advantages that actually make training simpler. Recall that in conventional sigmoidal network training, the known parameter values are used for training rather than the optimal WNLR values. The known parameter values can be viewed as noisy point estimates, but the NRE training data exhibit the statistical properties of optimal WNLR. Therefore, NRE with even the simple training procedure used in this study produces images that are very similar to WNLR images. The training approach is robust to the stopping criteria, the number of iterations, the initial conditions, and even the number of basis functions. Although the traditional approach can likely be used for parametric imaging as well, successful training would require a more sophisticated strategy including optimization of several measures of bias and spread with respect to the stopping criteria, the initial conditions, and the number of basis functions. When using a simple training scheme, we found that sigmoidal networks trained with the conventional approach yield biases as high as 52% for low perfusion values (though for high perfusion values the biases are competitive with NRE performance) and seem to exhibit more trend in the bias at low perfusion values than NRE. Most importantly, images generated by sigmoidal networks trained with a simple version of the conventional approach are not as visually similar to WNLR images as are the NRE images. Another advantage of NRE over a traditional learning approach is that it allows multiple model parameters to be estimated by multiple networks having 1 output each. This is an easier problem than estimating multiple model parameters with 1 network having multiple outputs, as is necessary for Equation 7. Determining whether a sigmoidal network trained using the conventional approach can achieve the performance of NRE is an interesting direction for future work.

Two techniques were used to optimize the statistical performance of NRE. First, log-perfusion was estimated so that perfusion could be estimated accurately over several orders of magnitude. This property is necessary so that ischemic, resting, and hyperemic myocardium may exist in the same image. (The canine measurements are of ischemic and resting myocardium because the low perfusion case is harder.) Parameters such as spillover that are estimated over approximately 1 order of magnitude can be estimated directly. Second, the NRE method uses the WMSE criterion for sigmoidal network training; the weights are the variance estimates provided by optimal WNLR. The WMSE criterion is uncommon in sigmoidal network training because weights are not usually available.

To date, no parametric imaging method is completely general. WNLR methods are the most general because they directly estimate the microparameters of all linear and nonlinear tracer kinetic models. However, they are too expensive to compute and are ill posed for some problems. Among fast linearizing methods, GLLS is quite general, handling all linear compartmental models. For each new model, however, this method still requires model-specific nonlinear algebra to recover the microparameters. Also, it is not applicable to nonlinear tracer kinetic models (except for linearized perturbation experiments).

NRE has the potential to be a general nonlinear method that learns parametric imaging of microparameters from data generated by a user-supplied forward model. The method handles high noise levels and a wide parameter range. This study’s method handles a wide range of possible blood time–activity curves, and any blood time–activity curve model (with suitable parameter ranges) can be used. New parameter ranges, new sampling schedules, and new models of the blood time–activity curve, tissue time–activity curve, and noise can be accommodated by training new networks.

To apply NRE to other problems, it must be decided how to choose the number of basis functions and samples for any new network. For function learning in general, the curse of dimensionality can lead to optimization pathologies such as ill posed problems and intolerable run times. By contrast, this study’s learning problem is well sampled: The input space is only 20-variate, thousands of samples can be generated, and the target function is relatively smooth. This allows for adoption of a simple cross-validation approach to learning that is typical of successful applications. No regularization is needed, such as weight penalties or network pruning (18), though this is crucial in more challenging applications (40). Because most parametric imaging problems are of low dimensionality, the method used in this study will likely be generally useful.

A related question is whether the optimal WNLR method can be applied to all models to generate training data for NRE. This model has only 2 parameters, and the risk of ill posed WNLR problems increases with the number of parameters. However, simulation shows that ill posedness most often is caused by noise in the tissue time–activity curve variance estimate or in the blood time–activity curve estimate. Because optimal WNLR avoids these problems, it will likely be applicable to many models of interest.

## CONCLUSION

The criteria for any parametric imaging methodology are statistical performance, computational cost, and generality. Although Patlak analysis and other linearizing methods can be fast and accurate, they are not completely general and may not match WNLR methods as closely as possible. This study shows that NRE is statistically competitive with WNLR and superior to Patlak analysis for parametric imaging of myocardial perfusion by ^{13}N-ammonia PET. NRE computes accurate myocardial perfusion images in 3 s.

NRE achieves the statistical performance of other refined WNLR methods by combining an optimal WNLR estimate with the ability of the sigmoidal network to approximate conditional expectation. NRE is a very fast method that learns parametric imaging of the microparameters from a forward model of data. Because the method is nonlinear, it has the potential to handle all nonlinear and linear tracer kinetic models.

## Acknowledgments

The authors thank Heiko Schoeder for providing the canine data and an analysis of the data, and the reviewers and editors for helpful suggestions. This study was supported by grant DE-FC03-87ER60615 from the Department of Energy, by the Ahmanson Foundation, and by Medical Scientist Training Program training grant from the National Institutes of Health.

## Footnotes

Received Sep. 5, 2000; revision accepted Jan. 11, 2001.

For correspondence or reprints contact: Sanjiv S. Gambhir, MD, PhD, Crump Institute for Molecular Imaging, University of California, Los Angeles, School of Medicine, B3-399 BRI, 700 Westwood Plaza, Los Angeles, CA 90095-1770.