Abstract
The aim of this study was to assess the reliability of the quantification of PET binding parameters of 2′-methoxyphenyl-(N-2′-pyridinyl)-p-^{18}F-fluoro-benzamidoethylpiperazine (^{18}F-MPPF) in a long-term test–retest study. Methods: Ten healthy volunteers underwent 2 dynamic ^{18}F-MPPF PET scans at an interval of 6 mo. As a methodologic control, 10 simulated datasets, including interindividual functional and anatomic variability, were also used to assess measurement variations in the absence of intraindividual variability. Indices of tracer binding were computed using 2 different models: (a) the simplified reference tissue model (SRTM) and (b) the Logan graphical model. The SRTM yields the binding potential (BP) index and the plasma-to-brain transport constants (R_{1}, k_{2}). The Logan model yields the distribution volume (DV). For both methods, the cerebellum was taken as the reference region. For both models, binding indices were calculated from time–activity curves extracted from regions of interest and, separately, for each voxel to generate parametric images. Results: Reliability indices (bias, variability, and intraclass correlation coefficient [ICC]) indicated good reproducibility: the percentage change in mean BP between test and retest was close to 1% in rich regions and 2% in poor regions. The typical error was around 7%. Mean ICC was over 0.70. The percentage change in mean DV was ±2.5%, with a typical error close to 6% and an ICC over 0.60. Conclusion: Our results show good reliability, with a reasonable level of intraindividual biologic variability that allows crossover studies with ^{18}F-MPPF in which small percentage changes are expected between test and retest measurements, both in group studies and for single-subject assessment.
Serotonin (5-HT) mediates a large variety of physiologic responses (development, pain, sleep, mood, eating, memory, and attention), behaviors (stress, aggression, panic, sexual behavior), and neuropsychiatric disorders (depression, sleep disturbance, eating disorders, anxiety, suicidal behavior, schizophrenia, obsessive–compulsive disorder) through one of the widest ranges of receptors known for any neurotransmitter (1). 5-HT_{1A}, the best-characterized subtype of the currently known 5-HT receptors, is closely involved in the pathogenesis of these disorders and, thus, represents an important target for drug therapy (2). Recently, the selective 5-HT_{1A} receptor antagonist 2′-methoxyphenyl-(N-2′-pyridinyl)-p-^{18}F-fluoro-benzamidoethylpiperazine (^{18}F-MPPF) has been successfully labeled with ^{18}F, and an increasing number of PET studies with ^{18}F-MPPF have been performed (3). In vivo exploration of the 5-HT_{1A} receptor subtype of the 5-HT neurotransmission system with the ^{18}F-MPPF PET radiotracer has revealed significant modulations of tracer binding due to pathologic (4–6) or pharmacologic actions in humans (7) and in animals (8,9).
The ^{18}F-MPPF radiotracer has been characterized in humans in terms of selectivity (10). In addition, modeling with a 3-compartment model confirmed that binding potential (BP) values were linearly correlated with the binding site density (B_{max}) and, therefore, could be considered a reliable index of local 5-HT_{1A} receptor concentrations (11). The effects of age and sex have been evaluated in the healthy population (12). However, unlike for other serotoninergic radiotracers (13–16), the reproducibility and variability of the measurement have not yet been assessed for this radiotracer. In several ^{18}F-MPPF studies, binding was measured at consecutive sessions and compared within each subject. In this way, issues associated with variations between subjects due to individual differences (intersubject variability) can be limited. However, resting physiology may vary within an individual (intrasubject variability), which limits the ability to detect significant changes between baseline and “postintervention” conditions. Moreover, bias and noise introduced by data acquisition, reconstruction, and correction processes, and by the simplified model used for binding parameter estimation (as reported for the 5-HT_{1A} radiotracer ^{11}C-WAY-100635 (14,15)), must be accounted for.
Therefore, without an estimate of test–retest reproducibility, it is difficult to accurately determine the clinical significance of pharmacologic or pathophysiologic changes in 5-HT_{1A} receptor status. In this context, reliability studies are crucial when one wants to account for the error term in measurements. Because, to our knowledge, no test–retest reliability study has been performed for ^{18}F-MPPF, the objective of our work was to evaluate the reproducibility and measurement error of ^{18}F-MPPF PET binding measures over a 6-mo period.
MATERIALS AND METHODS
Subjects
Ten healthy volunteers (5 females, 5 males; mean age ± SD, 30 ± 5 y; age range, 23–38 y) were selected to participate in the study. Subjects gave written consent to participate in the study, which was approved by the local ethics committee (Centre Léon Bérard, Lyon, France) in accordance with the Declaration of Helsinki. On the basis of a screening history and physical examination, all subjects were free of neurologic, psychiatric, cardiovascular, pleuropulmonary, or hematologic disease and met none of the following exclusion criteria: (a) treatment with neuroleptics, antiparkinsonian drugs, α-methyldopa, β-blockers, monoamine oxidase A or B inhibitors, tricyclic antidepressants, or thymoregulators; (b) pregnancy; (c) hormone replacement therapy; (d) consumption of recreational drugs (cannabis, ecstasy); (e) contraindication to MRI; (f) detection of a brain lesion on MRI. Before the PET scan, subjects were screened for depression using the General Health Questionnaire (GHQ-28) of Goldberg and Hillier (17). No subject with a score above the threshold for depression (7) was included. Scores ranged between 0 and 2 (mean ± SD, 0.3 ± 0.7). No significant difference in GHQ score was found between the 2 scans (Table 1).
MRI
A 3-dimensional multiplanar reconstruction anatomic MRI scan was performed on each subject, yielding a volume containing 130–170 transverse planes of 256 × 256 × 1 mm^{3} voxels.
PET Scan with ^{18}F-MPPF
Tracer Synthesis.
^{18}F-MPPF was obtained by nucleophilic fluorination of a nitro precursor, with a radiochemical yield of 20%–25% at the end of synthesis and a specific activity of 37–111 GBq/μmol (18,19).
Scanning Procedure.
Subjects underwent 2 ^{18}F-MPPF PET scans (test and retest) separated by a 6-mo period (mean delay ± SD, 27 ± 2 wk), randomly distributed throughout the year. Each PET session began at 1 pm. PET acquisition, correction, and reconstruction procedures followed those described in (12). PET scans were obtained with a CTI Exact HR+ camera for 60 min after injection of 2.7 MBq/kg (mean total dose ± SD, 169 ± 30 MBq) of ^{18}F-MPPF. A paired t test showed no significant difference between the injected doses of the test and retest scans (P = 0.93).
Simulated Data.
Following the methodology for simulating realistic PET data described in (20), we performed a joint simulation of a test–retest study. Ten different individual numeric brains, each associated with a different set of regional time–activity curves, were used to simulate 10 realizations of a dynamic ^{18}F-MPPF PET acquisition. The simulations were then repeated to obtain the simulated retest data. The input time–activity curves used for the simulated test and retest acquisitions were identical for each subject. Therefore, the only difference between the simulated test and retest datasets was due to the degradation induced by the physical acquisition process. This physical variability can thus be compared with the variability measured from the actual data, which include all sources of variability.
Image Processing
Modeling.
The binding parameters of the tracer were estimated with the 3-compartment simplified reference tissue model (SRTM) (21) and with the Logan graphical method (22). The SRTM relies on an analytic solution of the compartment model (Eq. 1) and allows estimation of 3 indices (R_{1}, k_{2}, and BP) without requiring an arterial input function. The SRTM rests on several assumptions: (a) a reference tissue region exists with a negligible concentration of specific binding sites, (b) the magnitude of nonspecific binding is the same in the reference and target regions, (c) the distribution volumes (DVs) of the free and nonspecific compartments are the same in the reference and target regions, and (d) exchanges between the free and nonspecific binding compartments are rapid.
The analytic solution of the system of differential equations has the following form:

$$C_{roi}(t) = R_{1}\,C_{ref}(t) + \left(k_{2} - \frac{R_{1}\,k_{2}}{1+BP}\right) C_{ref}(t) \otimes \exp\!\left(-\frac{k_{2}}{1+BP}\,t\right) \qquad \text{(Eq. 1)}$$

where C_{ref} and C_{roi} are the PET time–activity curves of the reference region and of the target region of interest (ROI), ⊗ denotes convolution, R_{1} is the ratio of the plasma-to-brain transport constant in the target region to that in the reference region (R_{1} = k_{1roi}/k_{1ref}), k_{2} is the efflux rate constant of the tracer from the target tissue back to the vascular compartment, and BP is the binding potential of the tracer, defined as the ratio of the available receptor density to the equilibrium dissociation constant (BP = B_{max}/K_{d}). For ^{18}F-MPPF, the cerebellum was taken as the reference region because it is considered devoid of 5-HT_{1A} binding sites (11). The Logan model is based on a graphical plot involving the PET time–activity curves of the reference region and the target region; as for the SRTM, the cerebellum was taken as the reference region. The Logan model provides the DV of the target ROI, given by the slope of the linear regression once equilibrium is reached.
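Eq. 1 can be fitted in several ways, and the text does not say which algorithm was used. The sketch below uses the basis-function approach, a swapped-in technique chosen here only for illustration: for each candidate θ3 = k2/(1 + BP) on a fixed grid, the SRTM equation is linear in θ1 = R1 and θ2 = k2 − R1·θ3, so these are obtained by least squares and the grid point with the smallest residual is kept. It assumes uniformly sampled, noise-free curves and a rectangle-rule convolution; the function names, the θ3 grid, and the synthetic reference curve are hypothetical.

```python
import math

# SRTM: C_roi = R1*C_ref + (k2 - R1*k2/(1+BP)) * [C_ref conv exp(-k2*t/(1+BP))]


def convolve_exp(c_ref, theta3, dt):
    """Rectangle-rule convolution of C_ref(t) with exp(-theta3 * t) on a uniform grid."""
    decay = math.exp(-theta3 * dt)
    out, acc = [], 0.0
    for c in c_ref:
        acc = acc * decay + c * dt  # recursive form of the discrete convolution
        out.append(acc)
    return out


def srtm_model(c_ref, r1, k2, bp, dt):
    """Forward SRTM: target curve predicted from the reference curve."""
    k2a = k2 / (1.0 + bp)  # apparent efflux constant k2 / (1 + BP)
    conv = convolve_exp(c_ref, k2a, dt)
    return [r1 * c + (k2 - r1 * k2a) * b for c, b in zip(c_ref, conv)]


def fit_srtm_bfm(c_ref, c_roi, dt, theta3_grid):
    """Basis-function fit; returns R1, k2 and BP for the best theta3 on the grid."""
    s11 = sum(x * x for x in c_ref)
    s1y = sum(x * y for x, y in zip(c_ref, c_roi))
    best = None
    for th3 in theta3_grid:
        basis = convolve_exp(c_ref, th3, dt)
        s22 = sum(b * b for b in basis)
        s12 = sum(x * b for x, b in zip(c_ref, basis))
        s2y = sum(b * y for b, y in zip(basis, c_roi))
        det = s11 * s22 - s12 * s12
        if det == 0.0:
            continue
        # Solve the 2 x 2 normal equations for theta1 (= R1) and theta2
        th1 = (s22 * s1y - s12 * s2y) / det
        th2 = (s11 * s2y - s12 * s1y) / det
        rss = sum((y - th1 * x - th2 * b) ** 2
                  for x, b, y in zip(c_ref, basis, c_roi))
        if best is None or rss < best[0]:
            best = (rss, th1, th2, th3)
    _, r1, th2, th3 = best
    k2 = th2 + r1 * th3  # because theta2 = k2 - R1 * theta3
    return {"R1": r1, "k2": k2, "BP": k2 / th3 - 1.0}


# Hypothetical noise-free example: 0-60 min sampled every 0.5 min
dt = 0.5
times = [i * dt for i in range(121)]
c_ref = [10.0 * t * math.exp(-t / 5.0) for t in times]
c_roi = srtm_model(c_ref, r1=1.1, k2=0.3, bp=1.5, dt=dt)
estimate = fit_srtm_bfm(c_ref, c_roi, dt, [0.005 * i for i in range(1, 201)])
print({name: round(value, 4) for name, value in estimate.items()})
```

Because the same discretization both generates and fits the curves, this noise-free example recovers the input values up to rounding; with real, noisy time–activity curves the range and spacing of the θ3 grid become genuine design choices.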
Two approaches for binding index reliability analysis were used: The first is based on mean ROI activity curve measurements (ROI Analysis) and the second is based on a voxelwise computation leading to statistical parametric maps (Parametric Image Analysis).
For reorientation and registration purposes, mean static ^{18}F-MPPF images were created by summing the individual dynamic frames from 0 to 60 min after injection. The MR image was coregistered with the static ^{18}F-MPPF image by an automated method using a mutual information criterion (Statistical Parametric Mapping [SPM], Wellcome Department of Cognitive Neurology, London, U.K.). On the coregistered MRI, a large ROI was outlined on the cerebellum and used as the single reference region for the simplified models.
ROI Analysis.
The anatomic target ROIs were drawn manually on the coregistered MRI. Four hundred ROIs were drawn and grouped into anatomic volumes of interest (VOIs) describing, first, a group of regions of the limbic system known to be rich in 5-HT_{1A} receptors (left and right hippocampi, amygdala, entorhinal cortex, parahippocampal gyri, anterior and posterior cingulate gyri, insula, temporal poles, and temporal cortex) and, second, a group of other cortical regions (left and right temporal neocortex, lateral occipitotemporal gyrus, frontal gyrus, prefrontal cortex, inferior and superior parietal cortices, occipital cortex, pole and gyrus, and cerebellum). Because the raphe nuclei cannot be identified on MRI, this region was outlined directly on the static ^{18}F-MPPF image by thresholding the activity at 80% of the local maximum in the brain stem. This region was then visualized a posteriori on the MRI to check its proper location in the periaqueductal gray matter. Time–activity curves were measured from the dynamic PET images using the set of ROIs and were used to derive regional values of R_{1}, k_{2}, BP, and Logan DV for each ROI.
Parametric Image Analysis.
From individual voxel time–activity curves, parametric images of R_{1}, k_{2}, and BP were computed with the SRTM, and a parametric image of DV was computed with the Logan model. Individual parametric images were then transformed into a standard space using the nonlinear transformation derived from the spatial normalization of each individual's MR image to the default T1 MRI template (Montreal Neurological Institute template of the International Consortium for Brain Mapping Project) with SPM. Visual inspection of the spatially normalized images, particularly of subcortical structures, confirmed the accuracy of the spatial normalization. Normalized parametric images were smoothed with an isotropic gaussian kernel of 8 × 8 × 8 mm full width at half maximum to account for interindividual anatomic variability and to improve the sensitivity of the statistical analysis.
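The voxel-wise Logan computation can be sketched as follows, one time–activity curve per voxel. This is a hypothetical illustration rather than the authors' implementation: it assumes the reference-region form of the Logan plot (x = [∫C_ref + C_ref/k2_ref]/C_roi, y = ∫C_roi/C_roi), trapezoidal integration, an optional reference efflux constant k2_ref (None drops that term), and a user-chosen start time t_star for the linear portion; none of these settings is given in the text.

```python
import math


def cumulative_trapezoid(y, t):
    """Running integral of y(t) from t[0] (trapezoidal rule)."""
    out = [0.0]
    for i in range(1, len(y)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out


def logan_slope(t, c_ref, c_roi, t_star, k2_ref=None):
    """Slope of the reference-region Logan plot for t >= t_star (DV estimate)."""
    int_ref = cumulative_trapezoid(c_ref, t)
    int_roi = cumulative_trapezoid(c_roi, t)
    xs, ys = [], []
    for ti, cr, ir, cy, iy in zip(t, c_ref, int_ref, c_roi, int_roi):
        if ti < t_star or cy <= 0.0:
            continue  # keep only the (assumed) linear, post-equilibrium part
        extra = cr / k2_ref if k2_ref else 0.0  # optional C_ref / k2_ref term
        xs.append((ir + extra) / cy)
        ys.append(iy / cy)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx  # ordinary least-squares slope


def logan_parametric(t, c_ref, voxel_tacs, t_star, k2_ref=None):
    """Apply the Logan slope to every voxel curve (a flat list of curves)."""
    return [logan_slope(t, c_ref, tac, t_star, k2_ref) for tac in voxel_tacs]


# Hypothetical data: 2 voxels that are scaled copies of the reference curve
t = [0.5 * i for i in range(121)]
c_ref = [10.0 * ti * math.exp(-ti / 5.0) for ti in t]
voxels = [[s * c for c in c_ref] for s in (1.5, 2.5)]
dv_values = logan_parametric(t, c_ref, voxels, t_star=20.0, k2_ref=0.3)
print(dv_values)
```

For a voxel curve proportional to the reference curve, the plot is exactly linear with a slope equal to the scaling factor, so the call returns values very close to 1.5 and 2.5. This is only a sanity check of the arithmetic, not a physiologic simulation.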
Reproducibility Indices.
The reliability of the R_{1}, k_{2}, and BP indices derived from the SRTM, and of the DV index derived from the Logan model, was assessed by computing 3 characteristic parameters from the test–retest measurements:
The percentage change in mean (bias): the difference between test and retest values divided by the test value, expressed as a percentage. This index includes random change and systematic biologic error;
The typical error, that is, the within-subject SD derived from the SD of the test–retest differences, expressed as a percentage of the mean;
The intraclass correlation coefficient (ICC), which reflects how well subjects keep their rank between test and retest and depends on the size and composition of the sample: ICC = (MSBS − MSWS)/(MSBS + MSWS), where MSBS is the between-subject mean square and MSWS is the within-subject mean square.
Because the residuals of these variables may be proportional to their respective means, the indices were computed from log-transformed values, as suggested in (23).
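The three indices can be computed as in the sketch below, which follows the log-transform approach cited in (23) under stated assumptions: the percentage change and typical error are back-transformed from the mean and SD of the log differences (typical error = SD/√2), so the change reflects the geometric mean, and the ICC uses the one-way mean squares of the formula given for 2 trials. The function name and the example values are hypothetical, not study data.

```python
import math
import statistics


def reliability_indices(test, retest):
    """Percentage change in mean, typical error (%) and ICC from log-transformed data."""
    log_t = [math.log(v) for v in test]
    log_r = [math.log(v) for v in retest]
    diffs = [r - t for t, r in zip(log_t, log_r)]
    n = len(diffs)

    # Change in the mean of the logs, back-transformed to a percentage
    pct_change = 100.0 * (math.exp(statistics.mean(diffs)) - 1.0)

    # Typical error = SD of the log differences / sqrt(2), back-transformed to %
    typical_error = 100.0 * (math.exp(statistics.stdev(diffs) / math.sqrt(2.0)) - 1.0)

    # One-way ICC for 2 trials: (MSBS - MSWS) / (MSBS + MSWS)
    subj_means = [(t + r) / 2.0 for t, r in zip(log_t, log_r)]
    grand = statistics.mean(subj_means)
    msbs = 2.0 * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    msws = sum((t - m) ** 2 + (r - m) ** 2
               for t, r, m in zip(log_t, log_r, subj_means)) / n
    icc = (msbs - msws) / (msbs + msws)
    return pct_change, typical_error, icc


# Hypothetical test and retest BP values for 5 subjects
print(reliability_indices([1.40, 1.52, 1.31, 1.60, 1.45],
                          [1.44, 1.49, 1.35, 1.55, 1.50]))
```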
Statistical Inference.
For the ROI analysis, mean regional binding index values were considered as independent measures.
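The paired t tests used for the ROI-level test–retest comparisons can be sketched with the Python standard library, which has no Student t distribution. The hypothetical helper below therefore returns the t statistic and degrees of freedom, and the usage compares |t| with the tabulated two-sided 5% critical value for 9 degrees of freedom (2.262), which applies only to 10 paired subjects. The BP values are invented for illustration.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of
# freedom (10 paired subjects); other sample sizes need another value.
T_CRIT_DF9 = 2.262


def paired_t(test, retest):
    """Paired t statistic (retest minus test) and its degrees of freedom."""
    diffs = [r - t for t, r in zip(test, retest)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1


# Hypothetical regional BP values for 10 subjects (not study data)
test_bp = [1.42, 1.55, 1.38, 1.61, 1.47, 1.33, 1.58, 1.49, 1.40, 1.52]
retest_bp = [1.45, 1.50, 1.41, 1.57, 1.52, 1.36, 1.55, 1.47, 1.44, 1.50]
t_stat, df = paired_t(test_bp, retest_bp)
print(f"t = {t_stat:.2f}, df = {df}, significant: {abs(t_stat) > T_CRIT_DF9}")
```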
For the voxel-based analysis, SPM99 was applied to the normalized, smoothed parametric images of the 10 subjects. Statistical parametric maps of the t statistic (SPM{t}) were computed with a voxel-level threshold of P = 0.001, uncorrected. Significant clusters were selected at a corrected cluster-level threshold of P < 0.05, determined from the joint probability of peak height and cluster size (familywise error (24)).
RESULTS
SRTM
Actual BP.
As shown in Table 2 (top), mean regional test and retest BP values range from 0.28 ± 0.08 in the raphe to 1.47 ± 0.16 in the hippocampus. The percentage changes between test and retest values range from −1.15% (anterior cingulum) to 4.80% (entorhinal cortex), with a mean typical error of 7.75% in the limbic area and 7.71% in the other regions. The maximal typical error is in the raphe nucleus (14.97%), and the minimal one is in the parahippocampal gyrus (4.67%). The ICC ranges from 0.50 (anterior cingulum) to 0.93 (inferior parietal cortex). Mean ICC values are 0.69 in the limbic area and 0.84 in the other cortical regions. None of the regional BP differences was statistically significant in a paired t test comparison.
Actual R_{1} (Table 2, Middle).
Reproducibility of the relative perfusion parameter (R_{1}) in the ROIs is also excellent, with an average percentage change of 1.63% in the limbic area (from −5.03% in the temporal pole to 5.21% in the hippocampus) and a mean typical error of 5.77%. The typical error of R_{1} is generally lower than that of BP, ranging from 4.60% (raphe) to 7.86% (hippocampus). The average change is −3.43% in the other cortical regions, with a mean typical error of 5.98%. The ICC values were low and highly variable, ranging from −0.17 in the prefrontal cortex to 0.67 in the temporal cortex. The mean ICC is 0.37 in the limbic areas and 0.25 in the other cortical regions.
Actual k_{2} (Table 2, Bottom).
The k_{2} values range from 0.11 ± 0.01 min^{−1} in the raphe to 0.35 ± 0.05 min^{−1} in the occipital gyrus. The percentage changes in the mean between test and retest series range from −4.42% in the entorhinal cortex to +2.25% in the prefrontal cortex. In the limbic areas, the average percentage change in the mean was −0.50%, very similar to that of the other regions (−0.51%). The mean typical error is 9.50% in the limbic regions and 10.53% in the other cortical regions. In the limbic regions, the mean ICC value is 0.40, with a large range from −0.03 in the anterior cingulate cortex to 0.77 in the parahippocampal gyrus. In the other regions, the mean ICC value is 0.31, ranging from 0.15 in the occipital cortex to 0.42 in the occipital gyrus.
Simulated Data.
Simulated data have an excellent ICC (>0.95), a mean percentage change in BP values around −1%, with a mean typical error of 2.5% (Table 3, top). The R_{1} parameter shows a percentage change in the mean around zero and a typical error of 1.79% in the limbic areas and 0.85% in the other regions (Table 3, middle). The k_{2} of simulated data has a percentage change in the mean of <1% and a typical error of <3% (Table 3, bottom).
Logan Model
Test and retest DV values in the ROIs range from 0.56 ± 0.07 in the raphe nucleus to 2.45 ± 0.24 in the hippocampus (Table 4). Paired t test comparisons showed no difference between test and retest. The mean percentage change by region ranges from −4.37% in the occipital pole to +8.05% in the amygdala, and the typical error ranges from 3.45% in the posterior cingulum to >12.85% in the amygdala. The mean typical error is 6.48% in the limbic area and 5.65% in the other cortical regions. ICC values range from 0.42 in the amygdala to 0.88 in the posterior cingulate gyrus, with a mean value of 0.66 in the limbic area and 0.74 in the other regions.
SPM
The SPM analysis with a paired t test model did not show any significant difference between the test and retest scan series, in terms of either variance or mean difference.
DISCUSSION
This test–retest reliability study of ^{18}F-MPPF binding was designed to support the interpretation of clinical studies involving a long delay between the first and second PET scans. The pairs of ^{18}F-MPPF PET images obtained appeared identical on visual inspection, as exemplified in Figure 1. This similarity extends to experimental conditions, such as the reproducibility of head positioning and of injected radioactivity. Nineteen regions were studied, and their binding indices were calculated. Assessed with a long-term test–retest acquisition procedure, the ^{18}F-MPPF PET binding indices showed high reproducibility. This reliability study establishes the precision of the measurement and the ability to test differences between measurements with ^{18}F-MPPF PET.
Methodologic Considerations
Bias and Typical Error.
The mean percentage changes for BP are lower in the limbic areas (<1%) than in the other cortical regions (>2%); however, the typical errors are similar (around 7%). The simulated data predicted a mean percentage change of around 1%, with a typical error of 2%. Thus, the test–retest measurement is stable and close to the ideal, with a biologic uncertainty that is equivalent in regions rich and poor in 5-HT_{1A} receptors and larger than the measurement error due to the PET image formation process alone. The R_{1} parameter varies by around ±5% in rich regions, whereas in poor regions it changes systematically (values from 2% to 5%). For the 2 classes of regions, the typical error is close to 6%. The simulation study indicates that the percentage change should be near zero and the typical error between 1% and 2%. Because R_{1} is related to cerebral blood flow, we hypothesize that regional cerebral blood flow was greater during the retest scan than during the first scan. Because this finding is not observed in the BP results, the modeled parameters appear to be identified independently. Finally, the k_{2} parameter shows very good stability (1% change in the mean) with a typical measurement error close to 10%, whereas the simulation predicted a 3% typical error in the absence of intraindividual variability. In summary, with the SRTM, R_{1} is the estimated parameter with the lowest measurement error (5%), followed by BP (7%) and k_{2} (10%). The Logan model shows reliability similar to that of BP. More precisely, DV shows a larger percentage change in the mean but a lower typical error than BP.
Variability.
The ICC is a measure of the correlation between the values obtained with 2 methods within the same subject (25). It is used as an index of the reliability of test–retest measurements and combines information on the systematic difference between methods (test and retest) and on the measurement variations. In a PET study with ^{11}C-WAY-100635, ICC values above 0.50 and 0.75 were considered acceptable and excellent, respectively (16). In our study, the ICC values of BP and DV ranged from poor (<0.5) to excellent (>0.75). On average, the ICC is slightly lower in the limbic area than in the other cortical regions. Because the ICC reflects individual stability between test and retest scans, the measurement appeared more variable in regions with high 5-HT_{1A} receptor densities than in poor regions. The SDs of the test and retest values are similar in the limbic area and in the other regions (0.12 for BP), so the difference in ICC is due to a higher intrasubject variability in rich regions than in poor regions.
This phenomenon must be considered when individual test–retest results are examined, but, according to Parsey et al. (2000) and Hirvonen et al. (2006) with ^{11}C-WAY-100635, it does not affect the reproducibility of a group comparison. Many studies have reported moderate ICC values, with higher values for the SRTM than for graphical or nonlinear fitting techniques using a peripheral arterial input function (14–16,26,27).
Physiologic Considerations
The 5-HT_{1A} receptors are implicated in a range of behaviors and in many neuropsychiatric and neurodegenerative diseases. This explains why an increasing number of academic and industrial centers use the selective 5-HT_{1A} receptor antagonist ^{18}F-MPPF as a radiotracer in clinical PET studies (3). Several ^{18}F-MPPF PET studies have been designed either as group comparisons or as repeated measures in the same individuals. Repeated measures in the same individuals are performed (a) when addressing drug effects in occupancy studies, (b) when following a disease condition over time, and (c) when measuring variability in receptor densities. The time interval between these measurements may well be several weeks or months. Therefore, interpretation of these studies requires an understanding of the test–retest reliability of the methodology, particularly when the degree of change is subtle. This is important because 5-HT_{1A} receptor availability can be modified physiologically or pharmacologically and, therefore, could lead to a modification of the apparent binding of ^{18}F-MPPF. For example, preclinical studies suggested that ^{18}F-MPPF binding was sensitive to endogenous 5-HT, because the affinity of ^{18}F-MPPF is close to that of 5-HT (3). Because 5-HT release is known to increase in response to various physiologic, environmental, and behavioral manipulations (28), it can be hypothesized that ^{18}F-MPPF binding is reduced when the 5-HT concentration is increased. Recently, an ^{18}F-MPPF study suggested that 5-HT_{1A} receptor availability was increased during sleep (4). Other physiologic processes, such as aging, could have a direct impact on 5-HT_{1A} receptor density (29). Therefore, a long period between the test and retest scans could lead to subtle but significant intraindividual variability.
Finally, as recently described, 5-HT_{1A} receptors in the dorsal raphe can be partially internalized, leading to a considerable decrease in ^{18}F-MPPF binding (>30%) (9,30). It must be noted that, although these multiple factors could theoretically modify the BP of ^{18}F-MPPF, our results showed that the biologic variability was moderate (<10%). Furthermore, paired t tests between test and retest scans showed no significant change in either the ROI analysis or the SPM analysis. These results reveal strong stability of individual measurements and, therefore, support the use of the ^{18}F-MPPF tracer for monitoring longitudinal clinical changes over periods of a few months.
Use of Reliability Study
Potential Use for Group Comparison.
The results of the reliability study may help to determine the minimal sample size of a study, provided that the delay between consecutive pairs of trials is similar to that of the reliability study (around 6 mo in this experiment). With such a delay, the typical error includes 2 components: the experimental error and the biologic variability. For a test–retest study with a short delay, the biologic error may be different and ideally reduced to zero. In that case, the variability of measurements is due only to experimental error and should be closer to the typical error found in the simulated data presented in this article. But in long-term clinical studies, the natural biologic variability of the control population has to be known to determine the minimal sample size allowing optimal conditions for detection. In crossover studies, a simplified formula for fixing the sample size is n = 8 s^{2}/d^{2}, where d is the minimal difference to be detected between pre- and posttest acquisitions, and s is the typical error found in the reliability study (23). The coefficient 8 is an approximation of 2 times the inverse Student distribution for a confidence level of 95%. As an example, detecting a 5% change in hippocampal BP (typical error of 7.4% in our reliability study) in a crossover test–retest study will require a sample size of 17 subjects for a statistical power of 95%. That sample size must be multiplied by 4 when another independent group is used as control. When the expected differences are much larger than the noise, for example, when looking for a 15% difference in hippocampal BP, only 2 subjects are required (8 × 7.4^{2}/15^{2} = 1.9). In that case, the only restriction is to ensure that the 2 selected subjects are representative of a wider population.
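The sample-size rule quoted above can be written as two small functions. This is a sketch of the text's formula only: the function names are hypothetical, the code does not round n, and the factor of 4 for an independent control group is taken directly from the text.

```python
def crossover_sample_size(typical_error, min_difference):
    """Crossover sample size n = 8 s^2 / d^2 (s and d in the same units, e.g. %)."""
    return 8.0 * typical_error ** 2 / min_difference ** 2


def parallel_group_sample_size(typical_error, min_difference):
    """Sample size when an independent control group is used (4 x crossover)."""
    return 4.0 * crossover_sample_size(typical_error, min_difference)


# Hippocampal BP, typical error 7.4% (value quoted in the text)
for d in (5.0, 15.0):
    n = crossover_sample_size(7.4, d)
    print(f"d = {d:.0f}%: n = {n:.1f}")
```

With the hippocampal typical error of 7.4%, this prints n = 17.5 for d = 5% and n = 1.9 for d = 15%; in practice the result would be rounded up to a whole number of subjects.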
The major advantage of conducting a reliability study is that it allows a crossover study to be performed without a healthy control group, using only the typical error from the present reliability study.
Assessment of Individual Measurements.
For individual assessment, the typical error is used quite differently. One approach consists of testing whether the difference between 2 scans exceeds a confidence interval based on the typical error found in the reliability study (mean ± 2 SD for a 95% likelihood). Another approach, more powerful and less restrictive, consists of establishing, with reasonable likelihood, whether the change measured in an individual subject exceeds an expected difference between test and retest acquisitions. This approach requires an estimate of the smallest clinically important change between measurements, so a priori knowledge of biologic variability is required. Suppose that a 10% modification of ^{18}F-MPPF BP is the expected value for evidence of clinical variation in the serotoninergic system. In that case, if an individual patient presented a scan difference of 14% in the hippocampus, knowledge of the test–retest typical error allows evaluation of the confidence interval of the true change for a given likelihood; for an 80% likelihood, the factor to be applied to the typical error around the measurement is 1.81. With that value, the confidence interval around the patient's variation of 14% will be [12.2;15.9]. We can therefore assume, with an 80% likelihood, that the true change underlying the measured change of 14% is greater than 10%, so that a change actually occurred between test and retest conditions. This approach is less conservative but more effective and clinically practical for deciding on an effect in therapeutics.
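The text does not derive the factor of 1.81. One reading consistent with it is √2 × z_{0.90}: the two-sided 80% normal quantile, scaled by √2 because a test–retest change is the difference of 2 measurements that each carry the typical error. The sketch below assumes that reading and normally distributed errors; the function names are hypothetical, and whether the typical error is expressed in percentage units or on a log scale is left to the user.

```python
import math
from statistics import NormalDist


def individual_change_factor(likelihood):
    """Multiplier of the typical error for a two-sided interval on one subject's change.

    Assumes normal errors and that the change is the difference of 2
    measurements, each carrying the typical error (hence sqrt(2)).
    """
    z = NormalDist().inv_cdf(0.5 + likelihood / 2.0)
    return math.sqrt(2.0) * z


def likely_range(observed_change, typical_error, likelihood):
    """Interval for the true change, in the same units as the inputs."""
    half_width = individual_change_factor(likelihood) * typical_error
    return observed_change - half_width, observed_change + half_width


print(round(individual_change_factor(0.80), 2))  # 1.81
```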
More generally, parametric imaging is commonly used for individual or group comparisons at the voxel level with the widely used SPM approach. In this study, we verified that the parametric images did not present significant bias between test and retest measurements; these data are therefore suitable for statistical inference in group comparisons and individual assessment via the general linear model.
CONCLUSION
The results of this study demonstrate, to our knowledge for the first time, that ^{18}F-MPPF is reliable for PET imaging of brain 5-HT_{1A} receptors in longitudinal studies: ^{18}F-MPPF parametric imaging of BP, R_{1}, and k_{2} with the noninvasive SRTM, and of DV with Logan graphical analysis, was reproducible over a long-term delay. The analytic method and the measure of interest can be chosen freely according to the purpose of the study, because the results presented in this article show that both models yield indices with similar reproducibility. These data also characterize measurement noise, allowing the estimation of sample size for group comparisons and of confidence intervals for individual subject assessment.
Acknowledgments
We thank Damien Dufournel and Caroline Cohen for help in image analysis and the chemistry team at CERMEP.
Footnotes

COPYRIGHT © 2007 by the Society of Nuclear Medicine, Inc.
Received for publication March 20, 2007.
Accepted for publication May 15, 2007.