Assessing Amyloid Pathology in Cognitively Normal Subjects Using 18F-Flutemetamol PET: Comparing Visual Reads and Quantitative Methods

Our objective was to determine the optimal approach for assessing amyloid disease in a cognitively normal elderly population. Methods: Dynamic 18F-flutemetamol PET scans were acquired using a coffee-break protocol (a 0- to 30-min scan and a 90- to 110-min scan) on 190 cognitively normal elderly individuals (mean age, 70.4 y; 60% female). Parametric images were generated from SUV ratio (SUVr) and nondisplaceable binding potential (BPND) methods, with cerebellar gray matter as a reference region, and were visually assessed by 3 trained readers. Interreader agreement was calculated using κ-statistics, and semiquantitative values were obtained. Global cutoffs were calculated for both SUVr and BPND using a receiver-operating-characteristic analysis and the Youden index. Visual assessment was related to semiquantitative classifications. Results: Interreader agreement in visual assessment was moderate for SUVr (κ = 0.57) and good for BPND images (κ = 0.77). There was discordance between readers for 35 cases (18%) using SUVr and for 15 cases (8%) using BPND, with 9 overlapping cases. For the total cohort, the mean (±SD) SUVr and BPND were 1.33 (±0.21) and 0.16 (±0.12), respectively. Most of the 35 cases (91%) for which SUVr image assessment was discordant between readers were classified as negative based on semiquantitative measurements. Conclusion: The use of parametric BPND images for visual assessment of 18F-flutemetamol in a population with low amyloid burden improves interreader agreement. Implementing semiquantification in addition to visual assessment of SUVr images can reduce false-positive classification in this population.

mentia, accounting for 60%-80% of cases above 65 y of age (1). Its pathologic hallmark is the accumulation of the amyloid-β peptide, thought to start years before cognitive impairment (2). In fact, abnormal amyloid-β levels are seen in 20%-40% of cognitively normal subjects between the ages of 60 and 90 y (3). These subjects are considered to be in the preclinical stage of AD (4,5), which provides a unique opportunity for secondary prevention studies and is gaining increasing research focus (6). To this end, reliable identification of amyloid pathology in vivo using PET is of the utmost importance in this population.
The identification of amyloid burden by means of visual interpretation of summed late images or of semiquantitative SUV ratio (SUVr) images is currently suggested to be sufficient. Previous studies have shown a high interreader agreement for the visual assessment of SUVr images and a high imaging-pathology correlation in clinical populations and end-of-life subjects (7-9). It has been shown, however, that SUVr overestimates amyloid burden compared with quantitative nondisplaceable binding potential (BPND) (10). As such, quantitative BPND images may also be more reliable for visual interpretation. In a memory clinic population, Zwan et al. showed that visual assessment of parametric BPND 11C-Pittsburgh compound B images resulted in a higher interreader agreement than the frequently used SUV and SUVr images (11). To date, it remains to be determined whether these findings translate to the increasingly available 18F-labeled amyloid-β targeting tracers, such as 18F-flutemetamol, and, more importantly, to the challenging population of cognitively normal elderly participants who generally have a minimal amyloid load.
The purpose of this study was to compare 2 parametric imaging methods (SUVr vs. BPND) to determine the optimal approach for assessment of early amyloid pathology. To this end, we investigated the agreement in visual assessment of SUVr and BPND images between 3 readers and its relationship to (semi-)quantitative measures.

Project
The data used in this study originate from the Innovative Medicines Initiative of the European Medical Information Framework for AD (http://www.emif.eu/). The overall aim of this project is to discover and validate diagnostic markers, prognostic markers, and risk factors for AD in nondemented subjects.

Subjects
In total, 199 subjects from the preclinical AD cohort were included at the Vrije Universiteit (VU) Medical Center. Inclusion criteria were an age of at least 60 y and normal cognition according to a delayed recall score that was more than −1.5 SDs of the demographically adjusted normative data on the Consortium to Establish a Registry for Alzheimer Disease 10-word list (12), a Telephone Interview for Cognitive Status-modified score of 23 or higher (13), a 15-item Geriatric Depression Scale score of less than 11 (14), and a Clinical Dementia Rating Scale score of 0 (15). Exclusion criteria were any physical, neurologic, or psychiatric condition that interferes with normal cognition. PET acquisition failed in 3 subjects, and 6 BPND images were lacking a visual assessment, resulting in 190 subjects who had a visual assessment for both SUVr and BPND images. PET quantification failed in 5 subjects; thus, 185 subjects were used for the quantitative analysis. Written informed consent was obtained from all subjects, and the study was approved by the Medical Ethics Review Committee of the VU University Medical Center.

PET
PET scans were obtained using an Ingenuity TF PET/MRI camera (Philips Healthcare). Thirty-minute scans were acquired immediately after a manual injection of 18F-flutemetamol (191 ± 20 MBq) (16). After 60 min, during which the patient remained outside the scanner bed, a second scan of 20 min was acquired, starting 90 min after injection (17). Immediately before each part of the PET scan, a T1-weighted gradient echo pulse MRI scan was acquired for attenuation correction of the PET data. The first emission scan was reconstructed into 18 frames of increasing length (6 × 5, 3 × 10, 4 × 60, 2 × 150, 2 × 300, and 1 × 600 s) using the standard line-of-response-based row-action maximum-likelihood algorithm for the brain. The second scan was reconstructed with the same algorithm into 4 frames of 300 s each. First, Vinci Software, version 2.56 (Max Planck Institute for Neurologic Research), was used to combine the 2 PET scans into a single multiframe image. Next, each individual's T1-weighted MR images were coregistered to the dynamic PET images using the generic multimodality setting of Vinci with a linear rigid-body schema and normalized mutual information as the similarity measure. Parametric BPND images were generated from the entire image set using the receptor parametric mapping implementation in PPET (18-20). Generation of the SUVr images was based on the 90- to 110-min scan interval. Next, T1-based volumes of interest using the Hammers atlas implemented in PVElab software were projected onto the PET images to extract regional values (21). Cerebellar gray matter was used as reference tissue for both analyses (22).
Finally, we computed global values based on the average of frontal (volume-weighted average of superior, middle, and inferior frontal gyrus), parietal (volume-weighted average of posterior cingulate, superior parietal gyrus, postcentral gyrus, and inferolateral remainder of parietal lobe), and temporal (volume-weighted average of parahippocampal gyrus; hippocampus; medial temporal lobe; and superior, middle, and inferior temporal gyrus) regions (23,24).
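The regional averaging described above can be illustrated with a minimal Python sketch. It is not the PVElab or PPET implementation: the function names, the example region values, and the volumes are hypothetical, and the sketch assumes that the global value is the unweighted mean of the 3 lobar volume-weighted averages, which the text does not state explicitly.

```python
from typing import Dict, Tuple

# Each region maps to (value, volume); value is an SUVr or BPND estimate.
Region = Tuple[float, float]


def volume_weighted_mean(regions: Dict[str, Region]) -> float:
    """Average regional values, weighting each region by its volume."""
    total_volume = sum(volume for _, volume in regions.values())
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    weighted_sum = sum(value * volume for value, volume in regions.values())
    return weighted_sum / total_volume


def global_value(lobes: Dict[str, Dict[str, Region]]) -> float:
    """Assumption: unweighted mean of the lobar volume-weighted averages."""
    lobar_means = [volume_weighted_mean(regions) for regions in lobes.values()]
    return sum(lobar_means) / len(lobar_means)


# Hypothetical SUVr values and volumes (mL), for illustration only.
example_lobes = {
    "frontal": {
        "superior_frontal_gyrus": (1.40, 40.0),
        "middle_frontal_gyrus": (1.35, 35.0),
        "inferior_frontal_gyrus": (1.30, 25.0),
    },
    "parietal": {
        "posterior_cingulate": (1.45, 10.0),
        "superior_parietal_gyrus": (1.30, 30.0),
    },
    "temporal": {
        "hippocampus": (1.20, 5.0),
        "middle_temporal_gyrus": (1.30, 30.0),
    },
}
print(round(global_value(example_lobes), 3))
```

Weighting by volume keeps small structures such as the hippocampus from contributing as much to a lobar average as a large gyrus.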

MRI
Whole-brain scans were obtained using the 3-T Achieva scanner (Philips Healthcare) of the PET/MRI system described above equipped with an 8-channel head coil. Isotropic structural 3-dimensional T1-weighted images were acquired using a sagittal turbo field echo sequence with the following settings: 1.00 × 1.00 × 1.00 mm voxels, repetition time of 7.9 ms, echo time of 4.5 ms, and flip angle of 8°. A 3-dimensional sagittal fat-saturated fluid-attenuated inversion recovery sequence was acquired using the following settings: 1.12 × 1.12 × 1.12 mm voxels, repetition time of 4,800 ms, echo time of 279 ms, and inversion time of 1,650 ms. The structural 3-dimensional T1 and 3-dimensional fluid-attenuated inversion recovery images were used for assessment of global cortical atrophy (25), average medial temporal atrophy (26), and Fazekas score for white matter hyperintensities (27,28).

Visual Assessment of PET Images
Three trained readers, masked to clinical information, first assessed all SUVr images and subsequently all BPND images, in a randomized order. Images deemed dubious by the reader were reassessed on a separate occasion. Images were scaled to 90% of the pons signal using rainbow color scaling, and transverse, sagittal, and coronal views were displayed using the software package Vinci, version 2.56. Images were rated as either positive (binding in one or more cortical brain regions or striatum unilaterally) or negative (predominantly white matter uptake) according to criteria defined by the manufacturer (GE Healthcare). PET images were assessed together with a T1-weighted MR scan to limit the influence of atrophy on the visual assessment.
The level of experience in visual assessment of 18F-flutemetamol images differed among readers: a nuclear medicine physician with considerable experience, a nuclear medicine physician trainee with basic experience, and a radiologist in training with 6 mo of experience in nuclear medicine. All readers completed the 18F-flutemetamol reader training provided by GE Healthcare.

Statistical Analysis
Baseline demographics were assessed using simple descriptive statistical analyses. κ-statistics were used to assess interreader agreement among the 3 readers, intrareader agreement between the 2 methods, and agreement between visual and semiquantitative classifications. Agreement was considered poor if κ was less than 0.20, satisfactory if κ was 0.21-0.40, moderate if κ was 0.41-0.60, good if κ was 0.61-0.80, and excellent if κ was more than 0.80. Differences in MRI measurements between PET-negative and PET-positive cases were assessed using a Mann-Whitney U analysis. The correlation between semiquantitative SUVr and BPND measurements was assessed using Spearman ρ. Cutoffs were calculated for both SUVr and BPND using a receiver-operating-characteristic analysis and the Youden index. Possible overestimation of amyloid burden using semiquantitative SUVr was investigated by calculating the difference between SUVr − 1 and BPND values. Differences in global overestimation between PET-negative and PET-positive cases were assessed using a Mann-Whitney U analysis. Regional differences in binding and overestimation were assessed using a Wilcoxon paired test. Amyloid status resulting from quantitative assessment was considered the true amyloid status for all analyses, in the absence of postmortem confirmation.
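As an illustration of the cutoff derivation, the following Python sketch selects the threshold that maximizes the Youden index J = sensitivity + specificity − 1. It is a simplified stand-in for a full receiver-operating-characteristic analysis, not the software used in this study. The function name, the use of midpoints between observed values as candidate cutoffs, the rule that a value above the cutoff counts as positive, and the example data are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def youden_cutoff(
    values: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float, float, float]:
    """Return (cutoff, J, sensitivity, specificity) maximizing the Youden index.

    labels: 1 = reference-positive, 0 = reference-negative.
    A case is classified positive when its value is above the cutoff.
    Candidate cutoffs are midpoints between consecutive distinct values.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both positive and negative cases are required")

    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    best = None
    for cutoff in candidates:
        true_pos = sum(1 for v, y in zip(values, labels) if y == 1 and v > cutoff)
        true_neg = sum(1 for v, y in zip(values, labels) if y == 0 and v <= cutoff)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        # Keep the first (lowest) cutoff that reaches the maximum J.
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    if best is None:
        raise ValueError("at least two distinct values are required")
    return best


# Hypothetical global SUVr values and reference labels, for illustration only.
example_values = [1.10, 1.20, 1.30, 1.55, 1.50, 1.70, 1.90]
example_labels = [0, 0, 0, 0, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(example_values, example_labels)
print(cutoff, j, sens, spec)
```

In this toy dataset, one negative case overlaps the positive range, so no cutoff reaches J = 1; the selected threshold retains full sensitivity at the cost of one false positive.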

RESULTS
Baseline demographics are provided in Table 1.

Visual Reads
Interreader agreement in visual assessment was moderate for SUVr images (κ = 0.57) and good for BPND images (κ = 0.77). There was discordance between readers for 35 cases (18%) using SUVr and for 15 cases (8%) using BPND, with 9 overlapping cases. Figure 1 shows examples of agreement and disagreement in visual interpretation of 18F-flutemetamol images. Intrareader agreement (i.e., within reader, between SUVr and BPND) differed among readers, with moderate agreement (κ = 0.52) between methods seen in the reader with least experience, excellent agreement (κ = 0.97) in the reader with moderate experience, and good agreement in the reader with most experience (κ = 0.76).
When applying majority rules (i.e., 2 of 3 readers agreed on a scan being either positive or negative), positivity was assigned to 27 (14%) cases based on SUVr and to 25 (13%) cases based on BPND, with 22 overlapping cases. Thus, 8 cases showed intermethod discordance; that is, 5 cases were rated positive on SUVr but negative on BPND, and 3 cases were rated positive on BPND but negative on SUVr. The remaining 160 cases were classified as negative on both images (Fig. 2A).
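The majority rule and a multireader κ can be sketched in Python as follows. The text does not specify which κ variant was used for the 3 readers; Fleiss κ is assumed here purely for illustration. The function names and the example ratings are hypothetical, and the handling of values falling exactly on a band boundary (e.g., κ = 0.20) is an assumption, because the agreement bands given in the statistical analysis leave those values unassigned.

```python
from typing import List, Sequence


def majority_read(ratings: Sequence[Sequence[int]]) -> List[int]:
    """Per case, return 1 if more than half of the readers rated it positive."""
    return [1 if 2 * sum(row) > len(row) else 0 for row in ratings]


def fleiss_kappa(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa for binary ratings; ratings[i][j] is reader j's call on case i."""
    n_cases = len(ratings)
    n_readers = len(ratings[0])
    # Number of readers choosing each category (negative, positive) per case.
    counts = [[list(row).count(c) for c in (0, 1)] for row in ratings]
    # Observed agreement per case, then averaged over cases.
    per_case = [
        (sum(k * k for k in row) - n_readers) / (n_readers * (n_readers - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases
    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[c] for row in counts) / (n_cases * n_readers) for c in (0, 1)
    ]
    p_expected = sum(p * p for p in proportions)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return (p_observed - p_expected) / (1 - p_expected)


def agreement_band(kappa: float) -> str:
    """Agreement bands from the statistical analysis; boundary handling assumed."""
    if kappa > 0.80:
        return "excellent"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "satisfactory"
    return "poor"


# Hypothetical reads of 5 scans by 3 readers (1 = positive, 0 = negative).
example_ratings = [(1, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 0)]
print(majority_read(example_ratings))
print(agreement_band(fleiss_kappa(example_ratings)))
```

With these bands, the reported interreader values of 0.57 (SUVr) and 0.77 (BPND) fall in the moderate and good categories, respectively.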

Visual Reads Related to Quantitative Measures
For the total cohort, mean global SUVr and BPND were 1.33 ± 0.21 and 0.16 ± 0.12, respectively. There was good agreement between both measures (intraclass correlation coefficient, 0.89; P < 0.01). Interreader-concordant positive cases had significantly higher SUVr and BPND than concordant negative cases (P < 0.01) (Table 1). Based on the visual read-concordant cohort alone (n = 149), the cutoff for positivity was 1.52 for SUVr (area under the curve, 0.98; sensitivity, 95%; specificity, 98%) and 0.26 for BPND (area under the curve, 1.00; sensitivity, 100%; specificity, 98%) using a receiver-operating-characteristic analysis (Supplemental Fig. 1; supplemental materials are available at http://jnm.snmjournals.org). After applying both cutoffs to the dataset, the agreement between the SUVr majority visual read and semiquantitative negative-positive classification was good (κ = 0.78), with 16 cases (9%) discordant between the 2 classification methods. The agreement analysis was also done with a literature-based cutoff (1.56) (8,29), resulting in a κ increase of 0.01. For BPND, the agreement between the majority visual read and the quantitative negative-positive classification was excellent (κ = 0.93), with 3 cases (2%) discordant between the 2 classification methods. Most of the 35 cases (91%) for which SUVr image assessment was discordant between readers were classified as negative using either cutoff (Fig. 2B). In addition, in the 8 cases with a discordant intermethod visual read, there was full agreement between visual and quantitative measurements when BPND was used, which was not the case with SUVr (Fig. 2A).

SUVr ≠ BPND Quantification
We investigated the relationship between the 2 quantitative measures with regard to the majority visual read to assess any violations of the equilibrium assumptions (i.e., SUVr − 1 = BPND) in this population. For all cases except one, global SUVr − 1 values overestimated the corresponding global BPND values. Participants with a positive read (mean overestimation [difference between SUVr − 1 and BPND] = 0.37 ± 0.11) had a significantly higher overestimation than participants with a negative read (mean overestimation = 0.14 ± 0.07; P < 0.01). This relationship was also observed on a regional level, with the frontal lobe displaying the highest mean binding and the largest mean SUVr overestimation, compared with the parietal (P < 0.01) and temporal (P < 0.01) lobes. In turn, the parietal lobe did not show a significantly higher mean binding (P = 0.1) but did show a significantly larger overestimation (P < 0.01) than the temporal lobe (Supplemental Fig. 2; Table 1). The SUVr overestimation seems to have a limited influence on the visual read of the high-binding group (i.e., BPND > 0.26), considering no cases were visually assessed as positive on SUVr and negative on BPND and only 2 SUVr images (7%) had a discordant read. For the low-binding group (i.e., BPND ≤ 0.26), the SUVr overestimation might have influenced the visual read, considering that 26 cases (16%) had a SUVr-discordant visual read. However, no obvious pattern was discernible (Fig. 3).
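The overestimation measure used above follows directly from the equilibrium assumption: if SUVr − 1 equaled BPND, their difference would be zero. A minimal Python sketch, with hypothetical function names and the cohort mean values and BPND cutoff reported above used only as example inputs, is:

```python
def suvr_overestimation(suvr: float, bp_nd: float) -> float:
    """Difference between SUVr - 1 and BPND; zero under the equilibrium assumption."""
    return (suvr - 1.0) - bp_nd


def is_high_binding(bp_nd: float, cutoff: float = 0.26) -> bool:
    """High-binding group as defined in the text: BPND above the cutoff."""
    return bp_nd > cutoff


# Cohort mean global values reported above, used purely as an example input.
mean_suvr, mean_bp_nd = 1.33, 0.16
print(round(suvr_overestimation(mean_suvr, mean_bp_nd), 2))
print(is_high_binding(mean_bp_nd))
```

A positive result indicates that SUVr − 1 exceeds BPND for that case; the reported group means (0.37 for positive reads and 0.14 for negative reads) are averages of this per-subject difference.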

DISCUSSION
In a cognitively normal elderly population with low amyloid burden, we show a considerable improvement in interreader agreement of 18F-flutemetamol visual assessment when using BPND rather than standard SUVr images. Misclassifications can be reduced using semiquantitative SUVr measures and avoided using fully quantitative BPND measures.
Our results are in line with the 11C-Pittsburgh compound B findings of Zwan et al., who found a comparable improvement in interreader agreement using BPND images (11). This result suggests that the underlying reason for discrepant interreader agreements was tracer-independent and likely related to the distinctive metrics being used (SUVr and BPND). SUVr is commonly used as a proxy for BPND, under the assumption that a secular equilibrium is reached during scanning. However, these equilibrium conditions are rarely met in practice. As such, whereas parametric BPND images reflect the density of available receptors (amyloid plaques), SUVr images are affected by a nondisplaceable (free and nonspecific) signal and may be affected by changes in regional flow and washout effects (28,30). As a result, SUVr can overestimate specific binding (10) and influence visual assessments (Fig. 3). Furthermore, our existing data show that this overestimation is not constant but instead increases with higher tracer binding (10,28).
The interreader agreement for the SUVr images and the concordance between semiquantitative and corresponding visual read classifications in our study are lower than previously reported (7-9). However, previous results were based on a clinical population of end-of-life subjects with a higher incidence of moderate to severe amyloid burden, which highlights the challenge of assessing amyloid pathology in a population with low amyloid burden. The challenge could be due to the nonspecific white matter uptake seen with 18F-flutemetamol, which together with the overestimation resulting from static scanning may translate into a tendency to visually assign regions as positive (31). In our study, the frontal regions were most often perceived as difficult to assess, leading to the greatest doubt for final classification. Although the 18F-flutemetamol reader training focuses on disentangling the white matter pattern from the cortical signal, assessment in this population seems additionally challenging, especially for less experienced readers. Indeed, the positive-assigning tendency was the strongest for the reader with the least experience, who also showed the lowest intrareader agreement between methods. This result stresses the need for experienced readers to make early assessments or for the reading guidelines to be updated, with the focus being on a cognitively normal elderly population. Of note, whereas the reference region used for visual assessment (i.e., pons) is different from that used for quantitative assessment (i.e., gray matter cerebellum), a separate agreement analysis using pons for quantification did not affect the agreement between classification methods.
Our results may have consequences for drug-intervention studies focused on early populations, since using the visual assessment of SUVr images as an inclusion criterion could result in false-positive inclusion due to the observed overestimation of cortical amyloid burden (32,33). Also, studies indicate that cerebral blood flow can change with age and disease progression (34,35). Therefore, using BPND images in clinical trials could avoid false-positive classification in visual assessment (28) and ensure that measured changes are due to the treatment instead of a measurement error or blood flow confounders.
An important factor in considering dynamic PET acquisition is participant burden. In this cohort, 95% of participants indicated they had no objections to undergoing a second dynamic PET scan. The coffee-break protocol used in this study may have facilitated this response and suggests the feasibility of longitudinal dynamic acquisition in cognitively normal elderly persons.
In a clinical setting, however, amyloid burden will more likely be moderate to severe and dynamic acquisition more challenging. In addition, the utility of SUV or SUVr visual reads for the diagnosis of AD-type dementia in a clinical setting has been extensively shown (36). Thus, in this context, visual assessment of SUVr images may indeed be sufficient. Nevertheless, the present results illustrate that semiquantification using SUVr can help reduce false-positive classification, especially in a challenging population. Thus, the clinical preference for visual assessment could be revised in light of more available automatic semiquantification methods, such as the one already provided for 18F-flutemetamol PET scans (8).
In this study, the standard manufacturer guidelines were used for reading both SUVr and BPND images. Nonetheless, an interesting finding was the improvement in interreader agreement for BPND images despite the lack of official guidelines and the limited experience of readers in assessing such images. However, it might still be of interest to formally assess whether the current guidelines are optimal for assessing BPND images. In addition, optimizing visual assessment of SUVr images by updating the current guidelines and providing training specifically focused on early accumulation may also improve the certainty of classification, comparable to that observed using dynamically derived measures. Studies have suggested that, specifically, medial frontal, anterior/posterior/isthmus cingulate cortex, and precuneus are early-accumulating regions (37,38). These regions can be visually assessed using the sagittal view of the PET image. Thus, the importance of this plane may be of most interest for updating guidelines.
A limitation of this study is the lack of a gold standard, as no postmortem data were available, hampering the understanding of the findings in relation to underlying neuropathology. Furthermore, although the frequency of amyloid positivity in this cohort is comparable to previous reports (39), the low incidence may have induced reader bias with regard to searching for amyloid positivity. Lastly, both quantification and visual assessment of the PET images in this study were accompanied by structural MRI, which might not always be available.

CONCLUSION
The use of parametric BPND images for visual assessment of 18F-flutemetamol in a population with low amyloid burden improves interreader agreement. Implementing semiquantification in addition to visual assessment of SUVr images can reduce false-positive classification in this population.

DISCLOSURE
This project received funding from the EU/EFPIA Innovative Medicines Initiative (IMI) Joint Undertaking (EMIF grant 115372) and the EU-EFPIA IMI-2 Joint Undertaking (grant 115952). This joint undertaking receives support from the European Union's Horizon 2020 research and innovation program and EFPIA. Support was also received from the NIHR UCLH Biomedical Research Center, and in-kind sponsoring of the PET tracer was received from GE Healthcare. Philip Scheltens received grants from GE Healthcare, Piramal, and Merck, paid to his institution, and speaker's fees paid to the Alzheimer Center, VU University Medical Center, Lilly, GE Healthcare, and Roche. Pieter Jelle Visser received research support from Biogen and grants from EU/EFPIA IMI Joint Undertaking, EU Joint Programme-Neurodegenerative Disease Research (JPND), ZonMw, and Bristol-Myers Squibb; served as a member of the advisory board of Roche Diagnostics; and received nonfinancial support from GE Healthcare. Frederik Barkhof received payment and honoraria from Bayer-Schering Pharma, Sanofi-Aventis, Genzyme, Biogen-Idec, TEVA, Merck-Serono, Novartis, Roche, Jansen Research, IXICO Ltd., GeNeuro, and Apitope Ltd. for consulting; payment from the Serono Symposia Foundation, IXICO Ltd., and MedScape for educational presentations; and research support via grants from EU/EFPIA Innovative Medicines Initiative Joint Undertaking (AMYPAD consortium), EuroPOND (H2020), U.K. MS Society, Dutch MS Society, PICTURE (IMDI-NWO), and ECTRIMS-MAGNIMS. No other potential conflict of interest relevant to this article was reported.