Abstract
68Ga-fibroblast activation protein inhibitors (FAPIs) are promising radiotracers for cancer imaging, with data emerging in recent years. Nonetheless, the interobserver agreement on 68Ga-FAPI PET/CT study interpretations in cancer patients remains poorly understood. Methods: 68Ga-FAPI PET/CT was performed on 50 patients with various tumor entities (sarcoma [n = 10], colorectal cancer [n = 10], pancreatic adenocarcinoma [n = 10], genitourinary cancer [n = 10], and other types of cancer [n = 10]). Fifteen masked observers reviewed and interpreted the images using a standardized approach for local, local nodal, and metastatic involvement. Observers were grouped by experience as having a low (<30 prior 68Ga-FAPI PET/CT studies; n = 5), intermediate (30–300 studies; n = 5), or high level of experience (>300 studies; n = 5). Two independent readers with a high level of experience and unmasked to clinical information, histopathology, tumor markers, and follow-up imaging (CT/MRI or PET/CT) served as the standard of reference (SOR). Observer groups were compared by overall agreement (percentage of patients matching SOR) and Fleiss κ with mean and corresponding 95% CI. We defined acceptable agreement as a κ value of at least 0.6 (substantial or higher) and acceptable accuracy as at least 80%. Results: Highly experienced observers agreed substantially on all categories (primary tumor: κ = 0.71; 95% CI, 0.71–0.71; local nodal involvement: κ = 0.62; 95% CI, 0.61–0.62; distant metastasis: κ = 0.75; 95% CI, 0.75–0.75), whereas observers with intermediate experience showed substantial agreement on primary tumor (κ = 0.73; 95% CI, 0.73–0.73) and distant metastasis (κ = 0.65; 95% CI, 0.65–0.65) but moderate agreement on local nodal stages (κ = 0.55; 95% CI, 0.55–0.55).
Observers with low experience had moderate agreement on all categories (primary tumor: κ = 0.57; 95% CI, 0.57–0.58; local nodal involvement: κ = 0.51; 95% CI, 0.51–0.52; distant metastasis: κ = 0.54; 95% CI, 0.53–0.54). Compared with SOR, the accuracy for readers with high, intermediate, and low experience was 85%, 83%, and 78%, respectively. In summary, only highly experienced readers showed substantial agreement and a diagnostic accuracy of at least 80% in all categories. Conclusion: The interpretation of 68Ga-FAPI PET/CT for cancer imaging had substantial reproducibility and accuracy among highly experienced observers only, especially for local nodal and metastatic assessments. Therefore, for accurate interpretation of different tumor entities and pitfalls, we recommend training or experience with at least 300 representative scans for future clinical readers.
Fibroblast activation protein is expressed by carcinoma-associated fibroblasts and cells of certain solid tumors (1). Radiolabeled fibroblast activation protein inhibitors (FAPIs) such as 68Ga-FAPI-46 have been developed as novel theranostic tools for cancer (2–5). Increasing data suggest a high sensitivity for tumor detection and less background uptake than with 18F-FDG (6–8). Mainly retrospective studies have demonstrated superior detection rates and higher localization accuracy for various tumor entities, but larger cohorts and prospective data are lacking. Nonetheless, multiple prospective trials to evaluate the accuracy and management impact of 68Ga-FAPI PET/CT are currently under way at our site and worldwide (NCT05160051, NCT04023240, NCT05262855) (7,9,10).
To serve as a reliable clinical and research tool in the future, findings from 68Ga-FAPI PET/CT must be accurate and reproducible. However, interobserver agreement on 68Ga-FAPI PET/CT interpretations remains poorly understood, and consistent interpretation might be challenging given the heterogeneity of tumor diseases and limited availability of these novel radiotracers. Regardless of the diagnostic value of an imaging method, the overall performance is closely linked to interobserver agreement and variability, which need to be established before implementation into clinical routine (11,12). To address this need, we prospectively evaluated interobserver agreement and accuracy for 68Ga-FAPI PET/CT interpretations in different tumor entities and compared findings among readers with various levels of experience.
MATERIALS AND METHODS
Study Design and Patients
Fifty patients who underwent 68Ga-FAPI-46 PET/CT for imaging of the following tumor types were retrospectively selected from 2 institutional databases (University Hospital Essen and University Hospital Bologna): sarcoma (n = 10), pancreatic adenocarcinoma (n = 10), colorectal cancer (n = 10), genitourinary cancer (n = 10) and miscellaneous cases (n = 10) consisting of various other cancer entities (breast cancer [n = 1]; lung cancer [n = 2]; pleural mesothelioma [n = 2]; cholangiocarcinoma [n = 2]; hepatocellular carcinoma [n = 1]; head and neck cancer [n = 1]; and lymphoma [n = 1]).
PET/CT-positive lesions were defined by consensus during a joint reading session by 2 expert readers, each with more than 500 prior clinical or research 68Ga-FAPI-46 PET/CT readings. Expert readers had access to all clinical data. Clinical information, histopathology, tumor markers, and follow-up imaging (CT/MRI or PET/CT) were used as the standard of reference (SOR) for lesion validation. Cases were selected to represent clinical routine, ranging from negative cases (n = 8, 16%) to extensive disease (n = 28, 56%), with typical pitfalls. Data were analyzed as part of the prospective interobserver study (NCT04990882) and approved by the local Ethics Committee (permits 19-8991-BO and 20-9485-BO). Patients gave written informed consent to undergo clinical 68Ga-FAPI-46 PET/CT, and the respective institutional review boards waived individual patient consent for anonymized assessment of their datasets as part of the interobserver study.
Image Acquisition and Reconstruction
Patient preparation and image acquisition were performed as previously described (13).
In brief, PET was performed on a PET/CT system (Biograph mCT or Vision; Siemens). Scanning was performed at a mean (±SD) of 14 ± 9 min after injection (minimum, 10 min; maximum, 48 min). The injected activity of 68Ga-FAPI was 150 ± 38 MBq. All PET images were iteratively reconstructed (Vision: 4 iterations, 5 subsets, 220 × 220 matrix, 5-mm Gauss filtering; mCT: 3 iterations, 21 subsets, 200 × 200 matrix, 4-mm Gauss filtering) with time-of-flight information, using the manufacturer’s dedicated software (syngo.via for MI; Siemens Healthineers). In all patients, a low-dose CT scan was acquired for attenuation correction (30 mAs, 120 kV, 512 × 512 matrix, 3-mm slice thickness). If available, diagnostic CT scans (with or without contrast enhancement, within 2 wk of 68Ga-FAPI PET/CT) were provided for the respective case.
Observers
Fifteen nuclear medicine physicians (5 with additional radiology training) who had at least 1 y of experience with other tracers (18F-FDG, 18F/68Ga-PSMA, 68Ga-DOTA-peptides) were prospectively recruited as observers. They were from 10 centers in Europe (n = 12), North America (n = 2), and Australia (n = 1). On the basis of the previously reported number of 68Ga-FAPI PET/CT interpretations, observers were classified as having a low (<30 prior 68Ga-FAPI PET/CT studies; n = 5), intermediate (30–300 studies; n = 5, 4 of whom also had radiology education), or high level of experience (>300 studies; n = 5, 1 of whom also had radiology education). The observers reviewed all provided 68Ga-FAPI PET/CT datasets (n = 50). Each dataset included low-dose CT and attenuation-corrected PET images and, if available, diagnostic CT images.
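The grouping rule above can be written as a small function. This is a minimal sketch for illustration only: the function name `experience_level` and the choice of Python are assumptions, and only the thresholds (<30, 30–300, >300 prior studies) come from the text.

```python
def experience_level(prior_studies: int) -> str:
    """Map a reader's number of prior 68Ga-FAPI PET/CT studies to a group.

    Thresholds follow the study definition: low (<30), intermediate (30-300),
    high (>300). The function name is hypothetical, not from the study.
    """
    if prior_studies < 0:
        raise ValueError("number of prior studies cannot be negative")
    if prior_studies < 30:
        return "low"
    if prior_studies <= 300:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # Hypothetical reader counts, chosen to show the boundary behavior.
    for n in (12, 30, 300, 520):
        print(n, experience_level(n))
```

Note that both boundaries of the intermediate group (30 and 300 studies) are inclusive under this reading of the definition.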
Guidelines for Visual Interpretation
For standardized visual interpretation of datasets, the observers were provided a written guide (Supplemental Data File 1; supplemental materials are available at http://jnm.snmjournals.org), 4 teaching cases (Supplemental Data File 2), an electronic case report form, and 1 test patient dataset with disclosed data entries. Furthermore, the observers were asked to learn about 68Ga-FAPI-46 PET/CT pitfalls (13).
Only necessary patient information was disclosed to the observers before image interpretation: indication for 68Ga-FAPI PET/CT (primary diagnosis, prior surgery/therapy, staging or restaging of metastatic disease), age (y), weight (kg), injected dose (MBq), and uptake time (min). The observers were masked to all other clinical data. Visual image interpretation for the presence or absence of malignant disease was reported for predefined organ and region categories.
Statistical Analyses and Reference Standard
Agreement among observer groups was evaluated using the Fleiss κ (14).
Ninety-five percent CIs are reported for κ. The interpretation of κ was based on a classification provided by Landis and Koch (15): <0.00, poor; 0.00–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1.00, almost-perfect reproducibility. SOR was defined as the decision of the experienced expert readers, unmasked to all clinical information (e.g., histopathology, follow-up imaging, and tumor markers).
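For readers unfamiliar with the statistic, the following is a minimal sketch of how Fleiss κ and its Landis and Koch interpretation can be computed. It is not the software used in this study. The function names, the input layout (one list of category labels per patient, one label per reader), and the example ratings are assumptions made for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of subjects, each a list of one label per rater.

    Every subject must be rated by the same number of raters (at least 2).
    Returns NaN when chance agreement equals 1 (only one category ever used).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over subjects.
    pairs = n_raters * (n_raters - 1)
    p_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / pairs for c in counts
    ) / n_subjects

    # Chance agreement from the overall proportion of each category.
    total = n_subjects * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch(kappa):
    """Verbal label for kappa following the Landis and Koch classification."""
    if kappa < 0.0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical example: 4 patients, 3 readers, positive/negative calls.
    example = [["pos", "pos", "neg"], ["neg", "neg", "neg"],
               ["pos", "pos", "pos"], ["pos", "neg", "neg"]]
    k = fleiss_kappa(example)
    print(round(k, 3), landis_koch(k))
```

In this toy example the readers agree fully on 2 of 4 patients, which yields κ = 1/3, a fair level of agreement on the Landis and Koch scale.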
Overall agreement, defined as the total agreement of an observer on all categories (primary tumor, local nodal involvement, distant metastasis), and sensitivity, specificity, and positive and negative predictive value compared with SOR were calculated. The difference between 2 groups was assessed by the Student t test at a significance level of P less than 0.05. Statistical analyses were performed using SPSS software (version 26.0; SPSS Inc.), and graphs were generated using GraphPad Prism (version 9.1.0; GraphPad Software). At least substantial agreement (κ ≥ 0.6) on visual interpretation of all scans for the 3 major staging categories (primary tumor, local nodal involvement, distant metastasis) and diagnostic accuracy of at least 80% were defined as acceptable performance.
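The accuracy measures and the acceptance rule can be illustrated with a short sketch. This is not the analysis code of the study. The function names, the boolean input format, and the per-patient example calls are assumptions. The κ and accuracy values in the usage section are the lowest category κ and the overall accuracy reported for each experience group.

```python
def diagnostic_metrics(reader_calls, reference_calls):
    """Compare one reader's positive/negative calls with the reference standard.

    Both arguments are equally long sequences of booleans (True = positive).
    Ratios with a zero denominator are returned as NaN.
    """
    if len(reader_calls) != len(reference_calls):
        raise ValueError("reader and reference must cover the same cases")

    pairs = list(zip(reader_calls, reference_calls))
    tp = sum(1 for r, s in pairs if r and s)
    fp = sum(1 for r, s in pairs if r and not s)
    tn = sum(1 for r, s in pairs if not r and not s)
    fn = sum(1 for r, s in pairs if not r and s)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }


def acceptable_performance(min_kappa, accuracy, kappa_min=0.6, accuracy_min=0.80):
    """Predefined criterion: kappa >= 0.6 in every category and accuracy >= 80%."""
    return min_kappa >= kappa_min and accuracy >= accuracy_min


if __name__ == "__main__":
    # Hypothetical calls for 5 patients (reader vs. reference standard).
    reader = [True, True, False, False, True]
    reference = [True, False, False, False, True]
    print(diagnostic_metrics(reader, reference))

    # Lowest category kappa and overall accuracy per group, as reported.
    for group, kappa, acc in (("high", 0.62, 0.85),
                              ("intermediate", 0.55, 0.83),
                              ("low", 0.51, 0.78)):
        print(group, acceptable_performance(kappa, acc))
```

Applied to the reported group values, only the highly experienced group satisfies both thresholds; the intermediate group reaches the accuracy threshold but not the κ threshold for local nodal involvement.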
RESULTS
Patient Characteristics
Table 1 summarizes the patient characteristics. Twenty-seven male (54.0%) and 23 female (46.0%) patients were included in this study, with a mean age of 58 ± 18 y (range, 19–83 y). Twenty-one patients were imaged at initial diagnosis (42.0%), and 29 patients (58.0%) underwent 68Ga-FAPI PET/CT for disease restaging. Diagnostic CT was available for 68% of the patients (34 patients: 9 with sarcoma, 9 with pancreatic adenocarcinoma, 5 with colorectal cancer, 4 with genitourinary cancer, and 7 with miscellaneous cancers), and low-dose CT was performed on 16 (32%) patients. 68Ga-FAPI PET/CT studies were interpreted as positive for local or metastatic cancer lesions in 42 of 50 (84.0%) patients by the reference readers: a primary tumor was present in 10 patients (20.0%), 4 patients (8.0%) had lymph node–positive disease, and 28 (56.0%) were staged as bone-positive or organ-positive (distant metastasis).
Table 1. Patient Characteristics (n = 50)
Image Interpretation: Interobserver Agreement
The interobserver agreement on visual image interpretation is summarized in Figure 1 and Table 2. Highly experienced observers agreed substantially on all categories (primary tumor, κ = 0.71; local nodal involvement, κ = 0.62; distant metastasis, κ = 0.75), whereas observers with intermediate experience showed substantial agreement on primary tumor (κ = 0.73) and distant metastasis (κ = 0.65) and moderate agreement on nodal involvement (κ = 0.55). Observers with low experience had moderate agreement on all categories (primary tumor, κ = 0.57; local nodal involvement, κ = 0.51; distant metastasis, κ = 0.54). Regardless of the experience level, the overall interobserver agreement was moderate for primary tumor (κ = 0.56) and local nodal involvement (κ = 0.55) but substantial for other metastatic lesions (κ = 0.69).
Figure 1. Interobserver agreement by primary tumor (T), local nodal involvement (N), distant metastasis (M), and experience level (high, intermediate, and low).
Table 2. Interobserver Agreement on Visual Image Interpretation
Image Interpretation: Comparison to SOR
Table 3 summarizes, by category, sensitivity, specificity, positive and negative predictive value, and accuracy for the entire group and separately for observers with a low, intermediate, or high level of experience. Independent of the experience level, all observers reported primary tumor and distant metastasis stages with high sensitivity (89% and 91%, respectively). Sensitivity was lower for local nodal involvement stages (79%). Because of higher rates of false-positive lesions, specificity was lower for primary tumor stages than for local nodal involvement and distant metastasis stages (65% vs. 81% vs. 77%). Five representative patient examples of agreement and disagreement among observers are given in Figure 2.
Table 3. Sensitivity and Specificity for Observers with High, Intermediate, and Low Experience and for All Observers
Figure 2. Example cases and lesions (arrows) with high agreement, high disagreement, and mixed agreement with SOR. Maximum intensity projections (MIPs) are represented above, and fused transaxial PET/CT images are represented below. (A) Patient with local relapse of urothelial cancer confirmed by histopathology; 7 readers rated lesion positive (2 high, 3 intermediate, 2 low experience), and 8 readers rated lesion negative (3 high, 2 intermediate, 3 low experience). (B) Patient with ovarian cancer; all readers agreed with expert on peritoneal involvement. (C) All readers agreed with expert on liver metastasis in patient with colorectal carcinoma. (D) All readers correctly identified diverticulosis in sarcoma patient. (E) Patient with urothelial cancer and retroperitoneal fibrosis; all readers falsely rated retroperitoneal lesions as malignant and disagreed with expert panel and histopathology.
Mean overall agreement with SOR for primary tumor, local nodal involvement, and distant metastasis staging had a κ value of 0.64 for the entire group of observers. Observers with high and intermediate experience showed substantial agreement with SOR (high, κ = 0.69; intermediate, κ = 0.66), whereas readers with low experience had moderate agreement with SOR (κ = 0.56) (Table 4). Observers with low and intermediate experience had significantly lower agreement than did highly experienced observers (κ = 0.56 and 0.66 vs. 0.69, P < 0.001). Irrespective of the stage categories, observers with high and intermediate experience had higher sensitivity, specificity, positive and negative predictive values, and accuracy than did observers with low experience (sensitivity, 87% vs. 91% vs. 74%; specificity, 82% vs. 75% vs. 70%; accuracy, 85% vs. 83% vs. 78%) (Table 4).
Table 4. Overall Sensitivity and Specificity for Observers with High, Intermediate, or Low Experience
The interobserver agreement and accuracy of observers were analyzed separately for the different tumor entities (Table 5). Substantial overall agreement was observed for the miscellaneous (κ = 0.75), sarcoma (κ = 0.74), colorectal (κ = 0.70), and genitourinary cases (κ = 0.63). Interpretation of pancreatic cancer cases was the exception, with only fair overall agreement (κ = 0.33). Readers were most sensitive for sarcoma and miscellaneous cases (96% and 97%) and had high accuracy (87% and 89%). Pancreatic cases had the lowest diagnostic performance (sensitivity, 73%; specificity, 61%; accuracy, 66%) (Table 5).
Table 5. Agreement and Accuracy per Tumor Entity Irrespective of Reader Experience
DISCUSSION
This prospective study involving 50 patients with various tumor entities who underwent 68Ga-FAPI-46 PET/CT demonstrated substantial reproducibility for highly experienced observers, as well as, with limitations, for observers with intermediate experience.
On the basis of our predefined criteria, we recommend initial training with at least 300 representative patient cases (high experience) to reach an acceptable diagnostic performance for clinical and research interpretations of 68Ga-FAPI PET/CT scans. This is comparable to other recent radiotracer training recommendations (16). Training cases should include not just tumor findings (low and extensive tumor burden) but also common pitfalls, such as inflammatory changes or degenerative or posttraumatic bone lesions (13).
In recent years, 68Ga-FAPI PET/CT has been promoted as a novel highly tumor-specific imaging modality with beneficial radiotracer kinetics and low background uptake (17). Nonetheless, because of the heterogeneity of tumor diseases, variance in fibroblast activation protein expression of tumor and stromal cells, and potential pitfalls, the interpretation of 68Ga-FAPI PET/CT scans might prove more challenging than expected (8,9,13,18–22). Sources of misinterpretations and false-positive ratings of lesions include a variety of pitfalls similar to those for 18F-FDG but differing from those for other recent targeted radioligands such as 68Ga-PSMA (23,24). Examples of these pitfalls include inflammatory uptake (e.g., pancreatitis and myocarditis), degenerative bone lesions, and benign tumors (8,13,25). Because 68Ga-FAPI is relatively new, the list of pitfalls and benign findings is still evolving and therefore must be constantly updated. Radiotracers such as 68Ga-PSMA for prostate cancer or 68Ga-DOTATATE/DOTATOC for neuroendocrine tumors are entity-specific. On the other hand, 68Ga-FAPI is not specific for any malignancy and can be used for a variety of cancers. This might explain why physicians need to be trained on a high number of cases (300) for 68Ga-FAPI PET to achieve substantial reproducibility, compared with fewer than 50 patient cases for 68Ga-PSMA and 68Ga-DOTATATE/DOTATOC PET (16,26). Our data underline the necessity of high reader experience for best 68Ga-FAPI PET agreement and diagnostic performance, although intermediate-level readers also show appropriate results. Compared with our study on the 68Ga-FAPI tracer, other interobserver agreement studies on 68Ga-PSMA and 68Ga-DOTATATE PET/CT reported higher reproducibility with lower numbers of recommended initial training cases. This difference is partly due to the higher specificity and tumor signal of receptor-targeted radiotracers.
Our cohort consisted of a high proportion of patients with metastatic disease, which reflects the likely clinical-use scenario for 68Ga-FAPI PET/CT as a whole-body staging tool. We observed that despite high sensitivity for local staging, specificity was only 63% even for highly experienced readers. This is likely due to high nonmalignant tracer uptake after surgery or tracer uptake due to inflammatory changes, both frequently noted in pancreatic cancer patients in our cohort and likely reflecting chronic pancreatitis or scarred tissue after pancreatectomy (8,27). A careful review of the patient’s history before image interpretation, and sufficient experience, will therefore be required to avoid a negative impact on clinical management.
This study has limitations. Because 68Ga-FAPI PET/CT is applied for different indications, we decided on a balanced oncologic patient cohort from 2 centers. Only a small proportion of tumor entities was reflected by the patient cohort, and a larger number of patients would be necessary to draw conclusions on agreement and diagnostic performance for distinct tumor entities or imaging of benign disease. Although previous interobserver studies were focused on a single tumor entity (i.e., neuroendocrine tumors for 68Ga-DOTATATE and prostate cancer for 68Ga-PSMA) (16,26), 68Ga-FAPI could be used for a variety of malignancies. For this purpose, we aimed to include representative patients with different tumor entities. We acknowledge that only one fifth of the cases presented with primary tumor and that PET/CT was performed after the primary treatment in almost 60% of the patients. However, we consider restaging of disease to be a representative scenario to test the reproducibility of 68Ga-FAPI PET interpretation among readers with various levels of experience. The skill of a reader is determined by multiple factors, including clinical knowledge and general experience in imaging, and may vary with the oncologic focus of the reader or center (e.g., lung cancer vs. breast cancer vs. sarcoma). We tried to address this limitation by recruitment of readers worldwide and by inclusion of different tumor entities.
CONCLUSION
Before clinical implementation of 68Ga-FAPI PET/CT, we recommend a high experience level based on at least 300 training cases for substantial agreement and diagnostic performance in all categories (primary tumor, local nodal involvement, and distant metastasis). On the basis of the different clinical scenarios and pitfalls, more extensive training is required for 68Ga-FAPI PET/CT than for other radiotracers.
DISCLOSURE
Lukas Kessler is a consultant for BTG and AAA and received fees from Sanofi outside the submitted work. Manuel Weber reports personal fees from Boston Scientific, Terumo, Advanced Accelerator Applications, and Eli Lilly. Wolfgang Fendler reports fees from SOFIE Biosciences (research funding), Janssen (consultant, speakers’ bureau), Calyx (consultant), Bayer (consultant, speakers’ bureau, research funding), Parexel (image review), and AAA (speakers’ bureau) outside the submitted work. Benedikt Schaarschmidt received a research grant from PharmaCept for an ongoing investigator-initiated study not related to this paper. No other potential conflict of interest relevant to this article was reported.
KEY POINTS
QUESTION: To what extent do 68Ga-FAPI PET/CT interpretations agree on various tumor entities for observers with different levels of experience?
PERTINENT FINDINGS: We observed substantial overall agreement and high accuracy for observers with a high experience level and also, partially, for intermediate-level observers.
IMPLICATIONS FOR PATIENT CARE: Before clinical implementation of 68Ga-FAPI PET/CT, we recommend a high level of reader experience based on at least 300 training cases.
Footnotes
Published online May 25, 2023.
- © 2023 by the Society of Nuclear Medicine and Molecular Imaging.
- Received for publication November 23, 2022.
- Revision received February 21, 2023.