Abstract
Recently, the standardized reporting and data system for prostate-specific membrane antigen (PSMA)–targeted PET imaging studies, termed PSMA-RADS version 1.0, was introduced. We aimed to determine the interobserver agreement for applying PSMA-RADS to imaging interpretation of 18F-DCFPyL (2-(3-{1-carboxy-5-[(6-18F-fluoro-pyridine-3-carbonyl)-amino]-pentyl}-ureido)-pentanedioic acid) PET examinations in a prospective setting mimicking the typical clinical workflow at a prostate cancer referral center. Methods: Four readers (2 experienced readers (ERs, >3 y of PSMA-targeted PET interpretation experience) and 2 inexperienced readers (IRs, <1 y of experience)), who had all read the initial publication on PSMA-RADS 1.0, assessed 50 18F-DCFPyL PET/CT studies independently. Per scan, a maximum of 5 target lesions was selected by the observers, and a PSMA-RADS score for every target lesion was recorded. No specific preexisting conditions were placed on the selection of the target lesions, although PSMA-RADS 1.0 suggests that readers focus on the most avid or largest lesions. An overall scan impression based on PSMA-RADS was indicated, and interobserver agreement rates on a target lesion–based, on an organ-based, and on an overall PSMA-RADS score–based level were computed. Results: The number of target lesions identified by each observer was as follows: ER 1, 123; ER 2, 134; IR 1, 123; and IR 2, 120. Among those selected target lesions, 125 were chosen by at least 2 individual observers (all 4 readers selected the same target lesion in 58 of 125 [46.4%] instances, 3 readers in 40 of 125 [32%], and 2 observers in 27 of 125 [21.6%]). The interobserver agreement for PSMA-RADS scoring among identical target lesions was good (intraclass correlation coefficient [ICC] for 4, 3, and 2 identical target lesions, ≥0.60, respectively). For lymph nodes, an excellent interobserver agreement was derived (ICC, 0.79). 
The interobserver agreement for an overall scan impression based on PSMA-RADS was also excellent (ICC, 0.84), with a significant difference for ER (ICC, 0.97) vs. IR (ICC, 0.74) (P = 0.005). Conclusion: PSMA-RADS demonstrated a high concordance rate in this study, even among readers with different levels of experience. This finding suggests that PSMA-RADS can be effectively used for communication with clinicians and can be implemented in the collection of data for large prospective trials.
Radiotracers targeting prostate-specific membrane antigen (PSMA), such as the urea-based small-molecule 18F-DCFPyL (2-(3-{1-carboxy-5-[(6-18F-fluoro-pyridine-3-carbonyl)-amino]-pentyl}-ureido)-pentanedioic acid), have demonstrated excellent performance characteristics in identifying sites of disease in subjects with prostate cancer (PCa) (1–3). However, in patients with an extensive tumor burden (4) or for lesion detection in preoperative lymph node (LN) staging (5), clinical interpreters have to consider certain pitfalls, such as uptake in benign lesions or in nonprostatic malignancies (6–10). To aid in the interpretation of PSMA-targeted PET imaging studies, multiple structured reporting systems have been proposed. These include the Prostate Cancer Molecular Imaging Standardized Evaluation and the PSMA-Reporting and Data System (PSMA-RADS, version 1.0) (11–14). Such frameworks help convey to the reader the level of certainty that an equivocal finding or a finding without a cross-sectional imaging correlate is a site of disease. Striving for a readily applicable system for a clinical observer, PSMA-RADS is simple, easy to memorize and use, and based exclusively on imaging findings (i.e., the site and intensity of radiotracer uptake). Both individual target lesions (maximum of 5 per scan) and the overall impression of the imaging study should receive a PSMA-RADS score. Such scores are on a 5-point scale that reflects the confidence of the interpreting imaging specialist that a given lesion represents a site of PCa (from 1 = definitively benign to 5 = high degree of certainty that PCa is present). PSMA-RADS 1.0 may facilitate the collection of data for larger clinical trials, can serve as a guide for nuclear medicine physicians in interpreting PSMA-targeted PET scans, and can enable efficient communication with referring clinicians (13).
To validate the utility of PSMA-RADS, further confirmatory work on this proposed standardized reporting system is needed and the interobserver agreement among different interpreters has to be addressed. As such, we undertook to determine the interobserver reliability of PSMA-RADS in a prospective setting in which readers with varying experience levels evaluated 50 18F-DCFPyL PET/CT scans randomly selected from a large trial evaluating the clinical utility of the radiotracer. All observers had read the original PSMA-RADS publication but were masked to all information about the patients and were provided no other instructions, thus simulating some elements of a real-world busy clinical PCa practice.
MATERIALS AND METHODS
In total, 50 patients with histologically proven PCa who had undergone 18F-DCFPyL PET/CT imaging were included in this evaluation. All patients were originally imaged as part of an institutional review board–approved protocol (ClinicalTrials.gov identifier NCT02825875), and all patients gave written informed consent. 18F-DCFPyL was used according to Food and Drug Administration Investigational New Drug application 121064.
Imaging Procedure
As per our standard practice, patients were asked to be nil per os (with the exception of water and medications) for at least 4 h before radiotracer injection. 18F-DCFPyL was synthesized as previously described (15). Integrated PET/CT using either a Discovery RX 64-slice PET/CT scanner (GE Healthcare) or a Biograph mCT 128-slice PET/CT scanner (Siemens) operating in 3-dimensional emission mode with CT attenuation correction was performed on all patients. 18F-DCFPyL (≤333 MBq [≤9 mCi]) was administered intravenously, and after an uptake time of approximately 60 min, acquisitions from the mid thigh to the vertex of the skull were conducted, covering 6–8 bed positions (depending on patient height and the scanner) with patients supine. A detailed description was previously published (7).
Imaging Interpretation
PET images were analyzed using XD3 Software (Mirada Medical). PET, CT, and PET/CT images were assessed for all 50 patients. Two experienced readers (a dual board-certified nuclear medicine physician/radiologist [ER 1] and a board-certified nuclear medicine physician [ER 2] with >3 y of experience in reading PSMA-targeted PET scans) and 2 inexperienced readers (a recently board-certified nuclear medicine physician [IR 1] and a resident [IR 2] with <1 y of experience in reading PSMA-targeted PET scans), masked to the clinical status of the patients (other than knowing that the patients had been imaged because of a history of PCa), evaluated all scans independently. With the exception of ER 1, the readers had no previous experience with reading 18F-labeled PSMA-targeted PET images (i.e., those observers had clinical experience solely in interpreting 68Ga-PSMA-11 or 68Ga-PSMA imaging-and-therapy PET scans). Before beginning the masked independent reads, the IRs underwent a training session with 5 cases to gain familiarity with the workstation and the XD3 software used to display the scans.
PSMA-RADS-1A lesions are benign and have no abnormal radiotracer uptake, whereas PSMA-RADS-1B lesions are benign (often characterized by biopsy or pathognomonic imaging) but do have abnormal radiotracer uptake. Often, characterizing a lesion as PSMA-RADS-1B involves previous conventional imaging or histologic diagnosis; as such, PSMA-RADS-1A and -1B were subsumed under PSMA-RADS-1 in the present masked analysis. No other changes to the PSMA-RADS system were implemented in this study. A complete summary of the PSMA-RADS scoring system (from PSMA-RADS-1 to -5) can be found in a previous publication (13).
In accordance with the specifications of PSMA-RADS 1.0, a maximum of 5 target lesions was selected by the readers. PSMA-RADS suggests that target lesions be those that are largest or have the most intense radiotracer uptake, although ultimately target lesion selection is left to the discretion of the interpreting imaging specialist. Further, a maximum of 3 lesions per organ can be included. The following organ compartments were defined: LNs, skeleton, prostate/local recurrence, soft tissue (other than LNs), liver, thyroid, and lung (16). A PSMA-RADS score had to be assigned to every target lesion. Additionally, all involved organ compartments were identified by the readers, and an overall scan score was assigned. The overall PSMA-RADS score was defined analogously to somatostatin receptor RADS (i.e., the highest PSMA-RADS score of any of the individual target lesions) (17). Moreover, the following general parameters were assessed by each observer in a binary fashion: overall scan result (positive in cases of suggestive radiotracer uptake above the background level), organ involvement, and LN involvement. Additionally, the number of organs affected, the number of organ metastases, the number of LN regions, and the number of LNs had to be indicated on a 5-point scale (from 1 to ≥5 organ metastases, LNs, or number of organs/LN areas affected). The following LN areas were defined: cervical, thoracic/axillary, retroperitoneal, (pre)sacral, and pelvic (16). Moreover, the concordance between both ERs and IRs was evaluated in an interobserver setting for the overall PSMA-RADS score.
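The selection rules and the overall-score definition described above can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study: the function names, the string encoding of scores (e.g., "3A"), the reduction of a subcategory to its leading integer, and the requirement of at least 1 target lesion are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Tuple

MAX_TARGETS = 5    # maximum number of target lesions per scan
MAX_PER_ORGAN = 3  # maximum number of target lesions per organ compartment


def numeric_score(score: str) -> int:
    """Map a score such as '3A' to its integer category (assumed encoding)."""
    return int(score[0])


def overall_psma_rads(lesions: List[Tuple[str, str]]) -> int:
    """Return the overall scan score as the highest target lesion score.

    lesions: (organ compartment, PSMA-RADS score) pairs for one scan.
    """
    if not lesions:
        raise ValueError("at least one target lesion is assumed here")
    if len(lesions) > MAX_TARGETS:
        raise ValueError("more than 5 target lesions selected")
    per_organ = Counter(organ for organ, _ in lesions)
    if any(count > MAX_PER_ORGAN for count in per_organ.values()):
        raise ValueError("more than 3 target lesions in one organ compartment")
    return max(numeric_score(score) for _, score in lesions)


# Hypothetical scan: two LNs and one bone lesion.
example = [("LN", "3A"), ("LN", "4"), ("skeleton", "5")]
print(overall_psma_rads(example))
```

Because the overall score is a maximum, a disagreement on a lower-scored lesion leaves the overall score unchanged whenever another target lesion carries a higher score.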
Statistical Analysis
Continuous data are presented as mean ± SD. The categoric variables are presented as frequency and percentage. The degrees of agreement were assessed using intraclass correlation coefficients (ICCs) and their 95% confidence intervals (CIs) based on a mean-rating, single-measure, consistency model. According to Cicchetti, an ICC of less than 0.4 indicates poor interobserver agreement, 0.4–0.59 indicates fair agreement, 0.6–0.74 indicates good agreement, and 0.75–1 indicates excellent agreement (18). Statistical analysis was performed using MedCalc statistical software (version 18.2.1; MedCalc Software bvba). The statistical significance level was set at a P value of less than 0.05.
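For readers who wish to reproduce this type of analysis, the sketch below computes a single-measure consistency ICC from a subjects-by-readers table and maps it to the Cicchetti categories quoted above. It is not the MedCalc implementation used in the study; the 2-way ANOVA formulation, the function names, and the omission of confidence intervals are assumptions made for illustration.

```python
from typing import Sequence


def icc_consistency_single(ratings: Sequence[Sequence[float]]) -> float:
    """Single-measure consistency ICC from a 2-way ANOVA decomposition.

    ratings[i][j] is the score given to subject i by reader j.
    ICC = (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error)
    """
    n = len(ratings)      # number of subjects (lesions or scans)
    k = len(ratings[0])   # number of readers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def cicchetti_label(icc: float) -> str:
    """Classify an ICC using the thresholds cited in the text."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical scores from 2 readers who differ by a constant offset:
# consistency is perfect even though absolute agreement is not.
demo = [[1, 2], [3, 4], [4, 5]]
print(cicchetti_label(icc_consistency_single(demo)))
```

A consistency model ignores systematic offsets between readers, which is why the demonstration table is rated as perfectly consistent; an absolute-agreement model would penalize the constant difference.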
RESULTS
The patients’ characteristics are detailed in Table 1.
General Parameters
For the 3 parameters that had to be evaluated in a binary fashion (overall scan result, organ involvement, and LN involvement), the interobserver agreement was excellent (ICC, 0.75, 0.80, and 0.78, respectively) (18). Except for the number of organs affected (good interobserver agreement; ICC, 0.74), all general parameters that were evaluated on a 5-point scale demonstrated excellent agreement (number of LN areas affected: ICC, 0.79; number of organ metastases: ICC, 0.92; number of LN metastases: ICC, 0.90). Table 2 summarizes all results for those general scan parameters, and Figure 1 displays the distribution for number of organ and LN metastases for all 4 readers.
Target Lesion- and Compartment-Based Interobserver Agreement
In total, the following numbers of target lesions were recorded by each reader: ER 1, 123; ER 2, 134; IR 1, 123; and IR 2, 120. Among those selected target lesions, 125 were chosen by at least 2 individual observers. The majority of the lesions were assigned to either LN (64/125, 51.2%) or skeleton (39/125, 31.2%) (Table 3).
Identical Target Lesion Included by 4 Readers
The identical target lesion was included by all 4 readers in 58 of 125 (46.4%) instances, with the majority of those findings being either LN (26, 44.8%) or bone lesions (19, 32.8%). In 29 (50%) of those 58 target lesions, all 4 readers designated the identical PSMA-RADS score, with another 17 lesions (29.3%) having agreement by 3 readers and the remaining 12 (20.7%) having agreement by 2 readers. The ICC was 0.60 (95% CI, 0.48–0.71). On an organ-based compartment level for all 4 readers selecting the same LN, interobserver agreement was 0.79 (95% CI, 0.66–0.89). Figure 2 illustrates the PSMA-RADS score for 4 identical target lesions among all readers.
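The breakdown by number of agreeing readers can be tabulated as sketched below. The scores shown are hypothetical, and the helper names are introduced only for illustration; scores are compared exactly as recorded, which is an assumption (e.g., "3A" and "3B" would count as different).

```python
from collections import Counter
from typing import Dict, List


def largest_agreeing_group(scores: List[str]) -> int:
    """Number of readers who assigned the most frequent score to a lesion."""
    return Counter(scores).most_common(1)[0][1]


def tabulate_agreement(lesions: List[List[str]]) -> Dict[int, int]:
    """Count lesions by the size of their largest agreeing reader group."""
    return dict(Counter(largest_agreeing_group(s) for s in lesions))


# Hypothetical scores from 4 readers for 3 shared target lesions.
shared_lesions = [
    ["5", "5", "5", "5"],    # all 4 readers agree
    ["4", "3A", "3A", "4"],  # largest agreeing group is 2 readers
    ["4", "4", "4", "2"],    # 3 readers agree
]
print(tabulate_agreement(shared_lesions))
```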
Identical Target Lesion Included by 3 Readers
In 40 (32%) of the 125 cases, 3 readers identified an identical target lesion. LNs comprised 22 (55%) of these 40 target lesions, with 12 (30%) being bone findings. In 21 (52.5%), all 3 readers agreed on the same PSMA-RADS score; in 15 (37.5%), 2 readers agreed; and in the remaining 4 (10%), there was no agreement. The ICC was 0.60 (95% CI, 0.43–0.75). Similar to the situation for 4 identical target lesion selections, the interobserver agreement was 0.66 for LN (95% CI, 0.44–0.83).
Identical Target Lesion Included by 2 Readers
In 27 of the 125 identical target lesions (21.6%), exactly 2 readers selected the same finding. LNs (16, 59.3%) and bone lesions (8, 29.6%) accounted for the majority of the cases. In approximately half the cases (15, 55.6%), both readers agreed on the PSMA-RADS score, with no concordance being seen in the remaining cases (12, 44.4%). The ICC was 0.62 (95% CI, 0.32–0.81) for target lesions selected by 2 readers and 0.57 (95% CI, 0.12–0.83) for LNs in this subgroup.
Taken together, the ICC for 4, 3, and 2 identical chosen target lesions can be described as good. The number of investigated identical bone lesions by 4, 3, or 2 readers was too small for a reliable assessment of ICCs. Table 3 summarizes the compartment-based and target lesion interobserver agreement findings. Table 4 provides a distribution of the different PSMA-RADS scores for those target lesions that had been included by all 4 readers.
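The proportions reported in the preceding subsections can be checked with a short arithmetic sketch. All counts are taken from the text; the variable and function names are illustrative only.

```python
TOTAL_SHARED = 125  # target lesions chosen by at least 2 readers

# Number of shared target lesions by how many readers selected them.
selected_by = {4: 58, 3: 40, 2: 27}

# Lesions on which every selecting reader assigned the same PSMA-RADS score.
full_agreement = {4: 29, 3: 21, 2: 15}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to 1 decimal place."""
    return round(100 * part / whole, 1)


assert sum(selected_by.values()) == TOTAL_SHARED

for readers, count in selected_by.items():
    share = pct(count, TOTAL_SHARED)
    agree = pct(full_agreement[readers], count)
    print(f"{readers} readers: {count}/{TOTAL_SHARED} ({share}%), "
          f"full score agreement in {agree}%")

# At least 3 of 4 readers agreeing on the score: (29 + 17) of 58 lesions.
print(pct(29 + 17, 58))
```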
Overall PSMA-RADS
In the majority of the cases, the readers described the scan impression with an overall PSMA-RADS score of 4 or 5. The ICC was 0.84 (95% CI, 0.77–0.90; that is, excellent agreement). Table 4 gives an overview of the distribution of the different overall PSMA-RADS scores for all 4 readers. Figure 3 illustrates the overall PSMA-RADS distribution among different readers.
ERs Versus IRs
For the overall PSMA-RADS score, the ICC between the 2 ERs was 0.97 (95% CI, 0.94–0.98), whereas the ICC between the 2 IRs was 0.74 (95% CI, 0.58–0.84). The difference between the ICC of the ERs and the ICC of the IRs was statistically significant (P = 0.005). These findings were further corroborated on a target-based level investigating all identical target lesions that were included by all 4 readers. The ICC for the ERs was 0.80 (95% CI, 0.68–0.88) and was statistically significantly different from the ICC for IRs, 0.53 (95% CI, 0.32–0.60) (P = 0.013). Figures 4 and 5 provide examples of lesions in which reader experience may have played a role in PSMA-RADS scoring.
DISCUSSION
In light of the growing availability of 68Ga- or 18F-labeled PSMA-targeted imaging agents (19–22), the number of molecular imaging specialists that routinely interpret PET scans with these compounds outside controlled clinical trials is expanding (23). However, numerous studies have reported pitfalls in the reading of PSMA-targeted PET studies, including uptake in Paget disease, sarcoidosis, or nervous tissue such as ganglia (7–10). Any systematic approach to the interpretation of PSMA-targeted PET scans should therefore build in a measure of uncertainty as to the presence of PCa. The recently reported system PSMA-RADS 1.0 incorporates such uncertainty with recommended follow-up for indeterminate lesions (12). Further, such a system should also facilitate communication of important findings between image interpreters and referring clinicians, be useful for collecting data in multicenter prospective studies, and allow for the eventual implementation of machine learning algorithms based on the system. For all these applications, high interobserver reproducibility is necessary.
The ICC for the overall PSMA-RADS score for both ERs (0.97) was consistent with excellent interobserver agreement, whereas the 2 IRs still agreed well (0.74) on an overall PSMA-RADS score level (all 4 interpreters, 0.84; Fig. 3). This is in line with previous reports in which ERs demonstrated an almost perfect reproducibility on 68Ga-PSMA-11 PET/CT for specified lesions (low-experience observers, substantial agreement) (16). Notably, these results contrast with other standardized reporting systems, such as prostate imaging (PI)-RADS 2 for prostate MRI (moderate interobserver agreement among experienced radiologists, with a Fleiss κ < 0.6) (24). In a similar vein, a significant variation was present in both the PI-RADS distribution between radiologists and, more importantly, in the detection of suspected clinically significant cancer by PI-RADS using multiparametric MRI (25).
On an overall scan impression level, most PET studies were assigned PSMA-RADS-4 or -5 scores by all observers (Table 4). We hypothesize that this finding derives from the high specificity and sensitivity of PSMA-targeted radiotracers. Although PI-RADS highly depends on the experience of the reading radiologists (25), PSMA-RADS seems to be readily applicable even for less experienced readers (ICC, 0.74). These findings were further corroborated on a target lesion level (Fig. 2). Despite the fact that PSMA-RADS provides little specific information on the selection of target lesions, when all 4 readers selected the same target lesion, at least 3 readers (i.e., at least 1 IR) designated the same PSMA-RADS score in more than 79% of cases (Table 3). Moreover, on an organ compartment level, the ICC for LN lesions based on PSMA-RADS was 0.79, which is almost identical to a previous assessment for the interobserver agreement for LNs (Fleiss κ = 0.80) (16).
A nuance of the current study is that the ERs gained experience with subtly different PSMA-targeted radiotracers. There is a current trend toward increased use of 18F-labeled PSMA-targeted imaging agents for PCa molecular imaging, although 68Ga-PSMA-11 has been by far the most commonly used radiotracer to date (26). In head-to-head comparisons between 68Ga- and 18F-labeled compounds, a higher detection rate for sites of disease, as well as an increased tumor-to-background ratio, was demonstrated with a radiofluorinated agent (27,28). Some of the differences in interpretation between ER 1 and ER 2 might be related to their relative familiarities with these different PSMA-targeted radiotracers. A common example of a difference in classification between an 18F-trained reader and the 68Ga-trained readers is given in Figure 4: Although ER 1 called uptake in a right iliac LN lesion PSMA-RADS-4 (i.e., PCa highly likely to be present), 2 other readers (ER 2 and IR 1, both trained with 68Ga-PSMA PET imaging agents) classified this lesion as PSMA-RADS-3A (i.e., a suggestive but indeterminate LN) (13). The 18F-trained reader may have higher confidence in lesion interpretation on 18F-DCFPyL PET scans, most likely because of the higher sensitivity in the detection rate of small lesions using 18F-labeled radiotracers than using 68Ga-PSMA PET imaging agents (27,28).
Further corroborating the need for a standardized framework (11,12), one of the IRs classified moderate radiotracer uptake in mediastinal and hilar LNs as PSMA-RADS-4 (Fig. 5), whereas ER 1 called it PSMA-RADS-2 (i.e., likely benign). Even though the IR had potentially misinterpreted the low-level uptake in the LNs (longitudinal follow-up imaging showed no change in these LNs), this potential misinterpretation did not affect the overall scan score. Thus, PSMA-RADS may contribute to a self-learning effect: PSMA-RADS-4 lesions may be downgraded to PSMA-RADS-2 when subsequent imaging confirms stability, which in turn would increase the understanding of the IR about how to differentiate between typical and atypical sites of PCa metastases.
This study had several limitations. First, false-positive findings, in particular on a target lesion level, cannot be ruled out, as histopathologic assessment of the target lesions (many of which are small and not targetable on conventional imaging) would not be feasible. Second, the readers were masked to clinical status and potential corroborative imaging, potentially lowering interobserver agreement; however, the cases in this study were randomly selected and the readers masked to ancillary information in order to create a worst-case-scenario reflection of a busy real-world clinical practice to best test the applicability of PSMA-RADS. Although, in many situations, clinical information would be available to readers, we wished to ascertain the robustness of PSMA-RADS as an imaging-finding–driven construct. Nonetheless, future studies must clarify if providing clinical information has an important impact on the agreement rate of multiple observers and should also include stratification by serum prostate-specific antigen levels. Third, given the small number of identical bone lesions, an ICC could not be provided for bone metastases, possibly because the readers identified different target lesions in some patients with extensive skeletal involvement. Lastly, a larger trial including more scans and readers could further corroborate our preliminary findings. Nonetheless, the agreement rate of the overall PSMA-RADS score was excellent among all observers, and this initial result is promising.
CONCLUSION
In the present prospective study investigating interobserver agreement for the novel structured reporting system PSMA-RADS 1.0, a high concordance rate, even among readers with different experience, was observed. Thus, PSMA-RADS may be a useful framework for interpreting PSMA-targeted imaging studies, which in turn paves the way for implementing PSMA-RADS in the collection of data for larger prospective trials.
DISCLOSURE
Funding was provided by the Prostate Cancer Foundation Young Investigator Award; National Institutes of Health grants CA134675, CA183031, CA184228, and EB024495; and the European Union’s Horizon 2020 research and innovation program under Marie Sklodowska-Curie grant agreement 701983. Martin G. Pomper is a coinventor on a patent covering 18F-DCFPyL and is entitled to a portion of any licensing fees and royalties generated by this technology. This arrangement has been reviewed and approved by the Johns Hopkins University in accordance with its conflict-of-interest policies. He has also received research funding from Progenics Pharmaceuticals, the licensee of 18F-DCFPyL. Michael A. Gorin has served as a consultant to, and has received research funding from, Progenics Pharmaceuticals. Kenneth J. Pienta has received research funding from Progenics Pharmaceuticals. Steven P. Rowe has received research funding from Progenics Pharmaceuticals. No other potential conflict of interest relevant to this article was reported.
Footnotes
* Contributed equally to this work.
Published online Sep. 6, 2018.
- © 2018 by the Society of Nuclear Medicine and Molecular Imaging.
- Received for publication July 13, 2018.
- Accepted for publication August 30, 2018.