Introduction

Prognosis and survival of patients with Hodgkin’s disease (HD) and non-Hodgkin’s lymphoma (NHL) depend on histological grade and clinical stage at diagnosis, and on the response to treatment [1]. The Ann Arbor classification with Cotswolds revision is the guideline for radiological staging of HD and most types of NHL [2, 3]. In the past, the Ann Arbor staging system was based on physical examination and bone marrow evaluation, later to include the computed tomography (CT) scan. With the implementation of the positron emission tomography (PET) scan and later the integrated PET/CT scan, several studies concluded that a baseline 2-deoxy-2-[18F]fluoro-D-glucose (FDG) PET scan provides significantly more information than conventional CT, with subsequent therapeutic consequences [4–6]. Whereas the application of PET in therapy evaluation in malignant lymphoma is rapidly emerging, the added value of baseline PET scans in this context is less clear. In guidelines, baseline PET is recommended but not mandatory except in lymphoma types with variable FDG uptake. Baseline scans may facilitate interpretation of PET at therapy evaluation, increase reader confidence, and perhaps avoid misinterpretation. Guidelines on PET reading after therapy refer to “previously involved lesions” to avoid false positivity [7–9].

The aim of the present study was to test the hypothesis that adding baseline PET information decreases false positive readings with therapy evaluation PET and improves observer agreement.

Materials and Methods

Subjects

We retrospectively studied baseline and posttreatment PET scans of 44 patients with newly diagnosed malignant lymphoma undergoing first-line therapy between January 2005 and July 2007. In all patients, malignant lymphoma had been histopathologically proven, and initial staging was done according to the Ann Arbor classification (with supplementary baseline PET scan). The posttreatment PET scan was performed at least 4–6 weeks after completion of therapy, and patients were at least 10 days off granulocyte colony-stimulating factor (G-CSF) therapy [7, 10]. The study was approved by the ethical board, and informed consent was waived.

PET Imaging

PET was performed with a mobile scanner (ECAT-ACCEL, Siemens/CTI, Knoxville, TN, USA). Patients fasted for 6 h before the scan. Prior to injection, blood glucose levels were within the normal range (<11 mmol/L). One hour after injection of FDG (5 MBq/kg body weight), patients were scanned from mid femur to the base of the skull in seven bed positions. Acquisition time was 5 min per bed position, with a transmission time of 60 s each. PET images were reconstructed with and without attenuation correction using a weighted iterative ordered subsets expectation maximization algorithm (OSEM, two iterations, eight subsets). In the final step, a three-dimensional isotropic Gaussian filter was applied, yielding a final image resolution of 8–9 mm full width at half maximum.

PET Analysis

Two nuclear medicine physicians with >5 years of clinical PET experience (MJH, JMHK) independently evaluated the PET scans. First, they interpreted all 44 posttreatment PET scans (isolated reading). Three weeks later, in a second session, the same posttreatment PET scans were presented together with the baseline PET scans (paired reading). The observers were aware that the patients had malignant lymphoma but unaware of the type and grade of lymphoma and of the results of baseline or posttreatment conventional staging. For each scan, they analyzed 22 regions using a standardized form, classifying FDG uptake as positive, negative, or equivocal. The criterion for PET positivity was as follows: a region with FDG uptake above background in a location incompatible with normal anatomy or physiology was considered positive [7].

After evaluation of the forms of the posttreatment PET scans in both sessions, the observers were requested to assign a consensus score in case of discrepancies. Consensus was reached for the regions of the posttreatment PET without presentation of the baseline PET, followed by the posttreatment PET in combination with the baseline PET.

Besides this lesion-based analysis, the observers provided per patient evaluations. In each patient, all scores of the 22 different regions were taken together resulting in a negative posttreatment PET scan (all regions negative, indicating complete metabolic remission), or a positive posttreatment PET scan (≥1 positive region). Patients with only equivocal scores, besides negative scores, were classified as unclear.
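The per patient rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the software used in the study; the function name `classify_patient` and the string labels are assumptions introduced here.

```python
# Hypothetical helper illustrating the per patient rule described in the text.
# Labels and function name are assumptions, not taken from the study software.
ALLOWED_SCORES = {"negative", "equivocal", "positive"}


def classify_patient(region_scores):
    """Combine regional scores into one per patient classification.

    positive: at least one region scored positive
    unclear:  no positive region, but at least one equivocal region
    negative: all regions scored negative
    """
    scores = list(region_scores)
    if not scores:
        raise ValueError("at least one regional score is required")
    unknown = set(scores) - ALLOWED_SCORES
    if unknown:
        raise ValueError(f"unexpected score(s): {sorted(unknown)}")
    if "positive" in scores:
        return "positive"
    if "equivocal" in scores:
        return "unclear"
    return "negative"
```

For example, a patient with 21 negative regions and one equivocal region would be classified as unclear, whereas a single positive region makes the scan positive regardless of the other scores.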

Because both observers evaluated the PET scans without clinical information, the regions classified as positive or equivocal on the posttreatment PET scans were compared with the initially affected regions reported in the clinical database (baseline CT, PET scan, bone marrow biopsy).

Statistical Analysis

We considered the consensus scores of the paired reading as the gold standard for presence or absence of viable tumor after therapy. To measure the impact of adding a baseline PET scan as a function of sensitivity of PET readings, the posttreatment PET results (isolated and paired reading) were dichotomized by assigning the unclear classification either to the PET-negative classification (complete responders; conservative reading) or to the PET-positive classification (sensitive reading). Sensitivity, specificity, positive predictive value, and negative predictive value were determined for either strategy. To analyze interobserver variability and agreement of isolated and combined baseline and posttreatment PET readings, we used linear-weighted kappa (κ w; SAS 9.1; SAS Institute, Cary, NC), considering kappa <0.20 as poor observer agreement, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as good, and 0.81–1.00 as very good [11]. Ninety-five percent confidence intervals of proportions were calculated with Confidence Interval Analysis 2.1.2.
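For readers unfamiliar with the linear-weighted kappa, the following minimal Python sketch shows how it can be computed for the three ordered categories used here. It is an illustration under stated assumptions, not the SAS procedure used in the study: the function name, the category labels, and their ordering (negative < unclear < positive) are introduced for this example only.

```python
from collections import Counter

# Assumed ordering of the three categories; labels are illustrative.
CATEGORIES = ("negative", "unclear", "positive")


def linear_weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES):
    """Linear-weighted kappa for two equally long lists of ordinal ratings.

    Agreement weights are w_ij = 1 - |i - j| / (k - 1), so a disagreement
    between adjacent categories counts as half agreement when k = 3.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {label: i for i, label in enumerate(categories)}
    n = len(ratings_a)

    joint = Counter((index[a], index[b]) for a, b in zip(ratings_a, ratings_b))
    row = Counter(index[a] for a in ratings_a)
    col = Counter(index[b] for b in ratings_b)

    def weight(i, j):
        return 1 - abs(i - j) / (k - 1)

    # Weighted observed agreement and weighted agreement expected by chance.
    observed = sum(weight(i, j) * c for (i, j), c in joint.items()) / n
    expected = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    ) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)
```

With this weighting, a negative versus unclear disagreement is penalized half as much as a negative versus positive disagreement, which is why a weighted rather than an unweighted kappa suits a three-level ordinal score.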

Results

Subjects

We included 44 patients with a mean age of 56 years (SD 14), diagnosed with NHL in 35 (80%) and HD in nine (20%) patients; further details are provided in Table 1. All patients had FDG-avid lymphoma at baseline and had been clinically staged (including PET) according to the Ann Arbor classification as stage 1 (n = 3, 7%), stage 2 (n = 15, 34%), stage 3 (n = 16, 36%), and stage 4 (n = 10, 23%).

Table 1 Patient characteristics of NHL and HD patients

Per Patient Analysis

For each patient, the initial stage of disease and type of lymphoma as reported in the clinical digital database, the scores of the “isolated” posttreatment PET scans, and the paired posttreatment PET readings for either observer or their consensus scores are provided in Appendix (Table 4).

In the consensus reading of the “isolated” posttreatment PET scans, 24 PET scans were classified as negative, six as unclear, and 14 as positive. In the consensus reading of the paired reading, 25 posttherapy PET scans were classified as negative, three as unclear, and 16 as positive (Table 2). The consensus scores of the isolated posttreatment PET scans vs the paired reference standard were concordant in 29/44 cases (66%, 95% CI 54–77%), resulting in a good correlation (κ w = 0.73, 95% CI 0.64–0.83, p < 0.0001). The proportion of “unclear” readings was 6.8% in the paired reading vs 13.6% in the isolated reading. Adding the baseline PET scan altered the PET classification in 15 cases, but in either direction of suspicion: in eight patients the level of suspicion increased, and in seven it decreased (Table 2a). In six cases (13.6%) the conclusion was opposite, with three false positive and three false negative (Fig. 1) isolated PET readings. Dichotomization did not clearly affect the results, with accuracies of 75% (95% CI 63–84%) and 77% (95% CI 65–86%) for sensitive and conservative reading strategies, respectively (kappas: 0.49, 95% CI 0.28–0.71 vs 0.50, 95% CI 0.26–0.73, respectively). False positivity rates of the isolated posttreatment PET interpretations were 4/14 (29%, 95% CI 14–51%) and 6/20 (30%, 95% CI 16–48%) for conservative and sensitive readings, respectively (Table 2b).
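As a worked check of the dichotomized figures, the sketch below computes accuracy, unweighted kappa, and the false positivity rate from a 2×2 table. The cell counts are not copied from Table 2b; they are reconstructed here from the counts reported in the text (conservative reading: 14 isolated positives, 16 reference positives, 4 false positives; sensitive reading: 20 isolated positives, 19 reference positives, 6 false positives; 44 patients in total), so they should be read as an assumption. The function name is likewise hypothetical.

```python
def two_by_two_metrics(tp, fp, fn, tn):
    """Summary measures for an isolated reading (rows) vs reference (columns)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of both readings.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": observed,
        # False positivity as used in the text: false positives among
        # all positive isolated readings.
        "false_positivity": fp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


# Cell counts reconstructed from the reported counts (assumption, see above).
conservative = two_by_two_metrics(tp=10, fp=4, fn=6, tn=24)  # unclear -> negative
sensitive = two_by_two_metrics(tp=14, fp=6, fn=5, tn=19)     # unclear -> positive
```

Under these reconstructed counts, the accuracies (34/44 and 33/44) and the kappas round to the values reported above, which supports the reconstruction but does not replace the original table.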

Table 2 Agreement of isolated posttreatment and reference PET classification (a), as a function of PET positivity criteria (b), and observer agreement (c)
Fig. 1

Posttreatment and pretreatment PET scans of a 51-year-old male with follicular non-Hodgkin lymphoma. Both interpreters scored complete metabolic response on the isolated posttreatment PET scan. a Posttreatment PET scan: coronal images, with b an axial and c a sagittal image at the level of the described lesion. d Pretreatment PET scan: coronal images, with e an axial and f a sagittal image at the level of the described lesion. The pretreatment PET scan displays multiple regions with pathologic FDG uptake: multiple regions above (neck and mediastinum) and below (intra-abdominal and inguinal) the diaphragm and diffuse bone marrow involvement. With this knowledge, the initial interpretation of the posttreatment PET scan was changed to a positive posttreatment PET scan: in the right inguinal region, there is increased FDG uptake. Without the pretreatment PET scan, this was considered physiologic.

The interobserver agreement of paired baseline and posttreatment evaluations was similar to that of the isolated PET evaluations (linear-weighted kappas: 0.64, 95% CI 0.46–0.82 vs 0.67, 95% CI 0.51–0.83, respectively).

Per Lesion Analysis

As expected, the large majority of regions were FDG negative: in the consensus setting, 95% of both isolated posttreatment scans and paired readings (Table 3a); of the nonnegative readings, 22% and 17% were classified as “unclear,” respectively (NS). Twenty-four of the isolated posttherapy PET classifications changed after addition of the baseline scan (similar to the patient-based analysis), and the change was in either direction of suspicion (12 towards a lower level). Dichotomization yielded kappas of 0.75 (95% CI 0.66–0.85) and 0.71 (95% CI 0.63–0.81) for conservative and sensitive reading strategies, respectively. The false positivity rates of isolated PET readings were 17% (95% CI 9–30%) and 24% (95% CI 16–36%) with conservative and sensitive readings, respectively. The interobserver agreement for the per lesion evaluation of the isolated posttreatment PET scans was good (κ w = 0.71, 95% CI 0.61–0.81, p < 0.0001). Adding the baseline PET information did not further improve the observer agreement (κ w = 0.66, 95% CI 0.52–0.75, p < 0.0001). Discrepant lesional scores were randomly distributed over the 22 different regions, the number of different scores ranging from 1 to 7 for each region in 20 of 22 regions.

Table 3 Per region analysis of the agreement of isolated posttreatment and reference PET classifications (a), as a function of PET positivity criteria (b), and observer agreement (c)

Discussion

The addition of baseline to posttreatment PET evaluation affected the classification of metabolic response in 34% of malignant lymphoma patients treated with first-line chemotherapy. In one out of seven patients, addition of the baseline PET led to opposite conclusions (95% CI 4–14). False positivity was reduced by adding the baseline scan information, but the effect on false negativity was similar. In addition, the number of unclear classifications was halved after paired reading. Observer agreement did not improve upon adding the baseline PET data.

In malignant lymphoma, baseline PET information is essential to be able to assess response in lymphoma types with variable FDG avidity. In routinely FDG-avid lymphoma, baseline scans are recommended but not mandatory [8]. This apparent paradox stems from a lack of evidence, primarily at the level of effectiveness. Obvious disadvantages of routine baseline scanning are costs and the additional radiation (typically about 3.5 mSv) added by FDG to the standard CT work-up [12, 13]. When low-dose CT is added to PET, the dose increases by another 2–3 mSv [14]. Even though diagnostic CT (typically extending from neck to perineum in these patients) yields most of the radiation dose, adding PET should be justified by an impact on disease classification.

Our study has some limitations. First, PET scans in this study were performed with a stand-alone PET scanner rather than with PET-CT. Second, the observers were blinded to the type and grade of lymphoma, to the pretreatment and posttreatment clinical situation, and to the CT findings. An argument favoring interpreting diagnostic tests with clinical information is that the accuracy of the read may be improved by the additional information. It could be argued that the added value of the pretreatment PET scan would have been smaller if this information had been provided. An argument favoring interpreting diagnostic tests without clinical information is that such information may bias the reading and that clinical information should be incorporated into decision-making only after an unbiased read [15]. In three patients, one region which was initially not affected according to the clinical digital database was classified as equivocal in the first session. After paired reading, these scores were changed into negative. In this situation, the pretreatment scan provided the same information as knowledge of the initial clinical information would have; accounting for this, 27% (12/44) rather than 34% (15/44) of the overall results differed.

In conclusion, without any other clinical information, a pretreatment PET facilitates and changes the interpretation of a posttreatment PET in a third of the patients, resulting in both upgrading and downgrading of the posttreatment situation of a malignant lymphoma patient. Adding a baseline PET scan did not improve observer agreement. If these results are confirmed for PET/CT systems, they favor the addition of baseline PET to the current work-up of patients with malignant lymphoma.