Introduction

[18F] Fluorodeoxyglucose (FDG) positron emission tomography (PET), which images tumor metabolic activity, has been widely used in patients with non-small cell lung cancer (NSCLC) for diagnosis, staging, restaging, treatment response assessment, and radiation therapy (RT) planning. FDG-PET plays an important role in target delineation in radiation treatment planning for NSCLC [16]. Use of FDG-PET improves the accuracy of target definition [3, 7]. For primary tumors, FDG-PET helps differentiate tumor from collapsed lung and from adjacent normal tissue such as large vessels, and helps define disease extent in the chest wall. PET scans reduce inter-observer variability compared with computed tomography (CT) alone. Integrated PET-CT scans further improve delineation consistency [8, 9]. Most studies have focused on pre-RT imaging for RT planning or on post-treatment PET for treatment response assessment. It is largely unknown whether tumor volumes change during radiotherapy; if such changes can be assessed, they may provide an opportunity to redirect the remaining treatment.

We have previously demonstrated in a small pilot study that tumor activity decreases during-RT [10], and that the during-RT metabolic tumor volume (PET-MTV) can be used to adapt radiation treatment, either to escalate the radiation dose (30–102 Gy; mean, 58 Gy) to more active malignancies or to reduce normal tissue complication probability by 0.4–3 % (mean, 2 %), in dry-run dosimetry studies [11]. However, it is challenging to define MTV consistently because tumor margins are indistinct, owing to heterogeneous [18F]FDG uptake distribution and limited spatial resolution. The best target delineation criteria have not yet been established [12]. Currently, PET scanning is often used only to define the location of the tumor, and if MTV is defined, the methods used vary among investigators in the literature. In general, there are two basic strategies: (1) manual delineation based on visual inspection, which depends on human skill and judgment, and (2) automated or semi-automated computer algorithms that identify the tumor boundary, which may be based on a fixed standard uptake value (SUV), a threshold of tumor maximum, or a fixed tumor-to-background ratio [13]. For tumor volume-based adaptive RT, one should also note that gross tumor volume on CT (CT-GTV) decreases during the course of RT as well [14, 15]. It is unknown whether the changes in PET-MTV differ from those in CT-GTV.

We hypothesized that PET-MTV can be delineated relatively objectively by a method combining the strengths of the above two strategies and that MTV shrinks more than GTV during-RT. We tested these hypotheses by (1) studying the reproducibility of the proposed method, (2) defining PET-MTV and CT-GTV pre- and during-RT, and (3) studying the changes in, and correlations between, MTV and GTV during-RT. Additionally, given the wide availability of stereotactic body radiation therapy (SBRT), we investigated tumor volume changes on PET and CT after a few fractions of hypofractionated SBRT and compared the volumetric changes between three-dimensional conformal RT (3DCRT) and SBRT.

Methods

Study population

Eligible subjects included those with stage I to III NSCLC enrolled in IRB-approved prospective lung treatment and imaging protocols. All patients received a definitive course of conformal RT with or without chemotherapy and had a PET-CT before and during the course of treatment. Patients with stage I or II disease underwent daily fractionated (2.0 to 3.4 Gy fraction size) radiotherapy or hypofractionated SBRT (10–20 Gy fraction size); patients with stage III disease were treated with concurrent and adjuvant carboplatin/paclitaxel under a prospective clinical trial, in which patients may receive higher doses than in common practice. The dose of RT for the treatment protocol patients was based on an estimated normal lung complication probability of 15–17 %. Patients with prior thoracic RT were excluded from the study.

Study design

The FDG-PET/CT scans were acquired within 2 weeks before RT (pre-RT) and during the course of radiation therapy (during-RT), after delivery of approximately 45 Gy (in 2 Gy equivalent dose) of 3DCRT, as described previously [10], or two thirds of the prescribed SBRT dose. This time point was selected for the during-RT PET scan to make future adaptive therapy possible if these volumes prove meaningful. The FDG-PET/CT scan was performed in a standard fashion on a flat table top. The PET images were obtained beginning approximately 60 min after administration of 8 to 10 mCi of 18FDG. The CT images (5-mm slices) for the PET/CT study were acquired during quiet breathing. Contrast-enhanced CT scans were also acquired in the standard treatment position at end-inhale, at end-exhale, and during free breathing.

Tumor volume delineation: general principles

PET-CT images from the diagnostic radiology department were transferred to the functional image analysis tool (FIAT) and the UM-Plan system (in-house planning systems). Imaging data sets were co-registered according to anatomic match. Lymph nodes were contoured separately if they were not contiguous with the primary tumor. All volumes were delineated by one physician (PM), and 20 % of them were randomly checked by a senior physician (FMK). The reproducibility of the system was assessed by comparing volumes delineated by the same physician using the same methodology in the two systems (FIAT and UM-Plan) in the first ten consecutive cases, while the reproducibility of the methodology was assessed by comparing volumes for these same patients between two physicians (PM and SY).

PET metabolic tumor volume delineation

There are multiple ways to define PET-MTVs, each with strengths and weaknesses; the tumor/background ratio method is considered to be one of the more reproducible. We elected to use an auto-segmentation method based on a fixed source/background ratio, combined with CT anatomy-based manual editing, to delineate PET-MTV, as illustrated in Fig. 1. As the background blood pool is most commonly used as the reference for lung cancer diagnosis [16], we elected to use FDG uptake in the aorta to represent the normal activity of the mediastinal background. To determine an optimal tumor/aorta ratio (TAR), we first completed a pilot study in 10 patients, measuring the mean activity of a 1 cm³ region within the aortic arch. With the mean aortic arch value normalized to 1.0, the upper limits of the 95 % and 99 % confidence intervals for the aortic arch were 1.2 and 1.5, respectively. We then compared the PET-MTVs obtained with TARs ranging from 1.2 to 2.0 and found that a TAR of 1.5 was the most effective and reproducible, as it required the least manual editing for tumors adjacent to the mediastinum or chest wall. The auto-segmented MTVs were inspected visually on every slice of the co-registered CT-PET images; areas of FDG uptake from normal structures such as arteries, bone marrow, heart, and esophagus were manually edited out of the PET-MTVs (Fig. 2). Central necrotic regions of tumors were not included in the PET-MTVs because we were most interested in the volume of metabolic uptake, which by definition must be metabolically active. The schema of MTV definition is shown in Fig. 1.
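The thresholding step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FIAT or UM-Plan implementation: the function names, the flat list of voxel uptake values, the aortic ROI indices, and the voxel volume are all hypothetical, and the CT-guided manual editing step is not represented.

```python
from statistics import mean

def tar_threshold_mask(uptake, aorta_roi, tar=1.5):
    """Flag voxels whose uptake is at least `tar` times the mean aortic uptake.

    uptake    -- flat list of voxel uptake values (hypothetical layout)
    aorta_roi -- indices of voxels inside the aortic arch ROI (hypothetical)
    tar       -- tumor/aorta ratio; 1.5 is the value chosen in the text
    """
    background = mean(uptake[i] for i in aorta_roi)
    cutoff = tar * background
    return [value >= cutoff for value in uptake]

def mask_volume_cc(mask, voxel_volume_cc):
    """Volume of the flagged voxels in cubic centimeters."""
    return sum(mask) * voxel_volume_cc

# Illustrative values only, not study data.
uptake = [1.0, 1.1, 0.9, 1.4, 1.6, 3.2, 2.0, 1.0]
aorta_roi = [0, 1, 2]  # these three voxels average 1.0
mask = tar_threshold_mask(uptake, aorta_roi)
volume = mask_volume_cc(mask, voxel_volume_cc=0.064)
```

In this toy case the cutoff is 1.5 times the aortic mean, so only the three voxels with uptake of 1.6, 3.2, and 2.0 are retained. In the study, a mask like this would then be reviewed slice by slice and normal structures edited out by hand.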

Fig. 1

Tumor delineation on PET/CT. TV = tumor volume, GTV = gross tumor volume, MTV = metabolic tumor volume

CT gross tumor volume delineation

The non-contrast CT images from the same PET-CT data sets were used for delineation. To improve the objectivity of CT volume delineation, CT gross tumor volumes (CT-GTVs) were first delineated by auto-segmentation (with an arbitrarily chosen CT number of 500). We then edited the CT-GTVs under anatomic guidance, using mediastinal and lung windows as appropriate. The spiculated branches of tumors were included, and central necrotic regions were filled (Fig. 3). Regions of suspected disease, such as hazy areas or areas of questionable atelectasis, were also included in the CT-GTVs.

Fig. 2

TAR selection for PET-MTV delineation. Example images show PET-MTVs auto-segmented at TAR 1.2 (light blue), 1.5 (dark blue), 1.7 (green), and 2.0 (red) on PET (a) and CT (b). TAR 1.5 was chosen because it appeared most appropriate and required the least editing. PET-MTV auto-delineation at TAR 1.5 before editing (dark blue) and after editing (pink) to exclude the pulmonary aorta, esophagus, and bone marrow, on PET (c) and CT (d). TAR = tumor/aorta ratio, GTV = gross tumor volume, MTV = metabolic tumor volume

Fig. 3

Example tumor volumes. A primary tumor with central necrosis is shown for PET-MTV (a) and CT-GTV (b). The MTV excludes the central necrosis (a), while the GTV includes it (b). GTVs were auto-tracked on an axial CT image under a soft tissue window (c) and a lung window (d). GTV = gross tumor volume, MTV = metabolic tumor volume

Study objectives and data analysis

The primary endpoint of this study was the comparison of MTVs on PET with GTVs on CT. Reproducibility of the methodology for the primary endpoint assessment, i.e., target delineation, is essential and was tested by linear correlation. The primary objective of this study was to compare the changes in PET-MTVs and CT-GTVs between pre- and during-treatment scans and between 3DCRT and SBRT. SPSS 13.0 software was used to test statistical significance. The intraclass correlation coefficient (ICC) [17] was used to test the correlation between tumor volumes delineated by two physicians in the same system and by one physician in two systems. The correlation between PET-MTVs and CT-GTVs was tested using linear regression analysis; the during-RT volume of each individual tumor was compared with its pre-RT volume by a two-tailed paired t test. P values equal to or less than 0.05 were considered statistically significant. Unless otherwise specified, the data are presented as mean (95 % CI).
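To make the reproducibility statistic concrete, the following is a minimal Python sketch of an intraclass correlation coefficient. The text does not state which ICC form was computed in SPSS, so the one-way random-effects form used here, ICC(1,1), is an assumption, as are the function name `icc_oneway` and the example volumes, which are illustrative and not study data.

```python
from statistics import mean

def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    ratings -- list of rows, one per lesion; each row holds the volumes
               measured for that lesion by each rater (or each system).
    """
    n = len(ratings)      # number of lesions
    k = len(ratings[0])   # number of raters or systems
    grand_mean = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    # Between-lesion and within-lesion mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical volumes (cc) for three lesions contoured by two physicians.
example = [[10.0, 12.0], [20.0, 19.0], [30.0, 33.0]]
icc_example = icc_oneway(example)
```

Values close to 1 indicate that most of the variance comes from differences between lesions rather than from disagreement between raters, which is how the ICC values reported in this study would be read.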

Results

Fifty patients were enrolled in this study. There were a total of 88 lesions on CT and 86 on PET. The patient characteristics are shown in Table 1. Thirty-three patients (66 %) received concurrent chemo-radiotherapy, while the remaining patients received definitive radiation alone. Five patients (five lesions) with stage I disease were treated with SBRT. The median interval time between pre-RT and during-RT scan was 38 (range, 10–60) days, 12 (range, 10–31) days, and 41 (range, 26–60) days in all patients, SBRT patients, and 3DCRT patients, respectively.

Table 1 Patient characteristics

Reproducibility of tumor definition methodology

Reproducibility was assessed in the first ten consecutive patients. PET-MTVs were delineated by one physician using two systems (the FIAT and the UM-Plan systems) and by two physicians within the same system (FIAT) in the same ten patients for both pre- and during-RT images. Five patients had central lesions, and the remaining five had peripheral lesions. The intraclass correlation coefficient (ICC) of PET-MTVs and CT-GTVs between the two systems (one physician) was 0.98 (95 % CI, 0.96–0.99) and 0.98 (95 % CI, 0.96–0.99), respectively. The ICC between two physicians using the same system was 0.99 (95 % CI, 0.99–0.99) and 0.98 (95 % CI, 0.97–0.99) for PET-MTVs and CT-GTVs, respectively (Fig. 4). PET-MTVs varied slightly more between the two systems than between the two physicians (Fig. 4).

Fig. 4

Reproducibility of tumor delineation methodology. a MTV in the UM planning system (red line); b MTV in the FIAT image analysis system (red body) for the same patient; c MTVs drawn by two physicians in the same system (red body and black line) in a different patient with low central tumor activity; d GTVs drawn by two physicians (red and blue lines); e correlation of MTV and GTV between the two systems for the same physician; f correlation of MTV and GTV between the two physicians. GTV = gross tumor volume, MTV = metabolic tumor volume, ICC = intraclass correlation coefficient

The change of tumor volumes on PET and CT during-RT

The mean ratios of MTV/GTV were 0.70 (−0.07 to 1.47) and 0.33 (−0.30 to 0.95) for pre-RT and during-RT, respectively. Table 2 shows tumor volumes (cubic centimeters and percentage) on PET/CT images obtained pre-RT and during-RT, as well as the CT-PET volume differences between the two studies. The mean CT-GTVs were 84.1 cc (54.2–114.0 cc) and 50.1 cc (34.2–66.0 cc), while the mean PET-MTVs were 43.4 cc (28.2–58.5 cc) and 17.9 cc (10.0–25.7 cc) on pre-RT and during-RT scans, respectively. Tumor volumes decreased significantly during-RT on both PET and CT images. The mean reductions of PET-MTVs and CT-GTVs were 32.2 cc (20.8–43.7 cc) and 40.7 cc (18.8–62.7 cc), respectively (paired t test, p < 0.001). PET-MTVs had a significantly greater proportional reduction (mean, 70 %; 95 % CI, 62–77 %) than CT-GTVs (mean, 41 %; 95 % CI, 33–49 %; p < 0.001) (Fig. 5). Interestingly, the PET-MTV of 3/85 lesions enlarged during RT. Two of them were lower lung lesions treated with SBRT (from 28.4 cc to 35.7 cc in one and from 3.3 cc to 5.3 cc in the other). The third was a subcarinal node treated with 3DCRT, which grew from 2.9 cc pre-RT to 4.10 cc during-RT.
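The per-lesion percentage change used throughout these results can be sketched as follows. This is an illustrative Python snippet, assuming the reduction is expressed relative to the pre-RT volume; the function name `percent_reduction` is hypothetical, and the input volumes are the three enlarged lesions quoted above.

```python
def percent_reduction(pre_cc, during_cc):
    """Percentage reduction relative to the pre-RT volume.

    Positive values mean the volume shrank; negative values mean it grew.
    """
    return (pre_cc - during_cc) / pre_cc * 100.0

# The three lesions whose PET-MTV enlarged during RT (volumes in cc).
enlarged = [(28.4, 35.7), (3.3, 5.3), (2.9, 4.10)]
changes = [percent_reduction(pre, during) for pre, during in enlarged]
```

All three values come out negative, reflecting enlargement. Note that a mean of per-lesion percentages, as reported in the text, is not the same quantity as the percentage change between mean volumes, so the two should not be expected to agree.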

Table 2 MTV and GTV pre-RT and during-RT in all patients
Fig. 5

Changes of PET-MTV and CT-GTV on PET/CT imaging during-RT. a Mean MTV and GTV pre- and during-RT; b absolute difference of MTV and GTV between pre- and during-RT; c percentage changes of each individual PET-MTV and CT-GTV. GTV = gross tumor volume, MTV = metabolic tumor volume

Factors associated with tumor volume reduction during-RT

There was remarkable individual heterogeneity in the magnitude of changes in tumor volumes on both CT and PET (Fig. 5). Compared with primary tumors, lymph nodes appeared to have a significantly greater percentage reduction in both PET-MTVs and CT-GTVs (Fig. 6), although, overall, there was a significant correlation between changes in PET-MTVs and changes in CT-GTVs, with a Pearson’s correlation coefficient of 0.55 (p < 0.001; Fig. 6). The mean PET-MTV reduction was 61.4 % (52.2–70.5 %) and 81.4 % (70.9–91.9 %) for primary tumors and lymph nodes, respectively (p = 0.007). The mean CT-GTV reduction was 31.3 % (20.1–42.6 %) and 54.0 % (43.3–64.6 %) for primary tumors and lymph nodes, respectively (p = 0.007). Other factors were also evaluated for their association with the changes of PET-MTVs during-RT (Table 3). There was no significant correlation between the percentage change in PET-MTVs and the estimated diameter of pre-RT PET-MTVs or CT-GTVs. The estimated diameter was calculated from the volume using the sphere equation V = (4/3)πR³ (π = 22/7), where R is the radius and the diameter is 2R. The percentage change in PET-MTVs was significantly correlated with type of RT (conventional fractionation versus SBRT, r² = 0.40, p < 0.001), concurrent chemotherapy (r² = 0.24, p = 0.029), maximum FDG activity of tumor at baseline (r² = 0.24, p = 0.002), maximum normalized tumor activity (NTA = tumor activity divided by the mean aorta activity) (r² = 0.28, p = 0.009), and mean NTA (r² = 0.25, p = 0.02) (Table 3). Patients who received conventional treatment had a significantly greater reduction in PET-MTVs (mean 72.9 %, 95 % CI of mean 66.4–79.4 %) than patients who were treated with SBRT (mean 15.4 %, 95 % CI of mean −31.6–62.5 %) (p < 0.001) (Table 4). There was a significantly greater reduction in PET-MTVs in patients who had mean NTA ≤ 2.5 (mean 79.2 %, 95 % CI of mean 69.2–89.2 %) than in patients who had mean NTA > 2.5 (mean 61.5 %, 95 % CI of mean 51.8–71.2 %) (p = 0.015). Multivariate analysis showed that low maximum NTA (p = 0.026) and type of treatment (conventional RT versus SBRT) (p < 0.001) were significantly correlated with greater percentage changes in PET-MTVs.
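The conversion from a delineated volume to an estimated diameter can be sketched by inverting the sphere equation. This is a minimal Python illustration under the assumption that each lesion is treated as a sphere; the function name `estimated_diameter_cm` is hypothetical, and `math.pi` is used here rather than the 22/7 approximation mentioned in the text.

```python
import math

def estimated_diameter_cm(volume_cc):
    """Diameter (cm) of a sphere whose volume is `volume_cc`.

    Inverts V = (4/3) * pi * R**3 for the radius R, then doubles it.
    """
    radius = (3.0 * volume_cc / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 2.0 * radius

# Mean pre-RT PET-MTV reported in the Results (43.4 cc).
diameter = estimated_diameter_cm(43.4)
```

For the mean pre-RT PET-MTV this gives an equivalent-sphere diameter of a little over 4 cm. Real lesions are rarely spherical, so the value is best read as a size index rather than a measured dimension.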

Fig. 6

Factors associated with PET-MTV changes during-RT. PET-MTV reduction is significantly correlated with the reduction in CT-GTV (a). Lymph nodes have a significantly greater reduction in both PET-MTVs and CT-GTVs during-RT (b). Example scans include metabolic complete response (CR) in lymph node MTV (c and d), metabolic complete response in a primary tumor after concurrent chemo-RT (e and f), and metabolic stable response in an SBRT case (g and h). GTV = gross tumor volume, MTV = metabolic tumor volume, LN = lymph node

Table 3 Factors associated with PET-MTVs reduction during-RT
Table 4 Changes of MTV and GTV pre-RT and during-RT after SBRT and 3DCRT

Discussion

In this prospective study, we demonstrated a reproducible method of tumor target delineation on PET that combines auto-thresholding with manual editing. Using this methodology, we further demonstrated that MTVs on PET and GTVs on CT decrease significantly during the course of RT. While there was remarkable heterogeneity in the magnitude of volume reduction, there was a significant correlation between reductions of PET-MTV and CT-GTV. Less active tumors and treatment with chemotherapy were associated with a greater volume reduction during-RT, and tumors treated with 3DCRT had a greater reduction during-RT than those treated with SBRT.

PET-MTV delineation is challenging, and there is no universally accepted method. Some authors use a percentage of the maximum or peak activity, whereas others recommend an absolute SUV value (e.g., an SUV threshold of 2.5 [18] to represent the edge of the lesion). It is now known that a fixed threshold method using 40–50 % of maximum activity may lead to significant errors in volume estimation [19]. A volumetric comparison of four methods (visual, 40 % maximum activity, SUV 2.5, and source/background ratio (S/B)) in primary NSCLC showed substantially different volumes from the different techniques; application of S/B ratios generated the most reasonable volumes, comparable to breath-expanded CT volumes [1]. Van Baardwijk and colleagues [20] evaluated S/B-based PET-CT auto-delineation. They reported that it correlated well with pathology (correlation coefficient = 0.90), decreased the delineated volumes of the GTVs, and reduced the interobserver variability. Auto-contoured GTVs were smaller than manually contoured ones. In another study, the same group found that tumors auto-contoured at 42 % of maximum level overestimated the PET tumor volume in two of five cases, while CT-GTVs were larger than pathologic volume in four of five cases [21]. A pilot study comparing tumor volumes as determined by pathologic examination and FDG-PET/CT images of NSCLC showed that the optimal threshold and absolute SUV were 31 % ± 11 % and 3.0 ± 1.6, respectively [22]. Furthermore, several other studies showed that PET-defined tumor volumes varied significantly with the methodology, resulting in considerable inter-observer and intra-observer variations [23–25]. Fused PET and CT altered the volume in about 50 % of patients compared with CT volume alone, whether by visual evaluation or using a mathematical algorithm such as a fixed standard uptake value or threshold [24, 26, 27].
The relationship between PET-based (15 % or 40 % of the maximal iso-uptake value threshold methods) and CT-based volumes (visual method) generally suffers from poor correlation between the two image data sets, expressed as a large statistical variation in gross tumor volume ratios, irrespective of the threshold method used [28]. With pathologic examination, the contour of the tumor volume of NSCLC patients with co-registered FDG-PET/CT resulted in >50 % alterations compared with CT targeting [29]. Recently, Bayne et al. demonstrated that PET-MTV autocontours generated using SUV 2.5, SUV 3.5, and 40 % SUVmax differed widely in each of six cases and recommended a visual contouring protocol for contouring MTV in NSCLC [30], whereas a comparative assessment in an anthropomorphic phantom demonstrated that a background activity-based method and a model-based method were more accurate and reproducible than SUVmax-based methods [31]. There has not been a study using the same method to test the reproducibility between different software systems or between different physicians. We elected the tumor/background ratio methodology, since it could be the most reasonable automated method [1, 20]. Although some physicians believe that normal liver standardized uptake value normalized for lean body mass (SUL) is slightly more stable than determinations of blood-pool SUL [32], FDG uptake in the aorta was selected to represent background in this study because areas of uptake greater than the mediastinal blood pool have often been defined as abnormal findings [16] or residual tumor [33]. In the pilot study, we found that the mean uptake in the aorta was reproducible even if the center of the ROI in each slice was shifted slightly, to within the aortic wall. Both central (50 %) and peripheral (50 %) lesions were studied.
Our methodology of delineating tumor volumes on PET and CT generated very high ICC values, which suggests that combining the more complex TAR auto-contouring method with manual editing may be superior to methods using a simple cutoff (SUV or percent threshold).

The current study is among the first to extensively examine PET-MTVs in comparison with CT-GTVs during-RT. This study demonstrates a significant tumor volume reduction during-RT on both PET and CT. On average, PET-MTVs changed significantly more than CT-GTVs (p < 0.001). The mechanism behind this difference is unclear, although the underlying biology of each tumor could be part of the explanation. That MTVs from functional imaging (PET scans) changed more than GTVs on CT may suggest that tumor functional activity changes earlier or faster than morphologic appearance on CT. Indeed, the vast majority of tumors had a greater reduction in PET-MTV than in CT-GTV during-RT, despite the fact that PET-MTVs may have also included motion. This is important, as it further suggests the value of using during-RT PET-MTV for dose escalation as a supplement to CT-GTV-based adaptive RT, or of using dose painting on a biologic planning target volume [34, 35]. RTOG1106 has been activated to adapt radiation therapy based on during-RT PET-MTV.

Patients treated with SBRT also had reduced PET-MTV and CT-GTV during-RT. This is remarkable, as the during-RT PET scan for this group of patients could be performed as early as 3–5 days from SBRT start. It is also interesting to note that patients treated with SBRT had a significantly smaller reduction in PET-MTV than those who received conventionally fractionated RT. The mechanism is unclear. There may not have been enough time for tumor response, as the during-SBRT PET was normally performed at 1–2 weeks (median 12 days) from SBRT start, or the effect of SBRT on PET-CT may be slower in NSCLC. Respiratory motion and size may have also affected tumor quantification and delineation in PET/CT imaging [36], which may partially explain why two small lower-lobe lesions treated with SBRT increased in tumor volume. Vahdat et al. [37] studied serial FDG-PET/CT tumor response in 20 stage IA NSCLC patients and demonstrated that tumor SUVmax values return to background levels at 18–24 months following treatment. CT tumor shrinkage also continued for 2–15 months after SBRT [38]. No SBRT study has previously reported on MTV reduction during-RT. On the other hand, the modest reduction of MTV during-SBRT, contrasted with the excellent tumor control after SBRT, may suggest that a during-SBRT PET scan is not a good predictor of long-term outcome. Still, it may deserve further study whether such a volume difference can convert an otherwise unsafe plan into a safe one with respect to normal tissue tolerance. A study with a larger number of SBRT cases is needed.

It is worth mentioning that lymph nodes had greater volumetric changes on both PET and CT as compared with primary tumors, after the same dose of radiation. Initially, we thought this was a result of volume effect, as the primary tumors are larger; thus the same amount of absolute volume reduction would cause less change in percentage. However, we failed to detect a correlation between tumor volume reduction during-RT and tumor volume at baseline. Further study is needed to validate this finding and investigate the underlying mechanism of this phenomenon.

There were also remarkable individual differences in changes of tumor volume during-RT. Concurrent chemotherapy, lower maximum tumor FDG activity, lower maximum NTA, and lower mean NTA were significantly correlated with a greater percentage change in PET tumor volume. It is possible that heterogeneity in tumor response to treatment is due to biologic or genetic heterogeneity and may be further associated with prognosis. A Japanese study demonstrated that SUVs on both early and delayed scans (early scanning at 1 h and delayed scanning at 2 h) after treatment were significantly lower in pathologic responders than in non-responders (p = 0.0005 and p = 0.0015, respectively) [39]. Pottgen et al. also found a significantly greater percentage reduction in SUVmax in patients showing an excellent pathologic response in the primary tumor than in those with greater than 10 % residual viable cells (p < 0.005) after receiving neoadjuvant chemotherapy or chemo-radiotherapy [40].

There were limitations to this study. The CT images from the PET/CT studies were acquired without intravenous (IV) contrast media, which may decrease the accuracy of the CT-GTV delineations. In practice, CT simulation with IV contrast, acquired in the RT treatment position and co-registered with planning CTs [41], could improve contouring.

Conclusion

In summary, we have demonstrated a reproducible method to delineate tumor on PET/CT images. In a study of 50 patients, we demonstrated that metabolic tumor volumes on PET decreased more than GTVs on CT during-RT, suggesting that functional volumes decrease more rapidly than morphologic volumes. Using during-RT PET volumes to escalate the radiation dose or to plan dose-painting radiotherapy in patients with non-small cell lung cancer could be of value in the future. Prospective clinical trials such as RTOG1106 and UMCC 2007123 are ongoing to individualize adaptive RT dose escalation in each patient based on these methods and results.