Introduction

In non-small cell lung cancer (NSCLC), positron emission tomography (PET) has been shown to be a valuable tool not only for detecting and staging the disease, but also for response monitoring, prediction of prognosis and estimation of target volume for radiotherapy [1, 2]. To date, [18F]fluoro-2-deoxy-D-glucose (FDG) is the most widely used tracer for oncological applications, and quantitative assessment of FDG uptake is likely to become the standard, particularly for response monitoring. In general, however, uptake of this tracer is not homogeneously distributed across the tumour. Factors that may contribute to intratumoural heterogeneity in FDG uptake include necrosis [3], cellular proliferation [4], blood flow [5], microvessel density [6] and hypoxia [7–9]. The standardized uptake value (SUV) is most commonly used for (semi-)quantification of whole-body FDG PET studies [10], and a high maximum SUV (SUVmax) has been shown to relate to hypoxia and poor overall survival [11]. However, a change in SUV is only a proper measure of response to therapy if tracer uptake changes globally, i.e. if the response is not spatially heterogeneous. Intratumoural FDG heterogeneity may therefore complicate accurate response assessment with PET, and it is of interest to quantify heterogeneity in tumour FDG uptake before, during and after treatment.

Firstly, the distribution of FDG uptake within a tumour could provide useful information for radiation therapy treatment planning, as it would allow for specific targeting of certain areas within the tumour. Secondly, it may provide additional information when monitoring response, as it would enable identification of a mixed response within a single tumour. Finally, differences between CT anatomical volumes and PET metabolic volumes could be characterized. In current clinical practice, however, there is no simple method for quantification of intratumoural heterogeneity in FDG uptake [8].

Only a few methods have been proposed to quantify intratumoural heterogeneity in FDG uptake. O’Sullivan et al. [12, 13] proposed a method for patients with sarcoma that compares the intratumoural FDG uptake distribution with an elliptic solid object with homogeneous density. Recently, intensity-volume histograms (IVH) or cumulative SUV-volume histograms (CSH) have been proposed by El Naqa et al. [14] as a novel way to characterize heterogeneity in intratumoural tracer uptake. These histograms are similar to dose-volume histograms frequently used in radiotherapy [15]. In CSH the per cent volume of a tumour (derived from CT or from PET-based (semi-)automatic tumour delineation methods [16]) with an SUV above a certain threshold is plotted against that threshold value, which is varied from 0 to 100% of SUVmax. The area under the CSH (AUC-CSH) may be a quantitative index of tracer uptake heterogeneity and/or heterogeneous tumour response [17]. Any method to characterize heterogeneity, however, will treat both partial volume effects and noise as heterogeneity and therefore partial volume correction and image denoising must be applied prior to calculating AUC-CSH [18].
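To make the CSH definition concrete, the following Python sketch computes a CSH and its area from the SUVs of the voxels inside a VOI. It is a minimal illustration under our own assumptions, not the implementation of El Naqa et al.: thresholds are sampled in 1% steps of SUVmax, a voxel counts as "above" a threshold when its SUV is greater than or equal to it, and the area is computed with the trapezoidal rule on axes rescaled to 0–1, so that a perfectly homogeneous VOI gives an AUC-CSH of about 1. The function names are hypothetical.

```python
import numpy as np


def cumulative_suv_volume_histogram(suv, n_thresholds=101):
    """Return (threshold in % of SUVmax, % of VOI volume with SUV >= threshold)."""
    suv = np.asarray(suv, dtype=float).ravel()  # voxel SUVs inside the VOI
    suv_max = suv.max()
    thresholds_pct = np.linspace(0.0, 100.0, n_thresholds)
    volume_pct = np.array(
        [100.0 * np.mean(suv >= t / 100.0 * suv_max) for t in thresholds_pct]
    )
    return thresholds_pct, volume_pct


def area_under_csh(thresholds_pct, volume_pct):
    """Trapezoidal area under the CSH with both axes rescaled to 0-1."""
    x = np.asarray(thresholds_pct, dtype=float) / 100.0
    y = np.asarray(volume_pct, dtype=float) / 100.0
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


# Homogeneous VOI: every voxel equals SUVmax, so the CSH stays at 100%.
homogeneous = np.full(1000, 6.0)
# Two-compartment VOI: half of the voxels at SUV 3, half at SUV 6.
two_level = np.concatenate([np.full(500, 3.0), np.full(500, 6.0)])

print(area_under_csh(*cumulative_suv_volume_histogram(homogeneous)))  # approx. 1.0
print(area_under_csh(*cumulative_suv_volume_histogram(two_level)))    # approx. 0.75
```

For the two-compartment VOI the area is close to SUVmean/SUVmax (3/4), illustrating why lower AUC-CSH values indicate more heterogeneous uptake.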

The purpose of this study was to evaluate to what extent CSH provides additional information about tumour response (and its heterogeneity) over SUVmax alone. In addition, the impact of partial volume effects and noise was evaluated by applying partial volume correction and image denoising prior to calculating AUC-CSH. To this end, lung cancer simulations were performed and the method was applied to several FDG PET studies of tumours with variable heterogeneity.

Materials and methods

Cumulative SUV-volume histograms

Four strategies for calculating CSH were investigated. As mentioned above, a CSH is normally obtained by plotting the per cent volume of a tumour with an SUV above a certain threshold against that threshold, which is varied from 0 to 100% of SUVmax. The AUC of this plot (AUC-CSH) is a quantitative index of uptake heterogeneity, with lower values corresponding to increased heterogeneity. However, because both axes are expressed relative to the SUVmax and volume of the scan being analysed, AUC-CSH defined in this way is independent of SUVmax and volume. It may therefore be of limited use for response monitoring, where changes in SUVmax or metabolic volume should also be taken into account. To this end, three modifications of CSH were defined that take into account changes in metabolic volume (CSH^V), SUVmax (CSH^S) or both (CSH^SV). This was achieved by plotting not relative (percentage) SUVmax or metabolic volume data, but absolute values or values relative to the baseline SUVmax or volume. Normalized AUC-CSH values (relative to the baseline) were calculated as follows:

$$ \text{normalized AUC-CSH} = \frac{\text{AUC-CSH}^{\text{response}}}{\text{AUC-CSH}^{\text{baseline}}} $$
$$ \text{normalized AUC-CSH}^{V} = \frac{\text{AUC-CSH}^{\text{response}}}{\text{AUC-CSH}^{\text{baseline}}} \times \frac{\text{volume}^{\text{response}}}{\text{volume}^{\text{baseline}}} $$
$$ \text{normalized AUC-CSH}^{S} = \frac{\text{AUC-CSH}^{\text{response}}}{\text{AUC-CSH}^{\text{baseline}}} \times \frac{\text{SUV}_{\max}^{\text{response}}}{\text{SUV}_{\max}^{\text{baseline}}} $$
$$ \text{normalized AUC-CSH}^{SV} = \frac{\text{AUC-CSH}^{\text{response}}}{\text{AUC-CSH}^{\text{baseline}}} \times \frac{\text{SUV}_{\max}^{\text{response}}}{\text{SUV}_{\max}^{\text{baseline}}} \times \frac{\text{volume}^{\text{response}}}{\text{volume}^{\text{baseline}}} $$
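The four baseline normalizations defined above can be expressed as a small Python helper. This is a sketch with hypothetical names: it assumes the AUC-CSH inputs were computed with the relative (0–100%) axes, and that SUVmax and metabolic volume of the baseline and response scans are available separately.

```python
def normalized_auc_csh(auc_resp, auc_base, suvmax_resp, suvmax_base,
                       volume_resp, volume_base):
    """Return the four baseline-normalized AUC-CSH variants."""
    auc_ratio = auc_resp / auc_base            # plain normalized AUC-CSH
    suv_ratio = suvmax_resp / suvmax_base      # SUVmax response / baseline
    volume_ratio = volume_resp / volume_base   # volume response / baseline
    return {
        "AUC-CSH": auc_ratio,
        "AUC-CSH^V": auc_ratio * volume_ratio,
        "AUC-CSH^S": auc_ratio * suv_ratio,
        "AUC-CSH^SV": auc_ratio * suv_ratio * volume_ratio,
    }


# Hypothetical response: heterogeneity increases (AUC-CSH 0.80 -> 0.60),
# SUVmax halves (10 -> 5) and the metabolic volume doubles (10 -> 20 ml).
ratios = normalized_auc_csh(0.60, 0.80, 5.0, 10.0, 20.0, 10.0)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

This prints 0.750, 1.500, 0.375 and 0.750: the same change in heterogeneity is read very differently depending on whether volume, SUVmax or both are folded into the index.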

All four strategies were evaluated with and without partial volume correction and with and without image denoising. Partial volume correction was performed using image-based Van Cittert deconvolution (CIT) [19, 20]. In each of the five deconvolution iterations, the image update was penalized using a Gibbs prior (weight 0.25) with a neighbourhood of one voxel in all directions [21]. The Gibbs prior is based on the summed difference between the central voxel and its direct neighbours; including it reduces voxel variability in uniform areas of the image. Masks used to determine metabolic volumes were eroded by one voxel in all dimensions to compensate for residual sampling errors at the mask borders caused by voxelization. In addition, PET images were denoised using an edge-preserving bilateral filter (BF) [22].
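The deconvolution step can be sketched in Python (NumPy only) as follows. This is not the authors' implementation, and several details are our own assumptions: a shift-invariant Gaussian point spread function of 5 mm FWHM sampled on 2.56-mm voxels, a step size of 1, and a Gibbs-type penalty taken as the mean difference between each voxel and its face neighbours (the text specifies the summed difference; dividing by the number of neighbours is our choice to keep the weighted update stable). Edges are treated periodically in the penalty, and the bilateral filter is not included. All function names are hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(fwhm_mm, voxel_mm):
    """Normalized 1-D Gaussian kernel for a given FWHM and voxel size."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm  # in voxels
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def blur(image, kernel):
    """Separable blur along every axis (assumed shift-invariant PSF)."""
    out = np.asarray(image, dtype=float)
    for axis in range(out.ndim):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, out
        )
    return out


def neighbour_penalty(image):
    """Mean difference between each voxel and its 2*ndim face neighbours."""
    penalty = np.zeros_like(image)
    for axis in range(image.ndim):
        for shift in (1, -1):
            penalty += image - np.roll(image, shift, axis=axis)  # periodic edges
    return penalty / (2 * image.ndim)


def van_cittert(observed, kernel, n_iter=5, beta=0.25):
    """Van Cittert deconvolution with a smoothing penalty on each update."""
    observed = np.asarray(observed, dtype=float)
    estimate = observed.copy()
    for _ in range(n_iter):
        residual = observed - blur(estimate, kernel)  # data mismatch
        estimate = estimate + residual - beta * neighbour_penalty(estimate)
    return estimate


# 1-D example: a 3-voxel hot spot blurred with a 5-mm FWHM PSF.
truth = np.zeros(64)
truth[30:33] = 10.0
kernel = gaussian_kernel_1d(fwhm_mm=5.0, voxel_mm=2.56)
observed = blur(truth, kernel)
restored = van_cittert(observed, kernel)
print(observed.max(), restored.max())
```

The same functions accept 3-D arrays; the penalty pulls each update towards the local mean, which is what limits noise amplification in uniform regions.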

Other parameters to characterize tracer uptake

In addition to AUC-CSH, the following parameters of tumour FDG uptake were investigated within the metabolic (i.e. PET-based) or anatomical (i.e. CT-based) volume of interest (VOI): SUVmax, average SUV (SUVmean), inverse standard deviation of SUV (1/SD) and inverse coefficient of variation (1/COV, where COV = SD/SUVmean × 100%).
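These uptake parameters can be computed from the voxel SUVs of a VOI as sketched below. Two details are assumptions on our part: the sample standard deviation (ddof=1) is used, and COV is expressed in per cent before inversion, so that a VOI with COV ≈ 15% gives 1/COV ≈ 0.07. The function name is hypothetical.

```python
import numpy as np


def uptake_parameters(suv, ddof=1):
    """SUVmax, SUVmean, 1/SD and 1/COV for the voxel SUVs inside one VOI."""
    suv = np.asarray(suv, dtype=float).ravel()
    suv_mean = suv.mean()
    sd = suv.std(ddof=ddof)           # sample SD by default (an assumption)
    cov_pct = 100.0 * sd / suv_mean   # coefficient of variation in %
    return {
        "SUVmax": suv.max(),
        "SUVmean": suv_mean,
        "1/SD": 1.0 / sd,
        "1/COV": 1.0 / cov_pct,
    }


print(uptake_parameters([1.0, 2.0, 3.0]))
# SUVmax 3.0, SUVmean 2.0, 1/SD 1.0, 1/COV 0.02
```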

Simulations

The purpose of the simulations was to evaluate to what extent CSH provides additional information about tumour response (and its heterogeneity) over SUVmax alone, and to evaluate the impact of partial volume effects and noise by applying partial volume correction and image denoising prior to calculating AUC-CSH. To this end, two simulation studies were performed: simulation 1 started from a tumour with homogeneous FDG uptake and simulation 2 from a tumour with heterogeneous FDG uptake. For both studies, various tumour responses with homogeneous and heterogeneous FDG uptake were simulated, as shown in Tables 1 and 2 and Fig. 1. All simulation software was implemented in-house using IDL (ITT Visual Information Solutions, Boulder, CO, USA). Randoms and scatter were not simulated, as this would require more complex (Monte Carlo-based) simulations; the effects of attenuation and resolution were, however, taken into account.

Table 1 Description of the various simulated tumour responses for simulation 1
Table 2 Description of the various simulated tumour responses for simulation 2
Fig. 1

Axial images of simulated mathematical baseline scans and various response scans, showing a tumour placed in the left lung. BL baseline, NR no response, HO homogeneous tracer uptake, HE heterogeneous tracer uptake. Note that these images represent ideal (i.e. noise- and partial volume-free) scans; noise and partial volume effects were added during the simulations. For simulation 1, HO and HE represent homogeneous and heterogeneous responses, respectively. For simulation 2, however, all HO and HE show heterogeneous responses, except HEB, which shows a homogeneous response

A 3-D mathematical thorax image and its corresponding μ-image, used for attenuation correction, were derived from a dynamic FDG scan of a typical patient, as described in detail by Boellaard et al. [10]. In short, the procedure was as follows. For the simulated baseline (BL) scans of simulation 1 (BL1), a tumour with homogeneous FDG uptake was placed within the lungs (SUVmean ~0.6) of the mathematical thorax image, using a tumour-to-background ratio of 10. In addition, various tumours with homogeneous and heterogeneous FDG uptake were placed within the lungs of the mathematical thorax image, as shown in Table 1 and Fig. 1. For the simulated baseline scans of simulation 2 (BL2), a tumour with heterogeneous FDG uptake was placed within the lungs of the mathematical thorax image, using a ratio of 10 between the tumour outer rim and background. In addition, various tumours with homogeneous and heterogeneous FDG uptake were placed within the lungs of the mathematical thorax image, as shown in Table 2 and Fig. 1. For both simulations, these tumours were also placed in the lungs of the μ-image, with a μ-value equal to that of soft tissue at 511 keV (0.095 cm−1). All of these images were then forward-projected. Partial volume effects were introduced by smoothing the resulting sinograms with a 5-mm full-width at half-maximum (FWHM) Gaussian kernel, after which Poisson noise was added to the emission sinograms. These sinograms were then reconstructed using normalization and attenuation weighted ordered subsets expectation maximization (NAW-OSEM) with 4 iterations and 16 subsets, and post-smoothed with a 5-mm FWHM Gaussian kernel. The resulting noise level (COV of voxel values within a VOI ~15%) was similar to that observed in PET studies acquired on a PET/CT scanner (Gemini TF 64, Philips Healthcare, Cleveland, OH, USA) [23]. All reconstructed images consisted of 30 planes of 256×256 voxels with a voxel size of 2.56×2.56×2.56 mm³.
For each tumour type, 100 noisy simulation images were generated to investigate accuracy and precision of the various parameters as described above.
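The full simulation pipeline (forward projection, sinogram smoothing, Poisson noise and NAW-OSEM reconstruction) is beyond a short example. The Python sketch below only illustrates how accuracy (bias) and precision (COV across realizations) of a parameter can be summarized over 100 noisy realizations. It uses a deliberately simplified, hypothetical noise model: independent Poisson noise per voxel in the image domain, scaled to give a voxel noise of about 15%. All names and numbers are illustrative.

```python
import numpy as np


def accuracy_and_precision(estimates, true_value):
    """Bias (%) and coefficient of variation (%) of repeated estimates."""
    est = np.asarray(estimates, dtype=float)
    bias_pct = 100.0 * (est.mean() - true_value) / true_value
    cov_pct = 100.0 * est.std(ddof=1) / est.mean()
    return bias_pct, cov_pct


# Simplified, hypothetical noise model: independent Poisson noise per voxel
# in the image domain (NOT the sinogram + NAW-OSEM pipeline described above).
rng = np.random.default_rng(seed=0)
true_suv = 6.0
n_realizations, n_voxels = 100, 500
mean_counts = 1.0 / 0.15 ** 2                    # Poisson: voxel COV = 15%
counts = rng.poisson(mean_counts, size=(n_realizations, n_voxels))
images = counts / mean_counts * true_suv         # rescale counts to SUV
suvmean_per_realization = images.mean(axis=1)    # SUVmean of each noisy VOI
bias, precision = accuracy_and_precision(suvmean_per_realization, true_suv)
print(f"bias {bias:.2f}%, COV across realizations {precision:.2f}%")
```

Because SUVmean averages 500 voxels, its COV across realizations is far below the 15% voxel noise; a parameter such as SUVmax would show a larger positive bias under the same noise.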

Human studies

Four different human FDG scans were included (Fig. 2). Two subjects were scanned on a whole-body PET/CT scanner (Gemini TF 64, Philips Healthcare, Cleveland, OH, USA): one (male, 78 kg, 59 years, 188 MBq FDG injected) had a primary lung tumour with heterogeneous uptake, whilst the other (female, 70 kg, 75 years, 184 MBq FDG injected) had a primary lung tumour with relatively homogeneous uptake. In addition, one subject with a primary lung tumour (female, 60 years; baseline scan: 57 kg, 163 MBq FDG injected; response scan: 54 kg, 163 MBq FDG injected) was scanned twice (before and after one course of chemotherapy) on a whole-body PET scanner (ECAT EXACT HR+, CTI/Siemens, Knoxville, TN, USA). Furthermore, one subject with advanced liver metastases of a gastrointestinal malignancy (male, 65 years; baseline scan: 71 kg, 566 MBq FDG injected; response scan: 73 kg, 551 MBq FDG injected) was scanned twice (before and after three courses of chemotherapy) on a whole-body PET/CT scanner (Biograph, CTI/Siemens, Knoxville, TN, USA). All PET and CT data were collected as part of ongoing clinical studies approved by an authorized medical ethics review committee, and informed consent was obtained from each patient prior to inclusion in the study.

Fig. 2

Coronal images of three clinical examples. Top row: two PET/CT studies showing various degrees of uptake heterogeneity in large lung lesions. Middle row: PET response study of a subject with lung cancer. Bottom row: PET/CT images of a patient with metastatic liver lesions before and after treatment

Data analysis

For all simulations and human scans, the mean ± SD was calculated for all parameters under investigation. For the simulations, VOIs were either taken from the baseline scans (VOIBL) or redrawn on the response scans (VOIR). For the human scans, VOIs were drawn on the baseline PET(/CT) scans using a 50% threshold isocontour method and then manually transferred to the response scans to obtain a repositioned VOIBL. In addition, VOIs were drawn on the response scans using the same 50% threshold isocontour method (VOIR). All uptake (heterogeneity) parameters were calculated for all VOIs.
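A 50% threshold isocontour VOI can be sketched as region growing from the hottest voxel. The details here are our own assumptions and may differ from the software used in the study: the VOI is the face-connected region of voxels with values of at least 50% of the maximum that contains the hottest voxel, with no background correction and no search box. The function name is hypothetical.

```python
from collections import deque

import numpy as np


def isocontour_voi(image, fraction=0.5):
    """Connected region around the hottest voxel with value >= fraction * max."""
    img = np.asarray(image, dtype=float)
    seed = tuple(int(i) for i in np.unravel_index(np.argmax(img), img.shape))
    threshold = fraction * img[seed]
    mask = np.zeros(img.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:  # breadth-first region growing over face neighbours
        voxel = queue.popleft()
        for axis in range(img.ndim):
            for step in (-1, 1):
                neighbour = list(voxel)
                neighbour[axis] += step
                if not 0 <= neighbour[axis] < img.shape[axis]:
                    continue  # outside the image
                neighbour = tuple(neighbour)
                if not mask[neighbour] and img[neighbour] >= threshold:
                    mask[neighbour] = True
                    queue.append(neighbour)
    return mask


profile = np.array([0.0, 1.0, 6.0, 10.0, 7.0, 2.0, 8.0, 0.0])
print(isocontour_voi(profile).astype(int))  # [0 0 1 1 1 0 0 0]
```

The voxel with value 8 exceeds the 50% level but is excluded because it is not connected to the hottest voxel, which is the practical difference between an isocontour and a plain global threshold.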

Results

Simulations

Noise reduction and partial volume correction

AUC-CSH data obtained from original (i.e. noise- and partial volume-free) images, simulated PET images (i.e. with noise and partial volume effects) and simulated PET images with additional bilateral filtering and Van Cittert deconvolution (PET + BF + CIT) are shown in Table 3. A selection of responses is shown in Fig. 3. Table 3 illustrates that PET + BF + CIT significantly improved the accuracy of AUC-CSH compared with PET, at least when AUC-CSH was normalized to the baseline AUC-CSH (Student's t test, p < 0.05). Nevertheless, the actual improvement was rather modest and was not significant for non-normalized data (p = 0.10). Figure 4 shows that CSH curves improve visually after partial volume correction and noise reduction. More importantly, after image denoising and partial volume correction, AUC-CSH varied with the degree of heterogeneity, i.e. a lower AUC-CSH corresponded both with a visually more heterogeneous tracer distribution and with greater variability of voxel values within a VOI. Therefore, in the remainder of this article, only CSH data for PET + BF + CIT images are provided.

Fig. 3

AUC-CSH (a) and ratio (b) for various types of responses derived from simulated original (noise and partial volume free), PET and PET + BF + CIT scans for simulation 1. The ratio was obtained by dividing AUC-CSH of the response scan by that of the baseline scan

Table 3 AUC-CSH values of different tumour responses for simulated original (i.e. noise and partial volume free), PET and PET + BF + CITa images
Fig. 4

CSHs for baseline and typical homogeneous and heterogeneous responses derived from simulated original (a), PET (b) and PET + BF + CIT (c) scans for simulation 1. Note that HO1 overlaps with BL1 in (a)

Simulating various response types

Table 4 shows ratios of AUC-CSH (obtained from PET + BF + CIT images) and several other parameters (obtained from PET) for various response types. A selection of responses is shown in Figs. 5 and 6 for simulations 1 and 2, respectively. These figures also show the effects of various SUV and volume normalizations on AUC-CSH. Both Table 4 and Figs. 5 and 6 show that normalized values of AUC-CSH and 1/COV corresponded well with the level of response heterogeneity, which was not seen using SUVmax, SUVmean or 1/SD alone.

Table 4 Ratios of AUC-CSH (derived from PET + BF + CITa), SUVmax, SUVmean, 1/SD and 1/COV (derived from PET) for different tumour responses. All values are normalized to the baseline scan
Fig. 5

Ratio of various types of AUC-CSH (a) and SUVmax, SUVmean, 1/SD and 1/COV (b) for various types of responses for simulation 1. AUC-CSH and other uptake parameters were derived from simulated PET + BF + CIT and PET response images, respectively. The ratio for each parameter was obtained by dividing its value from the response scan by that from the baseline scan

Fig. 6

Ratio of various types of AUC-CSH (a) and SUVmax, SUVmean, 1/SD and 1/COV (b) for various types of responses for simulation 2. AUC-CSH and other uptake parameters were derived from simulated PET + BF + CIT and PET response images, respectively. The ratio for each parameter was obtained by dividing its value from the response scan by that from the baseline scan

Simulation 1 (Fig. 5) shows that for homogeneous responses, normalized (ratio to the baseline) values of AUC-CSH and 1/COV ranged from 0.93 to 1.02 and from 0.72 to 1.12, respectively, while non-normalized values ranged from 0.82 to 0.96 and from 0.08 to 0.13, respectively. For heterogeneous responses, normalized values of AUC-CSH and 1/COV ranged from 0.64 to 0.94 and from 0.34 to 0.63, respectively, whilst non-normalized values ranged from 0.60 to 0.89 and from 0.04 to 0.07, respectively. AUC-CSH^S increased or decreased as SUVmax increased or decreased. Similarly, AUC-CSH^V increased or decreased as volume increased or decreased. AUC-CSH^SV reflected the combined effect of changes in volume and SUVmax. When obtained from PET + BF + CIT images, 1/COV did not correspond with tumour heterogeneity, ranging from 0.45 to 1.11 and from 0.26 to 0.62 for homogeneous and heterogeneous responses, respectively.

Simulation 2 (Fig. 6) also shows that the normalized AUC-CSH and 1/COV of a homogeneous response (HEB: 1.01 and 1.00, respectively) fall within the same range as for simulation 1. Responses showing an increase in homogeneity showed an increase in normalized AUC-CSH and 1/COV, ranging from 1.47 to 1.60 and from 1.86 to 3.32, respectively. This trend was not observed for non-normalized AUC-CSH and 1/COV (increase in homogeneity: 0.88–0.96 and 0.07–0.13, respectively; homogeneous response: 0.60 and 0.04, respectively).

Human studies

Figure 7 shows CSHs for the four clinical scans shown in Fig. 2, and values of the various parameters are given in Table 5. For the diagnostic lung studies, AUC-CSH correctly indicated a heterogeneous distribution for the tumour with more heterogeneous uptake (0.47) and a more homogeneous distribution for the homogeneous one (0.79). For the response studies, AUC-CSH indicated an increase in heterogeneity (AUC-CSH: 0.74 to 0.63) for the lung tumour (Fig. 2, middle row), whilst a decrease in heterogeneity (AUC-CSH: 0.18 to 0.40) was seen for the liver metastases (Fig. 2, bottom row). All increases and decreases in AUC-CSH agreed with visual interpretation. Similar trends were observed for 1/COV.

Fig. 7

CSHs for diagnostic lung study (a), response lung study (b) and response liver study (c). VOIs of response scans were either defined on the baseline scan (VOIBL) or on the response scan (VOIR). The AUC-CSH and/or its change corresponded well with various degrees of tracer uptake heterogeneity and/or its change

Table 5 AUC-CSH, SUVmax, SUVmean, 1/SD and 1/COV data for the four human scans shown in Fig. 2, as obtained from PET + BF + CITa images

Discussion

Both partial volume effects and image noise lead to apparent tumour heterogeneity, and correction for these confounding effects is needed to calculate measures of tumour heterogeneity accurately. Partial volume effects obscure the origin of the counts in a voxel, because each voxel value is an average over tissue whose composition is not always known. For example, low FDG uptake in a voxel could be the result of well-oxygenated tumour tissue (with low and uniform FDG uptake) or of averaging over a region containing both hypoxic (high FDG uptake) and necrotic (low FDG uptake) tissue [8]. Although partial volume effects are unavoidable given the limited resolution of PET, they should be corrected for as far as possible. Recently, Hoetjes et al. [20] showed that image-based Van Cittert deconvolution is useful for partial volume correction in oncology. The results of the present study showed that Van Cittert deconvolution with a Gibbs prior, combined with bilateral filtering to limit spatial noise, improved CSH curves visually (Fig. 4) and improved the accuracy of AUC-CSH (Fig. 3 and Table 3).

The simulations showed that both AUC-CSH and 1/COV corresponded well with the level of tumour heterogeneity or heterogeneous response, in contrast to SUVmax, SUVmean or 1/SD (Fig. 5 and Table 4). This means that a change in AUC-CSH corresponded with a visually apparent change in tracer uptake distribution as well as with the variability of voxel values within a tumour. Although the latter can also be expressed by the COV (%) of voxel values, AUC-CSH may have the additional advantage of allowing changes in tracer uptake distribution to be explored relative to the baseline distribution, i.e. by normalizing the x- and y-axes to the baseline SUVmax and volume, respectively. In practice, this means that 100% corresponds to the SUVmax (x-axis) and metabolic volume (y-axis) observed in the baseline study. With such normalization, tumour growth results in a CSH extending beyond 100% on the y-axis, and an increase in SUVmax results in a CSH extending beyond 100% on the x-axis. Plotting the CSH curves of baseline and response studies together can therefore support a quick interpretation of both metabolic volume and tracer uptake changes. Note that when both axes are normalized to the baseline, the change in AUC-CSH equals that in total lesion glycolysis (TLG, calculated as the product of SUVmean and volume). However, TLG can remain constant when tumour size has increased with a corresponding decrease in SUV. Normalized CSH would then also show a constant AUC-CSH (Fig. 8), but the CSH curves would have different shapes. Without renormalization (i.e. when the CSH is generated using the tumour volume and SUVmax of the image being analysed), an index of tracer heterogeneity, or of the variability of tracer uptake across the tumour, at that time point is obtained. These different normalizations may be useful for response monitoring, as they provide additional information on the response.
The type of normalization of AUC-CSH that would be clinically most relevant needs to be further assessed in future studies.
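The TLG equivalence follows from the fact that the area under an un-normalized CSH (absolute volume plotted against absolute SUV threshold s) is

$$ \int_0^{\mathrm{SUV_{max}}} V(\mathrm{SUV} \geq s)\,ds = \sum_i \mathrm{SUV}_i\,v = \mathrm{SUV_{mean}} \times V = \mathrm{TLG} $$

where v is the voxel volume and V the VOI volume. Scaling both axes by the baseline SUVmax and volume divides every AUC by the same constant, so the response-to-baseline ratio of the AUCs equals the TLG ratio (and the normalized AUC-CSH^SV defined earlier). The Python sketch below checks this numerically on synthetic, hypothetical voxel data; the fine threshold grid and the voxel volume are our own choices.

```python
import numpy as np


def auc_baseline_scaled(suv, voxel_ml, suvmax_base, volume_base_ml, n_thresholds=20001):
    """Area under a CSH whose axes are scaled to baseline SUVmax and volume."""
    suv_sorted = np.sort(np.asarray(suv, dtype=float).ravel())
    thresholds = np.linspace(0.0, suv_sorted[-1], n_thresholds)
    # Number of voxels with SUV >= threshold, via binary search on sorted SUVs.
    n_above = suv_sorted.size - np.searchsorted(suv_sorted, thresholds, side="left")
    x = thresholds / suvmax_base                # SUV relative to baseline SUVmax
    y = n_above * voxel_ml / volume_base_ml     # volume relative to baseline volume
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


rng = np.random.default_rng(seed=1)
voxel_ml = 2.56 ** 3 / 1000.0                   # 2.56-mm isotropic voxels
baseline = rng.uniform(2.0, 10.0, size=2000)    # hypothetical baseline SUVs
response = rng.uniform(1.0, 6.0, size=1500)     # hypothetical response SUVs

vol_base = baseline.size * voxel_ml
auc_base = auc_baseline_scaled(baseline, voxel_ml, baseline.max(), vol_base)
auc_resp = auc_baseline_scaled(response, voxel_ml, baseline.max(), vol_base)
tlg_base = baseline.mean() * vol_base
tlg_resp = response.mean() * response.size * voxel_ml
print(auc_resp / auc_base, tlg_resp / tlg_base)  # the two ratios agree closely
```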

Fig. 8

CSHs for theoretical baseline and response study. Both show the same AUC, but different curves

CSH could be used for various applications. The focus of the present study was measurement of heterogeneity in FDG uptake in NSCLC tumours. Two clinical examples (Figs. 2 and 7 and Table 5) indicate that AUC-CSH can quantify the degree of heterogeneity in FDG uptake, both at baseline (diagnostic studies) and as a result of therapy (response studies). For response studies, VOIs could be defined on either the baseline or the response scan. Defining the VOI on the baseline scan and applying it to the response scan might identify an increase in necrotic tumour tissue after treatment. Defining the VOI on the response scan would indicate whether the overall heterogeneity in FDG uptake of the tumour had changed in the presence of a volume change. Which definition is clinically most relevant needs to be assessed in future studies. Another application might be assessment of response across multiple metastatic lesions, such as in the liver, which often contains many metastatic lesions that are difficult to monitor individually. In these cases AUC-CSH could be used to characterize a global change in tumour load, in combination with e.g. TLG [24]. Finally, CSH and AUC-CSH may be used to characterize differences between CT-based anatomical and PET-based metabolic tumour volumes and/or tracer distributions: a high AUC-CSH would indicate good correspondence between the CT tumour volume and the tracer distribution, whereas a low AUC-CSH would indicate a large discrepancy between them.

To the best of our knowledge, this is the first study that attempts to characterize and quantify heterogeneity of tracer uptake using AUC-CSH with different normalizations. The performance of the proposed method was assessed using simulations, and a few clinical cases were used to illustrate the method and the different normalizations. Future studies should address the potential clinical value of CSH, including test–retest studies and application of the method to larger clinical data sets. Furthermore, a database or benchmark data set of more realistically simulated anthropomorphic PET images, based on e.g. Monte Carlo simulations [25, 26], would be ideal. This would allow not only more extensive exploration of uptake heterogeneity measures but also validation of new tumour segmentation algorithms. The proposed method for parameterizing uptake heterogeneity is spatially invariant, i.e. it does not capture the spatial arrangement of uptake within the tumour, unlike the method proposed by O'Sullivan et al. [12]. For patients with sarcoma, that method, which assesses spatial intratumoural heterogeneity in FDG uptake, has been shown to be a predictor of patient outcome [27]. However, preliminary studies have already indicated that other measures derived from the CSH, such as the % of volume at a fixed % of SUV or the % of SUV at a fixed % of volume, could be used as prognostic factors in NSCLC [28] and head and neck cancer [14]. In addition, further studies are needed to assess the value of the different methods of normalizing CSH to baseline values (SUV and/or volume). Nevertheless, these preliminary results suggest that CSH and AUC-CSH have potential for characterizing metabolic tumour heterogeneity.
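The two CSH-derived measures mentioned above can be read directly from the voxel SUVs. The sketch below uses our own, hypothetical definitions by analogy with dose-volume histogram metrics; the definitions used in the cited prognostic studies may differ (for example in how ties or interpolation between voxels are handled).

```python
import numpy as np


def volume_at_suv_pct(suv, suv_pct):
    """% of VOI volume with SUV >= suv_pct % of SUVmax (one point on the CSH)."""
    suv = np.asarray(suv, dtype=float).ravel()
    return 100.0 * np.mean(suv >= suv_pct / 100.0 * suv.max())


def suv_pct_at_volume(suv, volume_pct):
    """Lowest SUV (in % of SUVmax) within the hottest volume_pct % of the VOI."""
    suv = np.sort(np.asarray(suv, dtype=float).ravel())[::-1]  # hottest first
    # Number of hottest voxels making up volume_pct % (guard against rounding).
    k = max(1, int(np.ceil(volume_pct * suv.size / 100.0 - 1e-9)))
    return 100.0 * suv[k - 1] / suv[0]


voxels = np.arange(1.0, 11.0)                 # hypothetical SUVs 1, 2, ..., 10
print(volume_at_suv_pct(voxels, 50.0))        # 60.0 (voxels with SUV 5 to 10)
print(suv_pct_at_volume(voxels, 30.0))        # 80.0 (hottest voxels: 10, 9, 8)
```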

Conclusion

These initial results indicate that AUC-CSH might be used as a quantitative index of heterogeneity in tracer uptake and as a means to assess (changes in) heterogeneity in response assessment studies. In addition, the results show that Van Cittert deconvolution with a Gibbs prior, combined with bilateral filtering to limit spatial noise, visually improves the CSH curves.