Introduction

Positron emission tomography (PET) using 18F-2-fluoro-2-deoxy-D-glucose (FDG) is important for detection, staging, prognosis and assessment of treatment response in oncology [1–10]. Response monitoring using FDG PET has been implemented in a variety of tumours [9, 11]. Although criteria have been defined for evaluating PET studies using visual inspection [12, 13], quantification of FDG uptake is helpful for objective assessment of response to therapy.

The most common method for quantifying tumour FDG uptake is the use of standardized uptake values (SUV). Limitations of SUV include its dependency on patient preparation, scanning procedure, image reconstruction and image analysis procedures [9, 14]. These limitations, however, can to a large extent be overcome by proper standardized procedures [9, 14, 15].

Apart from SUV, various other more quantitative measures can be used for analysis of FDG PET studies. Young et al. [16] first provided guidelines for the measurement of tumour response to therapy using FDG PET imaging. Quantitative approaches range from visual assessment, through semi-quantitative methods (e.g. SUV), to full kinetic analysis (Patlak). However, the European Organization for Research and Treatment of Cancer (EORTC) report did not address the accuracy or limitations of the various analytical methods for monitoring response. Hoekstra et al. [11, 17] compared various methods for quantifying FDG uptake in non-small cell lung cancer (NSCLC) such as nonlinear regression (NLR) of the data using a pharmacokinetic model, Patlak analysis and various simplified methods, e.g. the simplified kinetic method (SKM) [18], SUV normalized by body surface area and corrected for glucose (SUVBSAg) and SadatoBSA [19]. Most measures correlated well with NLR (R² > 0.95). Lammertsma et al. [20] discussed the relationship between these simplified methods and full kinetic analysis for monitoring response to classic cytotoxic drugs. Both groups of authors emphasized that care should be taken in using simplified methods (i.e. SUV-based methods) for evaluation of new drugs.

Potential discrepancies between SUV and full kinetic analysis results may be caused by changes in plasma glucose levels or differences in FDG plasma clearance among scans. Factors potentially affecting the relation between SUV and full kinetic analysis have been described by Freedman et al. [21, 22]. Moreover, lower SUV in patients with high plasma glucose levels has been reported [23].

Full kinetic approaches are considered most quantitative as they use the entire (measured) arterial input and tumour time-activity curves (TAC) in combination with a tracer kinetic model and plasma glucose levels to derive a more quantitative measure of the metabolic rate of glucose.

In order to correctly interpret SUV results, it has been recommended [15, 16] that a formal comparison with full kinetic approaches be included during early drug development trials. Although good correlation of the percentage change between SUV and Patlak over all tumour pairs has been reported [11, 21], there were differences when monitoring individual subject data [21, 24, 25]. These authors concluded that the two measures were not equivalent and that they may not equally be affected by changes occurring during treatment. Only after SUV responses have been validated against those seen using full quantitative analysis could SUV be used in a larger clinical setting [14, 20, 26]. The aim of the present study was to compare various semi-quantitative measures with full kinetic analysis for assessing response to treatment. Although similar comparisons have been performed previously, these mainly involved classic cytotoxic drugs. In contrast, in the present study, simplified methods were evaluated for use with novel drugs.

Materials and methods

Subjects and study design

Data from two separate response monitoring studies, involving different drugs, were used, which will be referred to as studies A and B. These studies were selected based on targeted treatment. Written informed consent was obtained for all subjects and studies were approved by the Medical Ethics Review Committee of the VU University Medical Center.

Study A

Seven subjects suffering from stage IIIB and IV NSCLC, enrolled in a response monitoring trial, were included in this study. Patients were treated with an orally active inhibitor of the epidermal growth factor receptor (EGFR) tyrosine kinase (erlotinib). Erlotinib (Tarceva, drug A) inhibits EGF-dependent proliferation of cells at submicromolar concentrations and blocks cell cycle progression in the G1 phase, and is taken orally on a daily basis. All patients (four women and three men; mean body weight at baseline 74 ± 16 kg, range 53–109 kg; mean body weight at response scan 79 ± 16 kg, range 60–108 kg) underwent a dynamic FDG PET scan at baseline and after 3 weeks of therapy.

Study B

Six patients (one woman and five men) with advanced gastrointestinal malignancies were treated with BMS-582664 (brivanib alaninate) in combination with full-dose cetuximab (Erbitux), a monoclonal antibody targeting EGFR. BMS-582664 (drug B) is a selective dual inhibitor of fibroblast growth factor and vascular endothelial growth factor signalling, and is taken orally on a daily schedule. All patients (mean body weight 84 ± 17 kg, range 60–110 kg) underwent a dynamic baseline FDG PET scan within 2 weeks before start of treatment. Two dynamic response FDG PET scans were performed. Five patients (mean body weight 84 ± 19 kg, range 63–110 kg) underwent an early response study during cycle 1 (day 15) of treatment and three patients (mean body weight 88 ± 13 kg, range 73–108 kg) underwent a late response study after cycle 2 (day 56) of treatment. Thus, two subjects underwent both an early and a late response study.

PET scanning protocol

All patients fasted for at least 6 h before scanning. Patients were prepared in accordance with recently published guidelines for quantitative FDG PET studies [14, 27]. Patients were scanned in the supine position. Patients received an intravenous catheter for administration of FDG and for collection of additional blood samples to measure blood glucose (i.e. at 35, 45, 55 min post-injection during dynamic scanning) using validated and calibrated methods. Blood glucose levels were used for quantitative analysis, but because of the strict study design, patients could not be rescheduled in cases of high glucose levels. In addition, whole blood and plasma FDG concentrations were measured using a cross-calibrated well counter.

All studies were performed using an ECAT EXACT HR+ scanner (Siemens/CTI, Knoxville, TN, USA). The characteristics of this scanner have been described previously [28]. Each PET study started with a 10-min transmission scan. After completion of the transmission scan, a bolus of about 370 MBq FDG was injected intravenously and a dynamic emission scan was started simultaneously in a two-dimensional acquisition mode. This dynamic emission scan consisted of 40 frames with the following lengths: 1 × 30, 6 × 5, 6 × 10, 3 × 20, 5 × 30, 5 × 60, 8 × 150, 6 × 300 s.
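The frame schedule above can be checked with a short sketch. The Python snippet below is illustrative only (the names `PROTOCOL` and `frame_times` are ours and are not part of the study software): it expands the protocol into frame start and end times, which gives 40 frames and a total duration of 3,630 s (60.5 min).

```python
# Frame protocol as (number of frames, frame duration in seconds), taken from the text.
PROTOCOL = [(1, 30), (6, 5), (6, 10), (3, 20), (5, 30), (5, 60), (8, 150), (6, 300)]


def frame_times(protocol):
    """Return a list of (start, end) times in seconds for every frame."""
    frames = []
    t = 0
    for count, duration in protocol:
        for _ in range(count):
            frames.append((t, t + duration))
            t += duration
    return frames


frames = frame_times(PROTOCOL)
n_frames = len(frames)          # number of frames in the dynamic scan
total_seconds = frames[-1][1]   # end time of the last frame

# Mid-frame times in minutes, a common choice for the time axis of a TAC.
mid_times_min = [(start + end) / 2 / 60 for start, end in frames]
```

Mid-frame times are used here as a convenient time axis; the text does not state which time point was assigned to each frame.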

All emission scans were reconstructed using filtered backprojection (FBP) with a 0.5 Hanning filter. During reconstruction all corrections needed for quantification, such as dead time, decay, scatter, random, attenuation and normalization corrections, were applied. In addition, the last three frames of the sinogram were summed and reconstructed using normalization and attenuation-weighted ordered subsets expectation maximization (OSEM) with 2 iterations and 16 subsets, followed by post-smoothing using a 0.5 Hanning filter [29]. Consequently, OSEM reconstructed images had the same dimensions and spatial resolution as FBP reconstructed images. OSEM reconstructions were used for volume of interest (VOI) definition because of their superior image quality compared with FBP reconstructions.

VOI definition

To obtain image-derived input functions (IDIF), 3-D VOI were drawn manually in three vascular structures (i.e. left ventricle, aortic arch and ascending aorta) using an early frame highlighting the blood pool [30]. These VOI were then projected onto all frames yielding arterial whole blood TAC, i.e. image-derived input functions. A quality check of each IDIF was performed, as described previously [11, 17]. This involved comparison of the three IDIFs with the activity concentration measured in the three venous blood samples. The maximum discrepancy allowed was 7%, which is two times the repeatability of the blood sample well counter measurements. Average input curves from the three VOI defined in the three vascular structures were used as input function during kinetic analysis. Tumour VOI were defined on the OSEM reconstructed images using a semi-automatic procedure [14, 27] based on a source to background-adapted 50% isocontour. These VOI were then projected onto all frames of the FBP reconstructed dynamic study to provide tumour TAC. Whenever possible, TAC were defined for up to five lesions per patient. The same lesions were selected for all longitudinal scans of a patient.
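The IDIF quality check can be sketched as follows. The exact comparison rule is not given in the text, so this Python example assumes that the IDIF is linearly interpolated to the blood sampling times and that every sample must agree within the 7% limit; the function names and the numerical values are hypothetical.

```python
import numpy as np


def idif_agrees_with_samples(t_min, idif, sample_t_min, sample_conc, max_rel_diff=0.07):
    """Return True if the IDIF matches all blood samples within max_rel_diff.

    Assumption of this sketch: the IDIF is linearly interpolated to the
    sampling times and each sample is tested individually.
    """
    sample_conc = np.asarray(sample_conc, dtype=float)
    idif_at_samples = np.interp(sample_t_min, t_min, idif)
    rel_diff = np.abs(idif_at_samples - sample_conc) / sample_conc
    return bool(np.all(rel_diff <= max_rel_diff))


def average_input(idifs):
    """Average several IDIFs (one per vascular structure) frame by frame."""
    return np.mean(np.asarray(idifs, dtype=float), axis=0)


# Hypothetical late-phase IDIF (kBq/ml) and blood samples at 35, 45 and 55 min.
t_min = [30.0, 40.0, 50.0, 60.0]
idif = [10.0, 8.0, 6.0, 4.0]
sample_t_min = [35.0, 45.0, 55.0]

check_ok = idif_agrees_with_samples(t_min, idif, sample_t_min, [9.2, 7.1, 5.1])
check_bad = idif_agrees_with_samples(t_min, idif, sample_t_min, [10.0, 7.1, 5.1])
```

In the second call the first sample differs from the interpolated IDIF by 10%, so the check fails.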

Quantitative analysis and kinetic modelling

Four quantitative methods were investigated. These methods have been described in detail before [11] and a brief summary is described below:

  1. NLR is the most quantitative method and involves fitting tumour TAC to the standard two-tissue irreversible compartment model (three rate constants plus blood volume fraction parameter) using an IDIF as arterial input function. The primary outcome parameters of this method are the overall influx rate constant (Ki) in min−1 and the metabolic rate of glucose (MRGlu) in mmol⋅l−1⋅min−1, which equals Ki times blood glucose. The lumped constant was assumed to be equal to 1 in all studies.

  2. Patlak graphical analysis is based on a linearization of the standard FDG model mentioned above, assuming that the free concentration of FDG in tissue reaches a steady state and that binding/trapping is irreversible. As the method is rapid and robust in the presence of noise, it has been recommended by the EORTC [16]. The primary outcome parameters of this method again are Ki and MRGlu. In the Patlak method, linear regression is applied to a certain (user-defined) time interval of the collected data. Although, in the present study, various time intervals were investigated, only results for the interval 10–60 min post-injection will be presented, as these showed the best correlation (R² = 0.97) with results obtained using NLR.

  3. SKM: in contrast to NLR and Patlak, this method [18] only requires a static FDG scan and, in principle, a single venous blood sample obtained at 55 min post-injection. SKM can (partly) compensate for e.g. differences in FDG clearance from blood between various subjects or scans. SKM uses a tri-exponential function to describe the input function. The parameters of this function are partly based on population average values, but the third exponential function, describing the tail, is scaled using the blood concentration measured in a few blood samples. The primary outcome parameter of SKM is MRGlu.

  4. SUV: various SUV alternatives were investigated, which were based on different normalization procedures, e.g. body weight (SUVBW), body surface area (SUVBSA) and lean body mass (SUVLBM). In addition, all SUV parameters were calculated with and without a correction for blood glucose.
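The Patlak method listed above can be illustrated with a minimal sketch. It is not the software used in the study: it assumes the standard Patlak form, in which the tissue-to-blood ratio is regressed against the time-integrated input divided by the current input, and the function names (`patlak_ki`, `mrglu`) and the synthetic data are ours. The default interval of 10–60 min and the lumped constant of 1 follow the text.

```python
import numpy as np


def patlak_ki(t_min, cp, ct, t_start=10.0, t_end=60.0):
    """Estimate Ki (slope) and the intercept by Patlak graphical analysis.

    t_min : sample times in minutes
    cp    : input function (e.g. IDIF) at those times
    ct    : tumour TAC at those times
    """
    t = np.asarray(t_min, dtype=float)
    cp = np.asarray(cp, dtype=float)
    ct = np.asarray(ct, dtype=float)
    # Cumulative trapezoidal integral of the input function from the first sample.
    cum_cp = np.concatenate(([0.0], np.cumsum(0.5 * (cp[1:] + cp[:-1]) * np.diff(t))))
    mask = (t >= t_start) & (t <= t_end)
    x = cum_cp[mask] / cp[mask]   # "normalized time"
    y = ct[mask] / cp[mask]       # tissue-to-blood ratio
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def mrglu(ki, glucose_mmol_per_l, lumped_constant=1.0):
    """Metabolic rate of glucose: Ki times blood glucose, divided by the lumped constant."""
    return ki * glucose_mmol_per_l / lumped_constant


# Synthetic check: constant input and a tissue curve built with Ki = 0.05 per min.
t = np.arange(0.0, 61.0, 1.0)
cp = np.ones_like(t)
ct = 0.05 * t + 0.3
ki_est, intercept_est = patlak_ki(t, cp, ct)
mrglu_est = mrglu(ki_est, 5.0)
```

The synthetic curves are chosen so that the regression is exactly linear; real TACs are noisy and only approach linearity once the free FDG pool has equilibrated.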

SKM and SUV are based on static PET images, i.e. they use the FDG concentration at a fixed time after FDG administration. In the present study dynamic scans were performed and thus the average tumour activity concentration observed at 50–60 min post-injection was used for calculations.
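The SUV variants can be sketched as below. The text does not give the formulas used, so this example assumes commonly used forms: the Du Bois formula for body surface area, the James formula for lean body mass, and a glucose correction that scales SUV by the measured glucose relative to a reference value of 5.0 mmol⋅l−1. All function names and example values are hypothetical.

```python
def bsa_du_bois(weight_kg, height_cm):
    """Body surface area in m^2 (Du Bois formula; an assumption of this sketch)."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def lbm_james(weight_kg, height_cm, is_male):
    """Lean body mass in kg (James formula; an assumption of this sketch)."""
    ratio_sq = (weight_kg / height_cm) ** 2
    if is_male:
        return 1.10 * weight_kg - 128.0 * ratio_sq
    return 1.07 * weight_kg - 148.0 * ratio_sq


def suv(conc_kbq_per_ml, dose_mbq, normalizer):
    """SUV = tissue concentration / (injected dose / normalizer).

    The concentration is assumed to be decay-corrected to the injection time.
    The normalizer is body weight (kg), lean body mass (kg) or BSA (m^2).
    """
    return conc_kbq_per_ml * normalizer / dose_mbq


def glucose_corrected(suv_value, glucose_mmol_per_l, reference_mmol_per_l=5.0):
    """Scale SUV by blood glucose relative to a reference (assumed 5.0 mmol/l)."""
    return suv_value * glucose_mmol_per_l / reference_mmol_per_l


# Hypothetical patient: 74 kg, 175 cm, male, 370 MBq injected, 20 kBq/ml in the tumour.
suv_bw = suv(20.0, 370.0, 74.0)
suv_lbm = suv(20.0, 370.0, lbm_james(74.0, 175.0, True))
suv_bsa = suv(20.0, 370.0, bsa_du_bois(74.0, 175.0))
suv_bw_glucose = glucose_corrected(suv_bw, 10.0)
```

With these inputs a glucose level of twice the reference doubles the corrected SUV, which illustrates why the correction matters for subjects with elevated plasma glucose.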

Comparison of methods

Patlak analysis has been recommended by the EORTC [16] as the method of choice for a full quantitative analysis of dynamic FDG PET studies. To assess whether this guideline is still valid for targeted therapies, Patlak-derived Ki and MRGlu values were compared with those obtained using NLR fitting of TAC using a two-tissue compartment model with blood volume fraction correction.

Quantitative measures of FDG uptake in response scans were related to those obtained during baseline scans and fractional or percentage changes calculated. Percentage changes in tracer uptake due to therapy seen with various simplified methods and derived on a lesion per lesion basis were correlated against those obtained using MRGlu-Patlak.
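The lesion-by-lesion comparison described above can be sketched as follows. This is an illustration under stated assumptions: the percentage change is taken as 100 × (response − baseline) / baseline, the regression is ordinary least squares with an intercept, and R² is the squared Pearson correlation. The text does not specify the regression model, and the numbers are invented for the example.

```python
import numpy as np


def percent_change(baseline, response):
    """Percentage change of each lesion relative to its baseline value."""
    baseline = np.asarray(baseline, dtype=float)
    response = np.asarray(response, dtype=float)
    return 100.0 * (response - baseline) / baseline


def slope_and_r2(x, y):
    """Least-squares slope of y on x and squared Pearson correlation."""
    slope, _intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, r ** 2


# Hypothetical values for three lesions (not study data).
change_patlak = percent_change([0.10, 0.20, 0.40], [0.05, 0.20, 0.60])  # MRGlu-Patlak
change_suv = 0.5 * change_patlak                                        # a method that halves every change
slope, r2 = slope_and_r2(change_patlak, change_suv)
```

A slope below 1 with a high R², as in this toy case, corresponds to a simplified method that tracks the Patlak response but systematically reports smaller fractional changes.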

Comparison of input functions

As SUV assumes a constant relationship between injected radioactivity and body distribution, possible discrepancies in observed fractional changes (relative to full kinetic analysis) may be due to therapy-induced changes in shape and/or amplitude of the input function. Therefore, normalized areas under the curve (AUC) of observed input functions were calculated. Normalization was performed by dividing the observed AUC of the input function by the net amount of administered dose. Normalized AUC (nAUC) obtained during response scans were compared with those at baseline.
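The nAUC calculation can be sketched as below. The integration rule is not stated in the text, so trapezoidal integration over the sampled input function is assumed; the function name and the example curve are hypothetical.

```python
import numpy as np


def normalized_auc(t_min, input_curve, net_dose_mbq):
    """Area under the input function divided by the net administered dose.

    Trapezoidal integration is an assumption of this sketch.
    """
    t = np.asarray(t_min, dtype=float)
    c = np.asarray(input_curve, dtype=float)
    auc = np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(t))
    return auc / net_dose_mbq


# Hypothetical three-point input curve and a 5 MBq dose, chosen for easy arithmetic.
nauc_example = normalized_auc([0.0, 1.0, 2.0], [0.0, 10.0, 10.0], 5.0)
```

Comparing this quantity between baseline and response scans indicates whether the same injected dose delivered a different amount of tracer to the blood pool over time.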

Results

For study A, 18 lesions were defined over the 7 subjects. The mean (±SD) plasma glucose level was 6.5 ± 3.3 (range 4.0–14.1) mmol⋅l−1 for baseline studies and 7.1 ± 4.1 (range 4.5–16.0) mmol⋅l−1 for response studies (p = NS, Student’s t test). Although the subjects were well prepared, high plasma glucose levels (>8 mmol⋅l−1) were seen for two subjects at baseline and/or after treatment. All other subjects showed plasma glucose levels lower than 6 mmol⋅l−1.

Similarly, for study B, nine lesions from six subjects were analysed. However, for five subjects early response studies were available; of these five only two subjects successfully concluded the late response studies. For one subject only a late response study was successfully collected. The mean plasma glucose level was 5.6 ± 0.9 (range 3.8–7.1) mmol⋅l−1 for baseline and 5.5 ± 1.36 (range 4.2–8.4) mmol⋅l−1 for response (p = NS, Student’s t test). All plasma glucose levels were within normal range [6, 11].

Validation of Patlak against NLR

A comparison of Patlak- versus NLR-derived MRGlu is given in Fig. 1 for both studies. Correlations were excellent (all R² > 0.96) for both studies, both at baseline and following therapy (Table 1). Moreover, slopes were close to 1 and did not significantly change between baseline and response studies. Although good correlations were found, there were small differences in MRGlu values between those derived using NLR versus Patlak. These differences, however, correlated with fitted blood volume fractions, as shown in Fig. 2.

Fig. 1

Correlation of Patlak-derived metabolic rate of glucose (MRGlu) versus nonlinear regression (NLR) for study A (a) and study B (b). Baseline data are indicated using circles and response studies using squares. Dashed line represents line of identity

Table 1 Slope, standard error and correlation coefficient (R²) of MRGlu-Patlak versus MRGlu-NLR

Fig. 2

Relative difference between MRGlu-Patlak and MRGlu-NLR for both baseline and response scans against vascular volume fractions for study A (a) and study B (b). Dashed line represents line of identity

Correlation of methods

Results of the comparison of simplified methods against MRGlu-Patlak are summarized in Tables 2 (study A) and 3 (study B).

Table 2 Slope, standard error and correlation coefficient (R²) of various semi-quantitative methods versus MRGlu-Patlak for study A
Table 3 Slope, standard error and correlation coefficient (R²) of various semi-quantitative methods versus MRGlu-Patlak for study B

In the case of study A, fair correlation (approximately R² > 0.6) was seen at baseline for all simplified methods. In response studies, however, correlations worsened for those methods that do not correct for plasma glucose levels, i.e. SUVBW, SUVBSA and SUVLBM, and were significantly better for methods that correct for plasma glucose levels. When removing subjects with high blood glucose values (>11 mmol⋅l−1), correlation between SUV and MRGlu-Patlak improved considerably for most SUV measures. In the case of study B, effects of plasma glucose corrections on correlations were not seen. In this particular study, however, (all) simplified methods (independent of plasma glucose correction) provided poorer correlations for baseline studies than for response studies.

Response assessment: comparison of methods

Study A

The mean fractional change in MRGlu-Patlak following therapy was 26 ± 35% (range −24 to 87%), i.e. for most subjects and lesions an increase in MRGlu was observed, possibly indicating progression of disease. Clinical interpretation of these results is beyond the scope of the present study. In Fig. 3 fractional changes obtained using simplified methods are plotted against those seen using MRGlu-Patlak. Percentage changes obtained using various SUV measures without glucose correction may, at times, differ substantially from those seen using MRGlu-Patlak. It should be noted, however, that the largest deviations were observed for two subjects with the highest plasma glucose values (>8 mmol⋅l−1). Figure 3a also illustrates that results for SUVBW, SUVBSA and SUVLBM were very similar. Fractional changes obtained using SUV with plasma glucose correction (Fig. 3b) and SKM (Fig. 3c) generally showed good agreement with those obtained using MRGlu-Patlak.

Fig. 3

Relative percentage changes in SUV and SKM due to therapy compared with corresponding changes in MRGlu-Patlak on a lesion per lesion basis for SUVBW (triangles), SUVLBM (circles) and SUVBSA (squares) (a) uncorrected for blood glucose and (b) corrected for blood glucose, and (c) for SKM in study A. Grey symbols represent data of subjects with high blood glucose in the response scan only (<11 mmol⋅l−1). Black symbols represent subject data with high blood glucose values in both scans (>11 mmol⋅l−1). The open symbols represent data from subjects having a normal blood glucose level. Dashed line represents line of identity

Study B

The mean fractional change in MRGlu-Patlak following therapy was −34 ± 25% (range −64 to 34%). In Fig. 4 fractional changes obtained using simplified methods are plotted against those obtained using MRGlu-Patlak. Figure 4a illustrates results obtained with various SUV measures without glucose correction, whilst results after glucose correction are presented in Fig. 4b. Figure 4c shows data obtained using SKM. Note that in these figures we illustrated results for the primary lesion only to avoid unbalanced representation of subject data. From these figures it can be deduced that all simplified methods provide smaller fractional changes than MRGlu-Patlak. Interestingly, in this case, plasma glucose correction did not reduce the discrepancy between SUV and MRGlu-Patlak. Similar results were obtained for SUV measures with different normalization factors (BW, BSA, LBM). Fractional changes found with SKM, however, seemed to agree slightly better with those found with MRGlu-Patlak (Fig. 4c). Tables 4 and 5 summarize all correlation coefficients and slopes for studies A and B, respectively.

Fig. 4

Relative percentage changes in SUV and SKM due to therapy compared with corresponding changes in MRGlu-Patlak on a lesion per lesion basis for SUVBW (triangles), SUVLBM (circles) and SUVBSA (squares) (a) uncorrected for blood glucose and (b) corrected for blood glucose, and (c) for SKM in study B. Dashed line represents line of identity

Table 4 Slope, standard error and correlation coefficient (R²) of percentage change in various semi-quantitative methods versus those obtained with MRGlu-Patlak for study A
Table 5 Slope, standard error and correlation coefficient (R²) of percentage change in various semi-quantitative methods versus those obtained with MRGlu-Patlak for study B

nAUC of input functions

Figure 5 illustrates the correlation of nAUC of input functions between response and baseline studies. For study A (Fig. 5a), nAUC of the input function were similar for baseline and response studies. In contrast, for study B (Fig. 5b) differences were seen for some cases.

Fig. 5

a Area under the curve (AUC) of image-derived input functions (IDIF), normalized for injected dose, of response versus baseline studies for study A. b Same for study B; early response data are indicated using squares and late response studies using triangles. Dashed line represents line of identity

Discussion

Validation of Patlak analysis for response assessment

For both studies very good correlations (R² > 0.96) between MRGlu-NLR and MRGlu-Patlak were found. This finding corresponds with previously published data [11] and supports the EORTC recommendation of using Patlak analysis for quantification of MRGlu [16]. A possible limitation of Patlak analysis, however, is that a correction for blood volume is not included. This may explain the small differences in MRGlu between both methods, especially for the lung cancer study (study A). The measured fractional difference in MRGlu between both methods correlates with the blood volume fraction or spillover fraction in those tumours located near large blood vessels (Fig. 2). In theory, Patlak could provide a slightly incorrect tumour response when there are large changes in blood volume fraction following therapy. For study A regression slopes between MRGlu-NLR and MRGlu-Patlak were 0.87 ± 0.02 for both baseline and response studies. For study B a small difference in these regression slopes was seen, changing from 1.01 ± 0.02 at baseline to 0.93 ± 0.02 following therapy. Consequently, in this case responses measured using MRGlu-Patlak would be biased by approximately 8%. This, however, should be balanced against the main advantages of Patlak analysis: its speed and its insensitivity to noise. The latter is relevant because not only high accuracy but also good test-retest variability [31] is needed to detect metabolic responses [11, 17]. Therefore, despite the small bias in measured response, Patlak analysis was also used for assessing the simplified methods in study B.
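The approximate 8% bias quoted for study B follows from the ratio of the two regression slopes. The short sketch below reproduces this arithmetic under the simplifying assumption that the regression intercepts are negligible, so that Patlak values are proportional to NLR values within each scan; the helper `apparent_change` and its argument are hypothetical.

```python
slope_baseline = 1.01   # Patlak-versus-NLR slope at baseline (study B, from the text)
slope_response = 0.93   # same slope following therapy (study B, from the text)

# Relative bias in the response/baseline ratio when measured with Patlak.
relative_bias = slope_response / slope_baseline - 1.0   # about -0.08


def apparent_change(true_change_percent):
    """Percentage change seen with Patlak for a given NLR-based change.

    Assumes Patlak = slope x NLR within each scan (negligible intercept).
    """
    true_ratio = 1.0 + true_change_percent / 100.0
    return 100.0 * (true_ratio * slope_response / slope_baseline - 1.0)


# With no true change, the apparent change equals the bias itself.
apparent_no_change = apparent_change(0.0)
```

Under this assumption an unchanged lesion would appear to decrease by about 8% with Patlak, which is the size of the bias discussed above.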

Study A: correlation and validation of methods

For study A good correlations were found between several semi-quantitative (simplified) measures and MRGlu-Patlak. These results are in agreement with previous reports [11, 21, 24]. However, post-treatment studies showed decreased correlation compared with baseline scans. A similar observation was made using SUV for response monitoring. A few subjects showed an SUV response that was substantially different from the MRGlu response. This was primarily due to high values of and/or large changes in blood glucose levels, as correlations were higher for SKM and SUV measures with blood glucose correction. It should be noted, however, that these effects were caused by two of the seven subjects. Nevertheless, these findings demonstrate the need to correct for blood glucose levels if it cannot be guaranteed that they will be within the normal range.

One strategy to avoid inaccurate quantification using SUV would be to reschedule the study when high blood glucose levels are observed. Current guidelines [9, 14, 15] recommend postponement of an FDG PET study if the blood glucose level is elevated (>11 mmol⋅l−1). However, this exclusion criterion implies that subjects with a blood glucose level of 10 mmol⋅l−1, i.e. a value that is almost twofold higher than the average, would still be included in a trial. Such a high value would likely result in lower SUV [32].

Another option, supported by the present results, is to include a glucose correction within the SUV calculation. However, use of blood glucose corrections is a matter of debate. Data have been published that either favour or discourage use of blood glucose-corrected SUV. Recently, Boellaard et al. [14, 27] and Velasquez et al. [31] reported results from a multi-centre trial, showing that test-retest variability of SUV worsened due to blood glucose correction. Interestingly, a subset of the same data, collected at a single centre, showed improved test-retest variability following blood glucose correction. A possible explanation for this discrepancy is that some of the centres used a bedside method for measuring blood glucose, i.e. a method that provided results accurate to within 5% in 46.9% of the cases and to within 10% in 78.3% of the cases [33]. Clearly, a more accurate (laboratory) method should be considered for response monitoring studies.

Study B: correlation and validation of methods

In contrast to study A, correlations between Patlak and all SUV methods, including those that incorporate a blood glucose correction, were poor at baseline for study B. During therapy, correlations between simplified methods (SUV or SKM) and Patlak improved to values previously found in other studies. A relatively consistent difference between fractional SUV and MRGlu-Patlak responses was seen. All simplified methods showed a substantially smaller fractional change than Patlak analysis. This was observed for both early and late response studies and for all lesions (data not shown). SKM performed better than SUV, which was likely due to therapy-induced changes in FDG plasma clearance. In theory, there are two factors that affect SUV calculations: (1) the FDG from plasma that is available for the tumour over time (input function) and (2) the concentration of free FDG in intravascular and tissue spaces. For example, tumours with low FDG uptake, having a relatively high fraction of free FDG, may show larger discrepancies between SUV and Patlak analysis [21, 24, 34]. Results shown in Fig. 5b indicate that for some patient studies there were substantial differences between nAUC of the input functions during baseline and response scans. However, the number of patients included is small and, therefore, any explanation for the observed differences between SUV and MRGlu responses requires further validation. Nevertheless, nearly all subjects showed differences between SUV and MRGlu responses.

Limitations

Fully quantitative dynamic PET studies for response monitoring are complex, resulting in discomfort for the subjects. Consequently, only a small number of patients are usually recruited (and/or approval is given for a small number of subjects). Additionally, in the present study a limited number of successfully completed studies were available. Despite the small number of subjects, however, both studies illustrate potential difficulties associated with using SUV for response monitoring purposes. In the case of study B all observed fractional changes in SUV were different from (smaller than) those seen with Patlak analysis. Therefore, the results of this study confirm that SUV responses may differ substantially from those obtained using full kinetic analysis [21, 24, 25].

It should be noted that most simplified methods can be applied in a routine clinical setting (i.e. using whole-body scanning), while full kinetic analysis requires a dynamic scan. An alternative could be to first perform a 60-min dynamic scan covering the primary tumour only, followed by a whole-body acquisition to study presence and tracer uptake in distant metastases. However, in practice this strategy would seriously affect patient comfort and throughput. Moreover, apart from the limited coverage of the patient, a dynamic scan is also more sensitive to patient motion, which could affect the accuracy and precision of full kinetic analysis.

Finally, image-derived input functions were used in the present study. The quality of these input functions might be affected by partial volume effects, i.e. spillover of signal to and from surrounding tissues. For this reason, three blood samples were drawn to perform a quality check of each image-derived input function and to ensure that it was not affected by partial volume effects and/or patient motion. It is therefore expected that use of image-derived input functions did not affect the main conclusions of this paper. Yet, it is acknowledged that image-derived input functions should be used with care and that at least a quality check on the appropriateness of the input function should be performed, e.g. as described in [11, 17].

Conclusion

In this study use of simplified methods for quantification of treatment response with FDG PET was compared with full kinetic analysis. Fractional changes seen with SUV differ from those seen using full quantification based on dynamic scans and Patlak analysis due to e.g. elevated plasma glucose values or changes in FDG plasma clearance. SUV may thus provide different response measures than those observed with metabolic rate of glucose. For clinical trials this could imply that use of SUV for assessment of drug efficacy may under- or overestimate treatment effects compared with full kinetic analysis. It is therefore recommended that SUV responses be characterized against full kinetic analysis before implementation in larger clinical trials.