Abstract
We conducted a comprehensive systematic review of the literature on volumetric parameters and a meta-analysis of the prognostic value of metabolic tumor volume (MTV) and total lesion glycolysis (TLG) in patients with head and neck cancer (HNC). Methods: A systematic search of MEDLINE and EMBASE was performed using the key words PET, head and neck, and volume. Inclusion criteria were 18F-FDG PET used as an initial imaging tool; studies limited to HNC; patients who had not undergone surgery, chemotherapy, or radiotherapy before PET scans; and studies reporting survival data. Event-free survival and overall survival were considered markers of outcome. The impact of MTV or TLG on survival was measured by the effect size hazard ratio (HR). Data from each study were analyzed using Review Manager. Results: Thirteen studies comprising 1,180 patients were included in this study. The combined HR for adverse events was 3.06 (2.33–4.01, P < 0.00001) with MTV and 3.10 (2.27–4.24, P < 0.00001) with TLG, meaning that tumors with high volumetric parameters were associated with progression or recurrence. Regarding overall survival, the pooled HR was 3.51 (2.62–4.72, P < 0.00001) with MTV and 3.14 (2.24–4.40, P < 0.00001) with TLG. There was no evidence of significant statistical heterogeneity at an I2 of 0%. Conclusion: MTV and TLG are prognostic predictors of outcome in patients with HNC. Despite clinically heterogeneous HNC and the various methods adopted between studies, we can confirm that patients with a high MTV or TLG have a higher risk of adverse events or death.
Head and neck cancer (HNC) includes malignancies of the oral cavity, oropharynx, hypopharynx, larynx, sinonasal tract, and nasopharynx (1). HNCs are histologically identical but clinically heterogeneous entities that show disparities in natural course or clinical behavior based on primary location (2). The American Joint Committee on Cancer staging is generally used to estimate the prognosis and guide therapy. However, the prognostic value of American Joint Committee on Cancer staging is limited in individual patients in the pretreatment stage, because staging is based on tumor morphology and does not reflect individual biologic and molecular markers (1).
PET using 18F-FDG has become a standard modality for staging, restaging, and monitoring the treatment response in a variety of tumors (3). In addition, it is more accurate than conventional staging in HNC, overcoming the limitations of morphologic imaging modalities (1). Standardized uptake value (SUV) is a semiquantitative measure of the normalized concentration of radioactivity in a lesion, and maximum SUV (SUVmax) is one of the most widely used parameters in clinical practice (1). However, SUVmax shows the highest intensity of 18F-FDG uptake within the region of interest or volume of interest (VOI) and cannot represent total tumor uptake for the entire tumor mass (3).
Recently, there has been an increasing interest in the use of volumetric parameters of metabolism such as metabolic tumor volume (MTV) and total lesion glycolysis (TLG). MTV and mean SUV can be measured by contouring margins defined by thresholds. Then, TLG can be calculated by multiplying MTV by mean SUV, which weights the volumetric burden and metabolic activity of tumors (3–5). Commercially available tools for tumor analysis enable rapid and easier measurement of MTV or TLG (3). These parameters could be used to reflect disease burden and tumor aggressiveness in some kinds of malignant tumors (6). However, there have been conflicting results regarding the prognostic value of volumetric parameters in HNC (7,8). Thus, we conducted a comprehensive systematic review of the literature on volumetric parameters and designed a meta-analysis to assess the prognostic value of MTV and TLG in patients with HNC.
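As an illustration of the definitions above, the following minimal Python sketch computes MTV, mean SUV, and TLG from a list of voxel SUVs using a fixed SUV threshold. The function name, the toy voxel values, and the voxel volume are hypothetical and are not taken from any included study; treating only voxels strictly above the threshold as tumor is also an assumption of this sketch.

```python
def mtv_and_tlg(voxel_suvs, voxel_volume_cm3, suv_threshold=2.5):
    """Return (MTV in cm3, mean SUV, TLG) for voxels above a fixed SUV threshold."""
    # Keep only voxels whose SUV exceeds the threshold (the segmented tumor).
    inside = [s for s in voxel_suvs if s > suv_threshold]
    if not inside:
        return 0.0, 0.0, 0.0
    mtv = len(inside) * voxel_volume_cm3   # metabolic tumor volume
    suv_mean = sum(inside) / len(inside)   # mean SUV within the segmented volume
    tlg = mtv * suv_mean                   # total lesion glycolysis = MTV x mean SUV
    return mtv, suv_mean, tlg


# Hypothetical toy VOI: six voxels of 0.5 cm3 each.
mtv, suv_mean, tlg = mtv_and_tlg([1.0, 2.0, 3.0, 4.0, 5.0, 8.0], 0.5)
print(mtv, suv_mean, tlg)  # 2.0 5.0 10.0
```

In this toy case, 4 voxels exceed an SUV of 2.5, giving an MTV of 2.0 cm3, a mean SUV of 5.0, and a TLG of 10.0.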
MATERIALS AND METHODS
Data Search and Study Selection
We performed a systematic search of MEDLINE (inception to July 2013) and EMBASE (inception to July 2013) for English publications using the key words PET, head and neck, and volume. All searches were limited to human studies. Inclusion criteria were 18F-FDG PET used as an initial imaging tool; studies limited to HNC; patients who had not undergone surgery, chemotherapy, or radiotherapy before PET scans; and studies that reported survival data. Reviews, abstracts, and editorial materials were excluded. Two authors conducted the searches and screening independently. Any discrepancies were resolved by a consensus.
Data Extraction and Quality Assessment
Data were extracted from the publications independently by 2 reviewers, and the following information was recorded: first author, year of publication, country, PET machine, study design, number of patients, types of diseases, staging, treatment, and endpoints. Three reviewers scored each publication according to a quality scale, which was based on that used in previous studies (9,10). This quality scale was grouped into 4 categories: scientific design, generalizability, analysis of results, and PET reports. A value between 0 and 2 was attributed to each item. Each category had a maximum score of 10 points. The scores were expressed as a percentage of the maximum 40 points.
Statistical Analysis
The primary outcome was event-free survival (EFS). Disease-free survival, locoregional control, and progression-free survival were obtained as primary outcomes and newly defined as EFS, which was measured from the date of initiation of therapy to the date of recurrence or metastasis (11). The secondary endpoint was overall survival (OS), defined as the time from initiation of therapy until death by any cause. The impact of MTV or TLG on survival was measured by the effect size of hazard ratio (HR). Survival data were extracted using the following methodology suggested by Parmar et al. (12). We extracted a univariate HR estimate and 95% confidence intervals (CIs) directly from each study if provided by the authors. Otherwise, P values of the log-rank test, 95% CI, number of events, and number at risk were extracted to estimate the HR indirectly. Survival rates on the graphical representation of the Kaplan–Meier curves were read by Engauge Digitizer (version 3.0; http://digitizer.sourceforge.net) to reconstruct the HR estimate and its variance, assuming that patients were censored at a constant rate during the follow-up. An HR greater than 1 implied worse survival for patients with a high MTV or TLG, whereas an HR less than 1 implied a survival benefit for patients with a high MTV or TLG. Heterogeneity between studies was assessed by χ2 test and I2 statistics, as described by Higgins et al. (13). Funnel plots were used to assess publication bias graphically (14). We also extracted survival data of SUVmax from the same studies included in this meta-analysis as mentioned above. P values of less than 0.05 were considered statistically significant. Data from each study were analyzed using Review Manager (RevMan, version 5.2; The Nordic Cochrane Centre, The Cochrane Collaboration).
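When a study reports an HR with its 95% CI, the log HR and its standard error needed for pooling can be recovered directly. The sketch below shows this single step only, assuming a symmetric CI on the log scale and a normal quantile of 1.96; the function name and the example values are hypothetical, and the indirect estimates from log-rank P values or Kaplan–Meier curves are not reproduced here.

```python
import math


def log_hr_and_se(hr, ci_lower, ci_upper, z=1.96):
    """Recover ln(HR) and its standard error from a reported HR and 95% CI."""
    log_hr = math.log(hr)
    # The CI spans 2 * z standard errors on the log scale.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return log_hr, se


# Hypothetical example: HR 2.0 with a 95% CI of 1.0 to 4.0.
log_hr, se = log_hr_and_se(2.0, 1.0, 4.0)
print(round(log_hr, 4), round(se, 4))
```

A positive log HR corresponds to an HR greater than 1, that is, worse survival in the high-volume group.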
RESULTS
Study Characteristics
The electronic search identified 365 articles. After the exclusion of non-English articles (n = 24), conference abstracts (n = 131), and 180 studies that did not meet the inclusion criteria based on title and abstract, the full text of 30 articles was reviewed, and 13 studies including 1,180 patients were eligible for this study. The detailed procedure is presented in Figure 1. Three of 13 studies were of a prospective design. The studies included malignancies of the oral cavity, nasopharynx, oropharynx, hypopharynx, larynx, or salivary gland. Either MTV (2,15–17) or TLG (18) was measured in 5 studies, and both were measured in 8 studies (8,19–25). The VOI was defined as the tumor (2,8,17–23) or tumor plus metastatic lymph nodes (LNs) (15,16,24,25). Three threshold methods were adopted to segment VOIs. A fixed SUV of 2.5 (2,8,15–19,22) or 3.0 (23) was used in 9 studies. The gradient segmentation method was applied in 1 study (20), and a percentage of SUVmax (30%, 42%, or 50%) was used in 3 studies (21,24,25). In each study, patients were divided into 2 groups (high and low volume) based on cutoff values. A minimum P value was used in 4 studies (15,16,19,22), receiver-operating characteristics (ROCs) in 4 studies (2,7,23,24), and median value in 5 studies (16,18,20,21,23). High volumetric parameters were significant variables in predicting a worse prognosis except in 1 study (20). The cutoff values of MTV ranged between 7.7 and 45 cm3 and those of TLG ranged from 55 to 330. The mean quality score was 79.4%, ranging from 70% to 85%. Visual inspection of the funnel plot suggested no evidence of publication bias. Study characteristics are summarized in Table 1.
Primary Outcome: EFS
The EFS was analyzed using 8 studies with MTV. We performed subgroup analyses according to the definition of VOI. The HR for adverse events was 3.03 (95% CI, 2.22–4.13; P < 0.00001) for an MTV defined by the tumor and 3.15 (95% CI, 1.80–5.51, P < 0.0001) for an MTV defined by the tumor and LN. The combined HR was 3.06 (95% CI, 2.33–4.01, P < 0.00001). The test for heterogeneity gave no significant results (χ2 = 3.40, P = 0.85; I2 = 0%). Five studies with TLG were included in the second analysis of EFS. When a fixed-effect model was used, the pooled HR was 3.10 (95% CI, 2.27–4.24, P < 0.00001; I2 = 0%), meaning that tumors with a high TLG are associated with progression and recurrence. Forest plots of MTV and TLG are shown in Figures 2 and 3, respectively.
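The fixed-effect pooling can be illustrated with a generic inverse-variance calculation. The sketch below is not the Review Manager implementation; it pools the 2 subgroup HRs for MTV reported above (tumor only, and tumor plus LN) under the assumption that each 95% CI is symmetric on the log scale. Because inverse-variance fixed-effect pooling of subgroup summaries is equivalent to pooling the underlying studies, the result should agree with the combined HR up to rounding. The function name is hypothetical.

```python
import math


def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (HR, CI lower, CI upper) tuples."""
    weights, log_hrs = [], []
    for hr, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * z)  # SE of ln(HR)
        weights.append(1.0 / se ** 2)                        # inverse-variance weight
        log_hrs.append(math.log(hr))
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_hrs)) / total
    se_pooled = math.sqrt(1.0 / total)
    return (math.exp(pooled),
            math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled))


# Subgroup HRs for EFS with MTV, as reported in the text.
subgroups = [(3.03, 2.22, 4.13), (3.15, 1.80, 5.51)]
hr, lower, upper = pool_fixed_effect(subgroups)
print(f"{hr:.2f} ({lower:.2f}-{upper:.2f})")  # 3.06 (2.33-4.01)
```

The output matches the combined HR of 3.06 (95% CI, 2.33–4.01) reported for MTV.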
Additional subgroup analyses were performed according to tumor delineation, cutoff values, and study design (Table 2). Among studies including MTV, those with a fixed SUV of 2.5 had an HR of 3.17 (95% CI, 2.30–4.36, P < 0.00001), and those with other thresholds had an HR of 2.78 (95% CI, 1.66–4.66, P = 0.0001). Studies with cutoff values using ROC had an HR of 4.30 (95% CI, 2.46–7.54, P < 0.00001), and those adopting cutoff values using other methods had an HR of 2.75 (95% CI, 2.02–3.75, P < 0.00001). Among studies including TLG, those with a fixed SUV of 2.5 had an HR of 3.45 (95% CI, 2.33–5.12, P < 0.00001), and those with other thresholds had an HR of 2.59 (95% CI, 1.55–4.31, P = 0.0003).
Secondary Outcome: OS
The survival analysis was based on 8 studies including MTV. Subgroup analysis was assessed according to the VOI of MTV. The HR for an MTV defined by the tumor was 3.19 (95% CI, 2.28–4.48; P < 0.00001) and that defined by the tumor and LN was 4.71 (95% CI, 2.60–8.54, P < 0.00001). The combined HR was 3.51 (95% CI, 2.62–4.72, P < 0.00001) (Fig. 4). The test for heterogeneity gave no significant results (χ2 = 5.71, P = 0.57; I2 = 0%). Six studies with TLG were included in the analysis of OS. The pooled HR of death was 3.14 (95% CI, 2.24–4.40, P < 0.00001) (Fig. 5). There was no evidence of significant statistical heterogeneity, with an I2 of 0% (χ2 = 3.65, P = 0.60).
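The I2 statistic follows from the heterogeneity χ2 (Cochran Q) and its degrees of freedom as I2 = max(0, (Q − df)/Q) × 100%. The sketch below applies this formula to the χ2 values reported in the text, assuming that each χ2 was computed across the individual studies so that df equals the number of studies minus 1; the function name is hypothetical.

```python
def i_squared(q, n_studies):
    """I2 (%) from Cochran Q and the number of pooled studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to 0, as is conventional for I2.
    return max(0.0, (q - df) / q) * 100.0


# (chi-square, number of studies) as reported in the text.
reported = {
    "EFS, MTV": (3.40, 8),
    "OS, MTV": (5.71, 8),
    "OS, TLG": (3.65, 6),
}
for label, (q, k) in reported.items():
    print(label, i_squared(q, k))  # each analysis gives 0.0
```

In each analysis the χ2 is smaller than its degrees of freedom, which is why I2 is 0%. For comparison, a hypothetical χ2 of 14 across 8 studies would give an I2 of 50%.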
Additional subgroup analyses were performed according to tumor delineation and cutoff values (Table 2). Among studies of MTV, those with a fixed SUV of 2.5 had an HR of 4.09 (95% CI, 2.63–6.36, P < 0.00001), and those with other thresholds had an HR of 3.23 (95% CI, 1.95–5.34, P < 0.00001). Studies with cutoff values using ROC had an HR of 4.57 (95% CI, 2.89–7.25, P < 0.00001), and those adopting cutoff values using other methods had an HR of 2.93 (95% CI, 2.0–4.29, P < 0.00001). Among the studies including TLG, those with a fixed SUV of 2.5 had an HR of 3.90 (95% CI, 2.45–6.21, P < 0.00001), and those with other thresholds had an HR of 2.46 (95% CI, 1.51–4.02, P = 0.0003).
Combined Data of SUVmax
Survival data of SUVmax were extracted from 7 studies (2,14–16,18,22,23) for EFS and from 3 studies (2,18,23) for OS. The HR for adverse events was 1.83 (95% CI, 1.39–2.42, P < 0.0001), and the test for heterogeneity gave no significant results (χ2 = 3.59, P = 0.73; I2 = 0%). The pooled HR of death was 2.36 (95% CI, 1.48–3.77, P = 0.0003). There was no evidence of significant statistical heterogeneity, with an I2 of 0% (χ2 = 0.09, P = 0.96) (Table 3).
DISCUSSION
This meta-analysis evaluated the prognostic value of MTV or TLG for 18F-FDG PET in patients with HNC by determining the HR of EFS and OS of high values for MTV or TLG, compared with those of low values for MTV or TLG. In combined results, patients with a high MTV showed a 3.06-fold-higher risk of adverse events or 3.51-fold-higher risk of death than patients with a low MTV. Patients with a high TLG had a 3.10-fold-higher risk of events or a 3.14-fold-higher risk of death than patients with a low TLG. Although large variability may affect MTV or TLG, our findings suggest that volumetric parameters of PET have prognostic value in EFS or OS. To evaluate the effects of methods selected in each study, we performed subgroup analyses, which showed small variations of the HRs of EFS for MTV (2.75–3.68) despite the wide range of MTV (11.2–45 cm3).
Most previous studies that evaluated the prognostic value of volumetric parameters followed the protocol shown in Figure 6. First, the VOI is defined as either the tumor alone or the tumor plus LN. Next, the VOI is delineated with one of several methods. The choice of the threshold may affect the absolute value of MTV or TLG (26). A fixed SUV such as 2.5 or 3.0, or a percentage of SUVmax, is widely used to differentiate between benign and malignant lesions (3). All voxels containing SUVs above these thresholds are included in the VOI. The ranges of fixed SUV and percentage of SUVmax for VOI determination included in this study were limited to an SUV of 2.5–3.0 and 30%–50% of SUVmax. Also, a fixed SUV of 2.5 was adopted in 8 of 13 studies in this meta-analysis, which may be a good standard threshold for VOI delineation. The gradient segmentation method can also be used to delineate tumors. This method calculates spatial derivatives along the tumor radii and defines the tumor edge on the basis of derivative levels and continuity of the tumor edge (27). Manual drawing methods can be used to delineate VOIs; however, interobserver variability is possible. As a consensus has yet to be reached, MTV and TLG may range widely even in the same tumor, according to the method used. After the VOI is delineated, MTV or TLG or both are measured. Currently, commercially available tools for tumor analysis can enable more rapid and easier measurement of volumetric parameters (3). MTV or TLG is then converted into categoric data using specific cutoff values. Patients are divided into 2 groups of high or low volumetric parameters (MTV or TLG). Cutoff values are determined mostly by the minimum P value, ROC, or a median value. Although the minimum P value method has widely been used in previous studies, it is associated with a high false-positive rate and may yield a biased, unreliable, and nonreproducible estimate of the prognostic impact of the tested covariate (28).
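The dependence of MTV on the delineation threshold can be seen in a small numeric example. The Python sketch below segments the same hypothetical set of voxel SUVs with fixed SUV thresholds of 2.5 and 3.0 and with 30%, 42%, and 50% of SUVmax; the voxel values, voxel volume, and function name are illustrative assumptions and do not come from any included study.

```python
def mtv_cm3(voxel_suvs, voxel_volume_cm3, threshold):
    """MTV as the total volume of voxels with SUV above the threshold."""
    return sum(1 for s in voxel_suvs if s > threshold) * voxel_volume_cm3


# Hypothetical VOI: eight voxels of 0.5 cm3 each.
suvs = [1.5, 2.6, 3.2, 4.0, 5.5, 7.0, 9.0, 10.0]
voxel_volume = 0.5
suv_max = max(suvs)

results = {
    "fixed SUV 2.5": mtv_cm3(suvs, voxel_volume, 2.5),
    "fixed SUV 3.0": mtv_cm3(suvs, voxel_volume, 3.0),
    "30% of SUVmax": mtv_cm3(suvs, voxel_volume, 0.30 * suv_max),
    "42% of SUVmax": mtv_cm3(suvs, voxel_volume, 0.42 * suv_max),
    "50% of SUVmax": mtv_cm3(suvs, voxel_volume, 0.50 * suv_max),
}
for method, mtv in results.items():
    print(f"{method}: {mtv} cm3")
```

In this toy lesion, MTV ranges from 2.0 to 3.5 cm3 depending only on the threshold, which is the kind of method-dependent variation described above.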
The cutoff values of studies included in this meta-analysis ranged widely according to the methods selected in each study, from 7.7 to 45 cm3 for MTV and from 55 to 330 for TLG. A few studies evaluated prognostic values of MTV or TLG with continuous variables without dividing patients into 2 groups (7). After patients were divided into 2 groups, the prognostic values of MTV or TLG were analyzed using the log-rank test or Cox proportional hazards regression method.
Ten previous meta-analyses of HNC with PET were identified by electronic searches of MEDLINE and EMBASE (Table 4). Eight studies analyzed the diagnostic performance of PET regarding LN metastasis (29,30), distant metastasis (31–34), and residual disease or recurrence (35,36). Prognostic values of SUVmax in terms of disease-free survival, OS, or locoregional control with the effect size of risk ratio or odds ratio were evaluated in studies by Zhang et al. (37) and Xie et al. (38). As the odds ratio is measured at a single point in time, it is not recommended as a surrogate method for analyzing time-to-event outcomes (39); HR is the most appropriate measure. Therefore, we calculated the HR as the effect size of the current study. To the best of our knowledge, this is the first meta-analysis to evaluate the prognostic value of MTV or TLG in any kind of tumor. Although we analyzed HRs of SUVmax for events and deaths, the HRs of SUVmax and volumetric parameters could not be compared directly. However, pooled HRs of MTV and TLG seem to be higher than those of SUVmax for both EFS and OS, which might lead to the assumption that MTV and TLG are stronger predictors. In addition, SUVmax was not a significant prognostic factor in most studies, either for EFS (6 of 7 studies) or for OS (2 of 3 studies).
This study has several limitations. Regardless of the methods selected in each study, high values for MTV or TLG are shown to be associated with a higher risk of adverse events or death. However, as there is still debate over the best approach for VOI and threshold methods, we were unable to propose an optimal cutoff value to categorize volumetric parameters as high or low. Because we could not access individual patient data, there is a risk of bias in this study. Although we have found that patients with a high MTV or TLG had a higher risk of adverse events or death than patients with a low MTV or TLG, the HRs for MTV and TLG are difficult to interpret because we do not know the exact incidence rate for the events of interest over a given period of time. Further prospective studies combining incidence rate of diseases are needed. We searched databases that include only studies that have been published. A publication bias cannot be excluded, even if the funnel plot does not suggest clear evidence of it. In addition, HNC is a heterogeneous disease, and patients with different histologic grades, stages, and treatments were included in this meta-analysis, which can affect events occurring over time and survival. To recommend PET as a routine test in HNC, further studies regarding cost-effectiveness and those comparing clinical benefits of PET with those of other modalities are required. Furthermore, even though 2 reviewers independently read survival curves, the strategy could not ensure complete accuracy of the extracted data. Finally, as non-English articles were excluded in this study, the potential impact of language bias should be considered.
CONCLUSION
MTV and TLG are prognostic predictors of outcome in patients with HNC. Despite clinically heterogeneous HNC and the various methods adopted between studies, we can confirm that patients with a high MTV or TLG are at higher risk for adverse events or death.
DISCLOSURE
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734. No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Apr. 21, 2014.
- © 2014 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
REFERENCES
- Received for publication October 11, 2013.
- Accepted for publication January 29, 2014.