Introduction

Over 90 % of malignant head and neck tumours in adults are squamous cell carcinomas (HNSCC) [1]. Appropriate assessment of superficial and deep tumour spread, regional lymphadenopathy and distant metastases is essential for staging and therapeutic planning. According to the guidelines of the International Union Against Cancer (UICC) and American Joint Committee on Cancer (AJCC), tumour staging requires endoscopy with biopsy and additional imaging [1].

Morphologic evaluation is preferably done with contrast-enhanced CT or MRI. However, fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) and diffusion-weighted MR imaging (DW MRI) are increasingly used in oncologic head and neck imaging in order to add diagnostic information beyond morphology [2, 3]. Although the two modalities are based on different physical principles, both may be seen as functional imaging tools as they allow interrogation of tissue with regard to certain biologic properties. FDG PET/CT measures increased cellular glucose metabolism as expressed by the standardized uptake value (SUV). In oncologic imaging, increased SUV may be seen as a sign of increased cell proliferation [14] and may also correlate with the degree of tumour necrosis [5]. The principle of DW MRI as a functional biomarker is based on the assessment of random (Brownian) motion of extracellular water molecules, which is restricted in hypercellular tumour tissue, as expressed by a decreased apparent diffusion coefficient (ADC) value. In addition, ADC values may also reflect cell proliferation [6, 7] and may be affected by the presence of tumour necrosis [8]. Based on previous studies it appears that most HNSCC have lower ADC values than normal tissue because of their higher cellular density [8].

Since both increased SUV and decreased ADC values are seen in the context of neoplasia although they refer to different biologic phenomena, it would be of interest to know whether these parameters are statistically correlated or independent in order to facilitate their use in diagnostic interpretation. Previous studies in different tumour types have yielded divergent results. An inverse correlation was recently demonstrated between SUV and ADC values in gastrointestinal stromal tumours [9], lung cancer [10] and cervical cancer [11], whereas no correlation was found in lymphoma [12]. To the best of our knowledge, only three reports have so far compared SUV and ADC values in HNSCC. Their results diverged, showing either a significant correlation or no correlation [13–15]. It also remains open whether SUV and ADC measurements are reproducible in HNSCC due to the paucity of reported data [16, 17].

The purpose of the current study was to assess the reproducibility of ADC and SUV measurements in patients with biopsy-proven HNSCC and to evaluate whether ADC and SUV values are statistically correlated or independent functional parameters.

Materials and methods

Study design and patient selection

This retrospective study was approved by the Institutional Ethics Committee and was performed in accordance with the guidelines of the Helsinki II Declaration. Informed consent was waived. Inclusion criteria were: adult patients with histologically proven HNSCC, who had undergone whole-body FDG PET/CT and high-resolution head and neck DW MRI prior to treatment (mean delay between the two exams = 3.5 days) and who were subsequently treated with surgery, radio(chemo)therapy or a combination of the two. A computerized search of the PACS archives and medical records of our institution retrospectively identified 34 consecutive patients who fulfilled the above-mentioned inclusion criteria. One patient had to be excluded from the study because of poor DW MRI image quality due to metal implants causing major image distortion. Therefore, 33 patients formed the basis of the current study (22 men and 11 women with a mean age of 54.7 years; range 16–77 years). One patient had two synchronous HNSCC resulting in a total of 34 evaluated tumours. In 24 patients, the indication for imaging was primary staging of HNSCC; in 10 patients, the indication was follow-up or suspected recurrence 45 months (range 10–95 months) after surgery ± radiotherapy. Primary tumour sites were as follows: oropharynx (n = 13), hypopharynx (n = 6), oral cavity (n = 6), larynx (n = 3), nasopharynx (n = 3), parotid gland (n = 2) and paranasal sinuses (n = 1). Tumour size assessed by the longest diameter measured in the axial plane according to RECIST criteria was 32 ± 14 mm (range 15–64 mm). Of the tumours, 8 were poorly differentiated or undifferentiated, 21 were moderately differentiated and 5 were well differentiated. There were one T1, four T2, five T3 and fourteen T4 primary tumours and four T1, two T2, two T3 and two T4 recurrent tumours.

PET/CT acquisition

All PET/CT examinations were performed on an integrated PET/CT scanner (Biograph 16-slice PET/CT scanner, Siemens Healthcare, Erlangen, Germany). Prior to the exam, all patients had fasted for at least 6 h and the measured intravenous serum glucose concentration prior to study initiation was less than 8.5 mmol/l. PET data acquisition was started 60 min after injection of 370 MBq of 18F-FDG with 3 min per bed position for a total of 7 to 9 beds, from the vertex to the proximal thigh. CT data acquisition for attenuation correction was performed using the following parameters: 120 kV, 180 mAs, 16 × 1.5 collimation, pitch = 1.2, 1 s per rotation. PET image reconstruction was done using an attenuation-weighted ordered subset expectation maximization (AWOSEM) iterative reconstruction algorithm with a matrix of 168 × 168 and a slice thickness of 5 mm. The reconstruction parameters were set to the default values (four iterations, eight subsets, and a post-processing Gaussian kernel with a full-width at half-maximum of 5 mm). The SUV was calculated by using the standard formula normalized by body weight: SUV = cdc/(di/w), where cdc is the decay-corrected tracer tissue concentration (Bq/g), di is the injected dose (Bq), and w is the patient’s body weight (g) [18]. In addition, high-resolution, contrast-enhanced CT, which is part of the routine PET/CT protocol for HNSCC in our institution, was obtained after intravenous administration of 2 ml/kg of iohexol (Accupaque 350®, GE Healthcare SA, Opfikon, Switzerland) in all patients.
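The body-weight-normalized SUV formula given above can be illustrated with a minimal Python sketch. The function name and the numerical inputs are hypothetical and chosen only for illustration; they are not values from the study, and clinical software may apply additional corrections not shown here.

```python
def suv_body_weight(tissue_conc_bq_per_g, injected_dose_bq, body_weight_g):
    """Body-weight-normalized SUV = cdc / (di / w).

    tissue_conc_bq_per_g: decay-corrected tracer tissue concentration (Bq/g)
    injected_dose_bq:     injected dose (Bq)
    body_weight_g:        patient body weight (g)
    """
    if injected_dose_bq <= 0 or body_weight_g <= 0:
        raise ValueError("injected dose and body weight must be positive")
    return tissue_conc_bq_per_g / (injected_dose_bq / body_weight_g)


# Hypothetical example: 370 MBq injected, 70 kg patient,
# 50 kBq/g decay-corrected tissue concentration in the lesion.
suv = suv_body_weight(50_000.0, 370e6, 70_000.0)
print(f"SUV = {suv:.2f}")
```

With these assumed inputs the ratio is 50,000/(370,000,000/70,000), i.e. an SUV of roughly 9.5.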

MRI acquisition

The MRI examinations were performed with dedicated head and neck coils on a 1.5 T Espree machine (Siemens Healthcare, Erlangen, Germany) in 23 patients and on a 3 T Trio (Siemens Healthcare, Erlangen, Germany) in 11 patients. The MRI protocol on both machines included axial T2-weighted, axial T1-weighted and axial diffusion-weighted imaging (DWI) sequences obtained before intravenous injection of contrast material followed by an axial T1-weighted sequence after contrast material application (gadobenate dimeglumine, Multihance®, Bracco Diagnostics, Milan, Italy). Depending on tumour location, additional T1-weighted sequences with fat saturation were then obtained in the coronal and/or sagittal plane. DWI was performed using a single-shot echo planar technique with fat suppression on both machines (repetition time ms/echo time ms, TR/TE = 3,200/86 and TR/TE = 2,200/63, respectively, 40 sections, slice thickness = 5 mm, intersection gap = 1.5 mm, field of view = 230 × 230 mm, matrix = 128 × 128, 3 acquired signals, pixel resolution = 1.8 × 1.8 × 5.0 mm, acquisition time = 3 min 02 s and 3 min 17 s, respectively). Two b values were applied: 0 and 1,000 s/mm². DWI images were acquired in three orthogonal directions and then combined into a single trace image. ADC maps were calculated automatically by the MRI software using the following formula: ADC (mm²/s) = −ln(S2/S1)/(b2 − b1), with b1 = 0 and b2 = 1,000 s/mm², where S1 and S2 are the signal intensities measured at b1 and b2, respectively. The parameters for the fast spin echo T2-weighted sequence at 1.5 and 3 T were as follows: TR/TE = 3,300/106 and TR/TE = 3,010/82, respectively, 24 sections, slice thickness = 4 mm, intersection gap = 0.8 mm, field of view = 230 × 180 mm, matrix = 512 × 416, three acquired signals, pixel resolution = 0.4 × 0.4 × 4.0 mm, acquisition time = 3 min 30 s and 4 min 20 s, respectively. The fast spin echo T1-weighted sequences performed before and after intravenous injection of contrast material at 1.5 and 3 T had the following parameters: TR/TE = 771/11 and TR/TE = 687/9, respectively, 30 slices, slice thickness = 3 mm, intersection gap = 0.6 mm, field of view = 230 × 230 mm, matrix = 512 × 512, two acquired signals, pixel resolution = 0.4 × 0.4 × 4.0 mm, acquisition time = 3 min 56 s and 4 min 36 s, respectively. In order to obtain good MRI image quality, all patients were carefully instructed to refrain from vigorous swallowing or breathing during image acquisition, in particular during the acquisition of DWI sequences. Short breaks were made between individual sequences, so as to allow patients to clear the throat of mucous secretions, to cough and to swallow between the sequences.
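The two-point ADC formula stated in this section can be sketched for a single voxel as follows. This is an illustration only: the function name and the signal intensities are hypothetical, and the scanner software that generated the ADC maps in the study may differ in implementation details (for example, noise handling).

```python
import math


def adc_two_point(s1, s2, b1=0.0, b2=1000.0):
    """Two-point ADC in mm^2/s: ADC = -ln(S2/S1) / (b2 - b1).

    s1, s2: signal intensities at b values b1 and b2 (b in s/mm^2).
    """
    if s1 <= 0 or s2 <= 0:
        raise ValueError("signal intensities must be positive")
    if b2 == b1:
        raise ValueError("b values must differ")
    return -math.log(s2 / s1) / (b2 - b1)


# Hypothetical voxel: signal drops from 1000 (b = 0) to 350 (b = 1,000 s/mm^2).
adc = adc_two_point(1000.0, 350.0)
print(f"ADC = {adc * 1e3:.2f} x 10^-3 mm^2/s")
```

For these assumed signals the result is about 1.05 × 10⁻³ mm²/s; a smaller signal drop between the two b values yields a lower ADC, which corresponds to more restricted diffusion.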

Image evaluation, ADC and SUV measurements

All images were analysed by two independent readers who evaluated both data sets (MRI including DW MRI and PET/CT). One reader was a board-certified nuclear medicine specialist with additional expertise in head and neck MRI, and the second reader was a board-certified head and neck radiologist with additional expertise in PET/CT. Defined criteria were used for each modality and both readers were blinded to clinical data and histopathologic results. Image quality and lesion conspicuity on each modality were evaluated using a three-point scale (good-moderate-poor). Prior to performing measurements, each reader also assessed the presence or absence of geometric distortion, breathing and swallowing artefacts. All images were evaluated in the transverse plane on a PACS workstation to allow comparison between DW MRI acquired in the axial plane and PET/CT images.

First, the fused T2-weighted, b1000 images and the corresponding PET/CT fused images were identified and matched, so as to enable measurements at the same anatomic levels (Figs. 1 and 2). Then one reader manually drew a region of interest (ROI), avoiding negative pixels, by contouring the tumour area identified on the fused T2-weighted, b1000 image, and the respective ROI size was saved in DICOM format. This ROI with a predefined size was then used for subsequent measurements on the anatomically corresponding ADC maps and PET images of the same patient. The size of the predefined ROI varied from 5 to 825 mm² (mean = 199 mm²) depending on the size of the individual tumours on axial slices. In each head and neck tumour, the measurements were performed at the level of the largest tumour size. The precise anatomic levels where the measurements were performed were recorded. In a first session, the first reader measured ADC and SUV values for each lesion. Then, the second reader measured the same parameters using the predefined ROI size and performing the measurements at the predefined anatomic levels. Two weeks later, measurements were repeated by the two readers, independently, using the same methodology, so as to obtain a second data set for each observer enabling assessment of intra-observer agreement.
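How mean, minimum and maximum values are derived from an ROI can be sketched as below. This is a simplified illustration, not the PACS workstation procedure used in the study: the ROI is represented as a hypothetical binary mask on a small 2D map, and non-positive pixels are excluded programmatically as an assumed stand-in for the manual avoidance of negative pixels.

```python
def roi_statistics(image, mask):
    """Return mean, min and max of the pixels inside an ROI.

    image: 2D list of pixel values (e.g. ADC in 10^-3 mm^2/s, or SUV)
    mask:  2D list of 0/1 flags of the same shape; 1 marks ROI pixels
    Pixels with values <= 0 are skipped (assumption made for this sketch).
    """
    values = [
        pixel
        for image_row, mask_row in zip(image, mask)
        for pixel, inside in zip(image_row, mask_row)
        if inside and pixel > 0
    ]
    if not values:
        raise ValueError("ROI contains no valid pixels")
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
        "n_pixels": len(values),
    }


# Hypothetical 3 x 3 ADC map (10^-3 mm^2/s) and ROI mask.
adc_map = [[0.9, 1.1, -0.2],
           [1.0, 1.2, 0.8],
           [0.0, 0.7, 1.3]]
roi_mask = [[1, 1, 1],
            [1, 1, 0],
            [0, 0, 0]]
stats = roi_statistics(adc_map, roi_mask)
print(stats)
```

In this toy example the negative pixel inside the mask is dropped, so the statistics are computed from four pixels. The minimum and maximum depend only on the extreme pixels, whereas the mean depends on every pixel enclosed by the contour.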

Fig. 1

Anatomically matched corresponding transverse slices of inverted b1000 (a), PET (b), fused T2-weighted, b1000 (c) and fused PET/CT (d) images of a 63-year-old male with primary squamous cell carcinoma of the hypopharynx staged T4a N2c M0. Both PET/CT and MRI fused images allow correct anatomic assessment of the hypopharyngeal lesion that invades the retrocricoarytenoid region and both piriform sinuses

Fig. 2

Anatomically matched corresponding transverse slices of inverted b1000 (a), PET (b), fused T2-weighted, b1000 (c) and fused PET/CT (d) images of a 59-year-old patient with primary undifferentiated carcinoma of the nasopharynx staged T4 N3 M0. Both PET/CT and MRI fused images allow correct anatomic assessment of the right nasopharyngeal carcinoma that extends to the skull base, parapharyngeal and retropharyngeal space

Statistical analysis

Statistical analysis was carried out using a commercially available software program (SPSS software for Windows, version 11.0, SPSS Inc., Chicago, IL, USA). Two methods were used to assess intra- and inter-observer agreement for SUV and ADC measurements [19–21]. Inter- and intra-observer reproducibility of ADC and SUV of tumours were determined as mean absolute difference (bias) and as 95 % confidence interval (CI) of the mean difference (limits of agreement), according to the methods of Bland and Altman [19]. Relative bias (bias/mean value) and relative limits of agreement (limits of agreement/mean value) were calculated. Single measure intraclass correlation coefficients (ICC) were calculated using the two-way random analysis of variance (ANOVA) on average measures (ICC ranges 0.00–1.00 with values closer to 1.00 representing better reproducibility) [19]. Interpretation of ICC was categorized according to Landis and Koch [20] as follows: <0, no reproducibility; 0.0–0.20, slight reproducibility; 0.21–0.40, fair reproducibility; 0.41–0.60, moderate reproducibility; 0.61–0.80, substantial reproducibility; and 0.81–1.00, almost perfect reproducibility. Pearson’s correlation coefficient analysis was used to evaluate the correlations between ADC (ADCmean, ADCmax, ADCmin) and SUV (SUVmean, SUVmax, SUVmin) values. A p value of less than 0.05 was considered statistically significant. Kruskal-Wallis one-way ANOVA [21] was used to detect any statistically significant difference between ADC and SUV values and ratios in primary versus recurrent tumours and in different histologic tumour grades. Student’s t test was used to compare ADC values obtained at 1.5 and 3 T.
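The Bland–Altman quantities described above (bias, limits of agreement and their relative counterparts) can be sketched in Python as follows. The sketch assumes the conventional definition of the limits of agreement as bias ± 1.96 SD of the paired differences; the function name and the paired readings are hypothetical and do not reproduce the SPSS analysis or the study data.

```python
import statistics


def bland_altman(first, second):
    """Bias and 95 % limits of agreement for paired measurements.

    first, second: equal-length sequences of paired readings
    (e.g. reader 1 versus reader 2, or session 1 versus session 2).
    Relative values are expressed as fractions of the overall mean.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    overall_mean = statistics.mean(list(first) + list(second))
    return {
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "relative_bias": bias / overall_mean,
        "relative_lower": lower / overall_mean,
        "relative_upper": upper / overall_mean,
    }


# Hypothetical ADCmean readings (10^-3 mm^2/s) from two readers.
reader_1 = [1.00, 1.20, 0.80, 1.10]
reader_2 = [1.02, 1.18, 0.83, 1.07]
agreement = bland_altman(reader_1, reader_2)
print(agreement)
```

For these assumed readings the bias is essentially zero and the limits of agreement are about ±0.058 × 10⁻³ mm²/s, i.e. roughly ±5.6 % of the overall mean.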

Results

Image quality and tumour detection

MRI image quality was considered as good in 20 patients (61 %) and moderate in 13 patients (39 %). DW MRI image quality did not depend on tumour location but on patient cooperation during the MRI examination. Susceptibility artefacts caused minor signal loss and minor to moderate geometric distortion on DWI images in 23 of the 33 patients (70 %); slight motion artefacts due to swallowing or breathing were present in 13 patients (39 %). Regarding the PET/CT, image quality was good in 27 cases (82 %) and moderate in 6 cases (18 %). Artefacts due to dental amalgam were seen in 14 patients (42 %) and swallowing/breathing artefacts were present in 2 cases (6 %).

Conspicuity of all 34 HNSCC was rated as good on all fused T2-weighted, b1000 images, ADC maps and PET/CT images. All tumours showed a restricted diffusion with high signal intensity on the b1000 images. Anatomic matching of the fused T2-weighted, b1000 images and PET/CT images was feasible in all tumours with comparable tumour conspicuity on both modalities (Figs. 1 and 2).

ADC and SUV values

Table 1 summarizes the results of ADC (ADCmean, ADCmin) and SUV (SUVmean, SUVmax) measurements of both observers performed during the two readings. The mean ADCmean and ADCmin values of all tumours together were (1.05 ± 0.34) × 10⁻³ mm²/s and (0.65 ± 0.29) × 10⁻³ mm²/s, respectively. The mean SUVmean and SUVmax values of all tumours were 9.86 ± 3.82 and 12.80 ± 5.00, respectively. SUVmax values were higher for primary than for recurrent tumours (p = 0.025), whereas no statistically significant difference was observed regarding SUVmean, ADCmean and ADCmin (p = 0.109, p = 0.897 and p = 0.257, respectively). There was no statistically significant difference for ADCmean values of HNSCC at 1.5 and 3 T (p = 0.201).

Table 1 Comparison of ADC and SUV values in primary and recurrent HNSCC and for all tumours together

Intra- and inter-observer agreement

The mean bias and the limits of agreement for intra- and inter-observer measurements of SUV and ADC are displayed in the Bland–Altman plots of Figs. 3 and 4 and summarized in Tables 2 and 3. ICC showed almost perfect reproducibility for ADC and SUV values (Table 4).
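For readers unfamiliar with the single-measure ICC, the following Python sketch shows one common formulation, the two-way random-effects, absolute-agreement, single-measure coefficient (often written ICC(2,1)), together with the Landis and Koch categories used in this study. Whether the SPSS analysis used the absolute-agreement or the consistency definition is not stated in the text, so the choice here is an assumption; the function names and the example readings are hypothetical.

```python
def icc_two_way_random_single(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one row per tumour, each row holding the
    k measurements (e.g. one per reader) of that tumour.
    """
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 subjects and 2 raters, no gaps")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def landis_koch(icc):
    """Map an ICC value to the categories listed in the Methods."""
    if icc < 0:
        return "no reproducibility"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if icc <= upper:
            return label + " reproducibility"
    return "almost perfect reproducibility"


# Hypothetical ADCmean readings (10^-3 mm^2/s): one row per tumour,
# columns are reader 1 and reader 2.
readings = [[1.00, 1.02], [1.20, 1.18], [0.80, 0.83], [1.10, 1.07]]
icc = icc_two_way_random_single(readings)
print(f"ICC = {icc:.3f} ({landis_koch(icc)})")
```

With these assumed readings the coefficient is about 0.99, which falls into the highest category of the scale.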

Fig. 3

Intra-observer reproducibility measurements in tumours of ADCmean (a), SUVmean (b), ADCmin (c) and SUVmax (d). Bland–Altman plots of relative difference values (in %) of measurements (y-axis) against relative mean values (in %) of measurements (x-axis). The mean absolute difference (bias) and 95 % CI of the mean difference (1.96 SD) are also indicated

Fig. 4

Inter-observer reproducibility measurements in tumours of ADCmean (a), SUVmean (b), ADCmin (c) and SUVmax (d). Bland–Altman plots of relative difference values (in %) of measurements (y-axis) against relative mean values (in %) of measurements (x-axis). The mean absolute difference (bias) and 95 % CI of the mean difference (1.96 SD) are also indicated

Table 2 Inter-observer agreements according to Bland and Altman
Table 3 Intra-observer agreements according to Bland and Altman
Table 4 Single measure ICC of overall measurements, indicating almost perfect reproducibility of measurements according to Landis and Koch [20]

Correlation of ADC and SUV values

There was no correlation between ADC values (ADCmean, ADCmin, ADCmax) and SUV values (SUVmax, SUVmean or SUVmin) (r = −0.103, −0.051; p = 0.552, 0.777) (Fig. 5).
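The Pearson coefficient and its associated test statistic can be sketched as follows. This is an illustrative, standard-library-only computation rather than the SPSS procedure used in the study; the function names are hypothetical, and the p value (obtained from a t distribution with n − 2 degrees of freedom) is deliberately not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length series with at least 3 pairs")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Toy series with a perfect inverse linear relationship.
r_toy = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])

# t statistic for a coefficient of -0.103 observed in 34 tumours.
t_value = t_statistic(-0.103, 34)
print(f"r_toy = {r_toy:.1f}, t = {t_value:.3f}")
```

A coefficient of −0.103 in 34 tumours gives a t statistic of about −0.59, far from the magnitude of roughly 2 required for significance at the 0.05 level with 32 degrees of freedom.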

Fig. 5

Scatter plots represent results of linear regression between ADCmean and SUVmean (a), ADCmean and SUVmax (b), ADCmin and SUVmean (c) and ADCmin and SUVmax (d). ADCs are expressed in 10⁻³ mm²/s. For each scatter plot, the best-fit line is shown. No statistically significant correlation was found between ADC and SUV values

ADC and SUV in relationship to the histologic grade

Table 5 summarizes the results of ADCmean and SUVmean of well, moderately and poorly differentiated HNSCC. No significant correlation was seen between the ADCmean and histologic grade (p = 0.216) or between SUVmean and histologic grade (p = 0.425). There was, however, a tendency of SUV to increase with dedifferentiation and a tendency of ADC to decrease with dedifferentiation (Table 5).

Table 5 ADC and SUV values of well, moderately and poorly differentiated HNSCC (n = 34)

Discussion

Most DWI sequences routinely used in head and neck oncology are echo planar imaging (EPI)-based sequences [8, 13, 17, 22, 23]. Although the quality of EPI DWI images may be impaired by susceptibility artefacts, breathing or swallowing, we were able to correctly localize the tumours and to precisely position ROIs by fusing the b1000 and T2-weighted sequences (Figs. 1 and 2). As recently shown, the use of newer improved EPI technology, dedicated surface coils and optimized sequences enables a maximal reduction of EPI-related artefacts at a relatively high spatial resolution [8, 23–27]. In addition, by carefully instructing the patient not to move or swallow during image acquisition, good quality DWI images may be obtained in the vast majority of patients. In the current series, DWI images were of poor quality only in 1 of 34 patients (3 % non-diagnostic DWI).

As pointed out recently, in order to eliminate the effect of possible distortion due to susceptibility artefacts, ADC measurements should not be performed on ADC maps alone but the anatomic information from T1- or T2-weighted sequences should be taken into consideration when performing these measurements [8, 27, 28]. In the current study ADC measurements were performed using predefined ROIs saved in DICOM format and fusing the b1000 and T2-weighted images, thereby allowing optimal anatomic matching.

In our study we calculated the ADC of tumours from b values of 0 and 1,000 s/mm². Such high b values eliminate the perfusion effect and have been used by most investigators for the evaluation of HNSCC [8, 17, 23, 24, 27–31].

The ADC values in the current series were obtained at 1.5 and 3 T. In theory, ADC values are independent of the magnetic field strength [32]. Having measured at least two different b values (such as b = 0 and b = 1,000 s/mm²), the logarithm of the relative signal intensity of a tissue is plotted on the y-axis against the b values on the x-axis. The slope of the line fitted through the plotted points describes the ADC [32]. This mono-exponential fitting is most often used in the literature [8, 17, 23, 27, 32]. Several investigators have compared ADC values of different tissues at different field strengths. With one exception [33], the vast majority of studies evaluating the influence of the magnetic field strength on ADC measurements at 1.5, 3 and 7 T found no statistically significant difference for ADC values either in the abdomen [34, 35], or in the head and neck [36, 37], brain [25, 38] or breast [38] provided that the parameters of the DWI sequence used were identical. Data of the current series showing no statistically significant difference between ADC values of HNSCC at 1.5 and 3 T are consistent with the literature; in this retrospective series identical b values and parameters were used so that any possible variation is negligible at most.
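The mono-exponential fit described in this paragraph can be sketched as an ordinary least-squares line through the logarithm of the signal plotted against the b value, with the ADC given by the negative slope. The sketch below is illustrative only: the function name and the synthetic, noise-free signals are hypothetical. It fits ln(S) rather than ln(S/S0), which changes the intercept but not the slope.

```python
import math


def adc_monoexponential_fit(b_values, signals):
    """ADC (mm^2/s) as the negative least-squares slope of ln(S) versus b.

    b_values: b values in s/mm^2 (at least two distinct values)
    signals:  positive signal intensities measured at those b values
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two (b, signal) pairs")
    if any(s <= 0 for s in signals):
        raise ValueError("signal intensities must be positive")
    log_signals = [math.log(s) for s in signals]
    mean_b = sum(b_values) / len(b_values)
    mean_log = sum(log_signals) / len(log_signals)
    sbb = sum((b - mean_b) ** 2 for b in b_values)
    if sbb == 0:
        raise ValueError("b values must not all be identical")
    sbl = sum((b - mean_b) * (ls - mean_log)
              for b, ls in zip(b_values, log_signals))
    return -sbl / sbb


# Synthetic noise-free decay with a true ADC of 1.0 x 10^-3 mm^2/s.
b_list = [0.0, 500.0, 1000.0]
signal_list = [1000.0 * math.exp(-b * 1.0e-3) for b in b_list]
adc_fit = adc_monoexponential_fit(b_list, signal_list)
print(f"fitted ADC = {adc_fit * 1e3:.3f} x 10^-3 mm^2/s")
```

With exactly two b values the fitted slope reduces to the two-point expression −ln(S2/S1)/(b2 − b1), which is the situation in the current study.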

Although ADC measurements are increasingly used in head and neck oncology, data regarding measurement reproducibility are very scarce both for HNSCC [17] and for nodal metastases from HNSCC [17, 22]. Our current data suggest that intra- and inter-observer agreement for ADC measurements are excellent (Figs. 3 and 4). Compared to the only series to date evaluating observer variability in HNSCC [17], we found a slightly superior inter-observer agreement (ICC = 0.95–0.96 versus reported 0.79) and a higher intra-observer agreement for ADC values (ICC = 0.98 versus reported 0.33) [17]. These results may be explained by the use of predefined ROIs matched for size and anatomic levels resulting in minimal inter- and intra-observer variability.

Currently, PET/CT is widely used for the pre-therapeutic evaluation of HNSCC and for the assessment of treatment response. The most common quantitative approach is to measure SUVmax because this value is not dependent on the size of the ROI used [13, 14, 39]. In addition, in routine clinical practice, SUVmean is often also reported. While SUVmax does not depend on the observer, SUVmean may theoretically vary with the person who draws the ROI [39]. Several investigators have shown an excellent inter-observer reproducibility for SUVmax values for lung cancer [40], sarcomas [41], breast cancer and, more recently, for HNSCC [16]. In the current series, we found almost perfect SUVmean and SUVmax reproducibility with ICC values ranging from 0.97 to 0.99.

Both the ADC values and the SUV values of HNSCC in the current study are similar to those reported by previous investigators [13–15, 24, 29, 30, 42–46]. In our study, there was no statistically significant association between the histologic tumour grade and SUV or ADC values (Table 5), although we observed a trend towards higher SUV and lower ADC values in poorly differentiated as compared to well differentiated HNSCC (Table 5). A similar trend has been recently reported by another group [43].

Although SUV values mainly reflect cell proliferation and ADC values mainly reflect tissue cellularity, it is still unclear whether both parameters may provide similar information with respect to viable tumour cells or degree of dedifferentiation. The question of whether these parameters are statistically correlated or independent is gaining increasing attention, as recent data suggest that both ADC and SUV values may be correlated with cell proliferation and tumour necrosis [14]. Our current study did not show a correlation between SUV and ADC(1000) values in HNSCC, indicating that these two biomarkers are independent in HNSCC. Similar observations were made in previous studies [13, 14]; however, they were contradicted by another group who found a significant negative correlation between SUVmax and ADC(800) in 26 patients [15]. Correlating SUV and ADC values of tumours requires performing measurements on different imaging modalities at the same location within the tumour. Ideally, this should be performed by fusing MRI and PET/CT data sets. Although fusion of MRI and PET/CT data sets may be feasible using currently available anatomy-based fusion software, the quality of such multimodality fusion in the head and neck strongly depends on the anatomic tumour location. As recently suggested, the quality of data fusion above the hyoid bone may be good, whereas fusion quality below the hyoid bone may be fair to poor [47]. Until newly developed elastic fusion software becomes more widely available, visual correlation using anatomic landmarks remains the most reliable way to precisely compare tumour levels on different imaging modalities. Although the possible correlation between SUV and ADC values is still a controversial issue, both parameters may accurately predict response to chemotherapy [15, 48]. Additional work thus appears necessary in order to determine how to interpret these biomarkers in a complementary fashion.

In conclusion, our study indicates that SUV and ADC values are independent parameters in HNSCC. Measurements of these two biomarkers were reproducible with almost perfect inter-observer and intra-observer agreements for both methods. Neither SUV nor ADC values were able to predict the histologic grade, although a trend towards higher SUV and lower ADC values was observed in poorly differentiated tumours. Further studies in larger patient populations may address the question of whether the complementary use of SUV and ADC values could be useful in the diagnosis of primary and recurrent HNSCC.