Abstract
This study assessed the performance of qualitative and semiquantitative scoring methods for 18F-FDG PET assessment in large-vessel vasculitis. Methods: Patients with giant cell arteritis or Takayasu arteritis underwent independent clinical and imaging assessments within a prospective observational cohort. 18F-FDG PET/CT scans were interpreted for active vasculitis by central reader assessment. Arterial 18F-FDG uptake was scored by qualitative visual assessment using the PET vascular activity score (PETVAS) and by semiquantitative assessment using SUVs and target-to-background ratios (TBRs) relative to liver or blood activity. The performance of each scoring method was assessed by intrarater reliability using the intraclass correlation coefficient (ICC) and areas under the receiver-operating-characteristic curve, applying physician assessment of clinical disease activity and reader interpretation of vascular PET activity as independent reference standards. The Wilcoxon signed-rank test was used to analyze change in arterial 18F-FDG uptake over time. Results: Ninety-five patients (giant cell arteritis, 52; Takayasu arteritis, 43) contributed 212 18F-FDG PET studies. The ICC for semiquantitative evaluation (0.99 [range, 0.98–1.00]) was greater than the ICC for qualitative evaluation (0.82 [range, 0.56–0.93]). PETVAS and TBR metrics were more strongly associated with reader interpretation of PET activity than were SUV metrics. All assessment methods were significantly associated with physician assessment of clinical disease activity, but the semiquantitative metric liver target-to-background ratio (TBRLiver) achieved the highest area under the receiver-operating-characteristic curve (0.66). Significant but weak correlations with C-reactive protein were observed for SUV metrics (r = 0.19, P < 0.01) and TBRLiver (r = 0.20, P < 0.01) but not for PETVAS.
In response to increased treatment in 56 patients, arterial 18F-FDG uptake was significantly reduced when measured by semiquantitative (TBRLiver, 1.31–1.23; 6.1% change; P < 0.0001) or qualitative (PETVAS, 22–18; P < 0.0001) methods. Semiquantitative metrics provided information complementary to qualitative evaluation in cases of severe vascular inflammation. Conclusion: Both qualitative and semiquantitative methods of measuring arterial 18F-FDG uptake are useful in assessing and monitoring vascular inflammation in large-vessel vasculitis. Compared with qualitative metrics, semiquantitative methods have superior reliability and better discriminate treatment response in cases of severe inflammation.
Large-vessel vasculitis (LVV) refers to a class of rare diseases characterized by inflammation of the aorta and its primary branch arteries. Giant cell arteritis and Takayasu arteritis are the 2 major subtypes of LVV (1). 18F-FDG PET can detect metabolic activity in the walls of large arteries as a biomarker of vascular inflammation (2). Ample evidence supports the use of 18F-FDG PET as a diagnostic surrogate for histologic confirmation of vasculitis, which is advantageous because arterial biopsies are invasive and can be difficult to obtain (3,4). In contrast to diagnostic assessment, use of arterial 18F-FDG uptake to guide treatment decisions and monitor disease activity is less well defined (5–9), in part because of a lack of prospective, longitudinal imaging studies on LVV (10,11). Reliance on clinical assessment alone may lead to underdetection of vascular pathology (12). Vascular inflammation with angiographic progression of disease can occur in patients with LVV who are otherwise completely asymptomatic, highlighting a need for vascular imaging to complement clinical assessment in these patients (13).
Uncertainty about the optimal method to evaluate 18F-FDG uptake in the large arteries remains a major barrier to the use of 18F-FDG PET to monitor vascular inflammation (11). Both visual/qualitative and semiquantitative methods of 18F-FDG PET assessment have been reported in LVV. Qualitative methods typically visually compare the amount of 18F-FDG uptake in the arterial wall relative to a background tissue, such as the liver (11,14), similar to the Deauville score used in lymphoma (15). In contrast, semiquantitative methods use regions of interest (ROIs) constructed on the PET image to determine SUVmax (16). Target-to-background ratios (TBRs), composed of SUVs from arterial tissue referenced to background tissue (e.g., liver, blood pool), are also used to quantify arterial 18F-FDG uptake in atherosclerosis and vasculitis (17). Recent recommendations highlight that several methods of quantifying arterial 18F-FDG uptake are available, but the relevance of each method in evaluating patients requires further clarification (11). SUV metrics often overlap between patients with LVV and controls, and many patients with LVV have residual, and sometimes profound, arterial 18F-FDG uptake during periods of apparent clinical remission (18).
There is an unmet need to better understand the strengths and weaknesses of qualitative versus semiquantitative methods of quantifying arterial 18F-FDG uptake in LVV. Semiquantitative assessment of arterial 18F-FDG uptake can be a time-consuming process, which may be difficult to apply in a contemporary clinical setting or be cost-prohibitive in research. In contrast, qualitative PET assessment may be easier to perform with appropriate user training; however, qualitative assessment may be less reliable and accurate in quantifying arterial 18F-FDG uptake than are semiquantitative approaches (19,20).
This study aimed to compare the effectiveness of qualitative and semiquantitative scoring methods, with the goal of informing a standardized approach to 18F-FDG PET assessment in LVV for use in clinical care and research.
MATERIALS AND METHODS
Study Population
Patients with LVV who were at least 18 y old were recruited into a prospective, observational cohort at the National Institutes of Health. All patients provided written informed consent, and the study was approved by an institutional review board at the National Institutes of Health (NCT02257866; 14-AR-0200). All patients fulfilled the 1990 American College of Rheumatology Classification Criteria for Takayasu arteritis (21) or the modified 1990 American College of Rheumatology Criteria for giant cell arteritis (22,23). The patients were enrolled at various stages of the disease course. Treatment decisions were made at the discretion of each patient’s local health-care provider rather than by the investigative research team.
Clinical Assessment
Each patient’s imaging assessment took place within 24 h after the clinical assessment at the National Institutes of Health Clinical Center. Repeat imaging studies and clinical assessments were performed at 6-mo intervals. A team of clinical rheumatologists with further specialist training and experience in LVV evaluated all cases. Physician assessment of clinical disease activity was recorded as active or remission on the basis of findings from the medical history, physical examination, and laboratory assessments. Active disease was defined as the presence of clinical disease features attributed to vasculitis (e.g., carotidynia) at the time of assessment. Remission was defined as the absence of clinical symptoms attributable to vasculitis at the time of assessment. Imaging study findings were not incorporated into the definition of clinical disease activity.
18F-FDG PET Imaging Protocol
All patients underwent 18F-FDG PET/CT on a 128-detector-row Biograph mCT (Siemens Medical Solutions). The patients were given detailed instructions to avoid carbohydrate-laden meals 1 d before imaging and to fast on the day of imaging. The 18F-FDG dose was fixed to 370 MBq for all patients. Images of the torso were acquired 2 h after injection. Postacquisition image reconstruction used CT attenuation correction and iterative reconstruction (point-spread function correction with time-of-flight correction, 3 iterations, 21 subsets, a 256 matrix, a final isotropic voxel resolution of 3.2 mm3, and no postreconstruction filtering).
18F-FDG PET Imaging Assessment
Qualitative Analysis
One imaging specialist interpreted all PET studies without knowledge of the clinical data. A study was excluded if there were technical concerns about image quality per physician review. Each study was subjectively interpreted as PET-active or PET-inactive if, respectively, there was or was not at least 1 area of abnormal arterial 18F-FDG uptake felt to represent vascular inflammation. Intra- and interrater reproducibility of LVV PET image interpretation by our group has been previously reported to be excellent (18). Qualitative assessment of 18F-FDG uptake was also performed at the territory level, which included 4 segments of the aorta (ascending, arch, descending thoracic, and abdominal) and 5 branch arteries (brachiocephalic, right and left carotid, and right and left subclavian). Scores between 0 and 3 were assigned to each territory, representing the visual degree of arterial 18F-FDG uptake relative to liver 18F-FDG uptake (0, no uptake; 1, less than liver; 2, similar to liver; and 3, greater than liver). Adding the qualitative arterial territory scores yields a summary score (termed the PET vascular activity score, or PETVAS) ranging from 0 to 27, with higher scores indicating a greater global burden of vascular inflammation (18).
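The PETVAS summation described above can be sketched in a few lines. This is a minimal illustration and not the authors' software: the territory identifiers and the function name `petvas` are hypothetical, and the example scores are invented.

```python
# Territory identifiers are illustrative labels for the nine arterial
# territories listed in the text (4 aortic segments, 5 branch arteries).
TERRITORIES = (
    "ascending_aorta", "aortic_arch", "descending_thoracic_aorta",
    "abdominal_aorta", "brachiocephalic", "right_carotid", "left_carotid",
    "right_subclavian", "left_subclavian",
)


def petvas(scores):
    """Sum per-territory visual scores (0-3) into a 0-27 summary score.

    0 = no uptake, 1 = less than liver, 2 = similar to liver,
    3 = greater than liver.
    """
    missing = set(TERRITORIES) - set(scores)
    if missing:
        raise ValueError(f"missing territories: {sorted(missing)}")
    total = 0
    for name in TERRITORIES:
        value = scores[name]
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{name}: score must be 0, 1, 2, or 3")
        total += value
    return total


# Invented example: eight territories similar to liver, one above liver.
example = dict.fromkeys(TERRITORIES, 2)
example["aortic_arch"] = 3
print(petvas(example))  # 8 * 2 + 3 = 19
```

Because every territory is capped at 3, the summary score cannot exceed 27 however intense the uptake.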
Semiquantitative Analysis
ROIs were manually contoured in OsiriX DICOM Viewer (version 9.5.2) with respect to both CT anatomic location and coregistered PET activity to determine arterial 18F-FDG SUVs. ROIs were drawn in the axial dimension, encompassing both arterial wall and lumen. Five segments of the aorta (ascending aorta, aortic arch, descending thoracic aorta, suprarenal abdominal aorta, and infrarenal abdominal aorta) and 4 branch arteries (right and left common carotid and subclavian arteries) were segmented in this process to create 9 territories. The 18F-FDG SUVmax per ROI of each territory was identified. A territory score was calculated by taking the average of the SUVmax across all ROIs in the territory (17). A global summary metric (SUVArtery) was calculated by averaging all territory scores.
The volumetric SUVmean in the liver was measured in the dome of the right lobe. The volumetric SUVmean in the venous blood pool was measured within the right jugular, superior vena cava, right atrium, and inferior vena cava. SUVArtery was divided by the background tissue to generate 2 TBR metrics: liver TBR (TBRLiver) and blood TBR (TBRBlood).
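As a worked illustration of how the semiquantitative metrics combine, the following sketch averages ROI SUVmax values into territory scores, averages those into SUVArtery, and divides by a background SUVmean. The function names and all numeric values are hypothetical, and only three territories are shown for brevity.

```python
from statistics import mean


def territory_score(roi_suvmax):
    """Average the SUVmax values of all ROIs drawn within one territory."""
    return mean(roi_suvmax)


def suv_artery(territories):
    """Average the territory scores into the global SUVArtery metric."""
    return mean(territory_score(rois) for rois in territories.values())


def tbr(suv_artery_value, background_suvmean):
    """Target-to-background ratio: SUVArtery over a background SUVmean."""
    if background_suvmean <= 0:
        raise ValueError("background SUVmean must be positive")
    return suv_artery_value / background_suvmean


# Invented SUVmax values per ROI (the study used 9 territories).
rois = {
    "ascending_aorta": [3.0, 3.4],
    "aortic_arch": [2.8],
    "right_subclavian": [2.4, 2.6, 2.8],
}
artery = suv_artery(rois)
print(round(artery, 2))            # SUVArtery
print(round(tbr(artery, 2.4), 2))  # TBRLiver with a liver SUVmean of 2.4
print(round(tbr(artery, 1.6), 2))  # TBRBlood with a blood SUVmean of 1.6
```

Averaging within a territory first keeps a territory with many ROIs from dominating the global metric.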
Statistical Analysis
Intrarater Reliability
Intrarater reliability, reflecting the variation in data measured by 1 rater over multiple trials, was quantified with a 2-way random effect (consistency) and a single-measurement intraclass correlation coefficient (ICC) (24). ICC estimates and their 95% CIs were calculated using R, package irr (version 0.84.1). ICCs lie between 0 and 1. Values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.9 are indicative of poor, moderate, good, and excellent reliability, respectively. The ICC for the qualitative approach was obtained by repeating PETVAS on a set of randomly selected patients. The ICC for the semiquantitative approach was obtained by recontouring ROIs to recalculate SUVArtery for a set of randomly selected patients representing 10% of the cohort.
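For readers who want to see the arithmetic, a single-measurement consistency ICC can be computed from a 2-way ANOVA decomposition. The sketch below uses the standard formula (MS_rows − MS_error) / (MS_rows + (k − 1) MS_error); treating this as equivalent to the irr settings used in the study is an assumption, CIs are not computed, and the ratings are invented.

```python
def icc_consistency_single(ratings):
    """Single-measurement consistency ICC from a subjects x trials table.

    ratings: list of rows, one per subject, each holding k repeated scores.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 scores")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def reliability_label(icc):
    """Map an ICC to the reliability categories quoted in the text."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Invented repeat readings: the second trial is the first plus a constant,
# which a consistency ICC treats as perfect agreement.
shifted = [[1, 2], [2, 3], [3, 4]]
print(icc_consistency_single(shifted), reliability_label(0.82))
```

A consistency ICC ignores a systematic offset between trials, so it answers whether repeated readings rank and space the scans in the same way.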
Receiver-Operating-Characteristic Curve
Area under the receiver-operating-characteristic curve (AUC), along with the 95% CI, was used as a combined measure of sensitivity and specificity to evaluate the overall performance of the PET scoring metrics as classifiers of a binary outcome (25), either reader interpretation of vascular PET activity (PET-active vs. PET-inactive) or physician assessment of clinical disease activity (clinically active vs. clinical remission). AUCs lie between 0 and 1. Metrics with capability to distinguish between binary outcomes will result in an AUC above 0.5, with larger AUCs suggesting better diagnostic performance. The Youden J statistic was used to determine the optimal cutoff score that maximized the distance to the identity (diagonal) line.
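The AUC and Youden cutoff can be illustrated without any statistics package. In this sketch the AUC is computed as the probability that a positive case outscores a negative one, and candidate cutoffs are midpoints between adjacent observed scores. The midpoint convention is an assumption, not something the text specifies, although it would yield half-integer cutoffs for an integer score such as PETVAS. The data are invented and CIs are not computed.

```python
def auc(scores, labels):
    """Probability that a random positive case outscores a random negative
    one, counting ties as one half (equal to the area under the ROC curve)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A case is called positive when its score exceeds the cutoff; the first
    cutoff reaching the maximum J is returned.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    values = sorted(set(scores))
    if not pos or not neg or len(values) < 2:
        raise ValueError("need both classes and at least two distinct scores")
    best = None
    for low, high in zip(values, values[1:]):
        cutoff = (low + high) / 2
        sensitivity = sum(s > cutoff for s in pos) / len(pos)
        specificity = sum(s <= cutoff for s in neg) / len(neg)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Invented PETVAS-like scores; label 1 = PET-active, 0 = PET-inactive.
scores = [12, 15, 18, 19, 20, 22, 24, 27]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(auc(scores, labels))            # 15 of 16 pairs ordered correctly
print(youden_cutoff(scores, labels))
```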
Mixed-Effects Logistic Regression
To account for repeated imaging contributions from a single patient, generalized linear mixed models with logistic outcomes were constructed. The dependent variable was a binary classification of either reader interpretation of vascular PET activity (PET-active vs. PET-inactive) or physician assessment of clinical disease activity (clinically active vs. clinical remission). The PET scoring metric, either semiquantitative or qualitative, was used as the fixed effect, with patient identification used as a random effect. A “bound optimization by quadratic approximation” nonlinear optimizer and 10 points of integration for the adaptive Gauss-Hermite approximation were used as model control parameters. Independent generalized linear mixed models were created for each scoring method. The Akaike information criterion (AIC) estimates the information loss for a given model and is a means for model selection. Relative to the other models, the candidate model with the lowest AIC minimizes estimated information loss. All generalized linear mixed-model analysis was performed using R, package lme4 (version 1.1-21).
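Written out, one such model can be sketched as follows, assuming that "patient identification used as a random effect" denotes a random intercept per patient. The symbols are introduced here for illustration: $y_{ij}$ is the binary outcome for scan $j$ of patient $i$, $x_{ij}$ is the PET scoring metric for that scan, $u_i$ is the patient-level intercept, $k$ is the number of estimated parameters, and $\hat{L}$ is the maximized likelihood.

```latex
\begin{align}
  \operatorname{logit}\,\Pr\left(y_{ij} = 1 \mid u_i\right)
    &= \beta_0 + \beta_1 x_{ij} + u_i,
    \qquad u_i \sim \mathcal{N}\!\left(0, \sigma_u^2\right), \\
  \mathrm{AIC} &= 2k - 2\ln\hat{L}.
\end{align}
```

Under this reading each candidate model has the same structure and the same number of parameters ($\beta_0$, $\beta_1$, $\sigma_u^2$), so differences in AIC between scoring methods reflect differences in the maximized likelihood alone.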
Correlation Analysis
Spearman rank-order correlation was used to measure the association between the PET scoring metrics and the acute-phase reactants (C-reactive protein and erythrocyte sedimentation rate). The Spearman r, ranging from −1 to 1, and the P value of the correlation are presented.
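Spearman correlation is the Pearson correlation of the ranked data, which the following sketch makes explicit. It is an illustration with invented values, tied observations share the average of their ranks, and no P value is computed.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for position in range(i, j + 1):
            ranks[order[position]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the midranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(midranks(x), midranks(y))


# Invented values: a PET metric and C-reactive protein for five visits.
tbr_liver = [1.1, 1.3, 1.2, 1.6, 1.4]
crp = [2.0, 9.0, 4.0, 30.0, 12.0]
print(spearman(tbr_liver, crp))  # the two rankings are identical
```

Because only ranks enter the calculation, the statistic is insensitive to the skewed distribution typical of acute-phase reactants.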
Longitudinal PET Assessment in Response to Treatment
The Wilcoxon matched-pairs signed-rank test was used to compare changes in PET assessment metrics between 2 time points for the same patient. When stratifying by treatment status, we placed initial and follow-up scan pairings into increased-treatment and no-change groups. Increased treatment was defined as the introduction of a glucocorticoid-sparing medication or an increase in daily prednisone dose by more than 5 mg. No change was defined as maintenance of biologic agent administration or a stable glucocorticoid dosage.
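A minimal sketch of the matched-pairs signed-rank test follows. It computes an exact two-sided P value by enumerating every sign pattern, which is practical only for small samples. The text does not state how the study's P values were obtained, so the exact method, the dropping of zero differences, and the invented scores are all assumptions of this illustration.

```python
from itertools import product


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for position in range(i, j + 1):
            ranks[order[position]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after, max_pairs=20):
    """Exact two-sided Wilcoxon signed-rank test; returns (W_plus, p_value).

    Zero differences are dropped and tied absolute differences share
    midranks. Differences are compared exactly, so round floating-point
    inputs first if ties matter.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    if n > max_pairs:
        raise ValueError("too many pairs for exhaustive enumeration")
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W_plus under the null hypothesis
    observed = abs(w_plus - expected)
    # Under the null, each difference is positive or negative with equal
    # probability, so every sign pattern is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Invented PETVAS-like scores for six patients before and after a
# treatment increase; every patient's score falls.
baseline = [22, 25, 19, 24, 21, 26]
follow_up = [18, 20, 17, 23, 15, 19]
print(wilcoxon_signed_rank(baseline, follow_up))  # W_plus = 0, p = 2/64
```

With few nonzero differences the smallest attainable two-sided P value is 2 / 2^n, which limits what a small subset analysis can show.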
Semiquantitative metrics of 18F-FDG PET activity exist on a continuous scale. In contrast, a qualitative metric such as PETVAS is ordinal, with a maximum score of 27 (18). In cases of severe inflammation in which PET activity may be reduced but remains in a range above the maximum PETVAS score, semiquantitative metrics may be better suited to demonstrate a change in PET activity. A subset of patients was selected who had, first, severe vascular inflammation defined by a baseline PETVAS score of 27 and, second, a reduction in 18F-FDG uptake on the follow-up scan by visual assessment, as agreed on by 2 independent readers. The longitudinal change in PET activity measured by PETVAS versus TBRLiver metrics was compared in this subset of patients.
RESULTS
Study Population
In total, 95 patients (giant cell arteritis, 52; Takayasu arteritis, 43) contributed 212 imaging studies. Three imaging studies were excluded because of concerns about image quality. Demographics were consistent with the expected age and sex distributions for giant cell arteritis and Takayasu arteritis (Table 1). The patients were seen, on average, 6.1 y into the disease course while taking, on average, 8.3 mg of daily prednisone.
Intrarater Reliability
Intrarater reliability for repeat scoring of 34 imaging studies using the semiquantitative scoring protocol was excellent (ICC, 0.99; range, 0.98–1.00). Intrarater reliability for the qualitative assessment by PETVAS was good (ICC, 0.82; range, 0.56–0.93).
Quantification of Arterial 18F-FDG Uptake in Association with Reader Interpretation of PET Scan Activity
Of 209 18F-FDG PET imaging studies, 147 scans were interpreted as PET-active and 62 scans as PET-inactive. Compared with the use of SUV alone, discriminatory power (AUC) was greater and model quality was better (lower AIC) when TBR was used to differentiate PET-active from PET-inactive scans (Table 2). PETVAS performed similarly to TBR, with better performance characteristics than SUV. PETVAS achieved the highest AUC and lowest AIC relative to the other models, with an optimal cutoff of 19.5 (Table 2).
Quantification of Arterial 18F-FDG Uptake in Association with Physician Assessment of Clinical Disease Activity and Laboratory Tests
Complete clinical and imaging assessments were available for 206 study visits. Clinical disease activity was assessed as clinically active for 95 study visits and clinical remission for 131 study visits. Corresponding arterial 18F-FDG uptake evaluated by any proposed method significantly discriminated active disease from clinical remission, but TBR metrics and PETVAS resulted in higher AUCs than did SUV metrics (Table 3). Within the proposed mixed models, the PETVAS-informed model had the lowest AIC when predicting the same clinical outcomes as the other models, suggesting the best model fit (26). Broadly, all AUCs were lower when 18F-FDG metrics were compared with clinical assessment than when reader interpretation of PET activity was used as the reference standard.
Significant but weak correlations with acute-phase reactants were observed for SUVArtery (C-reactive protein: r = 0.19, P < 0.01; erythrocyte sedimentation rate: r = 0.14, P = 0.04) and TBRLiver (C-reactive protein: r = 0.20, P < 0.01; erythrocyte sedimentation rate: r = 0.15, P = 0.03). Neither TBRBlood nor PETVAS correlated significantly with C-reactive protein or erythrocyte sedimentation rate (Table 4).
Longitudinal Treatment Response
Treatment was increased over 56 interval study visits. Correspondingly, there was a significant reduction in vascular inflammation by the semiquantitative approach (median TBRLiver, 1.31 [IQR, 1.19–1.59] to 1.23 [IQR, 1.13–1.39]; P < 0.001) or the qualitative approach (median PETVAS, 22 [IQR, 17–25] to 18 [IQR, 15–22]; P < 0.001). Over 25 interval visits for which there was no change in treatment status between successive imaging studies, the degree of vascular inflammation remained similarly unchanged as measured by either semiquantitative assessment (median TBRLiver, 1.39 [IQR, 1.24–1.54] to 1.35 [IQR, 1.78–1.49]; P = 0.22) or qualitative assessment (median PETVAS, 21 [IQR, 18–25] to 21 [IQR, 18.5–25]; P = 0.68) (Figs. 1 and 2).
A subset of 9 patients with severe inflammation (baseline PETVAS of 27) who had a visually apparent reduction in arterial 18F-FDG uptake on the follow-up imaging study were studied. PETVAS was significantly reduced from a score of 27 at baseline to a median score of 24 (IQR, 18.5–26; P < 0.01) at the follow-up visit (Fig. 3). TBRLiver scores in these same patients were a median of 1.86 (range, 1.55–2.63) at the baseline visit, with a significant reduction in scores at follow-up (median, 1.24 [range, 1.14–1.69]; P < 0.01). Although the baseline PETVAS scores were the same for all 9 patients, there was a corresponding dynamic range of baseline TBRLiver scores, reflecting variability among these patients. TBRLiver was reduced over time in every patient; however, in only 3 of 9 patients was there a reduction in PETVAS, and this reduction was minimal (i.e., change ≤ 1 point). Representative images from a patient with a visually apparent reduction in vascular PET activity are shown in Figure 4.
DISCUSSION
Use of 18F-FDG PET to monitor vascular inflammation in LVV holds promise as a complement to clinical and laboratory-based assessment (10,18,27,28). Visualizing glucose metabolism within the arterial wall as a biomarker of vascular inflammation enables clinicians to noninvasively diagnose and track disease activity in LVV directly in the target tissue, in parallel with clinical and laboratory assessments (29). This ability is particularly important in LVV because patients can develop subclinical vascular inflammation that has no accompanying clinical symptoms or abnormal laboratory findings and can be detected and monitored only by vascular imaging (18,28,30). The present study advances our understanding of the strengths and weaknesses of different methodologic approaches to quantifying vascular inflammation.
Reassuringly, both qualitative and semiquantitative approaches performed well in detecting and monitoring arterial 18F-FDG PET uptake in patients with LVV. PETVAS, a qualitative scoring approach developed by our group, and semiquantitative methods had good-to-excellent intrarater reliability. Because some patients can show vascular inflammation on PET in the absence of clinical activity, we studied the performance characteristics of qualitative and semiquantitative metrics against 2 independent reference standards (31). As expected, SUV metrics, TBRs, and PETVAS were significantly associated with reader interpretation of vascular PET activity; however, TBRs and PETVAS outperformed SUV metrics as evidenced by a higher AUC in the models. When compared against physician assessment of clinical disease activity as the reference standard, all the metrics distinguished between active clinical disease and remission, with lower AUCs than when using reader interpretation of PET activity as the reference standard, showing that clinical assessment is not always linked to vascular inflammation. Both qualitative and semiquantitative approaches were useful in demonstrating a reduction in the burden of vascular inflammation in response to treatment, suggesting they have utility as outcome measures in future treatment trials on LVV.
The ease of implementation makes a qualitative strategy such as PETVAS an attractive option for clinical assessment; however, there are some limitations in comparison to semiquantitative approaches. Qualitative visual assessment requires reader experience and is subjective. Semiquantitative approaches, although more time-consuming and labor-intensive, are more reliable than PETVAS. The granularity and continuous scale of semiquantitative scoring systems lead to a better ability to discriminate change in PET activity across a wider range of values. Use of an ordinal scale such as PETVAS, with a ceiling limit of 27, may not capture important variability in patients with severe vascular inflammation, a situation in which semiquantitative metrics may be preferable or may provide an opportunity to investigate improvements in qualitative scoring.
Semiquantitative approaches correlated better than qualitative assessments with circulating markers of systemic inflammation; however, the correlation was weak. Future biomarker discovery studies on LVV that use 18F-FDG PET findings as a reference standard for disease activity should consider semiquantitative metrics rather than qualitative metrics of vascular inflammation, for greater precision in detecting candidate circulating biomarkers. In keeping with prior studies, the overall correlation of vascular inflammation with concentrations of acute-phase reactants was poor (28).
TBRs and PETVAS achieved better performance characteristics than SUV when compared with reader interpretation of vascular PET activity, in line with a recent study by an independent group (30). TBRLiver and TBRBlood displayed near-identical performance characteristics in association with clinical assessment of disease activity. However, TBRLiver was more strongly associated with reader interpretation of vascular PET activity and with circulating acute-phase reactants.
There are several study strengths to highlight. 18F-FDG PET image acquisition and subsequent imaging interpretation were performed according to standardized protocols. Clinical and imaging assessments were performed independently of each other to enable unbiased comparisons. A prospective, longitudinal study design was used, which is uncommon in vascular imaging studies on LVV but is important in understanding the utility of 18F-FDG PET to detect changes in vascular inflammation and in avoiding bias inherent in retrospective study designs. The performance characteristics of PET assessment were tested against both reader interpretation of PET activity and physician assessment of clinical disease activity and performed well against both of these independent reference standards.
There are a few limitations to consider. This study was conducted at a single center using a specific imaging protocol, and these findings should be replicated in other cohorts. Specifically, the qualitative and semiquantitative imaging metrics reported here are a product of the methodology used for patient preparation, image acquisition, and image reconstruction at a single institution. Thus, the performance characteristics of discrete cutoffs for metrics, as applied in this study, may vary if the same cutoffs are applied at other centers. This study compared the performance of different methods of measuring arterial 18F-FDG uptake, as might be used in the clinical management of patients or in clinical trials of LVV. However, issues of feasibility and cost must be balanced against potential test utility.
CONCLUSION
Qualitative and semiquantitative approaches to measuring arterial 18F-FDG uptake are useful in detecting and monitoring vascular inflammation in LVV. Qualitative metrics, such as PETVAS, can be used for 18F-FDG PET assessment when simplicity and ease of interpretation are a priority, as is often the case in clinical practice or observational studies. Semiquantitative metrics can be used for 18F-FDG PET assessment when there is a need for greater precision, such as in randomized clinical trials or translational research focused on biomarker discovery.
DISCLOSURE
This work was supported through the intramural research program at the National Institute of Arthritis and Musculoskeletal and Skin Diseases (ZIA-AR-041199). No other potential conflict of interest relevant to this article was reported.
KEY POINTS
QUESTION: How do the performance characteristics of qualitative and semiquantitative metrics of arterial 18F-FDG uptake compare in detecting and monitoring vascular inflammation by PET?
PERTINENT FINDINGS: In this prospective, observational cohort study of 95 patients with LVV, qualitative and semiquantitative measurements of arterial 18F-FDG uptake were useful in monitoring vascular inflammation.
IMPLICATIONS FOR PATIENT CARE: Assessment of vascular inflammation by 18F-FDG-PET should be studied as an outcome measure in clinical trials of LVV.
Footnotes
- Received for publication March 19, 2021.
- Revision received May 5, 2021.
- Published online June 4, 2021.
- © 2022 by the Society of Nuclear Medicine and Molecular Imaging.