Abstract
Captopril-stimulated renography is widely used to screen selected groups of hypertensive patients for renal vascular disease. Evaluation of the test is a complex task. Lack of interobserver agreement on the assessment and interpretation of renographic parameters may contribute to differences in sensitivity and specificity between studies. Methods: Three experienced nuclear medicine physicians evaluated 658 renograms of 503 hypertensive patients suspected of having renal vascular disease from a large Dutch multicenter study (the Dutch Renal Artery Stenosis Intervention Cooperative [DRASTIC] study). Interobserver agreement on several renographic parameters was assessed by the κ statistic and the intraclass correlation coefficient (ICC). Results: The interobserver agreement on the time to excretion was high: The pooled ICC was 0.90. The pooled κ was ≥0.65 for the pattern of the time–activity curves, the visual aspect of the scintigraphic images (visible uptake and kidney size), and the judgment on the presence of renal artery stenosis. However, the interobserver agreement on cortical retention and pelvic retention by visual inspection of the images was rather low (pooled κ = 0.46 and 0.52, respectively). Pelvic retention was found to complicate the interpretation of renography. Conclusion: Interobserver agreement on most of the renographic parameters was satisfactory, but the assessment of cortical retention was more difficult, in particular, in the presence of pelvic retention. Captopril renography should be interpreted with caution if pelvic retention is suspected. Interobserver variability offers one of several explanations for the differences in diagnostic test performance that are found between studies.
Captopril-stimulated renography is a noninvasive test that is widely used to screen selected groups of hypertensive patients for the presence of renal vascular disease. In patients with renovascular hypertension, captopril induces changes in the scintigraphic images of the kidney distal to the stenosis: decreased uptake, delayed excretion with cortical retention, or both. The time–activity curves may reveal these alterations as well. Evaluation of scintigraphic images and time–activity curves is encouraged in the investigation of renal vascular disease (1–3). Patients with such captopril-induced changes on the renogram are generally expected to benefit from intervention with balloon angioplasty or with stent insertion (4–6).
Interpretation of captopril renography is not a straightforward task. The nuclear medicine physician must assess several renographic parameters and subsequently integrate this information to form a judgment on the presence of renal vascular disease. Efforts have been made to standardize the test (1,7–9). The resulting guidelines focus mainly on the procedure and not on interpretation of the results. Moreover, diagnostic criteria are not uniform, and different renographic parameters are considered. The reported diagnostic performance of captopril renography varies, with sensitivity ranging from 70% to 100% and specificity ranging from 60% to 100% (6,10,11). A lack of interobserver agreement on interpretation of the test results may have contributed to these differences. Despite the vast literature on captopril renography for diagnosing renal vascular disease, the interobserver variability has not yet been described.
In this study, 3 experienced nuclear medicine physicians, working in different university hospitals, evaluated 658 renograms of 503 patients suspected of having renal vascular disease. We analyzed the interobserver agreement on the assessment of renographic parameters and the agreement on the judgment on the presence of hemodynamically significant renal artery stenosis.
MATERIALS AND METHODS
Study Design
The study was part of the Dutch Renal Artery Stenosis Intervention Cooperative (DRASTIC) study. The aim of this multicenter study was to optimize the diagnosis and treatment of renal artery stenosis (12,13). The DRASTIC study included 1,205 hypertensive patients, 18–75 y old, who had been referred for unsatisfactory control of blood pressure or an adverse drug effect during the course of antihypertensive treatment or for analysis of possible secondary hypertension. Exclusion criteria were suspected secondary hypertension other than renal vascular disease, unstable coronary artery disease, heart failure, renal failure (serum creatinine, ≥200 μmol/L [2.26 mg/dL]), and inadequate contraception. Patients with drug-resistant hypertension (diastolic blood pressure, ≥95 mm Hg on 2 drugs during 3 visits) (n = 455) or with a rise in serum creatinine concentration after angiotensin-converting enzyme (ACE) inhibitor therapy (n = 43) as well as patients in whom renal artery stenosis had been diagnosed before their referral to the participating center (n = 72) underwent diagnostic workup for renal artery stenosis. Patients with atherosclerotic renal artery stenosis, defined as ≥50% reduction of lumen diameter according to renal digital subtraction angiography (gold standard test), were randomly assigned to either the balloon angioplasty (n = 56) or the medical treatment (n = 50) group. Captopril renography was performed and evaluated by the local nuclear medicine physicians in 22 participating hospitals. In the diagnostic workup, the sensitivity and specificity for finding stenosis according to the local nuclear medicine physician were 72% and 90%, respectively (11). Furthermore, renography was performed to evaluate treatment after 3 and 12 mo of follow-up.
Renographic Protocol
The protocol for conducting the renographic procedures reflected the guidelines of the consensus report on ACE inhibitor renography (7). In patients who were receiving long-term ACE inhibitor treatment, the ACE inhibitor was withheld for at least 24 h before renography was performed. According to the protocol, an oral dose of 50 mg captopril was given 1 h before the examination in 95% of the procedures to induce asymmetry in uptake and intrarenal transit between the kidneys in the case of renal vascular disease. In the remaining 5% of the procedures, the physician reduced the dose of captopril to 25 mg to prevent hypotension. To ensure adequate absorption of captopril, patients were required to fast during the 4 h preceding renography. Sufficient hydration was guaranteed by oral administration of 0.5 L of tap water. Blood pressure was measured with an automatic device before administration of captopril and every 5–10 min for 2 h after administration of captopril. Renography was performed with the patient in supine position and the detector placed posteriorly. After intravenous administration of 75–100 MBq 99mTc-mercaptoacetyltriglycine, data were collected in 10-s frames during a 20-min period, and sequential analog images were obtained every minute. The time–activity curves were generated using regions of interest over the whole kidney (1).
Study
In this study on interobserver agreement, 658 renograms of 503 patients with 2 native kidneys were reevaluated by 3 experienced nuclear medicine physicians (referred to as physicians A, B, and C) who were working in different university hospitals at the time. Of these renograms, 487 were obtained during the diagnostic workup of patients with and without renal artery stenosis. The remaining renograms were obtained during follow-up of patients with stenosis: 82 renograms after 3 mo of follow-up and 89 renograms after 12 mo of follow-up.
Renographic Evaluation
The renograms were evaluated independently, and the physicians were unaware of patient characteristics and hospital source. The 3 physicians had no additional clinical information, such as the blood pressure response to captopril and the diuresis during the procedure. Before evaluation, the physicians discussed which renographic parameters of the scintigraphic images and time–activity curves would be assessed and how these features would be scored.
The following parameters were scored from the scintigraphic images by each individual observer, separately for the left and right kidneys: visible uptake (scored as present or absent); time to excretion (scored as number of minutes until radioactivity appeared in the renal pelvis, determined by visual evaluation of the 1-min sequential images, if available; if the excretory phase started only after 20 min, no excretory phase was registered); and kidney size (scored as normal or small). Cortical retention and pelvic retention (scored as present or absent) were determined by visual inspection. The presence of pelvic retention was assessed because this was considered to complicate the renographic evaluation of the images and the time–activity curves of the whole kidneys (1,14). The pattern of the time–activity curves was scored in 6 ordered categories as proposed by Fommei et al. (10) (0 = normal, 1 = minor abnormalities, 2 = marked delayed excretion rate with preserved washout phase, 3 = delayed excretion rate without washout phase [accumulation curve], 4 = renal failure pattern with measurable kidney uptake, and 5 = renal failure pattern without measurable kidney uptake [blood background-type curve]). Interobserver agreement was not applicable for the time to peak activity (Tmax) and the relative (individual kidney) uptake because these diagnostic criteria were calculated by the computer.
Finally, the judgment on the presence or absence of renal artery stenosis was assessed for each kidney. To reflect clinical practice, no specific diagnostic criteria were defined. The judgment on the presence of stenosis was scored as 1 of 5 ordered categories (1 = certainly stenosis, 2 = probably stenosis, 3 = indeterminate, 4 = probably no stenosis, and 5 = certainly no stenosis; in the case of a blood background-type curve, the diagnosis was scored as indeterminate).
Interobserver Agreement
We used the κ statistic to assess interobserver agreement on the renographic parameters that were measured on a nominal scale. κ expresses the agreement realized beyond chance as a proportion of the maximally achievable agreement beyond chance (15–17). κ values usually range from 0 (indicating chance agreement only) to 1 (indicating perfect agreement). Negative values of κ are best interpreted as indicating that the level of agreement is no better than what would be expected by chance alone (18). In general, κ values of <0.40 are considered low and values of >0.80 are considered high (16,17,19). Because the value of κ decreases if the number of ordinal categories is increased, we calculated weighted κ values for the pattern of the time–activity curves and the judgment on the presence of stenosis to adjust for the seriousness of different levels of disagreement (20–22). Linear weights were used: w(ij) = 1 − |i − j|/(c − 1), where i and j are the sequence numbers of the categories, and c is the number of categories. Interpretation of weighted κ is like that of unweighted κ (16).
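The definitions above can be illustrated with a minimal sketch that computes unweighted and linearly weighted κ from a cross-table of two observers' scores. It is not the software used in the study; the function names and the example counts are hypothetical and serve only to show how the weights w(ij) = 1 − |i − j|/(c − 1) enter the calculation.

```python
import numpy as np

def linear_weights(c):
    """Agreement weights w_ij = 1 - |i - j| / (c - 1) for c ordered categories."""
    idx = np.arange(c)
    return 1.0 - np.abs(idx[:, None] - idx[None, :]) / (c - 1)

def kappa(table, weights=None):
    """Cohen's kappa from a c x c cross-table of two observers' scores.

    With weights=None the identity matrix is used, which gives unweighted kappa.
    """
    table = np.asarray(table, dtype=float)
    p = table / table.sum()                            # joint proportions
    if weights is None:
        weights = np.eye(p.shape[0])
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))  # chance agreement under independence
    p_o = (weights * p).sum()                          # (weighted) observed agreement
    p_e = (weights * expected).sum()                   # (weighted) expected agreement
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical counts for a 3-category ordinal score (not study data).
example = [[20, 5, 0],
           [3, 10, 2],
           [1, 4, 5]]
print(round(kappa(example), 3), round(kappa(example, linear_weights(3)), 3))
```

With linear weights, a disagreement of one category counts as partial agreement, so near misses on an ordinal scale are penalized less than disagreements between the extreme categories.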
The interobserver agreement on the time to excretion, which was measured on an interval scale, was expressed as the intraclass correlation coefficient (ICC). The ICC takes into account systematic differences between observers and ranges from −1 (perfect disagreement) to 1 (perfect agreement), with 0 indicating only random concordance (23,24). Although there are no universal standards, ICC values of <0.40 are considered low and values of >0.75 are considered high (25).
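As an illustration of how an ICC can penalize systematic differences between observers, the following sketch computes a single-measure, two-way, absolute-agreement ICC, often written ICC(2,1). The choice of this particular form is an assumption; the text does not state which ICC model was requested from the statistical software. The function name and the example values are hypothetical.

```python
import numpy as np

def icc_absolute_agreement(x):
    """Single-measure, two-way, absolute-agreement ICC for an n x k array
    (rows = kidneys, columns = observers), in the ICC(2,1) form."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    # Mean squares from the two-way layout without replication.
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((col_means - grand) ** 2).sum() / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical times to excretion (min): observer 2 is always 1 min later.
shifted = [[1, 2], [2, 3], [3, 4], [4, 5]]
print(round(icc_absolute_agreement(shifted), 3))  # 0.769
```

In this example the two observers are perfectly correlated, yet the constant offset of 1 min lowers the coefficient to 10/13, which is the behavior described above for a measure that takes systematic differences into account.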
Interobserver agreement on the renographic parameters was calculated by kidney; agreement on the judgment on the presence of stenosis was calculated by kidney as well as by patient. Interobserver agreement was assessed for each pair of observers. A pooled estimate was also calculated on the basis of the mean observed agreement and the mean amount of agreement expected under the null model of independence. A 95% confidence interval (CI) was calculated for each estimate. Estimates of the ICC were calculated with SPSS software (release 9.0.0; SPSS, Chicago, IL) and estimates of κ were calculated with Agree Statistical Software (version 7.001; ProGAMMA, Groningen, The Netherlands).
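One possible reading of the pooled estimate is sketched below: the observed and chance-expected agreement are averaged over the observer pairs, and κ is then formed from these two means. This is an assumption about the procedure, not a description of the internals of the software that was used; the function names and the example scores are hypothetical.

```python
from itertools import combinations
import numpy as np

def agreement_terms(a, b, categories):
    """Observed and chance-expected agreement for two observers' nominal scores."""
    a, b = np.asarray(a), np.asarray(b)
    p_o = np.mean(a == b)
    # Expected agreement under independence of the two observers' marginals.
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in categories)
    return p_o, p_e

def pairwise_and_pooled_kappa(scores, categories):
    """scores maps an observer label to that observer's scores for the same kidneys."""
    pairwise, terms = {}, []
    for r1, r2 in combinations(sorted(scores), 2):
        p_o, p_e = agreement_terms(scores[r1], scores[r2], categories)
        pairwise[(r1, r2)] = (p_o - p_e) / (1 - p_e)
        terms.append((p_o, p_e))
    mean_po, mean_pe = np.mean(terms, axis=0)
    return pairwise, (mean_po - mean_pe) / (1 - mean_pe)

# Hypothetical present (1) / absent (0) scores for 8 kidneys; not study data.
scores = {"A": [1, 1, 0, 0, 1, 0, 0, 0],
          "B": [1, 0, 0, 0, 1, 0, 0, 1],
          "C": [1, 1, 0, 0, 0, 0, 0, 0]}
pairwise, pooled = pairwise_and_pooled_kappa(scores, categories=[0, 1])
print(pairwise, pooled)
```

Under this reading the pooled value always lies between the smallest and the largest pairwise κ, because it is a ratio of summed numerators to summed denominators.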
Finally, the probability that a physician judged stenosis to be absent given the fact that another did so, corrected for chance agreement, was calculated using the average conditional probability of the absence of stenosis and the average expected probability of the absence of stenosis. Similar probabilities were calculated for the judgment on the presence of unilateral stenosis and for the judgment on the presence of bilateral stenosis. These probabilities can be interpreted as a κ-per-outcome category.
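A κ-per-outcome category of this kind can be sketched as follows, assuming that the conditional probability is averaged over all ordered pairs of observers and that the expected probability is the second observer's marginal probability of the same category. Both assumptions, as well as the function name and the labels used in the example, are illustrative and are not taken from the study protocol.

```python
from itertools import permutations
import numpy as np

def kappa_for_category(scores, category):
    """Chance-corrected probability that a second observer assigns `category`
    given that a first observer did, averaged over all ordered observer pairs."""
    cond, exp = [], []
    for first, second in permutations(scores, 2):
        f = np.asarray(scores[first]) == category
        s = np.asarray(scores[second]) == category
        if f.sum() == 0:
            continue  # conditional probability is undefined for this pair
        cond.append((f & s).sum() / f.sum())  # P(second = category | first = category)
        exp.append(s.mean())                  # P(second = category) expected by chance
    if not cond:
        raise ValueError("no observer assigned this category")
    p_cond, p_exp = np.mean(cond), np.mean(exp)
    return (p_cond - p_exp) / (1 - p_exp)

# Hypothetical by-patient judgments for 6 patients; not study data.
judgments = {"A": ["none", "none", "uni", "bi", "none", "uni"],
             "B": ["none", "uni", "uni", "uni", "none", "none"],
             "C": ["none", "none", "uni", "bi", "none", "uni"]}
for cat in ("none", "uni", "bi"):
    print(cat, round(kappa_for_category(judgments, cat), 2))
```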
RESULTS
Patients
All patients whose renograms were evaluated had drug-resistant hypertension. Their diastolic blood pressure was 105 ± 9 mm Hg (mean ± SD) despite the use of 2 ± 1 antihypertensive drugs. At study entry, the renal function was normal or mildly impaired: The patients had a serum creatinine concentration of 95 ± 27 μmol/L and their creatinine clearance was 85 ± 33 mL/min (Table 1). In 5 patients the serum creatinine concentration had increased to >150 μmol/L during follow-up.
Scintigraphic Images
The 3 nuclear medicine physicians did not note any uptake on the scintigraphic images of 1%–3% of the kidneys (Table 2). Physician C reported the absence of uptake twice as often as physicians A and B. The pooled κ value for visible uptake was 0.65 (95% CI, 0.51–0.80). A small kidney size was scored more frequently by physician A than by the other physicians (25% vs. 18% and 17%). The pooled κ was 0.70 (95% CI, 0.66–0.74). Because 1-min images were not obtained routinely in every hospital, the beginning of the excretory phase was assessed for approximately half of the renograms. The beginning of the excretory phase was estimated to start, on average, after 4.29–4.43 min. The pooled ICC was 0.90 (95% CI, 0.89–0.91). Cortical retention was reported in 2–3 times as many kidneys by physician A as by the other physicians. The pooled κ was 0.46 (95% CI, 0.42–0.51). Pelvic retention was reported least by physician B (12% vs. 21% and 18%). The pooled κ for pelvic retention was 0.52 (95% CI, 0.47–0.56).
Time–Activity Curves
Systematic differences occurred between observers in assigning a pattern to the time–activity curves (Table 2; Fig. 1). Physician C reported more abnormal time–activity curves than physicians A and B. Furthermore, physician A reported more abnormal curves than physician B. The pooled value for the weighted κ was 0.65 (95% CI, 0.62–0.68).
Judgment on Presence of Stenosis by Kidney
The pooled value of the weighted κ for the judgment on the presence of stenosis for separate kidneys, as measured on a 5-point scale, was 0.16 (95% CI, 0.13–0.18) (Table 2). Physician B was more outspoken in assigning scores than physicians A and C: Physician B was certain of the presence of stenosis in 4% of the kidneys compared with 2% and <1% (physicians A and C, respectively) and was certain of the absence of stenosis in 59% of the kidneys compared with 6% and 18% (physicians A and C, respectively) (Fig. 2). When the judgment on the presence of stenosis was dichotomized into certainly or probably stenosis or indeterminate versus certainly or probably no stenosis, an indication for stenosis was found in 14%–22% of the kidneys. The pooled κ for the dichotomized judgment was better than that on the 5-point scale: 0.66 (95% CI, 0.62–0.70) versus 0.16 (95% CI, 0.13–0.18).
κ was calculated separately for those kidneys on which all 3 physicians agreed that pelvic retention had or had not occurred. For kidneys showing pelvic retention (n = 90), κ for the dichotomized judgment on the presence of stenosis was significantly lower than that for kidneys without pelvic retention (n = 909): κ ranged between −0.07 and 0.12 for kidneys with pelvic retention (pooled estimate, 0.06; 95% CI, −0.04 to 0.15) and between 0.69 and 0.77 for kidneys without pelvic retention (pooled estimate, 0.73; 95% CI, 0.68–0.78).
Judgment on Presence of Stenosis by Patient
The 3 physicians found an indication for stenosis (certainly or probably stenosis or indeterminate) on 20%–28% of the renograms. The pooled κ was 0.70 (95% CI, 0.64–0.76). Furthermore, we studied the agreement on whether there was no indication for stenosis or was an indication for unilateral stenosis or an indication for bilateral stenosis (Fig. 3). An indication of bilateral stenosis was judged variously: Physician B suspected bilateral stenosis in 4% of the patients, whereas physicians A and C suspected bilateral stenosis to be present more frequently (in 12% and 11%, respectively). When 1 of the 3 physicians judged that stenosis was absent, the probability that a second physician concluded the same was, on average, 0.70 (95% CI, 0.61–0.79). When 1 of the 3 physicians judged that unilateral stenosis was present, the probability that a second physician reached the same conclusion was, on average, 0.65 (95% CI, 0.61–0.70). For the presence of bilateral stenosis, this probability was, on average, 0.48 (95% CI, 0.43–0.52).
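The collapsed judgment and the by-patient categories can be expressed compactly as in the sketch below. It assumes that the by-patient category is derived by counting the kidneys with an indication for stenosis; this derivation rule and the function names are hypothetical and are given only for illustration.

```python
# Scores on the 5-point scale that count as an indication for stenosis:
# 1 = certainly stenosis, 2 = probably stenosis, 3 = indeterminate.
INDICATION_SCORES = {1, 2, 3}

def has_indication(score):
    """Collapse the 5-point judgment into indication (True) versus no indication (False)."""
    if score not in {1, 2, 3, 4, 5}:
        raise ValueError(f"score must be 1-5, got {score!r}")
    return score in INDICATION_SCORES

def patient_category(left_score, right_score):
    """Combine the judgments for the two kidneys into a by-patient category."""
    n_kidneys = has_indication(left_score) + has_indication(right_score)
    return ("none", "unilateral", "bilateral")[n_kidneys]

print(patient_category(5, 4), patient_category(3, 5), patient_category(1, 2))
# none unilateral bilateral
```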
DISCUSSION
In this study, the interobserver agreement on captopril renography was studied on 658 renograms of patients with drug-resistant hypertension and a normal or mildly impaired renal function. Three experienced nuclear medicine physicians assessed renographic parameters that have been recommended for evaluation (1,2,7) and judged whether hemodynamically significant renal artery stenosis was present or absent. For most of these parameters and for the judgment on presence of stenosis, the interobserver agreement was satisfactory. The agreement on cortical retention was relatively low, however.
Except for the time to excretion, the interobserver agreement was assessed by the κ statistic. Although κ is the most commonly used measure of interobserver agreement in categoric data, its interpretation is complicated by some of its properties (17,18,26). First, the value of κ strongly depends on the underlying prevalence of the parameter under study. For instance, a high value of κ for agreement on the absence of visible uptake is harder to achieve than for agreement on small kidney size because the latter is much more common. Second, although κ does not identify systematic differences between observers (bias), κ will be lower if such bias is present. This is also the case for the ICC, which was used to assess interobserver agreement in continuous data. Therefore, it should be noted that systematic differences between the observers in the assessment of several parameters were found, for instance, for the judgment on the presence of stenosis (Fig. 2). Third, the way one values discrepancies between categories and consequently chooses the weights for the calculation of the weighted κ is arbitrary. For instance, by choosing linear weights in the calculation of κ for the time–activity curves, we assumed that disagreement between normal curves and curves with minor abnormalities (curve types 0 and 1) is as serious as disagreement between renal failure patterns with and without measurable kidney uptake (curve types 4 and 5).
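The dependence of κ on prevalence can be made concrete with a small numeric sketch. The two cross-tables below are hypothetical and are not study data; both have an observed agreement of 90%, but the finding is common in the first and rare in the second.

```python
def kappa_2x2(both_yes, a_only, b_only, both_no):
    """Cohen's kappa for two observers scoring a finding as present or absent."""
    n = both_yes + a_only + b_only + both_no
    p_o = (both_yes + both_no) / n          # observed agreement
    a_yes = (both_yes + a_only) / n         # observer A: proportion scored present
    b_yes = (both_yes + b_only) / n         # observer B: proportion scored present
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)

common = kappa_2x2(45, 5, 5, 45)   # finding present in about half of the kidneys
rare = kappa_2x2(2, 5, 5, 88)      # finding present in about 7% of the kidneys
print(f"{common:.2f} {rare:.2f}")  # 0.80 0.23
```

With identical observed agreement, κ falls from 0.80 to about 0.23 when the finding is rare, because most of the agreement on a rare finding is already expected by chance.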
The pattern of the time–activity curves, which is considered to be an important diagnostic parameter (2,14), was scored in 6 ordered categories (10). The weighted κ value for the pattern of the time–activity curves was moderately high, especially when one considers that the distinction between some of these types of curves is difficult to make. The interobserver agreement on visible uptake and on kidney size was also satisfactory but could have been affected negatively by the low prevalence of these features (18,26). The interobserver agreement on time to excretion as assessed from the scintigraphic images was high. Yet, the relative (individual kidney) uptake and the Tmax are the most reliable parameters in terms of interobserver agreement because the computer calculates them.
With 99mTc-mercaptoacetyltriglycine, which is almost completely cleared by tubular secretion, renovascular hypertension can usually be detected by cortical retention after ACE inhibition (7). Delayed excretion can also be caused by pelvic stasis, however. In kidneys without a dilated renal pelvis, pelvic retention can be observed because of low diuresis. The patients in this study drank 0.5 L of tap water 30–60 min before the renography. Perhaps a more abundant diuresis could be achieved by giving 10 mL of water per kilogram of body weight. Another possible cause of low diuresis is that some of the patients were on diuretics. These patients may produce less urine during the renography (9). The identification of cortical retention is difficult in the presence of pelvic retention (1,14). The complicating role of pelvic retention in the evaluation of captopril renography was evident in our study. For both cortical and pelvic retention, the interobserver agreement on the presence or absence of these phenomena was not satisfactory. This can probably be improved by assessing the time–activity curves of the renal cortex.
Which renographic parameters, then, should be used as diagnostic criteria in the evaluation of renal vascular disease? The diagnostic performance and the interobserver variability should be included in this consideration. When ranked according to the sum of sensitivity and specificity in a by-patient analysis, the order of the renographic parameters was virtually the same for the 3 nuclear medicine physicians (data not shown). However, one must bear in mind that this way of ranking values sensitivity and specificity equally. The parameter with the best diagnostic performance was asymmetry in renal uptake. The fact that the individual kidney uptake is measured objectively adds to its usefulness as a diagnostic criterion. Time to excretion as assessed from the scintigraphic images, an abnormal pattern of the time–activity curves, and cortical retention ranked somewhat lower in terms of diagnostic performance. On the basis of the interobserver variability of these parameters, the first 2 are also important diagnostic criteria but the last should be given less weight. The lowest diagnostic performance was found for the visual assessment of the kidneys on the scintigraphic images (i.e., no visible uptake or asymmetry in kidney size) and Tmax. Diagnostic information is lost if one focuses on just 1 or 2 parameters when evaluating the test results. To maximize the diagnostic value of the test, all parameters might be brought together in multivariate models, one predicting the outcome of angiography and one predicting the response to treatment as primary outcome measures for the value of renography. These models may then be used to support decision making by nuclear medicine physicians. We intend to report on the usefulness of such models in the future.
The 3 evaluating physicians judged the presence of stenosis on a 5-point scale, which was collapsed into suspect or indeterminate versus not suspect to reflect which patients would normally be referred for further diagnostic workup. The interobserver agreement on the presence of stenosis was moderate. When 1 physician judged stenosis to be absent, the probability that a second physician concluded the opposite was 30%. It would seem that the interobserver agreement found in this study represents the maximum achievable because the evaluating physicians in this study were well trained and experienced and had agreed on their way of scoring beforehand. On the other hand, the renograms were not always obtained according to the protocol (1-min images were not always acquired) and were not acquired by the evaluating physicians themselves. Also, to reflect common clinical practice, diagnostic criteria for identifying stenosis were not specified before evaluation. Thus, the interobserver agreement found in this study could possibly be improved by performing the procedure and evaluation in a uniform manner.
CONCLUSION
The interobserver agreement on most renographic parameters was satisfactory. Important parameters for establishing the diagnosis of stenosis with high interobserver agreement were the relative (individual kidney) uptake, the pattern of the time–activity curves, and the time to excretion. The assessment of cortical retention by visual inspection of the images was more difficult—in particular, in the presence of pelvic retention—and should be given less weight in the evaluation. Captopril renography should be interpreted with caution if pelvic retention is present. Besides differences in patient selection, study design, and diagnostic criteria, interobserver variability offers an explanation for differences in diagnostic performance of captopril renography between studies.
Acknowledgments
This study was supported by grant OG92–031 from the Dutch Health Insurance Executive Board (Ziekenfondsraad).
Footnotes
Received Aug. 7, 2001; revision accepted Dec. 3, 2001.
For correspondence contact: Pieta Krijnen, MSc, Center for Clinical Decision Sciences, Department of Public Health, Erasmus University Rotterdam, P.O. Box 1738, Rotterdam, 3000 DR, The Netherlands.
E-mail: krijnen@mgz.fgg.eur.nl