CLINICAL INVESTIGATIONS
Emory University School of Medicine, Atlanta; and Georgia Institute of Technology, Atlanta, Georgia
| ABSTRACT |
|---|
Key Words: expert systems; artificial intelligence; myocardial perfusion SPECT; quantitative analysis
| INTRODUCTION |
|---|
Over the past several years, artificial intelligence methods have been investigated as a way to develop such a tool. Examples include neural networks (2–4) and case-based reasoning techniques (5) to provide computer-assisted diagnosis of myocardial perfusion planar and SPECT studies. Most of these approaches have used the polar map output of well-established database myocardial perfusion quantification programs (6–8) as input to the decision-making process. In the artificial neural network approach, the concept is to try to emulate how human neurons perform pattern recognition tasks. Repeated recognition trials are run using sample perfusion data as input and using corresponding coronary angiography results as output to modify the strength between the input and output nodes. In this manner, the network is trained and the input data eventually predict the output. In the case-based reasoning approach, the algorithm searches a library of patient cases to find the ones that best match those of the patient study being analyzed. The common findings from these cases, such as coronary angiography results, are then used to assist the diagnostician's interpretation.
Another artificial intelligence approach that has been investigated for this purpose is the knowledge-based expert system. In expert systems, a knowledge base of heuristic rules is obtained from human experts, capturing how they make their interpretations. These rules are usually expressed in the form of "if/then" expressions. Expert systems have been investigated in nuclear medicine to assist in the interpretation of perfusion-ventilation lung studies (9) and hexamethylpropyleneamine oxime brain SPECT studies (10). Expert systems have also been used in cardiology for assessment of acute myocardial infarction from electrocardiography (ECG) analysis (11), for echocardiography analysis (12), and for the management of ventricular tachycardia (13).
Since 1985, we have been developing an expert system called PERFEX (an abbreviation of "perfusion expert"; Syntermed, Atlanta, GA) as a tool for the computer-assisted diagnosis of stress/rest myocardial perfusion SPECT studies (14,15). The purpose of this study was to investigate how computer-assisted interpretations suggested by PERFEX compare with those of human experts. These investigations were designed to show the overall performance of PERFEX before its dissemination and distribution. We have chosen 2 sets of gold standards: the interpretation by human experts and the results of coronary angiography studies. Although the goal of expert systems is to match the interpretation of human experts, we used the results of coronary angiography to resolve differences between the expert system and the human experts.
| Materials and Methods |
|---|
Diagnosis of CAD was based on the routine clinical interpretation of myocardial perfusion SPECT imaging. An experienced nuclear medicine physician, using both the visual assessment of the tomograms and the results of database programs for quantifying myocardial perfusion defects, assessed hypoperfusion. Disease was assigned to 1 or more vascular territory combinations: left anterior descending artery (LAD), left circumflex artery (LCX), right coronary artery (RCA), LAD or LCX, LAD or RCA, or LCX or RCA. Once a region was determined to be hypoperfused, it was assigned to the territory in which the majority of the region fell. If a defect or reversibility region fell between 2 territories, it was assigned to the "or" of the 2 territories, as was done in previous studies (17).
Independent diagnosis of CAD was based on coronary angiography during cardiac catheterization. Diagnosis of CAD was based on 1 or more of the major coronary vessels having at least 1 stenosis with ≥50% luminal narrowing or diffuse disease. Luminal narrowing and diffuse disease were qualitatively assessed by an experienced attending cardiologist. The application of these criteria to the 655-patient population resulted in 175 patients without CAD and 480 patients with CAD. The breakdown of disease by vascular territory in the 480 CAD patients was as follows: 346 LAD, 256 LCX, and 281 RCA. These included 194 patients with single-vessel disease, 169 with double-vessel disease, and 117 with triple-vessel disease.
Data Analysis and Expert System Interpretation
All SPECT patient studies were reconstructed and reoriented into oblique-axis tomograms using conventional techniques (16). The studies were then submitted to a well-established method of database quantification (18). This method identified hypoperfused regions as those with normalized count distributions falling more than a predetermined number of SDs below the mean sex-matched normal response for the specific myocardial perfusion SPECT protocol used. The program also identified ischemic regions as stress perfusion defects that improve at rest. Improvement was determined quantitatively as a stress-to-rest change exceeding a predetermined number of SDs above the mean normalized difference between the stress and rest distributions.
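As a rough illustration of the thresholding described above, the following Python sketch flags samples of a normalized stress distribution that fall more than a chosen number of SDs below a normal-database mean. The function name, the sample values, and the 2.5-SD cutoff are assumptions made for illustration only; the text states only that the number of SDs was predetermined.

```python
def flag_hypoperfused(counts, normal_mean, normal_sd, sd_cutoff=2.5):
    """Return one boolean per sample: True if the normalized count lies
    more than sd_cutoff SDs below the normal mean (cutoff is hypothetical)."""
    flags = []
    for count, mean, sd in zip(counts, normal_mean, normal_sd):
        z = (count - mean) / sd          # SDs above (+) or below (-) the normal mean
        flags.append(z < -sd_cutoff)
    return flags

# Illustrative values only, not data from the study.
counts = [82.0, 55.0, 70.0]
normal_mean = [85.0, 80.0, 75.0]
normal_sd = [5.0, 6.0, 5.0]
flags = flag_hypoperfused(counts, normal_mean, normal_sd)
```

Only the second sample, which lies roughly 4 SDs below its normal mean, is flagged. Reversibility could be tested the same way on the stress-to-rest difference, with the inequality reversed.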
An automatic feature-extraction program then described the location and severity of each defect and corresponding reversibility (19). The location was expressed in the form of 32 possible descriptors (20). These descriptors were defined as coordinates of both depth (basal, medial, distal apical, and proximal apical) and angular location (8 subsets of the septal, inferior, lateral, and anterior myocardial walls). The severity was expressed in terms of certainty factors ranging from -1 to +1 for the pixel in that descriptor with the most severe finding (-1 means there is definitely no disease, +1 means there is definitely disease, and the range from -0.2 to +0.2 means the presence of disease is equivocal or indeterminate). Certainty factors are heuristically defined numeric estimates of evidence for or against a particular hypothesis. The certainty factor model is a well-known approach to uncertainty reasoning used in artificial intelligence (21). Initially, the certainty factor of each abnormal descriptor was allowed to vary between 0.2 and 0.99 in linear proportion to the number of SDs below the mean normal response. In this representation, a certainty factor of +0.2 corresponds to the threshold (in number of SDs below the mean) for just detecting disease and +0.99 corresponds to ≥8 SDs below the mean (very sure that findings are abnormal). Descriptors with all pixels above the normal response were set to a certainty factor of -1.
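The linear mapping from defect severity to an input certainty factor can be sketched as follows. This is a minimal illustration, not the feature-extraction program itself: the function name and the 2.5-SD detection threshold are hypothetical, and capping the value at 8 SDs is an assumption about how the upper end of the range behaves.

```python
CF_MIN, CF_MAX = 0.2, 0.99  # range quoted in the text for abnormal descriptors

def abnormal_descriptor_cf(sd_below_mean, threshold_sd=2.5, max_sd=8.0):
    """Map SDs below the normal mean linearly onto [CF_MIN, CF_MAX].

    threshold_sd is a hypothetical detection threshold; max_sd follows the
    8-SD figure in the text. Severities beyond max_sd are capped (assumption).
    """
    if sd_below_mean < threshold_sd:
        raise ValueError("descriptor does not reach the abnormality threshold")
    capped = min(sd_below_mean, max_sd)
    fraction = (capped - threshold_sd) / (max_sd - threshold_sd)
    return CF_MIN + fraction * (CF_MAX - CF_MIN)

cf_at_threshold = abnormal_descriptor_cf(2.5)   # just detectable
cf_midway = abnormal_descriptor_cf(5.25)        # halfway between threshold and 8 SDs
cf_severe = abnormal_descriptor_cf(10.0)        # capped at the 8-SD value
```

With these assumed parameters the three calls give certainty factors of about 0.2, 0.595, and 0.99, respectively.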
The description of how PERFEX works has been provided in detail elsewhere (14,15,20). The architecture of PERFEX was inspired by that of MYCIN (21), a pioneering rule-based expert system developed in the 1970s to assist physicians with the decisions involved in the selection of appropriate therapy for patients with infections. To create the PERFEX knowledge base, a study was conducted of 461 myocardial perfusion SPECT studies from patients (different from the patients in this study) with angiographically documented CAD. This effort resulted in 253 heuristic (if/then) rules created by experts. These heuristic rules best correlated the presence and location of perfusion defects on 201Tl SPECT studies with coronary lesions. These rules were then inserted as the knowledge base using a commercial expert system shell (Expert Elements; Blaze Software, San Jose, CA). These rules operate on the descriptor files that are the output of the feature extraction program as described above. Using these features, the expert system automatically determines the location, size, and shape of each defect and corresponding reversibility. This information is used to activate the heuristic rules to produce new findings or draw inferences regarding CAD. For each input parameter and for each rule, a certainty factor is assigned and is used to determine the certainty of the identification and location of a coronary lesion. A specific vascular territory with an output certainty factor for disease of 0.2 or greater was deemed to be abnormal. A separate variable for the assessment of overall CAD was also deemed abnormal if its output certainty factor for disease was 0.2 or greater.
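To make the certainty factor bookkeeping concrete, the sketch below uses the combining rules of the classic MYCIN certainty factor model. Whether PERFEX applies exactly these formulas is an assumption; the text says only that its architecture was inspired by MYCIN. The rule strengths and premise values are hypothetical and are not taken from the PERFEX knowledge base.

```python
def fire_rule(premise_cfs, rule_cf):
    """CF contributed by one if/then rule: the rule's own strength scaled by
    its weakest premise (conjunction). A non-positive premise contributes nothing."""
    premise = min(premise_cfs)
    return rule_cf * premise if premise > 0 else 0.0

def combine_cf(x, y):
    """MYCIN-style combination of two CFs bearing on the same hypothesis."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return x + y * (1 + x)
    denom = 1 - min(abs(x), abs(y))
    if denom == 0:
        raise ValueError("cannot combine certainties of +1 and -1")
    return (x + y) / denom

ABNORMAL_CUTOFF = 0.2  # output threshold stated in the text

# Two hypothetical rules supporting disease in the same vascular territory.
evidence_1 = fire_rule([0.6, 0.8], rule_cf=0.7)   # 0.7 * 0.6
evidence_2 = fire_rule([0.5], rule_cf=0.4)        # 0.4 * 0.5
territory_cf = combine_cf(evidence_1, evidence_2)
is_abnormal = territory_cf >= ABNORMAL_CUTOFF
```

Here two moderately supportive rules reinforce each other: the combined certainty factor (about 0.54) exceeds either individual contribution and clears the 0.2 cutoff.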
Statistical Analysis
Separate databases were generated containing the following results: the interpretation of the SPECT study by the nuclear medicine physicians; the interpretation of the coronary angiography study; and the output of the PERFEX program. A program was then written to automatically compare the various results and to calculate the sensitivity, specificity, and accuracy of PERFEX for the detection and localization of CAD. This calculation was based on using either the reading of the SPECT study by the human experts or the results of the coronary angiography study as the gold standard.
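The comparison step can be illustrated with a short Python sketch that computes sensitivity, specificity, and accuracy from paired abnormal/normal calls. The function name and the 8-patient example are hypothetical; the original comparison program is not described in further detail.

```python
def diagnostic_stats(system_calls, gold_standard):
    """Sensitivity, specificity, and accuracy of boolean calls against a
    boolean gold standard (True = disease). Assumes the gold standard
    contains at least one positive and one negative case."""
    pairs = list(zip(system_calls, gold_standard))
    tp = sum(s and g for s, g in pairs)
    tn = sum((not s) and (not g) for s, g in pairs)
    fp = sum(s and (not g) for s, g in pairs)
    fn = sum((not s) and g for s, g in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

# Illustrative calls for 8 patients, not data from the study.
system_calls = [True, True, False, True, False, False, True, False]
gold_standard = [True, True, True, False, False, False, True, False]
stats = diagnostic_stats(system_calls, gold_standard)
```

The same function serves both analyses: pass the expert reading as `gold_standard` for one, and the angiography result for the other.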
Two sets of receiver operating characteristic (ROC) curves were generated. The first compared the results of PERFEX with interpretation by the human nuclear medicine experts as the gold standard. The second compared the results of PERFEX with coronary angiography as the gold standard. The 2 sets of ROCs were generated for 4 categories: detection of CAD and localization to the LAD, LCX, and RCA vascular territories. To generate the ROC, the initially set certainty factor (CF) for each input descriptor was allowed to vary by subtracting values ranging from 0.0 to 0.30 in intervals of 0.05. These are called CF shift levels.
The first step in the analysis was to search for CF shift levels that would provide optimal accuracy for localizing disease to each of the 3 vascular territories using the interpretations by the human experts as the gold standard. This was done by visual inspection of a plot of accuracy versus CF shift level for each of the 3 vascular territories.
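The sweep over CF shift levels can be sketched as below. Because the PERFEX rule base is not reproduced here, `toy_system` is a deliberately simple, hypothetical stand-in that calls a study abnormal when any shifted input CF reaches 0.2. The example cases are invented, and the automatic choice of the most accurate level replaces the visual inspection described in the text.

```python
SHIFT_LEVELS = [round(0.05 * i, 2) for i in range(7)]  # 0.0, 0.05, ..., 0.30

def toy_system(descriptor_cfs, shift, cutoff=0.2):
    """Hypothetical stand-in for the expert system, NOT the PERFEX rule base:
    abnormal if any input CF, after subtracting the shift, reaches the cutoff."""
    return any(cf - shift >= cutoff for cf in descriptor_cfs)

def sweep_shift_levels(cases, reference):
    """Return (shift, sensitivity, false_positive_rate, accuracy) per level.
    Assumes the reference contains both abnormal and normal cases."""
    n_pos = sum(reference)
    n_neg = len(reference) - n_pos
    rows = []
    for shift in SHIFT_LEVELS:
        calls = [toy_system(cfs, shift) for cfs in cases]
        tp = sum(c and r for c, r in zip(calls, reference))
        fp = sum(c and not r for c, r in zip(calls, reference))
        correct = sum(c == r for c, r in zip(calls, reference))
        rows.append((shift, tp / n_pos, fp / n_neg, correct / len(cases)))
    return rows

# Invented descriptor CFs per study and invented reference readings.
cases = [[0.90, 0.40], [0.42], [0.28], [0.33, 0.21], [0.61], [-1.0]]
reference = [True, True, False, False, True, False]
rows = sweep_shift_levels(cases, reference)
best_shift = max(rows, key=lambda row: row[3])[0]  # first level with top accuracy
```

Plotting sensitivity against the false-positive rate across `rows` gives one ROC curve, and plotting accuracy against shift gives the curve that was inspected to choose an operating point.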
The input CF shift level (or levels, if they were different for different vascular territories) that generated the best agreements between the interpretation of the SPECT study by the human experts and PERFEX was then used to determine the sensitivity and specificity of PERFEX using coronary angiography as the gold standard. These sensitivity and specificity results were compared with the sensitivity and specificity results obtained from the interpretation by human experts, also using coronary angiography as the gold standard. The χ² test was used to evaluate statistical differences in sensitivity and specificity between diagnostic approaches. The level P < 0.05 was used to determine significance.
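For readers who want to see the arithmetic, the following sketch applies a chi-square test to a 2 × 2 table of correct and incorrect calls for two diagnostic approaches. It assumes the Pearson form without continuity correction, since the text does not say which variant was used, and the counts are hypothetical rather than taken from the study.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with the P value for 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: approach A detects 80 of 100 diseased patients,
# approach B detects 90 of 100.
chi2, p_value = chi_square_2x2(80, 20, 90, 10)
significant = p_value < 0.05
```

With these invented counts the statistic is about 3.92 and the difference just reaches significance at the P < 0.05 level.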
| Results |
|---|
Using coronary angiography as the gold standard, analysis of the PERFEX results at the 0.15 CF shift level generated the following statistics (Fig. 3). The sensitivity ranged from a high of 80% for detection of CAD to a low of 65% for the RCA region. The sensitivities for localization in the LAD vascular territory and the LCX region were 69% and 68%, respectively. The specificity ranged from a high of 65% for the RCA vascular territory to a low of 42% for detecting the absence of CAD. The specificities for the LAD and LCX regions were 54% and 56%, respectively.
Analysis of the nuclear medicine physicians' interpretation of the SPECT perfusion studies using coronary angiography as the gold standard generated the following results (Figs. 3 and 4). The sensitivities for detection of CAD and for localization in the LAD, LCX, and RCA vascular territories were 87%, 69%, 61%, and 73%, respectively. The specificities for detecting the absence of CAD overall and in the LAD, LCX, and RCA vascular territories were 21%, 59%, 88%, and 71%, respectively.
Comparison of the sensitivity and specificity of PERFEX (at the 0.15 CF shift level) versus those obtained from the expert reading by nuclear medicine physicians, both using coronary angiography as the gold standard, yielded the following results (Fig. 4). Statistically significant differences were obtained in the detection of the presence and absence of CAD and in the specificity of the LCX region. Of these 3 categories, PERFEX obtained better results in 1 category (specificity for detecting the absence of CAD) and the human expert obtained better results in the other 2. There were no statistically significant differences in the RCA region or the LAD region or in the sensitivity of localizing LCX disease. These comparisons were also performed at the 0.10 and 0.20 CF shift levels (Fig. 4).
| Discussion |
|---|
The results of this study show that PERFEX is almost as accurate as nuclear medicine expert readers in detecting and localizing CAD when coronary angiography is used as the gold standard. These results are remarkable considering that PERFEX reaches its conclusions in <3 s per patient. Furthermore, this version of the expert system does not use most of the relevant clinical and quality control information available to the diagnosticians, including information on body habitus and the level of tissue attenuation by breast muscles or diaphragm. This lack of information may partly account for the differences between the recommendations of the expert system and those of the human experts when the experts' interpretation is used as the gold standard.
Another reason for the apparent superiority of experts over PERFEX for localizing disease to the LCX vascular territory is that the experts tended to assign disease to the posterolateral territory (LCX or RCA) much more frequently than did PERFEX. Experts used this category on 197 patients and PERFEX on only 3 (at the 0.15 CF shift level). PERFEX almost always assigned disease to either the LCX territory or the RCA territory. In our analysis, assignment to the posterolateral territory (LCX or RCA), rather than to the specific LCX territory or RCA territory, generated a better result because correct agreements with coronary angiography were counted whether the patient had LCX or RCA disease. When this analysis was repeated, giving PERFEX the same advantage as the experts, the results yielded no statistical difference between experts and PERFEX for the sensitivity (71% vs. 72%), specificity (57% vs. 55%), or accuracy (65% vs. 65%) in detecting posterolateral disease.
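The scoring advantage of the combined posterolateral category can be shown with a small Python sketch. The function name and the example patient are hypothetical; the only thing taken from the text is the rule that an "or" call is counted as agreeing with angiography when either of its vessels is diseased.

```python
def call_agrees(called_territories, diseased_vessels):
    """An 'or' call (a set of vessel names) agrees with angiography
    if at least one of the called vessels is actually diseased."""
    return not called_territories.isdisjoint(diseased_vessels)

# Hypothetical patient with RCA disease only.
angiography = {"RCA"}
combined_ok = call_agrees({"LCX", "RCA"}, angiography)  # posterolateral call
specific_ok = call_agrees({"LCX"}, angiography)         # specific LCX call
```

For this patient the combined call is scored as correct while the specific LCX call is not, which is why a reader who favors the combined category is scored more leniently.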
It is difficult to compare the results of PERFEX with those obtained from other artificial intelligence approaches that use artificial neural networks (2–4) or case-based reasoning (5). That is because the criteria used to determine diagnostic accuracy (sensitivity and specificity) are a function of the prevalence of disease and the referral bias of the population. These vary widely between study populations. Thus, the reported sensitivity and specificity of the expert system should be used for comparison with those of the human experts rather than as a measure of the accuracy of the diagnostic performance of the program for all patient groups. Our philosophy is that there is no optimal threshold to interpret all patient populations. In fact, in the present implementation of this program, physicians are allowed to use the expert system to interpret at different points along the ROC curve (Fig. 3), resulting in different sensitivity and specificity results depending on how aggressive or conservative an interpretation is desired.
We have preferred the expert system approach to the neural network approach because the heuristic rules that are used in expert systems to reach a conclusion may be traced and linked to each other, thereby providing a mechanism to justify or explain any conclusion reached. By contrast, neural networks do not provide justifications, although they are excellent for pattern recognition tasks. Moreover, neural network systems require a much larger training dataset than expert systems to converge on reasonable results for the same task.
There are equally compelling reasons for preferring knowledge-based approaches to case-based reasoning approaches. The main challenge is that a very large library of image cases would be required to create a sufficiently robust system, similar to the large data demands posed by neural network training. Also, accurate measures of image "similarity" would have to be developed to adapt the indexed cases to the patient case under consideration. Knowledge captured as rules, coupled with the uncertainty reasoning model used by PERFEX, seems to overcome these challenges while featuring a quick processing time.
There are 4 limitations to this study. First, all the data used for this evaluation were obtained retrospectively as part of the routine clinical evaluation of the patient and the routine coding of report forms that went into the cardiac database rather than as a result of a research protocol. One would expect the integrity of the database to be <100% and the interpretations and assessments both from the myocardial perfusion SPECT study and the coronary angiography study to be less detailed and accurate than if they had been performed under a strict prospective research protocol. Moreover, because these retrospective data were interpreted so long ago, the processing was performed manually and not with the advantage of today's automatic processing programs. Nevertheless, because the data used in this study reflect the true clinical information used to manage the patients, the results reported herein reflect the true effectiveness of the tests for this population. Second, there is also a referral bias in the routine work-up of the patient that accounts for, among other things, an apparent low specificity for detecting the absence of CAD (22). Third, although the interpretation by the nuclear medicine experts appears to be the ideal gold standard for assessing an expert system, we used coronary angiography to resolve the differences between the experts and PERFEX. The anatomic information extracted from coronary angiography studies is not expected to always coincide with the physiologic information obtained from myocardial perfusion studies. Fourth, although a main advantage of PERFEX over artificial neural networks is that it provides justifications for its conclusions, this function was not validated. We have yet to design an objective approach to validate this function.
We are continuing to investigate how to further improve the diagnostic performance of PERFEX. One approach has been to add heuristic rules that use the patient's clinical information routinely available to physicians. This includes such information as whether the patient has left bundle branch block or had previous myocardial infarction, the results of stress ECG, the technical quality of the study, and information on body habitus. Another approach, as suggested by our results, is to use different CF shift levels for the different vascular territories. We are also investigating the use of data mining techniques to automatically find associations between the myocardial perfusion quantitative results, clinical variables, and angiographic results. These associations may be used as heuristic rules to enhance the expert system (23).
| Conclusion |
|---|
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
For correspondence or reprints contact: Ernest V. Garcia, PhD, Emory University Hospital, Room E163, 1364 Clifton Rd. NE, Atlanta, GA 30322.
| References |
|---|