Abstract
The objectives of this study were to investigate 18F-FDG imaging, using a coincidence detection system, for diagnosing prosthetic joint infection and to compare it with combined 111In-labeled leukocyte/99mTc-sulfur colloid marrow imaging in patients with failed lower extremity joint replacements. Methods: Fifty-nine patients—with painful, failed, lower extremity joint prostheses, 40 hip and 19 knee—who underwent 18F-FDG, labeled leukocyte, and bone marrow imaging, and had histopathologic and microbiologic confirmation of the final diagnosis, formed the basis of this investigation. 18F-FDG images were interpreted as positive for infection using 4 different criteria: criterion 1: any periprosthetic activity, regardless of location or intensity; criterion 2: periprosthetic activity on the 18F-FDG image, without corresponding activity on the marrow image; criterion 3: only bone–prosthesis interface activity, regardless of intensity; criterion 4: semiquantitative analysis—a lesion-to-background ratio was generated, and the cutoff value yielding the highest accuracy for determining the presence of infection was determined. Labeled leukocyte/marrow images were interpreted as positive for infection when periprosthetic activity was present on the labeled leukocyte image without corresponding activity on the marrow image. Results: Twenty-five (42%) prostheses, 14 hip and 11 knee, were infected. The sensitivity, specificity, and accuracy of 18F-FDG, by criterion, were as follows: criterion 1: 100%, 9%, 47%; criterion 2: 96%, 35%, 61%; criterion 3: 52%, 44%, 47%; criterion 4: 36%, 97%, 71%. The sensitivity, specificity, and accuracy of labeled leukocyte/marrow imaging were 100%, 91%, and 95%, respectively. 
WBC/marrow imaging, which was more accurate than any of the 18F-FDG criteria for all prostheses, as well as for hips and knees separately, was significantly more sensitive than criterion 3 (P < 0.001) and criterion 4 (P < 0.001) and was significantly more specific than criterion 1 (P < 0.001), criterion 2 (P < 0.001), and criterion 3 (P < 0.001). Conclusion: Regardless of how the images are interpreted, coincidence detection–based 18F-FDG imaging is less accurate than, and cannot replace, labeled leukocyte/marrow imaging for diagnosing infection of the failed prosthetic joint.
Nuclear medicine has long played an important role in the evaluation of the patient with a painful, failed joint prosthesis. The current radionuclide gold standard for diagnosing the infected joint replacement is combined 111In-labeled leukocyte/99mTc-sulfur colloid marrow imaging (WBC/marrow), with an accuracy of >95% (1–3). Although extremely accurate, this technique has definite limitations. The in vitro labeling process is labor intensive, is not always available, and involves direct handling of blood products. The need to perform marrow imaging, which adds complexity and expense to the procedure, is an inconvenience to patients, many of whom are debilitated (4). Considerable effort has been devoted to developing acceptable alternatives to in vitro labeled leukocyte imaging. Several in vivo leukocyte-labeling methods have been investigated, including peptides and antigranulocyte antibodies/antibody fragments (5–7). One of these agents, 99mTc-fanolesomab, has recently been approved for use in the United States, but only limited data on its role in the evaluation of the failed joint prosthesis are available (7). 111In-labeled immunoglobulin has also been used in the evaluation of the painful joint replacement. This agent, unavailable in the United States, is sensitive but is not specific for prosthetic joint infection (8).
Recently, 18F-FDG PET has been used to evaluate the painful joint replacement, although its precise role is, as of yet, uncertain (9–15). The objectives of this study were to investigate, using a variety of different interpretive criteria, coincidence detection system–based 18F-FDG imaging for diagnosing the infected joint prosthesis and to compare it with WBC/marrow imaging for this purpose.
MATERIALS AND METHODS
Patient Population
Fifty-nine patients with failed lower extremity joint arthroplasties, who underwent 18F-FDG and WBC/marrow imaging, and for whom surgical and microbiologic or histopathologic confirmation of the final diagnosis was available, formed the basis of this investigation. There were 22 men and 37 women between 35 and 89 y old, with 59 painful, failed, lower extremity joint prostheses. There were 40 hip prostheses, including 37 total arthroplasties and 3 hemiarthroplasties. Thirty were primary implants, and 10 were revision implants. There were 16 hybrid (15 cementless acetabular and cemented femoral components, 1 cemented acetabular and cementless femoral component), 14 cemented, and 10 cementless prostheses. Hip prostheses ranged in age from 1 mo to 20 y. There were 19 knee prostheses, all total arthroplasties. Seventeen were primary implants and 2 were revision arthroplasties. Eighteen of the prostheses were cemented, and 1 was cementless. Knee prostheses ranged in age from 1 wk to 19 y.
Imaging Studies
18F-FDG.
Patients were injected with 150–220 MBq (4–6 mCi) of 18F-FDG after a minimum 6-h fast. Patients had at least 1 h of bed rest both before and after tracer injection to reduce lower extremity muscle or soft-tissue accumulation. Approximately 1 h after injection, emission/transmission scanning of the region of interest was performed using a hybrid PET system (Solus MCD/AC; ADAC Laboratories) with measured attenuation correction (137Cs). Data were acquired as 128 × 128 matrices for 64 projections, at 80 s per projection. A 20% window centered on the 511-keV photopeak of 18F and a 30% window centered on the Compton centerline (approximately 310 keV) were used. Data were reconstructed using an iterative method (ordered-subset expectation maximization). Only nonattenuation–corrected transaxial, coronal, sagittal, and 3-dimensional volume images were reviewed because of concern about the possibility of artifacts induced by attenuation correction (16,17).
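The paired MBq and mCi activities quoted in this section follow from the definition 1 mCi = 37 MBq. A minimal Python sketch of that conversion is given here for convenience; the function and constant names are illustrative and are not part of the original methods.

```python
MBQ_PER_MCI = 37.0  # 1 Ci = 3.7e10 Bq, hence 1 mCi = 37 MBq


def mbq_to_mci(activity_mbq: float) -> float:
    """Convert an activity from megabecquerels to millicuries."""
    return activity_mbq / MBQ_PER_MCI


# Injected 18F-FDG activity range stated in the text (150-220 MBq).
low_mci = mbq_to_mci(150)
high_mci = mbq_to_mci(220)
print(f"{low_mci:.2f}-{high_mci:.2f} mCi")
```

The same helper reproduces the other doses in the Methods: 18.5 MBq corresponds to 0.5 mCi (500 μCi) and 370 MBq to 10 mCi.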
WBC.
Patients were imaged about 24 h after injection of approximately 18.5 MBq (500 μCi) of mixed, autologous leukocytes that had been labeled with 111In-oxine according to the method of Thakur et al. (18). Imaging was performed on a large-field-of-view γ-camera (Argus, Genesys, or Solus; ADAC Laboratories) equipped with a medium-energy, parallel-hole collimator. Energy discrimination was accomplished by using 15% windows centered on the 174- and 247-keV photopeaks of 111In. Anterior, posterior, and lateral images of the region of interest were acquired for 10 min per view using 128 × 128 matrices.
Dual Isotope.
Patients were injected with approximately 370 MBq (10 mCi) freshly prepared 99mTc-sulfur colloid immediately after completion of WBC imaging. Forty-five to 60 min later, simultaneous dual-isotope acquisition was performed on the same γ-camera on which WBC imaging had been performed. A medium-energy, parallel-hole collimator was used. Energy discrimination was accomplished by using a 10% window centered on 140 keV, a 5% window centered on 174 keV, and a 15% window centered on 247 keV. Anterior, posterior, and lateral images of the region of interest were acquired for 10 min per view using 128 × 128 matrices.
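The energy windows used for simultaneous dual-isotope acquisition can be checked for overlap with a short calculation. The sketch below assumes the common convention that a percentage window denotes the full window width relative to the centerline energy (so a 10% window at 140 keV spans 133 to 147 keV); the text does not state the convention explicitly, and the helper names are illustrative.

```python
def energy_window(center_kev: float, width_percent: float) -> tuple[float, float]:
    """Return (lower, upper) bounds of a symmetric energy window.

    Assumption: the percentage is the FULL window width expressed
    relative to the centerline energy.
    """
    half_width = center_kev * width_percent / 100.0 / 2.0
    return center_kev - half_width, center_kev + half_width


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two (lower, upper) intervals share any energy range."""
    return a[0] < b[1] and b[0] < a[1]


# Windows quoted for the dual-isotope acquisition.
tc_140 = energy_window(140, 10)  # 99mTc photopeak
in_174 = energy_window(174, 5)   # 111In lower photopeak
in_247 = energy_window(247, 15)  # 111In upper photopeak

for name, win in [("140 keV", tc_140), ("174 keV", in_174), ("247 keV", in_247)]:
    print(f"{name}: {win[0]:.2f}-{win[1]:.2f} keV")
```

Under this assumption none of the three windows overlap, which is consistent with narrowing the 174-keV window from 15% to 5% when 99mTc is present.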
In all 59 patients, 18F-FDG and WBC/marrow imaging was performed within 8 d of each other; in most (55/59) of the patients, both studies were completed within 24 h.
Image Interpretation
18F-FDG.
18F-FDG images were interpreted using the following criteria:
Criterion 1. Any periprosthetic activity, regardless of location or intensity, was interpreted as positive for infection (9).
Criterion 2. 18F-FDG images were interpreted together with the marrow images. Studies in which there was periprosthetic activity on the 18F-FDG image, regardless of location or intensity, without corresponding activity on the marrow image were interpreted as positive for infection. Studies in which the distribution of the 2 tracers was spatially congruent were classified as negative for infection.
Criterion 3. For hip prostheses, only activity at the bone–prosthesis interface (BPI) of the femoral component, regardless of intensity, was interpreted as positive for infection. For knee prostheses, only activity at the BPI of either the femoral or tibial component, regardless of intensity, was interpreted as positive for infection (10,11).
Criterion 4. Semiquantitative analysis of BPI activity was also performed. For hip prostheses, a region of interest was drawn around the area of most intense activity at the BPI of the femoral component. For knee prostheses, a region of interest was drawn around the area of most intense activity at the BPI of either the femoral or the tibial component. In the absence of increased BPI activity, the target region was drawn along the lateral margin of the appropriate prosthetic component. The total counts in the target region were recorded. The identical region of interest was placed over the soft tissues, away from bone or prosthesis of the corresponding contralateral lower extremity, and total counts were recorded as the background. Target-to-background ratios were computed, and the cutoff value yielding the highest accuracy for diagnosing infection was determined as a threshold value, separately, for hip and knee prostheses. Mean target-to-background ratios for infected and uninfected prostheses were also determined.
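The cutoff selection described for criterion 4 can be sketched as follows. This is an illustrative reconstruction, not the software used in the study: the per-patient ratios below are hypothetical, the decision rule (positive when the ratio is at or above the cutoff) is an assumption, and the function names are ours.

```python
def target_to_background(target_counts: float, background_counts: float) -> float:
    """Ratio of total counts in the target region to the background region."""
    return target_counts / background_counts


def best_cutoff(ratios, infected):
    """Return (cutoff, accuracy) for the cutoff giving the highest accuracy.

    Assumptions: a study is called positive when ratio >= cutoff, the
    candidate cutoffs are the observed ratios, and ties in accuracy are
    resolved in favor of the lowest cutoff.
    """
    def accuracy(cutoff):
        correct = sum((r >= cutoff) == truth for r, truth in zip(ratios, infected))
        return correct / len(ratios)

    candidates = sorted(set(ratios))
    return max(((c, accuracy(c)) for c in candidates), key=lambda pair: pair[1])


# HYPOTHETICAL data for illustration only (not patient data from this study).
ratios = [1.0, 1.5, 2.0, 3.0, 5.0, 7.0, 9.0]
infected = [False, False, True, False, False, True, True]

cutoff, acc = best_cutoff(ratios, infected)
print(f"cutoff = {cutoff}, accuracy = {acc:.2f}")
```

Because the cutoff is chosen on the same data on which accuracy is reported, the resulting accuracy is an optimistic (best-case) estimate for that criterion.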
WBC/Marrow.
Studies in which periprosthetic activity was present on the WBC image, regardless of location or intensity, without corresponding activity on the marrow image were interpreted as positive for infection (2,3).
Two readers, who had no knowledge of the type or age of the prosthesis, the results of other tests, or the final diagnosis, interpreted the images in random order and by consensus.
Final Diagnoses
In all 59 patients, the presence or absence of infection was based on the results of histopathologic or microbiologic specimens obtained at surgery. The diagnosis of prosthetic infection was made if cultures grew organisms or if there were >5 neutrophils per high-power field (19). The diagnosis of aseptic loosening was based on negative cultures, <5 neutrophils per high-power field, and surgical confirmation of loosening of at least one of the prosthetic components.
Statistical Analysis
The sensitivity, specificity, accuracy, and positive and negative predictive values were calculated for each of the criteria by which the 18F-FDG images were interpreted as well as for WBC/marrow imaging. Each 18F-FDG criterion was compared with WBC/marrow imaging, and the significance of differences was determined with the McNemar test. A P value of <0.05 was considered significant. The significance of differences in the mean target-to-background ratio between infected and uninfected prostheses was determined with the t test, and a P value of <0.05 was considered significant.
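For reference, the five summary measures can be computed from a 2 × 2 table as in the sketch below. The counts used are those reported for criterion 1 in the Results (periprosthetic activity around all 25 infected and 31 of 34 uninfected prostheses); the function name is illustrative, and the code assumes no denominator is zero.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Summary measures of a binary test against the final diagnosis.

    tp/fn: infected prostheses called positive/negative
    fp/tn: uninfected prostheses called positive/negative
    Assumes every denominator is nonzero.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Criterion 1: any periprosthetic 18F-FDG activity is read as infection.
criterion_1 = diagnostic_metrics(tp=25, fp=31, fn=0, tn=3)
for name, value in criterion_1.items():
    print(f"{name}: {value:.0%}")
```

These values reproduce the 100% sensitivity, 9% specificity, and 47% accuracy quoted for criterion 1.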
RESULTS
Final Diagnoses
Twenty-five (42%) of the 59 prostheses, 14 hip and 11 knee, were infected. Organisms were identified in 21 of the infected devices: Staphylococcus species (n = 17), Streptococcus species (n = 1), Bacillus species (n = 1), Enterococcus faecalis (n = 1), and Pseudomonas aeruginosa (n = 1). Among the 14 infected hip replacements, 6 of 14 femoral components were loose, and 4 of 11 acetabular components were loose. Among the 11 infected knee replacements, 3 femoral components and 6 tibial components were loose.
Thirty prostheses (51%) were aseptically loosened, including 23 hip and 7 knee prostheses. Among the hip replacements, 15 of 23 femoral components were loose, and 10 of 23 acetabular components were loose. Among the knee replacements, 4 of 7 femoral components were loose, and 6 of 7 tibial components were loose.
Two hip replacements failed because of misalignment of the femoral and acetabular components. One hip replacement failed because of fracture of the ceramic acetabular liner. One knee replacement failed because of excessive liner wear.
Imaging Results
Imaging results are summarized in Tables 1–3.
18F-FDG.
Periprosthetic activity, in one location or another, was present around 56 (95%) of the 59 prostheses, including all 25 infected, 27 of 30 aseptically loosened, and all 4 devices that were neither infected nor loosened (Fig. 1). This criterion was sensitive but was significantly less specific than WBC/marrow imaging for all prostheses (P < 0.001) as well as for hips (P < 0.001) and knees (P = 0.008) separately.
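The paired cross-classification underlying the McNemar comparison is not reported, but the marginal counts constrain it. Among the 34 uninfected prostheses, WBC/marrow imaging was correct in 31 and criterion 1 in 3. The sketch below enumerates every paired table compatible with those margins and applies an exact (binomial) two-sided McNemar test; the choice of the exact form rather than the chi-square form is an assumption, as the text does not specify the variant used.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Marginal counts among the 34 uninfected prostheses (from the text).
WBC_CORRECT, FDG_CORRECT = 31, 3

p_values = {}
for both_correct in range(FDG_CORRECT + 1):
    fdg_only = FDG_CORRECT - both_correct  # criterion 1 right, WBC/marrow wrong
    wbc_only = WBC_CORRECT - both_correct  # WBC/marrow right, criterion 1 wrong
    p_values[both_correct] = mcnemar_exact(wbc_only, fdg_only)

print(f"largest feasible P value: {max(p_values.values()):.1e}")
```

Whichever paired table actually occurred, the resulting P value lies well below 0.001, in line with the significance level reported for all prostheses.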
Interpreting 18F-FDG images together with marrow images improved the specificity of the study, from 9% to 35%, with only a slight decrease in sensitivity (100% vs. 96%) (Fig. 2). Nevertheless, the test was still significantly less specific than WBC/marrow imaging for all prostheses (P < 0.001) as well as for hips (P = 0.003) and knees (P = 0.008) individually.
BPI activity was significantly less sensitive and specific than WBC/marrow imaging for all prostheses (P < 0.001 and P < 0.001, respectively), and for hips (P = 0.03 and P = 0.003, respectively), and was significantly less sensitive than WBC/marrow imaging for knee prostheses (P = 0.03). BPI activity was present in 19 of 21 hip replacements with a loose femoral component, including all 6 infected, and 13 of 15 uninfected, prostheses. In contrast, BPI activity was present in only 3 of the 19 hip replacements with a fixed femoral component, including only 2 of 8 infected devices (Fig. 3). Thirty-two hip replacements were >1 y old and, in this group, the sensitivity and specificity of BPI were 50% (4/8) and 46% (11/24), respectively. All 4 false-negative results were associated with infected prostheses with a fixed femoral component, and all 13 false-positive results occurred in aseptically loosened femoral components.
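The hip counts in the preceding paragraph can be tabulated to compare how often BPI activity accompanied loosening versus infection. In the sketch below, the count of 1 uninfected hip with a fixed femoral component and BPI activity is derived by subtraction (3 fixed hips with BPI, 2 of them infected) rather than stated directly; variable names are illustrative.

```python
# (hips with BPI activity, hips in group) for the 40 hip replacements.
hips = {
    "loose femoral component": (19, 21),
    "fixed femoral component": (3, 19),
    "infected": (6 + 2, 14),       # 6 loose + 2 fixed
    "uninfected": (13 + 1, 26),    # 13 loose + 1 fixed (derived by subtraction)
}

rates = {label: present / total for label, (present, total) in hips.items()}
for label, rate in rates.items():
    print(f"BPI activity, {label}: {rate:.0%}")
```

BPI activity was far more frequent with a loose than with a fixed femoral component, whereas its frequency was nearly the same in infected and uninfected hips.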
BPI activity was present in 7 of 13 loosened knee prostheses, including 2 infected and 5 aseptically loosened ones. Of the 6 loosened devices without BPI, 4 were infected and 2 were not. BPI was present in 3 of 6 fixed prostheses; all 3 were infected (Fig. 4). Among the 13 knee replacements >1 y old, BPI was present in only 1 of 6 infected devices (17% sensitivity); this device was also loose. Five infected devices did not demonstrate BPI: 4 were loose and 1 was fixed. Four of 7 uninfected devices demonstrated BPI, and all 4 were loose (43% specificity). Three uninfected knee prostheses did not demonstrate BPI: 2 were loose and 1 was fixed.
Using semiquantitative analysis of BPI activity, the target-to-background ratio was 5.3 ± 8.8 versus 1.7 ± 1.5 for infected versus uninfected hip prostheses (P < 0.05, t test), with a maximum accuracy of 78% found using a target-to-background threshold of 3.6. The target-to-background ratio was 5.2 ± 3.0 versus 4.1 ± 1.5 for infected versus uninfected knee prostheses (P = not significant; t test), with a maximum accuracy of 58% using a threshold of 3.9. Although specificity was comparable to that of WBC/marrow imaging for all prostheses, as well as for hips and knees separately, semiquantitative analysis was significantly less sensitive than WBC/marrow imaging for all prostheses (P < 0.001) as well as for hips (P < 0.02) and knees (P < 0.02), separately.
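As a consistency check, the joint-specific accuracies can be pooled and compared with the overall figures for criterion 4. The counts below are not reported directly; they are back-calculated from the rounded percentages (78% of 40 hips, 58% of 19 knees, 36% sensitivity in 25 infected, and 97% specificity in 34 uninfected prostheses) and should be read as an inference.

```python
# Back-calculated counts (ASSUMPTION: inferred from rounded percentages).
hip_correct, hip_total = 31, 40      # 31/40 = 77.5%, reported as 78%
knee_correct, knee_total = 11, 19    # 11/19 = 57.9%, reported as 58%
true_pos, infected_total = 9, 25     # 9/25 = 36% sensitivity
true_neg, uninfected_total = 33, 34  # 33/34 = 97% specificity

pooled_correct = hip_correct + knee_correct
pooled_accuracy = pooled_correct / (hip_total + knee_total)

print(f"pooled accuracy: {pooled_accuracy:.1%}")
print(f"correct by joint: {pooled_correct}, correct by diagnosis: {true_pos + true_neg}")
```

Both routes give 42 correct results in 59 prostheses, matching the 71% overall accuracy reported for criterion 4.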
WBC/Marrow.
The sensitivities and specificities of WBC/marrow imaging, for all prostheses, and for hip and knee prostheses separately, were 100% and 91%, 100% and 88%, and 100% and 100%, respectively. There were no false-negative results and only 3 false-positive results, all of which involved aseptically loosened hip prostheses. Retrospective review of these cases suggested that in 1 case the WBC–marrow mismatch was probably related to adjacent bowel activity. In another case, it was due to overlying nodal accumulation of labeled leukocytes. In the third case—that of a fractured ceramic acetabular liner—even on review of the images, abnormal labeled leukocyte activity was clearly present in the hip joint, but microbiologic and histopathologic analyses were negative for infection.
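The stratified figures above follow from the counts given in the text: 25 infected prostheses (14 hip, 11 knee), 34 uninfected prostheses (26 hip, 8 knee), no false-negative results, and 3 false-positive results, all in hips. A brief sketch, with illustrative names, is shown below.

```python
# Counts per group: infected, uninfected, false negatives, false positives.
groups = {
    "all": {"infected": 25, "uninfected": 34, "fn": 0, "fp": 3},
    "hip": {"infected": 14, "uninfected": 26, "fn": 0, "fp": 3},
    "knee": {"infected": 11, "uninfected": 8, "fn": 0, "fp": 0},
}

results = {}
for name, g in groups.items():
    tp = g["infected"] - g["fn"]
    tn = g["uninfected"] - g["fp"]
    results[name] = {
        "sensitivity": tp / g["infected"],
        "specificity": tn / g["uninfected"],
        "accuracy": (tp + tn) / (g["infected"] + g["uninfected"]),
    }

for name, r in results.items():
    print(f"{name}: sens {r['sensitivity']:.1%}, spec {r['specificity']:.1%}, acc {r['accuracy']:.1%}")
```

The percentages in the text are these values rounded to the nearest whole percent, with the hip specificity of 88.5% given as 88%.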
DISCUSSION
More than 400,000 hip and knee arthroplasties are performed annually in the United States. The rates of infection after primary implantation are about 1% for hip and 2% for knee prostheses. The rates of infection after revision surgery are slightly higher: about 3% for hip and about 5% for knee replacements (20). Approximately one third of these infections develop within 3 mo, another third within 1 y, and the remainder >1 y, after surgery (21). Differentiating infection from aseptic loosening, the most common cause of joint arthroplasty failure, is extremely important because the management of these 2 conditions differs markedly. The diagnosis of infection has significant implications, both clinically and economically, in terms of prolonged antibiotic treatment, longer hospital stay, and a second operation. The failure to diagnose infection also has serious ramifications. Persistence of infection will almost assuredly lead to failure of a revision arthroplasty, continuing periprosthetic osteolysis, and the need for a second surgical procedure, which may be more difficult and extensive than what would have been necessary otherwise (22–25).
Clinical and laboratory features of acute infection may be present in some, but not all, cases of early infection. In most cases of late-onset infection, however, these features are usually absent (19). Nonspecific markers of inflammation such as the erythrocyte sedimentation rate and C-reactive protein level may be elevated in both aseptic loosening and infection. The results of joint aspiration are variable (26–28). Plain radiographs are neither sensitive nor specific, and the artifacts caused by the hardware itself limit cross-sectional imaging modalities, such as CT and MRI. Combined 111In-labeled leukocyte/99mTc-sulfur colloid marrow imaging, which reflects physiologic rather than anatomic changes and has an accuracy of >95%, is generally considered to be the imaging modality of choice for diagnosing prosthetic joint replacement infection (29). Although WBC/marrow imaging is extremely accurate, this test has significant limitations. The in vitro labeling process is labor intensive, is not always available, and requires direct contact with blood products. The need for marrow imaging adds to the complexity and cost of the study and is an additional inconvenience to patients, many of whom are elderly and debilitated. Thus, investigators continue to search for suitable alternatives to this procedure. One agent that has generated considerable interest is 18F-FDG. The high-resolution tomographic images, availability of the agent, and rapid completion of the procedure are all very desirable traits. While some investigators have reported that 18F-FDG accurately identifies the infected joint prosthesis, other investigators have found the agent less useful for this purpose (9–12,14,15).
18F-FDG
Our data indicate that the mere presence of periprosthetic activity cannot automatically be equated with infection. While periprosthetic uptake was present around all 25 infected prostheses, activity was also identified around 31 of 34 uninfected devices, including 27 of 30 aseptically loosened prostheses. These results are in agreement with those of other investigators who have reported that, in addition to infection, periprosthetic activity may also occur in synovitis and aseptic loosening (10–12). Activity around the head and neck of asymptomatic hip replacements, moreover, can persist for several years after implantation, possibly due to postoperative inflammation (13).
The propensity of 18F-FDG to accumulate in bone marrow has been described (30–32). Because the distribution of marrow can be altered by the placement of an orthopedic prosthesis, a possible explanation for periprosthetic 18F-FDG uptake in the absence of infection is the presence of periprosthetic marrow. Interpreting 18F-FDG images together with marrow images was more specific than interpreting 18F-FDG images alone, which suggests that at least some periprosthetic 18F-FDG uptake is related to marrow activity. Although it was more specific than 18F-FDG imaging alone, 18F-FDG/marrow imaging was, nevertheless, significantly less specific than WBC/marrow imaging.
Some investigators have analyzed periprosthetic uptake patterns in an effort to identify those patterns that might be unique to infection (10,11,15). In one series of 41 painful hip arthroplasties, the authors reported that the presence of BPI activity along the femoral shaft was 92% sensitive and 97% specific for infection (10). Our data, however, are less encouraging. BPI activity was neither sensitive nor specific for infection, and the data strongly suggest that, in failed hip prostheses, BPI activity is related to loosening of the femoral component, not to infection.
Among the knee replacements, in contrast to the hips, there was no obvious relationship between BPI and loosening. About half (7/13) of the loosened prostheses demonstrated BPI, as did half (3/6) of the fixed devices.
The significance of the intensity of periprosthetic 18F-FDG activity is uncertain. In one series, investigators reported that visual assessment of the intensity of BPI activity could accurately determine whether infection was present (11). In another series, however, the investigators reported considerable overlap in standardized uptake values between infected and noninfected prostheses and concluded that intensity of periprosthetic 18F-FDG uptake was not reliable for determining whether infection was present (9). In the current series, for hip prostheses, semiquantitative analysis—which was more accurate than any of the other criteria by which 18F-FDG studies were interpreted—was specific but not sensitive. These results are in agreement with the results of a recent study using visual analysis (15). The results of semiquantitative analysis for knee prostheses were similar: The study was specific but not sensitive.
18F-FDG was less accurate for knee prostheses than for hip prostheses, when the images were interpreted using criteria 2–4 (Tables 2 and 3). The explanation for this finding, which has also been observed by others (9), may be related to the knee prosthesis itself. The femoral component of a hip replacement is always several centimeters in length and >1 cm in width at its widest point. In contrast, the physical areas of knee prostheses are generally smaller than those of hip prostheses, often by several orders of magnitude. The femoral and tibial “stems” are, in many cases, little more than metal pegs, <1 cm in length and only a few millimeters in width (Fig. 5). Consequently, knee prostheses are subject to partial-volume effects (33). This phenomenon could compromise visual identification of activity at the BPI and, when using semiquantitative analysis, maximum counts could artifactually appear to be less for infected knee prostheses, even if 18F-FDG uptake were high in these smaller volumes.
The accuracy of 18F-FDG for diagnosing prosthetic joint infection in this series, which ranged from 47% to 71%, is less than what has been reported in earlier series. In this investigation, a coincidence detection system was used, whereas in previous investigations dedicated PET systems were used. Dedicated PET is clearly preferable, for many reasons, to coincidence detection, and it could be argued that these discordant results are related to the imaging device used. However, several investigators have reported excellent results using coincidence detection systems in the evaluation of musculoskeletal infection. In one series, the authors reported that coincidence detection 18F-FDG imaging was superior to 111In-labeled leukocyte imaging for diagnosing chronic bacterial osteomyelitis (34). Another group of investigators found that 18F-FDG imaging, using a coincidence detection system, was superior to MRI for diagnosing low-grade spondylitis (35). An intraindividual comparison of dedicated PET and a coincidence detection system in patients with chronic orthopedic infections found that, despite poorer image quality, results obtained with the coincidence detection system were comparable to those obtained with dedicated PET (36). A recent investigation, using dedicated PET, found that the accuracy of 18F-FDG PET for diagnosing hip replacement infection was only 69% using visual analysis of the intensity of uptake (15). These results are similar to the 78% accuracy that we found using semiquantitative analysis of periprosthetic uptake. Thus, it is unlikely that the lower accuracy of 18F-FDG PET in this series can be attributed solely to the imaging device used.
Definitive diagnosis of prosthetic joint infection depends on microbiologic or histopathologic analysis. The results reported in our study were based on surgical and microbiologic and histopathologic confirmation in all cases, whereas, in previous studies by other investigators, the number of surgical and microbiologically or histopathologically confirmed final diagnoses ranged from 14% to 72%, with the remaining diagnoses based on a variety of noninvasive assessments, the limitations of which are well known (9–14). In the only series reported to date in which 100% of the final diagnoses were microbiologically or histopathologically confirmed, the accuracy of 18F-FDG PET was only 69% (15).
Another very important factor that must be considered is the population studied. This investigation was limited to patients with failed lower extremity prostheses, who were destined for surgery, and for whom preoperative radionuclide studies were specifically performed to determine whether infection was present. Thus, ours was a highly select population: 25 (42%) of the prostheses were infected, and 30 (51%) of the prostheses were aseptically loosened, in contrast to previous investigations in which 25%–29% of prostheses were infected and 22%–25% were aseptically loosened (9–14). The ability to detect inflammatory conditions with 18F-FDG presumably depends on glucose utilization by white cells during their metabolic burst, which occurs when they are activated (37,38). Aseptic loosening and infection of a prosthetic joint are both accompanied by an inflammatory response in which leukocytes participate (39), and the inability of 18F-FDG to accurately differentiate the 2 conditions in this series is not surprising. The less satisfactory results we are reporting compared with what has been previously reported could be related to the fact that the prevalence of loosening or infection in our population was 93%, considerably higher than the approximately 50% prevalence of these conditions in other series.
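The 93% figure is the combined proportion of infected and aseptically loosened prostheses among the 59 studied. The tally below uses the final diagnoses listed in the Results; the dictionary labels are ours.

```python
# Final diagnoses for the 59 prostheses, as listed in the Results.
final_diagnoses = {
    "infected": 25,
    "aseptically loosened": 30,
    "component misalignment": 2,
    "fractured acetabular liner": 1,
    "excessive liner wear": 1,
}

total = sum(final_diagnoses.values())
loosened_or_infected = final_diagnoses["infected"] + final_diagnoses["aseptically loosened"]
prevalence = loosened_or_infected / total

print(f"{loosened_or_infected}/{total} = {prevalence:.0%}")
```

Only 4 of the 59 prostheses were neither infected nor loosened, so few of the studies in this series could be expected to show no inflammatory periprosthetic process.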
WBC/Marrow
WBC/marrow imaging was both highly sensitive (100%) and specific (91%) for diagnosing the infected prosthetic joint. The results of WBC/marrow imaging in this series are in agreement with those previously published (1–3). In contrast to 18F-FDG, infection imaging with labeled leukocytes is dependent primarily on migration of labeled neutrophils to the nidus of infection (40). Although inflammation may be present in both the infected and the aseptically loosened device, neutrophils—which are invariably present in infection—are usually absent in aseptic loosening (20,21). This critical histologic difference between infection and aseptic loosening accounts for the high sensitivity and specificity of leukocyte/marrow imaging for diagnosing prosthetic joint infection, both in this series and in general.
CONCLUSION
In summary, these data demonstrate that, regardless of how the images are interpreted, coincidence detection system-based 18F-FDG imaging is less accurate than, and is not a suitable replacement for, leukocyte/marrow imaging for diagnosing infection of the failed joint replacement. These data also point to the need for additional investigations in which dedicated PET systems are used and surgical and microbiologic or histopathologic findings constitute the basis of the final diagnosis.
Footnotes
Received Feb. 10, 2004; revision accepted July 15, 2004.
For correspondence or reprints contact: Christopher J. Palestro, MD, Division of Nuclear Medicine, Long Island Jewish Medical Center, 270-05 76th Ave., New Hyde Park, NY 11040.
E-mail: palestro@lij.edu.