Introduction

Periprosthetic infection following total hip or knee arthroplasty is associated with significant morbidity and costs [1–3]. The infection rates following primary implantation and revision surgery are approximately 1% and 3% for hip prostheses and 2% and 5% for knee prostheses, respectively [4]. Differentiating prosthetic joint infection from aseptic loosening is of crucial importance for appropriate patient management; the treatment of an infected joint prosthesis generally involves both systemic antibiotics for an extended period and exchange arthroplasty in one or two stages, whereas aseptic loosening usually requires a single revision arthroplasty [1, 2]. Diagnosing prosthetic joint infection is difficult; clinical signs and symptoms, laboratory tests, radiography, and joint aspiration are insensitive, nonspecific, or both [5]. In addition, cross-sectional imaging modalities such as CT and MRI are hampered by artifacts produced by the prosthetic devices themselves [5]. Radionuclide imaging is less affected by metallic implants and may be more useful [5]. Combined leukocyte–marrow scintigraphy has been reported to achieve a diagnostic accuracy of 90% or greater and is currently regarded as the imaging modality of choice for diagnosing prosthetic joint infection [5]. However, combined leukocyte–marrow scintigraphy is labor-intensive, time-consuming, not widely available, and potentially hazardous because of direct handling of blood products [5]. 18F-fluoro-2-deoxyglucose positron emission tomography (FDG-PET) enables visualization of hyperglycolytic inflammatory cells (leukocytes, macrophages, and other immunologically active cells) during infection; it may be an attractive alternative to combined leukocyte–marrow scintigraphy because it requires only one injection and scan and is more widely available [5].
Furthermore, treatment with antibiotics is not likely to affect the sensitivity of FDG-PET in delineating sites of infections because FDG does not rely on leukocyte migration, in contrast to combined leukocyte–marrow scintigraphy. However, controversial results have been reported on the diagnostic value of FDG-PET in detecting prosthetic joint infection and its utility is still under debate. The purpose of this study was, therefore, to systematically review and metaanalyze published data on the diagnostic performance of FDG-PET in detecting prosthetic hip or knee joint infection and to provide more insight into the causes of the controversial results in the literature.

Materials and methods

Search strategy

A computer-aided search of the PubMed/MEDLINE and Embase databases was conducted to find relevant published articles on the diagnostic performance of FDG-PET in detecting prosthetic hip or knee joint infection. The search strategy is presented in Table 1. No beginning date limit was used. The search was updated until 27 May 2008. To expand our search, bibliographies of articles which finally remained after the selection process were screened for potentially suitable references.

Table 1 Search strategy and results as on 27 May 2008

Study selection

Studies investigating the diagnostic performance of FDG-PET in detecting prosthetic hip or knee joint infection were eligible for inclusion. All reference standards used in the individual studies were accepted; however, when FDG-PET itself was part of the reference standard, the study was excluded. No language restriction was applied. Review articles, metaanalyses, abstracts, editorials or letters, case reports, guidelines for management, studies examining 15 or fewer patients with hip and/or knee prosthesis, studies performed in animals, and ex vivo studies were excluded. Studies that examined FDG with a gamma camera in coincidence mode were also excluded. Studies which provided insufficient data to construct a 2 × 2 contingency table to calculate sensitivity and specificity for detecting prosthetic hip or knee joint infection were excluded. When data were presented in more than one article, the article with the largest number of patients or the article with the most details was chosen.

Two researchers (T.C.K., R.M.K.) independently reviewed the titles and abstracts of the retrieved articles, applying the inclusion and exclusion criteria mentioned above. Articles were rejected if they were clearly ineligible. The same two researchers then independently reviewed the full-text version of the remaining articles to determine their eligibility for inclusion. Disagreements were resolved in a consensus meeting.

Study quality

The methodological quality of the included studies was assessed in terms of the potential for bias (internal validity) and lack of generalizability (external validity). For this purpose, a checklist adapted from Kelly et al. [6] and Whiting et al. [7, 8] was used. The complete criteria list is presented in Table 2. Internal and external validity criteria were scored as positive (adequate methods) or negative (inadequate methods, potential bias). If insufficient information was provided on a specific item, a negative score was given. Two reviewers (T.C.K., R.M.K.) independently assigned the scores. Disagreements between the two researchers were discussed and resolved by consensus. Subtotals were calculated for internal (maximum six) and external (maximum five) validity separately. Total quality scores were expressed as a percentage of the maximum score.
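The scoring arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example score sheet are hypothetical, and the individual checklist items are those of Table 2, which is not reproduced here.

```python
def quality_score(internal, external):
    """Return (internal subtotal, external subtotal, total as % of maximum).

    Each argument is a list of booleans: True for a positive score,
    False for a negative score or insufficient information.
    """
    if len(internal) != 6 or len(external) != 5:
        raise ValueError("expected 6 internal and 5 external validity items")
    internal_subtotal = sum(bool(x) for x in internal)
    external_subtotal = sum(bool(x) for x in external)
    total_pct = 100 * (internal_subtotal + external_subtotal) / 11
    return internal_subtotal, external_subtotal, total_pct


# Hypothetical score sheet: 5 of 6 internal and 4 of 5 external items positive.
internal_sub, external_sub, pct = quality_score(
    [True, True, True, True, True, False],
    [True, True, True, True, False],
)
print(internal_sub, external_sub, round(pct))
```

With 11 items in total, a study with 9 positive items scores 9/11, or about 82% of the maximum.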

Table 2 Criteria list used to assess the methodological quality of the studies

Data analysis

Sensitivities and specificities of FDG-PET for the detection of prosthetic hip or knee joint infection (with corresponding 95%CIs) were calculated from the original numbers given in the included studies. Similarly, diagnostic odds ratios (DORs) of individual studies were calculated. The DOR is a single overall indicator of diagnostic performance and is, unlike sensitivity and specificity, independent of any threshold (cutoff) value [9]. In order to enable calculation of the DOR, a standard correction of adding 0.5 to all cells of the 2 × 2 contingency table was applied if the true-positive rate, false-positive rate, false-negative rate, or true-negative rate was zero. DORs of included original studies were displayed using forest plots.
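As a minimal sketch of these per-study calculations, the Python function below derives sensitivity, specificity, and the DOR from a 2 × 2 table, applying the 0.5 correction only when a cell is zero. The function name and the counts are hypothetical, and the Woolf-type confidence interval on the log DOR is an assumption made for illustration; the text does not state which interval method was used.

```python
import math


def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity, and DOR with an approximate 95% CI."""
    # Sensitivity and specificity come from the original counts.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)

    # Add 0.5 to every cell if any cell is zero, so that the DOR
    # and its standard error remain finite.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))

    dor = (tp * tn) / (fp * fn)
    # Assumed interval method: normal approximation on the log DOR.
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci = (
        math.exp(math.log(dor) - 1.96 * se_log_dor),
        math.exp(math.log(dor) + 1.96 * se_log_dor),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "dor_ci": ci,
    }


# Hypothetical study: 18 true positives, 3 false positives,
# 2 false negatives, 27 true negatives.
result = diagnostic_summary(tp=18, fp=3, fn=2, tn=27)
print(result)
```

In this hypothetical example, sensitivity and specificity are both 90% and the DOR is (18 × 27) / (3 × 2) = 81.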

Metaanalysis was performed using a bivariate random effects approach to pool the sensitivity and specificity [10]. This model assumes a bivariate normal distribution for the logit-transformed sensitivity and specificity values across studies, allowing for heterogeneity beyond chance due to clinical or methodological differences between studies. It incorporates and estimates the correlation that might exist between estimates of sensitivity and specificity within studies. A standard correction of adding 0.5 to all cells of the 2 × 2 contingency table was applied if the true-positive rate, false-positive rate, false-negative rate, or true-negative rate was zero. Estimates of the mean logit-transformed sensitivity and specificity were then obtained. Pooled estimates of sensitivity and specificity with 95%CIs were calculated by antilogit (inverse logit) transformation of these logit estimates. To improve visualization of the results, the 95% coverage region of the estimated bivariate distribution of the logit sensitivity and specificity was transformed back to receiver operating characteristic (ROC) axes [10]. Results of the included studies were also plotted in ROC space.
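The back-transformation step can be illustrated with a short Python sketch. It does not fit the bivariate model itself; it only shows how a mean logit and its standard error map to a proportion with a 95%CI. The input values (1.523 and 0.392) are illustrative assumptions chosen so that the output lands close to the pooled sensitivity reported in the Results; they are not estimates taken from the fitted model.

```python
import math


def logit(p):
    """Log odds of a proportion p (0 < p < 1)."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Inverse logit: maps a log odds value back to a proportion."""
    return 1 / (1 + math.exp(-x))


def back_transform(mean_logit, se_logit, z=1.96):
    """Point estimate and 95% CI on the proportion scale."""
    estimate = inv_logit(mean_logit)
    lower = inv_logit(mean_logit - z * se_logit)
    upper = inv_logit(mean_logit + z * se_logit)
    return estimate, lower, upper


# Illustrative (assumed) mean logit sensitivity and standard error.
estimate, lower, upper = back_transform(1.523, 0.392)
print(f"{estimate:.3f} ({lower:.3f} to {upper:.3f})")
```

Because the interval is symmetric on the logit scale, it becomes asymmetric after back-transformation, which is why the reported confidence limits are not equidistant from the pooled estimates.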

Heterogeneity among the results of individual studies was tested by subjecting the DORs of individual studies to the Higgins and Thompson test, calculating the I² statistic [11]. If the DOR is equal across studies, the only cause of heterogeneity is a difference in cutoff levels for prosthetic joint infection. If the DOR varies across studies, factors other than cutoff differences exist as well [9]. Heterogeneity was defined as I² > 50%. Potential sources for heterogeneity were explored by subgroup analysis. Covariates analyzed were: study design (reported prospective study design vs. no or unreported prospective study design), way of patient recruitment (consecutive or random selection of patients vs. nonconsecutive, nonrandom selection, or unreported way of recruitment), patient spectrum (inclusion of only symptomatic prostheses vs. inclusion of both symptomatic and asymptomatic prostheses), type of joint prostheses (hip prostheses only vs. knee prostheses only), age of prostheses (only inclusion of prostheses older than 6 months vs. prostheses younger than 6 months were [also] included), reconstruction method (iterative reconstruction vs. filtered back projection), type of PET images reviewed (nonattenuation-corrected [NAC] images only or both NAC and attenuation-corrected [AC] images vs. AC images only), and way of image review (reported blinding to reference test vs. no or unreported blinding to reference test). Another important issue that requires subgroup analysis is the use of different criteria to diagnose prosthetic joint infection. Applied criteria for positivity can grossly be divided into four groups: (a) FDG uptake in the periprosthetic soft tissue; (b) increased FDG uptake at the bone–prosthetic interface (BPI); (c) increased FDG uptake at the BPI, while emphasizing that FDG uptake limited to the soft tissues adjacent to the neck of the prosthesis is not considered suggestive of infection (for hip prostheses only); and (d) other criteria.
With regard to these criteria of positivity, subgroup analyses were performed as follows: (a) vs. (b, c, or d), (b or c) vs. (a or d), (b) vs. (a, c, or d), and (c) vs. (a, b, or d) (for studies or subsets in studies on hip prostheses only). Each of the predefined covariates was separately included in the bivariate model to compare the overall sensitivity and overall specificity between different strata, using a z test with the level of statistical difference set at 0.05.
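The I² statistic used for the heterogeneity assessment can be sketched as follows. This is a minimal Python illustration under stated assumptions: study effects are log DORs with inverse-variance weights, Cochran's Q is computed around the fixed-effect pooled value, and I² = max(0, (Q − df) / Q) × 100%. The function name and the input values are hypothetical and do not come from the included studies.

```python
def i_squared(effects, variances):
    """I-squared (in %) from study effects (e.g. log DORs) and their variances."""
    weights = [1 / v for v in variances]
    # Inverse-variance weighted (fixed-effect) pooled estimate.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical log DORs and variances for three studies.
log_dors = [0.0, 4.0, 2.0]
variances = [1.0, 1.0, 1.0]
value = i_squared(log_dors, variances)
print(f"I-squared = {value:.1f}%")
print("heterogeneous" if value > 50 else "not heterogeneous")
```

In this hypothetical example Q = 8 with 2 degrees of freedom, so I² = 75%, which would count as heterogeneity under the I² > 50% definition used here.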

Statistical analyses were executed using Meta-DiSc statistical software version 1.4 (Unit of Clinical Biostatistics, Ramón y Cajal Hospital, Madrid, Spain) and SAS statistical software package version 9.1.3 (SAS Institute, Cary, NC, USA).

Results

Literature search

The computer-aided search revealed 98 articles from PubMed/MEDLINE and 104 articles from Embase (Table 1). Reviewing titles and abstracts from PubMed/MEDLINE revealed 20 articles potentially eligible for inclusion [12–31]. Reviewing titles and abstracts from Embase revealed 17 articles potentially eligible for inclusion, which were all already identified by the PubMed/MEDLINE search. Thus, 20 studies remained for possible inclusion and were retrieved in full-text version. After reviewing the full article, five articles [18, 21, 22, 25, 26] were excluded because the same data were used in another article comprising a larger number of patients, one article was excluded because it did not investigate the diagnostic performance of FDG-PET in detecting prosthetic hip or knee joint infection [14], one article [16] was excluded because the same data were used in another article providing more study details, one article [29] was excluded because fewer than 15 patients with hip and/or knee prosthesis were investigated, and one article [31] was excluded because it appeared to be an abstract only. Screening references of the remaining articles resulted in one other potentially relevant article [32]. However, this article was excluded because fewer than 15 patients with hip and/or knee prostheses were investigated [32]. Thus, eventually 11 studies [12, 13, 15, 17, 19, 20, 23, 24, 27, 28, 30], comprising a total sample size of 635 prostheses, met all inclusion and exclusion criteria, and they were included in this systematic review. The characteristics of the included studies are presented in Tables 3, 4, and 5.

Table 3 Patient characteristics of included studies
Table 4 FDG-PET parameters and image interpretation of included studies
Table 5 Reference standards used in the individual studies

Methodological quality assessment

Methodological quality was assessed using 11 items. The scores for internal and external validity are presented in Table 6. The total score for combined internal and external validity, expressed as a percentage of the maximum score, ranged from 45% to 91% (median 82%).

Table 6 Quality assessment of included studies

Diagnostic performance

The results of the 11 included studies are presented in Table 7, their DORs are displayed in Fig. 1, and the corresponding ROC plot is displayed in Fig. 2. Stumpe et al. [20] provided two results (Table 5), but only the first result of their study was used for metaanalysis and assessment of heterogeneity. Sensitivity and specificity of FDG-PET for the detection of prosthetic hip or knee joint infection ranged from 22.2% to 100% and from 61.5% to 100% with pooled estimates of 82.1% (95%CI = 68.0–90.8%) and 86.6% (95%CI = 79.7–91.4%), respectively. Heterogeneity among the DORs of individual studies was present (I² = 68.8%). Overall specificity of FDG-PET in hip prostheses was significantly higher than that in knee prostheses (89.8% vs. 74.8%, p = 0.0164). Overall specificity of studies using filtered back projection was significantly higher than that of studies using iterative reconstruction (98.3% vs. 82.3%, p = 0.0235). No statistically significant differences were observed in sensitivities and/or specificities within the subgroups study design (reported prospective study design vs. no or unreported prospective study design), way of patient recruitment (consecutive or random selection of patients vs. nonconsecutive, nonrandom selection or unreported way of recruitment), patient spectrum (inclusion of only symptomatic prostheses vs. inclusion of both symptomatic and asymptomatic prostheses), age of prostheses (only inclusion of prostheses older than 6 months vs. prostheses younger than 6 months were [also] included), type of PET images reviewed (NAC images only or both NAC and AC images vs. AC images only), way of image review (reported blinding to reference test vs. no or unreported blinding to reference test), and criteria for positivity (four different comparisons) (Table 8).

Fig. 1 Forest plot with diagnostic odds ratios of included original studies (logarithmic scale)

Fig. 2 ROC plot with pooled sensitivity and specificity (including 95% confidence ellipses) and results of included original studies for the detection of prosthetic hip or knee joint infection using FDG-PET

Table 7 Results of included studies
Table 8 Results of bivariate analysis with covariates

Discussion

This systematic review and metaanalysis included 11 studies comprising a total sample size of 635 prostheses. Overall methodological quality of included studies was good. Metaanalytically, FDG-PET achieves moderate to high sensitivity and specificity in detecting prosthetic hip or knee joint infection. However, this result should be interpreted cautiously because significant heterogeneity was identified among the results of individual studies. Several causes may underlie this heterogeneity and explain the controversial results in the literature. Subgroup analysis revealed that overall specificity of FDG-PET in hip prostheses was significantly higher than that in knee prostheses, and overall specificity of studies using filtered back projection was (inexplicably) significantly higher than that of studies using iterative reconstruction (Table 8). The lower specificity of FDG-PET in knee prostheses may be related to the relatively limited knowledge about the incidence and pattern of nonspecific FDG uptake around knee prostheses. Zhuang et al. [33] reported that increased FDG uptake around the femoral head and neck (possibly due to foreign body reaction to the material of the prosthetic joint) may persist for years following hip arthroplasty and can occur in both symptomatic and asymptomatic patients; it should not be interpreted as periprosthetic infection. Increased FDG uptake around the distal tip of the hip prosthesis is also nonspecific. However, FDG uptake along the interface between bone and hip prosthesis is virtually never seen in asymptomatic patients or in those with aseptic loosening and is, therefore, highly suggestive of infection [33]. Persistently increased nonspecific FDG uptake following knee arthroplasty has also been mentioned [33], but should be further investigated. More knowledge about the incidence and pattern of nonspecific FDG uptake around knee prostheses may improve the specificity of FDG-PET in detecting prosthetic knee joint infection. 
Despite the findings of Zhuang et al. [33], our subgroup analysis did not reveal any significantly higher sensitivity or specificity among studies which used FDG uptake at the BPI as criterion for positivity, while emphasizing that FDG uptake limited to the soft tissues or adjacent to the neck of the prosthesis was not considered suggestive of infection (Table 8). Metallic prosthetic material can cause artifacts on attenuation-corrected FDG-PET images and may also affect diagnostic performance. Goerres et al. [34] reported that the use of attenuation correction (both 68Ge-based and CT-based) generates artifacts of apparently increased FDG concentration around metallic hip implants. The shape of the prosthesis, the absorption properties of the surrounding tissues, and the method of transmission scanning (68Ge-based or CT-based) influence the appearance of such artifacts. It should be noted that all evidence regarding the diagnostic performance of FDG-PET in prosthetic joint infection has been acquired using stand-alone PET scanners, which use a radionuclide source for attenuation correction. Combined PET/CT is replacing the stand-alone PET scanner in clinical practice, but may perform differently because it uses CT-based attenuation correction; this important issue should be further investigated. Goerres et al. [34] further reported that patient movement worsens attenuation artifacts, whereas attenuation-weighted iterative reconstruction appears to reduce the visibility of artifacts. The presence of artifacts on attenuation-corrected images has also been observed in knee prostheses; in a phantom study, Van Acker et al. [28] showed that artifacts mimicking FDG uptake adjacent to a knee prosthesis can arise in attenuation-corrected images obtained with different methods of image reconstruction.
In addition, Heiba et al. [35] reported the observation of an artifact within the joint space of total knee metallic prostheses in two patients on attenuation-corrected images. No uptake, however, was noted in the same location on the nonattenuation-corrected images [35]. Thus, verification of attenuation-corrected images against nonattenuation-corrected images may avoid false-positive results because of the abovementioned reasons. However, our subgroup analysis did not reveal any significantly lower sensitivity or specificity in the study which exclusively evaluated attenuation-corrected images (Table 8). In addition, no statistically significant differences in diagnostic performance were observed in the subgroup analyses according to study design, way of patient recruitment, patient spectrum, age of prostheses, and way of image review (Table 8). It should be noted, however, that results from our subgroup analysis may not be conclusive because of the relatively small number of included studies. Furthermore, it was not possible to perform subgroup analyses according to FDG dose, time interval between FDG administration and scanning, acquisition time for emission scans, number and experience of interpreters, reference standard used, and way of interpreting the reference test because no (meaningful) stratifications could be made of the available data of included studies. A large multicenter study is required to further investigate potential sources of heterogeneity and validate the use of FDG-PET for diagnosing prosthetic joint infection. Another drawback of this metaanalysis is the use of different (imperfect) reference standards in the individual studies (Table 5), which may have led to misclassification bias and may have affected the estimates of diagnostic performance of FDG-PET. However, because no perfect reference test exists yet for detecting prosthetic joint infection and all studies used a combination of reference standards (Table 5), we accepted this shortcoming.

Combined leukocyte–marrow scintigraphy is currently regarded as the imaging modality of choice for diagnosing prosthetic joint infection [5]. Two studies made a direct comparison between FDG-PET and combined leukocyte–marrow scintigraphy [15, 28]. Pill et al. [15] investigated 89 patients for revision of painful hip prosthesis. Of the 89 patients, 46 underwent both FDG-PET and combined leukocyte–marrow scintigraphy for a total of 51 hip prostheses. Although FDG-PET and combined leukocyte–marrow scintigraphy demonstrated comparable specificities (93% and 95.1%, respectively), FDG-PET exhibited a substantially higher sensitivity (95.2% vs. 50%) [15]. Van Acker et al. [28] investigated 21 patients with a painful knee arthroplasty. All patients underwent FDG-PET and 20 of 21 patients underwent combined leukocyte–marrow scintigraphy. Sensitivity and specificity of FDG-PET were 100% and 73%, respectively, and sensitivity and specificity of combined leukocyte–marrow scintigraphy were 100% and 93%, respectively [28]. Based on this small number of studies [15, 28], however, no definite conclusion can be drawn yet on the diagnostic performance of FDG-PET compared to that of combined leukocyte–marrow scintigraphy.

Antigranulocyte scintigraphy (AGS) with monoclonal antibodies or antibody fragments may be another attractive approach to detect prosthetic joint infection [36–38]. Unlike combined leukocyte–marrow scintigraphy, which requires time-consuming and potentially dangerous in vitro labeling of autologous leukocytes, AGS allows in vivo labeling of granulocytes in the inflamed tissue surrounding the prosthesis [36–38]. A recent metaanalysis on the diagnostic performance of AGS included 13 studies with a total sample size of 522 prostheses and reported independent random effects summary estimates of sensitivity and specificity of 83% and 80%, respectively [38]. Future studies are required to compare the diagnostic performance of combined leukocyte–marrow scintigraphy, FDG-PET, and AGS and to assess which imaging modality is most cost-effective.

In conclusion, in this metaanalysis, overall diagnostic performance of FDG-PET was moderate to high. Caution is warranted, however, because results of individual studies were heterogeneous and the sources of this heterogeneity could not be fully explored. Future studies should further explore causes of heterogeneity and validate the use of FDG-PET for diagnosing prosthetic joint infection.