Introduction

Breast carcinoma is the most common cancer in women in Western Europe and the United States, with the highest incidence in the 40–55 age range, and its prevalence is still rising (Parker et al. 1997; von Fournier et al. 1993). It accounts for 40,000 and 14,000 deaths yearly in the US and UK, respectively, which makes it the second leading cause of cancer death in women in those countries (Parker et al. 1997; American Cancer Society 2002; Cancer Research Campaign 1996). Despite major progress in surgical treatment, radiotherapy, and adjuvant chemotherapy protocols, tumor recurrence and metastasis remain a major problem in breast cancer management (Yilmaz et al. 2007). The risk that a patient with breast cancer develops a recurrence is approximately 7–30%, and the risk of distant metastases is 45–90%, at some time in the course of the disease (Bongers et al. 2004). The survival of women suffering from recurrence and metastasis is strikingly different: women with a local recurrence have a 21–36% 5-year relative survival rate (Bongers et al. 2004), while women with distant metastatic disease have a 25% 5-year relative survival rate (Isasi et al. 2005). Early detection and accurate restaging of recurrent breast cancer are important to define appropriate therapeutic strategies and increase the chances of a cure (Schmidt et al. 2008; Radan et al. 2006; Ternier et al. 2006). In addition, distant metastasis is the most important prognostic factor in women with breast cancer, and its presence changes the intent of therapy from curative to palliative (Landheer et al. 2005). Thus, it is critical to detect recurrence and distant metastases in the follow-up of women with breast cancer.

According to the American Society of Clinical Oncology (ASCO) 2006 update of the breast cancer follow-up and management guidelines in the adjuvant setting, physical examination and mammography should be used routinely in breast cancer surveillance. Additional imaging methods, such as chest X-ray, bone scans, liver ultrasound (US), computed tomography (CT) scans, positron emission tomography with 18F-fluoro-deoxy-glucose (FDG-PET) scans and breast magnetic resonance imaging (MRI), are not recommended (Khatcheressian et al. 2006). However, physical examination and mammography have their limitations, especially for lesions situated deep in the muscle layer, some distance away from the scar or in the axilla (Rissanen et al. 1993). Furthermore, surgery and radiotherapy can also induce deleterious changes in breast tissue (Stomper et al. 1987; Orel et al. 1992; Dershaw et al. 1992). In such cases, the diagnosis might be made more reliable by the complementary use of imaging modalities such as US, CT, MRI, scintimammography (SMM) or the whole-body imaging modality FDG-PET, which provides information about the metabolic activity of tumors.

Although extensive research has been performed on US, CT, MRI, SMM and FDG-PET for the detection of recurrent and metastatic breast cancer, no comprehensive comparison has yet been conducted among all of these non-invasive diagnostic tools. Isasi et al. (2005) performed a meta-analysis to assess FDG-PET for the evaluation of breast cancer recurrences and metastases; however, it did not assess the other important non-invasive methods (US, CT, MRI and SMM), which are widely used in both surveillance and follow-up. Thus, our study aims to perform a comprehensive systematic review to obtain the overall diagnostic performance of US, CT, MRI, SMM and FDG-PET for the detection of recurrent and metastatic breast cancer on a per-patient and a per-lesion basis, which, to our knowledge, has not previously been studied.

Materials and methods

Literature search

A comprehensive computer literature search (Devillé et al. 2000) of abstracts of studies in human subjects published from January 1995 to August 2008 was performed in the MEDLINE and EMBASE databases to identify articles about the diagnostic performance of US, CT, MRI, SMM and PET (interpreted with or without the use of CT) for the detection of recurrent and metastatic breast cancer. The following keywords were used: (“US” OR “ultrasound” OR “CT” OR “computed tomography” OR “MRI” OR “magnetic resonance imaging” OR “scintimammography” OR “SMM” OR “PET” OR “positron emission tomography” OR “FDG” OR “fluorodeoxyglucose”) AND (“breast carcinoma” OR “breast cancer” OR “carcinoma of breast” OR “breast neoplasm”) AND (“sensitivity” OR “specificity” OR “false negative” OR “false positive” OR “diagnosis” OR “detection” OR “accuracy”). The China bio-medicine databases were searched for Chinese articles with the following keywords: (“US” OR “CT” OR “MRI” OR “scintimammography” OR “SMM” OR “PET” OR “FDG”) AND “breast carcinoma” (in Chinese). Other databases, such as the Cochrane Library, Cancerlit, and the China National Knowledge Infrastructure database, were also searched for relevant articles. The reference lists of all retrieved articles were extensively cross-checked to supplement the list of articles.

Selection of studies

The inclusion criteria were as follows: (1) full reports published in English or Chinese; (2) all articles in the published literature; (3) both retrospective and prospective articles; (4) articles dealing with the performance of US, CT, MRI, SMM and PET (alone or in combination, but not in sequence) in recurrent and metastatic breast carcinoma; (5) articles that confirmed the diagnosis with a reference standard of histopathologic analysis and/or close clinical and imaging follow-up for at least 6 months; (6) articles that presented sufficient data to calculate the true-positive (TP) and false-negative (FN) values; (7) articles that included at least 10 patients; and (8) when data or subsets of data were presented in more than one article, the article with the most details or the most recent article. CT studies without contrast agent were excluded. Studies using sequential test combinations (e.g., PET in patients selected on the basis of an abnormal US or CT image) were excluded because the selection of patients on the basis of diagnostic test results could have unpredictably modified the estimate of the operating characteristics of the tests themselves (Sackett and Haynes 2002).

Four reviewers, each with at least 3 years of work experience in US, CT, MRI, SMM or PET, independently checked the retrieved articles in their own fields only. To minimize bias in the selection of studies, one reviewer with more than 10 years of work experience in both oncology and radiology checked all articles. In cases of disagreement, a consensus re-review among all reviewers was performed.

Data extraction

Information extracted from each article included first author, study date, sample size, age of subjects, reference standard, unit of analysis (patients or lesions), technical characteristics of each imaging modality, and the number of true positives, false positives, true negatives, and false negatives. Data were extracted independently by the same four observers. Data abstraction was not blinded with regard to unnecessary information such as the authors, the authors’ affiliation, the journal name or year of publication (Berlin 1997). Disagreements were resolved by consensus.

Quality assessment

The Quality Assessment of Diagnostic Accuracy Studies (QUADAS) criteria were used to assess the quality of every article (Whiting et al. 2003). The following data were extracted to perform accuracy analyses: (1) clinical characteristics of the study sample (age, sex, number of patients enrolled, reason for performing the particular imaging modality); (2) patient selection (consecutive or not); (3) study type (prospective, retrospective or unknown); (4) independence of test interpretation (blinded or not); (5) verification bias (none, limited or considerable). No bias means that all patients or lesions were confirmed by histopathologic analysis; considerable verification bias means that only a small number of patients or lesions were confirmed by histopathologic analysis; all other cases were classified as limited verification bias.

The following features were also recorded: for US, the type of probe, probe frequency (MHz) and type of scanning; for CT, the type of scanner (non-helical or helical), section thickness and use of contrast agent; for MRI, magnetic field strength, type of coil and use of contrast agent; for SMM, the scanner, contrast agent, contrast dose and collimator; and for FDG-PET, the amount of tracer, camera model, resolution, attenuation correction and type of analysis (qualitative, quantitative or both). The numbers of TP, FN, false-positive (FP) and true-negative (TN) results in the detection of recurrent and/or metastatic breast cancer were extracted on a per-patient or per-lesion basis.

Statistical analysis

A random-effects model (Fleiss et al. 2003) was used for the primary meta-analysis to obtain summary estimates of sensitivity and specificity with 95% confidence intervals (CI) for each non-invasive technique. We also calculated summary receiver operating characteristic (SROC) curves and the area under the curve (AUC). In a meta-analysis, each study contributes an estimate of the true-positive rate (TPR) and the false-positive rate (FPR). These (FPR, TPR) points are plotted, and the SROC curve is fitted through them as a smoothed curve using the regression model proposed by Moses et al. (1993). We then used a Z test to determine whether the sensitivity (SE), specificity (SP) and AUC of each modality differed significantly from those of the others; p < 0.05 was considered statistically significant. All the statistics (sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, SROC, AUC) refer to recurrence and surveillance.
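The SROC regression described above can be sketched as follows. This is a minimal illustration, not the authors' SAS or Meta-DiSc code. It assumes the unweighted form of the Moses model, D = a + b·S with D = logit(TPR) − logit(FPR) and S = logit(TPR) + logit(FPR), a continuity correction of 0.5 added to every cell, and numerical integration for the AUC. The function names and the 2×2 tables in the usage note are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def study_rates(tp, fp, fn, tn, correction=0.5):
    """Return (TPR, FPR) for one study; `correction` is added to every cell."""
    tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return tp / (tp + fn), fp / (fp + tn)


def moses_fit(tables):
    """Unweighted least-squares fit of D = a + b*S over (tp, fp, fn, tn) tables."""
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        tpr, fpr = study_rates(tp, fp, fn, tn)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(tables)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx  # undefined if every study has the same S
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """TPR on the fitted SROC curve at a given FPR (0 < fpr < 1)."""
    # Rearranging D = a + b*S gives logit(TPR) = (a + (1 + b) * logit(FPR)) / (1 - b).
    z = (a + (1.0 + b) * logit(fpr)) / (1.0 - b)
    return 1.0 / (1.0 + math.exp(-z))


def sroc_auc(a, b, steps=2000):
    """Area under the fitted SROC curve by the midpoint rule on FPR in (0, 1)."""
    width = 1.0 / steps
    return sum(sroc_tpr((i + 0.5) * width, a, b) for i in range(steps)) * width
```

For example, `a, b = moses_fit([(45, 5, 5, 45), (30, 10, 5, 55), (40, 2, 10, 48)])` fits the curve to three hypothetical studies, after which `sroc_auc(a, b)` gives the corresponding area. A weighted fit would need the variance of D for each study, which this sketch omits.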

We tested the following items: threshold effects between studies (Deeks 2001), using the Spearman correlation coefficient ρ (a cutoff effect was considered present when ρ > 0.4; Devillé et al. 2002); and heterogeneity, using the likelihood ratio χ² test (p < 0.05 was considered to indicate apparent heterogeneity; Fleiss et al. 2003) and the I² index, which measures the percentage of total variation across studies that is due to heterogeneity beyond chance and takes values between 0 and 100%. Values over 50% indicate heterogeneity (Huedo-Medina et al. 2006). Publication bias was assessed by funnel plots. Since data on US, CT, MRI or FDG-PET imaging were limited, we did not perform subgroup analyses.
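The two checks above can be illustrated with a short sketch. It assumes that the threshold effect is measured as the Spearman rank correlation between two per-study series (for instance logit TPR and logit FPR) and that I² is derived from a χ² heterogeneity statistic and its degrees of freedom as (χ² − df)/χ², truncated at zero. Both choices are assumptions for illustration, since the text does not spell out the formulas, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def i_squared(chi2, df):
    """I-squared in percent from a heterogeneity chi-square and its df."""
    if chi2 <= 0:
        return 0.0
    return max(0.0, (chi2 - df) / chi2) * 100.0


def has_cutoff_effect(rho, limit=0.4):
    """Decision rule quoted in the text: a cutoff effect if rho exceeds 0.4."""
    return rho > limit
```

With these helpers, a modality would be flagged as heterogeneous when `i_squared(chi2, df)` exceeds 50, matching the threshold stated in the text.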

All statistical analyses were performed using SAS statistical software version 8.2 (SAS Institute Inc., Cary, NC, USA) and Meta-DiSc version 1.4 (Zamora et al. 2006). (Meta-DiSc, produced by Javier Zamora, is freeware for performing systematic reviews of studies that evaluate diagnostic and screening tests.)

Results

Literature search and study design characteristics

The computerized search yielded 1,017 primary studies, of which 969 were excluded. The reasons for exclusion were as follows: (a) the aim of the article was not to reveal the diagnostic value of US, CT, MRI, SMM or FDG-PET (with or without CT) for identification and characterization of recurrent or metastatic breast cancer (n = 817); (b) the reference standard was not histopathologic analysis or close clinical and imaging follow-up for at least 6 months (n = 79); (c) data from the article could not be used to construct or calculate TP, FP, TN and FN (n = 39); (d) data from the article came from a combination of different imaging modalities that could not be separated for assessment of single tests (n = 32); (e) the article was published more than once, and the article with the smaller population was excluded (n = 2); (f) the article could not be accessed (n = 3); (g) the data included fewer than 10 patients (n = 3). A total of 43 studies (Yilmaz et al. 2007; Bongers et al. 2004; Schmidt et al. 2008; Radan et al. 2006; Ternier et al. 2006; Rissanen et al. 1993; Bruneton et al. 1986; Lee et al. 1993; Gilles et al. 1993; Dehdashti et al. 1995; Melani et al. 1995; Hagay et al. 1996; Winehouse et al. 1999; Rieber et al. 1997; Drew et al. 1998; Müller et al. 1998; Moon et al. 1998; Cwikla et al. 1998; Hathaway et al. 1999; Qayyum et al. 2000; Stuhrmann et al. 2000; Bäz et al. 2000; Eubank et al. 2001, 2004; Kim et al. 2001; Belli et al. 2002; Liu et al. 2002; Goerres et al. 2003; Suárez et al. 2002; Kamel et al. 2003; Gallowitsch et al. 2003; Siggelkow et al. 2003; De Cicco et al. 2004; Shin et al. 2005; Weir et al. 2005; Lamuraglia et al. 2005; Preda et al. 2006; Wolfort et al. 2006; Piperkova et al. 2007; Rajkovaca et al. 2007; Usmani et al. 2007; Haug et al. 2007; Riebe et al. 2007) fulfilled all of the inclusion criteria and were considered for the analysis (Table 1). Fifteen studies were prospective, 16 were retrospective, and the design of the remainder was not defined. Patient selection was consecutive in 18 studies and not defined in 25. Eleven studies used only histopathologic analysis as the reference standard, indicating complete verification and lack of bias, while 10 studies showed limited verification bias and 22 studies showed considerable verification bias. The TP, FN, FP and TN results and some features of each modality are shown in Tables 2, 3, 4, 5 and 6.

Table 1 Main characteristics of the included studies
Table 2 TP, FP, FN, TN and other features of US (10 studies in all)
Table 3 TP, FP, FN, TN and other features of CT (eight studies in all)
Table 4 TP, FP, FN, TN and other features of MRI (11 studies in all)
Table 5 TP, FP, FN, TN and other features of SPECT (seven studies in all)
Table 6 TP, FP, FN, TN and other features of PET (21 studies in all)

Publication bias, heterogeneity and cutoff effect

To assess possible publication bias, scatter plots were constructed of the log diagnostic odds ratios (DORs) of individual studies against their sample size. The funnel plots for US, CT, MRI, SMM and PET are given in Fig. 1. The plots for US, CT, MRI, SMM and PET all showed marked asymmetry (with small studies missing from the bottom left quadrant, thus suggesting publication bias). There was heterogeneity for most non-invasive modalities except SMM and PET, which was confirmed by either the likelihood ratio χ² test or the I² index (Table 7). According to the Spearman correlation coefficients, there was no conclusive evidence of a cutoff effect for US and PET (ρ < 0.4), but a cutoff effect was present for CT, MRI and SMM (ρ > 0.4; Table 8).
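The coordinates of such a funnel plot can be computed from each study's 2×2 table, as in the sketch below. It assumes the usual definition DOR = (TP·TN)/(FP·FN) and a continuity correction of 0.5 in every cell so that tables containing a zero remain usable. The correction and the function names are illustrative assumptions, not details taken from the study.

```python
import math


def log_dor(tp, fp, fn, tn, correction=0.5):
    """Log diagnostic odds ratio and its standard error for one 2x2 table."""
    tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    value = math.log((tp * tn) / (fp * fn))
    se = math.sqrt(1.0 / tp + 1.0 / fp + 1.0 / fn + 1.0 / tn)
    return value, se


def funnel_points(tables):
    """One (sample size, log DOR) pair per study, ready for a scatter plot."""
    points = []
    for tp, fp, fn, tn in tables:
        value, _ = log_dor(tp, fp, fn, tn)
        points.append((tp + fp + fn + tn, value))
    return points
```

Plotting the pairs returned by `funnel_points` with sample size on one axis and log DOR on the other gives the kind of scatter in which missing small studies with low DOR would show up as asymmetry.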

Fig. 1 Funnel plots of US, CT, MRI, SPECT and PET

Table 7 Test for heterogeneity and threshold effect in the meta-analysis
Table 8 AUC and Q* index and ρ value for US, CT, MRI, SPECT and PET

Pooled sensitivity, pooled specificity and DORs

On the basis of a random-effects model, the pooled sensitivity, pooled specificity and DOR of the non-invasive modalities are shown in Table 9. The pooled sensitivities of MRI and PET (with or without CT) were 0.9500 and 0.9530, respectively, with no statistically significant difference between MRI and PET (p > 0.05). Both were significantly higher than the pooled sensitivities of US, CT and SMM (p < 0.05). The pooled specificities of US and MRI were 0.962 and 0.929, respectively, with no statistically significant difference between US and MRI (p > 0.05). Both were significantly higher than the pooled specificities of CT, SMM and PET (p < 0.05). The DOR estimates for MRI and PET were 131.78 (95% CI 70.9310–244.8100) and 106.88 (95% CI 68.1040–167.73), respectively, and were significantly higher than those for US, CT and SMM (p < 0.05).
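A pairwise comparison of two pooled DORs can be approximated from the reported point estimates and confidence intervals alone, as sketched below. This is not necessarily the Z test the authors ran. It assumes that the two estimates are independent and approximately normal on the log scale, and that each standard error can be recovered from the width of the 95% CI. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_se_from_ci(lower, upper):
    """Standard error on the log scale recovered from a 95% CI of a ratio."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


def z_test_log_ratio(est1, ci1, est2, ci2):
    """Z statistic and two-sided p value for the difference of two log ratios."""
    se1 = log_se_from_ci(*ci1)
    se2 = log_se_from_ci(*ci2)
    z = (math.log(est1) - math.log(est2)) / math.sqrt(se1 ** 2 + se2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Pooled DORs and 95% CIs for MRI and PET as reported in the text.
z_mri_pet, p_mri_pet = z_test_log_ratio(
    131.78, (70.931, 244.81), 106.88, (68.104, 167.73)
)
```

Because several included studies examined more than one modality in the same patients, the independence assumption is only approximate, and the resulting p value should be read as a rough check.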

Table 9 Summary estimates of sensitivity, specificity, and diagnostic odds ratio (DOR) for US, CT, MRI, SPECT and PET

Summary ROC curves, AUC and the Q* index

Summary receiver operating characteristic analysis was used to compare the non-invasive modalities. The AUCs of US, CT, MRI, SMM and PET (with or without CT) are presented in Table 8. The AUCs of MRI and PET (with or without CT) were 0.9718 and 0.9604, respectively, with no significant difference between the two modalities (p > 0.05). Pairwise comparison between modalities demonstrated that the AUCs of both MRI and PET (with or without CT) were higher than those of US or CT (p < 0.05). The AUC of SMM was 0.9386, which did not differ significantly from that of MRI or PET (with or without CT) (p > 0.05). There was also no statistically significant difference in AUC between CT and US (p > 0.05). The Q* index estimates for US, CT, MRI, SMM and PET (with or without CT) were 0.8593, 0.7904, 0.9228, 0.8757 and 0.9051, respectively. Like the AUC, the Q* index estimates for MRI and PET were significantly higher than those for US, CT and SMM (p < 0.05), and they were similar for MRI and PET (Table 9; Fig. 2).
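The link between the Q* index and the AUC can be made explicit for a symmetric SROC curve, that is, one with a constant DOR. In that case Q* = √DOR/(1 + √DOR) and AUC = DOR/(DOR − 1)² · [(DOR − 1) − ln DOR]. The sketch below assumes this symmetric model. Whether the reported values were produced this way is an assumption, although the reported Q* values for MRI and PET map to AUCs of approximately 0.9718 and 0.9604 under it. The function names are hypothetical.

```python
import math


def dor_from_q_star(q_star):
    """Constant DOR implied by a Q* value on a symmetric SROC curve."""
    root = q_star / (1.0 - q_star)  # sqrt(DOR), since Q* = sqrt(DOR) / (1 + sqrt(DOR))
    return root * root


def q_star_from_dor(dor):
    """Q*: the point on the curve where sensitivity equals specificity."""
    root = math.sqrt(dor)
    return root / (1.0 + root)


def auc_symmetric(dor):
    """Closed-form AUC of the symmetric SROC curve TPR = DOR*x / (1 + (DOR-1)*x)."""
    if dor == 1.0:
        return 0.5  # chance line; the closed form is undefined at DOR = 1
    return dor / (dor - 1.0) ** 2 * ((dor - 1.0) - math.log(dor))


# Q* values for MRI and PET as reported in the text.
auc_mri = auc_symmetric(dor_from_q_star(0.9228))
auc_pet = auc_symmetric(dor_from_q_star(0.9051))
```

Q* is often preferred for comparisons because it is a single point on the curve and does not depend on the range of FPR values observed in the included studies.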

Fig. 2 Summary ROC curves of US, CT, MRI, SMM and PET

Discussion

Soerjomataram et al. (2008) conducted a review to summarize available knowledge on the determinants of survival 10 years or more after breast cancer diagnosis and found that patients with recurrent, metastasized or second cancers generally exhibited lower long-term survival than those without. Locoregional recurrences predominantly affect the breast, skin, the axillary and supraclavicular nodes and the chest wall. Internal mammary (IM) and mediastinal nodes, pleura and lung parenchyma are the most common sites of intrathoracic recurrence after primary surgical resection (Fisher et al. 2001; Hatteville et al. 2002). Extrathoracic recurrence often occurs in bone, liver and brain. The correct identification of local recurrences and distant metastases at the time of suggestive symptoms in the follow-up for breast cancer prompts clinical consideration of different therapies (Nomura et al. 1999; Wapnir et al. 2006). Thus, early detection of recurrences or metastases is crucial for patients with breast cancer (Eubank et al. 2002; Kamby et al. 1988; Yang et al. 2007; Siggelkow et al. 2004). This meta-analysis focused on evaluating the diagnostic ability of US, CT, MRI, SMM and PET (interpreted with or without the use of CT), which are the widely used non-invasive modalities for the detection of recurrent and/or metastatic breast cancer.

Previous studies have discussed the diagnostic ability of US in detecting recurrent and/or metastatic breast cancer. Lamuraglia et al. (2005) determined the efficacy of Doppler US with contrast agent (DUPC) in locally recurrent breast cancer and reported an SE of 67% and an SP of 100%. Eubank et al. (2001) evaluated the benefit of echo-contrast-enhanced Doppler sonography in the differentiation of benign versus malignant breast lesions in 38 patients who had undergone surgical removal of a malignant breast mass. The baseline ultrasound examination showed an SE of 50% and an SP of 86%; after contrast enhancement, the ultrasound findings demonstrated an SE of 100% and an SP of 96%. These findings suggest that contrast-enhanced sonography aids in the differentiation of local recurrence from benign scar lesions. Therefore, US may be most useful when it is abnormal, but a normal result cannot exclude the presence of active disease.

CT imaging, by virtue of its cross-sectional display, has been widely used in recent years. However, reports in the literature differ with regard to the diagnostic accuracy of CT imaging in detecting recurrent and/or metastatic breast cancer, with SE ranging from 40 to 92% and SP from 41 to 100% (Radan et al. 2006; Ternier et al. 2006; Winehouse et al. 1999; Bäz et al. 2000; Gallowitsch et al. 2003; Piperkova et al. 2007; Riebe et al. 2007; Armington et al. 1987). Recently, CT has been the main modality used to evaluate mediastinal nodes in oncology, but as this technique uses size as the main criterion to assess nodal status, it is limited by poor SE. Landheer et al. (2005) also found that metastatic lymph nodes are often not identified by CT, and those smaller than 1 cm are often described as non-pathological. Because of their small size and anatomical position, it is difficult to confirm a pathological diagnosis. Moskovic et al. (1992) found that the detection rate of CT for breast cancer recurrence in patients without a palpable axillary mass is extremely low, and they suggested that this technique is unjustified as a screening tool for clinically occult axillary disease in patients with arm symptoms following axillary surgery or radiation therapy for breast cancer. Similarly, Armington et al. (1987) demonstrated that 11 of 30 patients with axillary and supraclavicular lesions were missed because of inadequate visualization of the axillary apex with CT imaging. To date, early detection of metastases by repeated conventional imaging tests (CT, ultrasound, and bone scintigraphy) has not been shown to be of benefit over routine follow-up in terms of patient survival (McLoud et al. 1992; Webb et al. 1991).

Previous studies have demonstrated that contrast-enhanced MRI of the breast is a sensitive modality for the detection of breast tumor recurrence, with an SE of nearly 100%, and this has become one of the most common indications for the examination (Kneeland et al. 1987; de Verdier et al. 1993; Bilbey et al. 1994; The GIVIO Investigators 1994). Preda et al. (2006) investigated 93 consecutive patients with breast cancer; the SE, SP, and NPV of MRI for the diagnosis of recurrent breast cancer were 93.8, 90, and 98.8%, respectively. The NPV of MRI, which indicates a very low likelihood of new malignancy if MRI defines the lesion as benign, is impressively high. Preda et al. suggest that lesions graded by MRI as Fischer I–II (BI-RADS I–II) can be safely monitored with the usual yearly follow-up. A repeat MRI examination after 6 months is recommended for lesions graded as Fischer III (BI-RADS III), if there is no clinical suspicion of recurrence before 6 months. For lesions graded higher than Fischer IV (BI-RADS IV), further cytological or histological evaluation is mandatory. This result is in line with the earlier result of Heywang-Köbrunner et al. (1993), who reported an NPV of 100%. Schmidt et al. (2008) compared FDG-PET/CT and whole-body MRI in patients with recurrent breast cancer and found that whole-body MRI showed a higher diagnostic accuracy (94%) than FDG-PET/CT (90%). In our study, we synthesized the currently available information on MRI in detecting recurrent and/or metastatic breast cancer, and found that the pooled sensitivity, pooled specificity and AUC were 0.9500, 0.929 and 0.9718, respectively. On the basis of current evidence, the overall diagnostic ability of MRI and PET was similar; however, MRI had the advantages of excellent contrast in soft tissue and parenchymal structures and larger anatomical coverage compared to PET/CT (skull base to proximal femurs).

SMM is a method by which breast pathology is identified using a radiopharmaceutical. The agent used can be tumor specific, such as 99mTc-sestamibi (99mTc-MIBI), or a non-specific tracer, such as 99mTc-methylene diphosphonate (99mTc-MDP) or Thallium-201. Several clinical studies have reported that 99mTc-MIBI SMM is accurate in differentiating palpable breast lesions, and the utility of the technique in decreasing the number of breast biopsies has been emphasized (Landheer et al. 2005; Kao et al. 1994). Although multi-center trials have been done, SMM has not been widely adopted to resolve cases that are equivocal by mammography (Khalkhali et al. 1995; Tolmos et al. 1998). The major problem is the lower SE of SMM for non-palpable tumors. Tiling et al. (1998) performed a meta-analysis and showed that SMM may be useful in recurrent breast cancer because post-surgical or post-radiotherapy changes make anatomical imaging methods of limited use. However, because the number of patients studied was rather small, SMM cannot be recommended for the detection of recurrent and/or metastatic breast cancer.

PET with the radiolabeled glucose analog FDG is a method based on the increased glucose metabolism of malignant tumors. It can reveal functional information that even the most exquisitely detailed anatomic image cannot provide. FDG-PET seems to have reasonable sensitivity and specificity in the detection of recurrent and metastatic breast cancer, particularly in the subset of patients presenting with elevated tumor markers (Aide et al. 2007). Suárez et al. (2002) reported that CA153 blood levels above 60 U/ml were always associated with a positive PET, while CA153 blood levels below 50 U/ml were always associated with a negative one. Liu et al. (2002) obtained similar results: the diagnostic SE and accuracy of FDG-PET in patients with suspected recurrent breast cancer and asymptomatically elevated tumor markers were 96 and 90%, respectively. When compared to CT and MRI, PET was shown to be superior in the detection of mediastinal and IM node metastases (Eubank et al. 2001; Goerres et al. 2003). PET has also been shown to be superior to bone scintigraphy in detecting bone metastases (Kao et al. 2000).

PET–CT combines a full-ring-detector clinical PET scanner with a multi-detector row helical CT scanner, which allows contemporaneous and co-registered acquisition of both PET and CT images (Fueger et al. 2005). In a retrospective review of 75 patients with suspected breast cancer, Tatsumi et al. (2006) compared the performance of PET and PET/CT. PET/CT resulted in improved diagnostic confidence compared with PET in 60% of patients and in 55% of regions. Two other publications (Radan et al. 2006; Pecking et al. 2001) reached similar conclusions; the use of PET/CT technology indicated only a marginal improvement in diagnostic accuracy, with reported SE, SP and accuracy rates of 90, 71, 83%, and 94, 84, 99%, respectively. Most importantly, several studies demonstrated that FDG-PET/CT had an impact on the management of 51–69% of patients (Radan et al. 2006; Eubank et al. 2004).

To our knowledge, this meta-analysis is the first report to assess and compare summary estimates of overall diagnostic ability for the non-invasive methods currently used for detecting recurrent and/or metastatic breast cancer. When these methods were compared with each other in this clinical context, our meta-analysis showed that US had the highest SP and PET had the highest SE. The AUC of MRI and PET, whether interpreted with or without the use of CT, was higher than that of US or CT, but there was no statistically significant difference when PET or MRI was compared with SMM. Because US had the highest SP, an abnormal US image was a strong indication of recurrent tumor; however, US had disadvantages in cases of fat necrosis and structural distortion after surgery, and furthermore its results do not usually alter the management plan in terms of biopsy or follow-up determined on the basis of physical and/or mammographic findings (Bruneton et al. 1986). Therefore, additional imaging information on the recurrent and/or metastatic foci was necessary for a highly suspected patient with an indeterminate US. In our meta-analysis, both MRI and PET had the highest SE, which resulted in a higher cancer detection rate. Given its high expense and modest whole-body radiation exposure, PET was not suited for screening purposes in breast cancer. Therefore, MRI should be the next diagnostic step in patients with an indeterminate or low probability of malignancy. Because whole-body screening for metastases with MRI is impractical in most circumstances, PET had its own advantages in whole-body surveillance for metastases. When MRI showed an indeterminate or benign lesion or was not applicable (e.g., pacemaker), FDG-PET could be performed in addition. Furthermore, a lesion that was indeterminate or benign on MRI and negative on PET indicated a very low probability of malignancy.
In conclusion, MRI seemed to be a more useful supplement to current surveillance techniques to assess patients with suspected recurrent and/or metastatic breast cancer.

Our study had some drawbacks. Firstly, the effect of patient characteristics could not be examined because of a lack of data. Secondly, the reference standard used in this systematic review ranged from histopathologic analysis to follow-up. Thirdly, most results showed heterogeneity, suggesting the need for high-quality prospective studies and multi-center trials. Fourthly, publication bias may have been present in our meta-analysis. It is possible that our pooled estimates were too optimistic, as studies with favorable results were more likely to be submitted and published. Finally, further cost-effectiveness analysis of surveillance techniques in breast cancer should be conducted.

Conclusion

In conclusion, MRI seemed to be a more useful supplement to current surveillance techniques to assess patients with suspected recurrent and/or metastatic breast cancer. If MRI shows an indeterminate or benign lesion or is not applicable (e.g., pacemaker), FDG-PET could be performed in addition.