Main

The risk assessment and management of therapeutic strategies for prostate cancer are still based mainly on clinical criteria, for example, patient characteristics, clinical stage, serum PSA, and histopathological features, in particular, tumour extent and Gleason score. However, not all low-grade prostate cancers with a Gleason score less than 7 follow an indolent course, and carcinomas with a Gleason score of 7 or higher, on the other hand, may (rarely) present with a rather favourable clinical behaviour. These discrepancies have become increasingly apparent, particularly following the updated recommendations of the 2005 ISUP Gleason System (Harlan et al, 2003), emphasising the pressing need for additional molecular prognostic markers. Interestingly, the lack of reliable molecular biomarkers for the risk assessment of prostate cancer stands in sharp contrast to an ever-growing number of proposed marker candidates. Despite nearly 2900 PubMed hits for the search term ‘prostate cancer prognostic marker’ (July 2014), no evidently relevant prognostic biomarker has made its way into clinical practice (Kristiansen, 2012). This issue clearly stresses the necessity to reflect on the validity of biomarkers and biomarker development or even some aspects of our scientific publication culture.

Although immunohistochemistry is only a semiquantitative technique, it remains an essential methodological approach in a large fraction of published biomarker studies. This prompted us to experimentally verify the prognostic value of a wider range of previously published immunohistochemical prognostic markers for prostate cancer in a representative monocentric cohort of radical prostatectomy patients.

Materials and methods

Criteria for identification of prognostic markers

In a first step, prognostic markers for prostate cancer were identified in a PubMed search (‘prostate cancer prognostic/prognosis immunohistochemistry’). Eligibility criteria for study inclusion were: (i) cohort size—a number of patients >50 in studies using conventional slides or >100 for tissue microarray (TMA)-based studies, (ii) multivariate prognostic value demonstrated in Cox analysis (P<0.05), and (iii) conclusive immunohistochemistry protocol and antibody availability. From the pool of resulting candidates, we obtained 28 markers for validation.

Prostate cancer cohort for validation

The cohort consisted of 238 patients who had undergone radical prostatectomy for treatment of primary prostate cancer between 1999 and 2006 and for whom follow-up data were available. This cohort was compiled as a TMA as described previously, and all tumours were reviewed and graded according to the ISUP 2005 recommendations (Gerhardt et al, 2011; Mortezavi et al, 2011; Beer et al, 2012; Gerhardt et al, 2012). Follow-up data were obtained from review of patients’ medical records. Median follow-up time for all patients was 60 months (mean 55 months). A biochemical recurrence, defined as a rising PSA level exceeding 0.1 ng ml−1 after having reached a nadir post surgery, occurred in 111 (46.6%) patients. Median time to PSA relapse was 25 months (interquartile range, 12–46 months); among men not experiencing progression, the median follow-up was 72 months (interquartile range, 60–96 months). In total, 13.4% of patients were lost to follow-up in the first 4 years.

The median patient age was 64 years (range 46–75 years). The pT-category was pT2 in 152 patients (63.8%), pT3 in 76 patients (31.9%), and pT4 in 10 patients (4.3%). A total of 151 patients (63.4%) presented with complete surgical resection (R0 resection), whereas 84 patients (35.3%) had undergone a resection with positive margins (R1 resection). The Gleason score was <7 in 45 patients (18.9%), equalled a total score of 7 in 133 patients (55.9%), and was >7 in 60 patients (25.2%). Pre-operative PSA levels ranged from 0.39 ng ml−1 to 357 ng ml−1 (median 10.5 ng ml−1, no data were available for 11 patients). This study was approved by the Cantonal Ethics Committee of Zürich (approval number StV 25-2007).

Immunohistochemistry

Automated immunohistochemical staining was carried out on one of two technical platforms (Ventana Medical Systems, Tucson, AZ, USA/Leica Microsystems, Melbourne, Australia). Where possible, antibodies were selected according to the preceding publications. Staining protocols including antibody dilution, pretreatment, and technical platform are given in Supplementary Table S1.

Evaluation of immunoreactivity

Immunohistochemical slides were evaluated and scored by a single observer (FH) after instruction by, and under the supervision of, an experienced genito-urinary pathologist (GK). Wherever possible, we adopted the scoring system described in the original publications of the respective markers. Technical details of scoring (localisation, evaluated feature, and scoring scale) are given in Supplementary Table S2.

Statistics

SPSS 20 (IBM SPSS, Version 20.0, Armonk, NY, USA) was used for descriptive statistics. Further data analysis was performed using the R language for statistical computing, version 3.1.0. The statistical significance level was set at 0.05. To verify the prognostic value of the selected markers, univariate and multivariate Cox proportional hazards regression was applied. Multivariate Cox regression was performed with inclusion of the individual selected markers, Gleason score, log10(pre-OP PSA), dichotomous pT status (pT1/2 vs pT3/4), margins (R0 vs R1), and patient age. Additionally, markers were dichotomised using the web-based tool ‘cut off finder’ (Budczies et al, 2012) and were analysed using Kaplan–Meier estimates and log rank tests. A post hoc power analysis of the respective models was performed using the PASS 2008 software (NCSS, Kaysville, UT, USA).
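
As an illustration of the Kaplan–Meier estimates mentioned above, the following minimal sketch computes the product-limit estimate of PSA relapse-free survival in plain Python. It is not the analysis code used in this study (which was run in R and SPSS); the function name `kaplan_meier` and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time (e.g. months) for each patient
    events -- 1 if PSA relapse was observed, 0 if the patient was censored
    """
    relapses = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients relapsing or censored at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = relapses.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone observed up to t (event or censored) leaves the risk set.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data, not taken from the study cohort.
toy_times = [6, 12, 12, 25, 40, 60, 72, 72]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]
for month, s in kaplan_meier(toy_times, toy_events):
    print(f"{month:>3} months: S(t) = {s:.3f}")
```

Patients censored at the same time as a relapse are counted as still at risk at that time, which is the usual convention.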

Results

Characteristics of selected biomarker studies from the literature

Of the markers matching our inclusion criteria, 27 candidates were selected, 3 of which (androgen receptor, oestrogen receptor-alpha and -beta) each yielded two data points (stromal and epithelial immunoreactivity), resulting in 30 biomarkers for further validation (Table 1). In the originally published studies, cohorts had a median size of 225 patients (range 53–2724), the median follow-up time covered 5.0 years (range 1–12), and the median hazard ratio for disease progression of the reported biomarkers was 2.42 (range 1.1–7.69; values of markers with hazard ratios <1 have been included as their reciprocal value). All selected biomarkers play a role in a broad spectrum of tumour-relevant processes including, for example, proliferation or apoptosis, cell cycle control, cell adhesion, or hormone signalling. Most studies (n=23) were based on radical prostatectomy (RPE) cohorts, four studies described watchful waiting cohorts, and one study described a mixed population (RPE and watchful waiting).
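
The handling of hazard ratios below 1 described above can be sketched as follows. The helper name `protective_to_risk` and the input values are hypothetical and serve only to show how reciprocal values make protective and adverse markers comparable before a median is taken.

```python
from statistics import median


def protective_to_risk(hazard_ratio):
    """Express a hazard ratio on the 'risk' side: values below 1 are inverted."""
    if hazard_ratio <= 0:
        raise ValueError("a hazard ratio must be positive")
    return 1.0 / hazard_ratio if hazard_ratio < 1 else hazard_ratio


# Hypothetical hazard ratios for illustration only (not the values of Table 1).
reported = [0.13, 0.40, 1.1, 2.42, 3.5]
comparable = [protective_to_risk(hr) for hr in reported]
print([round(hr, 2) for hr in comparable])
print(round(median(comparable), 2))
```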

Table 1 Overview of selected prognostic markers for prostate cancer: original studies

Immunohistochemistry and expression of selected markers

For the markers included in this study, representative examples of positive immunoreactivity in prostate cancer are shown in Figure 1. Expression frequencies and cut-off values are illustrated in Figure 2. We attempted to adopt the cut-off values suggested in the original studies, but allowed for adjustment (optimisation with the cut-off finder tool) in order to ease verification.
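
A cut-off optimisation of this kind can be sketched as a scan over all observed scores, keeping the cut-off with the largest two-group log-rank statistic. This is only an assumption about the general principle; it does not reproduce the web-based cut-off finder tool, and the function names, the `min_group` parameter, and the toy data are hypothetical. Note that the P value of an optimised cut-off is unadjusted for the multiple cut-offs tested and is therefore optimistic.

```python
import math


def logrank_chi2(times, events, in_high_group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n_high = sum(1 for u, g in zip(times, in_high_group) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_high = sum(
            1 for u, e, g in zip(times, events, in_high_group) if u == t and e and g
        )
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return (observed - expected) ** 2 / variance if variance > 0 else 0.0


def best_cutoff(scores, times, events, min_group=2):
    """Scan every observed score as a cut-off ('high' means score > cut-off)."""
    best = None
    for cut in sorted(set(scores))[:-1]:
        high = [s > cut for s in scores]
        if min(sum(high), len(high) - sum(high)) < min_group:
            continue  # skip splits that leave one group too small
        chi2 = logrank_chi2(times, events, high)
        if best is None or chi2 > best[1]:
            # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
            best = (cut, chi2, math.erfc(math.sqrt(chi2 / 2)))
    return best  # (cut-off, chi-square, unadjusted P value) or None


# Hypothetical toy data: immunoreactive scores, months of follow-up, relapse flags.
toy_scores = [0, 1, 2, 3, 4, 6, 8, 9, 12, 12]
toy_times = [70, 65, 60, 72, 55, 12, 20, 5, 15, 8]
toy_events = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(best_cutoff(toy_scores, toy_times, toy_events))
```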

Figure 1

Immunoreactivity of selected prognostic biomarkers. Representative examples of positive immunostaining are shown for each marker, highlighting the typical subcellular localisation (magnification × 200).

Figure 2

Frequencies of immunohistochemically detected expression of selected prognostic markers in primary prostate carcinomas. Markers labelled with an asterisk were recorded as immunoreactive scores and summarised for easier visualisation (IRS 0: negative, IRS 1–3: weakly positive, IRS 4–8: moderately positive, IRS>8: strongly positive). Black bars indicate the cut-off used for dichotomising the variable for Cox regression analysis. Abbreviation: ND: not determined/missing data.

Associations of biomarkers with clinico-pathological parameters

The association of marker expression in prostate cancer with pT category, Gleason score, and pre-operative serum PSA was analysed by Spearman rank correlation. As expected for independent prognostic markers, most associations were not statistically significant. Some dependencies became apparent, particularly with Gleason score and serum PSA, but none showed a high correlation coefficient (Supplementary Table S3).
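
For readers unfamiliar with the method, Spearman's rank correlation is the Pearson correlation of the ranks, with tied values receiving the mean of their ranks. The following is a minimal sketch in plain Python; the function names and the toy values are hypothetical and do not stem from the study data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (division by zero) if x or y is constant


# Hypothetical toy data: Gleason scores and immunoreactive scores of one marker.
toy_gleason = [6, 7, 7, 9]
toy_irs = [1, 2, 3, 4]
print(round(spearman_rho(toy_gleason, toy_irs), 3))
```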

Prognostic significance of putative prognostic biomarkers

Initially, we conducted a post hoc power analysis and found that the available sample size of 238 analysable patients on the TMA would be sufficient to detect clinically relevant hazard ratios at a significance level of 0.05 and a power of almost 100% (Supplementary Figure S1).
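
To give an impression of how such a power estimate behaves, the sketch below uses Schoenfeld's approximation for a dichotomous covariate in a proportional hazards model. This is a simplified stand-in and not the procedure implemented in PASS 2008: it ignores adjustment for other covariates, and the equal split between marker-positive and marker-negative patients (`high_fraction=0.5`) is an assumption. The 111 events and the hazard ratio of 2.42 are taken from the text above; the other hazard ratios are arbitrary examples.

```python
from math import log, sqrt
from statistics import NormalDist


def cox_power(n_events, hazard_ratio, high_fraction=0.5, alpha=0.05):
    """Approximate power of a two-sided test of a binary covariate (Schoenfeld)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected test statistic: sqrt(events * p * (1 - p)) * |ln(HR)|.
    signal = sqrt(n_events * high_fraction * (1 - high_fraction)) * abs(log(hazard_ratio))
    return NormalDist().cdf(signal - z_alpha)


# 111 PSA relapses were observed in the cohort; 2.42 is the median published hazard ratio.
for hr in (1.5, 2.0, 2.42):
    print(f"HR {hr}: approximate power {cox_power(111, hr):.3f}")
```

The approximation depends only on the number of events, not on the total number of patients, which is why the number of relapses rather than the cohort size of 238 enters the calculation.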

Kaplan–Meier analyses identified 11 markers as significant in univariate analysis: AR epithelial/stromal, CB1R, CRGA, E-Cadherin, EZH2, Ki-67, NFkB, p21, p27, and PSMA (Table 2). In univariate Cox analyses, only nine markers were found to be significant (nuclear and stromal AR, CB1R, CRGA, E-Cadherin, NFkB, p21, p27, and PSMA); CD10 was of borderline significance (P=0.051) and Ki-67 dropped out. Multivariate Cox regression, which was applied to analyse the prognostic value of the markers in combination with other prognostic factors, demonstrated significance for only four markers: AKT1, stromal AR, EZH2, and PSMA (Table 3). A marker-wise comparison of published and verified multivariate prognostic value, illustrated by a forest plot, highlights these differences (Figure 3). For three of the four confirmed prognostic markers, the verified hazard ratio was lower than originally reported; only for one (PSMA) was a higher hazard ratio found.

Table 2 Kaplan–Meier analysis of PSA relapse-free survival for selected prognostic markers following cut-off optimisation
Table 3 Univariate and multivariate Cox analysis of PSA relapse-free survival for verified prognostic markers
Figure 3

Forest plots for the prognostic markers predicting biochemical PSA relapse in multivariate analysis. Comparison of hazard ratios and 95% confidence intervals for published studies (left) and the Zurich prostate cancer cohort of 238 patients (right). Statistically significant markers are depicted in black, non-significant markers in red.

To further analyse the prognostic values in the clinically relevant subgroup of tumours with Gleason scores of 6 or 7, another multivariate model was calculated (Table 4). In this tumour subgroup, again AKT1, stromal AR, and PSMA and additionally CB1R, CD10, E-Cadherin, and N-Cadherin were confirmed.

Table 4 Multivariate Cox analysis of PSA relapse-free survival for verified prognostic markers in the subgroup of tumours with Gleason scores ≤7

Discussion

The present study is the first to systematically analyse and verify a larger set of proposed prognostic immunohistochemical markers for prostate cancer that have been demonstrated to be significant multivariate predictors of disease progression in previous studies. Although a series of quality criteria needed to be fulfilled to allow study inclusion, we were unable to confirm the majority of prognostic markers selected. Only 10 out of 28 markers (35%) could be verified in a univariate Cox approach and only 4 (14%) were subsequently confirmed as significant in a multivariate setting. According to these results, the loss of stromal androgen receptor expression was the biomarker with the most significant prognostic impact (hazard ratio 3.33), a finding that doubtlessly deserves further investigation. This rather remarkable discrepancy between published data and the results of this study might be explained by either significant shortcomings of this verification study or general deficits in the publication culture concerning prognostic markers that may be either over-optimistic or too permissive.

Concerning the former, great care was taken to minimise potential biases in the conduct of this study. The study cohort, which comprised 238 primary prostate cancer patients, is a well-established, contemporary, monocentric prostatectomy cohort that has been extensively used in previous publications (Tischler et al, 2010; Gerhardt et al, 2011; Mortezavi et al, 2011; Beer et al, 2012; Gerhardt et al, 2012). Although the number of patients may appear low at first sight, we want to point out that it already exceeds the median cohort size of the studies re-analysed. Further, the use of study cases compiled as a TMA not only allows for a convenient high-throughput workflow but also serves as a model of biopsy-detected prostate cancer. In the present setup, each index tumour is represented by a single core. Even though this introduces a potential sampling bias and may compromise the analysis of highly heterogeneously expressed proteins, it also increases the likelihood of identifying robust clinical prognostic markers, which ought to be detectable in a single core of a tumour-positive biopsy. Protocols for immunohistochemistry were adapted as closely to the original protocols as possible, and immunohistochemical staining was performed on standardised automated platforms that are also used for routine diagnostics at the Institute of Surgical Pathology of the University Hospital Zurich. To minimise interobserver variability, a single observer evaluated all slides. Our attitude towards statistical marker verification was deliberately benevolent, allowing for optimisation of cut-off values. Even when simplifying the multivariate model, excluding margins and patient age, the number of significantly prognostic biomarkers did not increase (data not shown). However, a caveat to the validity of the current study is that Ki-67, which is presumably the best verified prognostic marker in prostate cancer, only showed univariate significance (Kristiansen, 2012; Fisher et al, 2013). Admittedly, the composition of our study cohort may lead to a bias, with cases enriched for high-grade and high-stage tumours, leading to a relapse rate of 46%, which is approximately 15 percentage points above the usual rates of consecutive RPE cohorts. However, this acknowledged bias should be expected to ease the verification of biomarker candidates and is therefore considered useful in the setting of this study.

The most apparent deficit of the original studies lies in their study design, which was generally based on a retrospective analysis of a single cohort. This type of analysis is prone to model overfitting and is nowadays increasingly regarded as only hypothesis-generating because the important validation is missing. A more appropriate approach includes an initial analysis of a training cohort to establish a test hypothesis and a cut-off value, which is then verified in an independent testing cohort. In a way, this study provides the necessary testing cohort for these studies, even though under considerably tightened conditions with probable differences in surgery, in the pre-analytic steps, the immunostaining protocols, and the analytical attitude of the observers in the interpretation of immunostainings. The impact of correct statistical handling in biomarker studies has long been recognised and has led to several recommendations (Harris, 2005; McShane et al, 2005b), most importantly, the REMARK (‘REporting recommendations for tumor MARKer prognostic studies’) guidelines (Simon and Altman, 1994; Hayes et al, 1996; Altman, 1998). These provide a comprehensive description of quality criteria for prognostic biomarker studies but are still not widely accepted, as Mallett et al (2010) recently demonstrated in a meta-analysis; an observation we can only confirm.

An additional potential bias lies in the composition of the respective cohort under analysis, which may also influence biomarker performance (Braun et al, 2011). It remains unclear whether a biomarker that performs well in clinically detected populations shows similar performance in a contemporary PSA screen-detected cohort. Also, a possible influence of patient ethnicity cannot be excluded. Last but not least, the evolution of Gleason grading in the last decade (ISUP 2005) may introduce a significant bias and it is unclear which type of Gleason grading the individual studies were based on.

A general deficit of most TMA-based studies is that they do not demonstrate the absence of relevant construction bias. Ideally, a new TMA should be verified with a panel of established markers prior to the analysis of new candidates, and these confirmatory data should be included in the Supplementary Material of the first publication. We further suggest making the immunohistochemistry raw data sets publicly available in order to allow re-analyses and insightful meta-analytic studies. This would allow for proper post-publication data review and would prospectively increase the quality of published papers. Even though high-quality journals demand the deposition of genomic data in centralised repositories, there is no widely acknowledged infrastructure for immunohistochemistry data. This may be partly due to the widespread notion of immunohistochemistry being a somewhat arbitrary technique yielding semiquantitative results. However, the growing number of immunohistochemistry-based biomarker studies indicates the necessity for better tools to increase their quality and to allow for a better comparison of the published data.

The main limitation of immunohistochemistry as a technique is the lack of normalisation. Many factors influence the final signal intensity, including differences in pre-analytic steps, section thickness, antigen retrieval techniques, antibody quality and concentration, and detection systems. To our knowledge, the approach of normalising an immunohistochemical staining signal by simultaneous measurement of a ‘housekeeping protein’ has not been pursued widely and is a field that needs to be developed in the future. Certainly, this would necessitate computer-aided image analysis, which, taken by itself, has not been a major breakthrough so far (Rizzardi et al, 2012).

The search for molecular prognostic markers for malignant tumours has been a central aim of biomedical researchers in the last two decades. A more precise diagnosis provided by a molecular marker may facilitate individualised patient treatment. Additionally, prognostic markers might help to unravel the molecular background of tumour progression and even represent an attractive new therapeutic target. Ideally, a prognostic factor only has a few degrees of freedom and allows for a dichotomous or trichotomous readout (e.g., negative/positive or nil/low/high) to ensure a high degree of reproducibility. Unfortunately, this does not hold true for the majority of biomarkers measured at the expression level, which necessitate cut-off values to delineate meaningful prognostic subgroups. As these cut-off values are mainly arbitrarily chosen, and the most popular platform to determine protein expression in tumour tissues is immunohistochemistry, it is not surprising that the reproducibility of these prognostic markers is limited (Altman et al, 1994). Promising future exceptions could be genetic events of prognostic value, which may be detected with mutation-specific antibodies. Although it is difficult to overcome the inherent obstacles of contemporary immunohistochemistry, other biological platforms may offer a more robust alternative. As an example, several mRNA expression signatures have been proposed to predict disease progression and are currently undergoing intense verification (Cuzick et al, 2011; Karnes et al, 2013; Wu et al, 2013).

Although most authors of prognostic biomarker studies in prostate cancer investigate markers for therapy planning at the biopsy stage, these studies, as exemplified in our selection, mainly use RPE specimens (and not preoperatively sampled biopsy material) and analyse disease progression following radical prostatectomy. The inherent but to our knowledge unproven assumption is that markers of disease progression following surgery can also be used upfront to estimate the tumours’ endogenous aggressiveness; or, in other words, that predictive markers are also prognostic markers. The distinction of prognostic markers that estimate the natural course of disease and predictive markers that estimate the response to therapy is still often ignored. This aspect should be considered carefully, for surgery might indeed heal and not only ameliorate the disease. As the TMPRSS2-ERG translocation illustrates, a biomarker may well allow prognostication in untreated patients (Demichelis et al, 2007; Attard et al, 2008) but can still fail to predict progression following therapy (Minner et al, 2011; Pettersson et al, 2012). Other markers may work in both settings, as demonstrated by the cell cycle proliferation signature, proposed by Cuzick et al (2011).

We are convinced that the final and most crucial step in verification of a biomarker for therapy planning at the initial biopsy stage would be a prospective trial in an active surveillance cohort. A reliable biomarker should then be able to identify insignificant tumours that can safely be kept under surveillance for a longer time and do not necessitate active treatment because of the criteria of tumour progression. Even then, long-term follow-up data with either cancer-specific death or onset of castration-refractory disease as an endpoint would be highly desirable. Another open point of discussion is whether the commonly used surrogate marker of disease progression, PSA relapse, delineates a clinically meaningful endpoint, because many patients with a PSA progression will die of other, non-cancer-related causes (Attard and de Bono, 2009). To complicate the matter, the definition of PSA progression also varies between studies (Nielsen and Partin, 2007). It also has to be kept in mind that molecular prognostic biomarkers alone are highly unlikely to supersede the clinico-pathological parameters that form the basis of commonly used nomograms; they can only add prognostic information to these. This is in principle good news for histopathologists, who should strengthen their efforts to provide even more standardised reports in the future, irrespective of molecular developments.

In summary, this study sheds some very critical light on contemporary immunohistochemistry studies that aim to identify prognostic biomarkers for prostate cancer. Even acknowledging the inherent limitations of this comprehensive meta-analysis and verification study, the majority of published biomarkers could not be confirmed. This is disappointing but in excellent concordance with the sceptical view of biostatisticians and may also be true for other tumour entities (Ioannidis, 2005, 2013). We feel that in addition to the suggestions made above, the REMARK guidelines, which summarise important cornerstones of biomarker investigations, clearly deserve a wider reception, better acknowledgment, and stricter adherence in order to increase the quality of published data in the future (McShane et al, 2005a).