On the prognostic value of survival models with application to gene expression signatures

Stat Med. 2010 Mar 30;29(7-8):818-29. doi: 10.1002/sim.3768.

Abstract

As part of the validation of any statistical model, it is good statistical practice to quantify the prediction accuracy and the amount of prognostic information represented by the model; this includes gene expression signatures derived from high-dimensional microarray data. For right-censored survival data, several approaches exist that measure the gain in prognostic information compared with established clinical parameters or biomarkers in terms of explained variation or explained randomness. They are either model-based or use estimates of prediction accuracy. As these measures differ in their underlying mechanisms, they vary in their interpretation, assumptions and properties, in particular in how they deal with the presence of censoring. It remains unclear under what conditions and to what extent they are comparable. We present a comparison of several common measures and illustrate their behaviour in high-dimensional situations in simulation examples as well as in applications to real gene expression microarray data sets. An overview of available software implementations in R is given.
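To make the idea of a model-based measure concrete, the following is a minimal sketch of one widely used likelihood-ratio R², computed from the log partial likelihoods of a null model and a fitted model. The abstract does not list the measures compared, so this particular measure is an assumption chosen for illustration, and the function name, arguments and numeric values are hypothetical. Python is used only for the sketch; the software overview in the paper concerns R.

```python
import math


def likelihood_ratio_r2(loglik_null, loglik_model, n):
    """Likelihood-ratio R^2: 1 - exp(-LR / n), with LR = 2 * (l_model - l_null).

    loglik_null  -- log (partial) likelihood of the model without covariates
    loglik_model -- log (partial) likelihood of the fitted model
    n            -- normalising count (assumption: either the number of
                    subjects or the number of observed events)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    lr = 2.0 * (loglik_model - loglik_null)  # likelihood-ratio statistic
    return 1.0 - math.exp(-lr / n)


# Hypothetical values, not taken from the paper.
r2_subjects = likelihood_ratio_r2(-120.0, -100.0, n=50)  # normalise by subjects
r2_events = likelihood_ratio_r2(-120.0, -100.0, n=25)    # normalise by events
```

With the same likelihoods, normalising by the number of observed events rather than the number of subjects gives a larger value whenever some observations are censored. This is one simple way in which the treatment of censoring changes the numerical value, and hence the interpretation, of such a measure.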

MeSH terms

  • Biostatistics*
  • Computer Simulation
  • Gene Expression Profiling / statistics & numerical data*
  • Humans
  • Lymphoma, Large B-Cell, Diffuse / genetics
  • Lymphoma, Large B-Cell, Diffuse / mortality
  • Oligonucleotide Array Sequence Analysis / statistics & numerical data*
  • Prognosis
  • Software
  • Survival Analysis*