Introduction

Clinical trials in Alzheimer’s disease (AD) are currently going through a difficult phase, as most of those aimed at modifying disease progression in patients who already have manifest dementia have failed. There is widespread consensus that, to overcome the current problems, future clinical trials will need to be performed earlier in the neurodegenerative process that ultimately leads to the manifestation of dementia in AD [1], in other words at the stage when patients typically present with mild cognitive impairment (MCI) [2, 3].

However, at the MCI stage of AD, clinical and neuropsychological measures, such as the ADAS-cog scale and measures of activities of daily living, are subject to high measurement variation, while actual cognitive changes develop slowly and show considerable interindividual variability in progression rates. Because the use of “conversion to dementia” as an outcome variable in trials of patients with MCI requires very large samples and long study durations, it is difficult to achieve adequate study power. Indeed, there is a need for markers providing more objective and accurate measures of disease progression, as this would allow realistic sample sizes and study durations, and therefore more powerful studies. Another possible strategy is that of enriching the study cohort of MCI patients with patients who are at particularly high risk of developing dementia, to obtain higher conversion rates and thus increase the power of the study [4, 5].
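The effect of enrichment on study power can be made concrete with a standard normal-approximation sample size formula for comparing two proportions. The following is a minimal sketch only: the function name, the conversion rates (15 % versus 30 % over the trial period) and the assumed 30 % relative reduction in conversion under treatment are hypothetical illustration values, not figures taken from the studies cited here.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm_proportions(p_control, relative_reduction, alpha=0.05, power=0.80):
    """Approximate subjects per arm needed to detect a difference between
    two conversion proportions (two-sided test, normal approximation)."""
    p_treat = p_control * (1.0 - relative_reduction)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treat) / 2.0
    numerator = (
        z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * sqrt(p_control * (1.0 - p_control) + p_treat * (1.0 - p_treat))
    ) ** 2
    return ceil(numerator / (p_control - p_treat) ** 2)


# Hypothetical conversion rates: unselected MCI cohort vs. biomarker-enriched cohort.
n_unselected = n_per_arm_proportions(p_control=0.15, relative_reduction=0.30)
n_enriched = n_per_arm_proportions(p_control=0.30, relative_reduction=0.30)

print("per arm, unselected cohort:", n_unselected)
print("per arm, enriched cohort:  ", n_enriched)
```

Under these assumptions the enriched cohort needs markedly fewer subjects per arm for the same relative treatment effect, which is the rationale for enrichment; the sketch ignores drop-out, screening failures and the cost of the biomarker itself.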

Strategies to improve study power require the use of biomarkers. In this regard, imaging biomarkers appear particularly promising, namely hippocampal morphometry on magnetic resonance imaging (MRI) and positron emission tomography (PET) using either 18F-2-fluoro-2-deoxyglucose (FDG) or an amyloid tracer [6]. Because of the increasing importance of biomarkers in clinical trials, regulatory authorities, in particular the FDA in the US and the EMA in Europe, are beginning to define requirements for the qualification of markers used in trials conducted for the licensing of new therapeutic drugs [7].

Formal requirements for the qualification of biomarkers are divided into three increasingly strict levels. Biomarkers may be used:

1. As covariates at baseline for covariate-adjusted (or subgroup) analysis to increase study power [8]. This approach has traditionally been used in medicine for post hoc exploratory analyses, which do not qualify for drug licensing, but it could also be integrated into study protocols for primary analysis.

2. For sample enrichment. The FDA and EMA have adopted procedures for qualification of biomarkers for this purpose, and the EMA recently published an opinion document on CSF markers and amyloid PET [9] in which it concluded that these techniques qualify (under certain circumstances) “to identify patients with a clinical diagnosis of mild to moderate AD who are at increased risk to have an underlying AD neuropathology, for the purposes of enriching a clinical trial population”.

3. As surrogate markers of outcome. Very strict requirements have been suggested to prevent the licensing of drugs that would improve some measured parameters while not actually improving patients’ health status. Current examples of this possible discrepancy are provided by trials with anti-amyloid agents in AD, which successfully removed some amyloid deposits, as demonstrated on amyloid PET scans, but did not improve dementia [10]. Thus, the required criteria include evidence from randomized trials showing that improvement in the surrogate endpoint consistently leads to improvement in the target outcome [6]. No imaging biomarker has yet met the regulatory agencies’ criteria for qualification as a surrogate marker of outcome in neurodegenerative diseases, but imaging biomarkers are accepted as secondary outcome parameters.

Complementary to these strict regulatory requirements for trials conducted for drug licensing purposes, biomarker research is also endeavoring to improve the design and efficiency of proof-of-concept trials (typically in phases 1 and 2), which serve to gauge the likelihood that a given intervention will be successful in classical phase 2 and phase 3 studies. Traditionally, pharmaceutical companies have relied heavily on animal studies to plan trials in humans, but it is increasingly being recognized that these studies are poor predictors of clinical responses in humans. Ultimately, successful prediction of which individuals will likely benefit from a particular intervention, based on theragnostic (also known as theranostic) markers [11] rather than the current approach of classification by clinical diagnosis, could make clinical trials as well as clinical practice much more effective.

The cerebral metabolic rate of glucose (CMRglc) measured by FDG PET has been established in many previous non-interventional studies as a marker related to synaptic dysfunction, which is the main pathophysiological deficit underlying cognitive impairment in AD [12]. It is therefore being considered as a candidate biomarker that could be used, in the context of clinical trials of AD, to select patients, to demonstrate pharmacological action in phase 2 studies [13], and to assess disease progression [14, 15]. It has also been used as an outcome marker in several interventional studies, often in patient subgroups from phase 2 and 3 studies [16]. In the present paper, we will review these data (Table 1) and discuss perspectives for the use of FDG PET as an imaging biomarker in future AD trials.

Table 1 Interventional studies

Acute metabolic effects in interventional studies

Most interventional studies using FDG PET as an outcome parameter (listed in Table 1) are based on the concept that AD involves a deficit in synaptic activity and energy metabolism. Thus, improving the CMRglc appears to be an attractive therapeutic goal, glucose being the brain’s key energy substrate. Actually, the reduction of regional cerebral glucose metabolism in AD may be not just a consequence, but also a possible cause of impaired synaptic function, due to a disturbance of neuronal insulin signal transduction [17, 18].

In a study of the peroxisome proliferator-activated receptor gamma agonist rosiglitazone, patients received four FDG PET scans in total, which made it possible to distinguish between acute and long-term metabolic effects on progression [19]. A consistent (although not significant) trend toward an initial increase in CMRglc in all brain regions, including the cerebellum, at the first measurement after 1 month was seen in verum patients only, followed by metabolic decline over 12 months in both groups. This increase was consistent with pharmacological expectations of this anti-diabetic drug, but the authors did not report corresponding clinical measures. Of note, the initial increase was seen only in the evaluation of CMRglc indices, but vanished after calculation of metabolic ratios relative to the cerebellum (because the initial increase in CMRglc was also present in the cerebellum).
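Why a global increase disappears after normalization to the cerebellum can be shown with simple arithmetic. The sketch below uses hypothetical regional CMRglc values and a hypothetical uniform 5 % increase in all regions; it is not a reconstruction of the data or analysis in [19].

```python
from math import isclose


def ratios_to_reference(cmrglc, reference="cerebellum"):
    """Express each regional value as a ratio to the reference region."""
    ref_value = cmrglc[reference]
    return {
        region: value / ref_value
        for region, value in cmrglc.items()
        if region != reference
    }


# Hypothetical absolute CMRglc values (arbitrary units) at baseline.
baseline = {"parietal": 30.0, "temporal": 28.0, "frontal": 32.0, "cerebellum": 26.0}

# Hypothetical drug effect: the same 5 % increase in every region,
# including the cerebellum that serves as reference region.
followup = {region: value * 1.05 for region, value in baseline.items()}

ratios_before = ratios_to_reference(baseline)
ratios_after = ratios_to_reference(followup)

for region in ratios_before:
    absolute_change = followup[region] / baseline[region] - 1.0
    unchanged = isclose(ratios_before[region], ratios_after[region])
    print(f"{region}: absolute change {absolute_change:+.1%}, ratio unchanged: {unchanged}")
```

Because numerator and denominator are scaled by the same factor, the ratios stay the same even though every absolute value has risen; only changes that differ from the change in the reference region survive normalization.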

An increase in global CMRglc was also observed in a small pilot study of nerve growth factor (NGF) produced by genetically transformed cells [20]. NGF gene therapy is still under investigation and has now reached trial phase 2 (NCT01608061).

One of the earliest interventional studies employing FDG PET also showed a pronounced increase in glucose metabolism in most cerebral cortical regions after only 2 weeks of treatment with the nootropic drug piracetam [21], but ultimately it could not be demonstrated that the drug actually improves synaptic function or cognitive deficits [22]. Thus, a global increase in CMRglc may fail to correspond to an improvement of cognitive function. Interestingly, a closely related compound, the anti-epileptic drug levetiracetam, was recently shown to ameliorate hippocampal hyperactivity during a memory task in patients with amnestic MCI [23].

In a phase 2 study of neotrofin [24], which is thought to improve memory and cognitive function by inducing expression of neurotrophic factors, no global changes, but regional increases as well as decreases in glucose metabolism were observed. Temporal and parietal changes tended to show a positive correlation with changes in memory while, unexpectedly, behavioral improvements were associated with a metabolic decline in frontal brain regions. Thus, there emerged no consistent relationship between metabolic and symptom changes, and the drug ultimately failed to show efficacy in phase 3 [25].

In a study of the non-selective phosphodiesterase inhibitor propentofylline, Mielke et al. [26] demonstrated a significant improvement of regional glucose metabolism under activation by an auditory recognition task after 3 months of treatment, while a decrease in the activation effect was observed in the placebo group. Resting-state CMRglc did not change significantly. Some clinical improvement was observed on a digit symbol test only. Thus, interpretation of the findings was difficult and the drug has not since been recognized as an effective treatment for dementia [27], but is still being explored for various other CNS disorders [28] and used in veterinary medicine.

Acetylcholinesterase inhibitors (AChEIs) have proven efficacy in improving cognitive function in AD. FDG PET has played a role in demonstrating the action of tacrine [29] and metrifonate [30] as proof of principle for cholinergic agents, although these studies did not include a placebo group. Commercial development of these particular drugs was not pursued further as AChEIs with a more favorable side effect profile, i.e. donepezil, rivastigmine and galantamine, became available. Several FDG PET studies addressed the issue that not all patients actually experience substantial benefit when treated with AChEIs. In some studies [31, 32], patients who responded to a drug with a substantial improvement of cognitive function also demonstrated a significant increase in regional CMRglc.

The metabolic effects of phenserine, an AChEI which is also supposed to inhibit the formation of amyloid precursor protein and to decrease A-beta amyloid deposition, were examined in 10 AD patients [33]. A significant increase in glucose metabolism was observed in frontal and temporoparietal cortices after 13 weeks, but was no longer present after 26 weeks. At all the time points, glucose metabolism in some regions correlated with a composite cognitive score and also with A-beta-40 levels in CSF and plasma. In spite of these positive findings, the drug has not since been developed into a commercial product [34].

An FDG PET study at the predementia stage with the anti-inflammatory drug celecoxib was conducted by Small et al. [35] in parallel with a large clinical prevention trial (ADAPT). While the clinical trial failed and the drug was abandoned because of cardiovascular side effects, the PET study indicated a regional metabolic increase in frontal brain regions corresponding to a clinical improvement in executive function, encouraging further research into anti-inflammatory strategies for the prevention of dementia.

Increasingly, non-pharmacological interventions are being tried in AD. A pilot study using deep brain stimulation of the fornix and hypothalamus [36] reported an increase in cerebral glucose metabolism in temporoparietal association areas that persisted for 1 year under stimulation. This increase corresponded to improved connectivity of memory networks and good outcomes in global cognition, memory and quality of life [37].

The correspondence between clinical effects and regional CMRglc changes in the majority of studies is encouraging. It supports the concept of using FDG PET as an efficient tool to demonstrate that a drug (or other intervention) actually has an effect on brain function in humans. This is particularly valuable at an early stage of its development and can be performed in a relatively small subject sample. The evidence of beneficial action is particularly strong if metabolic improvements are observed specifically in those regions (mostly temporoparietal and frontal association areas) that support the relevant cognitive functions. Global searches for any regional increases should be controlled by rigorous predefined statistical methods that prevent exploratory fishing for chance results. In retrospect, failures of drugs at phase 3 seem to have often been preceded by over-generous acceptance of “promising” phase 2 results based on shaky statistical and pharmacological reasoning.

Disease progression in interventional studies

Evidence for effects of disease progression was obtained in most studies lasting 26 weeks or longer. It was particularly obvious in studies with repeated FDG PET measurements, for instance in the rosiglitazone study described above [19], which demonstrated similar regional decline in CMRglc in the verum and placebo groups, corresponding to progression of cognitive impairment. While this study showed a negative outcome for the drug, there are also multiple examples of studies demonstrating the expected progression in the placebo group in contrast to preserved metabolism under active treatment.

In a 24-week study of donepezil versus placebo in just 26 subjects, Tune et al. [38], considering the mean percentage change from baseline in regional CMRglc (normalized to pons), observed significant differences, in favor of donepezil, in the right parietal lobe, left temporal lobe, and frontal lobe bilaterally. Clinical outcomes [ADAS-cog, neuropsychiatric inventory (NPI)] showed the same tendency. Although the clinical differences were not significant in this small study, larger trials proved the clinical efficacy of donepezil. Therefore, this study actually provides an example of FDG PET correctly demonstrating drug efficacy in a sample smaller than is needed when using clinical outcome parameters.

A detailed analysis of regional effects of rivastigmine over 52 weeks in 11 patients was published by Stefanova et al. [39]. Compared to a separate control group, treated patients showed significantly less CMRglc decline in temporoparietal association areas typically affected by AD and even an increase in some frontal regions and basal ganglia. Regional changes in a subgroup (n = 7) of patients who tolerated the highest dose (12 mg/day) of rivastigmine correlated with neuropsychological measures.

In the trial of phenserine [33], detailed above, the expected metabolic decline at week 26 was seen in the placebo but not in the active group. In this and the other studies listed above, it is impossible to distinguish between a sustained pharmacological effect of the drug on metabolism and a change in disease progression because there was no wash-out period before the last assessment.

The hypothesis that insulin insensitivity and the resulting metabolic deficit is a cause of AD prompted a 4-month trial of intranasal insulin [40]. To avoid a possible confounding effect from the acute action of insulin on glucose metabolism, all post-baseline outcome measures were obtained at least 12 h after dosing. While the expected decline in metabolism in frontal and parietal regions was observed in the placebo group, this was significantly reduced in the active treatment groups in a dose-dependent pattern. Correspondingly, the active treatment was associated with an improvement in delayed memory and preserved caregiver-rated functional ability.

In a study of memantine, which is an approved drug for dementia, Schmidt et al. [41] explored the feasibility of using FDG PET and other imaging biomarkers to detect a change in the progression of AD. They found the expected significant decline in glucose metabolism and increase of hippocampal atrophy over 1 year. There emerged a non-significant tendency toward 40 % less impairment on FDG PET and hippocampal atrophy with memantine compared to placebo, while the progression of total brain atrophy was similar in both groups. The authors concluded that a sample size of 202 would be required to detect a 40 % reduction in metabolic decline in a 1-year trial with a power of 80 %.
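Sample size estimates of this kind typically follow from the standard formula for comparing two group means, n per arm = 2 (z_alpha + z_beta)^2 (SD / delta)^2. The sketch below is a generic illustration of that formula and not the calculation performed in [41]; the function name, the annual metabolic decline (5 % of baseline) and its standard deviation (5 %) are hypothetical values chosen only to show how the inputs interact.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_means(annual_decline, sd_of_change, reduction, alpha=0.05, power=0.80):
    """Approximate subjects per arm needed to detect a given fractional
    reduction in mean decline (two-sided test, normal approximation)."""
    delta = annual_decline * reduction  # absolute difference between arms
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_beta) ** 2 * (sd_of_change / delta) ** 2)


# Hypothetical inputs: 5 % mean annual decline in placebo, SD of change 5 %.
for reduction in (0.25, 0.40, 0.50):
    n = n_per_arm_means(annual_decline=5.0, sd_of_change=5.0, reduction=reduction)
    print(f"{reduction:.0%} reduction in decline: {n} per arm")
```

The required sample grows with the square of the ratio between measurement variability and the expected difference, which is why estimates are so sensitive to the assumed effect size and to the variability of the chosen outcome measure.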

FDG PET is being used in multiple ongoing studies including novel or non-conventional drugs, such as the GLP-1 receptor stimulator liraglutide [42] (NCT01469351), ginkgo extract EGb761® (NCT00814346), and the tau protein inhibitor TRx0237 (NCT01689233).

Moreover, FDG PET has been employed in non-pharmacological interventions; a 6-month cognitive intervention study [43], which also included patients with amnestic MCI (aMCI), was performed with the aim of modifying the progression of disease. The authors actually found the strongest attenuation of metabolic decline, associated with an improvement in global cognitive status, in the aMCI subgroup, a finding that is consistent with the hypothesis that group-based multicomponent cognitive intervention would be most beneficial at this early stage of the disease.

Multi-center observational studies

Practically none of the published interventional studies used FDG PET as an inclusion criterion, even though the regulatory authorities are probably more willing to accept the use of imaging biomarkers as inclusion criteria than their use as outcome parameters. They could be used for a more precise diagnosis of AD, as suggested by the new criteria [44], and this application has actually been approved by the EMA for MRI morphometry and is being considered for CSF biomarkers [8]. However, the potential increase in trial efficiency at this stage of the disease may be quite limited because the improvement in diagnostic accuracy compared to detailed clinical and neuropsychological assessment would probably not be much greater than 10 % and brain damage at the stage of manifest dementia is largely irreversible.

It is hoped that this is not the case at the clinical stage of MCI. This issue has been the subject of several relatively large imaging multicenter observational studies over the past 15 years, and we therefore discuss their main results and the resulting perspectives for clinical trials.

An early observational multicenter study introducing a simple semi-quantitative ratio approach to test the diagnostic power of FDG PET in AD was conducted in Europe [45]. The study included 37 patients with probable AD and 34 healthy controls, and demonstrated a diagnostic discrimination accuracy of 95.8 %. This result also demonstrated the robustness of the method, as the study involved different scanners with different spatial resolutions. Quality control, image reconstruction and region placement were all conducted locally.

A larger study was then conducted as part of the European Network for Standardization of Dementia Diagnosis (NEST-DD, funded by EC framework 5 from 1998 to 2002). It involved 10 partners (including the Japanese PET group at the National Institute of Longevity Sciences) and enrolled 665 subjects retrospectively and 523 prospectively. Its main results concerned a series of areas: development and validation of an automated procedure for assessing the severity of metabolic deficits [46, 47], prediction of conversion from MCI to AD [48, 49], evaluation of multivariate techniques for image analysis [50–55], description of the effects of apolipoprotein genotype [56], education [57], apathy and depression in AD [58] and FTD [59], and patients’ awareness of disease symptoms [60, 61]. A combined analysis including data from US-based groups was presented by Mosconi et al. [62].

Subsequently, in 2004 the challenge to develop imaging biomarkers was taken up by the American Alzheimer’s Disease Neuroimaging Initiative (ADNI) [63], with regional counterparts in various countries. Data are being collected prospectively according to a common protocol with central quality control that would also be suitable for drug trials. Volumetric MR with regular follow-up scans is performed in all patients, while FDG PET or amyloid PET is performed in patient subsets, therefore, also allowing comparison between techniques. The main emphasis is on inclusion of subjects at the MCI stage and their long-term follow-up until development of dementia to provide a basis for enriching clinical trial samples with patients at high risk of AD. Anonymous data are freely accessible to registered researchers, thus allowing an unprecedented breadth of analysis.

The results obtained have confirmed the diagnostic power and robustness of FDG PET in discriminating between AD patients and controls [64], and also the close association between metabolic and cognitive changes in MCI and AD patients [65]. Several groups addressed the issue of the relative strength of FDG PET compared to other biomarkers and the best combination of biomarkers to predict cognitive decline. Choo et al. [66], examining different combinations of demographic variables, cognitive tests and other markers, found the combination of parietal glucose metabolic rate and total tau to be the best predictor of AD progression, albeit without quantifying its accuracy. Another study suggested that including demographic variables and the ADAS-cog score alongside the three biomarkers (FDG PET, MRI and CSF) is the best model, as it reduced the misclassification rate by 40 % compared to clinical tests alone [67]. This study also showed that, of the three biomarkers, the FDG PET score added the most information to routine tests. Zhang et al. [68] approached the question using both longitudinal and multimodal biomarkers, predicting future cognitive decline from several previous measurements. Including measurements at several time points may have better prediction accuracy but this method would also raise the question of cost-effectiveness. A study by Yu et al. [69] examined the efficacy of biomarkers for enriching aMCI populations for clinical trials. As was to be expected, the best prediction accuracy (81 %) was achieved by a combination of all the biomarkers (MRI, FDG PET, and CSF), while the individual markers showed the following accuracy: MRI: 78 %, FDG PET: 68 %, and CSF: 65 %. When cost was also taken into account, they concluded that combining MRI, ApoE and cognitive scores was the best option.

The power of reduced glucose metabolism in the precuneus to predict subsequent progression was observed in subjects with subjective memory deficits at an even earlier stage of potential AD [70]. The most precise longitudinal data at the earliest stages of AD are those provided by the Dominantly Inherited Alzheimer Network (DIAN) [71], which demonstrates a decline in glucose metabolism in the precuneus as early as 20 years before onset of dementia, 5–10 years after the onset of amyloid deposition, and about 5 years before the earliest mild clinical symptoms. This cohort also provides a unique opportunity to study the prevention of dementia [72]. Interventions are planned in autosomal dominant AD mutation carriers using gantenerumab and solanezumab with FDG PET as a secondary outcome measure (NCT01760005).

These longitudinal observational studies and the corresponding finding of regionally declining metabolism in the placebo groups of clinical trials lasting 26 weeks or more strongly suggest that FDG PET is a robust and sensitive marker of AD progression, even at the earlier stages of the disease. Studies using FDG PET as an outcome marker require samples approximately 50 % smaller than those needed using the current standard clinical tool, ADAS-cog [73]. Estimated sample sizes per treatment arm for a 12-month study and a 25 % treatment effect vary considerably depending on underlying assumptions; they have been found to range from 100 to 400 for AD and 200 to 2000 for MCI [65, 74].

Methodological aspects

Many of the previous studies in this field have been investigator-initiated academic studies. Although the methods used for data acquisition and analysis were very heterogeneous, they have shown FDG PET to provide rather robust results. On the basis of these experiences and the ADNI study protocol, some consensus about the most appropriate procedures for clinical trials has emerged, and data collection and analysis can now be provided by commercial organizations [75].

Few FDG PET studies have been designed using classical, predefined confirmatory statistics with predefined regions of interest and few studies actually demonstrated a significant difference between verum and placebo groups [38]. A common approach is, indeed, to demonstrate a significant change in the active group, and no such significant change in the control group; however, this does not necessarily equate with a significant difference between the groups. Exploratory analyses are often used, typically employing statistical parametric mapping to maximize the chances of detecting interesting changes that could have been missed when using rigid predefined regions. This is reasonable at a very early stage of clinical drug development, when the main purpose of the investigation is to demonstrate the pharmacological action of the drug in humans. However, there is some evidence that liberal use of statistics may lead to over-optimistic evaluation of drug efficacy and thus contribute to avoidable late-phase failures in drug development.
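The distinction between "significant in one group only" and "significantly different between groups" can be illustrated numerically. The sketch below uses hypothetical mean changes and standard errors and a simple normal approximation (in small samples a t distribution would normally be used); it is not based on data from any of the cited trials.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(estimate, standard_error):
    """Two-sided p-value for an estimate, using a normal approximation."""
    z = abs(estimate) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical mean change in regional CMRglc (% of baseline) with standard errors.
active_change, active_se = 2.0, 0.9
placebo_change, placebo_se = 0.5, 0.9

# Within-group tests: is the change different from zero in each group?
p_active = two_sided_p(active_change, active_se)
p_placebo = two_sided_p(placebo_change, placebo_se)

# Between-group test: is the change in the active group different from placebo?
difference = active_change - placebo_change
difference_se = sqrt(active_se ** 2 + placebo_se ** 2)
p_between = two_sided_p(difference, difference_se)

print(f"active group vs. zero:   p = {p_active:.3f}")
print(f"placebo group vs. zero:  p = {p_placebo:.3f}")
print(f"active vs. placebo:      p = {p_between:.3f}")
```

With these numbers the change is significant at the 5 % level in the active group and not in the placebo group, yet the direct comparison between the groups is not significant. Only the between-group test addresses the question of a treatment effect.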

There is little consensus on whether quantification of regional CMRglc in absolute values is required or whether simple FDG uptake with intensity normalization relative to a reference region is sufficient. Some studies [19–21, 31] have demonstrated that drugs can cause global metabolic changes that would not be detectable by relative values. There is also evidence that CMRglc declines in all brain regions with progression of AD, leading to some underestimation of progression when using relative values. However, relative values tend to show less measurement-associated variation than absolute CMRglc measurements, which probably results in higher signal strength and study power. Classical techniques for absolute CMRglc measurements require arterial blood sampling, which is not practical in clinical drug trials, and substitute methods based on arterialized venous blood samples, or population-based or image-derived input functions, have not yet been standardized. The study by Tzimopoulou et al. [19] demonstrates that it is feasible to calculate indices of CMRglc in a multicenter trial, but the majority of studies utilized relative values (even when metabolic rates were available).

The majority of studies were based on resting-state glucose metabolism, although a few used activation paradigms to stimulate metabolism [76] or looked at the difference between resting-state and active metabolism [26, 77]. The latter approach needs to take the psychophysiological response to the stimulation paradigm into account when analyzing the data. For instance, disease progression could make a task more difficult, thus requiring more metabolic resources, while it could also lead to a blunted response because of synaptic failure. In recent years, functional magnetic resonance imaging (fMRI) has largely superseded PET for the assessment of regional activation responses, including the acute effects of pharmacological intervention [78], while resting-state glucose metabolism and its changes under clinical drug application remain in the domain of PET. It will be interesting to see whether analysis of resting-state fluctuations by fMRI, which has led to the description of changes in specific large-scale neuronal networks [79], will provide results that are robust enough for use in drug trials.

Conclusions

FDG PET is now frequently used in clinical trials as a secondary outcome marker. It can provide information about pharmacological effects within a few weeks in relatively small samples (typically 6–20 subjects per group), which is particularly useful in early phase 2 of drug development. Care should be taken to establish whether the regional distribution of changes is consistent with the intervention’s proposed mechanism of action, to avoid being misled by non-specific changes in metabolic activity. In view of its ability to predict conversion to AD in patients with MCI, it could also be used for sample enrichment in phase 2/3 studies, but its relatively high cost compared to other indicators and current lack of regulatory approval for this purpose constitute considerable obstacles. There is ample evidence from clinical trials and from observational longitudinal studies that the decline in regional metabolism is closely linked to clinical progression, supporting the use of FDG PET to assess disease-modifying interventions. Significant effects have been observed as early as 6 months after the start of interventions, while the necessary sample sizes will depend heavily on disease stage, expected effect size and sample heterogeneity.