Abstract
PET was developed in the 1970s as an in vivo method to measure regional pathophysiologic processes. In the 1990s the focus moved to the detection of local increases in uptake, first in the brain (activation studies) and later in oncology (finding metastases), with 18F-FDG emerging as a highly sensitive staging technique. This focus on sensitivity has overshadowed the other main characteristic of PET, its quantitative nature. In recent years there has been a shift. PET is now seen as a promising tool for drug development and precision medicine—that is, a method to monitor or even predict response to therapy. Quantification is essential for precision medicine, but many studies today use simplified semiquantitative methods without properly validating them. This review provides several examples illustrating that simplified methods may lead to less accurate or even misleading results. Simplification is important for routine clinical practice, but finding the optimal balance between accuracy and simplicity requires careful studies. It is argued that the use of simplified approaches without proper validation not only may waste time and resources but also may raise ethical questions, especially in drug development studies.
Although there had been earlier attempts to develop PET scanners, a major step forward occurred in 1974 when the first PET scanner using a Fourier-based reconstruction algorithm, proper sampling, and exact attenuation correction was described (1,2). The final version, the PET III, was the first whole-body tomograph specifically designed for human studies (3,4). From this design came the first commercial tomograph, the ECAT (5), which was produced by EG&G Ortec, a company that specialized in nuclear physics measurement equipment. To modern eyes, these early scanners may look rather primitive, with single detector rings that contained large NaI(Tl) detectors and provided a spatial resolution of just below 2 cm in full width at half maximum. In addition, stepping motors were needed to move the detectors in both translational and rotational directions in order to obtain both reasonable spatial resolution and sufficient angular information for accurate reconstruction of radioactivity distributions. Nevertheless, at the time, these scanners were novel tools to measure human physiology in vivo, as is clearly illustrated by a series of early papers discussing general quantification issues that are still relevant (6–9). The development of PET as a molecular measurement technique involved efforts by many scientists from many different centers. A complete history of technical developments in PET is beyond the scope of this article but has recently been covered in a dedicated review (10).
NOTEWORTHY
The working principle should be “simplicity through complexity,” by which simple methods should be used where possible but only after validation against fully quantitative, more complex methods.
The level of simplicity should depend on the underlying clinical or research question, finding the right balance between simplicity and accuracy.
Use of simplified scanning and data analysis protocols without proper validation may raise ethical questions, especially in drug development studies.
EARLY APPLICATIONS
Initially, PET studies focused on measurements of blood flow and metabolism, simply because of the availability of suitable tracers such as 15O-labeled water and gases and the glucose analog 18F-FDG (11,12). In fact, measurement of regional cerebral glucose metabolism with 18F-FDG was one of the very first examples of the use of PET measurements to map a physiologic process (12–14). Another example, from the early 1980s, is shown in Figure 1, which presents parametric images of cerebral blood flow (CBF), oxygen extraction fraction, cerebral oxygen utilization, and cerebral blood volume as derived from 15O-CO2, 15O-O2, and 11C-CO scans of two consecutive patients with a high-grade glioma (11,15). Despite the poor quality—by today’s standards—of the images (acquired using an ECAT II with a spatial resolution of 16 mm in full width at half maximum), this quantitative example clearly illustrates that the underlying pathophysiology—in this case, CBF—can be completely different in two cases of the same disease. In addition, these quantitative parametric images illustrate that oxygen extraction fraction is lower in tumors than in normal brain irrespective of perfusion (16), a finding that still is not fully understood.
In the early days, PET was used primarily for studies of the brain and heart (17,18). In cardiology, mismatch between flow and metabolism actually became the most important diagnostic criterion in the assessment of myocardial viability (19). In neurology, the field progressed in two completely different directions. On the one hand, PET progressed to become a unique molecular imaging tool. The high sensitivity of PET and the development of an ever-increasing number of radiolabeled ligands made it possible to assess various neuroreceptor systems (20), an application that later also became a valuable tool in drug development (21). On the other hand, PET progressed into a more phenomenologic application, namely the detection of brain areas showing increased CBF after a stimulus (22). These studies were performed using multiple 15O-H2O runs in a single scanning session. Initially, arterial blood sampling was used to quantify CBF (23). After a thorough assessment of the relationship between 15O-H2O uptake and CBF, however, the uptake interval was adjusted to obtain the optimal signal-to-noise ratio (as a shorter interval is related more to CBF whereas a longer interval allows for more counts and thus less noise). Studies were then performed without arterial sampling, often by normalization to uptake in whole brain (proportional scaling), as it was assumed that global CBF would not change between conditions (24). In most cases this approach was sufficient, as the main interest was the site of activation, not its magnitude. This example clearly illustrates how a thorough knowledge of the kinetics of a tracer allows scanning and analysis protocols to be optimally simplified for the clinical research question at hand.
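For illustration, proportional scaling amounts to nothing more than dividing each image by its own whole-brain mean. The sketch below is a minimal, hypothetical implementation assuming the reconstructed 15O-H2O images are available as NumPy arrays together with a whole-brain mask; it is not the historical analysis software.

```python
import numpy as np

def proportional_scaling(image, brain_mask, target_mean=1.0):
    """Rescale a 15O-H2O uptake image to a common whole-brain mean.

    image       : 3-D array of reconstructed activity (arbitrary units)
    brain_mask  : boolean array of the same shape selecting whole-brain voxels
    target_mean : value the whole-brain mean is mapped onto
    """
    return image * (target_mean / image[brain_mask].mean())

# Rest and task scans scaled to the same global mean can then be subtracted;
# the difference image reflects relative (not absolute) changes in CBF,
# which is sufficient when only the site of activation is of interest.
# activation = proportional_scaling(task_img, mask) - proportional_scaling(rest_img, mask)
```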
THE SECOND WAVE
In the 1990s, when most scanners were still located in dedicated research centers, the value of PET in oncology was discovered. Although PET had already been used to study the pathophysiology of tumors, as illustrated in Figure 1, the possibility of performing whole-body scans was emerging (25). In view of the unsurpassed sensitivity of PET and the increased glycolytic rate of many tumors via the Warburg effect, PET became an indispensable method for staging. Against an overall low level of background uptake, one needed only to search for unexpected areas of abnormal, high uptake. This development had an enormous impact on the field, as evidence rapidly accumulated of the important role PET could play in managing patient care, such as by indicating when local surgery would be futile given the presence of distant metastases (26). A further boost came from the development of PET/CT, which made it possible to relate functional findings to their exact anatomic locations (27).
Unfortunately, there was also a downside to this rapidly increasing interest in PET. Scanner manufacturers competed on the basis of image quality, and because of this so-called image beautification, the other key characteristic of PET—its quantitative nature—was somewhat neglected. For example, the iterative reconstruction algorithms that were implemented to improve image quality also could compromise quantitative accuracy, especially for low-count frames in a dynamic study (28).
THE NEED FOR QUANTIFICATION
Clearly, image quality is important if the main purpose is to find hot spots such as metastases. Currently, however, the focus has shifted toward precision medicine: monitoring response during therapy or, preferably, even assessing potential response before therapy (e.g., using radiolabeled drugs). As a result, there is renewed interest in quantification of tracer uptake. This interest is also apparent from the literature, which shows a steep rise in the use of such search terms as quantitative and quantification. Often, these quantitative claims are based on measurement of SUV (uptake normalized to injected dose and body weight) or, when a reference region is available, SUV ratio (SUVr). Yet, measuring uptake quantitatively is not the same as measuring a pathophysiologic process quantitatively.
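For reference, the semiquantitative measures used throughout this review are defined as follows (standard definitions; tissue activity is assumed to be decay corrected to injection time):

\[
\mathrm{SUV}(t) = \frac{C_\mathrm{tissue}(t)}{\text{injected dose}/\text{body weight}},
\qquad
\mathrm{SUVr}(t) = \frac{C_\mathrm{target}(t)}{C_\mathrm{reference}(t)},
\]

where \(C_\mathrm{tissue}(t)\) is the measured activity concentration at time \(t\). With consistent units (and assuming a tissue density of 1 g/mL), SUV is dimensionless. Both measures depend on the time at which they are evaluated, a point that is central to the examples that follow.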
To illustrate this point, Figures 2A and 2B show 11C-R116301 images reflecting neurokinin-1 receptor status (29) in a healthy human subject at baseline and after an oral dose of aprepitant, a neurokinin-1 antagonist. The images are expressed as SUV for the interval from 60 to 90 min after injection. If the dose of aprepitant were not known and one were asked what level of occupancy had been achieved, what would be the logical answer? Presentation of this case at conferences, workshops, and courses over the last 2–3 years has shown that most people, including experienced PET scientists, estimate the level of occupancy at somewhere between 25% and 75%. In fact, however, the images represent a static portion (60–90 min after injection) of a dynamic scan. The dynamic scanning sequence also allows for calculation of nondisplaceable binding potential, or BPND (the ratio at equilibrium of specifically bound tracer to nondisplaceable tracer in tissue), on a voxel-by-voxel basis (30). The result of that calculation is shown in Figures 2C and 2D, parametric images revealing nearly complete occupancy (97%) in the striatum. Clearly, for an accurate assessment of the underlying receptor status, the parametric images are essential and the SUV images are, in fact, misleading.
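The occupancy value quoted above follows directly from the parametric BPND images using the standard relations

\[
BP_\mathrm{ND} = \left.\frac{C_\mathrm{specific}}{C_\mathrm{ND}}\right|_{\mathrm{equilibrium}},
\qquad
\mathrm{occupancy} = 1 - \frac{BP_\mathrm{ND}^{\mathrm{drug}}}{BP_\mathrm{ND}^{\mathrm{baseline}}},
\]

so that near-complete blockade of specific binding is evident even though the SUV images still show substantial nondisplaceable uptake.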
So, why is there such a big difference between the SUV images and the parametric images of Figure 2? The SUV images show uptake at a certain time (60–90 min) after injection. Net uptake at any given time, however, is a complex interplay between delivery, uptake, retention, and clearance of the tracer. For example, increased uptake can be due to increased delivery (either increased plasma concentration or increased flow), decreased clearance, or a combination of these physiologic processes. From a single static scan, it is impossible to separate the various components that contribute to the total signal, such as specific binding, nonspecific binding, and free tracer in tissue. In contrast, with a dynamic scan it is possible to follow the kinetics (uptake, retention, clearance) of the tracer and to tease out the various individual components. Comparison of the SUV and parametric images of Figure 2 makes clear that all activity after aprepitant dosing was due entirely to nonspecific binding.
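In kinetic terms, these components correspond to the compartments of the generic reversible two-tissue model (written here in its standard form; the rate constants are tracer specific):

\[
\frac{dC_\mathrm{ND}(t)}{dt} = K_1 C_p(t) - (k_2 + k_3)\,C_\mathrm{ND}(t) + k_4\,C_s(t),
\qquad
\frac{dC_s(t)}{dt} = k_3\,C_\mathrm{ND}(t) - k_4\,C_s(t),
\]

where \(C_p\) is the metabolite-corrected arterial plasma concentration, \(C_\mathrm{ND}\) the free plus nonspecifically bound (nondisplaceable) tracer in tissue, \(C_s\) the specifically bound tracer, and \(BP_\mathrm{ND} = k_3/k_4\). A static scan measures only the sum \(C_\mathrm{ND}(t) + C_s(t)\) (plus intravascular activity), whereas fitting the dynamic data to this model allows the individual terms to be separated.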
THE PRICE AND BENEFITS OF ACCURATE QUANTIFICATION
There are many reasons for not performing complex dynamic scans. First, for full quantification, dynamic scanning alone is not sufficient. The input function—that is, the metabolite-corrected arterial plasma input function—also needs to be measured, but such measurements require arterial cannulation, an invasive procedure with a risk of complications, albeit a small one. A potential solution is the use of image-derived input functions (31), but no generally applicable method is available yet. Second, dynamic scans typically last 60 min or sometimes even longer, and some patient motion is inevitable during such a long scan. Although several methods have been developed to correct for patient motion (32), they require additional processing time. Third, dynamic scanning, arterial sampling, and (especially) metabolite analysis are time consuming. Fourth, all these issues together mean that patient throughput is reduced compared with simple static scanning. The logical consequence is that quantitative dynamic scanning is substantially more expensive than qualitative or semiquantitative static scanning. So, why would one even consider performing complex dynamic studies? The answer to this question is given by the example of Figure 2. To show that this example is not an exception, a few more examples of common applications are provided below.
Receptor Occupancy
One application of PET in drug development, and possibly in precision medicine in the future, is measurement of receptor occupancy. For example, from PET studies it is known that occupancy of D2 receptors in the striatum by antipsychotic drugs has to be at least 65% to have any effect but should be less than 85% to avoid side effects (33,34). For a novel antipsychotic drug, therefore, PET can be used to determine the optimal dose—that is, the lowest dose for the level of occupancy needed. Determination of the optimal dose avoids overdosing, which is a genuine risk with the classic approach in which the initial dose in a trial is based on the toxicity profile of the drug under study. The beauty of PET is that the test–retest variability of quantitative parameters such as BPND often is in the range of 5%–10% (35–37) and that the optimal dose can therefore be found using only a limited number of scans. An early example is shown in Figure 3A, in which the optimal dose was established using data from only 8 healthy volunteers (21). In a follow-on study (38), the biologic half-life of the drug (binding to the receptor) was measured. Again, using a limited number of healthy volunteers, each one scanned at a different time after drug administration, it was established that therapeutic levels (i.e., receptor occupancy) could be maintained by administering the drug twice daily (Fig. 3B). Of course, these studies were less complex than other drug development studies, because data for 11C-raclopride can be analyzed using the simplified reference tissue model (39). In other words, neither arterial cannulation nor the associated labor-intensive metabolite measurements were needed. However, even if such measurements had been necessary, the costs of both PET studies together (and possibly similar studies on a relevant cohort of patients) are negligible compared with the costs of a clinical trial, and those PET studies guarantee that a subsequent trial can be performed using the most appropriate dose and dosing regimen of the drug.
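To make the analysis concrete, the sketch below illustrates how a regional BPND could be estimated with the simplified reference tissue model. It is a minimal, hypothetical implementation (uniform time grid, discrete convolution, unweighted least squares) rather than the pipeline used in the cited studies.

```python
import numpy as np
from scipy.optimize import curve_fit

def srtm_model(t, r1, k2, bpnd, cr):
    """Simplified reference tissue model (SRTM) operational equation.

    t    : uniformly spaced mid-frame times (min), starting near zero
    r1   : relative delivery K1/K1' (target vs. reference region)
    k2   : efflux rate constant of the target region (1/min)
    bpnd : nondisplaceable binding potential
    cr   : reference-region time-activity curve sampled at t
    """
    dt = t[1] - t[0]
    k2a = k2 / (1.0 + bpnd)                     # apparent efflux rate
    irf = np.exp(-k2a * t)                      # impulse response
    conv = np.convolve(cr, irf)[:len(t)] * dt   # Cr(t) convolved with irf
    return r1 * cr + (k2 - r1 * k2a) * conv

def fit_srtm(t, ct, cr):
    """Fit a target-region TAC (ct) against a reference TAC (cr)."""
    popt, _ = curve_fit(
        lambda tt, r1, k2, bpnd: srtm_model(tt, r1, k2, bpnd, cr),
        t, ct, p0=(1.0, 0.1, 1.0), bounds=(0.0, [5.0, 2.0, 20.0]))
    return popt                                 # (R1, k2, BPND)

# Occupancy from baseline and post-drug scans of the same subject:
# occupancy = 1.0 - fit_srtm(t, ct_drug, cr_drug)[2] / fit_srtm(t, ct_base, cr_base)[2]
```

In practice, frame durations are nonuniform and weighted or basis-function implementations are used, but the principle is the same: no arterial input is needed, only target and reference time-activity curves.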
Global Changes
Global changes provide another example in which absolute quantification is essential, even for diagnostic purposes. Figure 4 presents 15O-H2O–derived myocardial blood flow images at baseline and after adenosine-induced hyperemia in a healthy volunteer and a patient with triple-vessel disease. Both subjects show normal perfusion at rest, although baseline myocardial blood flow is somewhat higher in the patient. More importantly, however, the hyperemia scan of the patient appears to have a normal distribution, with the only indication of disease being that, globally, myocardial blood flow is much lower than in the healthy subject. A qualitative assessment of the hyperemia scan would have labeled this patient as having normal perfusion. For this application, the time of the entire procedure, including rest and stress dynamic 15O-H2O scans and CT angiography, is less than 1 h. The input function can be derived from the dynamic scan itself, and generation of the parametric myocardial blood flow images is essentially automatic (40).
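The underlying model is the familiar single-tissue model for freely diffusible water, written here in its basic form (ignoring the perfusable tissue fraction and spillover corrections applied in practice):

\[
\frac{dC_T(t)}{dt} = F\,C_A(t) - \frac{F}{\lambda}\,C_T(t),
\]

where \(F\) is myocardial blood flow, \(C_A(t)\) the arterial input function (taken from the dynamic images themselves, e.g., the left-ventricular cavity), and \(\lambda\) the blood to tissue partition coefficient of water. Fitting this equation voxel by voxel yields the parametric flow images, which is why absolute flow values are obtained rather than only a relative distribution.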
Drug Targeting
An issue in both drug development and precision medicine is whether a certain drug reaches its target at sufficient concentrations. This issue can be addressed by labeling the drug. The treatment strategy can then be based on the level of uptake (or lack of it) in the lesion. One example of this principle is the use of tyrosine kinase inhibitors in lung tumors. It is known that only tumors with an activating mutation in the epidermal growth factor receptor will respond to therapy with tyrosine kinase inhibitors. It is, of course, possible and indeed routine practice to determine the mutational status through a biopsy, but biopsies are invasive and not feasible for all tumors. A study with 11C-erlotinib showed that uptake in tumors with an activating mutation was significantly (P < 0.016) higher than that in tumors with wild-type epidermal growth factor receptors (41), at least when the volume of distribution derived from full kinetic analysis was used (Fig. 5). In contrast, for the best simplified method based on a single static scan—in the case of 11C-erlotinib, the tumor-to-blood ratio 40–50 min after injection (42)—the difference between groups did not reach significance (P < 0.070) because of substantial overlap (Fig. 5), possibly stemming from the relatively large variability in the metabolic profile of 11C-erlotinib. In other words, within the context of drug development, the same answer can be obtained using smaller patient populations, not only compensating for the higher costs of a fully quantitative dynamic scan but also enabling smaller, more controllable trials that probably yield more definitive information. More importantly, if this method were to proceed to clinical practice (the right drug for the right patient), the quality of care would be substantially better with fully quantitative scans.
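For context, the volume of distribution referred to above is, for a reversible two-tissue model,

\[
V_T = \frac{K_1}{k_2}\left(1 + \frac{k_3}{k_4}\right) = \left.\frac{C_T}{C_p}\right|_{\mathrm{equilibrium}},
\]

that is, the equilibrium ratio of tissue to metabolite-corrected plasma concentration. A tumor-to-blood ratio at a single fixed time approximates this quantity only if near-equilibrium has been reached and if whole-blood activity tracks parent plasma activity similarly across patients, conditions that the variable metabolic profile of 11C-erlotinib may make difficult to satisfy.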
Amyloid Load
A final example is the use of amyloid imaging in Alzheimer disease, an application that is rapidly gaining popularity because it provides an in vivo method to establish the amyloid load in the brain. Clearly, for most diagnostic purposes (amyloid-positive vs. amyloid-negative), quantification will not be useful. For future therapeutic interventions, however, there is a need to identify patients at very early stages—that is, patients who would not be identified as clearly positive but who also are not entirely negative. Therefore, much effort is being put into quantifying uptake using SUVr, which essentially is the ratio of uptake in a target region (cortex) to uptake in a reference tissue (usually the cerebellum). This method is fine in principle, but it is now also being used in large clinical trials investigating novel antiamyloid therapies. That use of SUVr is questionable, as for most amyloid ligands SUVr has not been characterized well. The limitation in using SUVr for longitudinal studies has been demonstrated in one such study for 11C-Pittsburgh compound B (43), the results of which are summarized in Figure 6. In that study, no antiamyloid treatment was given and patients were followed for 2–4 y. Yet, SUVr showed a small but significant counterintuitive decrease in amyloid load, whereas BPND remained unchanged. In an attempt to explain these findings, simulation studies were performed. These simulations showed that the reduction in SUVr most likely was due to a decrease in CBF, a known phenomenon in Alzheimer disease. In contrast, BPND, derived from fully quantitative analysis (simplified reference tissue model), is independent of blood flow. Because a reduction in perfusion may result in delayed equilibrium conditions, SUVr, measured at a predefined time, may be affected to a degree that depends on the kinetics of the actual amyloid ligand. It should be noted that not only changes in perfusion but also changes in clearance rate may result in bias (44).
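The sensitivity of SUVr to such changes can be made explicit. Under ideal, true-equilibrium conditions,

\[
\mathrm{SUVr} = \frac{C_\mathrm{target}}{C_\mathrm{reference}} \;\rightarrow\; \mathrm{DVR} = BP_\mathrm{ND} + 1,
\]

but at any fixed scan window the measured ratio is still approaching this limit at a rate governed by delivery and clearance. A drop in CBF (or a change in clearance rate) therefore shifts SUVr even when BPND itself, estimated from the full dynamic data, is unchanged.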
FORWARD TO THE PAST
The examples given here demonstrate that there are various applications in which dynamic scanning with full kinetic analysis is superior to semiquantification based on a static scan. These examples represent only a few of the many applications that could be mentioned. Unfortunately, over the 40 years that PET has been around, an increasing number of reports have been based on a qualitative or semiquantitative analysis of data without proper validation of the simplified analysis method being used. Obviously, the temptation to use these shortcuts is strong. Apart from increasing patient throughput, these shortcuts may facilitate publication. Publication of a paper that is based on exciting images with corresponding simplified semiquantitative analyses such as SUV may be easier and much faster than publication of a paper that is based on a thorough, fully quantitative study, especially if that study also shows that the semiquantitative indices (i.e., images) have potential limitations. Clearly, this attitude is promoted by the present pressure placed on scientists by citation indices such as the h-index. Science, however, is more than populism, and there is an urgent need to return to quantification as the basis of PET imaging—that is, ahead to the past, when this was the common approach. There should be no objection to simplified methods for routine clinical applications, but these methods should be validated before being used to draw otherwise potentially misleading conclusions. In other words, the working principle should be “simplicity through complexity,” or aiming to use simple methods for clinical applications but only after they have been validated using fully quantitative, more complex methods.
FINDING THE RIGHT BALANCE
Having made a plea for quantification, one should nevertheless not be overly dogmatic. Clearly, many applications require only limited or even no quantification. For example, for both staging in oncology and assessment of amyloid status in the memory clinic, a visual read of the scans usually suffices. There are also applications for which simplified analytic methods are more than adequate (e.g., 18F-FDG SUV for monitoring response to classic chemotherapy). The issue is that such a simplification should first be validated for the specific application (e.g., 18F-FDG SUV for monitoring response to novel biologicals may not necessarily be valid) (45), taking into account what the purpose of the study is. If different validated methods of analysis and acquisition are available, the level of simplicity should depend on the underlying clinical or research question. The aim should be to find the method that provides the maximum level of simplicity without compromising accuracy, that is, the capability to measure a difference (from normality or in a longitudinal sense) that is clinically relevant.
ETHICAL CONSIDERATIONS
Finally, in the debate about quantitative versus semiquantitative studies, little attention has been paid to ethical issues. Of course, everybody will agree that methods providing incorrect or misleading results should be avoided, especially if they are used for clinical decision making, in which case they could actually be harmful. In that sense, it is strange that so many shortcuts are being taken given that, without validation, incorrect results cannot be excluded. Another issue is that in clinical trials the number of patients required can be reduced if more accurate techniques are used (as in the 11C-erlotinib example). In addition, in amyloid imaging trials, it is known that the test–retest variability of BPND is better than that of SUVr (43), again implying that with a fully quantitative method fewer patients need to be enrolled. That also means that fewer patients will undergo the entire study protocol with an experimental drug that may have some degree of toxicity and may not be effective. From a radiation protection point of view, it also means that fewer patients and, possibly, healthy volunteers will be exposed to the radiation associated with PET scans. Taking these considerations together, one could argue that use of a simplified scanning and data analysis protocol without proper validation not only may waste time and resources but also may raise ethical questions, especially in drug development studies.
DISCLOSURE
No potential conflict of interest relevant to this article was reported.
Footnotes
Published online May 18, 2017.
© 2017 by the Society of Nuclear Medicine and Molecular Imaging.
Received for publication March 15, 2017.
Accepted for publication May 11, 2017.