Introduction

Combined PET and CT (PET/CT) imaging of 90Y microspheres is fast becoming part of the routine protocol to confirm accurate delivery of radionuclide therapy to tumours in the liver after radioembolization. Clinical affirmation of the PET/CT imaging technique was first published in 2009 [1], and relies on the minute positron branching ratio (with probability (31.86 ± 0.47) × 10−6 [2]) as a result of pair production following de-excitation from the 0+ excited state of 90Zr [3]. Since that time its clinical use has grown steadily, ranging from confirmation of radionuclide targeting and absence of extrahepatic uptake [4, 5], to activity quantification for dosimetry [6–11].

Whilst 90Y PET/CT is a desirable tool for assessment of the efficacy of the radioembolization procedure, there is much that is not well understood about the effects of the physical decay characteristics on the imaging and reconstruction process. This includes the impact of the low true coincidence counting rate due to the low positron branching ratio which results in noisy image data. In addition, such a low true coincidence rate means that the prompt gamma emissions from the natural 176Lu in LSO/LYSO crystals of certain PET scanners cannot necessarily be ignored, as is the case with typical PET radionuclides which have true coincidence rates that are orders of magnitude greater. Furthermore, the large flux of bremsstrahlung photons from the dominant beta decay mode of 90Y results in a singles count rate that exceeds the true coincidence count rate by a large factor, which was originally thought to have potential for detector saturation when high amounts of 90Y activity are imaged [1], although this has not been found in more recent investigations (for example [7, 4, 12]). The additional bremsstrahlung photons and prompt gammas result in a very high random fraction when imaging 90Y, seen to be in the order of 80 % at our institution, compared to a typical FDG scan of 30 – 40 %. Combined with problematic scatter modelling for such low count data, this typically results in very noisy true coincidence sinograms following the subtraction of both scatter and random events, which will ultimately affect both the qualitative and quantitative aspect of the reconstruction.

Postradioembolization 90Y PET/CT has the potential to allow an improved understanding of the absorbed dose–response relationship on a cancer-specific basis, information which may be used in the future to tailor treatments specifically to the individual. In order to establish a meaningful association between absorbed dose and response, a large-scale study is necessary, recruiting significant numbers of patients who are typically not available from any one site. Such a multicentre trial relies heavily on the comparability of intersite data, which relies on the quantitative accuracy and comparability of the imaging equipment itself [13]. The idea of harmonization of the image acquisition and analysis approach to establish intersite compatibility for multicentre trials based on initial phantom studies has been explored in the literature (for example [14, 15]). Makris et al. [16] found that the standard NEMA NU-2 image quality phantom is ideal for intersite testing and looking for differences in quantitative concentration measures, and that comparison of the quantitative accuracy of 18F imaging is better achieved using an average concentration measure across a volume, as opposed to a maximum. Geworski et al. [17] found that errors in FDG standardized uptake value (SUV) measurement (performed by a single observer) across multisite PET scanners using a uniformly filled phantom were below 10 % in 15 out of 19 tested scanners (3D imaging), in agreement with the findings of Park et al. [18] who also derived SUV calibration values for each system which could be applied to intersite compatibility of measures, with a maximum reported variation corresponding to a calibration factor of 1.24 (i.e. a 24 % variation in measurement). 
Whilst a number of phantom studies have been performed with 90Y on current generation scanners [12, 19–23], to date there are no data to suggest that quantification estimates from all PET scanners are optimized and accurate (particularly when compared with known scanner performance with FDG), or that they are consistent across different generations and vendors, so as to offer comparable data in a trial setting.

The objective of this study was to investigate and compare the quantitative accuracy of 90Y PET/CT imaging on a large number of scanners from multiple sites, with the specific intention of moving towards a uniform approach in the setting of a large-scale clinical trial to establish the absorbed dose–response relationship following radioembolization with 90Y SIR-Spheres microspheres (Sirtex, Sydney, Australia) for liver cancer. As such, this report represents the preclinical assessment phase of a larger collaboration led by The University of Sydney, The Royal North Shore Hospital, and Sirtex (known as QUEST—Quantitative Uptake Evaluation in SIR-Spheres Therapy).

Materials and methods

Data were acquired on a variety of PET scanners (Table 1) from the major vendors, with a number of different reconstructions from systems equipped both with and without time of flight (ToF) and resolution recovery (RR). Each site followed an identical experimental protocol utilizing the NEMA 2007/IEC 2008 PET Body Phantom (Data Spectrum Corporation, NC), with a volume of about 10 L containing a “cold” (nonradioactive) solid insert (diameter 51 mm) and six fillable spheres of various diameters (∅ 10, 13, 17, 22, 28 and 37 mm) filled to an approximate eight-to-one sphere-to-background ratio with 90Y-chloride (YCl3) provided in a constant specific activity (PerkinElmer, Waltham, MA).

Table 1 The scanners contributing data to the study according to vendor and model (all scanners equipped with standard reconstruction corrections for attenuation, scatter and random events)

Each site was required to measure the phantom volume and the delivered 90Y solution in the departmental dose calibrator for comparison with the shipping certificate. The entire delivery vial was added to a volume of 1,300 ml, and this solution was used to fill the phantom spheres, before the remainder of the solution was added to the background compartment, with the addition of EDTA/DTPA to the contents to prevent the YCl3 sticking to the phantom walls. This allowed an eight-to-one sphere-to-background ratio, in keeping with the NEMA NU 2-2007 [24] image quality guidelines, and was thought to require minimal phantom manipulation and activity handling at sites. Residual in the vial was estimated through re-measuring the vial in the dose calibrator after reconstitution to the initial volume with water. Residual in the needle and syringe was taken as negligible.

Imaging and reconstruction

After filling with [90Y]YCl3 according to the instructions supplied, the phantom was imaged on days 0, 3, 5 and 7, during which time the total activity decayed from 3 GBq to 0.5 GBq, thus covering the recommended activity range for therapy prescribed in the SIR-Spheres package insert formula. This was done to assess scanner performance under different rates of photon fluence, and to assess the impact of background radiation from 176Lu present in current generation (LSO/LYSO) detector crystals at lower counting rates. Imaging consisted of two overlapping bed positions to mitigate the triangular axial sensitivity profile of the scanner, each of 15 – 20 min duration, in 3D mode. Where 90Y was not available as a radionuclide selection in the acquisition software a long-lived isotope was selected (e.g. 22Na) to avoid any scanner decay correction and data were quantified after reconstruction by taking into account the ratio of the positron branching ratios of 90Y and the acquisition radionuclide. No additional sensitivity measures were required.
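
As a rough illustration of the two corrections described above, the following sketch decays a 90Y activity between imaging days and rescales a value reported under a surrogate acquisition radionuclide. The half-life of 64.1 h, the surrogate branching ratio used in the usage note and the function names are assumptions made for illustration; only the 90Y positron branching ratio is taken from the text.

```python
import math

Y90_HALF_LIFE_H = 64.1      # assumed physical half-life of 90Y, in hours
Y90_POSITRON_BR = 31.86e-6  # positron branching ratio quoted in the Introduction


def decayed_activity(a0, elapsed_h, half_life_h=Y90_HALF_LIFE_H):
    """Activity remaining after elapsed_h hours of physical decay."""
    return a0 * math.exp(-math.log(2.0) * elapsed_h / half_life_h)


def rescale_to_y90(reported, acquisition_br, y90_br=Y90_POSITRON_BR):
    """Convert a value quantified under a surrogate radionuclide to 90Y.

    The scanner divides the detected positron rate by the branching ratio of
    the selected radionuclide, so the 90Y value is recovered by multiplying
    by the ratio of the two branching ratios.
    """
    return reported * acquisition_br / y90_br
```

With these assumptions, `decayed_activity(3.26, 7 * 24)` gives roughly 0.53 GBq, consistent with the 3 GBq to 0.5 GBq range quoted above. A call such as `rescale_to_y90(value, 0.9)` would apply to a 22Na acquisition if its positron branching ratio is taken as approximately 0.9 (an illustrative figure, not from the study).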

Sites were encouraged to use reconstruction parameters that had proven successful in their own 90Y experience, with all available corrections (scatter, attenuation, random coincidences, ToF and RR where available). Following day 7 of imaging a radiographic contrast agent was added to the background compartment of the phantom and a CT study performed to aid in image segmentation for volume definition of the fillable spheres.

Image analysis

Data were transferred in DICOM format via a secure data server (ABX-CRO Advanced Pharmaceutical Services, Dresden, Germany) to the core laboratory in Sydney (Royal North Shore Hospital, Sydney, Australia) for consistent analysis. All analyses were performed by a single operator (K.W.) on a dedicated nuclear medicine workstation (HERMES; Nuclear Diagnostics, Stockholm, Sweden) using in-house software written in IDL (Exelis Visual Information Solutions, Boulder, CO). For the quantitative assessment the shipping certificate indicating the amount of 90Y in the initial vial as determined by the supplier was treated as the gold standard, and the fraction of residual measured in the vial during the experimental procedure was regarded as the total residual (possible residual in the syringe, needle, beaker etc. was considered negligible due to the difficulty in reliable measurement of 90Y in the dose calibrator). These measures, together with measured phantom volume, were used to derive the true concentration and activity in the phantom at each imaging time-point. The uncertainty in ‘true’ estimates of activity and concentration in the phantom was taken to be ±10 %, a combination of possible volume measurement error (<1 %) and uncertainty in the calibration of activity in the delivery vial.
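
The derivation of 'true' concentrations from the certificate activity, the residual and the measured volumes can be sketched as below. This is a minimal illustration under stated assumptions, not the in-house IDL software: the function and parameter names are hypothetical, and the background volume is assumed to be the final liquid volume of the background compartment after the remaining stock solution has been added.

```python
def true_concentrations(certificate_mbq, residual_fraction,
                        stock_volume_ml, sphere_volume_ml,
                        background_volume_ml):
    """Return (sphere, background) activity concentrations in MBq/ml.

    certificate_mbq      -- vial activity from the shipping certificate,
                            decay-corrected to the time of interest
    residual_fraction    -- fraction of activity left in the delivery vial
    stock_volume_ml      -- volume the vial contents were diluted into
    sphere_volume_ml     -- total volume of stock used to fill the spheres
    background_volume_ml -- final volume of the background compartment
    """
    delivered = certificate_mbq * (1.0 - residual_fraction)
    sphere_conc = delivered / stock_volume_ml
    # Stock not used for the spheres goes into the background compartment.
    background_activity = sphere_conc * (stock_volume_ml - sphere_volume_ml)
    background_conc = background_activity / background_volume_ml
    return sphere_conc, background_conc
```

For example, 1,300 ml of stock with 50 ml used for the spheres and a 10 L background compartment yields a sphere-to-background ratio of 8:1 regardless of the activity delivered.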

The coregistration of the reconstructed PET data from all imaging days with the contrast-enhanced CT study was confirmed and the CT data were used to segment the six fillable spheres as 3D volumes of interest (VOIs) using a semiautomated region-growing algorithm to delineate the physical sphere volume. Quantitative accuracy was assessed at each imaging time-point by measurement of:

  • Total activity in the reconstructed field of view (FoV) as an indicator of total injected activity.

  • Background concentration, following the NEMA NU 2-2007 guidelines (Fig. 1).

    Fig. 1

    Transverse CT slice of a phantom showing segmented hot sphere VOIs (white), cold insert ROI, and 60 background ROIs corresponding to each sphere diameter as described in NEMA NU 2-2007

  • Mean concentration for each of the CT-defined spherical VOIs (Fig. 1) and the associated recovery coefficient (RC) to assess partial volume effects (PVEs) on the day-0 data, defined as:

    $$ RC\left(\%\right)=\frac{\mathrm{Measured}\;\mathrm{Concentration}}{\mathrm{True}\;\mathrm{Concentration}}\times 100 $$
    (1)

    Lines of best fit (y = 100 − a·e^(−bx)) for recovered concentration were compared with the curve obtained from repetition of an identical phantom procedure using 18F (Siemens Biograph mCT PET/CT, 3i21s ToF + RR, 5-mm gaussian filter) processed in an identical manner to the experimental 90Y data to generate reference RCs.

    The change in recovery of the largest diameter hot sphere, least affected by PVEs, over the range of imaging days was assessed for consistency of recovery with deteriorating count statistics.

  • Counts misplaced in the central cold insert, assessed as the mean of counts in a central ROI replicated across five transverse slices (Fig. 1) as a percentage of true background concentration.

  • Background variability (BV), in keeping with the recognized NEMA NU 2-2007 measure of image quality, was also explored as an indication of potential variation in background concentration measures as a result of poor image signal-to-noise ratio, defined as:

    $$ B{V}_s=\frac{STDE{V}_{B,s}}{C_{B,s}} $$
    (2)

    where C_{B,s} is the average of the 60 background ROI counts for sphere size s, and STDEV_{B,s} is the standard deviation of the background ROI counts for sphere size s.
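
Equations 1 and 2 can be sketched in a few lines of code. The function names are hypothetical, and the use of the sample standard deviation (divisor K − 1) is an assumption based on the usual NEMA NU 2 convention; the text above specifies only "standard deviation".

```python
from statistics import mean, stdev


def recovery_coefficient(measured_conc, true_conc):
    """Eq. 1: recovery coefficient as a percentage of the true concentration."""
    return 100.0 * measured_conc / true_conc


def background_variability(background_roi_means):
    """Eq. 2: standard deviation of the background ROI values divided by
    their mean, for one sphere size. Returned as a fraction, as in Eq. 2."""
    return stdev(background_roi_means) / mean(background_roi_means)
```

In the study each call to `background_variability` would receive the 60 background ROI values for a given sphere diameter.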

Given the number of contributing scanners and variations in submitted reconstruction parameters, analysis of the data was stratified by averaging the results according to the categories listed in Table 2.

Table 2 Reconstructions that contributed to the study categorized according to ToF and non-ToF systems from specific vendors

Validation of methodology

At one site three consecutive scans were performed on the same phantom with the same scanner (Siemens Biograph mCT) using identical image acquisition and reconstruction parameters. Consistency in the above measures between the three scans was assessed to indicate the uncertainty that might be expected due to random noise and variations. Furthermore, at a single site a lengthy 8-h acquisition (GE Healthcare Discovery 690, 90Y acquisition isotope) of the phantom was performed in addition to the standard 40-min acquisition, the data from which were used to look for any improvements with increased count statistics.

In addition, two datasets were analysed by a physicist at an independent institution not involved in the study: one phantom study with 18F (to act as a reference, performed on a Siemens Biograph mCT, reconstructed using 3i21s ToF + RR and a 5-mm gaussian filter) and one phantom study with 90Y (performed on a GE Healthcare Discovery 690 and quantified by the scanner, i.e. acquired with 90Y as the acquisition isotope, and reconstructed with 3i18s ToF and an all-pass filter). Background concentration and hot sphere concentrations and recovery were measured using the following three software packages for comparison with the in-house QUEST method:

  1. Method (a)

    An in-house ImageJ plug-in (NIH, Bethesda, MD) that uses NEMA guidelines to measure background concentration and the mean of a 50 % threshold-generated VOI to measure hot sphere recovery [25].

  2. Method (b)

    Software provided as part of the European Association of Nuclear Medicine Research Ltd (EARL) initiative for standard image quality assessment of background concentration and hot sphere recovery as measured through threshold-derived VOIs on the central slice of the PET images [26].

  3. Method (c)

    The ROVER package (ABX-CRO Advanced Pharmaceutical Services, Dresden, Germany) which again applies a 50 % growing algorithm to generate a VOI for hot sphere recovery measurement, and measures background concentration as the mean of two generated background VOIs, all of which are manually placed by the user.
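
The 50 % threshold approach shared by methods (a) and (c) can be illustrated in simplified form as below. This sketch ignores spatial connectivity and region growing, takes the threshold relative to the maximum voxel value (an assumption), and uses a hypothetical function name; it does not reproduce any of the three packages.

```python
def threshold_mean(voxel_values, fraction=0.5):
    """Mean of all voxels at or above `fraction` of the maximum voxel value.

    A real implementation would also require the selected voxels to form a
    connected region around the hot sphere.
    """
    cutoff = fraction * max(voxel_values)
    selected = [v for v in voxel_values if v >= cutoff]
    return sum(selected) / len(selected)
```

Because the VOI is drawn from the PET data rather than from the CT-defined sphere boundary, the resulting mean can differ from the CT-based measure used in the QUEST method.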

Results

A total of 47 centres from 13 countries contributed data to the study. The average total activity in the phantom at the first imaging time-point was 3.26 GBq, with a standard deviation of 0.26 GBq (8 %).

Dose calibrator measures

The average absolute difference between an individual site’s measured 90Y activity in the delivery vial in the local dose calibrator and the vendor-supplied calibration certificate, decay-corrected to the same time-point, was 5 %, with a range of −4 % to +25 % and a median of +2.5 %.

Quantitative assessment

The accuracy of total activity measured in the FoV and the measured concentration of activity in the phantom background at each imaging time-point are shown in Figs. 2 and 3, respectively. Values are expressed as the percentage difference between the measured and expected values, where each measured value is the mean for a given category (note that the number of measured data that underlies these measured values does vary between scanner and reconstruction methods). The standard deviations of measures are shown as error bars (thus representing the combined inter-site variability and measurement error at consistent reconstruction parameters) and a general ±10 % tolerance is represented by the shaded region (representing expected uncertainty in ‘true’ values). For the Siemens non-ToF systems, only ‘PROMPTS + RANDOMS’ mode acquisitions were included (Fig. 2f) because acquisitions in ‘NETTRUES’ mode resulted in extremely large overestimates when quantification was performed through rescaling of the acquisition branching ratio, and the reconstructed data were therefore normalized to the actual total activity in the phantom, such that estimates of total activity were not meaningful. Despite this post hoc normalization approach not being ideal in a clinical scenario (due to the difficulty in measuring residual in the delivery apparatus and the potential for stasis to be reached during treatment), it was explored for comparison purposes in this controlled phantom study.
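
The summary statistic plotted for each category can be sketched as follows: the percentage difference of each measurement from the expected value, then the mean and standard deviation across the reconstructions in that category, compared against the ±10 % tolerance. The function names are hypothetical and the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def percent_difference(measured, expected):
    """Signed difference of a measurement from its expected value, in %."""
    return 100.0 * (measured - expected) / expected


def summarise_category(measured_values, expected, tolerance_pct=10.0):
    """Return (mean % difference, SD of % differences, within tolerance?)."""
    diffs = [percent_difference(m, expected) for m in measured_values]
    centre = mean(diffs)
    spread = stdev(diffs) if len(diffs) > 1 else 0.0
    return centre, spread, abs(centre) <= tolerance_pct
```

A category with a single contributing reconstruction, such as the Philips non-ToF system, has no meaningful standard deviation, which the sketch reports as zero.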

Fig. 2

Differences in measured total activity in the FoV with respect to the expected total activity over all four imaging time points for (a) GE Healthcare ToF systems (N = 21), (b) GE Healthcare non-ToF systems (N = 23), (c) Philips ToF systems (N = 8), (d) Philips non-ToF systems (N = 1), (e) Siemens ToF systems (N = 28), (f) Siemens non-ToF reconstructions (N = 28, including only ‘PROMPTS + RANDOMS’ mode acquisitions, see explanation in text). Each datum is the mean for all scanners and the error bars represent one standard deviation (SD) of the measured values

Fig. 3

Differences in measured background concentration with respect to the true background concentration for (a) GE Healthcare ToF systems, (b) GE Healthcare non-ToF systems, (c) Philips ToF systems, (d) Philips non-ToF systems, (e) Siemens ToF systems, (f) Siemens non-ToF systems (where +RAN and -RAN correspond to data acquired in ‘PROMPTS + RANDOMS’ and ‘NETTRUES’ mode, respectively, and where -RAN was normalized for analysis)

Current generation GE Healthcare and Siemens ToF systems with RR and an all-pass filter produced acceptable estimates (within ±10 %) of total activity and background concentration over the range 0.5 – 3 GBq and 50 – 300 kBq/ml, respectively, with improvements seen in Siemens systems when reconstructing with two or three iterations. No evidence of detector saturation was seen, in agreement with the literature [10, 12, 19, 20]. The non-ToF Siemens systems gave accurate estimates of background concentration when acquired in ‘PROMPTS + RANDOMS’ mode. Both BGO and LYSO non-ToF GE Healthcare systems showed similar behaviour, including overestimates of activity at levels below 1.5 GBq and estimates of background above 100 kBq/ml within 15 %. Philips Gemini ToF systems appeared to underestimate total activity at levels below 3 GBq and background concentrations below 300 kBq/ml, whilst large overestimates across the entire range were seen on the 3D RAMLA reconstruction. For high activity levels (about 3 GBq) in the FoV, all scanners were capable of producing satisfactory estimates, presumably due to the improved count statistics which allowed improved scatter modelling and reduced effect of randoms subtraction.

Recovery of activity concentration measured in the hot spheres on day 0 of imaging is shown in Fig. 4, and the change in this recovery for the 37-mm diameter hot sphere over different days of imaging in Fig. 5. All ToF systems demonstrated comparable recovery of concentration in hot spheres (note that at this day-0 imaging time-point these reconstructions also had comparable measures of background concentration), and in all cases this was inferior to that achieved with 18F. Postreconstruction gaussian filtering resulted in a decrease in recovery due to smoothing of the activity concentration outside the geometrical volume. All systems demonstrated a steady decline in both 90Y and 18F recovery for spheres with a diameter below 37 mm due to PVE. The Siemens ToF reconstructions with two or three iterations were again superior to a single iteration. Whilst Siemens non-ToF data acquired in ‘PROMPTS + RANDOMS’ mode demonstrated improvement over ‘NETTRUES’ mode, with recovery similar to the GE Healthcare non-ToF reconstructions, the non-ToF systems generally achieved poorer recovery. All 90Y data suffered from underestimates in the range of 10 – 20 % of the true activity concentration of even the largest volume sphere (∅ 37 mm), a finding consistent with independent analyses of data (see Fig. 8). In ToF systems recovery underestimates for the largest hot sphere over all days of imaging gradually deteriorated (Fig. 5). This may have been due to the influence of the 176Lu present in the detector crystals, which has been suggested to affect low count studies [19]. Non-ToF systems demonstrated some variation, with a slightly better recovery with the BGO system and RAMLA at lower concentrations, which may have been due to deteriorating noise and associated spurious high counts in voxels.

Fig. 4

Lines of best fit (y = 100 − a·e^(−bx)) for recovered concentrations in hot spheres of various diameters on day-0 imaging for (a) GE Healthcare ToF systems (R² = 0.94 – 0.98), (b) GE Healthcare non-ToF systems (R² = 0.90 – 0.99), (c) Philips ToF systems (R² = 0.97 – 0.98), (d) Philips non-ToF systems (R² = 0.90 – 0.96), (e) Siemens ToF systems (R² = 0.80 – 0.99), (f) Siemens non-ToF systems (R² = 0.94 – 0.97) (where +RAN and -RAN correspond to data acquired in ‘PROMPTS + RANDOMS’ and ‘NETTRUES’ mode, respectively, and where -RAN was normalized for analysis). The black line of reference is the recovery curve for 18F derived from the same experiment
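
One simple way to obtain a curve of the form y = 100 − a·exp(−bx) is sketched below. The fitting method used in the study is not stated, so this is a swapped-in technique: a linearized least-squares fit of ln(100 − y) against x, which weights the points differently from a direct nonlinear fit and requires every recovery value to be below 100 %. The function name is hypothetical.

```python
import math


def fit_recovery_curve(diameters_mm, rc_percent):
    """Estimate (a, b) in y = 100 - a*exp(-b*x) by linear regression of
    z = ln(100 - y) = ln(a) - b*x. Points with y >= 100 are skipped."""
    pts = [(x, math.log(100.0 - y))
           for x, y in zip(diameters_mm, rc_percent) if y < 100.0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in pts)
    slope = sxz / sxx
    a = math.exp(mean_z - slope * mean_x)  # intercept of the line is ln(a)
    b = -slope
    return a, b
```

Applied to the six sphere diameters (10, 13, 17, 22, 28 and 37 mm) and their measured recovery coefficients, the returned pair defines one curve of the kind plotted in the figure.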

Fig. 5

Lines of best fit (y = a + bx) for recovered concentrations in the largest hot sphere at different concentrations for (a) GE Healthcare ToF systems (R² = 0.54 – 0.87), (b) GE Healthcare non-ToF systems (R² = 0.31 – 0.93), (c) Philips ToF systems (R² = 0.55 – 0.88), (d) Philips non-ToF systems (R² = 0.47), (e) Siemens ToF systems (R² = 0.21 – 0.75), (f) Siemens non-ToF systems (R² = 0.77 – 0.99) (where +RAN and -RAN correspond to data acquired in ‘PROMPTS + RANDOMS’ and ‘NETTRUES’ mode, respectively, and where -RAN was normalized for analysis)

The activity concentrations measured in the cold insert are displayed as percentages of true background concentrations on different days of imaging in Fig. 6. All ToF systems exhibited similar behaviour (on average about 30 % of background) and in general non-ToF systems measured far greater scattered events in the cold insert (on average about 60 % of background). For the GE Healthcare non-ToF systems this was reduced in BGO scanners, perhaps due to the absence of background counts from the natural 176Lu in the crystals.

Fig. 6

Measured activity concentrations in the cold insert as percentages of the true background concentrations at different concentrations for (a) GE Healthcare ToF systems, (b) GE Healthcare non-ToF systems, (c) Philips ToF systems, (d) Philips non-ToF systems, (e) Siemens ToF systems, (f) Siemens non-ToF systems (where +RAN and -RAN correspond to data acquired in ‘PROMPTS + RANDOMS’ and ‘NETTRUES’ mode, respectively, and where -RAN was normalized for analysis). This measure predominantly reflects the accuracy of scatter and randoms corrections

Figure 7 shows the BV measured on day 0 using regions of various diameters corresponding to the diameters of each hot sphere. As expected, BV was improved with postreconstruction gaussian filtering and deteriorated with increasing numbers of iterations and associated noise. The ToF systems from all three vendors displayed similar behaviour, with slightly better data from the Philips system perhaps due to the noise suppression properties of the BLOB OS TF algorithm. The non-ToF GE Healthcare BGO systems displayed significantly poorer results than their LYSO counterparts. This may have been due to the larger coincidence timing window associated with BGO, which increases the random coincidence rate and may further increase noise in the reconstructed images.

Fig. 7

Background variability for different region diameters for (a) GE Healthcare ToF systems, (b) GE Healthcare non-ToF systems (note different scale on the y-axis), (c) Philips ToF systems, (d) Philips non-ToF systems, (e) Siemens ToF systems, (f) Siemens non-ToF systems (where +RAN and -RAN correspond to data acquired in ‘PROMPTS + RANDOMS’ and ‘NETTRUES’ mode, respectively, and where -RAN was normalized for analysis)

As a measure of repeatability, the standard deviations between quantitative measures from three consecutive scans with an identical phantom, acquisition and reconstruction protocol are shown in Table 3. The total measured activity remained constant and a change in measured concentration of the order of 10 % was seen. The change in quantitative measures with the 8-h acquisition (single bed position) was within 10 % of that with the standard 40-min acquisition (two bed positions; Table 3), except for measures of misplaced events in the cold insert, where the improved counting statistics increased the cold contrast ratio by 20 %. This may have been due to better estimates of scatter prior to subtraction.

Table 3 Standard deviations between quantitative measures from identical processing of three consecutive scans performed on a Siemens mCT scanner (as percentages of true values), and the differences between quantitative measures from identical reconstructions of consecutive scans of 40-min duration and 8-h duration on a GE Healthcare Discovery 690 system (as percentages of those measured in the 8-h acquisition)

Comparisons of the QUEST methodology with independent analyses using other software for sample 18F and 90Y datasets are given in Table 4 for measures of background concentration and in Fig. 8 for measures of hot sphere recovery. All methods performed consistently when measuring both 18F and 90Y concentration data and demonstrated similar trends in underestimation in recovery curves. The ROVER package, method (c), produced slightly different measures of background concentration, most likely due to the use of a large VOI as opposed to multiple ROIs, and minor variations in recovery curves can be attributed to the method of VOI generation.

Table 4 Differences between measured and true values of background concentration for each of the analysis methods, represented as percentages of the true values

Fig. 8

Recovered concentrations for hot spheres of various diameters using four software analysis methods for both 18F data (a) and 90Y data (b)

Discussion

It should be recognized that findings regarding scanner performance discussed in this work are not applicable to 18F imaging, only 90Y, and as such are not a reflection of scanner behaviour for the vast majority of clinical PET applications. Comparable and efficient scanner performance has been reported in the literature for qualitative and quantitative 18F imaging aspects for all vendors (for example [27–29]).

The experimental protocol was chosen to cover a clinically realistic range of activities for resin microspheres, where a standard administration of 90Y SIR-Spheres for radioembolization is of the order of 1.6 GBq [30]. Given the diversity in ‘typical’ liver size and tumour burden, nonuniform deposition of microspheres, and the large differences in tumour targeting resulting from the superselective radioembolization procedure, pinpointing a representative concentration in background and hot spheres of the phantom to correspond to patient liver and tumour uptake is not straightforward. The literature expresses large differences in this respect, with tested sphere-to-background ratios ranging from 3:1 [19, 31] to 40:1 [12], and associated background concentrations from as low as 37 kBq/ml to as high as 470 kBq/ml [20]. Given the larger volume of the phantom compared to a human liver, a clinical scenario is thought to lie towards the higher end of the count spectrum explored in this study.

The achievable measures of total activity to within 10 % of expected values when using optimized reconstruction parameters on two out of three tested types of ToF systems imply the suitability of clinical 90Y PET for confirmation of delivered activity after radioembolization. This may be particularly useful when stasis is reached during administration, before the entire prescribed amount of microspheres has been implanted. This is also true of background activity concentration measures, which translate to absorbed dose estimates in nontarget liver. However, Fig. 6 implies that 20 – 40 % of this background level could be measured in adjacent true cold regions, which may lead to overestimation of absorbed dose in healthy liver regions that are devoid of any activity deposition. The difficulty in distinguishing scatter and noise in reconstructed 90Y PET data from true nontarget activity deposition in background regions was investigated by Kao et al. [5], with the recommendation that qualitative assessment of extrahepatic queries should rely on the pattern of uptake and its conformity with underlying anatomy, as opposed to visual intensity.

Figures 2 and 3 highlight an apparent difference in behaviour between ToF systems from the different vendors. During discussions with the vendor Philips, it was suggested that the large underestimates at low count rates seen in the Gemini ToF reconstructions may have been due to the scatter correction algorithm used. Specifically, the magnitude of the scatter component may be underestimated at low count rates due to the fact that any negative pixels in the scatter subtracted sinogram are zeroed prior to subtraction (positivity constraint on the reconstruction algorithm), as demonstrated in the RAMLA reconstructed data (Fig. 3d). The current generation Philips ToF systems use the same approach to approximate the final scatter estimate which is then incorporated into list mode iterative reconstruction. As such, at the last iteration of ToF reconstruction, the scatter is estimated from the scatter under-corrected emission data (RAMLA results), resulting in an erroneously high scatter contribution, and hence leading to lower ToF emission counts at these low count rates. It should be noted that successful quantification of 90Y on Philips ToF PET scanners has been demonstrated in the literature, using a different approach to quantification that relies on a measured scanner-specific sensitivity factor [22]. This approach may be desirable on a single-site basis, but for the purpose of this work (as a precursor to a multicentre evaluation) it was not an ideal method due to the need for additional experimental work and the inability for all sites to have an identical approach to quantification.

There is an evident difference in both image quality and quantification when comparing scanners with and without ToF and RR. There is also a consistent underestimation in all quantitative measures in hot spheres, seen even at long acquisition times (Table 3), and under circumstances of accurate background quantification. This is most likely related to the excessive random and scatter events and low count rates when imaging 90Y, and the way in which the reconstruction algorithm deals with this, and is not evident for 18F data which benefits from a true count rate that is orders of magnitude greater than that of 90Y. Iterative reconstruction algorithms recover low frequency or background events first, implying that higher iterations are needed for accurate recovery of small hot objects. However, given the noise and low signal present in 90Y PET, higher iterations are not a practical solution, and this is demonstrated in Fig. 4 by the lack of an obvious improvement with an increasing number of iterations, and is in agreement with the literature [12, 20, 32].

Furthermore, a longstanding problem in PET reconstruction is the bias introduced by the necessity to remove negative sinogram values (which become zeroed) following correction for random coincidences, in order to satisfy the assumption of a Poisson distribution which is the basis for expectation-maximization-based reconstructions, such as OSEM. This bias does not have a significant impact for the vast majority of clinical PET scanning. In the case of 90Y where extremely low count rates are observed in the setting of high random coincidences, the bias becomes greater. In contrast, it is known that as the fraction of random coincidences increases, the gain in signal to noise ratio associated with ToF increases [33], implying that current generation scanners that employ ToF would be more suited to imaging 90Y than previous generation scanners, an assumption that seems to be verified by these results. In addition, it is known that iterative reconstruction converges faster with the use of ToF [27]. These findings are also supported by other publications [12, 19, 34].
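
The positive bias introduced by zeroing negative sinogram values can be illustrated with a small Monte Carlo sketch. The per-bin count levels are illustrative assumptions chosen to give a random fraction of 80 %, similar to that quoted in the Introduction; the function names are hypothetical and the simulation does not model any specific scanner or reconstruction.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson variate (Knuth's method, adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_clipping_bias(trues=0.1, randoms=0.4, n_bins=20000, seed=0):
    """Mean net counts per bin before and after zeroing negative values.

    Prompts contain trues plus randoms; an independent delayed-window
    estimate of the randoms is subtracted bin by bin.
    """
    rng = random.Random(seed)
    raw_sum = 0
    clipped_sum = 0
    for _ in range(n_bins):
        net = poisson(rng, trues + randoms) - poisson(rng, randoms)
        raw_sum += net
        clipped_sum += max(net, 0)  # positivity constraint
    return raw_sum / n_bins, clipped_sum / n_bins
```

Without clipping, the mean net count stays close to the true value of 0.1 per bin; after clipping it is several times larger, because the negative fluctuations are discarded while the positive ones are kept.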

The importance of the treatment of random coincidences is well demonstrated in the Siemens data. The significant differences in quantification between non-ToF acquisitions in ‘PROMPTS + RANDOMS’ mode and ‘NETTRUES’ mode are due to the fact that the latter subtracts delayed coincidences directly, event by event, whereas the former stores the delayed coincidences as a separate acquisition, which allows smoothing prior to subtraction from the prompt events. Direct subtraction without smoothing is more likely to produce negative values in the sinogram, which, when reconstructed with the positivity constraint applied in OSEM algorithms, lead to noisy data and inaccurate quantification. The GE Healthcare systems employ the singles-based method of randoms correction (calculating the mean random coincidence rate for each line of response based on the coincidence timing window and the single photon event rate), and the non-ToF GE Healthcare systems do not exhibit the same extreme underestimates as seen in the non-ToF Siemens systems prior to the use of smoothing. Furthermore, scatter correction in low-count studies may well be less accurate owing to the difficulty of estimating scatter from such noisy sinograms, as is evident in the Philips data, and, as suggested by van Elmbt et al. [19], may be further affected by additional signal coming from pair production in the LSO/LYSO crystals. The cumulative effect is a remarkable improvement in the ability of current generation scanners to image and quantify 90Y.
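The two approaches to randoms correction described above can be contrasted with a short sketch. It uses the textbook singles-based relation R = 2τ·S_i·S_j, taking τ as half the coincidence window width (an assumed convention), and treats a smoothed or singles-based randoms estimate as noise-free, which is an idealisation. The function names and numerical values are hypothetical and do not represent any vendor's implementation.

```python
import math


def singles_randoms_rate(singles_i, singles_j, tau):
    """Mean randoms rate (counts/s) on the line of response joining detectors i and j.

    Textbook relation R = 2 * tau * S_i * S_j, with singles rates in counts/s
    and tau (s) taken as half the coincidence window width (assumed convention).
    """
    return 2.0 * tau * singles_i * singles_j


def net_trues_variance(trues, randoms, randoms_estimate_is_noisy):
    """Variance of (prompts - randoms estimate) in one bin, for Poisson prompts.

    Noisy estimate (delayed window subtracted directly): T + 2R.
    Noise-free estimate (idealised smoothed or singles-based): T + R.
    """
    variance = trues + randoms  # Poisson variance of the prompts
    if randoms_estimate_is_noisy:
        variance += randoms  # Poisson variance of the delayed counts
    return variance


# Hypothetical bin with an 80 % random fraction (scatter ignored)
trues, randoms = 1.0, 4.0
direct = net_trues_variance(trues, randoms, randoms_estimate_is_noisy=True)
smoothed = net_trues_variance(trues, randoms, randoms_estimate_is_noisy=False)
print(direct, smoothed)
```

In this idealised bin, direct subtraction almost doubles the variance of the net counts relative to a noise-free randoms estimate, which is consistent with negative sinogram values being more likely when delayed coincidences are subtracted without smoothing.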

The results suggest that previous generation scanners without RR and ToF do not produce quantitative 90Y measures that are consistent enough for comparison with current generation scanners. From the range of data investigated in this study, 90Y imaging performance appears to be optimal for Siemens systems using two iterations and 21 subsets with ToF and RR, which gives the best quantification without compromising measures affected by noise, together with an all-pass filter (or a 5 – 8 mm Gaussian filter for qualitative purposes). For GE Healthcare systems the use of an all-pass filter in conjunction with RR and ToF gave the most consistent results for quantification, and a subset analysis of data (not shown) suggested two iterations and 24 subsets. Investigation of the Philips ToF reconstructions of 90Y is ongoing, including communication with the vendor, but at present total activity and background concentration may be underestimated in regions of low concentration.

When imaging 90Y for quantitative purposes with previous generation (non-ToF) GE Healthcare and Siemens scanners, concentration measures in large areas (about 300 kBq/ml) can be expected, on average, to be within 9 % and 2 % of true values, respectively. Recovery of concentration in hot lesions (about 2,500 kBq/ml), however, can be expected to be inferior to that of their ToF counterparts, with average errors of −34 % and −27 %, respectively, for a 37-mm diameter object (see Table 5 for complete comparison). A different analysis method, such as a threshold-based VOI, may improve these RCs, but given the very noisy nature of the 90Y reconstructions such a method may also be affected by a spurious maximum value. This study suggests that with Siemens non-ToF scanners, data should not be acquired in ‘NETTRUES’ mode if correct quantification of 90Y is desired, and the most consistently accurate results were seen when using ‘PROMPTS-RANDOMS’ mode with two iterations and 21 subsets in combination with RR.
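As a worked illustration of how the percentage errors quoted above relate to a recovery coefficient (RC), the following sketch uses a hypothetical measured concentration chosen to reproduce the −34 % figure. The measured value and the function names are assumptions for illustration and are not data from this study.

```python
import math


def percent_error(measured, true_value):
    """Signed error of a measurement relative to the true value, in percent."""
    return 100.0 * (measured - true_value) / true_value


def recovery_coefficient(measured, true_value):
    """Ratio of measured to true concentration (1.0 means full recovery)."""
    return measured / true_value


true_sphere = 2500.0      # kBq/ml, approximate hot sphere concentration on day 0
measured_sphere = 1650.0  # kBq/ml, hypothetical value chosen to give -34 %

error = percent_error(measured_sphere, true_sphere)
rc = recovery_coefficient(measured_sphere, true_sphere)
print(error, rc)
```

An average error of −34 % therefore corresponds to an RC of about 0.66, and −27 % to an RC of about 0.73.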

Table 5 Reconstruction parameters that provided most accurate quantification over the assessments performed in this investigation, and the expected accuracy and standard deviations associated with measures of warm background and hot spheres for each. All measures are based on the results from day-0 imaging, where phantom background and hot sphere concentration were about 300 kBq/ml and 2,500 kBq/ml, respectively

A ±10 % uncertainty can be expected on quantitative measures due to random noise in the acquisition and reconstruction process, which is approximately consistent across regions of non-zero background activity and hot spots (Table 3). Allowing also for the uncertainty in the 90Y activity (±10 %) treated as the gold standard in this work, quantitative measures on ToF PET systems with the reconstructions discussed can be expected to produce acceptable estimates of activity and concentration in large homogeneous areas over a clinically realistic range of values. It should be expected that hot lesion quantification (and therefore absorbed dose estimates) may be underestimated with all current generation scanners by a consistent 15 – 20 % for a 37-mm diameter object. Such underestimates may be improved with a different volume definition technique, as explored by Goedicke et al. [22]. Given the lack of significant improvement in warm and hot volume quantification with increased acquisition durations (Table 3), a 40-min acquisition is recommended in the clinical setting, acquired as two bed positions (20 min each) to avoid the area of interest (liver) encroaching on the edges of the FoV, where noise is greatest in the reconstructed data, and to avoid peaking of the scanner’s sensitivity profile.
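One way to read the two ±10 % figures together is to combine them in quadrature. This is a sketch under the assumption that the two sources of uncertainty are independent, which is not stated in this work; the function name is hypothetical.

```python
import math


def combined_relative_uncertainty(*components_percent):
    """Root-sum-of-squares of relative uncertainties given in percent.

    Valid only if the components are independent (an assumption here).
    """
    return math.sqrt(sum(c ** 2 for c in components_percent))


noise_uncertainty = 10.0     # %, random noise in acquisition and reconstruction
activity_uncertainty = 10.0  # %, uncertainty in the reference 90Y activity

combined = combined_relative_uncertainty(noise_uncertainty, activity_uncertainty)
print(combined)
```

Under that assumption the combined uncertainty is about 14 %. If the two contributions were correlated, the combined value could be larger.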

Conclusion

In summary, current generation ToF PET scanners are capable of producing comparable quantification of 90Y over a large range of clinically realistic activity and concentration levels. In terms of quantitative accuracy of estimates and expected uncertainties for translation to clinical measures of absorbed dose, Table 5 shows the average errors and ranges of measures for those reconstructions that were found to be best, based on the data investigated in this work. Considering possible acceptance criteria for scanners acquiring data in a clinical trial setting, an achievable accuracy of concentration measures in large uniform regions of activity of 10 % (average) over a range of clinically realistic true concentrations (50 – 300 kBq/ml) may be considered suitable performance.