Abstract
In this work, we present details and initial results from a 177Lu dosimetry challenge that has been designed to collect data from the global nuclear medicine community, with the aim of identifying, understanding, and quantitatively characterizing the consequences of the various sources of variability in dosimetry. Methods: The challenge covers different approaches to performing dosimetry: planar, hybrid, and pure SPECT. It consists of 5 different and independent tasks to measure the variability of each step in the dosimetry workflow. Each task involves the calculation of absorbed doses to organs and tumors and was meant to be performed in sequential order. The order of the tasks is such that results from a previous one would not affect subsequent ones. Different sources of variability are removed as the participants advance through the challenge by giving them the data required to begin the calculations at different steps of the dosimetry workflow. Data from 2 patients who received a therapeutic administration of 177Lu-DOTATATE were used for this study. The data are hosted in Deep Blue Data, a data repository service run by the University of Michigan. Participants submit results in standardized spreadsheets, together with a short description summarizing their methods. Results: In total, 178 participants have signed up for the challenge, and 119 submissions have been received. Sixty percent of submissions have used voxelized dose methods, with 47% of those using commercial software. In the initial analysis, the volume of organs showed a variability of up to 49.8%, whereas for lesions this was up to 176%. Variability in time-integrated activity was up to 192%. Mean absorbed doses to organs varied up to 57.7%. Segmentation was the step that required the longest time to complete, with a median of 43 min. The median total time to perform the full calculation was 89 min.
Conclusion: To advance dosimetry and encourage its routine use in radiopharmaceutical therapy applications, it is critical that dosimetry results be reproducible across centers. Our initial results provide insights into the variability associated with performing dose calculations. It is expected that this dataset, including results from future stages, will result in efforts to standardize and harmonize methods and procedures.
Radiopharmaceutical therapies (RPTs) have demonstrated clinical utility in the treatment of diseases such as thyroid, liver, neuroblastoma, neuroendocrine, lymphatic, and prostate cancers (1). Also, a new wave of theranostic radiopharmaceuticals (i.e., therapeutic and diagnostic) with highly specific molecular targeting for these and other cancers is entering clinical trials (2,3). This relatively new paradigm for treatment of widely metastatic cancer using radiopharmaceuticals has some advantages compared with other systemic therapies. The theranostics approach permits imaging of the biodistribution of the radiopharmaceutical, thus allowing physicians to treat what they see and see what they treat. Quantitative imaging has the potential to assess whether the binding of the radiopharmaceutical to a target of interest (e.g., a protein in the membrane of a cancer cell or a molecule involved in biochemical or metabolic cellular pathways) warrants targeted RPT. Imaging during or after treatment allows us to quantitatively assess the response to the therapy (e.g., by measuring decreasing uptake of responding tumors). Nuclear medicine imaging modalities, such as PET and SPECT, can provide quantitative 3-dimensional images representing the biodistribution, which is needed for dose estimation.
Quantitative 3-dimensional imaging is the basis of dosimetry calculations that estimate the amount of radiation dose (energy per unit mass) delivered to different tissues. Personalized dose assessments potentially facilitate optimizing treatment response by delivering the maximum possible dose to tumors while simultaneously monitoring the radiation dose to healthy organs and keeping them below toxic thresholds.
Despite this potential, RPT in clinical practice is most commonly administered using a simpler, nonpersonalized approach that ignores the potential for dose optimization based on imaging. Typically, and according to the U.S. Food and Drug Administration package inserts for most therapeutic radiopharmaceuticals, patients are administered the same activity on each therapy cycle; this approach does not account for individual differences in anatomy or in the uptake and metabolic clearance of the radiopharmaceutical. Moreover, dosimetry is not routinely performed because it is believed to be difficult and time-consuming, requires expertise or staff that are not always available, and is not reimbursed.
The Committee on MIRD of the Society of Nuclear Medicine and Molecular Imaging (SNMMI) has developed a general framework for absorbed dose calculation at the organ, suborgan, voxel, and cellular levels (4). Guidelines for dose estimation using planar imaging, hybrid (SPECT plus planar), and multi-SPECT imaging workflows have also been published (5,6). The latest in this series, guidelines for image quantification of 177Lu using SPECT/CT (7), was published in 2016 as a collaboration between the MIRD committee and the dosimetry committee of the European Association of Nuclear Medicine.
The MIRD schema is straightforward, and the European Association of Nuclear Medicine has published guidelines for systematic ways to account for the impact of factors that affect bias and variability (precision) in dose calculations (8). There remains, however, a scarcity of data on variability, and this scarcity has complicated the goal of incorporating uncertainty estimation into dosimetry practice. Variability of absorbed dose results between different centers, practitioners, and patients is a key concern for dose-based treatment planning. This lack of knowledge of uncertainty has made it difficult to draw rigorous inferences about the robustness of dose–response relationships and to compare and combine data from different institutions and agents. Lack of these data has inhibited routine clinical implementation and complicated initiatives targeting reimbursement for dosimetry and dosimetry-based treatment planning.
The dosimetry workflow includes 5 general steps. In the first—data acquisition—quantitative SPECT images, planar images, or a combination of planar and SPECT images are acquired at multiple time points after the administration of the radiopharmaceutical. In the second—segmentation and registration—tissues of interest (e.g., tumors and organs at risk) are delineated (segmented) to define volumes of interest (VOIs) used in the analysis. Various methods are available to perform this segmentation and to register images acquired at multiple time points.
In the third step—data preparation—standard phantom dosimetry applies S values calculated using reference computational phantoms (9,10) that represent the average population anatomy. In this method, activities for tissues of interest (e.g., organs or tumors) are extracted from images. One approach to patient-specific, organ-level dosimetry is to calculate dose at the voxel level using activity and tissue maps derived from the individual patient's anatomy as imaged (e.g., by CT); dose rate maps (3-dimensional images of the dose deposited per unit time) are calculated from the activity images. In these approaches, organ-level dose rates can be calculated by averaging over tissues of interest.
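As a minimal illustration of the last sentence, the sketch below averages a voxelized dose-rate map over a binary VOI mask. It assumes that the map and the mask already share a voxel grid (i.e., registration and resampling have been done); the function and variable names are hypothetical and are not taken from any software used in the challenge.

```python
import numpy as np

def mean_over_voi(voxel_map: np.ndarray, voi_mask: np.ndarray) -> float:
    """Average a voxelized quantity (e.g., dose rate in Gy/h) over a binary VOI mask."""
    if voxel_map.shape != voi_mask.shape:
        raise ValueError("map and mask must have the same shape")
    mask = voi_mask.astype(bool)
    if not mask.any():
        raise ValueError("VOI mask is empty")
    return float(voxel_map[mask].mean())

# Toy 3D dose-rate map (Gy/h) containing a 2 x 2 x 2 "organ" in one corner.
dose_rate = np.zeros((4, 4, 4))
dose_rate[:2, :2, :2] = 0.05
organ = np.zeros((4, 4, 4), dtype=bool)
organ[:2, :2, :2] = True
print(mean_over_voi(dose_rate, organ))  # organ-level mean dose rate
```

Averaging the map inside the mask gives the organ-level dose rate; the same function applies unchanged to a dose map or a TIA map.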
The fourth step is integration. In standard-phantom dosimetry, the activities are integrated over time to obtain time-integrated activity (TIA) values. In some approaches to patient-specific dosimetry, activity images are integrated at the voxel level to form TIA images; in other approaches, dose rate images are integrated over time to calculate absorbed dose maps (3-dimensional images of the absorbed dose). The integration often involves the use of curve fitting.
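The integration step can be sketched for the simplest case: a monoexponential time–activity curve fitted by log-linear least squares and integrated analytically from 0 to infinity. This is only one of several possible choices (the challenge did not prescribe one). The time points and activities below are hypothetical and do not reproduce the challenge's acquisition schedule, and the function names are illustrative.

```python
import numpy as np

def fit_monoexponential(t_h, activity):
    """Fit A(t) = A0 * exp(-lam * t) by linear least squares on log(A).

    Returns (A0, lam); lam is in 1/h when t_h is in hours.
    """
    t = np.asarray(t_h, dtype=float)
    a = np.asarray(activity, dtype=float)
    if np.any(a <= 0):
        raise ValueError("log-linear fit requires positive activities")
    slope, intercept = np.polyfit(t, np.log(a), 1)  # log A = log A0 - lam * t
    return float(np.exp(intercept)), float(-slope)

def tia_monoexponential(a0, lam):
    """Integral of A0 * exp(-lam * t) from t = 0 to infinity, i.e., A0 / lam."""
    if lam <= 0:
        raise ValueError("effective decay constant must be positive")
    return a0 / lam

# Hypothetical organ data: 100 MBq at t = 0 with a 60-h effective half-life.
times_h = [4.0, 24.0, 96.0, 168.0]
lam_true = np.log(2) / 60.0
activities_mbq = [100.0 * np.exp(-lam_true * t) for t in times_h]

a0, lam = fit_monoexponential(times_h, activities_mbq)
print(f"A0 = {a0:.1f} MBq, T_eff = {np.log(2) / lam:.1f} h, "
      f"TIA = {tia_monoexponential(a0, lam):.0f} MBq*h")
```

A log-linear fit weights the data points differently from a nonlinear least-squares fit of A(t) itself. This is one reason why different fitting choices can yield different TIAs from the same measurements.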
The fifth step is dose calculation. In standard phantom dosimetry, S factors are combined with TIA values to calculate tissue-specific absorbed doses. In some patient-specific approaches, dose maps are calculated from TIA images using either Monte Carlo simulations or convolution with a precalculated dose kernel. Dose maps provide an estimate of absorbed dose in each voxel of the image (i.e., a voxelized approach). Regions of interest within the dose map can be used to provide different statistical values for absorbed dose within the tissue (e.g., the mean absorbed dose to the organ or tumor).
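For the standard-phantom route, combining S factors with TIA values amounts to the MIRD sum over source regions, D(target) = Σ TIA(source) × S(target ← source). A minimal sketch follows. The S values and TIAs are placeholders, not values from OLINDA, IDAC, or any other source. The units only need to be consistent; here they are hypothetically MBq·h and mGy/(MBq·h).

```python
from typing import Dict, Tuple

def mird_organ_dose(
    tia: Dict[str, float],
    s_values: Dict[Tuple[str, str], float],
    target: str,
) -> float:
    """Absorbed dose to `target`: sum over sources of TIA(source) * S(target <- source).

    tia: {source region: time-integrated activity}
    s_values: {(target, source): S value}; a missing pair raises KeyError
    instead of being silently treated as zero.
    """
    return sum(a * s_values[(target, source)] for source, a in tia.items())

# Placeholder numbers for illustration only.
tia_mbq_h = {"kidneys": 100.0, "liver": 200.0}
s_mgy_per_mbq_h = {("kidneys", "kidneys"): 0.5, ("kidneys", "liver"): 0.01}
print(mird_organ_dose(tia_mbq_h, s_mgy_per_mbq_h, "kidneys"))  # 52.0 (mGy, placeholder inputs)
```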
Variation in methods or application in any of these steps can result in variation in dose estimates for the same patient. Variability in the nuclear medicine images (from calibration, imaging, and reconstruction protocol, including compensations for image-degrading factors, or quantum noise) directly affects the variability of dose estimates. Variability in defining tissue VOIs leads to variability in both activity and mass estimates. Variation in methods for integrating the time–activity or time–dose-rate curves also contributes to variability in dose estimates. Variation in the dose calculation method or code, such as the S factors used, can also result in variations in dose estimates.
This is the first installment of multiple planned publications reporting on the 177Lu SNMMI Dosimetry Challenge. Here, we present details of the methodology used to conduct the challenge, including the design, the data used, the hosting of the data, and the variables collected. The challenge has gathered data from the global nuclear medicine community and aimed at identifying, understanding, and quantitatively characterizing the consequences of the multiple sources of variability in the dosimetry calculation pipeline. The challenge covers planar, hybrid, and pure SPECT dosimetry workflows using 5 different and independent tasks. For each participant and task, the study collects, among other variables, information about the methods used to perform the various steps of the dosimetry workflow, the software used, and the time required to perform the calculations. Having data on the magnitude of the various sources of variability is essential in developing harmonized and standardized dosimetry workflows that reduce variability. Reduced variability would allow for more precise, predictable, and repeatable therapeutic regimens and outcomes. The major goal of this study is to acquire such data.
Besides detailing the experimental methodology, this first publication summarizes the demographics of participants, categorizes and tabulates the general dosimetry approaches, and reports on the types of software used. Additionally, we report descriptive statistics for the uncurated absorbed dose calculation results submitted by participants for task 1. These data highlight the problem of variability in absorbed doses and other measured quantities in the dosimetry workflow. Further analysis of the correlations between different variables in the dosimetry workflow, a quantitative analysis of how variability changes as sources of variability are removed in tasks 2–5, and a detailed comparison of results calculated with different dosimetry approaches (i.e., planar vs. multiple SPECT/CT vs. hybrid approaches) will follow in subsequent publications.
This dosimetry challenge focused on dosimetry for 177Lu-labeled therapy for neuroendocrine tumors, but the methodology developed could be applied in subsequent studies involving dosimetry calculations for RPTs using different radionuclides or targeting different diseases.
MATERIALS AND METHODS
Study Design
This study has been designed to measure the variability contributed by each step in the dosimetry workflow. However, variability in data acquisition is limited to a comparison of pure SPECT, hybrid SPECT–planar, and planar-only acquisition protocols. Variability due to other aspects of data acquisition is important but is beyond the scope of what could be achieved in the time frame or with the resources available. The study was designed to accommodate standard-phantom and patient-specific dosimetry workflows at both the tissue and the voxel levels. Five discrete and independent tasks, each involving calculation of organ- and tumor-absorbed doses but starting at different points in the dosimetry workflow, were created for the study. Figure 1 shows schematically the tasks and the sources of workflow variability targeted by each task. The tasks were meant to be performed in sequential order and are summarized in Table 1. Pretherapy diagnostic image sets (CT or MRI) were provided to aid in delineation of organs and tumors. The order of the tasks and the provision of data were designed so that results from an earlier task would not affect the results of a subsequent one. Also, different sources of variability were removed as the participants advanced through the challenge (Fig. 1). We intentionally did not specify the methods or software to be used by participants.
Tasks 1, 4, and 5 each used 4 sequential 177Lu-DOTATATE SPECT/CT datasets acquired after therapeutic injection. The reconstructed SPECT images had voxel values in units of activity concentration (Bq/mL). Thus, the results of these 3 challenge tasks focus exclusively on the absorbed dose calculation workflow and purposely exclude variability and bias due to SPECT acquisition protocols, calibration, reconstruction, or quantification. Variability outside the absorbed dose calculation workflow can affect the results in 2 ways. First, variability in the input data due to differences in these factors would directly increase variability in the output dose values. Second, there could also be indirect effects. For example, image quantification, resolution, contrast, and noise properties depend on the scanner hardware and on the image acquisition and reconstruction protocols. Variability in these properties could result in, for example, variability between operators in defining VOIs and, consequently, in the resulting absorbed dose calculations.
In tasks 1–3, participants were asked to perform the entire dosimetry workflow, from segmenting images to absorbed dose calculations. Participants were asked to identify their VOI delineation method. We did not require partial-volume correction (PVC), for several reasons. A main reason is that there is currently no single, well-accepted method for PVC at the organ or especially the voxel level. A practical, widely used approach for organ- or tumor-level PVC is to apply volume-dependent recovery coefficients determined from phantom measurements. However, there are well-known limitations to using this approach, as recovery coefficients depend not only on volume but on other factors such as activity distribution and shape. For the purpose of this study, we thus treated PVC as part of the image acquisition, reconstruction, and quantification aspects of dosimetry, which are not addressed in this challenge. Neglecting PVC can cause large errors in dose estimates for small objects such as tumors. However, the interest here was in variability. It was emphasized in the instructions that applying PVC was not required but that if PVC was included, a description of the procedure should be added to the summary of the participant’s methods. We collected the volume of the region used to quantify the activity in the image and the volume of the region used to estimate the mass.
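For readers unfamiliar with the recovery-coefficient approach mentioned above, here is a minimal sketch. The RC table is hypothetical (real values come from phantom measurements on a specific system and protocol), and interpolating linearly in log volume is an assumption. As noted above, an RC that depends only on volume ignores shape and activity-distribution effects.

```python
import numpy as np

# Hypothetical phantom-derived recovery coefficients (measured / true concentration)
# for spheres of the listed volumes. Illustrative values only.
RC_VOLUMES_ML = np.array([1.0, 4.0, 10.0, 30.0, 100.0])
RC_VALUES = np.array([0.30, 0.55, 0.70, 0.85, 0.95])

def recovery_coefficient(volume_ml: float) -> float:
    """Interpolate the RC linearly in log(volume); np.interp clamps outside the table."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return float(np.interp(np.log(volume_ml), np.log(RC_VOLUMES_ML), RC_VALUES))

def rc_corrected_activity(measured_activity: float, volume_ml: float) -> float:
    """Partial-volume-corrected activity = measured activity / RC(volume)."""
    return measured_activity / recovery_coefficient(volume_ml)

print(rc_corrected_activity(55.0, 4.0))  # about 100 with this hypothetical table
```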
In task 4, we removed the variability associated with segmentation by providing participants with VOIs in the form of DICOM-RT structures or mask images that were to be applied to the SPECT/CT data to calculate organ and tumor activities and subsequently the corresponding absorbed doses. However, specific time–activity curve generation, fitting and integration, and dose calculation methods were left to the discretion of the participant. The tumor segmentations provided were defined manually by a radiologist; organ segmentations were based on deep-learning tools with fine adjustment by experts. Since we were testing primarily variability, the accuracy of the segmentations is not a limiting factor.
The difference in results from tasks 1 and 4 allows isolation of the impact of VOI segmentation on the variability of absorbed dose estimates. In task 5, a time-integrated-activity image in units of Bq·s/mL was provided. Participants were instructed to use this image in combination with the segmentations from task 4. For each participant, the difference in calculated absorbed dose between task 5 and task 4 isolates the impact of differences in curve fitting and integration on the absorbed dose estimate. Results for task 5 provide data about variability due to the dose calculation method or software.
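Because the task 5 image is a TIA concentration (Bq·s per mL), the TIA of a VOI is the sum of the voxel values inside the VOI multiplied by the voxel volume. A minimal sketch follows. It assumes the mask is already on the SPECT voxel grid and that the voxel volume has been computed from the image geometry stored in the DICOM header. The array sizes and values are toy numbers, and the function names are hypothetical.

```python
import numpy as np

def voi_tia_bq_s(tia_conc_bq_s_per_ml, voi_mask, voxel_volume_ml):
    """Time-integrated activity in a VOI (Bq*s) from a TIA-concentration image (Bq*s/mL)."""
    img = np.asarray(tia_conc_bq_s_per_ml, dtype=float)
    mask = np.asarray(voi_mask, dtype=bool)
    if img.shape != mask.shape:
        raise ValueError("image and mask must share the same voxel grid")
    return float(img[mask].sum() * voxel_volume_ml)

def bq_s_to_mbq_h(tia_bq_s):
    """Convert Bq*s to MBq*h (1 MBq*h = 3.6e9 Bq*s)."""
    return tia_bq_s / 3.6e9

# Toy example: 8 voxels of 0.1 mL each with a uniform 3.6e9 Bq*s/mL.
image = np.full((2, 2, 2), 3.6e9)
mask = np.ones((2, 2, 2), dtype=bool)
tia = voi_tia_bq_s(image, mask, voxel_volume_ml=0.1)
print(bq_s_to_mbq_h(tia))  # about 0.8 MBq*h
```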
Sequential planar acquisitions are sometimes used to estimate absorbed dose. Methods for quantifying organ and tumor activities in these images have not been well standardized and require several cascaded corrections with poorly understood variabilities and biases. To mitigate some of these complexities, hybrid SPECT/planar methods use a single SPECT/CT scan at a time point coincident with one of a series of planar acquisitions to act as a quantitative calibration standard for the sequential planar data. Planar and hybrid methods are somewhat commonly used in clinical trials to reduce both acquisition time and, thus, cost and patient discomfort. Tasks 2 and 3 are designed to interrogate variability in absorbed dose estimates from these methods in comparison to pure SPECT-based methods.
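The calibration idea behind hybrid methods can be sketched as a single multiplicative factor. The factor makes the planar-derived activity match the SPECT/CT-derived activity at the shared time point and is then applied to all planar time points. This is one simple variant and assumes that the planar bias is constant over time. The challenge did not prescribe a method, and the numbers and names below are hypothetical.

```python
from typing import List, Sequence

def hybrid_rescale(planar_activity: Sequence[float], spect_activity_ref: float,
                   ref_index: int) -> List[float]:
    """Scale a planar time-activity series so that it equals SPECT/CT at ref_index."""
    planar_ref = planar_activity[ref_index]
    if planar_ref <= 0:
        raise ValueError("planar activity at the reference time point must be positive")
    factor = spect_activity_ref / planar_ref
    return [a * factor for a in planar_activity]

# Hypothetical organ activities (MBq) from 4 planar scans; the SPECT/CT coincides with scan 2.
planar = [80.0, 60.0, 30.0, 15.0]
print(hybrid_rescale(planar, spect_activity_ref=72.0, ref_index=1))
```

The curve shape (and hence the effective half-life) still comes entirely from the planar data. Only the amplitude is anchored to SPECT/CT.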
Task 2 provided participants with a series of 4 177Lu-DOTATATE planar images (in units of counts) acquired after therapeutic injections (same patients and time points as for all other tasks). A sensitivity calibration factor was provided to convert in-air planar counts to activity. Participants were informed that the sensitivity data for the planar images were intentionally adjusted by a scaling factor (we used a factor of 2, which was unknown to participants) so that the results from task 1 (SPECT/CT) would not bias the results from task 2 (planar). It was incumbent on participants to select methods and perform corrections for scatter, attenuation, and other factors based on other data supplied (e.g., CT scans to estimate transmission factors). From task 2 entries, we anticipate not only understanding the variability in planar absorbed dose calculations but also having the ability to draw conclusions about differences in, and variability between, dose estimates as compared with dose estimates from the multiple SPECT/CT protocol from task 1.
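One common way to perform the attenuation correction referred to above is the conjugate-view (geometric-mean) method, in which the patient thickness T can be measured on the CT. A minimal sketch follows. It assumes background- and scatter-corrected anterior and posterior counts, an effective attenuation coefficient μ for the 208-keV photopeak (not given in this document), and an optional source self-attenuation factor f. The function and parameter names are hypothetical.

```python
import math

def conjugate_view_activity_mbq(counts_ant: float, counts_post: float, acq_time_s: float,
                                mu_per_cm: float, thickness_cm: float,
                                sensitivity_cps_per_mbq: float,
                                self_atten_factor: float = 1.0) -> float:
    """Geometric-mean activity estimate:
    A = sqrt(I_A * I_P / exp(-mu * T)) * f / C,
    with I_A, I_P the count rates (cps), T the patient thickness (cm), C the planar
    sensitivity (cps/MBq), and f a source self-attenuation correction (f = 1 neglects it).
    """
    i_a = counts_ant / acq_time_s
    i_p = counts_post / acq_time_s
    transmission = math.exp(-mu_per_cm * thickness_cm)
    return math.sqrt(i_a * i_p / transmission) * self_atten_factor / sensitivity_cps_per_mbq

# Self-check with a synthetic 50-MBq point source at 7 cm depth in a 20-cm-thick patient.
mu, T, d, C, t = 0.12, 20.0, 7.0, 10.0, 600.0  # hypothetical values
ant = C * 50.0 * math.exp(-mu * d) * t
post = C * 50.0 * math.exp(-mu * (T - d)) * t
print(conjugate_view_activity_mbq(ant, post, t, mu, T, C))  # about 50, independent of depth d
```

For a point source, the geometric mean removes the dependence on source depth. This is why only the total thickness T, not the organ depth, is needed.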
Task 3 uses the 4 sequential planar scans from task 2 and a quantitative SPECT/CT dataset acquired at 24 h after injection. Differences in absorbed dose estimates between tasks 2 and 3 provide a measure of the difference in bias and variability associated with having the single SPECT/CT scan as a calibration standard for the planar images.
Datasets
All images were provided in DICOM format. Data from 2 patients (labeled A and B) who underwent a therapeutic administration of 177Lu-DOTATATE were used for this study (11–15). The same data were provided to all participants.
For each patient, 4 quantitative SPECT/CT images were acquired on an Intevo system (Siemens Healthineers) as part of an institutional review board–approved research study at the University of Michigan. The acquisition of 360 frames was performed using 3 energy windows (120 projection views per window over 360°): a main window (186–227 keV) and 2 scatter windows (165–186 keV and 227–248 keV). Images were reconstructed using xSPECT Quant software (48 iterations, 1 subset, without a postreconstruction filter; Siemens), which includes compensation for attenuation, scatter, and the collimator detector response. A sensitivity factor from a National Institute of Standards and Technology–traceable 75Se calibration source was applied by the scanner’s software to generate quantitative images (in units of Bq/mL) (16).
Details about the anonymized identifiers, therapeutic injection, acquired SPECT/CT and planar scans, and baseline and diagnostic scans are summarized in Table 2. These details were given to participants in the instructions and are also available in the DICOM headers of the shared images. No additional registration of the SPECT and CT images at each time point or between any image at different time points was performed. Participants were asked to estimate absorbed doses to each kidney, if possible, or to the kidneys as a whole, the spleen, healthy liver (i.e., the region of the liver without tumors), and specified tumors. Tumor locations were indicated on a fused SPECT/CT image provided in the instructions (Fig. 2). Patient B was splenectomized; no values are reported for this organ in this patient.
The planar images were acquired as part of the posttherapy imaging at the same time points as the SPECT scans. Transmission scans were not acquired, and the patient may have voided before the first scan. The provided diagnostic CT scan could be used to estimate the body thickness required for geometric mean attenuation compensation. The planar images were acquired with energy windows suitable for triple-energy-window scatter compensation.
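The triple-energy-window compensation mentioned above estimates the scatter in the photopeak window from the counts per keV in 2 narrow adjacent windows. A minimal sketch follows, applied to ROI (or pixel) counts. For illustration only, it uses the widths of the SPECT windows listed earlier, because the planar window settings are not specified here. The counts are hypothetical.

```python
def tew_scatter_correct(c_main: float, c_lower: float, c_upper: float,
                        w_main_kev: float, w_lower_kev: float, w_upper_kev: float):
    """Triple-energy-window (TEW) scatter estimate using the trapezoidal approximation:
    scatter = (c_lower / w_lower + c_upper / w_upper) * w_main / 2.
    Returns (primary, scatter), with primary clipped at zero.
    """
    scatter = (c_lower / w_lower_kev + c_upper / w_upper_kev) * w_main_kev / 2.0
    return max(c_main - scatter, 0.0), scatter

# Window widths from the SPECT protocol: 186-227 keV main, 165-186 and 227-248 keV scatter.
W_MAIN, W_LOWER, W_UPPER = 227 - 186, 186 - 165, 248 - 227  # 41, 21, 21 keV
print(tew_scatter_correct(1000.0, 210.0, 42.0, W_MAIN, W_LOWER, W_UPPER))  # (754.0, 246.0)
```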
Data Distribution
We looked for a centralized data library that could provide participants with access to the dataset, including images and metadata; allow the release of data needed for different stages at appropriate times during the study; and generate a digital object identifier and host the dataset beyond the end of the study, to allow use as a standard for future benchmarking methods and as a way to cite the data.
On the basis of these requirements, we selected Deep Blue Data (https://deepblue.lib.umich.edu/data), a data repository service run by the University of Michigan Library, to host the study data. Datasets, along with the associated documentation and metadata needed to discover, understand, and use the data, are deposited into Deep Blue Data. The challenge data were released in 4 stages as indicated in Table 1.
Participants had access to data from all previous stages and were specifically asked not to let results from a previous task influence results for a subsequent one.
Data Collection
The challenge was initiated by the SNMMI Dosimetry Task Force. An invitation to participate was issued through e-mail announcements to membership, through the SNMMI website, and through informal communications with other relevant international organizations. Each participant in the challenge identified themselves, their profession, and their institution, with the understanding that results would be presented only in aggregate form.
To aid in the identification and diagnosis of problems and distinct sources of variability in absorbed dose calculations, participants were asked to provide intermediate results for each stage of the dosimetry workflow. Table 3 summarizes the data and variables collected. For each challenge task, we created and provided a protected spreadsheet with unprotected cells for reporting results and for pasting screenshots of VOI definitions and curve-fit plots, as well as pull-down menus for items with a discrete set of possible answers. In addition, to help understand possible outlier results, participants were asked to submit a page summarizing their methods and highlighting details of their procedures that might not have been covered by the collected variables.
Data Analysis
In this document, we report only the demographics associated with the submissions and early results from task 1, uninformed by the results of subsequent tasks. To show the variability in absorbed doses and other parameters of the dosimetry workflow, we calculated various descriptive statistics and generated various plots using data reported for task 1. All the results are presented in aggregated form. These data serve as a baseline for comparison of data from other tasks and include all sources of variability from all steps of the dosimetry workflow studied.
To understand the expertise of the submitters and the methods used, histograms are shown of the self-reported professions of the submitters, the dosimetry method used (i.e., voxelized vs. organ-level methods), the source of S factors, and the type of software used.
To highlight the distribution and variability of the submitted results for task 1, violin plots of the volumes of segmented regions, the reported TIA values, and the mean absorbed doses are shown. These plots are presented separately by patient and by organ or tumor. Descriptive statistics, including the minimum, mean, SD, and maximum, as well as the first quartile, median, and third quartile, were also calculated for the different distributions.
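The summary statistics listed above are straightforward to compute for each organ or tumor. A minimal NumPy sketch follows. The use of the sample SD (ddof=1) and of NumPy's default linear-interpolation quartiles are assumptions, since the exact conventions are not stated here. The coefficient of variation is the SD divided by the mean, as defined in the Results.

```python
import numpy as np

def describe(values):
    """Descriptive statistics for one distribution of submitted values
    (e.g., all reported kidney doses for one patient)."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)                                # sample SD (assumed convention)
    q1, median, q3 = np.percentile(x, [25, 50, 75])  # NumPy's default linear interpolation
    return {
        "n": int(x.size), "min": float(x.min()), "q1": float(q1),
        "median": float(median), "q3": float(q3), "max": float(x.max()),
        "mean": float(mean), "sd": float(sd),
        "cv_percent": float(100.0 * sd / mean),      # CV = SD / mean
    }

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical values
```

Other quartile conventions (e.g., the `method` argument of `np.percentile`) can shift q1 and q3 slightly for small samples.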
Because of limited resolution, activity in an organ can cover a larger region in the image than its physical size. A common method to compensate for this is to use a larger VOI to measure object activity and a smaller (more physically correct) region to estimate object volume or mass. Thus, we generated volume violin plots showing the distribution of volumes of the segmented VOIs used for activity and mass. Moreover, we generated bar plots that indicate whether the volumes used to measure the activity of an organ or tumor were identical to, smaller than, or bigger than those used to estimate the mass. In addition, some participants reported using a 4-mL sphere located inside an organ or tumor for which the absorbed dose was calculated; we account for this method separately within the bar plots.
Bar plots showing the functional forms used to fit the time–activity curve are shown for each organ and tumor.
Lastly, box plots with corresponding descriptive statistics are shown for the self-reported times required to perform the different steps of the dosimetry workflow.
The next publication resulting from the dosimetry challenge will include a more quantitative and comprehensive analysis of the variability of the absorbed dose using data from the different tasks of the challenge. Variance-component analysis based on mixed-effect models will be used to assess the relative contribution of each factor—such as software, VOI delineation method, and TIA generation method—to the variability in the absorbed dose calculation. Regression analysis will be performed to study the impact of these factors on dose results. We expect to provide guidance to the community about the areas on which efforts should be focused for standardization.
RESULTS
Here, we show preliminary results for task 1 and summarize the data as reported by the participants. We have performed initial vetting of the data to make sure that items were reported in the correct cells of the spreadsheet and that obvious unit errors were not present. When these were identified, we confirmed the results with the participant and have reported the updated values. More complete vetting of the data and detailed statistical analysis that identifies and characterizes more fully the magnitude of sources of variability will be published in part 2 of this study after data from all 5 tasks are collected and analyzed.
At the time of writing of this article, a total of 178 individuals had registered. We had received 119 submissions, corresponding to 61 and 58 spreadsheets for patients A and B, respectively. A submission represents a received spreadsheet filled out by a participant. Each spreadsheet contains fillable cells for all the variables presented in Table 3. However, the number of results presented for a particular item does not necessarily add up to 119, because some participants did not report all the variables. Submitters, including their country and institution, are listed in the Acknowledgments section of this document. Several participants registered independently but submitted as part of a group.
Figure 3A shows the expertise of the participants who submitted data. The values in the graph do not add up to the number of submissions received for each patient as some of the participants submitted results using more than one dosimetry method.
Figure 3B shows the distribution of dosimetry methods used. Sixty percent of submissions used a voxelized approach. Organ-level approaches using precalculated S factors from a standard phantom accounted for 32% of submissions. Two submissions performed an organ-level approach but used a patient-specific mesh in combination with a Monte Carlo simulation. Four submissions reported calculating the dose to a 4-mL sphere placed inside the organ or tumor (i.e., did not segment the entire organ). Lastly, 4 submissions did not include information that would allow us to classify the method as organ- or voxel-based.
Figure 3C shows the distribution of S-factor sources based on submissions that reported using organ-level approaches. Of these, 69% used OLINDA (17); of those, 60% used version 1 (including versions 1.0 and 1.1), 24% used version 2 (Hermes Medical Solutions, Sweden) (including versions 2.0, 2.1, and 2.2), 8% used only the OLINDA spheres models, and the remaining 8% used OLINDA in combination with other S-factor sources. The IDAC software (18), which follows International Commission on Radiological Protection publication 133 (19), accounted for 19% of submissions. Two submissions used OpenDose (20) in combination with factors published by Olguin et al. (21) for the tumors. Two submissions reported using local energy deposition instead of S factors. Lastly, 6 submissions are not included in Figure 3C because they also reported doses estimated by local energy deposition in 4-mL spheres drawn within the organ or tumor.
Figure 3D shows the type of software used in voxelized dosimetry approaches. The commercial category includes submissions that performed their dosimetry calculation using commercially available software. Submissions that indicated that the software used was developed in-house were classified as homemade; 47% and 38% of the voxelized approaches were performed with commercial and homemade software, respectively. Hybrid submissions were those that used commercial software but for which a significant part of the calculation relied on in-house software, such as custom Monte Carlo simulation code. The hybrid submissions accounted for 10% of the voxelized approaches.
The indicated software in commercial and hybrid submissions included MIM (MIM Software), Hermes (Hermes Medical Solutions), Voximetry (Voximetry Inc.), and Varian (Siemens Healthineers), with 22, 10, 4, and 4 submissions, respectively. The 4 submissions categorized as open-source used OpenDose3D (20).
Figure 4 summarizes whether the VOIs used to determine the activity of each organ or tumor were the same as those used to determine its volume and mass. Most participants used identical VOIs for both purposes. There were, however, cases in which the activity region was smaller, with participants drawing small spheres inside the organ to estimate the activity concentration, whereas others used larger activity VOIs, possibly to account for partial-volume effects. The number of submissions that used each of the described methods is shown in Supplemental Table 1 (supplemental materials are available at http://jnm.snmjournals.org).
Figures 5A and 6A show the volumes of the VOIs used for activity determination and for mass estimation. Dosimetry calculations require an accurate measurement of both quantities. Larger VOIs are often used to compensate for partial-volume effects; if the mass is also estimated from this larger VOI, the absorbed dose can be underestimated. Detailed statistics (i.e., mean, SD, coefficient of variation, quartiles, and number of points) are presented in Supplemental Tables 2 and 3. For organs, the volume of the left kidney in patient B had the highest coefficient of variation: 102.4 mL ± 48.2% and 98.7 mL ± 49.8% for the activity and mass VOIs, respectively. Tumor 2 of patient B showed the highest variability for the activity VOI, at 14.1 mL ± 74.5%, and also for the mass VOI, at 12.5 mL ± 76.0%. Large variations in activity and mass do not necessarily result in large variations in absorbed doses, since dose depends on the ratio of these 2 quantities.
Figures 5B and 6B show the distribution of the calculated TIAs. For the organs, the highest variability in this parameter was observed for the left kidney of patient A, for which reported values ranged from 182.3 to 1.57 × 10⁵ with a coefficient of variation of 191.8%. For the lesions, tumor 2 of patient B showed the highest variability, with reported values ranging from 407.3 to 4.14 × 10⁴ and a coefficient of variation of 172.2%. Detailed statistics on the TIA plots are shown in Supplemental Table 4. The reported TIAs from 3 submissions were excluded from the analysis because the values were almost certainly given in different units. Large variations in the TIA do not necessarily translate into large variations in absorbed dose. Some centers used a small sphere placed inside an organ to estimate its absorbed dose, and the lower ends of the TIA ranges correspond to the number of disintegrations in those smaller spheres. For these spheres, the TIA is small, but because the mass is also small, the dose is much closer to that estimated from the entire organ.
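The sphere argument can be checked with a simple local-energy-deposition estimate, in which the mean absorbed dose is the TIA multiplied by the number of decays per MBq·h and the mean energy emitted per decay, divided by the mass. This is a minimal sketch, not the method of any submission: it assumes uniform activity concentration, complete local absorption of the electron energy, no photon or cross-organ contributions, and unit tissue density. The mean electron energy of about 0.148 MeV per 177Lu decay is an approximate value assumed here (its exact value does not affect the comparison), and all organ values are hypothetical.

```python
MEV_TO_J = 1.602176634e-13       # joules per MeV
DECAYS_PER_MBQ_H = 1e6 * 3600.0  # 1 MBq sustained for 1 h = 3.6e9 decays

# Approximate mean electron energy per 177Lu decay (assumption for illustration).
E_MEAN_MEV = 0.148


def local_deposition_dose_gy(tia_mbq_h, mass_g, e_mean_mev=E_MEAN_MEV):
    """Mean absorbed dose (Gy), assuming all emitted electron energy is absorbed locally."""
    energy_j = tia_mbq_h * DECAYS_PER_MBQ_H * e_mean_mev * MEV_TO_J
    return energy_j / (mass_g / 1000.0)  # Gy = J/kg


# Hypothetical kidney with uniform uptake (NOT challenge data).
organ_mass_g, organ_tia_mbq_h = 150.0, 7000.0
# A 4-mL sphere (4 g at an assumed 1 g/mL) samples the same activity concentration,
# so it captures only its mass fraction of the organ TIA.
sphere_mass_g = 4.0
sphere_tia_mbq_h = organ_tia_mbq_h * sphere_mass_g / organ_mass_g

organ_dose = local_deposition_dose_gy(organ_tia_mbq_h, organ_mass_g)
sphere_dose = local_deposition_dose_gy(sphere_tia_mbq_h, sphere_mass_g)
print(f"organ:  TIA {organ_tia_mbq_h:.0f} MBq·h, dose {organ_dose:.2f} Gy")
print(f"sphere: TIA {sphere_tia_mbq_h:.0f} MBq·h, dose {sphere_dose:.2f} Gy")
```

Under these assumptions the two doses agree, while the sphere's TIA is 37.5 times smaller, mirroring the low end of the reported TIA ranges.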
Figures 5C and 6C show the distribution of the mean absorbed doses reported. The absorbed doses for the total kidneys of both patients showed the highest variability among the organs (reported as mean ± coefficient of variation, calculated as SD divided by the mean), with values of 3.83 Gy ± 54.6% (range, 1.78–10.52 Gy) and 5.60 Gy ± 57.7% (range, 1.47–17.33 Gy) for patients A and B, respectively. Lesion 1 of patient B had the highest reported variability overall, at 4.21 Gy ± 98.1% (range, 0.72–33.32 Gy). Descriptive statistics for the absorbed doses are provided in Supplemental Table 5.
Figures 7A and 7B show the types of function used to model the biodistribution in organs and tumors, respectively. The reported functions included mono- and biexponential decays, an exponential uptake followed by a washout phase, and other types of functions. We did not specify the form of the washout function in the uptake-and-washout option, although we expected a combination of exponential functions for the washout phase. We asked the participants for the fit parameters and will report further on the functions used in subsequent publications. Other types of functions included trapezoidal fits, trapezoids combined with monoexponential fits, 3-phase exponential fits, and semi- or fully automated methods that relied on combinations of mono- and biexponential fits. Detailed numbers are provided in Supplemental Table 6. The submissions indicated that monoexponential functions were the most widely used for time–activity curve fitting of the organ biodistribution, whereas an exponential uptake followed by a washout phase was more common for the tumors. The other types of methods will be compared more carefully once the challenge concludes.
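For reference, the main fit families listed above have closed-form TIAs when integrated from 0 to infinity. The sketch below assumes that all decay constants are effective ones (biological clearance plus physical decay, per hour) and uses one common parameterization of uptake and washout, A(t) = A0(e^(−λw t) − e^(−λu t)); because the challenge did not prescribe a washout form, this choice is an assumption. The parameter values are hypothetical.

```python
import math


def tia_mono(a0, lam):
    """TIA of A(t) = a0 * exp(-lam * t), integrated over [0, inf)."""
    return a0 / lam


def tia_bi(a1, lam1, a2, lam2):
    """TIA of A(t) = a1 * exp(-lam1 * t) + a2 * exp(-lam2 * t)."""
    return a1 / lam1 + a2 / lam2


def tia_uptake_washout(a0, lam_w, lam_u):
    """TIA of A(t) = a0 * (exp(-lam_w * t) - exp(-lam_u * t)), with lam_u > lam_w > 0."""
    if not lam_u > lam_w > 0:
        raise ValueError("uptake must be faster than washout: lam_u > lam_w > 0")
    return a0 * (1.0 / lam_w - 1.0 / lam_u)


def decay_constant(half_life_h):
    """Convert a half-life in hours to a decay constant in 1/h."""
    return math.log(2) / half_life_h


# Hypothetical organ: effective washout half-life 50 h, uptake half-life 2 h, A0 = 100 MBq.
lam_w, lam_u = decay_constant(50.0), decay_constant(2.0)
mono = tia_mono(100.0, lam_w)  # same washout, but ignores the uptake phase
uptake = tia_uptake_washout(100.0, lam_w, lam_u)
print(f"mono: {mono:.0f} MBq·h, uptake-washout: {uptake:.0f} MBq·h, ratio {uptake / mono:.2f}")
```

For the same washout parameters, modeling the uptake phase removes A0/λu from the TIA, which here makes the uptake-and-washout value 4% lower; the gap grows as the uptake phase lengthens relative to washout.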
Figure 7C summarizes the time spent on each step of the dosimetry workflow, as reported by the participants. Segmentation was the step that took the longest, with a median of 43 min to complete all requested VOIs and a range of 6–600 min. The median duration of the last step of the dose calculation (i.e., after generating the time–activity curve and calculating the TIA) was 33 min, but the maximum was 4,790 min. This maximum included the computational time to run a Monte Carlo simulation and was not purely time invested by the participant. The median total time required to complete the dosimetry workflow was 89 min. Detailed times are presented in Supplemental Table 7.
Lastly, Supplemental Figure 1 shows 2 qualitative word clouds that summarize the methods participants used to segment organs and tumors. These methods were not reported in a standardized way; rather, participants entered a short description of their procedure. Nevertheless, participants tended to use manual segmentation for the organs and semiautomatic gradient-based or thresholding methods for the tumors.
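As an illustration of why the threshold choice matters for tumors, the following sketch applies a percent-of-maximum threshold to a synthetic, blurred "lesion" and reports the segmented volume. It is a hypothetical example, not any participant's method: the Gaussian blob stands in for the partial-volume blurring of a small lesion in SPECT, and the voxel size and threshold fractions are assumptions. Modest changes in the threshold produce sizable changes in the segmented volume, one plausible contributor to the lesion volume variability reported above.

```python
import numpy as np


def threshold_mask(image, fraction):
    """Boolean mask of voxels at or above `fraction` of the image maximum."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in (0, 1)")
    return image >= fraction * image.max()


def mask_volume_ml(mask, voxel_size_mm):
    """Volume of a boolean mask in mL (1 mL = 1000 mm^3)."""
    return float(mask.sum()) * float(np.prod(voxel_size_mm)) / 1000.0


# Synthetic blurred "lesion": a 3D Gaussian on a 48^3 grid (hypothetical, not challenge data).
z, y, x = np.mgrid[0:48, 0:48, 0:48]
sigma_vox = 3.0
blob = 100.0 * np.exp(-((x - 24) ** 2 + (y - 24) ** 2 + (z - 24) ** 2) / (2 * sigma_vox**2))

voxel_size_mm = (4.8, 4.8, 4.8)  # assumed SPECT voxel size
volumes_ml = {}
for fraction in (0.3, 0.4, 0.5):
    volumes_ml[fraction] = mask_volume_ml(threshold_mask(blob, fraction), voxel_size_mm)
    print(f"threshold {fraction:.0%} of max: {volumes_ml[fraction]:.1f} mL")
```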
DISCUSSION
Few studies have tried to systematically evaluate the variability in dosimetry calculations performed using different protocols or methods.
Mora-Ramirez et al. (22) compared 5 commercially available dose programs on a cohort of patients treated with 177Lu-DOTATATE. Organ masses, TIA, and absorbed doses were estimated using software from the different vendors, and the resulting values were compared. They concluded that absorbed doses estimated with the different applications were of the same order of magnitude but that not all of them addressed the same part of the dosimetry workflow (i.e., some applications follow the whole dosimetry workflow whereas some others start or end at particular steps).
Multiple publications by He et al. investigated the contribution to variability from different steps of the imaging and dosimetry process: image quantification, quantum noise, VOI definition, and patient variability (23–26).
Gustafsson et al. also looked at the uncertainties in the absorbed doses to kidneys by introducing variabilities in different steps of the dosimetry workflow, including the γ-camera calibration (27).
Peters et al. (28) used phantoms to evaluate the quantification accuracy of images acquired on cameras from multiple centers and vendors and concluded that standardization of protocols and accuracy is feasible. A study by the International Atomic Energy Agency included 9 different centers to look at the accuracy and precision of activity quantification for planar and SPECT imaging using 133Ba as a surrogate for 131I (29).
Finocchiaro et al. (30) recently applied the European Association of Nuclear Medicine guidelines for uncertainty analysis in dose calculations for RPT (8) to a cohort of clinical cases. They aimed to show the uncertainties that can be expected in internal dosimetry and to identify which parameters have the greatest effect on those uncertainties. The results of the dosimetry challenge are expected to expand on that study because the challenge includes the use of different segmentation methods (Figs. 5A and 6A) and because, in task 4, it isolates the effects of VOI definition.
Despite these previous efforts, there are still many unknowns, and more multicenter data are required. This study is unique because, to the best of our knowledge, it is the first study to invite the whole nuclear medicine community to perform dosimetry calculations on a standardized dataset without restrictions on, or prescription of, methods to be used. We think that this is a good representation of the current procedures implemented in nuclear medicine departments all around the world. However, we do recognize a limitation in that the current challenge does not address the variability in image acquisition parameters, reconstruction protocols, equipment calibration methods, and PVC. Moreover, the challenge does not address the accuracy of the results, as it focuses only on identifying the sources of variability. This precludes use of the dataset for absolute benchmarking of the accuracy of dosimetry tools. We are working on addressing these limitations in a future study using simulated datasets for which the truth is fully known and allowing participants to select reconstruction and PVC methods and protocols.
The preliminary results presented in this work are only for task 1 of the challenge and do not yet allow comparison of acquisition approaches (i.e., planar vs. hybrid vs. SPECT). Also, sources of variability have not yet been systematically eliminated (they will be in task 4). These initial findings act as the baseline against which further tasks will be compared. The results already show substantial variability in many of the methods and calculations. We believe that this is an invaluable dataset and that results from subsequent tasks will provide data on the most critical sources of variability and help guide standardization and harmonization efforts in areas that have the most impact.
Medical physicists were, by a large margin, the professionals most frequently performing dosimetry calculations in this study, perhaps reflecting that this is a research project (Fig. 3). However, there are multiple disciplines involved in clinical RPT procedures, including technologists for image acquisition and physicians to interpret the images and make therapeutic decisions, among others. To optimize and reduce variability in dose assessments, it is important that all involved disciplines have knowledge of the dosimetry procedure. For example, technologists with knowledge of dosimetry procedures will better understand the need to record appropriate parameters and patient positioning. In addition, as with other procedures, technologists may be involved in the segmentation process or other aspects of the dosimetry workflow, though not in this study. The dosimetry challenge has created a standard dataset that might be used as an educational resource for training of various professionals in dosimetry procedures. We have received internal communications from participants who are using the data to educate their trainees.
Commercial software accounted for the largest share of the voxelized submissions (47%). However, homemade and hybrid tools combined accounted for a comparable share (48%), which means that noncommercial tools are still widely used. Although Mora-Ramirez et al. (22) compared 5 commercial software packages, we hypothesize that as the challenge evolves, the data will shed light on differences in variability between in-house and commercial tools. This, in combination with the dataset made available through the challenge, can potentially be used to reduce the variability between the multiple tools in use, because the dataset can act as a common benchmark for testing and development. Trainees, manufacturers, and developers can compare their results with those reported in this and future articles from the challenge.
The first step that submitters had to perform for this challenge was segmentation. Typically, segmentation was performed directly by the medical physicists. This was the most time-consuming step, and it is expected to be the largest source of variability in the absorbed dose results. As an example, a submission in which the kidney segmentation included only the renal cortex and medulla reported a 20.4% lower kidney absorbed dose than one that segmented the whole kidney (Supplemental Fig. 2), despite using the same software and methodology. To avoid such differences, it will be important to standardize, with input from physicians, which regions of each organ are segmented. In external-beam radiation therapy, procedures are initiated when technologists (dosimetrists) perform the segmentation. This is a model that RPT could potentially adopt, with appropriate training and standardization. Alternatively, simpler methods, such as placing a small sphere inside a normal organ (e.g., the kidney) as a surrogate and extrapolating its absorbed dose to the whole organ, could be recommended after validation to determine the resulting accuracy and precision.
Variability in the TIA can be caused by variation in the activity values on the time–activity curve, which are affected by the segmentation, and by variation in the fit function used to model the biodistribution. We will not be able to fully quantify the effect of segmentation on the variability in absorbed dose estimates without the results of task 4. However, we observed that the fit function varied widely among the submissions. Because monoexponential fits do not account for tracer uptake at early time points, they may, depending on the length of the uptake phase, yield absorbed dose estimates substantially different from those obtained with functions that model both uptake and washout. However, a larger number of fitting parameters can reduce the precision of the fit. Other methods to address this issue included numeric integration (e.g., the trapezoidal rule) over the early time points. Differences between fitting functions at late time points, such as mono- versus biexponential washout, can have a larger effect on the TIA and thus on the absorbed dose. Guidelines recommending fitting models for different situations could reduce this variability.
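The hybrid approach of numeric integration over the measured points combined with an exponential tail can be sketched as follows. The conventions are assumptions, not those of any particular submission: activity is assumed to rise linearly from zero at injection to the first measurement, the measured points are joined by trapezoids, and the area after the last point is extrapolated with a monoexponential fitted by log-linear least squares to the last `n_tail` points. The time–activity data are hypothetical.

```python
import math


def tia_trapezoid_with_tail(times_h, activities_mbq, n_tail=2):
    """TIA (MBq·h): trapezoids over measured points plus a monoexponential tail.

    Assumptions: A(0) = 0 with a linear rise to the first point; the tail after
    the last point is A_last / lam, where lam comes from a log-linear
    least-squares fit to the last n_tail points.
    """
    n = len(times_h)
    if n != len(activities_mbq) or n < 2:
        raise ValueError("need matching time and activity lists with >= 2 points")
    if not 2 <= n_tail <= n:
        raise ValueError("n_tail must be between 2 and the number of points")
    if times_h[0] <= 0 or any(times_h[i] >= times_h[i + 1] for i in range(n - 1)):
        raise ValueError("times must be positive and strictly increasing")
    if any(a <= 0 for a in activities_mbq):
        raise ValueError("activities must be positive")

    # Triangle from injection (t = 0, A = 0) to the first measurement.
    area = 0.5 * times_h[0] * activities_mbq[0]
    # Trapezoids between consecutive measurements.
    for i in range(n - 1):
        area += 0.5 * (activities_mbq[i] + activities_mbq[i + 1]) * (times_h[i + 1] - times_h[i])

    # Log-linear least-squares slope over the last n_tail points.
    ts = times_h[-n_tail:]
    ys = [math.log(a) for a in activities_mbq[-n_tail:]]
    t_mean, y_mean = sum(ts) / n_tail, sum(ys) / n_tail
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    if slope >= 0:
        raise ValueError("tail points show no washout; exponential extrapolation is undefined")
    return area + activities_mbq[-1] / -slope


# Hypothetical kidney time-activity data (NOT challenge data).
times = [4.0, 24.0, 96.0, 168.0]
activities = [300.0, 260.0, 150.0, 80.0]
print(f"TIA = {tia_trapezoid_with_tail(times, activities):.0f} MBq·h")
```

The choice of `n_tail` is itself a modeling decision: using more points reduces sensitivity to a single late measurement but assumes monoexponential clearance over a longer interval.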
The variability in the individual steps is reflected in the variability of the absorbed dose. However, the absorbed dose is also affected by differences in dose calculation methodology, such as the source of the S factors or the Monte Carlo code used. In the submitted results, the reported absorbed doses differed by up to a factor of 100 (tumor 2 of patient A in Fig. 6). The last task of the challenge will measure the variability due explicitly to this factor, and we thus expect to better understand the differences due to dose calculation method.
Lastly, although segmentation had the longest median completion time, the final step of the dose calculation showed the highest variation in time. This is explained by differences in dosimetry methods. Applying an S factor to the TIA can be fast if that factor comes from a precalculated table or predefined phantom anatomy. However, when Monte Carlo simulations were used, the duration was up to orders of magnitude longer. Understanding the time needed to perform the various parts of the calculation may provide important insights for reimbursement purposes.
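The S-factor route follows the MIRD formalism, in which the mean absorbed dose to a target is the sum over source regions of each source's TIA multiplied by the corresponding S value, so once S values are tabulated the calculation is nearly instantaneous. In the sketch below the S values are placeholders invented for illustration, not values from any phantom or software, and the mass rescaling of the self-dose S value (S inversely proportional to mass) is a common approximation that holds only for nonpenetrating emissions.

```python
# Placeholder S values in Gy per MBq·h, keyed by (target, source).
# These are hypothetical numbers, NOT values from any published phantom.
S_TABLE = {
    ("kidneys", "kidneys"): 6.0e-4,
    ("kidneys", "liver"): 1.0e-6,
}


def mird_dose_gy(target, tias_mbq_h, s_table):
    """Mean absorbed dose to `target`: sum over sources of TIA(source) * S(target <- source)."""
    return sum(tia * s_table[(target, source)] for source, tia in tias_mbq_h.items())


def mass_scaled_self_s(s_phantom, phantom_mass_g, patient_mass_g):
    """Rescale a self-dose S value to the patient organ mass (S ∝ 1/m, nonpenetrating emissions)."""
    return s_phantom * phantom_mass_g / patient_mass_g


# Hypothetical TIAs (MBq·h) for the source regions.
tias = {"kidneys": 7000.0, "liver": 20000.0}
print(f"kidney dose: {mird_dose_gy('kidneys', tias, S_TABLE):.2f} Gy")

# Adjust the placeholder self-dose S value from a hypothetical 300-g phantom to a 250-g patient.
s_patient = mass_scaled_self_s(S_TABLE[("kidneys", "kidneys")], 300.0, 250.0)
print(f"mass-scaled self S value: {s_patient:.2e} Gy/(MBq·h)")
```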
Overall, this study aims to raise questions on best practices to reduce variability in dosimetry measurements. However, for the purposes of dosimetry standardization, it is essential that the accuracy of each dosimetry approach also be considered. Questions related to dosimetric accuracy are best answered using simulated data, which provide knowledge of ground truth. In this study, we used patient images and focused on investigating variability.
The initial results of the challenge presented here provide evidence of the importance of understanding the sources of variability in absorbed dose estimates. The dataset that has been, and will continue to be, collected has already generated important questions for future study. Some of these questions may be addressed in future stages of the challenge, and others may point to additional studies needed to harmonize and standardize dosimetry calculations once the challenge ends.
CONCLUSION
To advance dosimetry and encourage its routine use in therapeutic applications of RPT, it is critical that dosimetry results be reproducible across centers. There is currently a lack of comprehensive data on the sources of variability. The 177Lu dosimetry challenge presented in this study aims at collecting data from the international nuclear medicine community that can provide information needed for future standardization and harmonization procedures. The methodology and initial results of the first task were presented. Those results provide insights into the variability in expertise, software, segmentation, TIA calculations, absorbed dose results, and time required to perform the procedure. It is expected that this dataset, including results from future stages, will result in efforts to standardize and harmonize methods and procedures. This is deemed a critical step to justify and motivate reimbursement for dose assessments and clinical adoption of dosimetry-guided treatment in RPT, with the ultimate goal of improving patient outcomes.
DISCLOSURE
Yuni Dewaraja acknowledges support from NCI grant R01CA240706, under which the patient imaging studies were performed. Eric Frey acknowledges support from NCI SBIR grant R44 CA213782. Carlos Uribe acknowledges support from Natural Sciences and Engineering Research Council of Canada (NSERC) grant RGPIN-2021-02965. Yuni Dewaraja is a consultant with MIM (MIM Software) and receives research funding from Varian (Siemens Healthineers) and software support from Siemens (Siemens Healthineers) and MIM (MIM Software). Eric Frey is a cofounder and chief financial officer at Radiopharmaceutical Imaging and Dosimetry, LLC. Avery Peterson started a paid internship with MIM (MIM Software) after completing the analysis of the data in this article but before completing the submission. No other potential conflict of interest relevant to this article was reported.
KEY POINTS
QUESTION: Within the dosimetry workflow, what is the impact of the various sources of variability in dose results?
PERTINENT FINDINGS: Reported volumes varied by up to 142%; TIA, by up to 179%; organ doses, by up to 58%; and tumor doses, by up to 98%.
IMPLICATIONS FOR PATIENT CARE: Standardization and harmonization of methods and procedures in dosimetry are deemed a critical step in justifying and motivating reimbursement for dose assessments and clinical adoption of dosimetry-guided treatment in RPT, with the ultimate goal of improving patient outcomes.
ACKNOWLEDGMENTS
We acknowledge Bonnie Clarke, the director of research and discovery at the SNMMI, for all her help with the implementation, communication, and data collection of the challenge. We also gratefully acknowledge, by submitter, institution, and country, all the participants who submitted the results listed here: Adam Kesner, MSKCC, United States; Albert Bartrés, Onkologikoa Fundazioa, Spain; Alessia Milano, Italy; Andrew Prideaux, Hermes Medical Solutions, United States; Anne-Laurène Wenger, University Hospital of Zürich, Switzerland; Arda Könik, PhD, DABSNM, Dana Farber Cancer Institute, United States; Ashok Tiwari, University of Iowa, United States; Avery Peterson, University of Michigan, United States; Azadeh Akhavan, Geneva University Hospital, Switzerland; Benjamin Van, University of Michigan, United States; Carlos Montes Fuentes, Hospital Universitario de Salamanca, Spain; Chae Moon Hong, Kyungpook National University Hospital, South Korea; Daniel Mc Gowan, Oxford University Hospitals NHS FT, England; Daniele Pistone, University of Messina, Italy; David Adam, University of Wisconsin–Madison, United States; Diana McCrumb, BAMF Health, United States; Domenico Finocchiaro, Azienda USL di Reggio Emilia, Italy; Edoardo D’Andrea, Italy; Eric Brunner, BAMF Health, United States; Erin McKay, St. George Hospital, Australia; George Andl, Varian Medical Systems, United States; Greta Mok, University of Macau, China; Heying Duan, Stanford University, United States; Hina J. Shah, MD, DNB, BWH and DFCI, United States; Ivan Yeung, TECHNA Institute, University Health Network, Canada; Jacob Hesterman, Invicro, United States; Joe Grudzinski, Voximetry Inc., United States; Johan Blakkisrud, Oslo University Hospital, Norway; Joshua Scheuermann, University of Pennsylvania, United States; Juan Camilo Ocampo Ramos, MSKCC, United States; Julia Brosch-Lenz, University Hospital LMU Munich, Germany; Keon Min Kim, Seoul National University, South Korea; Lara Bonney, Oxford University Hospitals NHS FT, England; Lorena Sandoval, Instituto Nacional de Cancerologia, Colombia; Lukas Carter, MSKCC, United States; Natalie M. Cole, MIM Software, United States; Nathaly Barbosa, Instituto Nacional de Cancerologia, Colombia; Nuria Carrasco Vela, Hospital Dr. Peset, Spain; Paulo Ferreira, Champalimaud Centre for the Unknown, Portugal; Price Jackson, Peter MacCallum Cancer Centre, Australia; Raquel Barquero, Hospital Clinico Universitario, Valladolid, Spain; Rachele Danieli, Italy; Richard Laforest, Washington University, United States; Sean McGurk, Sheffield Teaching Hospitals NHS Foundation Trust, England; Shalini Subramanian, Rapid, LLC, United States; Stephen A. Graves, PhD, University of Iowa, United States; Su Bin Kim, Seoul National University, South Korea; Tay Young Soon, Singapore General Hospital, Singapore; Teresa Pérez, Hospital Universitario de Gran Canaria Dr. Negrín, Spain; Valentina Ferri, Stanford University, United States; Vikram Adhikarla, City of Hope, United States; William Erwin, UT M.D. Anderson Cancer Center, United States; Ying Xiao, University of Pennsylvania and IROC Philadelphia RT, United States; and Yazdan Salimi, Geneva University Hospital, Switzerland.
Footnotes
* Contributed equally to this work.
- © 2021 by the Society of Nuclear Medicine and Molecular Imaging.
REFERENCES
- Received for publication June 23, 2021.
- Revision received September 30, 2021.