Introduction

Image-guided surgery (IGS) is a relatively new and emerging platform in which imaging techniques are applied intraoperatively. The goal of IGS is to provide the surgeon with real-time information on tissue in the surgical field, aiding in surgical decision-making [1]. Fluorescence imaging (FI) is ideal for intraoperative applications due to fast acquisition times (milliseconds), flexibility in application, and portability [2]. Various tumor-targeted near-infrared (NIR) fluorescence agents have been successfully studied in clinical trials [3,4,5,6]. Moreover, there is great potential for a broad range of clinical applications besides oncology, such as infectious and inflammatory diseases [7]. Consequently, new study groups, industry, and hospitals are increasingly interested in exploring and implementing this technology in clinical care.

As NIR light (wavelength 600–900 nm) is invisible to the human eye, dedicated imaging systems are needed to detect the fluorescence signal and to form a two-dimensional (2D) image demarcating its tissue distribution. The intraoperative detection of an imaging agent depends on various biological and optical factors (Table 1). The considerable increase in the number of clinical trials in the FI field has led to the development of a variety of FI systems [8]. Because the imaging system represents the last link in the chain, its sensitivity (i.e., detection limit) is crucial [9]. It is therefore important to ascertain whether an imaging system is sensitive enough for the application of interest. Phantoms that mimic relevant concentrations of a fluorescent agent in scattering and absorbing media can aid in quantifying imaging system performance. However, guidance or standard documents describing sensitivity assessment for imaging systems, with or without phantoms, are currently lacking [10, 11].

Table 1 Factors of influence on the signal to background ratio

The interplay between biological and optical factors ultimately results in a fluorescence image, in which the fluorescence signal in both the target and background can be semi-quantified. Using ImageJ (National Institutes of Health, Bethesda, USA), a public domain image processing and analysis program, or proprietary software provided with the imaging system, area and pixel value statistics can be analyzed in user-defined selections, known as regions of interest (ROIs). Standardized methods for selection of ROIs are not available, making this procedure prone to selection bias. Using the measured fluorescence signal in the ROIs of target and background, the signal-to-background ratio (SBR, also reported as target or tumor-to-background ratio [TBR]) is calculated as:

$$ \mathrm{SBR}=\frac{\mathrm{mean}\ \mathrm{signal}\ \mathrm{tumor}}{\mathrm{mean}\ \mathrm{signal}\ \mathrm{background}} $$

The SBR is the key determinant of sensitivity and detectability in FI and is frequently reported as a relevant endpoint in (pre-)clinical studies. Tichauer et al. [12] advocate that noise originating from the background can influence the contrast between target and background and suggest an alternative measure, the contrast-to-noise ratio (CNR):

$$ \mathrm{CNR}=\frac{\left(\mathrm{mean}\ \mathrm{signal}\ \mathrm{tumor}-\mathrm{mean}\ \mathrm{signal}\ \mathrm{background}\right)}{\mathrm{standard}\ \mathrm{deviation}\ \mathrm{background}} $$

Although it is theoretically plausible that CNR may be of added value, this read-out has not yet been applied in daily practice. Thus, a comparison between TBR and CNR of in vivo obtained images is needed.
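As an illustration of the two definitions above, the following minimal sketch computes SBR and CNR from the pixel values of a tumor ROI and a background ROI. The function names and the example pixel values are hypothetical, and the population standard deviation (NumPy's default) is assumed here; image analysis software may report a sample standard deviation instead.

```python
import numpy as np

def signal_to_background_ratio(tumor_pixels, background_pixels):
    """SBR: mean tumor signal divided by mean background signal."""
    tumor = np.asarray(tumor_pixels, dtype=float)
    background = np.asarray(background_pixels, dtype=float)
    return float(tumor.mean() / background.mean())

def contrast_to_noise_ratio(tumor_pixels, background_pixels):
    """CNR: difference of the means divided by the background standard deviation."""
    tumor = np.asarray(tumor_pixels, dtype=float)
    background = np.asarray(background_pixels, dtype=float)
    # Assumption: population standard deviation (ddof=0).
    return float((tumor.mean() - background.mean()) / background.std())

# Hypothetical gray values, for illustration only.
tumor_roi = [90, 110, 100, 100]
background_roi = [40, 60, 50, 50]

sbr = signal_to_background_ratio(tumor_roi, background_roi)
cnr = contrast_to_noise_ratio(tumor_roi, background_roi)
print(f"SBR = {sbr:.2f}, CNR = {cnr:.2f}")
```

Both measures use the same two ROIs; CNR additionally depends on how much the background pixels vary.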

Intraoperative FI holds great promise to revolutionize surgery, but the ability to quantify FI, so that results can be compared between centers and imaging systems, will be a critical factor for successful application of the technique. Just as the field is gathering the evidence necessary for routine clinical application through well-designed phase II/III clinical trials, standards are needed to assess whether imaging systems are adequately sensitive and whether fluorescence images can be accurately quantified [11, 13, 14]. Therefore, we conducted a systematic, controlled in vitro comparison between two commercially available, state-of-the-art clinical imaging systems, using a newly designed calibration device for FI systems and various fluorescent agents, to evaluate important performance characteristics of fluorescence imaging. In addition, we evaluated the effect of ROI selection and background noise on SBR calculation by analyzing 271 fluorescence images from previous studies [3, 15,16,17,18,19,20]. Based on these results, we propose an easily applicable, standardized approach to quantify and report imaging device performance and fluorescence image analysis.

Methods

Imaging System Performance

The CalibrationDisk™ (SurgVision, 't Harde, the Netherlands) is a calibration device for FI systems. The disk can hold eight clear polypropylene tubes of 0.65 ml (Catalog # 15160, Sorenson BioScience, Inc., Murray, USA) (Fig. 1). The device consists of two parts: an upper disk, which holds the tubes in place, and a base on which the upper disk can rotate. The upper disk has round windows that allow measurement of signal intensity in each tube. By rotating the disk, different concentrations of a tracer can be imaged at the same position and under the same excitation conditions, which also allows the homogeneity of illumination across the field of view to be assessed.

We performed the experiment with two commercially available imaging systems with distinctly different modes of operation: imaging system A and imaging system B. Both are state-of-the-art clinical systems optimized for intraoperative NIR imaging, providing real-time fluorescence images and white-light overlays. System A, a cooled system, has two cameras: one for white-light image acquisition and one for fluorescence image acquisition of a single NIR channel (825 to 850 nm). System B uses a single camera for imaging two fluorescence channels (far red 700 to 830 nm and NIR 830 to 1100 nm) and a white-light channel.

Four different NIR fluorescent agents, including two dyes (indocyanine green (ICG) [Pulsion Medical Systems, Munich, Germany] and IRDye 800CW [LI-COR Biosciences, Lincoln, NE, USA]) and two molecularly targeted fluorescent tracers (bevacizumab-IRDye800CW [21] and a folate-NIR fluorophore (OTL-38) [3]), were used to make dilution series in Intralipid 2 %. All dilution series consisted of 21 concentrations, starting at 10,000 nM and, following serial 1:1 dilution with Intralipid 2 % (i.e., halving the concentration at each step), ending at 10 pM. Vials containing the 21 different concentrations were divided into three sets of seven (low, medium, and high concentration). A background or “0-vial” containing Intralipid 2 % without a fluorescent agent was added to each set (Fig. 1).
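The dilution scheme can be checked with a short calculation. The sketch below assumes that each dilution step halves the concentration, which is consistent with 21 concentrations running from 10,000 nM down to roughly 10 pM, and that the three sets are contiguous blocks of seven vials; the function name is hypothetical.

```python
def dilution_series(start_nm=10_000.0, n_steps=21, factor=2.0):
    """Concentrations (nM) of a serial dilution, highest first."""
    return [start_nm / factor ** i for i in range(n_steps)]

series_nm = dilution_series()

# Assumption: the three sets are contiguous blocks of seven concentrations.
high_set = series_nm[0:7]
medium_set = series_nm[7:14]
low_set = series_nm[14:21]

print(f"highest: {series_nm[0]:.0f} nM")
print(f"lowest:  {series_nm[-1] * 1000:.2f} pM")
print(f"top of low set: {low_set[0]:.2f} nM")
```

Under these assumptions the lowest concentration is about 9.5 pM, and the highest concentration in the low set is about 0.61 nM.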

Fig. 1. The CalibrationDisk™ (SurgVision, 't Harde, the Netherlands) loaded with eight clear polypropylene tubes of 0.65 ml (Catalog # 15160, Sorenson BioScience, Inc., Murray, USA).

Each set was placed in the CalibrationDisk™ and imaged at three different exposure times (low, medium, high) and three different gain settings (low, medium, high) with both imaging systems. Low, medium, and high settings of gain and exposure time were used rather than absolute values, as the two imaging systems had different maximum gain and exposure time. Imaging was done in a dark room under identical conditions, including an identical working distance of 20 cm. System A provides high-quality, high-resolution 16-bit TIFF images. For system B, images were extracted from videos (.qifs format) in the corresponding software suite, resulting in 8-bit TIFF images. Images were exported to ImageJ for gray value intensity analyses. Sensitivity was defined as the lowest concentration detectable at maximal settings (high gain, high exposure time). To mimic SBR in the clinical setting, we determined at what concentration an SBR > 2 between the pertaining vial and the background vial was achieved. For a fair comparison of images with different bit depths, fluorescence intensity values were normalized to the maximal value of the imaging system and plotted on graphs of log10 fluorophore concentration versus log10 fluorescence signal. Linearity was defined as the slope of a linear fit to the log–log data. An optimal imaging system provides a doubling in signal strength for every twofold increase in concentration, resulting in a fitted linear slope of 1 in this logX–logY plot.
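The analysis steps described above can be sketched as follows. This is an illustrative outline rather than the analysis pipeline used in the study: it assumes that normalization means dividing by the maximum gray value of the image format (2^bits − 1), and the function names and readings are hypothetical.

```python
import numpy as np

def normalize(gray_values, bit_depth):
    """Scale raw gray values to [0, 1] by the maximum value of the image format."""
    return np.asarray(gray_values, dtype=float) / (2 ** bit_depth - 1)

def loglog_slope(concentrations_nm, signals):
    """Slope of a straight-line fit to log10(signal) versus log10(concentration)."""
    slope, _intercept = np.polyfit(np.log10(concentrations_nm), np.log10(signals), 1)
    return float(slope)

def lowest_concentration_above_sbr(concentrations_nm, signals, background_signal,
                                   threshold=2.0):
    """Lowest concentration whose vial-to-background ratio exceeds the threshold."""
    ratios = np.asarray(signals, dtype=float) / background_signal
    detected = np.asarray(concentrations_nm, dtype=float)[ratios > threshold]
    return float(detected.min()) if detected.size else None

# Hypothetical 16-bit readings that are exactly proportional to concentration.
concentrations_nm = [625.0, 1250.0, 2500.0, 5000.0, 10000.0]
raw_signals = [3750, 7500, 15000, 30000, 60000]
raw_background = 4000

signals = normalize(raw_signals, bit_depth=16)
background = normalize([raw_background], bit_depth=16)[0]

slope = loglog_slope(concentrations_nm, signals)
threshold_concentration = lowest_concentration_above_sbr(
    concentrations_nm, signals, background)
print(slope, threshold_concentration)
```

Because the hypothetical readings double with every twofold increase in concentration, the fitted slope is 1 up to floating-point error, and 2500 nM is the lowest concentration with an SBR > 2 against the background vial.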

As a reference, the performance of the two intraoperative FI systems was compared to that of the Pearl Impulse preclinical imager (LI-COR Biosciences, Lincoln, NE, USA). This system contains an ambient-light-free chamber and can serve as a benchmark for the maximal linearity and sensitivity achievable. Analysis of images obtained with the Pearl imaging system was done using the software suite provided with the device.

Fluorescence Image Analysis

We analyzed 271 images available from previous studies to evaluate the effect of ROI selection and background noise on SBR. We randomly selected a representative sample of intraoperative and ex vivo images from both animal and human studies in different tumor types using different fluorescent agents and imaging systems (Table 2).

Table 2 Specifications of the images used for fluorescence image analysis

On these images, we drew an ROI around the (histologically confirmed) tumor. To evaluate the effect of ROI selection, we drew two different ROIs of similar area in the background: the darkest region adjacent to the tumor ROI and the lightest region adjacent to the tumor ROI. Lastly, we drew an ROI using our preferred method, selecting the region surrounding the tumor ROI while remaining within the anatomical structure in which the tumor is present. This can be done in ImageJ by subtracting the tumor ROI from the overlapping background, using the ROI manager menu and selecting the “More” button followed by the “XOR” button. Figure 2 displays a representative example of ROI selection. Mean gray values and the standard deviation of the pixels within each ROI were assessed using ImageJ. To evaluate the effect of background noise, we applied both the SBR and CNR equations to the values obtained with ImageJ.
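The preferred background selection can be mimicked outside ImageJ with boolean masks. The sketch below is an assumption-laden illustration on a synthetic image, not the procedure used for the 271 images: the circular ROIs, the noise model, and the helper `disc_mask` are all hypothetical. The background mask is obtained as the exclusive-or of the surrounding region and the tumor ROI, analogous to the XOR operation in the ROI manager.

```python
import numpy as np

def disc_mask(shape, center, radius):
    """Boolean mask of a filled circle (hypothetical stand-in for a hand-drawn ROI)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2

# Synthetic image: noisy background with a brighter "tumor" in the middle.
rng = np.random.default_rng(0)
image = rng.normal(loc=50.0, scale=5.0, size=(64, 64))
tumor_roi = disc_mask(image.shape, center=(32, 32), radius=8)
image[tumor_roi] += 100.0

# Region surrounding the tumor; it fully contains the tumor ROI.
surrounding_roi = disc_mask(image.shape, center=(32, 32), radius=20)

# XOR removes the tumor pixels from the surrounding region, leaving a ring.
background_roi = np.logical_xor(surrounding_roi, tumor_roi)

tumor_mean = image[tumor_roi].mean()
background_mean = image[background_roi].mean()
background_sd = image[background_roi].std()

sbr = tumor_mean / background_mean
cnr = (tumor_mean - background_mean) / background_sd
print(f"SBR = {sbr:.2f}, CNR = {cnr:.2f}")
```

Because the background mask is derived from the tumor ROI rather than drawn freehand, the only manual choices left are the tumor outline and the extent of the surrounding region.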

Fig. 2. a The influence of background selection on CNR and TBR. b A schematic example of different background selections. c Intraoperative image of a fluorescent metastatic lymph node with different background selections.

Results

Imaging System Performance

By assessing the lowest concentration visible with an imaging system, the sensitivity of the system can be determined. Using the CalibrationDisk™, the lowest detectable concentration can easily be assessed for each agent and imaging system (Fig. 3). We found that, irrespective of the fluorescent agent used, system A is superior to system B in terms of sensitivity. The lowest detectable concentration is 1 nM with system A and 500 nM with system B. For comparison, the Pearl Impulse detects concentrations as low as 0.05 nM. Gain and exposure time settings influenced the sensitivity of the systems, with high settings leading to maximal sensitivity for both; nevertheless, these settings did not affect the relative difference between systems A and B.

Fig. 3. Fluorescence imaging of the CalibrationDisk™ containing seven vials with ascending concentrations of a fluorescent agent diluted in Intralipid 2 % and, at the 12 o’clock position, a background or “0-vial” containing Intralipid 2 % without a fluorescent agent.

Moreover, SBR values > 2 (compared to the background vial) could be achieved from the low concentration set (0.61 nM) using system A, while system B could only achieve an SBR > 2 at concentrations exceeding 312.5 nM. Thus, the in vitro performance of system A was superior to that of system B in terms of SBR.

Analysis of the linearity of imaging systems A and B reveals striking differences, with system A being superior and approaching the linearity of the Pearl Impulse. For the low and medium concentrations, the detection limit of system B was reached; the signals it measured therefore remain in the same range, resulting in a horizontal line on the log10 graph. For the high concentrations, however, system B does display a linear gradient similar to that of system A and the Pearl Impulse (Fig. 4).

Fig. 4. Analysis of the linearity of imaging systems A and B compared to the Pearl Imager. An optimal imaging system provides a doubling in signal strength for every twofold increase in concentration, resulting in a slope of 1 (linear fit with 45° angle in logX–logY plot).

Fluorescence Image Analysis

The method applied for ROI selection had a profound influence on both SBR and CNR. Figure 2 shows the influence of background selection on CNR and TBR. Obviously, selection of a darker background will increase the SBR. As Fig. 2 effectively displays, a sufficient SBR (> 2) can be achieved by adapting background ROI selection. In addition, the area size of the background ROI influences the CNR. As the selection of a small area as background results in a small standard deviation, CNR is higher when smaller ROIs are selected.

Discussion

The quantitative ability of an FI system will play a crucial role in the clinical adoption of intraoperative FI. Despite repeated calls from the FI community, guidelines or standards for quantification of the performance of imaging systems or the analysis of fluorescence images are still lacking [11, 13, 14]. As clinical trials are expanding, we regard the need for a performance test as highly urgent, and we therefore propose a simple and low-cost imaging system performance test that can be applied to every fluorescent agent and imaging system. We demonstrate how this test can be performed and how the data can be interpreted. In addition, we evaluated a representative sample of 271 fluorescence images. Based on the effect of ROI selection and background noise on SBR calculation, we also propose a routine procedure for quantification of fluorescence images.

Imaging System Performance

The sensitivity of two different imaging systems for intraoperative use was assessed. The goal of our experiment was not to quantify the performance of individual imaging systems, but to demonstrate how to compare different imaging systems and predict clinical performance in an experimental setting. Hence, we decided to select systems with distinctly different modes of operation and to anonymize both systems. For the imaging system assessment, we used the CalibrationDisk™ and tubes filled with descending concentrations of different fluorescent agents.

Various other types of phantoms are described in the literature. The use of solid polyurethane phantoms, with TiO2 particles mimicking scattering and quantum dots mimicking different concentrations of a fluorophore, is suggested by Zhu et al. [9]. A benefit of these solid phantoms is their longer shelf life, which allows repeated measurements over time. A disadvantage is that they are difficult to construct. More importantly, while quantum dots mimic the fluorescence of the agent, such a phantom does not contain the actual fluorescent tracer that will be used in humans and thus provides only a distant proxy of the crucial information. Others have suggested the use of more tissue-like phantoms made from gelatin [22, 23]. The fluorescent inclusions, used to mimic tumors, are prepared using a custom-made silicone mold, which is filled with an agarose mixture containing a relevant concentration of the fluorescent agent. Alternatively, hydroxyapatite (HA) crystals loaded with Pam78, a fluorescent derivative of the bisphosphonate pamidronate, calibrated against the relevant concentration of the fluorescent agent, can be used. Background tissue is made from gelatin to which various ingredients can be added to mimic absorption (hemoglobin or pink India ink), scattering (Intralipid or milk powder), and autofluorescence (ICG). The fluorescent inclusions can be incorporated at various depths in the gelatin base. Although these tissue-like phantoms come close to the clinical setting, manufacturing them is laborious and their shelf life is limited. For training purposes, these phantoms are probably superior, but for sensitivity and comparability testing of imaging systems most of their features seem superfluous.

The use of the CalibrationDisk™ allowed determination of the lowest concentration detectable and of the ability to quantify concentrations. These data can be used to compare imaging systems or to predict clinical performance and can consequently simplify the task of selecting the right system for a certain application. However, users of imaging systems should be aware that the in vitro data generated are a simplification of the in vivo reality. In vivo, optical tissue properties influence the ability to discriminate the signal from its background; sufficient knowledge and careful consideration of these limitations therefore remain of the utmost importance. Moreover, various other factors besides sensitivity may play a role in the selection process. Dsouza et al. suggest six key features for imaging systems [10]: (1) real-time overlay of white-light reflectance and fluorescence images; (2) fluorescence-mode operation with ambient room lighting present; (3) high sensitivity to tracer of interest; (4) ability to quantify fluorophores in situ; (5) ability to image multiple fluorophores simultaneously; and (6) maximized ergonomic use.

Although we focused on the sensitivity point, the other points are equally relevant when deciding on the optimal imaging system for a certain application. In addition, the costs should also be taken into account.

Fluorescence Image Analysis

The basic principle of FI is the excitation of fluorophores using a light source and the subsequent detection, by the imaging system, of photons emitted by the excited fluorophore. The detection of emitted photons is influenced by tissue optical properties such as absorption and scattering (including reflection). Absorption of photons depends on tissue-specific absorption properties; in humans, blood is the main absorber [24]. Scattering is the change of the direction of a photon in tissue. Scattering events can decrease signal strength and impair source localization, as occurs with fatty tissue. The effect of these phenomena increases when a photon has to travel through more tissue, and thus with greater tissue depth. Reflection at the surface of tissues causes diffusion of the signal and consequently reduced detection. Besides the targeted fluorophore, excitation can cause endogenous fluorophores within the tissue to fluoresce as well (i.e., autofluorescence). Noise is the sum of autofluorescence, scattering, and reflection events and can make it difficult to discern the actual fluorescence signal. An SBR > 2 is generally considered adequate to differentiate target from background [25]. Nevertheless, there are clinical trials in which SBR values below 2 are described as sufficient for intraoperative FI [26]. Despite its routine use in optical imaging and in nuclear medicine, the cut-off value of 2 seems to be based on marginal evidence, and the clinical relevance of this cut-off seems at least questionable. CNR is the ratio of the absolute difference between background and tumor signal to the standard deviation of the background. Rewriting the CNR formula shows that the CNR is strongly dependent on the SBR:

$$ \mathrm{CNR}=\left(\mathrm{SBR}-1\right)\frac{\mathrm{mean}\ \mathrm{signal}\ \mathrm{background}}{\mathrm{standard}\ \mathrm{deviation}\ \mathrm{background}} $$
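The intermediate step behind this rewriting follows directly from the definitions of SBR and CNR given in the Introduction: the mean background signal is factored out of the numerator, which leaves SBR − 1 in parentheses.

$$ \mathrm{CNR}=\frac{\mathrm{mean}\ \mathrm{signal}\ \mathrm{tumor}-\mathrm{mean}\ \mathrm{signal}\ \mathrm{background}}{\mathrm{standard}\ \mathrm{deviation}\ \mathrm{background}}=\left(\frac{\mathrm{mean}\ \mathrm{signal}\ \mathrm{tumor}}{\mathrm{mean}\ \mathrm{signal}\ \mathrm{background}}-1\right)\frac{\mathrm{mean}\ \mathrm{signal}\ \mathrm{background}}{\mathrm{standard}\ \mathrm{deviation}\ \mathrm{background}} $$

As a hypothetical worked example, if the standard deviation of the background is 10 % of the mean background signal, an SBR of 2 corresponds to a CNR of (2 − 1) × 10 = 10.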

The use of CNR is theoretically favorable over SBR to quantitate in vivo obtained (patient) imaging data, as it is a more comprehensive measure and thus provides extra information. Following the three-sigma rule, an empirical rule often used in descriptive statistics, a CNR of 3 or higher indicates that, for an approximately normally distributed background, only about 0.135 % of the background selection reaches the average tumor signal [27]. It could therefore be argued that the CNR has a more evidence-based cut-off value.
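The 0.135 % figure is the one-sided tail of a normal distribution beyond three standard deviations. The following minimal sketch reproduces it, assuming normally distributed background pixel values; the function name is hypothetical.

```python
import math

def upper_tail_probability(z):
    """P(X > mean + z * sd) for a normally distributed X."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p = upper_tail_probability(3.0)
print(f"{100 * p:.3f} %")  # 0.135 %
```

If the background pixel values deviate strongly from a normal distribution, this percentage no longer applies directly.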

However, this does not automatically mean that the CNR can be declared superior to SBR, as both quantitative measures are critically dependent on ROI selection. Background ROI selection has an important influence on the mean background signal (SBR and CNR) and its standard deviation (CNR), rendering CNR equally prone to selection bias as SBR. To increase reproducibility in FI research, the ROI selection process should be standardized and described in more detail in scientific articles. We suggest an ROI selection procedure that is representative and least prone to selection bias. However, as selection is done manually, bias cannot be excluded completely. The gold standard remains the performance of biodistribution studies describing the percentage of the dose per gram of tumor and background tissue.

Conclusion

In conclusion, assessing the sensitivity of an imaging system in terms of its detection limit is easy and yields relevant data. These data can be used to compare imaging systems or to predict clinical performance, which allows selecting the right system for a certain application. Importantly, other factors, including costs and ergonomics, should also be taken into account. Although CNR is partly a result of SBR, it does provide additional information regarding the noise and allows a more scientific basis for a cut-off value. Irrespective of whether the SBR or the CNR formula is used for contrast quantification, selection of a representative background and tumor ROI is of the utmost importance. Manual ROI selection without a clearly defined procedure allows bias, potentially misrepresents CNR and SBR results, and has low reproducibility. In general, researchers and clinicians should be aware of these possibilities and limitations of FI systems and image quantification.