Introduction

Central Serous Chorioretinopathy (CSCR) is a retinal disorder prevalent in the adult population, defined by fluid accumulation in the subretinal space originating from the choroid layer of the eye [1,2,3]. This build-up can decrease visual acuity (VA), causing significant discomfort and disruption to the daily lives of affected individuals. Therefore, accurate diagnosis and effective monitoring of CSCR are of paramount importance in mitigating visual impairment and improving patient outcomes. In this context, the development of an efficient, precise, and automated system capable of segmenting fluid regions in ocular scans represents a crucial breakthrough for diagnosing and managing CSCR.

Over the past years, Photodynamic Therapy (PDT) has emerged as a treatment of choice for CSCR due to its proven safety and efficacy [4,5,6]. PDT operates by selectively activating a photosensitizer, causing the creation of reactive oxygen species that target and destroy abnormal choroidal vasculature. This approach is highly compatible with the pathophysiology of CSCR, aiding in effective subretinal fluid (SRF) reduction and encouraging the restoration of retinal function.

The adoption of PDT is increasingly widespread, backed by a solid body of clinical evidence attesting to its effectiveness and safety. Scientific studies reveal a significant reduction in SRF following PDT administration, with associated improvements in VA [4]. However, the effectiveness of PDT is not uniformly seen across all patients, underscoring the need for an individualized treatment approach. Here, the segmentation of fluid regions in retinal images plays a crucial role in guiding the treatment plans.

Parallel to advancements in treatment options, Optical Coherence Tomography (OCT), a non-invasive imaging technique, has shown tremendous value for diagnosing CSCR [7,8,9]. OCT generates high-resolution images that enable the identification and quantification of SRF, a key indicator of CSCR. However, the evaluation of fluid regions in OCT scans currently relies heavily on manual intervention. This process is not only resource-intensive and time-consuming but also susceptible to variability due to its dependence on the expertise of the individual conducting the segmentation.

This study aims to circumvent these limitations by proposing an automated system for 3D fluid region segmentation in OCT scans of CSCR patients, leveraging the power of deep learning (DL). We envision this system drastically reducing segmentation time and effort, substantially improving efficiency, and enabling the analysis of large datasets. The system is also expected to produce standardized and consistent results, minimizing potential errors and variability. Consequently, it would significantly contribute to the accuracy in diagnosing CSCR, monitoring its progression, and assessing responses to treatments like PDT [10, 11].

Our study distinguishes itself through a keen focus on harnessing the volumetric, 3D information inherent in OCT scans. OCT scans capture an extensive range of data, providing a 3D perspective of the retina that is essential for a comprehensive understanding of the disease state. We hypothesize that incorporating this 3D context into our DL model will facilitate the extraction of nuanced features and comprehensive characterization of the fluid regions. Consequently, the derived insights may shed light on the evolution of these regions over time, thus offering an invaluable perspective on the progression of the disease and the effectiveness of treatments.

In addition to automating the segmentation process, this study extends its scope to predict the response of chronic CSCR patients to PDT based on the analysis of pre-treatment OCT scans. We endeavor to unearth patterns and correlations that may act as dependable indicators of treatment efficacy. The predictive power of this system has the potential to significantly enhance patient care. By foreseeing treatment responses, clinicians can make informed modifications to treatment plans, thereby optimizing patient outcomes. This proactive approach aids in personalizing therapeutic strategies and enables the efficient allocation of medical resources.

Furthermore, this system could prove to be an indispensable resource for researchers. It could facilitate large-scale, systematic studies into the impact of PDT on chronic CSCR, thus enhancing our comprehension of the disease and its treatments. Such investigations, otherwise hindered due to the extensive manual efforts required in data analysis, could potentially lead to the discovery of novel treatment methodologies and improved management strategies.

Related works

In this subsection, we explore the existing body of literature pertinent to the two primary objectives of this work: fluid segmentation in CSCR and PDT response analysis. This overview will shed light on the methodologies and techniques previously employed in these areas, along with their successes and limitations, thereby establishing the context and necessity for the approach proposed in this study.

CSCR fluid segmentation

The automation of CSCR diagnosis using Computer-Aided Diagnosis (CAD) systems can notably enhance efficiency and accuracy, minimizing the errors that arise from subjective expert evaluations. A number of studies have delved into the use of CAD for CSCR diagnosis. For instance, the work by Chen et al. [12] proposed an attention-gated network DL model for the automatic detection of CSCR leakage points in fundus fluorescein angiography, showcasing the potential of DL models in this realm. Similarly, the research by Xu et al. [13] designed a DL-based framework for screening SRF from fundus images, employing a cascading approach with two Convolutional Neural Network (CNN) models. Yoo et al. [14] leveraged a CNN architecture for efficient SRF area segmentation in fundus photography, further validating the utility of DL in CSCR characterization.

In parallel, OCT has garnered attention for diagnosing retinal diseases, primarily due to its prowess in pathological fluid region detection. By harnessing high-resolution cross-sectional OCT images and automatic segmentation techniques, researchers have made substantial progress in analyzing various retinal diseases, including diabetic macular edema, glaucoma, and age-related macular degeneration [15,16,17,18,19].

Several recent studies have utilized OCT to automate CSCR analysis. Gao et al. [20] applied an area-constraint fully-convolutional network for the automatic segmentation of CSCR regions in OCT images, yielding results on par with manual segmentation, following independent layer segmentation as well as quantitative and qualitative evaluations. Rao et al. [21] improved the segmentation of regions using DL-based architectures coupled with a data pre-processing stage. De Moura et al. [22] proposed an end-to-end methodology employing a fully convolutional architecture to identify and segment intraretinal fluid regions associated with CSCR in OCT scans.

These studies bear witness to the strides made in automating the analysis of CSCR using OCT imaging. However, a closer examination reveals that the full potential of OCT scans, particularly their 3D volumetric information, remains underexplored in the context of fluid segmentation in CSCR. Furthermore, while these methods have proven effective, their efficiency and accuracy can still be improved. These uncharted territories provide the impetus for the current study, prompting us to propose a comprehensive DL-based approach for 3D fluid segmentation in CSCR.

Response to PDT

In clinical practice, the response to PDT varies considerably, and the prediction of which patients will experience a favorable treatment response with complete resorption of SRF remains a challenging task. The resolution of SRF with half-fluence PDT has been reported to range from 67% to 97% across different series [4, 6, 23, 24]. Several clinical factors, such as advanced age, low baseline VA or the extent of retinal pigment epithelium (RPE) damage, have been associated with a less favorable response to PDT [25]. However, there are few studies that quantify the predictive value of these characteristics. Furthermore, the specific baseline anatomical features observable in OCT scans that might determine a patient’s response to PDT remain largely unexplored.

Within the realm of ophthalmology, the applications of artificial intelligence (AI) have primarily been concentrated on improving image analysis and predicting clinical outcomes [26,27,28]. Despite the demonstrated success of DL in accurately identifying CSCR using fundus images and distinguishing between its acute and chronic forms through imaging analysis [29, 30], studies investigating the potential utility of AI in the analysis of CSCR using OCT have been limited. Recently, an AI-based study conducted by Xu et al. [31] exhibited that their DL and machine learning-based algorithms can predict VA and post-therapeutic OCT images in patients with CSCR. Fernández-Vigo et al. [32] in a recent publication presented a study that focused on predicting the response to PDT in CSCR patients by leveraging DL with spectral-domain optical coherence tomography (SD-OCT) images.

Nevertheless, a critical review of these studies reveals that they primarily rely on 2D image analyses. Therefore, a comprehensive 3D analysis that fully utilizes the volumetric information embedded within OCT scans to predict PDT response is still largely uncharted. Also, the application of DL to extrapolate treatment response from pre-treatment images, thereby providing an a priori estimate of treatment success, remains an area requiring further exploration. These gaps in current knowledge provide the motivation for the present study, urging us to propose a 3D DL approach for predicting PDT response in CSCR patients.

Materials

Datasets

In this study, we employed two custom datasets specifically designed for our research. The Ethics Committee of the Hospital Clínico San Carlos in Madrid (HCSC) approved the protocol of this study. The datasets for each task are detailed below.

3D CSCR Fluid Segmentation Dataset

The dataset used for the segmentation task comprised 2769 OCT images collected before PDT treatment from 124 patients with persistent (SRF lasting more than 6 months) and complex (total area of RPE alteration greater than 2 disc areas) CSCR. Each image was paired with corresponding fluid labels, expertly annotated by an ophthalmologist with more than 10 years of experience. The data were organized into 124 3D volumes, each representing a distinct patient. The careful composition of this dataset ensured a diverse and representative assortment of OCT scans, comprising images from patients aged 35 to 78 years (mean age 55.2 ± 8.2), both male (80.8%) and female (19.2%), with right-eye images from 54.8% of the patients and left-eye images from 45.2%, thereby enabling our model to effectively learn and generalize the task of fluid region segmentation.

Prediction of the Response to PDT Dataset

The dataset used for the PDT response prediction task comprised 216 volumes from 216 chronic CSCR patients treated with half-fluence PDT between January 2017 and December 2020 at the Hospital Clínico San Carlos, Madrid. Inclusion criteria were patients over 18 years old with diagnosed chronic CSCR satisfying the major and minor criteria recently described for this disease [33], all cases being persistent and complex, and eligible for half-fluence PDT. Patients with other retinal pathologies, suboptimal-quality OCT images, severe retinal damage, or large photoreceptor atrophy were excluded. Patients were evaluated by experts based on the resolution of SRF 3 months after PDT treatment. This time point was chosen because it is when the response to PDT is assessed in clinical practice, given the progressive effect of this therapy, which shows a higher rate of SRF resolution at 3 months than at 1 month; indeed, after PDT it is frequent to observe an early worsening of the SRF related to the inflammation produced by the treatment [34]. The OCT software's tools were used to measure and compare SRF before and 3 months after treatment, leading to the classification of patients into three groups:

  • Group 1: 100 patients who exhibited complete resolution of SRF. Figure 1(a) illustrates the complete resorption of the SRF after treatment.

  • Group 2: 66 patients who exhibited partial resolution of SRF, signified by at least a 15% decrease in baseline SRF height. As shown in Fig. 1(b), while the treatment led to a degree of SRF resorption, it was not fully resolved.

  • Group 3: 50 patients who did not show any SRF resorption, where the decrease in baseline SRF height was less than 15%. Figure 1(c) demonstrates that despite the treatment, the SRF did not experience any significant reduction.
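The grouping rule above can be sketched as a small helper function (the function name and the exact handling of complete resolution are illustrative assumptions; the study applies the 15% height-decrease threshold described in the list):

```python
def classify_pdt_response(baseline_srf_height, followup_srf_height):
    """Assign a PDT response group from SRF height before and 3 months
    after treatment, following the study's 15% decrease threshold.
    Group 1: complete resolution; Group 2: partial resolution;
    Group 3: no significant resorption."""
    if followup_srf_height == 0:
        return 1  # complete resolution of SRF
    decrease = (baseline_srf_height - followup_srf_height) / baseline_srf_height
    if decrease >= 0.15:
        return 2  # at least a 15% decrease in baseline SRF height
    return 3      # less than a 15% decrease

# Example: a patient whose SRF height drops from 100 to 80 units (a 20%
# decrease) would be assigned to Group 2.
group = classify_pdt_response(100, 80)  # → 2
```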

Fig. 1
figure 1

Illustrative example of pre- (1st column) and post- (2nd column) treatment images for a patient from each group. The presented images vividly portray the various degrees of SRF resolution following PDT treatment

SD-OCT images were obtained using the Spectralis system (Heidelberg Engineering, Germany) both before and 3 months after PDT treatment. Macular cube images covering a 6 mm × 6 mm area were extracted from the OCT scans. The volume scan employed a “fast volume” preset, capturing 25 sections separated by 240 \(\mu m\). Each section was averaged 9 times using automatic real-time (ART) technology, resulting in a lateral resolution of 512 pixels and an axial resolution of 496 pixels. Regarding patient information, ages ranged from 18 to 86 years, with a mean age of 55.6 ± 10.8 years. As for gender distribution, 70.6% were male and 29.4% female. Representative samples of pre- and post-treatment images from each group, exhibiting typical manifestations of CSCR, can be viewed in Fig. 1.

Software and Hardware

In this research, Python (version 3.8.10) served as our primary programming language, due to its widespread use in scientific computing and the availability of numerous auxiliary libraries. For the segmentation task, we employed the nnU-Net library [35], which is well-regarded for its robust capabilities with 2D and 3D images. It is particularly suited to medical imaging tasks, given its ability to handle voxel spacings, anisotropies, and class imbalances.

To read and write 3D image data, we used the NiBabel library (version 4.0.2) [36], a Python package specifically designed for working with neuroimaging data. This tool was instrumental in efficiently handling our OCT scans.

The MONAI (Medical Open Network for AI) framework [37] was used for training and validation of the classification model in predicting PDT responses. This PyTorch-based platform is tailored for medical imaging applications, providing the necessary tools for developing our predictive model.

Performance evaluation of our classification model was facilitated by the Scikit-learn library [38], which offers robust methods for model evaluation, thereby helping us objectively assess our model’s performance.

Finally, all our models were trained, validated, and tested on a computer powered by an AMD Ryzen Threadripper 3960X 24-Core Processor and equipped with an NVIDIA® RTX A6000 GPU. This setup allowed us to efficiently conduct our computational tasks and run our experiments smoothly.

Methodology

In this section, we provide a detailed description of the methodology employed in this study. As shown in Fig. 2, the proposed system is designed to take a 3D image and perform the CSCR fluid segmentation, followed by the analysis of the PDT response through three different strategies. The following subsections provide more detailed information on the different components of the methodology, including the network architecture used for CSCR fluid segmentation, as well as the PDT response analysis approach.

Fig. 2
figure 2

General scheme of the proposed methodology

3D CSCR Fluid Segmentation

The accurate segmentation of the CSCR fluid is essential for the automated analysis of OCT images. To achieve this, we employed three 3D configurations and one 2D configuration of the nnU-Net architecture, comparing their performance to determine which configuration produces better results. Figure 3 shows an overview of the 3D CSCR fluid segmentation pipeline: the input is a 3D OCT image, and the output is a 3D image with the CSCR fluid region segmented. nnU-Net was selected as the reference architecture for semantic segmentation tasks in medical images due to its ability to accurately segment complex structures; such architectures have been widely used in medical image segmentation tasks and have demonstrated the ability to accurately identify complex structures and objects within images [35]. By evaluating these configurations in our study, we aimed to determine the optimal approach for fluid segmentation in OCT images.

Fig. 3
figure 3

3D CSCR fluid segmentation pipeline

Network Architecture

In this work, we use nnU-Net, an advanced DL architecture specifically designed for medical image segmentation tasks. It extends the original U-Net, following an encoder-decoder structure built from 3D U-Net modules as its core building blocks. These modules leverage 3D convolutions, max-pooling, and upsampling operations to extract high-level features from the input and generate accurate segmentation maps. nnU-Net offers four different configurations:

  • 2D: processes 2D slices of medical images independently, making it computationally efficient but lacking the consideration of 3D spatial context between slices.

  • 3D full resolution: takes advantage of the entire 3D volume, preserving spatial relationships between slices for better accuracy, but it is computationally demanding and memory-intensive.

  • 3D low resolution: downsamples the 3D volume, striking a balance between efficiency and accuracy, while still retaining some spatial context.

  • 3D full resolution cascade: employs the segmentation results of the 3D low resolution configuration, upsampled to the original voxel spacing and passed as additional (one-hot encoded) input channels to a 3D full resolution network, which is trained on patches at full resolution.

Training details

Regarding the training stage, we first randomly divided the OCT image dataset into two subsets: 80% of cases for training and validation following a 5-fold cross-validation strategy, and the remaining 20% for testing. Within each cross-validation fold, 80% of the cases were used for training and 20% for validation. In this way, we obtain a more reliable and robust measure of the behaviour of the trained models. All images from one patient are exclusively assigned to one data set, avoiding any splits between the training, validation, and testing sets; this ensures the independence of the data sets and prevents any information leakage between them. The network parameters (weights and biases) are adjusted using the stochastic gradient descent (SGD) optimization method with a Nesterov momentum of 0.99 and an initial learning rate of 0.01. The learning rate decays according to the ’poly’ learning rate policy, \((1- \frac{epoch}{epoch_{max}})^{0.9}\). The training is conducted using a combination of Dice and Cross-Entropy as the loss function [39,40,41].
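The ’poly’ learning rate policy can be expressed as a plain function (a minimal sketch; nnU-Net applies this schedule internally, and the function name is ours):

```python
def poly_lr(epoch, max_epochs, initial_lr=0.01, exponent=0.9):
    """'Poly' learning-rate policy:
    lr = initial_lr * (1 - epoch / max_epochs) ** exponent.
    The rate starts at initial_lr and decays smoothly toward zero."""
    return initial_lr * (1 - epoch / max_epochs) ** exponent

lr_start = poly_lr(0, 1000)    # 0.01 at the first epoch
lr_mid = poly_lr(500, 1000)    # roughly half-decayed
lr_end = poly_lr(999, 1000)    # close to zero near the final epoch
```

Because the exponent is below 1, the decay is slightly slower than linear early in training and steeper near the end.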

Data Augmentation

In recent years, DL has demonstrated very good performance on a variety of problems. However, obtaining robust and consistent results requires training models on a considerable amount of data, which helps mitigate the well-known problem of over-fitting [42]. Therefore, in this work we apply a data augmentation strategy to increase the effective size of the image dataset. Specifically, we apply the following augmentation techniques on the fly during training, only on the training subset: random rotations, random scaling, random elastic deformations, gamma correction, and mirroring.
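Two of these on-the-fly techniques, mirroring and gamma correction, can be sketched with NumPy alone (an illustrative sketch with assumed probabilities and gamma range; the actual pipeline is handled by nnU-Net):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(volume, rng):
    """On-the-fly augmentation sketch for a normalized volume in [0, 1]:
    random mirroring and random gamma correction. Rotations, scaling, and
    elastic deformations are omitted for brevity."""
    if rng.random() < 0.5:                        # random mirroring
        volume = np.flip(volume, axis=rng.integers(volume.ndim))
    if rng.random() < 0.3:                        # gamma correction
        gamma = rng.uniform(0.7, 1.5)
        volume = volume ** gamma
    return volume

# A downsized stand-in for a 3D OCT volume (depth x height x width).
vol = rng.random((25, 64, 64))
aug = augment(vol, rng)
```

Applying such transforms lazily at training time means each epoch sees a slightly different version of every volume without inflating the stored dataset.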

Evaluation

We use Precision, Recall, Accuracy, Jaccard index, and Dice coefficient, the statistical metrics most commonly used in the state of the art [22, 43], to quantitatively evaluate and validate the CSCR fluid segmentation produced by our method. These metrics compare the predicted segmentation with the ground-truth labels.

In addition to these metrics, we conducted repeated measures ANOVA analysis to further examine the performance of our segmentation method across different models. Furthermore, to provide deeper insights into the differences between segmentation results across different models, we performed a Tukey post hoc test. This test enabled us to identify specific pairwise differences in segmentation outcomes and determine statistically significant variations between groups. By employing repeated measures ANOVA analysis and Tukey post hoc test in conjunction with traditional evaluation metrics, we ensure a comprehensive assessment of the effectiveness and reliability of our method in the context of CSCR fluid segmentation.
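All five segmentation metrics follow directly from the voxel-wise confusion counts, as in this sketch (the function name is ours; in practice these values come from libraries such as Scikit-learn):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Voxel-wise metrics comparing a binary prediction to ground truth."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true positives
    fp = np.logical_and(pred, ~gt).sum()   # false positives
    fn = np.logical_and(~pred, gt).sum()   # false negatives
    tn = np.logical_and(~pred, ~gt).sum()  # true negatives
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "jaccard": tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }

# Toy example: one TP, one FP, one FN, one TN.
m = segmentation_metrics(np.array([1, 1, 0, 0]), np.array([1, 0, 1, 0]))
```

Note that Dice and Jaccard are monotonically related (Dice = 2J / (1 + J)), which is why the two tend to rank models identically.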

Prediction of the Response to PDT

Predicting a patient’s response to treatment before initiation is vital for effective management and recovery acceleration. For this study, we formulated three unique scenarios to extensively analyze the prediction of response to PDT using OCT images. Each scenario targets a different, clinically relevant condition. Detailed descriptions of these strategies are as follows:

  • Strategy 1: (Group 1) vs (Group 2) vs (Group 3). This first strategy examines the prediction of three distinct PDT responses in CSCR patients with persistent SRF: complete resorption, partial resorption, and the absence of any resorption. This comprehensive analysis evaluates the differentiability between all considered clinical scenarios.

  • Strategy 2: (Group 1) vs (Group 2 + Group 3). The second strategy adopts a predictive approach to assess the distinguishability between positive and negative PDT responses. To do this, we combine the responses of partial resorption and no resorption into a single class for analysis.

  • Strategy 3: (Group 2) vs (Group 3). The final strategy formulates a computational approach to determine the degree of separability between classes, focusing specifically on cases where the predicted response to PDT is negative.

These strategies aim to provide a comprehensive exploration and assessment of PDT response prediction in OCT images, considering various clinically significant scenarios.

3D Computational Approaches for Prediction

In our pursuit of optimal classification performance, we have incorporated eight distinct computational approaches in this study. The diversity in the input data used by these approaches allows for a comprehensive exploration of their predictive capabilities with respect to our models. Each approach is outlined as follows:

  • 3D volumes: This approach employs a 3D DenseNet121 classifier, using 3D volumes as input to predict the treatment response group. See Fig. 4 for an overview of this method.

  • 3D fluid regions: This strategy, as shown in Fig. 5, utilizes the outputs of the 3D fluid segmentation method as inputs to a 3D DenseNet121 classifier.

  • 3D concatenated: This approach, visualized in Fig. 6, employs a 3D DenseNet121 classifier that uses concatenated 3D volumes and 3D fluid regions as input, predicting the corresponding group.

  • 3D merged: As depicted in Fig. 7, this method uses a 3D DenseNet121 classifier, taking an intercalated combination of 3D volumes and 3D fluid regions as input.

  • 3D features fusion: This strategy, outlined in Fig. 8, uses the previously trained classifiers to extract features from 3D volumes and 3D fluid regions. These features are then concatenated, and an SVM classifier is used for the final prediction.

  • 3D features fusion + 2D Biomarkers: For this approach, we extract the 2D biomarkers and concatenate them with the features obtained in the “3D features fusion” strategy. For the final prediction, we use an SVM classifier.

  • 3D features fusion + 3D Biomarkers: In this approach, we follow the same procedure as the previous one, but this time we replace the 2D biomarkers with their 3D counterparts in the concatenation process.

  • 3D features fusion + 2D Biomarkers + 3D Biomarkers: This strategy concatenates the features extracted in the “3D features fusion” approach with the 2D biomarkers calculated in “3D features fusion + 2D Biomarkers” and the 3D biomarkers obtained in “3D features fusion + 3D Biomarkers”. The final prediction is made using an SVM classifier.

Each approach provides a unique perspective, enhancing our understanding of how varying forms of input data can impact the prediction of treatment response in OCT images.
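The difference between the “3D concatenated” and “3D merged” inputs can be illustrated with array operations (a sketch on downsized random volumes; reading “intercalated combination” as slice interleaving is our assumption, and the paper's exact scheme may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
# Downsized stand-ins for a raw OCT volume and its segmented fluid regions.
volume = rng.random((25, 64, 64))
fluid = rng.random((25, 64, 64))

# "3D concatenated": stack the two volumes as separate input channels.
concatenated = np.stack([volume, fluid], axis=0)  # shape (2, 25, 64, 64)

# "3D merged": interleave slices of the two volumes along the depth axis,
# so the classifier sees alternating raw and fluid-region slices.
merged = np.empty((2 * volume.shape[0],) + volume.shape[1:])
merged[0::2] = volume
merged[1::2] = fluid                              # shape (50, 64, 64)
```

In the concatenated form the classifier receives a two-channel volume, whereas the merged form keeps a single channel but doubles the depth.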

Fig. 4
figure 4

3D volumes method pipeline

Fig. 5
figure 5

3D fluid regions method pipeline

Fig. 6
figure 6

3D concatenated method pipeline

Fig. 7
figure 7

3D merged method pipeline

Fig. 8
figure 8

3D features fusion method pipeline

Network architecture

DenseNet is a deep CNN architecture which has garnered popularity in the realm of medical image classification tasks, attributable to its unique design and exceptional performance. Introduced by Huang et al. in 2017 [44], it incorporates dense connections between layers, enabling each layer to receive the feature-maps of all preceding layers. This connectivity pattern promotes feature reuse, alleviates the vanishing gradient problem, and enhances the flow of information throughout the network. Specifically within the context of medical image classification, the dense connections of DenseNet facilitate efficient propagation of crucial features, empowering the network to discern intricate patterns and complex structures intrinsic to the images. Additionally, the compact and efficient design of DenseNet requires fewer parameters, making it an apt choice for medical image analysis tasks where data might be limited or imbalanced.

In this study, we have opted to utilize the 3D DenseNet-121 variant of DenseNet in four of our approaches (3D volumes, 3D fluid regions, 3D concatenated, and 3D merged) for its proven efficiency and strong performance in handling 3D volumetric data, essential for the analysis and prediction tasks in our study.

Fig. 9
figure 9

Evolution of the CSCR fluid segmentation trained models during the 5 folds in terms of mean ± standard deviation. a Train & b validation Accuracy, and c train & d validation Loss

3D deep features extraction and classification

In the computational approach dubbed “3D features fusion”, we employed 3D feature extraction followed by SVM classification [45], using our previously trained 3D DenseNet-121 models. These models facilitated the extraction of deep features, which were then concatenated to fit and evaluate the SVM classifier.

The features were extracted from the output of the penultimate layer of the model, encapsulating the semantic information of the input images, hence providing more meaningful and compact representations compared to the raw pixel values. Two sets of 1,024 features were specifically obtained: one derived from the 3D volume and the other from the 3D CSCR fluid region. These two sets of features were concatenated post extraction, forming a comprehensive 2,048-feature vector.

Finally, this vector was fitted to an SVM classifier and its performance was evaluated. The choice of SVM classifier was motivated by its demonstrated effectiveness in medical imaging classification tasks [46,47,48,49].
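The fusion step reduces to concatenating the two per-patient feature sets and fitting an SVM, as in this sketch (random arrays stand in for the deep features extracted from the penultimate DenseNet-121 layers; labels and sample count are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical deep features: one 1,024-vector per patient from each trained
# DenseNet-121 (volume model and fluid-region model), here random placeholders.
n_patients = 40
volume_feats = rng.standard_normal((n_patients, 1024))
fluid_feats = rng.standard_normal((n_patients, 1024))
labels = rng.integers(0, 3, n_patients)  # three response groups, 0-indexed

# Concatenate into the 2,048-feature vector described above and fit an SVM.
fused = np.concatenate([volume_feats, fluid_feats], axis=1)
clf = SVC(kernel="rbf").fit(fused, labels)
preds = clf.predict(fused)
```

In the biomarker variants, the 2D and/or 3D biomarker values would simply be appended as extra columns of `fused` before fitting.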

Biomarkers Extraction

We obtained five different biomarkers from the 3D image, referred to as 3D Biomarkers. Additionally, we derived the same type of measurements from the 2D projection of the 3D image, dubbed as 2D Biomarkers. These five measurements were extracted for both 2D and 3D image modalities:

  • Area: The area of the CSCR fluid region in pixels.

  • Area of the bounding box: The area of the bounding box that encloses the CSCR fluid region in pixels.

  • Length of the major axis: The length of the major axis of the ellipse that has the same normalized second central moments as the CSCR fluid region.

  • Length of the minor axis: The length of the minor axis of the ellipse with the same normalized second central moments as the CSCR fluid region.

  • Solidity: The ratio of pixels in the CSCR fluid region to pixels of the smallest convex polygon that encloses the CSCR fluid region.
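Two of these biomarkers can be computed directly from a binary mask with NumPy, as in this sketch (the function name is ours; the axis lengths and solidity would typically come from a tool such as skimage.measure.regionprops):

```python
import numpy as np

def region_biomarkers(mask):
    """Area and bounding-box area (in pixels) of the foreground region
    of a binary 2D mask."""
    ys, xs = np.nonzero(mask)
    area = mask.sum()
    bbox_area = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return {"area": int(area), "bbox_area": int(bbox_area)}

mask = np.zeros((10, 10), dtype=int)
mask[2:5, 3:8] = 1                   # a 3 x 5 rectangular "fluid region"
bio = region_biomarkers(mask)        # {'area': 15, 'bbox_area': 15}
```

For a non-rectangular region the bounding-box area exceeds the area, so their ratio gives a crude shape cue, analogous to (though distinct from) solidity.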

Training details

Our initial step involved randomly dividing the dataset according to a 5-fold cross-validation scheme, wherein for each fold 80% of the cases were used for training and the remaining 20% for validation. It is important to note that the same proportion of samples from each group as in the complete dataset was maintained. The network parameters (weights and biases) were adjusted with the Adam optimization method [50], with an initial learning rate of 0.001. The training used Cross-Entropy as the loss function, with class weights adapted to address class imbalance. Using loss weights in this way balances training by assigning higher weights to the minority classes, leading to improved model performance and reduced bias; it stabilizes training, preserves information in severely imbalanced datasets, and allows control over the influence of each class on the overall loss function.
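One common way to derive such class weights is inverse-frequency weighting (the paper does not state its exact formula, so this scheme is an assumption):

```python
def class_weights(counts):
    """Inverse-frequency class weights for a weighted cross-entropy loss:
    w_c = N / (K * n_c), where N is the total sample count, K the number
    of classes, and n_c the count of class c. Under-represented classes
    receive higher weights."""
    n, k = sum(counts), len(counts)
    return [n / (k * c) for c in counts]

# Group sizes from the PDT dataset: 100 / 66 / 50 patients.
weights = class_weights([100, 66, 50])  # → [0.72, ~1.09, 1.44]
```

Passed to a weighted cross-entropy (e.g. the `weight` argument of PyTorch's `CrossEntropyLoss`), these values make errors on the smallest group (Group 3) cost twice as much as errors on the largest (Group 1).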

Evaluation

In assessing the effectiveness of our models in predicting PDT responses, we relied on established benchmarks in the realm of classification tasks: precision, recall, F1-score, and accuracy metrics. These metrics are ubiquitously employed by state-of-the-art models when tackling similar problems. We used repeated measures ANOVA analysis and Tukey post hoc test alongside traditional evaluation metrics to thoroughly assess the statistical differences between the test results obtained by the different classification approaches. These statistical methods helped identify significant differences in classification outcomes between approaches, ensuring a comprehensive evaluation of the effectiveness and reliability of our method in the prediction of the Response to PDT.

Results and Discussion

This section covers our experimental setup and results designed to evaluate the proposed methodology. We have divided this section into three parts. First, we evaluate the 3D CSCR fluid segmentation process, and then we present the results of the PDT response analysis. Finally, we perform a comparison with existing literature.

Evaluation of the 3D CSCR Fluid Segmentation

In our initial experiment, we focused on evaluating the effectiveness of our proposed method for CSCR fluid segmentation. Additionally, we compared the performance of the nnU-Net architecture using four different configurations covering both 2D and 3D segmentation. To conduct a robust assessment, we employed 5-fold cross-validation, ensuring reliable results. The progression during the training and validation stages is depicted in Fig. 9. The outcomes reveal that, on the whole, all 3D configurations exhibit satisfactory CSCR fluid segmentation in OCT images. Notably, these configurations achieve stability from epoch 950, as evidenced by consistent mean and standard deviation values. However, the 2D nnU-Net approach demonstrated an interesting pattern: while it initially attains high training accuracy in the early epochs, it subsequently overfits. As the model grows increasingly complex, it starts to memorize noise and specific features present in the training data, losing its generalization capability.

Table 1 presents the test results obtained from 5-fold cross-validation, represented by mean values and their corresponding standard deviations. The evaluation aims to assess the performance of our proposed CSCR fluid segmentation system using different configurations. The results indicate that our proposed system achieves satisfactory performance for the full-resolution configurations on the test subsets. Specifically, the 3D nnU-Net with full resolution achieves the best results, demonstrating robustness in fluid segmentation. The average Precision is \(0.8267\pm 0.2564\), the average Recall is \(0.7335\pm 0.2283\), the average Accuracy is \(0.9975\pm 0.0027\), the average Jaccard Index is \(0.6435\pm 0.2534\), and the average Dice Coefficient is \(0.7448\pm 0.2494\). In contrast, the 2D nnU-Net and 3D low-resolution configurations show lower performance metrics, likely due to the loss of 3D context and the loss of information during resolution reduction, respectively. These observations underscore the importance of leveraging the full resolution and 3D information for improved segmentation results in OCT images. We performed a repeated-measures ANOVA to test for statistically significant differences among the four architectures, finding significant differences at \(\alpha = 0.05\) for all metrics, with \(p < 0.0001\) in every case except Precision (\(p = 0.0311\)). We also performed Tukey's post hoc test, whose results can be found in Table 2. Figure 10 provides an illustrative example of the resulting segmented CSCR region overlaid on the input image.
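For reference, the Dice coefficient and Jaccard index reported above are overlap measures between a predicted and a ground-truth binary segmentation volume. A minimal sketch (illustrative only, not our evaluation code; the small epsilon guards against empty masks):

```python
import numpy as np

def dice_jaccard(pred, target, eps=1e-8):
    """Dice coefficient and Jaccard index between two binary segmentations."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()   # |A ∩ B|
    union = np.logical_or(pred, target).sum()    # |A ∪ B|
    dice = 2.0 * inter / (pred.sum() + target.sum() + eps)
    jaccard = inter / (union + eps)
    return dice, jaccard
```

The two metrics are monotonically related (Dice \(= 2J/(1+J)\)), which is why they rank the configurations in Table 1 consistently.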

Table 1 Mean ± standard deviation of CSCR fluid segmentation test results for the 5 folds
Table 2 Tukey post hoc test results for pairwise comparison between the test results of the segmentation models. Comparisons with no statistically significant difference (\(p \ge 0.05\)) are denoted by \(\approx\), while \(\Uparrow\) denotes a statistically significant difference (\(p < 0.05\))

Evaluation of the Prediction of the Response to PDT

This subsection focuses on evaluating the predictive performance of the different approaches in determining the response to PDT from OCT images.

Fig. 10 Representation of the segmented CSCR fluid region over the corresponding input image

Predictive Analysis of Group 1 vs. Group 2 vs. Group 3

In this first strategy, we analyzed 216 cases: 100 belonging to group 1, 66 to group 2, and 50 to group 3. Table 3 shows the results of the different experiments. The best results were obtained using 3D feature fusion as input, achieving a mean accuracy of \(0.6438\pm 0.0617\) and a group 1 precision of \(0.7507\pm 0.0442\). Regarding the other methods, we observe that 3D merged performs worse than 3D volumes, indicating that this data fusion is not appropriate for this task. We performed a repeated-measures ANOVA with \(\alpha = 0.05\) and found significant differences between the results obtained by the different approaches (\(p = 0.0078\)) for all metrics except group 1 recall (\(p = 0.5706\)), for which no significant differences were found. These findings suggest notable variations in outcomes across the different classification approaches. Moreover, Table 4 outlines the results of Tukey's post hoc test, illustrating significant differences between the approaches incorporating 3D feature fusion and those that do not.
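As an illustration of the statistical procedure, the F statistic of a one-way repeated-measures ANOVA can be computed from a matrix of scores with one row per cross-validation fold (the repeated subject) and one column per classification approach. This is a hedged numpy sketch of the standard formula only; obtaining the p-value additionally requires the F distribution (e.g. via a statistics package), and the function name and data layout are assumptions:

```python
import numpy as np

def repeated_measures_anova_F(scores):
    """F statistic of a one-way repeated-measures ANOVA.

    `scores` has shape (n_subjects, k_conditions), e.g. one row per
    cross-validation fold and one column per classification approach.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    # variability attributable to the conditions (the approaches compared)
    ss_cond = n * np.sum((scores.mean(axis=0) - grand) ** 2)
    # variability attributable to the subjects (the folds), removed from the error
    ss_subj = k * np.sum((scores.mean(axis=1) - grand) ** 2)
    ss_total = np.sum((scores - grand) ** 2)
    ss_err = ss_total - ss_cond - ss_subj
    df_cond, df_err = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df_cond) / (ss_err / df_err)
```

Removing the per-fold (subject) variability from the error term is what distinguishes the repeated-measures design from an ordinary one-way ANOVA on the same scores.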

Table 3 PDT response prediction classification results for strategy 1 (G1 vs G2 vs G3)
Table 4 Tukey post hoc test results for pairwise comparison between the test results of the classification approaches for strategy 1 (G1 vs G2 vs G3). Comparisons with no statistically significant difference (\(p \ge 0.05\)) are denoted by \(\approx\), while \(\Uparrow\) denotes a statistically significant difference (\(p < 0.05\))

Predictive Analysis of Group 1 vs. Group 2 & Group 3

In this second strategy, we analyzed 100 eyes from group 1 and 116 from groups 2 and 3. The outcomes of this experiment are presented in Table 5. The best metric values were obtained with 3D feature fusion, reaching a mean accuracy of \(0.7923\pm 0.0942\) and an F1-score for groups 2 + 3 of \(0.8034\pm 0.0839\). For this strategy, the 3D merged fusion slightly improves on the single-input methods. We conducted a repeated-measures ANOVA (\(\alpha = 0.05\)), revealing \(p < 0.0001\) across all classification metrics except precision for group 1 (\(p = 0.0104\)) and precision for groups 2 + 3, the only metric that did not show significant differences (\(p = 0.2370\)). These findings underscore substantial disparities among the outcomes generated by the different classification methods. Furthermore, Table 6 presents the results of Tukey's post hoc test, emphasizing notable distinctions between the approaches incorporating 3D feature fusion and those that do not, with the exception of 3D volumes.

Predictive Analysis of Group 2 vs. Group 3

In this third strategy, we examined 66 volumes from group 2 and 50 volumes from group 3. The results obtained for this strategy are presented in Table 7. The mean accuracy reached \(0.9141\pm 0.0608\), and the recall for group 2 was \(0.9407\pm 0.0728\). This method outperforms the other ones by approximately 20%. In this particular case, the fusion of the 3D volume data results in worse classification. We performed a repeated-measures ANOVA (\(\alpha = 0.05\)), obtaining \(p < 0.0001\) for all classification metrics except group 2 recall (\(p = 0.0044\)). These results show that there are significant differences between the results obtained by the different classification approaches. In addition, Table 8 shows the results of Tukey's post hoc test, which highlights that significant differences are found between the approaches that involve 3D feature fusion and those that do not.

Table 5 PDT response prediction classification results for strategy 2 (G1 vs (G2 + G3))
Table 6 Tukey post hoc test results for pairwise comparison between the test results of the classification approaches for strategy 2 (G1 vs (G2 + G3)). Comparisons with no statistically significant difference (\(p \ge 0.05\)) are denoted by \(\approx\), while \(\Uparrow\) denotes a statistically significant difference (\(p < 0.05\))
Table 7 PDT response prediction classification results for strategy 3 (G2 vs G3)
Table 8 Tukey post hoc test results for pairwise comparison between the test results of the classification approaches for strategy 3 (G2 vs G3). Comparisons with no statistically significant difference (\(p \ge 0.05\)) are denoted by \(\approx\), while \(\Uparrow\) denotes a statistically significant difference (\(p < 0.05\))

Comparison with Existing Literature

Given the absence of a publicly accessible labelled 3D CSCR dataset, a direct comparison with other methodologies is challenging. However, our proposed approach has shown encouraging results, matching, and in some instances surpassing, the performance metrics set by state-of-the-art methodologies for similar tasks. These studies are discussed in Section 1.1.

With respect to PDT response prediction, the only study currently available for comparison is Fernández-Vigo et al. [32]. As indicated in Table 9, our methodology shows significant improvements over that study in terms of accuracy, with increases of 20.75%, 17.91%, and 36.76% for strategies 1, 2, and 3, respectively. Moreover, our method exhibits superior performance across all groups and strategies, as reflected by higher metrics. In addition, our method demonstrates lower standard deviation values, suggesting more stable and robust predictions. This improved performance can be primarily attributed to the efficient use of the 3D features of the image and the 3D fluid region generated by our method.

The primary contributions of this work are multifaceted, bridging DL techniques and the clinical field of retinal disorder diagnosis and management. Our efforts push the boundaries of current practice in automated OCT analysis, paving the way for improved patient outcomes and more impactful research studies. They are summarized as follows:

  • 3D CSCR Fluid Segmentation: This research presents a cutting-edge DL methodology that leverages the 3D context present in OCT scans. By comprehending the volumetric attributes, the model facilitates a more refined segmentation of fluid regions, thereby enhancing the granularity of the derived insights.

  • Prediction of the Response to PDT: Our model transcends traditional segmentation boundaries by using pre-treatment OCT images to foresee PDT response in chronic CSCR patients. This forward-looking feature could catalyze a shift in patient management, paving the way for bespoke treatment plans and the optimal realization of patient outcomes.

  • Clinical and Research Utility: The outcomes of this study bear implications that extend beyond academia, impacting real-world clinical practices. By decreasing the time and effort required for segmentation, offering standardization, and facilitating extensive research studies, our model could significantly bolster the quality of patient care and propel research in the CSCR domain.

Conclusions

The development and application of a reliable, automated system for the segmentation of fluid regions in OCT scans are paramount to the diagnosis, treatment, and research progression of CSCR. This study encapsulates a comprehensive exploration of this domain by leveraging state-of-the-art advancements in DL and AI.

The system proposed herein, built upon a 3D end-to-end fully convolutional architecture, has shown encouraging results in the accurate segmentation of fluid regions in OCT scans. Upon assessing various network configurations, the 3D nnU-Net with full resolution delivered the best performance. This underscores the importance of exploiting the 3D information inherent in the scans to enhance segmentation results. The proposed system’s capacity to automate the segmentation process has far-reaching implications, including significant reductions in analysis time and effort, as well as enabling more efficient and consistent processing of large datasets.

Table 9 Comparison of our top-performing approach for PDT response prediction with existing literature

Furthermore, our work extends to the prediction of treatment responses before the administration of PDT. This aspect is crucial as it opens the door to personalized medicine, enabling treatment plans to be tailored based on individual patient characteristics, thus optimizing effectiveness while minimizing adverse effects. An early prediction approach also contributes to the avoidance of unnecessary treatments, thereby saving valuable resources.

Our PDT response analysis yielded results that underlined the effectiveness of our proposed methodologies, significantly outperforming the established baselines. Particularly, our 3D feature fusion approach demonstrated substantial improvements in both accuracy and precision. The capability to accurately predict treatment responses allows clinicians to strategically optimize treatment plans and closely monitor patients’ progress, thus contributing to enhanced patient outcomes and quality of life.

Overall, this research delivers significant contributions to the field of ophthalmology, presenting an accurate and efficient system for fluid segmentation and treatment response prediction in CSCR patients. The system proposed offers multiple advantages over manual segmentation, notably increased speed, consistency, and accuracy. It harbors the potential to bring about a paradigm shift in the diagnosis and treatment of CSCR, ultimately leading to improved patient outcomes and aiding the furtherance of research in the field.

This study has some limitations, though. While the OCT technique has effectively revealed detailed aspects of retinal pathology, we acknowledge that expanding the range of imaging modalities could enrich our understanding. Including techniques such as fundus photography, OCT-A, and fluorescein angiography in future studies would allow for a broader evaluation of disease mechanisms and enhance the comprehensiveness of our findings. Additionally, the three-month follow-up period used in our study to assess the PDT response allowed us to evaluate its immediate effects. However, to more fully understand the long-term impacts and stability of these treatments, future research should consider extending this follow-up period. Such extensions would provide valuable insights into the long-term efficacy of treatments and potential recurrences.

Future work could integrate larger and more diverse datasets to amplify the system’s robustness and generalizability. Additionally, the incorporation of specific biomarkers and other relevant patient information could be explored to analyze their relation to the patient’s response to treatment.