Key Summary Points

Diverse Applications of AI in Retinal Imaging: This review emphasizes the use of artificial intelligence (AI) models in analyzing retinal fundus photographs to diagnose both ophthalmic disorders and non-ophthalmic conditions.

Enhanced Accuracy and Efficiency: This review found that AI algorithms, compared to clinical data and physician experts, represent an innovative solution with demonstrated superior accuracy in identifying ophthalmic and non-ophthalmic disorders.

Transformational Impact on Healthcare: AI has the potential to transform healthcare by improving accuracy, speed, and workflow, lowering cost, increasing access, reducing mistakes, and transforming healthcare worker education and training.

Introduction

The integration of artificial intelligence (AI) in retina-based healthcare has the potential to revolutionize medical diagnoses and treatments by providing faster and more accurate diagnoses, reduced medical errors, and increased efficiency [1]. AI has already been shown to be successful in well-defined clinical tasks in fields such as dermatology, radiology, and oncology [2, 3]. Machine learning (ML) and deep learning (DL) are key AI technologies revolutionizing data analysis and decision-making. ML enables algorithms to learn from data and make predictions, improving with experience, while DL, a subset of ML, uses layered neural networks to automatically learn complex data relationships. These methods are crucial in healthcare, particularly in diagnosing eye and systemic diseases through retinal image analysis. Retinal imaging, including retinal fundus photography and optical coherence tomography (OCT), is non-invasive and provides real-time methods to investigate the integrity of the visual system. DL approaches have been shown to be highly accurate in disease classification. Moreover, they can not only assist ophthalmologists in making accurate diagnoses but also support decision-making in non-ophthalmic disorders, such as dementia and kidney disease [4,5,6,7]. Of the different ophthalmic imaging approaches, retinal fundus photography has the highest potential for clinical translation and adoption, largely due to its ubiquity in ophthalmology, optometry, and primary care settings, as well as its lower cost and ease of use.

We conducted a systematic review on the use of retina photo-based AI in evaluating ocular and non-ocular conditions. This article covers the current and future landscape of AI in addressing such issues as current research challenges and future perspectives (Fig. 1).

Fig. 1 Overview of applications for retina photo-based artificial intelligence (AI) algorithms: 1. demographic/medical data; 2. ophthalmic disorders; 3. non-ophthalmic disorders

Methods

This article is based on previously conducted studies and does not contain any new studies with human participants or animals performed by any of the authors. To adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and ensure a comprehensive and structured systematic review, the following methodology was employed:

Study Selection and Search Strategy

A meticulous literature search was conducted to identify relevant articles. A comprehensive set of keywords was utilized, combining both medical and machine learning (ML) terms. The medical group of keywords encompassed terms pertaining to ophthalmology, retinas, and both ophthalmic and non-ophthalmic disorders. The ML group included such terms as “artificial intelligence algorithms” and “deep learning.” This search was conducted in the PubMed database, supplemented by Google Scholar to ensure inclusiveness. We searched using the following keywords: (“Retinal photo” OR “Retina” [Mesh] OR “Retinal Diseases” [Mesh]), (“Deep Learning” OR “Artificial Intelligence” OR “Machine Learning” OR “Neural Network” OR “Artificial Intelligence” [Mesh] OR “Deep Learning” [Mesh]), and (“Medicine” [Mesh] OR “Medical Informatics” [Mesh]).
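The three keyword groups listed above can be assembled into a single boolean search string. The sketch below is illustrative only: the helper names (`or_group`, `build_query`) are hypothetical, and joining the groups with AND is an assumption, since the text lists the groups without stating the operator between them.

```python
# Keyword groups copied from the search strategy described above.
RETINA_TERMS = ['"Retinal photo"', '"Retina"[Mesh]', '"Retinal Diseases"[Mesh]']
AI_TERMS = [
    '"Deep Learning"', '"Artificial Intelligence"', '"Machine Learning"',
    '"Neural Network"', '"Artificial Intelligence"[Mesh]', '"Deep Learning"[Mesh]',
]
MEDICINE_TERMS = ['"Medicine"[Mesh]', '"Medical Informatics"[Mesh]']


def or_group(terms):
    """Wrap a list of terms in parentheses, joined by OR."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Join the OR-groups with AND (assumed operator between groups)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(RETINA_TERMS, AI_TERMS, MEDICINE_TERMS)
print(query)
```

Keeping each group as a separate list makes it easy to rerun the search with one group widened or narrowed while leaving the others unchanged.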

Inclusion and Exclusion Criteria

The articles included had to meet specific criteria. The publication period ranged from January 2016 to June 2023, ensuring the most contemporary research. A total of 1562 articles were initially identified through titles and abstracts. The focus was primarily on the quality of research and the application of AI. For historical context, articles published before 2016 and those related to closely aligned topics were also incorporated.

Language Considerations

Articles predominantly published in English were given thorough consideration, accounting for 520 of the papers. We also reviewed a total of 359 papers predominantly published in Chinese. For articles in other languages, including French, Spanish, and German, only the abstracts were evaluated. This multilingual approach ensured a comprehensive assessment of the literature landscape.

Data Extraction and Analysis

Following the meticulous selection process, relevant data from the included articles were extracted and analyzed. Key themes, trends, advancements, and challenges related to retina photo-based AI algorithms in medicine were systematically synthesized.

In accordance with PRISMA guidelines, this systematic review employed a structured and rigorous approach, encompassing a comprehensive literature search, thorough inclusion criteria, language considerations, and meticulous data extraction. Two of the reviewers (Kai Jun and Juan Ye) independently screened 520 studies for eligibility. Any disagreements were resolved by discussions with the third author (Andrzej Grzybowski). This approach ensured the synthesis of a comprehensive overview of the current state of research in the domain of retina photo-based AI algorithms in medicine. Finally, a total of 120 papers were selected as suitable for further analysis.

Present Applications for Retina Photo-Based AI Algorithms

As an extension of the central nervous system, the retina has been used as a window into the body for diagnostic purposes. This can be illustrated by the term “cerebroscopy,” coined by Xavier Galezowski in the 19th century to describe how early ophthalmoscopy of the retina (and mostly the optic disc) can be used to detect certain brain disorders [4]. Soon after the introduction of optical coherence tomography (OCT) of the retina, it was shown that this technology could be used to monitor the progression of many neurodegenerative disorders, including multiple sclerosis, Alzheimer’s disease (AD) and Parkinson’s disease (PD). Presently, the list of disorders is much longer and includes strokes [5]. It was, however, believed that only OCT scans have the necessary features, including retinal nerve fiber layer (RNFL) and ganglion cell layer (GCL) thickness, which allow for the monitoring of CNS disorders. Recent studies using fundus-photo-trained AI algorithms have shown that fundus pictures can also be used for this purpose. Fundus pictures are far less expensive and their implementation is simpler than OCT scans, thus making them a very promising approach for screening and early disease detection.

The eye—particularly, the retina and conjunctiva—is also the only part of the human body where blood vessels can be directly and non-invasively visualized and analyzed using optical methods. Accordingly, since the 19th century, physicians have performed fundus examinations in patients with hypertension to determine the presence and severity of retinal vascular damage as a means to estimate the risk of cardiovascular disease (CVD) [6, 7]. This is based on an assumption that changes observed in the retinal vasculature, such as the degree of retinal arteriolar narrowing, reflect vascular disease in peripheral, cerebral, and coronary blood vessels [9,10,11,12,13,14], and may therefore be a marker of CVD risk [15]. In recent years, increasing interest in retinal vascular imaging, coupled with support from technological developments, has allowed for the derivation of accurate retinal vascular metrics, including vessel caliber, tortuosity, branching angle, and retinal fractal dimension [16, 17].
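Two of the vascular metrics named above, tortuosity and fractal dimension, can be illustrated with a small sketch. It assumes the vessel has already been segmented and reduced to centerline points or pixels. The arc-chord ratio and the box-counting estimate used here are common textbook definitions, not necessarily those used in the cited studies, and the function names are hypothetical.

```python
import math


def tortuosity(points):
    """Arc-chord ratio of a vessel centerline given as (x, y) points.

    A perfectly straight vessel gives 1.0; more winding vessels give more.
    """
    arc = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    chord = math.dist(points[0], points[-1])
    return arc / chord


def box_count_dimension(pixels, sizes=(1, 2, 4, 8)):
    """Box-counting estimate: slope of log N(s) against log(1/s)."""
    xs, ys = [], []
    for s in sizes:
        # Count the boxes of side s that contain at least one vessel pixel
        boxes = {(x // s, y // s) for x, y in pixels}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var  # least-squares slope


# Toy inputs for illustration only
straight = [(0, 0), (1, 0), (2, 0)]
zigzag = [(0, 0), (1, 1), (2, 0)]
line_pixels = [(x, 0) for x in range(64)]
square_pixels = [(x, y) for x in range(16) for y in range(16)]

print(tortuosity(straight), tortuosity(zigzag))
print(box_count_dimension(line_pixels), box_count_dimension(square_pixels))
```

A straight line has a box-counting dimension of about 1 and a filled region about 2. A branching vessel tree falls in between, which is why the measure is used as a summary of vascular complexity.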

Several AI techniques significantly enhance the accuracy and efficiency of medical image analysis. Convolutional neural networks (CNNs) are a cornerstone of the field, renowned for their capability in image recognition tasks. CNNs automate the intricate process of extracting features from images, a crucial step in interpreting complex retinal photos. Vision transformers (ViTs) are a more recent approach, drawing on their success in various image-related applications. ViTs excel at capturing spatial relationships within images, allowing for a detailed examination of retinal images to identify early signs of diabetic retinopathy (DR). Self-supervised learning models, which utilize unlabeled data for initial training, enhance a model’s ability to recognize diverse visual features without direct human annotation. This method is particularly valuable for pre-training models on extensive datasets, helping systems identify subtle indicators of disease progression. These AI methodologies reflect the effort to harness the latest technology to improve ophthalmic care.
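The feature extraction step that CNNs automate can be illustrated with a single hand-written convolution. In the minimal sketch below, the toy image, the Sobel-style kernel, and the function name `conv2d_valid` are all illustrative assumptions. In a real CNN the kernel weights are learned from data rather than set by hand.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    This is cross-correlation, which is what CNN layers compute in practice.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 5x5 "image": a dark vertical line (0) on a bright background (1),
# standing in for a vessel crossing the field of view
image = [[1, 1, 0, 1, 1] for _ in range(5)]

# Sobel-style kernel that responds to horizontal intensity changes
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

feature_map = conv2d_valid(image, sobel_x)
for row in feature_map:
    print(row)
```

The feature map is strongly negative on one side of the line and strongly positive on the other, marking its two edges. Stacking many learned kernels of this kind is what lets a CNN build up from edges to lesion-level patterns.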

The recent development of ML and DL has shown potential for automatic analysis and the quantification of retinal vascular biomarkers to predict cardiovascular risk factors and vascular systemic events [18]. This led to the development of the concept of oculomics, in which ocular data can be used as biomarkers of systemic diseases [19]. With advancements in computer vision technology, retina photos can be used to identify and track various ocular (Table 1) and systemic diseases (Table 2, Fig. 2), as well as provide demographic information. AI algorithms can analyze large datasets of retina photos to extract meaningful information and patterns that can aid in medical decision-making and diagnosis.

Table 1 Fundus photo-based AI-related studies in ophthalmic disorders
Table 2 Fundus photo-based AI-related studies in non-ophthalmic disorders
Fig. 2 PRISMA 2020 flow diagram for the systematic reviews that included searches of databases and registers

Retina Photo-Based AI in Evaluating Demographic and Medical Data

Retina photo-based AI has shown promise in evaluating demographic and medical data, including age, sex, smoking status, body mass index (BMI), blood pressure, race, and HbA1c levels. Using ML algorithms, comprehensive datasets of health parameters and associated features, such as retinal images, electronic health records, and biological markers, can be analyzed in an automated, accurate, and efficient manner. For example, AI analysis of retinal images can provide information on age, sex, and race [8,9,10]. Similarly, AI algorithms trained on electronic health records can detect patterns and predict 47 systemic biomarkers as outcome variables, including BMI, blood pressure, and HbA1c levels [10]. The ability to accurately detect these characteristics can aid in early diagnosis, risk stratification, and personalized medicine approaches. However, as with all AI applications, the quality and accuracy of the data used for training these algorithms are crucial for their success. Additionally, ethical considerations, such as privacy and transparency, must be carefully addressed to avoid unintended (and undesirable) consequences. Recently, AI has been shown to estimate best-corrected visual acuity (BCVA) directly from fundus photographs in patients with diabetic macular edema (DME), which can help clinicians manage the condition by reducing the personnel needed for refraction [11]. Realizing the potential of AI in clinical care requires ongoing work, including studies that focus on identifying and addressing the limitations of AI algorithms. AI algorithms should also be designed to perform well across a wide range of disease severities, while also ensuring consistency of results among people from diverse populations. These principles are crucial not only for the development of AI algorithms in ophthalmology but also for AI’s broader application in medicine [12].

Retina Photo-Based AI in Ophthalmic Disorders

Diabetic Retinopathy

Past methods of diabetic retinopathy (DR) screening were manual and time-consuming, resulting in delayed diagnosis and treatment. The motivation for using AI in DR screening is to lower costs, improve accuracy, and increase patient access to screening. Various AI-based screening tools have been developed, such as Retmarker DR, EyeArt, and the Bosch DR algorithm, which use convolutional neural networks to analyze retinal images for DR detection and progression tracking [13]. These AI-based methods have demonstrated high sensitivity and specificity in DR detection. The AI DR Screening system has demonstrated high diagnostic accuracy and sensitivity in detecting DR, making it a promising tool for high-volume DR screenings [14]. Recently, one AI algorithm achieved an accuracy of 80.79–93.34% in identifying image quality, location, laterality of eye, phase, and five types of DR-related lesions [15]. However, effective legislation is needed to ensure the safety and effectiveness of AI-based medical devices [16]. More research is needed to understand the long-term implications of AI in healthcare.
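Sensitivity, specificity, and accuracy are reported throughout this review, so a short worked example of how they follow from screening counts may help. The counts below are made up for illustration and do not come from any cited study. The function name `screening_metrics` is hypothetical.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard screening metrics from confusion-matrix counts."""
    return {
        # Share of diseased eyes that the algorithm flags
        "sensitivity": tp / (tp + fn),
        # Share of healthy eyes that the algorithm clears
        "specificity": tn / (tn + fp),
        # Share of all eyes classified correctly
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up example: 1000 screened eyes, 100 of which have referable DR
metrics = screening_metrics(tp=90, fp=40, tn=860, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When disease prevalence is low, accuracy is dominated by the healthy majority. For this reason, sensitivity and specificity are usually reported alongside it in screening studies.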

Diabetic Macular Edema

DL models can provide an accurate and efficient system for diabetic macular edema (DME) screening and diagnosis [17]. Their use in processing imaging modalities leads to a significant increase in sensitivity and specificity values [18]. The use of AI-based decision support systems and applications in the processing of retinal images increases the sensitivity and specificity in the screening and detection of DME [19, 20]. A novel approach to automated screening of clinically significant macular edema (CSME) improves DME screening accuracy by addressing class imbalance and replacing exudate segmentation with a pre-trained deep neural network and meta-heuristic feature selection [21]. A smartphone-based AI system developed to screen for DME can achieve a diagnostic accuracy of 90.06% with fewer equipment requirements; its authors propose future integration with e-cloud technology to establish an AI-based telemedicine application [22]. One deep learning system (DLS) was validated on multiple datasets from four countries and demonstrated higher specificity and non-inferior sensitivity compared to human experts [23].

Retinal Venous Occlusion

AI algorithms can improve the diagnosis and management of retinal vein occlusion (RVO) at an early stage and prevent vision impairment, especially in areas with limited access to retinal specialists [18, 24]. A DL-based model for the early identification and treatment of RVO using color fundus photographs (CFPs) has demonstrated good performance in recognizing the condition and identifying lesions [25]. AI models can accurately detect RVO in ultra-widefield (UWF) fundus images, which may be useful in standardizing the diagnosis and management of RVO [26]. Interpretability refers to a system’s ability to provide clear and understandable explanations for its decisions, which is important for gaining trust from medical professionals and patients. The lesion-level dissection approach improves the accuracy of RVO and other eye-disease screening systems by analyzing fine-grained features of UWF fundus images [27]. Identifying non-perfusion areas in retinal images is significant because it can help diagnose and guide treatment for various retinal diseases, such as branch retinal vein occlusion (BRVO). DL models could be used to automatically identify non-perfusion areas in retinal images, which would both save time and improve diagnostic accuracy for ophthalmologists [28]. Additionally, the models could be used to guide laser photocoagulation treatment for patients with BRVO by identifying the precise location of non-perfusion areas that require treatment. Anti-VEGF injections are the first-line treatment for RVO-induced macular edema (ME), while steroid implants and subthreshold micro-pulse laser therapy can be viable alternatives. There is a need for more studies to help identify the ideal treatment methods and regimens for RVO-induced ME. The potential of imaging biomarkers and AI in ME treatment also requires further research and development [29].

Age-related Macular Degeneration

A CFP-based DL algorithm was able to fully automatically diagnose and grade age-related macular degeneration (AMD) with an accuracy comparable to that of ophthalmologists. When compared with previous methods on the binary classification task, the DL algorithm demonstrated greater prospects for application in clinical practice [30]. The DeepSeeNet algorithm outperformed retinal specialists in patient-based classification (accuracy = 0.671, kappa = 0.558 versus accuracy = 0.599, kappa = 0.467, respectively), with a high area under the curve (AUC) in detecting large drusen (0.94), pigment abnormalities (0.93), and late AMD (0.97). DeepSeeNet demonstrated high accuracy and greater transparency in automatically assigning individual patients to AMD risk categories based on the simplified AREDS severity scale [31]. To improve the detection of choroidal neovascularization (CNV) activity in neovascular age-related macular degeneration (nAMD), a multimodal DL model that uses feature-level fusion (FLF) to combine OCT and optical coherence tomography angiography (OCTA) images achieved high AUC values, showing similar or superior performance when compared to human experts, potentially leading to more effective disease management for patients with AMD [32].
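The kappa values quoted above measure agreement with the reference grading beyond what chance would produce. The sketch below computes the unweighted form of Cohen's kappa on made-up severity grades. Whether the cited study used a weighted variant is not stated here, so the unweighted form is an assumption, and the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two sets of categorical grades."""
    n = len(labels_a)
    # Observed agreement: fraction of cases with identical grades
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both graders assigned grades independently
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up severity grades (0, 1, 2) for eight patients
reference = [0, 0, 1, 1, 2, 2, 0, 1]
predicted = [0, 0, 1, 2, 2, 2, 0, 0]

kappa = cohen_kappa(reference, predicted)
print(f"accuracy = 0.75, kappa = {kappa:.3f}")
```

Here six of eight grades match, yet kappa is noticeably lower than the raw accuracy because some of that agreement would be expected by chance. The same pattern appears in the accuracy and kappa pairs reported above.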

Glaucoma

The traditional method for diagnosing glaucoma is subjective and requires significant clinical training, while the agreement between clinicians is relatively limited. DL-based assessment of fundus images could be useful in clinical decision support systems, as well as the automation of large-scale glaucoma detection and screening programs [33]. In one study, a large dataset of 241,032 retinal fundus images was collected and used to develop and validate a DL model. The results show that the performance of DLS diagnosis varied across data sources, highlighting the importance of testing and validation using diverse populations [34]. AI-based DL algorithms for optic nerve disease detection have shown excellent performance in differentiating glaucoma from non-glaucoma on color fundus photographs [35]. One model, the ResNet-50 for glaucoma, showed a sensitivity of 93.4% and a specificity of 81.8%. The area under the precision-recall curve showed an average precision value of 0.874. Fan et al. found that vision transformers have the potential to improve generalizability and explainability in DL models for detecting not only eye disease but also other potential medical conditions that rely on imaging for clinical diagnosis and management [36]. Huang et al. proposed a fine-grained DL system for universal glaucoma visual field (VF) grading based on a novel standard with different data patterns and solid results, offering potential guidance for the clinical management of glaucoma [37]. The Ocular Hypertension Treatment Study (OHTS) is a clinical trial that investigated the effectiveness of reducing intraocular pressure in delaying or preventing the onset of primary open-angle glaucoma (POAG). One study found that DL models using OHTS photographs had high diagnostic accuracy for detecting POAG, suggesting that they have the potential to automate end-point determination and improve the efficiency of POAG clinical trials [38].

Retinopathy of Prematurity

Recent findings suggest that there has been a shift toward using DL for AI applications in retinopathy of prematurity (ROP). While several AI approaches have demonstrated proof-of-concept performance in research studies, the next steps focus on determining whether these algorithms are sufficiently robust for handling variable clinical and technical parameters in practice [39]. In one study, the algorithm achieved an AUC of 0.98, with 100% sensitivity and 78% specificity. Incorporating AI into ROP screening programs may lead to improved access to care for ROP secondary prevention [40]. In the future, potential clinical workflows for AI in ROP may include autonomous ROP screening, and AI-assisted management decisions and image acquisition. AI and ML can improve the accuracy and efficiency of ROP diagnosis by analyzing large numbers of retinal images and identifying patterns that may indicate the presence or severity of ROP [41]. This could increase the accuracy of clinicians’ diagnoses, reduce inter-observer variability, and prioritize high-risk cases for timely treatment [42]. However, there are also potential challenges and limitations to using AI in pediatric retina diagnosis. For example, AI algorithms may not be able to detect the subtle changes in retinal images that experienced human clinicians can identify. Additionally, there may be concerns about the ethical implications of relying solely on AI for medical decision-making. Further research is needed to address these issues and ensure that AI is used safely and effectively in clinical practice.

Myopic Macular Degeneration

DL algorithms for myopic maculopathy can be developed to determine whether eyes with pathologic myopia can be identified and whether each type of myopic maculopathy lesion on fundus photographs can be diagnosed by such algorithms [43]. Potential benefits of early detection and diagnosis of pathologic myopia include better patient outcomes, improved treatment options, and reduced healthcare costs. Potential challenges or limitations to using AI in addressing myopia include the need for sustainable and cost-effective implementation into clinical practice and ensuring consistent algorithm performance in the clinical setting with generalizability across multiple ethnicities [44]. Researchers are working to overcome these challenges by developing accurate training and validation methods for AI algorithms across diverse datasets and improving image-capturing techniques for highly myopic eyes. One of the tested algorithms showed a diagnostic performance with AUC of 0.969 (95% confidence interval [CI] 0.959–0.977) or higher for myopic macular degeneration (MMD) and 0.913 (0.906–0.920) or higher for severe myopia. In a randomly selected dataset, DL algorithms outperformed all six expert evaluators in detecting each condition (AUC 0.978 [0.957–0.994] for MMD and 0.973 [0.941–0.995] for high myopia). This study indicates that DL algorithms can be effective tools for risk stratification and the screening of MMD and high myopia among the large global myopia population [45].
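The AUC values quoted above can be read as the probability that a randomly chosen diseased eye receives a higher algorithm score than a randomly chosen healthy eye. The sketch below computes AUC directly from that pairwise definition on made-up scores. The function name `auc` is hypothetical, and the confidence intervals reported in the studies (typically obtained by resampling) are not reproduced here.

```python
def auc(scores, labels):
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly.

    Ties between a diseased and a healthy score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up algorithm scores; label 1 = disease present, 0 = absent
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]

value = auc(scores, labels)
print(f"AUC = {value:.3f}")
```

Because AUC depends only on how cases are ranked, it does not change with the choice of decision threshold. This makes it convenient for comparing algorithms with expert graders across datasets.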

Refractive Errors

ML has been shown to be a useful adjunct for refractive error prediction and biometry for cataract surgery in highly myopic individuals [46]. One of the algorithms tested showed a mean absolute error (MAE) of 0.56 diopters (95% CI 0.55–0.56) when estimating the spherical equivalent on the UK Biobank dataset and 0.91 diopters (95% CI 0.89–0.93) for the AREDS dataset. The baseline expected MAE (derived by simply predicting the mean for this population) was 1.81 diopters (95% CI 1.79–1.84) for UK Biobank and 1.63 diopters (95% CI 1.60–1.67) for AREDS [47]. By using DL to extract novel information from retinal fundus images, we may be able to improve our ability to diagnose and treat vision problems more quickly and accurately than ever before. This could lead to improved patient outcomes and a more efficient healthcare system overall.
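The comparison between a model's MAE and the baseline obtained by always predicting the population mean can be made concrete with a small sketch. The spherical equivalent values below are made up for illustration and are not taken from UK Biobank or AREDS. The function name `mean_absolute_error` is hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up spherical equivalents in diopters
true_se = [-3.0, -1.0, 0.0, 1.0, 3.0]
model_se = [-2.5, -1.5, 0.5, 0.5, 2.5]

# Baseline: predict the population mean for every eye
mean_se = sum(true_se) / len(true_se)
baseline_se = [mean_se] * len(true_se)

model_mae = mean_absolute_error(true_se, model_se)
baseline_mae = mean_absolute_error(true_se, baseline_se)
print(f"model MAE = {model_mae:.2f} D, baseline MAE = {baseline_mae:.2f} D")
```

A model is only informative to the extent that its MAE falls below this baseline, which is why the baseline figures are reported next to the model results above.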

Macular Hole

The significance of developing a DL system for macular hole classification lies in providing an automatic and reliable method for identifying the etiology of a macular hole (idiopathic or secondary), and predicting its status (closed or open) after vitrectomy and internal limiting membrane peeling. The system developed by Yu Xiao and his team achieved high accuracy in both classification tasks, with an AUC of 0.96 for etiology classification and 0.93 for status prediction at 1 month after surgery [48]. In Son’s study, the algorithms showed a consistently reliable performance for macular hole. It should be mentioned that the performance of the algorithms rivaled that of the retina experts, raising the possibility of their clinical use in primary screening situations with low access to specialists. The DL algorithm used in this study was found to be comparable, or even superior, to traditional methods of screening for retinal diseases, including macular holes. Indeed, the algorithm achieved a sensitivity of 89.8% in detecting any of ten retinal abnormalities, and reached a superior or similar diagnostic sensitivity compared to senior retinal specialists in detecting seven of ten retinal diseases [49]. The potential implications of these studies for the field of ophthalmology are significant in that DL algorithms have been shown to provide non-invasive and efficient methods for diagnosing and predicting outcomes of macular hole treatment.

Epiretinal Membrane

AI has been widely applied in the healthcare industry due to its robust performance in detecting various diseases [49]. DL models could be employed in clinical settings to improve the efficiency and accuracy of retinal screenings in situations where there is little access to specialists [50]. Shao et al. validated the use of a previously trained deep neural network-based AI model in epiretinal membrane (ERM) detection based on CFP [51]. The results showed that AI can be a reliable and cost-effective alternative to OCT for ERM detection.

Optic Disc Edema

The use of AI to detect optic disc abnormalities from fundus photographs has received insufficient scholarly attention. One DL system was trained on ocular fundus photographs and validated using a dataset of 1000 fundus photographs with known diagnoses [52]. The study found that the DL algorithm had high sensitivity and specificity for discriminating between papilledema and normal optic nerves. DL algorithms can analyze fundus retinal images and identify patterns associated with papilledema, allowing for automatic detection without the need for human interpretation [39]. The automation of image quality screening has the potential to improve the efficiency and accuracy of the diagnosis and treatment of neuro-ophthalmic disorders [53]. While these technologies can also improve the accuracy and efficiency of papilledema diagnoses, there may be limitations in terms of generalizability to different populations or imaging techniques.

Optic Nerve Atrophy

DL technology assists in the diagnosis of optic neuropathy on fundus images through fully automated neural network algorithms [54]. The potential benefits of using DL for such diagnoses include the early detection of these optic neuropathies, alleviating the lack of medical resources, and providing interpretability for neural networks [54,55,56]. The accurate segmentation of the optic disc in fundus images is significant for the early detection and quantitative diagnosis of optic atrophy [53]. Extensive experiments on collected images and six public datasets demonstrated that a proposed coarse-to-fine DL framework can achieve relatively reliable performance.

Central Serous Retinopathy

DL technology improves the accuracy of central serous chorioretinopathy (CSC) assessment on fundus photographs by analyzing large quantities of data and identifying patterns that may not be visible to the human eye [57]. In one study, researchers examined 400 eyes to measure choroidal thickness and vascular density, with promising results for the diagnosis of CSC [58]. Aoyama’s study showed that DL can be used to accurately diagnose typical CSC eyes based on the choroidal vascular pattern, and that DL was able to identify with certainty some findings that were previously thought to be ambiguous [59]. A DL algorithm that takes advantage of temporal information to automatically detect leakage points in fundus fluorescein angiography (FFA) images was found to be more accurate than traditional methods. This can both save time and improve accuracy compared to manual detection methods [60].

Retinitis Pigmentosa

Early detection of retinitis pigmentosa (RP) is important in that it can help patients seek further consultation and potential treatments, and assist with family planning. One DL model has shown promising results in detecting RP from color fundus photographs (CFPs), achieving an area under the receiver operating characteristic curve (AUROC) of 0.96 [61]. Automated ML has the potential to improve the accuracy and efficiency of diagnosing RP, which could lead to earlier detection and treatment [26]. Researchers have recently begun developing new methods for collecting and sharing large datasets of retinal images, as well as developing more transparent and interpretable AI algorithms.

Cataract

DL algorithms that can detect visually significant cataracts based on retinal photographs may be an alternative approach to the conventional slit-lamp-based examinations for cataract diagnosis. To ensure accurate cataract grading, such algorithms employ several key features and techniques. An important method here is the extraction of textural features from the fundus images, which are then used for image classification. The accuracy of six-level grading achieved by one proposed method can reach 92.6% [62]. The GLA-Net model was specifically designed for cataract classification, and the approach of integrating global and local features could potentially be applied to other ophthalmic diseases or medical imaging tasks [63]. One study has proposed automatic cataract detection and grading methods using improved Haar features, visible structure features, and neural networks with discrete state transition (DST) or exponential DST (EDST), with EDST-MLP achieving the highest grading accuracy [64]. Additionally, an automated DL algorithm that can detect visually significant cataracts based on retinal photographs may help to address the issue of limited reach and screening capacity in rural communities. The algorithm achieved its highest AUROC of 95.5%, with a sensitivity of 95.6% at a specificity level of 80% [65].

Retina Photo-Based AI in Non-ophthalmic Disorders

Alzheimer’s Disease

Retinal imaging could assume a highly important predictive role for patients with mild cognitive impairment, of whom only a fraction will eventually develop AD [66]. This is because the retina shares many similarities with the brain and can provide early signs of AD-related changes. By analyzing retinal images and metadata from the Canadian Longitudinal Study on Aging (CLSA) database, which are easily accessible and can be obtained without invasive procedures, Corbin et al. found that a DL model was able to predict cognitive scores with high accuracy [67]. Macular microvascular density has also been found to be related to AD, as changes in it have been observed in individuals with AD compared to healthy controls [68]. The development of AI, coupled with the utility of retinal imaging, could potentially be the answer to this difficult conundrum [69]. Using eye-based ML models for dementia identification has several advantages, including low cost, easy acquisition, and non-invasiveness, which make retinal tests suitable for large-scale population screening and investigations of preclinical AD [70]. In one study, in an internal validation dataset, a DL model was found to have an accuracy of 83.6% (SD 2.5), a sensitivity of 93.2% (SD 2.2), a specificity of 82.0% (SD 3.1), and an AUROC of 0.93 (0.01) for the detection of AD. This (along with other research) indicates that a DL algorithm based on retinal photography can detect AD with high accuracy, showing its potential for screening in social settings [6].

Parkinson’s Disease

Retinal age gap has the potential to identify individuals at a high risk of developing PD. One study used a DL algorithm to predict retinal age from fundus images and found that retinal age gap was independently associated with incident PD [71]. The study adjusted for a wide array of confounding factors in the final model, suggesting that retinal age gap may be a biomarker for PD that is independent of other known factors. Sangil et al. offered a noninvasive and readily available tool for predicting the Hoehn and Yahr scale and the Unified Parkinson’s Disease Rating Scale Part III (UPDRS-III) score using fundus photography. The algorithm had a sensitivity above 80% for both the Hoehn and Yahr and UPDRS-III score assessments, but a lower specificity of 66 to 67%. This suggests that DL has the potential to analyze fundus photographs together with demographic characteristics to assess neurologic dysfunction in patients with PD. While further research is needed to determine whether this biomarker can be used for early detection and prevention, it shows promise as a potential large-scale screening tool that could be further empowered by incorporating smartphone-based teleophthalmology assessments.
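The text above does not define retinal age gap. In the literature, it is generally taken as the retinal age predicted by the DL model minus the chronological age, so that a positive value indicates a retina that appears older than expected. A minimal sketch under that assumption follows; the participant records are hypothetical.

```python
def retinal_age_gap(predicted_retinal_age, chronological_age):
    """Retinal age gap in years; positive means the retina looks older
    than the person's chronological age (assumed definition)."""
    return predicted_retinal_age - chronological_age


# Hypothetical participants (not data from any cited study).
participants = [
    {"id": "A", "chronological_age": 60.0, "predicted_retinal_age": 64.5},
    {"id": "B", "chronological_age": 72.0, "predicted_retinal_age": 70.0},
]
gaps = {
    p["id"]: retinal_age_gap(p["predicted_retinal_age"], p["chronological_age"])
    for p in participants
}
print(gaps)
```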

Stroke Risk

The importance of eye fundus changes in the assessment of stroke risk has long been established [72]. Studies have found that various retinal signs and diseases are associated with cerebrovascular disease, including traditional hypertensive retinopathy signs, clinical retinal diseases, and retinal vascular imaging measures (e.g., retinal vessel diameters and geometry characteristics) [73]. Predictions made using retinal imaging were found to have fair sensitivity and specificity in identifying patients with poor and good pial collaterals. While this method may not be as accurate as others (e.g., CT or MRI), it has the potential to be a non-invasive and cost-effective tool for assessing stroke severity and predicting patient outcomes [74]. Although technological advances have allowed subtler abnormalities to be detected through DL technologies, there may still be limitations to their accuracy and reliability. The links between cerebrovascular disorders and dementia have prompted recent exploration of retinal vessel imaging tools for the early detection of cognitive decline [75]. One study has suggested that there may be a link between stroke and ocular health, which could have important implications for stroke prevention and treatment strategies [76]. An ML method for localizing cerebral white matter hyperintensities in healthy adults based on retinal images showed good overall performance in detecting age-related white matter changes and cerebral small vessel disease [77]. A DL model achieved an accuracy of 85% in distinguishing the CVD and control groups, which is comparable to such traditional methods as blood tests and electrocardiograms [78]. Using DL for stroke risk assessment can reduce costs and increase accessibility, as it does not require invasive procedures or specialized equipment.

Cardiovascular Disease

Retinal imaging provides a unique opportunity to non-invasively assess vascular structure and function, vessel features, and microcirculation within the retina. Mounting evidence suggests that retinal vessel calibers, microvascular features, and vascular characteristics extracted from various imaging modalities are associated with alterations in left ventricular structure and function in stage B heart failure (HF), as well as incident development of symptomatic HF in the general population [79]. AI can be used to predict systemic parameters and diseases from ophthalmic imaging by analyzing large datasets of images and identifying patterns associated with specific conditions or parameters [80, 81]. AI-based retinal microvasculature analysis can supplement existing cardiovascular disease (CVD) risk stratification approaches [82, 83]. It can predict both direct cardiovascular events (e.g., CVD mortality) and risk factors (e.g., blood pressure and diabetes) [84].

In one study, a DL-funduscopic atherosclerosis score (DL-FAS) was developed and validated using a DL model. The association between cardiovascular mortality and DL-FAS was also explored, and DL-FAS was found to add value to the prediction of cardiovascular death above that of the Framingham Risk Score [85]. Another study suggested that AI-enabled retinal vasculometry can be used to predict circulatory mortality, myocardial infarction, and stroke more accurately than existing risk algorithms [86]. The SIVA-DLS system, a DL system for analyzing retinal photographs, was found to be effective in the assessment of CVD [87]. Retinal age gap could serve as an important biomarker for monitoring and detecting end-organ damage in CVD and, specifically, for the early detection of vascular cognitive impairment [88].

Hypertension

Hypertensive retinopathy (HR) is the most common ocular manifestation of systemic arterial hypertension. AI can provide a faster and more objective HR diagnosis and develop a new instrumental classification of HR [89]. Large-scale epidemiological studies have reported the association of retinopathy signs with hypertensive end-organ damage, including stroke, dementia, and coronary heart disease. Major technological advances in ocular imaging techniques have improved the assessment of retinal microcirculation, offering a unique opportunity to further our understanding of eye–body relationships, and support the development of novel diagnostic and prognostic tools through noninvasive means [90]. The potential benefits of using oculomics for noninvasive examinations in ophthalmology include improved diagnosis and the monitoring of systemic diseases, such as hypertension and diabetes. Zhang’s study introduced DL approaches for predicting CVD risk factors using retinal fundus images and achieved an accuracy of 68.8% in detecting hypertension [91].

Chronic Kidney Disease

One study showed that a DL system had an AUC of 0.938 in detecting chronic kidney disease (CKD) using retinal images from multiethnic populations [7]. The system’s performance was compared to that of human doctors, and was found to be comparable or even superior in some cases. The potential implication is that AI-based algorithms could be integrated into retinal cameras to serve as a complementary community- or primary care-based model for CKD screening, which has traditionally relied only on serum creatinine and estimated glomerular filtration rate. Another study found that AI-based analysis of retinal images can accurately predict CKD risk factors, such as age, gender, smoking status, and blood pressure [92]. The study demonstrated the potential of AI-based smartphone diagnostic systems to broaden access to healthcare and encourage self-monitoring. In the case of predicting systemic biomarkers from retinal photographs, DL algorithms can analyze such features as blood vessels and optic disc morphology to identify biomarkers associated with various systemic diseases, which could be used for early diagnosis and treatment [10].

Peripheral Arterial Disease

The early detection of peripheral arterial disease (PAD) is crucial for preventive measures and addressing risk factors. However, current diagnostic techniques often fail in this regard. Retinal imaging has the potential to detect subtle variations in vascular structures indicative of atherosclerosis. Previous PAD-detection methods have focused on down-scaled images, which may not capture detailed vascular structures and may therefore lead to misdiagnosis. Simon et al.’s proposed method achieved an AUROC score of 0.890, indicating high accuracy in detecting PAD [93]. Attention weights indicated that the optic disc and the temporal arcades were weighted significantly higher than the retinal background in the detection of PAD.

Anemia

Anemia is a major contributor to the global burden of disease, affecting an estimated 1.62 billion people worldwide. Hemoglobin (Hb) concentration is traditionally measured using a venous or capillary blood sample, which is invasive, can be painful, carries a risk of infection for patients and healthcare workers, and generates biohazardous waste. Recently, automated algorithms have been developed to analyze retinal fundus images and predict Hb with an MAE of 0.63 and an AUC of 0.88, indicating reasonable accuracy [94]. A DL algorithm trained with retinal images and subject metadata from the UK Biobank can estimate blood Hb levels and predict the presence or absence of anemia with impressive accuracy [95]. This technology could be used to improve diagnosis and treatment by providing a non-invasive and cost-effective method for screening large populations for anemia. AI-based applications also have significant potential for informing evidence-based and resource-efficient clinical diagnosis and management of sickle cell retinopathy [96].

Hepatobiliary Diseases

The use of DL models for the screening and identification of hepatobiliary diseases using ocular images is innovative and has shown good performance. The screening model achieved an AUC of 0.74 for slit-lamp images and 0.68 for fundus images, while the identification model achieved an AUC of 0.93 (slit-lamp) and 0.84 (fundus) for liver cancer, and 0.90 (slit-lamp) and 0.83 (fundus) for liver cirrhosis [97]. The DL models established qualitative associations between ocular features and major hepatobiliary diseases, suggesting new opportunities for understanding disease characteristics and pathophysiological mechanisms.

Sarcopenia

Sarcopenia is a condition characterized by the loss of muscle mass and function in older adults. Prior work has lacked effective screening tools for sarcopenia. One study analyzed data from the Korean National Health and Nutrition Examination Survey, and developed ML models to identify sarcopenia risk using ocular measurements and demographic factors. The XGBoost models outperformed logistic regression techniques and DL networks for tabular data (TabNet) methods in predicting sarcopenia risk, showing areas under the receiver operating characteristic curves of 0.746 and 0.762 in men and women, respectively [98]. The findings suggest that ocular measurements can be used as a cost-effective screening tool to detect sarcopenia early in predictive, preventive, and personalized medicine contexts. Further studies are required to confirm a direct association and evaluate whether the proposed method can be used in clinical settings.

Discussion

We have analyzed the main results (Fig. 3) and the strength of evidence of the included literature in this review paper, as shown in Figs. 4 and 5. We compared the AUC, which is the key performance indicator for major ophthalmic and non-ophthalmic disorders (Fig. 3). In ophthalmic disorders, AI-based systems trained on fundus photos have demonstrated remarkable results, with the AUC often exceeding 0.9, or occasionally reaching 0.95 or higher. Fundus photo-based AI studies in ophthalmic disorders exhibited higher AUC and accuracy values compared to non-ophthalmic disorders due to the availability of specialized datasets, expert annotation, the relative simplicity of fundus manifestations, and established diagnostic protocols in ophthalmology. To improve AI performance in non-ophthalmic disorders, efforts should be directed to collecting larger, more diverse datasets, enhancing expert annotation, and developing standardized diagnostic criteria for fundus-based diagnoses in these domains. Additionally, advancements in AI model maturity and training techniques can contribute to improved performance across all medical disciplines.
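Because AUC is the indicator compared across conditions here, it may help to recall what it measures: the probability that a randomly chosen diseased case receives a higher model score than a randomly chosen non-diseased case, with ties counted as one half. The following is a minimal sketch of that pairwise definition. The function name `pairwise_auc` and the example scores are hypothetical, and this is not the pooling procedure used for Fig. 3.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model outputs (not data from any included study).
scores = [0.10, 0.20, 0.30, 0.40, 0.60, 0.35, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(pairwise_auc(scores, labels))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values above 0.9 in ophthalmic disorders indicate strong discrimination.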

Fig. 3
Hierarchical pooled results (AUC, area under the curve) for each condition. a Results of fundus photo-based AI-related studies in ophthalmic disorders. b Results of fundus photo-based AI-related studies in non-ophthalmic disorders. AI artificial intelligence, DR diabetic retinopathy, AMD age-related macular degeneration, RVO retinal vein occlusion, ROP retinopathy of prematurity, MMD myopic macular degeneration, RP retinitis pigmentosa, AD Alzheimer’s disease, PD Parkinson’s disease, CKD chronic kidney disease, PAD peripheral arterial disease, NAFLD Non-alcoholic fatty liver disease, CVD cardiovascular disease

Fig. 4
Retina photo-based artificial intelligence (AI) in all ophthalmic and non-ophthalmic disorders

Fig. 5
Results of the quality assessment for diagnostic accuracy studies (QUADAS)-2 evaluation of studies included in the review

Current Research Difficulties

In recent years, the application of DL in ophthalmology has become increasingly widespread, and the research results have been highly fruitful. However, some obstacles have been encountered.

The first is the problem of algorithm generalization. An algorithm that performs well on one dataset may not achieve its original performance on another. Generalist medical AI stands as a transformative paradigm within the realm of healthcare [99]. Unlike specialized AI systems that focus on singular tasks, the generalist variety is designed to encompass a broader range of diagnostic, prognostic, and decision-making functions across various medical domains. It operates as a versatile tool, capable of analyzing diverse patient data sources, including medical images, clinical notes, and laboratory results, to offer comprehensive insights into a patient’s health status. Generalist medical AI has the potential to revolutionize clinical practice by aiding healthcare professionals in making accurate and timely diagnoses, formulating personalized treatment plans, and predicting patient outcomes. However, alongside its immense potential, its integration requires meticulous validation, ethical considerations, and a collaborative effort between AI developers, medical experts, and regulatory bodies to ensure its safe, effective, and responsible application in the complex landscape of healthcare.

The second issue relates to the clinical translation of research results. The ultimate goal of intelligent medical research is to serve medical treatment itself, meaning that research design and model construction should be combined with clinical needs [100]. When training DL networks, the chosen method need not be as complex as possible. Some algorithms that perform well may not run on hospital computers because of their excessive computational requirements, making them difficult to apply in clinical practice. In addition, most of the single-disease, single-modality diagnostic models studied do not appear to fit into clinical diagnosis and treatment workflows. When building a DR screening system, for example, the possible impact of other diseases on the algorithm should be considered at the same time. Therefore, developing multi-disease diagnosis and treatment systems may be the trend in AI research. Cen et al. developed a DL-based diagnosis and treatment platform, and used heterogeneous fundus images from 7 different clinical centers to detect 39 related fundus diseases and lesions [101], covering such common diseases as diabetic retinopathy, glaucoma, AMD, and pathological myopia, as well as fundus lesions (e.g., hard exudates, subretinal hemorrhage, new blood vessels, and pigment epithelium detachment). This study suggests that, in order to achieve the clinical translation of models, it is essential to design models based on clinical needs, which poses challenges to both clinicians and engineers.

The integration of AI into ophthalmology has been marked by significant milestones, notably the FDA approval of IDx-DR (Digital Diagnostics) in 2018, the first AI system authorized for detecting DR. Later, two more AI-based devices were approved by the FDA for the same purpose, namely EyeArt (Eyenuk Inc) and AEYE-DS (AEYE Health Inc). In Europe, there are over 15 CE-marked AI-based devices, most of which diagnose DR, although some are dedicated to other retinal disorders and glaucoma suspects [102]. These approvals highlight AI's growing role in improving diagnostic accuracy and personalized care in ophthalmology. Such regulatory approvals underscore the safety, efficacy, and clinical utility of AI in eye care, promising further innovations that could broaden the scope of ophthalmic diagnostics and treatment. The regulation of medical AI devices in the European Union (EU) differs from that in the USA. European law classifies medical devices into four classes: I, IIa, IIb, and III, depending on the perceived risk level of the device. Medical devices are certified by independent nongovernmental bodies in a decentralized manner. The European "open" approach is vastly different from the strict perspective of the FDA, as reflected by the number of available medical AI devices. The EU is currently in a transitional period between two regulations and is about to introduce the AI Act, which will bring many more restrictions to this area and further complicate the legislative landscape.

The third issue regards the lack of collaborative efforts among multidisciplinary teams, comprising clinicians, data scientists, ethicists, and regulatory experts. Currently, there are hundreds of publicly-accessible image datasets in ophthalmology [103], including MESSIDOR, DRIVE, EyePACS, and E-Ophtha [104,105,106,107], providing resources for many researchers with non-medical backgrounds and effectively promoting the development of intelligent-assisted diagnosis of ophthalmic diseases. By involving clinicians from different regions and demographics, the algorithm’s training data can be more comprehensive, thus reducing bias and enhancing its performance across varied patient populations [107]. Collaborative research between AI experts and healthcare professionals can result in the development of algorithms with improved interpretability. Close collaboration between AI researchers and healthcare institutions can lead to the testing and refinement of algorithms in real-world clinical settings. Collaboration with ethicists and regulatory experts is essential to navigating the ethical implications and regulatory requirements associated with AI algorithms. Cooperation between researchers, healthcare institutions, and industry partners can lead to the continuous monitoring, evaluation, and updating of AI algorithms. Combining the expertise of various stakeholders can lead to more robust, inclusive, interpretable, and sustainable AI solutions that genuinely benefit patients and enhance healthcare outcomes.

Future Perspectives

Firstly, with the advancement of AI applications, a growing number of institutions are willing to participate in AI-related research, thus laying the groundwork for the generation of new datasets. Moreover, ever more researchers appear to be willing to share their own research data while proposing disease diagnosis-related algorithms to promote the further development of DL in ophthalmology, which would allow for diverse datasets. In addition, federated learning is a distributed ML technology that can achieve a balance between data privacy protection and data sharing by only sharing parameters or intermediate results without exchanging data samples [108]. Similarly, swarm learning is based on the idea of edge computing and blockchain technology. It adopts a decentralized architecture in which nodes share parameters [99]. Currently, federated learning has made some achievements within ophthalmic medicine [100]. The addition of these methods may provide new ideas for image data sharing.
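The idea of sharing only parameters can be illustrated with a minimal sketch of one aggregation round. It assumes the commonly used federated averaging scheme, in which a coordinating server combines locally trained parameters weighted by each site's sample count. The cited works may use a different scheme, and the function name `federated_average`, the clinic parameters, and the sample counts are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Combine model parameters from several sites into one global model.

    client_weights: list of parameter vectors, one per site, all the same length.
    client_sizes:   number of local training samples at each site.
    Only these parameters and counts leave a site; the images do not.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same model shape")

    total = sum(client_sizes)
    # Sample-size-weighted mean of each parameter across sites.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical clinics with locally trained two-parameter models.
clinic_a = [1.0, 2.0]   # trained on 100 fundus images
clinic_b = [3.0, 4.0]   # trained on 300 fundus images
global_model = federated_average([clinic_a, clinic_b], [100, 300])
print(global_model)
```

In practice, this aggregation step is repeated over many rounds, with each site continuing training from the updated global parameters.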

Secondly, compared to the simple image-based diagnosis model, future AI-based retinal imaging research may place a greater focus on the prediction of disease progression [109]. This requires a gradual transition from current cross-sectional studies to longitudinal cohort or comparative studies, collecting follow-up data from patients, and using DL’s computational power to explore and intervene in key factors affecting the occurrence and development of diseases. In so doing, truly integrated intelligent medicine can be achieved. As one example of such a prediction tool, an online calculator was developed based on a prediction model to generate the predicted probability of melanoma. The prediction model was validated externally and had a discrimination value of 0.861.

Thirdly, advanced generative models can revolutionize medical image analysis by generating synthetic retina images that can supplement limited or unavailable clinical data, thus aiding in algorithm training and validation. Through generative AI, the scarcity of diverse and well-labeled datasets can be mitigated, thereby enhancing the robustness and generalizability of retina-based AI algorithms. By integrating insights on the use of retinal imaging for identifying indicators of drug abuse, we highlight a significant leap forward in the field of oculomics [110]. This integration extends the utility of fundus photography, traditionally focused on ophthalmic conditions, to encompass the detection of systemic health anomalies. It also illustrates the utility of advanced AI techniques, specifically multistage generative adversarial networks (GANs), in pushing the boundaries of what can be diagnosed through ocular assessments. This innovative approach underscores the potential of retinal imaging not just in the realm of ophthalmology but as a powerful tool in the broader spectrum of medical diagnostics.

Additionally, these models can facilitate interpretability by generating visual explanations for AI-driven diagnoses, empowering clinicians to more comprehensively understand the rationale behind algorithmic decisions. By fostering a seamless human–AI collaboration, generative AI could lead to innovative diagnostic support tools that enhance accuracy and streamline clinical workflows. However, these possibilities are accompanied by ethical and regulatory considerations, necessitating careful validation, transparency, and the alignment of generative AI outputs with established medical standards. As these technologies continue to evolve, the integration of generative AI in retina photo-based AI algorithms holds the potential to drive significant advancements in medical diagnostics and patient care.

Incorporating the methodology outlined in our study holds significant promise for advancing telemedicine, particularly in the post-COVID era where the demand for remote healthcare services has surged. The utilization of DL to predict corneal curvature from fundus photography exemplifies the innovative approaches emerging in teleophthalmology [111]. This methodology not only aligns with the broader objectives of enhancing diagnostic precision remotely but also underscores the potential of AI in making specialized ophthalmic assessments more accessible outside traditional clinical settings.

Conclusions

Recent studies have shown that fundus photos can be used for AI algorithm training to detect not only ophthalmic disorders, but also non-ophthalmic diseases, and to provide information on biomedical and demographic data. The present list is impressive, although certainly not final. The retina is the only part of the human body in which blood vessels and central nervous system structures can be visualized and analyzed non-invasively and optically, which provides a unique opportunity to develop new methods for detecting and monitoring cardiovascular and neurodegenerative disorders. Screening for many eye disorders, namely DR, AMD, glaucoma, and ROP, is presently available thanks to recently developed AI-based medical devices, which have already been registered in many countries.

We believe that, in the future, fundus pictures will be more widely employed by non-ophthalmologists than ophthalmologists for screening and predicting the risk of systemic disorders.