Introduction

Sickle cell disease (SCD) is a public health challenge with a significant global burden, particularly in sub-Saharan Africa, where 75% of the disease occurs [1]. That comparatively higher prevalence is largely driven by malaria, against which persons with the sickle cell trait have higher resistance [2]. Indeed, the mutation causing SCD arose in Africa thousands of years ago and spread within human populations because it helps protect against malaria, a historically major cause of death on the continent [3]. SCD is a group of inherited red blood cell disorders; sickle cell anemia, its most severe form, results from the homozygous inheritance of the sickle hemoglobin gene. This genetic alteration leads to the production of abnormally shaped red blood cells, precipitating a cascade of clinical complications ranging from acute pain, chronic pain, and severe infections to stroke and organ damage [4].

Each year, more than 400,000 babies are born with SCD globally, with the majority, approximately 300,000, affected by sickle cell anemia [1, 5]. These numbers are expected to double by 2050 [5], underscoring the urgent need for improved care, research, and prevention strategies. In Uganda, SCD is a public health priority, with approximately 20,000 babies born with sickle cell anemia annually, constituting roughly 6.67% of the approximately 300,000 sickle cell anemia births reported globally [6].

Although sub-Saharan Africa bears the highest burden of SCD worldwide, there is a lack of comprehensive data from well-designed longitudinal studies on SCD. Such data are essential for understanding the disease’s epidemiology, its progression, and the interplay of genetic and environmental factors influencing its course [7]. Recently, there has been an increase in SCD research from Africa [8,9,10], signaling a growing recognition of the disease’s burden and an increased focus on addressing it. However, most studies are single-center, employ inconsistent data collection methods, and are not longitudinal. To better understand the epidemiology, disease progression, and disease modifiers of SCD, there is a need to conduct well-designed, multi-center, long-term longitudinal cohort studies in Africa, backed by appropriately designed registries.

SCD registries are particularly useful for long-term follow-up studies that aim to understand the patterns of disease severity and the social and environmental determinants of outcomes in patients with SCD in Africa. The establishment of centralized, national, electronic, patient-consented SCD registries in Africa holds the potential to revolutionize SCD research and treatment. Such registries not only promise a deeper understanding of the disease but also serve as the cornerstone for the development of personalized healthcare strategies tailored to the unique and diverse needs of SCD patients [1].

Uganda joined the Sickle Pan Africa Research Consortium (SPARCo) in 2020 [11]. SPARCo is a collaborative initiative involving 7 countries (Ghana, Nigeria, Tanzania, Mali, Zambia, Zimbabwe, and Uganda) whose aim is to improve outcomes of persons with SCD through multi-center research and regionally adapted, evidence-based clinical care in Africa. Here, we present an in-depth description of the development of an electronic, patient-consented registry in Uganda. We also present an analysis of patient enrollment and management, thereby contributing valuable insights and laying the foundation for future research and intervention strategies tailored to combat SCD’s profound impact in sub-Saharan Africa.

Methods

Study sites

The registry was developed at the Mulago National Referral Hospital sickle cell clinic in Kampala, the capital city of Uganda, which is one of the most populous areas with 1.6 million inhabitants [12]. The Mulago sickle cell clinic is the largest and oldest such clinic in Uganda and the East African region. It serves mainly the central region but receives patients from the rest of the country. Subsequently, the registry was expanded to three additional Regional Referral Hospitals: Jinja Regional Referral Hospital in Eastern Uganda, Mbale Regional Referral Hospital, and Lira Regional Referral Hospital. The geographic distribution of the program’s catchment areas thus extends across three of the four major regions in the country, all of which have a high prevalence of SCD [6]. The Ministry of Health (MOH) of Uganda was involved as a key stakeholder from the inception of the registry, receiving regular updates and participating in the Advisory Committee to ensure sustainability. Furthermore, the health workers responsible for enrolling patients were employed by the MOH, which helped promote the registry’s long-term viability.

Study design

The registry used a longitudinal study design in which data were systematically collected from each patient at every follow-up visit.

Database design

The SPARCo Uganda registry was developed using a comprehensive approach, beginning with the adoption of standardized ontologies for SCD under the coordination of the Sickle Africa Data Coordinating Center (SADaCC) [13, 14]. These ontologies were then adjusted and customized to suit the local context, in terms of both the data types collected and local clinical standards (Fig. 1). The harmonization and mapping were done in Python, and the scripts are provided in the code section. These optimized ontologies facilitated structured data representation and standardized data capture. Following this, a Case Report Form (CRF) was developed, aligning data items from various sources, including existing forms at Mulago National Referral Hospital, to ensure consistency and accuracy in data collection. This process was then tailored to fit specific clinical processes at each site. We integrated Research Electronic Data Capture (REDCap), a versatile web-based platform optimized for modern scientific data collection [15, 16]. Scripts were developed that use its built-in validation rules and automated data quality checks, including checks for blank values, checks for incorrect values in calculated fields, and field validation, to ensure the integrity of research data.

Fig. 1. Ontology data mapping and harmonization process
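The mapping and harmonization step can be illustrated with a minimal Python sketch. The field names, ontology labels, and value codes below are hypothetical placeholders rather than actual SCD ontology terms or CRF fields, and the registry's own scripts may be organized quite differently.

```python
# Hypothetical mapping from local CRF field names to harmonized labels.
# None of these names come from the actual SCD ontology or the registry CRF.
FIELD_MAP = {
    "sex": "gender",
    "dob": "date_of_birth",
    "hb_type": "scd_phenotype",
}

# Hypothetical value-level harmonization for coded fields.
VALUE_MAP = {
    "gender": {"m": "Male", "f": "Female"},
    "scd_phenotype": {"ss": "HbSS", "sc": "HbSC"},
}


def harmonize_record(local_record):
    """Return (harmonized, unmapped) dictionaries for one local record."""
    harmonized = {}
    unmapped = {}
    for field, value in local_record.items():
        target = FIELD_MAP.get(field)
        if target is None:
            # Fields without a harmonized term are set aside for review.
            unmapped[field] = value
            continue
        codes = VALUE_MAP.get(target, {})
        if isinstance(value, str):
            # Normalize case and whitespace before looking up a coded value.
            value = codes.get(value.strip().lower(), value)
        harmonized[target] = value
    return harmonized, unmapped


record = {"sex": "F", "dob": "2015-03-02", "hb_type": "SS", "daycare": "yes"}
harmonized, unmapped = harmonize_record(record)
print(harmonized)
print(unmapped)
```

In this sketch, fields with no harmonized counterpart are kept separately rather than dropped, which mirrors the idea that a national registry may carry local elements beyond the shared ontology.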

Implementation of registry deployment and mobile app utilization for data collection

The registry was deployed on a Linux server running CentOS 8, housed at the African Center for Excellence in Bioinformatics and Data Intensive Sciences (ACE) [17]. Single Sign-On authentication was integrated to enhance security and to simplify user access across the different sites. Additionally, the REDCap mobile application was implemented, enabling data entry personnel at the various hospital sites to collect data efficiently. The application was configured to allow data collection in both online and offline modes, ensuring uninterrupted data capture (Fig. 2).

Fig. 2. Database design and architecture of the registry

Participant consent procedures

The study procedures were explained to patients and/or their caregivers. Prior to enrollment in the registry, individuals who agreed to participate were required to sign a written informed consent form. For patients below 18 years of age, informed consent was obtained from their caregivers. For patients aged 18 years and older, consent was obtained directly from the patients themselves. Patients aged between 8 and 17 years were provided with assent forms, in addition to written informed consent from their caregivers.
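The age-based consent rules above can be summarized in a short Python sketch. The function name and document labels are hypothetical and are used only to make the rules explicit; they do not describe how consent was tracked in the registry.

```python
def required_consent_documents(age_years):
    """List the documents needed before enrollment for a patient of this age.

    The document labels are hypothetical names used for illustration.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years >= 18:
        # Adults consent for themselves.
        return ["patient_informed_consent"]
    # Minors always need written informed consent from a caregiver.
    documents = ["caregiver_informed_consent"]
    if age_years >= 8:
        # Patients aged 8 to 17 years additionally complete an assent form.
        documents.append("patient_assent")
    return documents


for age in (5, 12, 18):
    print(age, required_consent_documents(age))
```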

Data entrant training

Training workshops were conducted at each site, where data managers, coordinators, and clerks were trained on completing CRFs and inputting data electronically. These sessions were designed to ensure uniformity and high-quality data collection across all sites.

Patient enrollment and data collection

Using the CRF, demographic data, including date of birth/age at enrollment, gender, religion, and place of residence, were collected through interviews with the patient/caregiver, along with a review of the patient’s medical records. Information regarding SCD phenotype, blood group, and the type of SCD test administered was extracted from the patient’s medical records. Laboratory results, including a complete blood count with differential counts, were also recorded. Follow-up clinical data were captured at each clinic visit and included vital signs, such as body temperature and blood pressure, and management details, such as use of hydroxyurea, penicillin V, malaria chemoprophylaxis, and folic acid, and whether pneumococcal vaccination was up to date. Additional clinical information about chest pain, priapism, anemia, and jaundice was also captured as part of the follow-up visit. A deduplication process was implemented within REDCap using unique identifiers to prevent duplicate entries. Enrolling sites had access only to their local patient data, ensuring confidentiality and data security. The full data collection instrument, with all collected variables, is provided as supplementary material (Supplementary Material 1). The data used to generate the figures are attached in the supplementary materials as two separate files, enrollment_trends.csv and site-frequencies.csv.
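Deduplication on a unique identifier can be sketched in Python as follows. This is an illustration of the general idea only, not REDCap's internal mechanism; the `patient_id` field, the identifier format, and the normalization rule are assumptions made for the example.

```python
def find_duplicates(records, id_field="patient_id"):
    """Split records into first occurrences and later duplicates.

    `id_field` is a hypothetical name for the unique identifier column.
    """
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        # Normalize so that case or stray spaces do not hide a duplicate.
        key = str(rec[id_field]).strip().upper()
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, duplicates


# Made-up identifiers for illustration only.
records = [
    {"patient_id": "MUL-0001", "site": "Mulago"},
    {"patient_id": "mul-0001 ", "site": "Mulago"},
    {"patient_id": "JIN-0001", "site": "Jinja"},
]
unique, duplicates = find_duplicates(records)
print(len(unique), len(duplicates))
```

Flagged duplicates are returned rather than silently discarded, so that a data manager can review them before any record is merged or removed.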

Data management & analysis

Data quality checks and progress monitoring were conducted weekly, and any arising queries were resolved promptly. The overall data manager conducted comparative analyses and presented aggregated data during these weekly meetings. During the data collection phase, data clerks received ongoing assistance and guidance to resolve technical challenges, including mobile app updates and errors caused by attaching large images. The registry prioritized data quality assurance, employing in-house rules to enforce data type restrictions, range checks, and uniform data entry procedures. In addition to the server’s daily automatic backups, data were also manually backed up onto an external hard disk system for added security. Furthermore, comprehensive automatic backups were performed quarterly, with data stored both online and on a separate external hard disk for redundancy. To further ensure data safety, de-identified data were routinely transmitted to the SPARCo Hub and SADaCC via REDCap, serving as an additional backup layer. A descriptive analysis of the collected data, covering enrollment trends and the number of patients at each site, was conducted in R 4.2.1 [18]. Dates were formatted using lubridate v1.8.0 [19], and the map visualization was produced using the sf v1.0-15 [20] and rnaturalearth v1.0.1 [21] packages.
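The kinds of rules mentioned above (blank value checks, data type restrictions, and range checks) can be sketched in Python as follows. The field names and numeric limits are hypothetical and illustrative only; they are not the registry's actual validation thresholds.

```python
# Hypothetical rules: field -> (expected type, minimum, maximum).
# The limits are illustrative and are not the registry's real thresholds.
RULES = {
    "temperature_c": (float, 30.0, 45.0),
    "systolic_bp": (int, 50, 250),
}
# Hypothetical fields that must never be blank.
REQUIRED = ["record_id", "visit_date"]


def check_record(record):
    """Return a list of human-readable data quality issues for one record."""
    issues = []
    for field in REQUIRED:
        if record.get(field) in (None, ""):
            issues.append(f"{field}: blank value")
    for field, (kind, low, high) in RULES.items():
        value = record.get(field)
        if value in (None, ""):
            continue  # optional field left empty
        try:
            number = kind(value)
        except (TypeError, ValueError):
            issues.append(f"{field}: expected {kind.__name__}")
            continue
        if not low <= number <= high:
            issues.append(f"{field}: {number} outside {low}-{high}")
    return issues


bad = {"record_id": "1", "visit_date": "", "temperature_c": "52.0", "systolic_bp": "abc"}
print(check_record(bad))
```

Returning a list of issues per record, instead of raising on the first failure, suits a weekly review workflow in which all open queries for a record are resolved together.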

Results

From June 2022 to October 2023, we registered and enrolled a total of 5,655 patients with SCD across the 4 sites in Uganda. The geographical distribution of these enrollment sites is depicted in Fig. 3.

Fig. 3. Map of Uganda showing regions with different sites

Enrollment trends over time

The highest monthly enrollment was in August 2022 (902 patients), and the lowest was in October 2023 (56 patients). This is shown in Fig. 4.

Fig. 4. Enrollment trends of SCD patients from June 2022 to October 2023, showing monthly increases and fluctuations over the study period
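Monthly tallies of the kind plotted in an enrollment trend can be computed as in the sketch below. The analysis reported in this paper was done in R; this Python version is only an illustration, and the dates in it are made up and are not registry data.

```python
from collections import Counter
from datetime import date


def monthly_counts(enrollment_dates):
    """Count enrollments per calendar month, keyed as 'YYYY-MM'."""
    counts = Counter(d.strftime("%Y-%m") for d in enrollment_dates)
    # Sorting the 'YYYY-MM' keys puts the months in chronological order.
    return dict(sorted(counts.items()))


def peak_and_trough(counts):
    """Return the months with the highest and lowest enrollment."""
    peak = max(counts, key=counts.get)
    trough = min(counts, key=counts.get)
    return peak, trough


# Made-up dates for illustration only; these are not registry data.
toy_dates = [
    date(2022, 6, 3), date(2022, 6, 17), date(2022, 7, 1),
    date(2022, 8, 2), date(2022, 8, 9), date(2022, 8, 30),
]
counts = monthly_counts(toy_dates)
print(counts)
print(peak_and_trough(counts))
```

Months with no enrollments do not appear in the result, so a complete time series would need those months filled in with zero before plotting.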

Discussion

SPARCo has established an extensive registry, the first and largest of its kind in Uganda. The data components employed by the Ugandan registry mirror those used in other nations participating in SPARCo and originate from the SCD ontology, aiming to foster consistency across SCD databases throughout Africa [13, 22]. Although this continent-wide ontology forms the backbone of all registries, national registries can have more or fewer ontological elements, depending on their own clinical protocols and standards of care, or simply on what data they can afford to collect. In our case, elements from previous forms (Supplementary Material 2) covering physical examination, previous diagnosis, and whether a patient received daycare treatment were added to the ontology.

In collaboration with SADaCC, this registry aims for universal acceptance and precise data capture, adhering to the structured SCD ontology. Standard operating procedures (SOPs), devised jointly with SPARCo and SADaCC, ensure uniform data collection, data quality, and compliance with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, enhancing both human and computer usability [23]. This setup is similar to that of the SPARCo Ghana, Nigeria, and Tanzania registries [24,25,26]. In contrast with SPARCo Tanzania, the registry enrolled patients from four regional referral hospitals and district hospitals, with varying SCD epidemiology.

The advancement and coordination of our registry, in collaboration with the broader SPARCo network, are paving the way for the consolidation of data from numerous sites. This consolidation is vital for achieving higher statistical power, which will drive deeper insights into the landscape of SCD on the continent, and for improving the use of the REDCap mobile application, particularly for data collection in settings where internet connectivity is unavailable. Previously, the database at the Mulago site encountered challenges because lab results often returned a day after they were ordered, or because some patients lacked the funds to order lab tests promptly, leading to inconsistencies in the entered results. These issues have been resolved through the registry’s ability to easily update and amend records, ensuring more consistent recording of lab results. The development of harmonized SOPs and minimum uniform data elements across participating sites ensures that the data collected are standardized. This standardization is key to forming a comprehensive dataset within the consortium. With this extensive dataset, we are positioned to apply advanced data science approaches, employing innovative methodologies and algorithms designed for analyzing large datasets. Such analysis will be instrumental in yielding critical insights, which are essential for developing effective interventions and improving the overall care of patients with SCD.

Patient enrollment increased strongly and steadily during the first half-year, averaging 500 new patients each month. This rate is notably higher than that observed in other SPARCo registries across Africa, such as those in Tanzania, Ghana, and Nigeria, where the monthly average was around 150 patients. This highlights the particularly high patient volume in Uganda and the likely more mature SCD clinical infrastructure in the country. Initially, only Mulago was operational, accounting for all enrollment in June and July. Enrollment was highest in August because of outreach efforts and the addition of two new sites, Jinja and Mbale, both of which peaked in their first month of enrollment. Enrollment decreased in December, a trend also observed at the other SPARCo sites (Tanzania, Nigeria, and Ghana). During December, extended public holidays are observed, prompting many families to celebrate the Christmas festive season away from their regular residences.

Over a period of 17 months, our registry enrolled a total of 5,655 patients, the most rapid accumulation of patient data among SPARCo SCD registries within such a timeframe when compared with similar registries in Tanzania, Ghana, and Nigeria [24,25,26]. Mulago National Referral Hospital enrolled more patients than the regional referral hospitals. This discrepancy is attributed to the sickle cell clinic’s daily operation, compared with weekly operation at the regional hospitals, its larger staff, and its broader service population. The Mulago sickle cell clinic has also seen a recent rise in the number of researchers, and patients there might be more likely to consent. Furthermore, this might reflect increased awareness among patients of the importance of registries and clinical research. Taken together, the establishment of the registry and the quick accrual of patients provide evidence of the capacity of sub-Saharan African sites to initiate and lead large prospective SCD studies.

Lessons learned

The development of the SCD registry has imparted several valuable lessons that are pivotal for the planning and execution of future disease registries. Robust infrastructure emerged as a cornerstone of our efforts, underscoring the need for a solid technological and organizational foundation to support data integrity, management, and scalability. The infrastructure, which was provided by the NIH-funded African Center of Excellence in Bioinformatics and Data Intensive Sciences, has also supported various technological and infrastructural initiatives, further enhancing research and development capabilities across the continent [17, 27,28,29]. This was complemented by the formation of a multidisciplinary team, bringing together clinicians, software developers, data managers, and policymakers. This diverse collaboration not only facilitated a comprehensive approach to registry development but also ensured that the system was responsive to the needs of all stakeholders, thereby enhancing the utility and applicability of the registry in various research and clinical contexts. Furthermore, leveraging REDCap, an existing database platform, and making only the modifications and customizations to its core relational architecture needed to fit the local context not only saved development resources but also accelerated interoperability with other national registries within the consortium.

The approach of starting small and then expanding proved to be instrumental in the registry’s development, allowing for the initial focus on a limited set of data elements and sites. This methodology enabled the team to identify and address issues in a controlled environment, refining data collection processes before scaling up. Such a phased expansion was crucial for maintaining system stability and adaptability, ensuring that the registry could accommodate a growing scope of data and a wider network of participating sites without compromising on data quality or system performance.

The implementation of extensive training programs for all involved parties, especially data entrants, was another critical factor in the registry’s success. Ensuring that individuals were well-versed in the registry’s operations and objectives significantly contributed to the accuracy and reliability of the data collected. Moreover, achieving comprehensive geographical coverage ensured the collection of data that was representative of the entire country, capturing the diversity of SCD manifestations and treatments. This wide-ranging data collection was further enhanced by the inclusion of local language support, making the registry more accessible and inclusive, thus broadening the spectrum of participants and enriching the dataset. Connectivity issues in remote areas hindered consistent data collection, a challenge that was addressed through the adoption of mobile data collection tools that allowed for offline data capture.

Conclusion

The establishment of the SPARCo Uganda registry represents a significant milestone in the realm of SCD research, particularly within the context of resource-limited environments. This endeavor demonstrates the viability of developing comprehensive disease registries in such environments and underscores the imperative of integrating individuals diagnosed with SCD seamlessly into the healthcare system, ensuring they receive the specialized care they need.

The findings from the SPARCo Uganda registry, which are currently being used to facilitate a range of studies, including the analysis of SCD clinical phenotypes, newborn screening, and evaluation of hydroxyurea use, offer a promising direction for SCD management, research, and data-driven approaches in Uganda. They suggest that implementing a nationwide electronic registry for hemoglobinopathies, with patient consent, could revolutionize SCD research and patient care.