Abstract
In recent years, the role of Artificial Intelligence (AI) in medical imaging has become increasingly prominent, with the majority of AI applications cleared by the FDA in 2023 being in imaging and radiology. The surge in AI model development to tackle clinical challenges underscores the necessity of preparing high-quality medical imaging data. Proper data preparation is crucial as it fosters the creation of standardized and reproducible AI models while minimizing biases. Data curation transforms raw data into a valuable, organized, and dependable resource and is fundamental to the success of machine learning and analytical projects. Considering the plethora of tools available for the different stages of data curation, it is crucial to stay informed about the most relevant tools within specific research areas. In the current work, we propose a descriptive outline of the different steps of data curation and furnish compilations of tools, collected from a survey of members of the Society for Imaging Informatics in Medicine (SIIM), for each of these stages. This collection has the potential to enhance the decision-making process for researchers as they select the most appropriate tool for their specific tasks.
Introduction
Artificial intelligence (AI) continues to play a significant role in medical imaging. As of 2023, the highest percentage of AI algorithms cleared by the FDA were for imaging (83%) and radiology (75%) [1, 2]. The increasing rate of AI model development to address clinical challenges has escalated the need to prepare high-quality medical imaging data. Optimal data preparation is of paramount importance since it leads to the development of standard, reproducible AI models and alleviates biases [3].
Data curation for AI model development is a multifaceted and challenging process. Data curation creates a dataset representative of the problem domain. Crucially, the representativeness of the data directly influences the performance and generalization capabilities of the AI models [4]. Effective data curation ensures that raw data is transformed into a high-quality, organized, and reliable resource that underpins the success of machine learning and analytical endeavors [5]. The ideal tool for data curation should assist developers and researchers in preparing the data in the fastest and most well-curated manner. Such tools not only save time but also contribute to the accuracy and robustness of models. Staying well-versed in these tools empowers professionals to navigate the complex journey from raw data to refined information, thus unlocking the true potential of data-driven innovations [6]. Given the abundance of tools, each targeting distinct aspects of data preparation but often sharing considerable similarities, it is crucial for researchers and developers to remain well-informed about the most applicable tools within their particular research domains.
Knowledge of the available tools not only helps in selecting the best tool for the task at hand (e.g., detection, segmentation, or classification) but also highlights the limitations a user may face after starting work with an inappropriate tool (for instance, a tool that can create only one label in a multi-label segmentation task).
In the current study, we aim to provide a descriptive outline of the phases of data curation and furnish compilations of tools gathered from a survey carried out among members of the Society for Imaging Informatics in Medicine (SIIM) for each of these stages. This compilation serves to enhance the decision-making process for researchers as they select the most suitable tool for their specific tasks.
Method
For data collection, a survey was created and shared with 500 members of the Society for Imaging Informatics in Medicine (SIIM), asking researchers to identify their preferred tool, provide a description of it, and highlight its core features. A total of 54 responses from 26 medical informatics centers were collected. Duplicates, general answers that did not name a specific tool, tools that were not open access, and in-house solutions (not publicly available) were excluded, resulting in the inclusion of a total of 28 tools. In the next phase, the tools were carefully investigated by the authors (S.V, B.K, E.M, P.R, S.F, M.M, A.T) and characterized based on core features, including cloud features, input data, de-identification functions, data conversion, data normalization, data labeling, data annotation, storage, workflow, and federated learning support. In the subsequent sections, we briefly describe the different steps of data curation along with a collected list of tools that are particularly useful for each task (Tables 1 and 2). References and links to all tools are given in the supplemental material. We created the SIIM Tools Survey GPT (https://chat.openai.com/g/g-X6o0w5duF-siim-tools-survey) using GPT-4 as a chatbot based on the collected information.
It should be noted that terms and steps may be assigned to different categories, or used interchangeably, across AI data preparation studies. For example, some studies categorize “annotation” and “curation” as two separate categories [7], while others consider “annotation” part of data curation [8]. Likewise, in the current work, we consider and describe these steps as subsets of data curation.
Data Curation
Data curation is an important process in model development, applied to data from the time it is first acquired to the point it is ready for use by AI (Fig. 1). Tools have been widely developed to address some or all of the steps for data curation [7].
Data curation can be defined as the process of collecting, sorting, filtering, tagging, normalizing, standardizing, converting, and managing data before feeding it to AI models for development purposes (Fig. 2). This broad category of tasks plays a crucial role in optimizing model development in the field of medical imaging [4].
De-Identification
In the United States, the Health Insurance Portability and Accountability Act (HIPAA) de-identification approaches, “Safe Harbor” and “Expert Determination,” delineate a comprehensive set of distinct categories of protected health information (PHI) that must be removed before a medical document can be used for many research endeavors [9]; most countries outside the USA have similar requirements for privacy preservation. It is worth noting that institutions have a range of de-identification methods to select from, since there is no firm consensus on using a specific method. The data objects must undergo modifications that involve eliminating unnecessary PHI and substituting essential PHI with research identifiers. The research identifiers maintain the connection between data objects while establishing a disassociation between the data and the human subject (Fig. 3). There are two types of de-identification [10]. One is “anonymization,” which replaces all PHI with either nothing or random data; anonymized data is completely devoid of any information that could potentially disclose the patient’s identity. The second approach is “pseudonymization,” in which a known identifier replaces the PHI and a separate file stores the mapping from this identifier to the PHI [11]. The latter approach enables researchers to conduct follow-up studies and additional multi-modal analyses in future work and is common in clinical trials.
PHI is nearly always present in the meta-tags of medical images (e.g., the DICOM header). Patient data can also be part of the image pixels, so-called “burned-in” data; this may be introduced by post-processing software that writes PHI into the pixels, especially with some older imaging devices. Specific tools have been introduced to adhere to the mandates of HIPAA while preserving data quality and reliability, but no gold-standard tool for pixel-level de-identification exists yet. Beyond textual PHI, it is possible to identify an individual from the images themselves, for example by facial reconstruction from neuroradiology imaging [12]. For this purpose, defacing tools such as “Mridefacer” [13] alter voxels in the facial region of an MRI scan while preserving the brain structure. The de-identification tools commonly reported in our survey are listed in Table 1.
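The pseudonymization approach described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical field names, not a substitute for a validated de-identification tool: each patient identifier is replaced with a random research ID, while a separate mapping (to be stored securely, apart from the data) preserves the link needed for follow-up studies.

```python
import secrets

def pseudonymize(records, mapping=None):
    """Replace PatientID with a research identifier, keeping the
    PHI-to-identifier mapping in a separate structure."""
    mapping = {} if mapping is None else mapping
    deidentified = []
    for rec in records:
        phi_id = rec["PatientID"]
        if phi_id not in mapping:
            # Random, non-derivable research ID; anonymization would
            # simply discard this mapping instead of keeping it.
            mapping[phi_id] = "SUBJ-" + secrets.token_hex(4)
        clean = dict(rec)
        clean["PatientID"] = mapping[phi_id]
        clean.pop("PatientName", None)  # unnecessary PHI is removed outright
        deidentified.append(clean)
    return deidentified, mapping

# Two studies of the same patient receive the same research identifier,
# so the connection between data objects is maintained.
studies = [
    {"PatientID": "12345", "PatientName": "DOE^JANE", "Modality": "CT"},
    {"PatientID": "12345", "PatientName": "DOE^JANE", "Modality": "MR"},
]
clean, key = pseudonymize(studies)
```

Because the mapping is kept, the process is reversible by authorized parties only; deleting `key` would turn the same output into anonymized data.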
Data Format and Conversion
One of the initial considerations for choosing a tool is the “data format” that the user is working with. Digital Imaging and Communications in Medicine (DICOM) is known as the standard file format and communication profile in radiology [14]. The NIfTI file format was later developed to store volumetric image data, such as 3D MRI scans, in a standardized manner. It covers a wide array of 2D, 3D, and 4D data formats, including both structural and functional MRI, diffusion tensor imaging (DTI), and positron emission tomography (PET) scans [15]. NIfTI is used in medical imaging because it directly supports 3D and 4D data, while JPEG is primarily a 2D format (there is a 3D JPEG standard, but we are not aware of any annotation tool that supports it). Most deep-learning models require input in the form of matrices or tensors. This typically involves converting the DICOM images into a pixel array representation or converting them to a standard image format like JPEG, PNG, or NIfTI format.
If a tool accepts only JPG format, one cannot work on NIfTI images with it. “ImageJ” is a tool that reads and saves several types of data, including JPEG, PNG, FITS (Flexible Image Transport System), TIFF (Tag Image File Format), and DICOM. One tool widely used by our survey participants was dcm2niix, which is developed exclusively for converting DICOM files to NIfTI format and provides multiple options for using the original DICOM metadata in the naming of the output NIfTI file. This is an advantage because converting DICOM to alternative formats can lose image-related metadata, which is undesirable in many AI applications [16]. The converting tools mentioned in our survey can be found in Table 1.
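The pixel-level half of such a conversion can be sketched with NumPy alone: apply the DICOM rescale slope and intercept, then window the result into the 8-bit range that formats like JPEG or PNG expect. In practice the slope, intercept, and window values would be read from the DICOM header with a library such as pydicom; the values below are illustrative defaults.

```python
import numpy as np

def dicom_to_uint8(pixel_array, slope=1.0, intercept=0.0,
                   window_center=40.0, window_width=400.0):
    """Map stored DICOM pixel values to an 8-bit image.

    slope/intercept correspond to RescaleSlope/RescaleIntercept;
    the defaults here approximate a soft-tissue CT window."""
    hu = pixel_array.astype(np.float64) * slope + intercept
    low = window_center - window_width / 2.0
    high = window_center + window_width / 2.0
    clipped = np.clip(hu, low, high)          # discard out-of-window values
    scaled = (clipped - low) / (high - low) * 255.0
    return scaled.astype(np.uint8)

# Typical CT: stored values plus an intercept of -1024 give Hounsfield units
raw = np.array([[0, 1000], [2000, 3000]], dtype=np.int16)
img8 = dicom_to_uint8(raw, slope=1.0, intercept=-1024.0)
```

Note what is lost in the process: the metadata and the full dynamic range, which is exactly why DICOM-to-NIfTI converters such as dcm2niix are preferred when quantitative values must survive.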
Image Normalization
In medical imaging, normalization refers to the process of adjusting the intensity values or pixel sizes of medical images to a more consistent scale or range. Normalization plays an important role in medical image curation because different imaging devices or techniques can produce images with varying intensity ranges, scales, and statistical distributions [6]. In this regard, many software packages, such as “MONAI Label” and “Mango,” provide tools for data normalization. In addition, one important part of normalization is harmonization, which makes studies from various institutions consistent and compatible. In our survey, the “Medical Imaging and Data Resource Center” (MIDRC) was the only tool that harmonizes imaging studies through Logical Observation Identifiers Names and Codes (LOINC) mapping; this yields common long names and thus enables searching and efficient cohort building.
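Two of the most common intensity-normalization schemes can be sketched with NumPy: min-max scaling, which maps intensities linearly into [0, 1], and z-score standardization, which centers them to zero mean and unit standard deviation.

```python
import numpy as np

def min_max_normalize(img):
    """Scale intensities linearly into [0, 1]."""
    img = img.astype(np.float64)
    return (img - img.min()) / (img.max() - img.min())

def z_score_normalize(img):
    """Center to zero mean and unit standard deviation."""
    img = img.astype(np.float64)
    return (img - img.mean()) / img.std()

scan = np.array([[100.0, 200.0], [300.0, 400.0]])
mm = min_max_normalize(scan)
zs = z_score_normalize(scan)
```

Min-max scaling is sensitive to outliers (a single bright pixel compresses everything else), which is why tools often clip to percentiles before scaling; z-score normalization is the usual choice for MRI, where intensities have no absolute physical scale.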
Cloud Computing and Operating Systems
Cloud computing refers to the utilization of off-premise computing services and infrastructure, provided by a separate entity, for storing, managing, and processing medical image data. It involves the storage of medical images and related information on remote servers hosted in data centers and accessed over the Internet [17]. A non-cloud tool, on the other hand, is executed on local hardware; users must install the software on local servers and continuously manage its maintenance, updates, and security [18]. This maintenance overhead, along with the upfront cost, may incentivize institutions to rely more on cloud computing for their AI development pipelines. However, cloud computing brings its own challenges in terms of operational management, interface efficiency, financial cost, and security [19].
When choosing a suitable tool, developers should consider the operating system required by the tool as well. While most tools introduced in our survey are agnostic, meaning they run on any common operating system, some tools only work on specific operating systems. In our survey, “Horos,” which is used for data de-identification and labeling, works exclusively on Mac; on the other hand, “Niffler” and “Moose” are solely available for Linux operating systems.
Table 1, derived from our survey, illustrates the tools that exhibit the characteristics outlined above.
Annotation and Labeling
Image annotation and labeling are often used interchangeably, but they are two different tasks commonly used in the analysis of medical imaging. Annotations represent the regions that contain the developer’s object of interest; they might take the form of a bounding box or a freeform delineation around the portion of the image thought to represent the object or pathology of interest. Labels provide a textual or categorical representation of the identified regions or structures; labeling is the process of classifying the images or the annotations into certain categories [20]. Labeling is required for most classification tasks, and annotation may also be required. For example, a chest radiograph can be labeled as “pneumonia” or “normal” without annotating the suspected region; labeling can also be combined with an annotation task to label multiple pathologies annotated in an image, such as “consolidation,” “pneumothorax,” or “mass.” Table 2 demonstrates the core features of the collected tools for annotation and labeling.
Segmentation
Annotation tasks can be conducted by finely delineating the exact borders of an object, as in segmentation (Fig. 4). Segmenting large numbers of images manually is tedious and time-consuming, and semi-automated and fully automated tools that facilitate the process of segmentation are available [21, 22]. “Semi-automated segmentation” combines manual user input with automatic algorithms: the user typically initiates the segmentation by providing an initial annotation or seed region in the image, and the tool then applies algorithms to propagate and refine the segmentation based on the user-provided information, which can be adjusted and corrected by the user [23]. “Fully automatic segmentation” tools, on the other hand, do not require user input or intervention for explicit segmentation [24]; they analyze the image data and identify the regions of interest based on a learned template.
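The seed-based idea behind semi-automated segmentation can be illustrated with a toy region-growing routine: starting from a user-provided seed pixel, the region expands over neighboring pixels whose intensity stays within a tolerance of the seed value. Real tools add smoothing, editable contours, and full 3D support; this sketch is 2D with 4-connectivity.

```python
from collections import deque
import numpy as np

def region_grow(image, seed, tolerance):
    """Flood-fill style region growing from a seed pixel (4-connectivity)."""
    mask = np.zeros(image.shape, dtype=bool)
    seed_value = float(image[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                    and not mask[nr, nc]
                    and abs(float(image[nr, nc]) - seed_value) <= tolerance):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask

# A bright-ish structure (values ~10) surrounded by darker tissue (50-60)
img = np.array([[10, 11, 50],
                [12, 10, 55],
                [60, 58, 52]])
mask = region_grow(img, seed=(0, 0), tolerance=5)
```

The user's only inputs are the seed and the tolerance; the subsequent editing pass in interactive tools corresponds to toggling entries of `mask`.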
Since many image annotation tools are open source, they benefit from constant improvement and expansion through newly developed extensions. Good examples are the recently introduced automatic segmentation tools that annotate several body organs with the click of a button. These tools, including the MONAI bundle model zoo [25] and the “segment anything model” (SAM)-based medical imaging tool MedSAM [26], have introduced extensions for popular tools from our survey, such as 3D Slicer and OHIF, so that the operator can edit the generated annotations and extract different measures from the annotated masks.
Object Detection
Object detection is the coarse localization of an object, without delineating its exact borders, to enhance the training process by focusing on the region of interest and assigning a class label to that region [27]. This task can be performed with tools such as Label Me, Markit, or Prodigy. They store the coordinates of the manually generated bounding boxes in formats such as JSON to feed deep learning object detection models (Fig. 5).
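The bounding-box coordinates such tools produce are typically serialized as plain JSON. The record below shows the general shape of such an export; the field names are illustrative, not the schema of any specific tool.

```python
import json

annotation = {
    "image": "chest_0001.png",
    "annotations": [
        {"label": "pneumothorax",
         # [x_min, y_min, x_max, y_max] in pixel coordinates
         "bbox": [120, 85, 310, 240]},
        {"label": "mass", "bbox": [400, 150, 470, 210]},
    ],
}

# Round-trip: write the annotation out and read it back unchanged
serialized = json.dumps(annotation)
restored = json.loads(serialized)
```

Object detection frameworks then consume files like this directly, pairing each image path with its list of class-labeled boxes.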
Active Learning
Active learning is another feature in segmentation that helps annotators by providing suggestions on challenging regions during annotation. It guides annotators to the data points whose labeling would most improve model performance. It can also be used to optimize segmentation performance over time after the initial segmentation [28, 29].
Co-registration
Co-registration is the alignment of two or more volumetric images based on specific mathematical transformations [30]; it provides multi-parametric information about the structure or region of interest [31, 32]. By aligning images, co-registration enables one to combine voxels of anatomical structures from different image types, such as functional MRI, T1-weighted, diffusion-weighted images (DWI), or pre- and post-contrast images [33]. 3D Slicer, ITK-SNAP, MITK, and ImageJ were the applications in our survey that included co-registration functions.
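At its simplest, rigid co-registration searches for the spatial transformation that best aligns two images. The toy sketch below recovers a pure integer translation by exhaustively maximizing the correlation between a fixed image and a shifted moving image; real tools optimize full rigid or deformable transforms with subpixel interpolation, but the objective-maximization structure is the same.

```python
import numpy as np

def best_shift(fixed, moving, max_shift=3):
    """Brute-force search for the (row, col) translation of `moving`
    that maximizes correlation with `fixed`."""
    best, best_score = (0, 0), -np.inf
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(moving, dr, axis=0), dc, axis=1)
            score = float((fixed * shifted).sum())  # correlation objective
            if score > best_score:
                best, best_score = (dr, dc), score
    return best

fixed = np.zeros((8, 8))
fixed[3:5, 3:5] = 1.0                                    # a small bright structure
moving = np.roll(np.roll(fixed, 2, axis=0), 1, axis=1)   # same structure, displaced
shift = best_shift(fixed, moving)                        # translation that realigns it
```

For multi-modal pairs (e.g., CT to MRI), the correlation objective is replaced by mutual information, since intensities no longer correspond directly.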
Table 2, derived from our survey, demonstrates the tools that encompass the features outlined above.
Data Collection and Storage
Medical imaging data are usually acquired by submitting a query from identified data sources such as Picture Archiving and Communications Systems (PACS). The query contains various search criteria such as patient demographics (e.g., age, gender), examination type (e.g., X-ray, MRI), body region (e.g., head, abdomen), imaging modality, and date range. Since most studies focus on a specific disease and since the PACS usually does not store diagnostic codes, it is often necessary to first query a diagnosis database or registry and then perform the PACS query.
“Atlas” is a platform that supports the design of observational analyses; it is used for phenotyping and for defining cohorts for target, control, and outcome populations. Another platform for data retrieval is “DIANA,” which is capable of de-identification, cohort definition, and radiation dose monitoring for prospective oncology studies [34].
While nearly every radiology or imaging practice has a PACS, some open-source storage solutions for DICOM images were introduced in our survey. “Orthanc” is a standalone DICOM server that provides an extensible platform for storing, retrieving, and managing medical images; it can be easily integrated into existing PACS infrastructures. “MIDRC” is built on the Gen3 platform and provides intake, storage, a viewer, cohort building, and data downloading [35]. With the increasing size and complexity of medical imaging datasets, cloud computing platforms such as “NVIDIA DGX Cloud,” “AWS,” “Google Cloud,” and “Microsoft Azure” provide scalable infrastructure for data storage, processing, and training. These platforms offer a range of tools and services that facilitate data management, distributed computing, and access to powerful GPUs for accelerated AI training.
Federated Learning
Obtaining ethical and legal approvals, such as Institutional Review Board (IRB) approval or data use agreements with healthcare research institutions, is a prerequisite for starting data collection. With the rapid advances of AI in medical imaging, collaboration in multi-institutional frameworks is a key component in eliminating biases and developing reproducible and standardized models. This goal was previously pursued only by data sharing among institutions and involved significant difficulties, including limited data storage and data privacy challenges; data privacy concerns and legal restrictions limit the accessibility and usability of the required data. Federated learning was introduced as a possible solution in medicine [4]. Federated learning is a privacy-preserving approach in which the data is used locally to train a model and the weights from the local training are sent to a central server; the updates from all sites are combined, and the new weights are sent back to all the sites. The weights from multiple institutions are iteratively updated and shared with the central model until its performance reaches the stopping criteria [36]. Tools such as “NVIDIA FLARE” and “MLflow” provide an environment for researchers to adapt an existing AI model workflow to federated learning.
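The weight-combination step at the central server is, in its simplest form (the FedAvg algorithm), a weighted average of the locally trained parameters, with each site weighted by its number of training samples. A minimal sketch of that aggregation step:

```python
import numpy as np

def federated_average(site_weights, site_sizes):
    """FedAvg aggregation: average model parameters across sites,
    weighted by each site's local training-set size."""
    total = sum(site_sizes)
    stacked = np.stack([w * (n / total)
                        for w, n in zip(site_weights, site_sizes)])
    return stacked.sum(axis=0)

# Three hospitals train locally and send back their updated weights;
# the site with twice the data contributes twice the influence.
weights = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 100, 200]
global_weights = federated_average(weights, sizes)
```

Only these parameter vectors cross institutional boundaries; the images themselves never leave the local site, which is the privacy-preserving property that frameworks like NVIDIA FLARE operationalize at scale.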
Workflow
Data handling, model development, and performance evaluation are considered three requisites of building AI algorithms. Each of these tasks is divided into steps based on the data at hand and the general approach to model development. Establishing a workflow enables developers to gain an overview of the whole network, follow defined steps, and track their model’s performance in each task at hand. Platforms, such as MD.ai and MLflow, provide a workflow with a wide range of applications from data preparation to evaluation.
Security Considerations
Radiology departments and AI developers must adhere to stringent data protection regulations ensuring the security and privacy of patient data. In addition to the data de-identification mentioned above, there are other key considerations for securing patient data: “data encryption,” “availability,” and “integrity” are major concepts in a data security framework. Data must be encrypted both at rest and in transit; this ensures that even if data is intercepted or accessed without authorization, it remains unreadable and secure [37]. Backup strategies and data recovery solutions are essential components of a comprehensive security plan [38]. Cloud service providers should provide robust security measures, compliance with healthcare regulations, and features that support the secure handling of radiology images and associated patient data; examples of clouds following this structure are AWS, Microsoft Azure Health Data Services, and the Google Cloud Healthcare API. In addition to the cloud services, there are open-access tools in the imaging informatics space that can be integrated with non-cloud products to promote information security: “Wireshark” [39], which analyzes network traffic; “Shodan” [40], which discovers internet-connected devices; and “Nmap” [41], which maps the network and identifies possible unauthorized connections.
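Of the concepts above, integrity verification is the simplest to illustrate: recording a cryptographic digest alongside a file (or DICOM object) lets any later reader detect tampering or corruption. The sketch below uses only Python's standard library; encryption itself should of course use a vetted library rather than hand-rolled code.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest recorded when the object is stored."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest on read and compare to the recorded value."""
    return hashlib.sha256(data).hexdigest() == recorded_digest

original = b"...DICOM pixel data..."
stored = digest(original)            # saved alongside the object
ok = verify(original, stored)        # True: data unchanged
tampered = verify(original + b"x", stored)  # False: one byte altered
```

The same pattern underlies checksums in PACS migrations and cloud object stores, where digests are compared after every transfer.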
Discussion
Continuous improvements in AI model development may overcome obstacles of data curation for optimal data preparation. The challenging task of removing burned-in data or defacing during de-identification may eliminate parts of the metadata critical for advanced processing [42]. In a recent work, Khosravi et al. [43] developed a deep learning model to detect and anonymize radiographic markers. AI tools for automatic segmentation have also substantially improved, and many developers are building deep learning models to curate specific body parts or tumors. For example, Wasserthal et al. proposed a deep learning model entitled “TotalSegmentator” for segmenting 117 anatomical structures [44], and Cai et al. introduced a deep learning model for the segmentation of intracranial structures in brain CT scans [45]. Furthermore, deep learning studies focusing on super-resolution and on mask interpolation based on voxel values are under active investigation; these techniques can potentially improve the quality of low-resolution medical images and reduce hallucinations in image reconstruction [46]. Such developments may increase inter-reader agreement, creating a more feasible workspace for annotators in the future. Whatever the purpose, the optimal tool should have an intuitive interface, clear workflows, and comprehensive documentation to facilitate its application. Furthermore, software support for these tools is desirable, as with NVIDIA AI Enterprise [47], which supports MONAI, MONAI Label, and FLARE.
In addition to the aforementioned tools, platforms such as “MIDRC” provide researchers with data commons and valuable machine learning resources, including a metrology tree to help AI investigators determine appropriate performance metrics and a bias awareness tool to help them identify potential sources of bias and understand methods of mitigation [48]. Online competitions and challenges such as the Brain Tumor Segmentation (BraTS) Challenge [49] and the RSNA Kaggle Competition [50] have brought developers together and provided opportunities to showcase their expertise, collaborate with peers, and push the boundaries of medical image analysis and artificial intelligence. These platforms drive innovation, refining algorithms and advancing medical imaging for AI applications.
Training and support resources provided by tool developers or the research group community can also influence the process of choosing the appropriate tool. The particular preferences and skill levels of the users are important factors to be considered for optimizing their adoption and productivity.
In our study, we excluded commercial and closed-source tools, even those covering a wide range of data curation tasks. For instance, MD.ai [51] was used by several survey participants as an image viewer, de-identifier, format converter, and segmentation tool, even though it is not open source. Additionally, widely known imaging formats such as “Analyze” and “MINC” [52], as well as co-registration tools such as the Advanced Normalization Tools (ANTs) [53], were not mentioned by participants in the current study.
Conclusion
This study provides a comprehensive overview of available open-source tools, drawing from the insights and experiences of the SIIM community. By curating a list of practical and widely used tools, we aimed to streamline the process of tool selection for researchers, enabling them to make informed decisions based on community expertise.
It is important to recognize the dynamic nature of the field of AI. As advancements continue to be made and new tools emerge, our work can serve as one of the starting points for researchers seeking to navigate the ever-expanding landscape of open-source tools. By incorporating feedback from the community and staying abreast of the latest developments in the field, we aim to continually improve the utility and relevance of our resources for researchers.
Data Availability
The dataset collected is available upon reasonable request to the corresponding author.
References
Center for Devices, Radiological Health Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. In: U.S. Food and Drug Administration. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices
Zhang K, Khosravi B, Vahdati S, Erickson BJ (2024) FDA Review of Radiologic AI Algorithms: Process and Challenges. Radiology 310:e230242
Leipzig J, Nüst D, Hoyt CT, Ram K, Greenberg J (2021) The role of metadata in reproducible computational research. Patterns (N Y) 2:100322
Willemink MJ, Koszek WA, Hardell C, Wu J, Fleischmann D, Harvey H, Folio LR, Summers RM, Rubin DL, Lungren MP (2020) Preparing Medical Imaging Data for Machine Learning. Radiology 295:4–15
Prevedello LM, Halabi SS, Shih G, Wu CC, Kohli MD, Chokshi FH, Erickson BJ, Kalpathy-Cramer J, Andriole KP, Flanders AE (2019) Challenges Related to Artificial Intelligence Research in Medical Imaging and the Importance of Image Analysis Competitions. Radiology: Artificial Intelligence. https://doi.org/10.1148/ryai.2019180031
Parmar C, Barry JD, Hosny A, Quackenbush J, Aerts HJWL (2018) Data analysis strategies in medical imaging. Clin Cancer Res 24:3492–3499
Diaz O, Kushibar K, Osuala R, Linardos A, Garrucho L, Igual L, Radeva P, Prior F, Gkontra P, Lekadir K (2021) Data preparation for artificial intelligence in medical imaging: A comprehensive guide to open-access platforms and tools. Phys Med 83:25–37
Demirer M, Candemir S, Bigelow MT, et al (2019) A User Interface for Optimizing Radiologist Engagement in Image Data Curation for Artificial Intelligence. Radiol Artif Intell 1:e180095
Office for Civil Rights (OCR) (2012) Guidance regarding methods for DE-identification of protected health information in accordance with the health insurance portability and accountability act (HIPAA) Privacy Rule. In: HHS.gov. https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html.
Erickson BJ, Fajnwaks P, Langer SG, Perry J (2014) Multisite Image Data Collection and Management Using the RSNA Image Sharing Network. Transl Oncol 7:36–39
Aryanto KYE, Oudkerk M, van Ooijen PMA (2015) Free DICOM de-identification tools in clinical research: functioning and safety of patient privacy. Eur Radiol 25:3685–3695
Shahid A, Bazargani MH, Banahan P, Mac Namee B, Kechadi T, Treacy C, Regan G, MacMahon P (2022) A Two-Stage De-Identification Process for Privacy-Preserving Medical Image Analysis. Healthcare (Basel). https://doi.org/10.3390/healthcare10050755
GitHub - mih/mridefacer: Helper to aid de-identification of MRI images (3D or 4D). In: GitHub. https://github.com/mih/mridefacer.
Wiggins RH 3rd, Davidson HC, Harnsberger HR, Lauman JR, Goede PA (2001) Image file formats: past, present, and future. Radiographics 21:789–798
Sriramakrishnan P, Kalaiselvi T, Padmapriya ST, Shanthi N, Ramkumar S, Kalaichelvi N (2019) An medical image file formats and digital image conversion. Int J Eng Adv Technol 9:74–78
Oladiran O, Gichoya J, Purkayastha S (2017) Conversion of JPG Image into DICOM Image Format with One Click Tagging. In: Digital Human Modeling. Applications in Health, Safety, Ergonomics, and Risk Management: Health and Safety. Springer International Publishing, pp 61–70
Shini SG, Thomas T, Chithraranjan K (2012) Cloud Based Medical Image Exchange-Security Challenges. Procedia Engineering 38:3454–3461
Pareek A, Lungren MP, Halabi SS (2022) The requirements for performing artificial-intelligence-related research and model development. Pediatr Radiol 52:2094–2100
Alshareef HN (2023) Current development, challenges, and future trends in cloud computing: A survey. Int J Adv Comput Sci Appl. https://doi.org/10.14569/ijacsa.2023.0140337
Le KH, Tran TV, Pham HH, Nguyen HT, Le TT, Nguyen HQ (2023) Learning From Multiple Expert Annotators for Enhancing Anomaly Detection in Medical Image Analysis. IEEE Access 11:14105–14114
Aiello M, Esposito G, Pagliari G, Borrelli P, Brancato V, Salvatore M (2021) How does DICOM support big data management? Investigating its use in medical imaging community. Insights Imaging 12:164
Eley KA, Delso G (2020) Automated Segmentation of the Craniofacial Skeleton With “Black Bone” Magnetic Resonance Imaging. J Craniofac Surg 31:1015
Bianco S, Ciocca G, Napoletano P, Schettini R (2015) An interactive tool for manual, semi-automatic and automatic video annotation. Comput Vis Image Underst 131:88–99
Sakinis T, Milletari F, Roth H, Korfiatis P, Kostandy P, Philbrick K, Akkus Z, Xu Z, Xu D, Erickson BJ (2019) Interactive segmentation of medical images through fully convolutional neural networks. arXiv
Website. MONAI Consortium. (2023). MONAI: Medical Open Network for AI (1.2.0). Zenodo. https://doi.org/10.5281/zenodo.8018287.
Mazurowski MA, Dong H, Gu H, Yang J, Konz N, Zhang Y (2023) Segment Anything Model for Medical Image Analysis: an Experimental Study.
Singh S, Ahuja U, Kumar M, Kumar K, Sachdeva M (2021) Face mask detection using YOLOv3 and faster R-CNN models: COVID-19 environment. Multimed Tools Appl 80:19753–19768
Nath V, Yang D, Landman BA, Xu D, Roth HR (2021) Diminishing Uncertainty Within the Training Pool: Active Learning for Medical Image Segmentation. IEEE Trans Med Imaging 40:2534–2547
Yang L, Zhang Y, Chen J, Zhang S, Chen DZ (2017) Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation. In: Medical Image Computing and Computer Assisted Intervention − MICCAI 2017. Springer International Publishing, pp 399–407
Dean CJ, Sykes JR, Cooper RA, Hatfield P, Carey B, Swift S, Bacon SE, Thwaites D, Sebag-Montefiore D, Morgan AM (2012) An evaluation of four CT–MRI co-registration techniques for radiotherapy treatment planning of prone rectal cancer patients. BJR Suppl 85:61–68
Huhdanpaa H, Hwang DH, Gasparian GG, et al (2014) Image coregistration: quantitative processing framework for the assessment of brain lesions. J Digit Imaging 27:369–379
Wildeboer RR, van Sloun RJG, Postema AW, Mannaerts CK, Gayet M, Beerlage HP, Wijkstra H, Mischi M (2018) Accurate validation of ultrasound imaging of prostate cancer: a review of challenges in registration of imaging and histopathology. J Ultrasound 21:197–207
Chen DQ, Dell’Acqua F, Rokem A, Garyfallidis E, Hayes DJ, Zhong J, Hodaie M (2019) Diffusion Weighted Image Co-registration: Investigation of Best Practices. bioRxiv 864108
Yi T, Pan I, Collins S, et al (2021) DICOM Image ANalysis and Archive (DIANA): an Open-Source System for Clinical AI Applications. J Digit Imaging 34:1405–1413
The Medical Imaging and Data Resource Center Commons. https://data.midrc.org/.
Darzidehkalani E, Ghasemi-Rad M, van Ooijen PMA (2022) Federated Learning in Medical Imaging: Part I: Toward Multicentral Health Care Ecosystems. J Am Coll Radiol 19:969–974
Eichelberg M, Kleber K, Kämmerer M (2020) Cybersecurity in PACS and Medical Imaging: an Overview. J Digit Imaging 33:1527–1542
Shah C, Nachand D, Wald C, Chen P-H (2023) Keeping Patient Data Secure in the Age of Radiology Artificial Intelligence: Cybersecurity Considerations and Future Directions. J Am Coll Radiol 20:828–835
Wireshark. https://www.wireshark.org/.
Shodan. https://www.shodan.io/.
Nmap: the Network Mapper - Free Security Scanner. https://nmap.org/.
Kohli MD, Summers RM, Geis JR (2017) Medical Image Data and Datasets in the Era of Machine Learning—Whitepaper from the 2016 C-MIMI Meeting Dataset Session. J Digit Imaging 30:392–399
Khosravi B, Mickley JP, Rouzrokh P, Taunton MJ, Noelle Larson A, Erickson BJ, Wyles CC (2023) Anonymizing Radiographs Using an Object Detection Deep Learning Algorithm. Radiology: Artificial Intelligence. https://doi.org/10.1148/ryai.230085
Wasserthal J, Breit H-C, Meyer MT, et al (2023) TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images. Radiology: Artificial Intelligence. https://doi.org/10.1148/ryai.230024
Cai JC, Akkus Z, Philbrick KA, et al (2020) Fully Automated Segmentation of Head CT Neuroanatomy Using Deep Learning. Radiology: Artificial Intelligence. https://doi.org/10.1148/ryai.2020190183
Rousseau F (2010) A non-local approach for image super-resolution using intermodality priors. Med Image Anal 14:594–605
NVIDIA AI: Advanced AI Platform for Enterprise. In: NVIDIA. https://www.nvidia.com/en-us/ai-data-science/.
Drukker K, Chen W, Gichoya J, et al (2023) Toward fairness in artificial intelligence for medical image analysis: identification and mitigation of potential biases in the roadmap from data collection to model deployment. J Med Imaging (Bellingham) 10:061104
MICCAI BRATS - The Multimodal Brain Tumor Segmentation Challenge. http://braintumorsegmentation.org/.
RSNA 2022 Cervical Spine Fracture Detection. https://kaggle.com/competitions/rsna-2022-cervical-spine-fracture-detection.
MD.ai. https://md.ai/.
Larobina M, Murino L (2014) Medical image file formats. J Digit Imaging 27:200–206
ANTs by stnava. https://stnava.github.io/ANTs/.
Ethics declarations
Ethical Approval
This work has been established based on a survey conducted by the Society of Imaging Informatics (SIIM) Research Committee and collaboration with the Mayo Clinic Artificial Intelligence Laboratory (MayoAILab).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Vahdati, S., Khosravi, B., Mahmoudi, E. et al. A Guideline for Open-Source Tools to Make Medical Imaging Data Ready for Artificial Intelligence Applications: A Society of Imaging Informatics in Medicine (SIIM) Survey. J Imaging Inform Med (2024). https://doi.org/10.1007/s10278-024-01083-0