Introduction

In 2008, Apple and Google launched application (app) stores where users can download software onto mobile devices. Apple's store launched with 500 apps; Google's launched with 50. By 2023, Apple hosted nearly 2 million apps, Google hosted more than 3 million, and more are added daily (Apple, 2023; Google, 2023). With >350,000 mobile health (mHealth) apps available, many people use mHealth apps for added support in accomplishing their health and well-being goals (Ceci, 2022a, b; Levine et al., 2020). mHealth apps are making their way into clinical practice as patients ask providers about apps and providers recommend them to patients. Identifying which mHealth apps healthcare providers should recommend and patients should use can be daunting given the volume of apps and the lack of oversight and evaluation. By October 2020, >70 frameworks and scales for mHealth evaluation existed in the literature (Lagan et al., 2021a), yet developers rarely evaluate mHealth apps prior to launch.

This paper reviews the importance of evaluating apps and describes the cautions and benefits of using mHealth. App evaluation can be completed by an individual provider, at the organizational level, or by professional organizations. We included frameworks and scales that are comprehensive, widely used, accessible to providers, validated or peer reviewed, and believed to be most beneficial in helping providers make recommendations in healthcare settings. Further, we summarize what should be integrated into an mHealth evaluation framework or scale, introduce free searchable databases that can facilitate app selection, describe a published evaluation model, and present guidelines and a call to action for mHealth integration. Some frameworks, scales, and databases herein are specific to mental health; however, they are applicable across all healthcare disciplines. The goal of this document is to facilitate more streamlined implementation of mHealth into healthcare by helping healthcare providers and organizations competently and confidently evaluate mHealth options prior to recommending them to the people they serve and integrating them into treatment workflows.

What Are Digital Health and mHealth?

Digital health encompasses all healthcare technologies (e.g., artificial intelligence applications; eHealth; e-tools to monitor, change, and evaluate personal health domains; electronic medical records; mHealth; telehealth and telemedicine; wearable devices) and exists across several domains (Table 1). Digital health has “the vast potential to improve our ability to accurately diagnose and treat disease and to enhance health care delivery to all people” (U.S. Food & Drug Administration, 2020). mHealth is a subset of digital health that uses smartphone and mobile device apps to accomplish these goals (Chan, 2021). Coinciding with increased functionality and use of smartphones, tablets, and other personal electronic devices, mHealth utilization has increased exponentially in the past two decades.

Table 1 Digital health domains

mHealth can facilitate positive change in clinical practice and consumer health. For instance, mHealth can help patients monitor symptoms and medication adherence between appointments and easily share real-time data directly with providers (Rowland et al., 2020). Collecting and reviewing this information outside of appointments may allow for more accurate data collection compared to retroactive recall, allow providers to more efficiently use the brief face-to-face time they have with patients in overburdened healthcare systems, and ultimately improve patient and provider relationships and health outcomes (American Psychiatric Association, 2023a). mHealth provides opportunities for healthcare teams and patients to record data, set and observe progress toward goals, and increase access to interventions that may improve health outcomes. However, comprehensive evaluation is needed to address concerns about widespread implementation of mHealth apps in healthcare settings across the world.

Why Is Evaluating mHealth Important?

It is important to evaluate mHealth apps for effectiveness, quality, and safety prior to integrating them into clinic workflows or recommending them to patients. Selecting an appropriate app that will meet patient needs is difficult due to the overwhelming number of mHealth apps available. Additionally, existing apps get updated, some disappear, and new apps become available regularly (American Psychiatric Association, 2023a). Providers must also be confident that mHealth apps are reliable, user friendly, and appropriate for their patients’ needs. Healthcare providers often hesitate to recommend mHealth apps because they lack adequate knowledge of available apps (Wangler & Jansky, 2021). In a survey of 2,138 German primary care physicians, only 18% of respondents recommended apps frequently or occasionally; over half never recommended apps (Wangler & Jansky, 2021). Teaching providers how to evaluate mHealth could increase the use of appropriate mHealth apps in clinical settings and improve patient health outcomes.

While many mHealth options are available and may promote positive change for patients and clinics, little is known about their efficacy. Apps are often not evaluated beyond the app store’s five-star rating system, which does not correlate with an app’s effectiveness at doing what it purports to do (Levine et al., 2020). mHealth clinical trials are rare, and deployment in clinical settings is often unstudied. Thus, it is important to assess whether an app has an evidence base.

The notable interest in mHealth and the growing market have enabled many companies, often led by investors and tech specialists rather than health or medical experts, to focus on building and selling mHealth without adequately testing for feasibility and effectiveness or sufficiently considering data security. Apps not developed by content experts in the healthcare field where they are to be implemented may deliver inaccurate information that could lead to adverse outcomes. In addition, inclusion in an app store is based on technological, not clinical or efficacy, requirements. Consumers may conflate these requirements and believe an app does what it purports to do. Therefore, patient health and safety may be at risk if an app has not been properly evaluated (Roberts et al., 2021). Apps should be assessed on whether they were developed via scientific methods and by a company with a health focus, not a company focused on breaking into a lucrative, rapidly expanding industry.

In addition, unless designated as a medical device, mHealth is not regulated by the US Food and Drug Administration (FDA) or other international regulatory organizations (e.g., European Medicines Agency). While the FDA recognized this gap and deployed a pilot program for regulating mHealth between 2017 and 2021, the Digital Health Software Precertification Pilot Program was neither finalized nor comprehensive (U.S. Food and Drug Administration, 2022; Lagan et al., 2021a). Further complicating the landscape is the breadth of services commonly referred to as an “app,” including, but not limited to, self-help tools, clinical augmentation, self-guided treatment, tools connecting patients with providers, and other health tools. This lack of oversight leaves end users to determine whether an mHealth product is safe and effective. End user ratings may correlate with popularity but not with evaluation framework ratings. Some critics fear, and evidence supports, that mHealth may cause harm to consumers, potentially violating the healthcare principle of beneficence (Levine et al., 2020). mHealth products may also provide incorrect or misleading information to consumers, be ineffective, lack privacy and security measures, or sell users’ personal data (American Psychiatric Association, 2023a). In fact, a variety of well-known mHealth apps have faced legal consequences for sharing sensitive information with external marketing and analytics firms and for being ineffective and misleading in what they claimed to do (Wicklund, 2017). The Federal Trade Commission has also taken action against deception and false advertising by mHealth apps (Wagner, 2020). Therefore, mHealth apps used in healthcare settings should be assessed for privacy and security since protecting patients’ personal health information is of the utmost importance.

What Should One Look for in an App Evaluation Framework or Scale?

The first step in mHealth app evaluation is identifying an appropriate framework or scale. Frameworks and scales are typically theoretically grounded evaluation tools. Scales differ from frameworks in that they are composed of quantitative questions that yield at least a composite score, whereas a framework serves as an evaluation road map, often with yes/no questions, that provides guidance on app selection. The most recent and thorough review of mHealth frameworks spanned the literature through October 2020 and covered >70 frameworks (Lagan et al., 2021a), yet the number of frameworks and apps continues to grow. Since the mHealth industry is constantly evolving, it is important to know how to evaluate an app. The following questions will assist in choosing a framework or scale to determine whether an app is the right fit.

What Metrics Does the Framework or Scale Assess?

mHealth evaluation frameworks and scales evaluate a variety of criteria, most often usability and accessibility (e.g., learnability, efficiency, errors, and satisfaction; Aljaber et al., 2015). Additional criteria include engagement, privacy and security, content quality, effectiveness, and evidence base/research.

Does the Framework or Scale Assess the Technology of Interest?

To assess mHealth, it is important that the selected framework or scale has been developed and tested for use with mHealth specifically. Using apps in healthcare settings creates unique implementation considerations, including clinical setting nuances and privacy and security concerns, which the selected framework or scale must address.

Is the Framework or Scale Evidence-Based and/or Tested?

A peer-reviewed evidence base increases the credibility of frameworks and scales; a framework or scale should be well-researched and statistically validated.

mHealth Evaluation Frameworks and Scales

How Were Evaluation Frameworks and Scales Chosen for This Review?

Evaluation frameworks and scales needed to meet an established list of criteria for inclusion. Frameworks and scales had to be accessible online, available in English due to the authors’ language and translation limitations, and focused on mHealth rather than digital health broadly. Frameworks and scales needed to have an evidence base, have a data-driven development process, and be usable across all healthcare disciplines. Minimally, frameworks and scales were published in peer-reviewed journals, and scales were validated for internal consistency and interrater reliability. Finally, as these guidelines focus on recommending mHealth to patients and integrating mHealth into healthcare clinics, assessment of privacy and security was strongly preferred. If a framework or scale was limited to a specific population or outcome, or did not assess privacy and security, but was otherwise well designed and researched, it was included with the suggestion that it be used in conjunction with an additional tool.

Several frameworks and scales available online are not peer reviewed in the scientific literature and/or lack validation data; these were excluded for those reasons. We believe the two selected frameworks and three scales are comparable and comprehensive and meet the inclusion criteria.

Frameworks

American Psychiatric Association App Evaluation Model

The American Psychiatric Association App Evaluation Model (APA Framework) was developed to help psychiatric providers, other healthcare providers, and their patients identify apps to support mental health–related treatment goals (Torous et al., 2018). The APA recognized that app selection is an individualized decision that should consider many patient-specific factors, the clinical context, and the app version. Patients regularly use apps and ask for provider input, yet the APA acknowledged that healthcare providers are not trained to identify appropriate apps or make informed recommendations (American Psychiatric Association, 2023b). Additionally, previous frameworks did not reliably consider app safety and usefulness (Torous et al., 2018). The APA Framework launched in 2018 for mental health apps and was expanded for use across healthcare disciplines by a multi-stakeholder expert panel in 2021. The expanded framework applies to all mHealth apps, places greater focus on accessibility and clinical research, and employs more accurate terminology (Lagan et al., 2021b). This panel also developed a brief eight-question screener that can be more easily applied in busy clinical settings (Lagan et al., 2021b).

The APA Framework presents an “adaptable scaffold for informed decision making” when selecting an mHealth app (Torous et al., 2018). This structure uses hierarchical stages (Fig. 1, horizontal pyramid) which allow the evaluator to stop evaluating an app at any stage if concerns are noted. The five levels, starting with the most foundational on the left, are accessibility, privacy and security, clinical foundation, engagement style, and therapeutic goal (Fig. 1; American Psychiatric Association, 2023b). This framework ensures healthcare providers and their patients have adequate information to make an informed decision based on their unique circumstances (American Psychiatric Association, 2023b).

Fig. 1
figure 1

American Psychiatric Association mental health app evaluation framework, adapted from Lagan et al. (2020). This model encourages the user to move through the evaluation process from left to right, beginning with level 1 and proceeding through levels 2 to 5. Each level must be assessed and passed before moving on to the next. Level 1 assesses accessibility, level 2 assesses privacy and security, level 3 assesses clinical foundation, level 4 assesses engagement style, and level 5 assesses therapeutic goal
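
To make the framework's stop-on-fail progression concrete, here is a minimal sketch in Python. The five level names come from the framework itself; the pass/fail inputs and the function name are hypothetical stand-ins for an evaluator's judgment at each level, not part of the published model.

```python
# Hypothetical sketch of the APA Framework's hierarchical evaluation order:
# walk the levels left to right and stop at the first level with concerns.

APA_LEVELS = [
    "accessibility",
    "privacy and security",
    "clinical foundation",
    "engagement style",
    "therapeutic goal",
]

def evaluate_app(level_passed: dict) -> str:
    """Return where evaluation stopped, or success if all levels pass."""
    for level in APA_LEVELS:
        if not level_passed.get(level, False):
            return f"stopped at '{level}': concerns noted, do not proceed"
    return "passed all five levels: candidate for recommendation"

# Example: an accessible app whose privacy practices raise concerns
# never reaches the clinical-foundation level.
print(evaluate_app({"accessibility": True, "privacy and security": False}))
```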

The APA Framework is straightforward, comprehensive, flexible, and relevant to diverse contexts. Importantly for the healthcare sector, it prioritizes accessibility, privacy, and security. This framework encompasses all domains identified as key standards for app evaluation by mHealth leaders in industry and academia, and it can be used across all mHealth settings. This framework evaluates mHealth effectiveness by assessing an app’s evidence base, whether it does what it claims to do, and demonstrated effectiveness. Furthermore, the screener increases accessibility for busy clinicians and patients.

Summary

The APA Framework is a comprehensive, adaptable, and hierarchical model, developed by experts and people with lived experience, that can be used by healthcare providers and patients to make informed decisions on which mHealth apps can be tailored to individual health and behavioral health goals.

mHealth for Older Users

It is common for mHealth evaluation tools and frameworks to assess usability; however, they often fail to address important considerations for the aging population, who “interact differently with information technology compared to younger people” (Wildenbos et al., 2018). In response, Wildenbos et al. (2018) created the mHealth for Older Users (MOLD-US) framework to assess usability for the older adult population (Fig. 2). Based on the scientific literature, the framework describes four barriers that influence mHealth usability: (1) cognition, (2) motivation, (3) physical ability, and (4) perception (Fig. 2, top row). From there, medical conditions that contribute to the barriers are identified (Fig. 2, second row), followed by elements of the mHealth user experience possibly affected by those barriers and conditions (Fig. 2, bottom row). For example, a cognitive barrier such as stroke-related decline in working memory may cause issues with satisfaction, memorability, learnability, efficiency, and errors in the user experience (Fig. 2, left column). Population-specific barriers should be considered throughout mHealth development, implementation, and evaluation.

Fig. 2
figure 2

mHealth for Older Users (MOLD-US) evaluation framework, adapted from Wildenbos et al. (2018). MOLD-US guides users through an evaluation process to identify how usability barriers for older users can translate into dimensions of app usability. Four barriers (cognition, motivation, perception, and physical abilities) can be seen in a variety of diseases (e.g., diabetes, concentration issues, cataracts, and rheumatoid arthritis). Users may determine which barriers are likely in their audience and use the framework to identify the dimensions of app usability (satisfaction, memorability, learnability, efficiency, and errors) that they will need to address to increase usability for older adults

MOLD-US is unique in identifying older adult-specific barriers to mHealth while providing examples of health conditions that may cause specific usability issues. It provides a roadmap to comprehensively evaluate mHealth for population-specific usability, bridging a gap in the implementation and accessibility of mHealth for aging populations (Wildenbos et al., 2018). MOLD-US, however, lacks a validated scale that can be used to assess unique usability barriers in mHealth, making adoption of MOLD-US in healthcare settings more difficult. It is only recommended for evaluating apps for older adults and apps for medical conditions commonly diagnosed in older adults, but MOLD-US could be used to inform other innovative population-specific usability frameworks. In addition, MOLD-US should be utilized in conjunction with a more comprehensive framework or scale since it does not assess other important aspects of mHealth, such as data privacy and security or effectiveness.

Summary

MOLD-US identifies older adult-specific mHealth barriers. It describes how barriers are related to common medical conditions in this population and how usability may be affected by barriers and medical conditions. While bridging an important gap in mHealth development and implementation, it does not assess effectiveness, content quality, or privacy and security. MOLD-US should be used in conjunction with a more comprehensive framework or scale.

Scales

Adapted Mobile App Rating Scale

The Mobile App Rating Scale (MARS), the first app evaluation tool, was developed in 2015 to evaluate user experience. It was updated to the Adapted MARS (A-MARS) to expand its use for e-tool (e.g., websites, online courses) evaluation, in addition to apps (Roberts et al., 2021). A-MARS is comprehensive, widely used, and available in multiple languages, but does not evaluate privacy or security, crucial considerations for healthcare settings.

The 28-item scale measures app and e-tool quality across six domains: engagement, functionality, aesthetics, information, subjective quality, and health-related quality. Items are scored on a 5-point Likert scale; higher scores are better. Each domain receives an average score, and each app is assigned an overall score (see Roberts et al., 2021, for the scale). Preliminary internal testing of the A-MARS showed high internal consistency (α = 0.94) and interrater reliability. In addition, A-MARS maintains this consistency when the first four domains are examined independently of the two quality-related subscales (α = 0.91; Roberts et al., 2021).
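
As an illustration of this scoring scheme, the sketch below (in Python) averages Likert ratings within each domain and then across domains. The item-to-domain assignment and the ratings are hypothetical; the actual 28 items, their domain mapping, and the official overall-score rule are specified in Roberts et al. (2021).

```python
# Hypothetical A-MARS-style scoring: 28 items rated 1-5 across six domains.
from statistics import mean

ratings = {  # made-up ratings of one app by one rater
    "engagement": [4, 5, 3, 4, 4],
    "functionality": [5, 5, 4, 4],
    "aesthetics": [4, 3, 4],
    "information": [3, 4, 4, 5],
    "subjective_quality": [4, 4, 3, 4],
    "health_related_quality": [3, 3, 4, 4, 3, 4, 4, 3],
}

domain_scores = {domain: mean(items) for domain, items in ratings.items()}
overall = mean(domain_scores.values())  # one plausible aggregation

for domain, score in domain_scores.items():
    print(f"{domain}: {score:.2f}")
print(f"overall: {overall:.2f}")
```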

A-MARS was created specifically for use with mHealth and can evaluate both apps and e-tools, whereas most scales can evaluate apps only (Roberts et al., 2021). Implementing A-MARS requires adequate training and is time-consuming. The creators recommend piloting the scale by evaluating 3–5 apps and having multiple people rate each app to ensure interrater reliability. Therefore, providers may not have the time or skillset to effectively utilize A-MARS. The creators encourage organizations to consider hiring a digital navigator to identify and evaluate mHealth for their patient population (Roberts et al., 2021). The A-MARS contains one effectiveness-focused question that evaluates an app’s evidence base and whether it has been trialed or tested; it also assesses whether the language is appropriate for end users. A-MARS does not, however, assess data privacy and security. This is a significant limitation, so A-MARS should be used in conjunction with a more comprehensive framework or scale. Alternatively, security- and privacy-specific items from other scales could be integrated into A-MARS to assess these domains and decrease risk to patient data.

Summary

A-MARS is an effective and reliable scale that can be used to evaluate apps and e-tools. It can also be used by adequately trained healthcare providers to identify appropriate mHealth apps and e-tools for their practices, with the understanding that this tool does not assess data privacy and security.

App Behavior Change Scale

Evaluating mHealth for its design, usability, data privacy, and security is crucial, but if an app claims to change behavior, it is also important to assess its potential to do so. The App Behavior Change Scale (ABACUS) is a validated 21-item scale that assesses an app’s behavior-change potential across four domains: (1) knowledge and information, (2) goals and planning, (3) feedback and monitoring, and (4) actions (Alslaity et al., 2022; McKay et al., 2019). See McKay et al. (2019) for the scale. Grounded in health behavior change interventions, the scale underwent rigorous testing prior to final validation. ABACUS was validated by rating 20 apps; results indicated high internal consistency (α = 0.93) and interrater reliability (McKay et al., 2019).
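
Because ABACUS items record the presence or absence of behavior-change features, scoring reduces to a tally. The sketch below (in Python) illustrates this with hypothetical feature names and domain assignments; the actual 21 items are listed in McKay et al. (2019).

```python
# Hypothetical ABACUS-style tally: count behavior-change features per domain.
features = {  # made-up audit of one app (the real scale has 21 items)
    "knowledge_and_information": {"credible_source": True, "tailored_info": False},
    "goals_and_planning": {"goal_setting": True, "action_planning": True},
    "feedback_and_monitoring": {"self_monitoring": True, "progress_feedback": False},
    "actions": {"reminders": True, "rewards": False},
}

for domain, items in features.items():
    print(f"{domain}: {sum(items.values())}/{len(items)} features present")

total = sum(sum(items.values()) for items in features.values())
print(f"total behavior-change features: {total}")
```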

ABACUS is also widely used in the literature. Its assessment of behavior change is an innovation missing from most mHealth frameworks and scales. While measured potential for behavior change may not translate to actual behavior change, it may act as a proxy for effectiveness when deciding whether to recommend an app to a patient. Because ABACUS focuses specifically on behavior-change potential, it does not measure other important aspects of mHealth, such as usability, data privacy and security, content quality, and engagement. Therefore, ABACUS should be used with another, more comprehensive framework or scale.

Summary

ABACUS is a validated and widely used scale to measure mHealth apps’ potential behavior change. It may serve as a proxy to measure potential effectiveness of mHealth apps and should be used in conjunction with another scale or framework that assesses other critical evaluation criteria, including usability, data privacy and security, content quality, evidence base and research, and engagement.

THESIS

The THESIS scale aims to bridge gaps in other evaluation resources (e.g., concerns of access, privacy, security, and interoperability) and to specifically evaluate mHealth apps for chronic disease. The creators recognized that most apps are developed for “relatively healthy patients and few are developed specifically for high-cost, high-need patients, or patients with chronic disease” (Levine et al., 2020). It is well known that longitudinal care benefits this population, yet few apps are developed and intended for long-term use. A panel of experts and patient representatives convened in 2017 to rate and review criteria identified by reviewing other evaluation tools. Criteria other tools lacked (e.g., bandwidth and device memory requirements) were added with the population of interest in mind. These factors significantly affect whether a patient can download and use an mHealth app, a particularly important consideration since people with chronic conditions disproportionately experience poverty.

THESIS is an acronym for the six domains evaluated by the scale: “transparency, health content, excellent technical content, security/privacy, usability, and subjective” (Levine et al., 2020; Fig. 3). A score of 1–5 is provided for the app overall and within each domain (see Levine et al., 2020, for the scale). Of note, THESIS contains two effectiveness-focused questions that ask whether the app measures what it claims to measure and interprets what it claims to interpret. Furthermore, it asks how many languages an app is available in and whether users with low literacy/numeracy can use the app with ease.

Fig. 3
figure 3

Domains evaluated by the THESIS rating tool. The THESIS rating tool assesses six domains for mHealth evaluation; all are weighted equally: transparency, health content, excellent technical content, security/privacy, issues of usability, and subjective rating

THESIS creators acknowledge that their scale does not address every aspect of mHealth apps that may be of interest; instead, their goal was to create a scale that is quick to use. Raters with a college-level education or tech background can evaluate an mHealth app with THESIS in ~12 min. Interrater reliability is moderate (κ = 0.3–0.6) and internal consistency is high (α = 0.85). However, THESIS could benefit from further validation using a larger cohort of raters from varying backgrounds and areas of expertise (Levine et al., 2020).
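
For readers unfamiliar with the two reliability statistics reported throughout this section, the sketch below (in Python, with made-up data) computes Cronbach's alpha for internal consistency and Cohen's kappa for interrater agreement. These are the standard textbook formulas, not code from any of the cited studies.

```python
# Cronbach's alpha: internal consistency of a multi-item scale.
# `items` is a list of per-item score vectors, one entry per rated app.
def cronbach_alpha(items):
    k, n = len(items), len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n)]
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))

# Cohen's kappa: chance-corrected agreement between two raters.
def cohen_kappa(r1, r2):
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    p_chance = sum((r1.count(c) / n) * (r2.count(c) / n) for c in set(r1) | set(r2))
    return (p_observed - p_chance) / (1 - p_chance)

# Made-up categorical ratings of six apps by two raters: kappa ~ 0.52,
# within the "moderate" 0.3-0.6 range reported for THESIS.
print(round(cohen_kappa([1, 2, 2, 3, 1, 2], [1, 2, 3, 3, 1, 1]), 2))
```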

Summary

THESIS is a scale that evaluates mHealth apps for chronic health conditions, including mental health conditions. It is one of the only frameworks or scales that considers app size and the related impacts on access; however, it may be less accessible to raters without a college degree or tech background.

mHealth Evaluation Databases

To help ensure selected apps will meet patient and clinic needs, providers should independently evaluate apps prior to integrating them into a clinic or recommending them to patients. Two free searchable databases evaluate mHealth apps based on frameworks or scales reviewed herein. Their ratings and reviews may assist in identifying the best apps for specific circumstances. Notably, at the time of publication, available searchable databases focus primarily on the evaluation of apps that target mental health.

One Mind PsyberGuide

One Mind PsyberGuide is a searchable database of >230 mental health apps operated through a non-profit collaboration between Northwestern University and the University of California, Irvine (One Mind PsyberGuide, 2023a, b). Created in 2013, the database aims to improve access to high-quality mental health apps that can improve mental wellness, without bias or endorsement (One Mind PsyberGuide, 2023a). A team of mental health and technology experts developed and maintains the database by assessing app credibility, user experience, and transparency. Credibility includes items focused on proposed goals, evidence-based content, research base and independence, software updates, and the development team and process. In addition to subjective quality and perceived impact scores, the original MARS is used to evaluate user experience, including engagement, functionality, aesthetics, and information. Effectiveness is measured in the credibility and user experience domains (MARS Section D) via questions about evidence-based content and existing evidence/research studies. Finally, transparency is scored as acceptable, questionable, or unacceptable based on strict guidelines. To be rated as having acceptable transparency, an app must provide accessible information and must conform to standard policies regarding data collection, storage, and exchange (One Mind PsyberGuide, 2023a).

One Mind PsyberGuide is user friendly; allows users to filter results by mental health concern, platform, audience, and cost; and provides clear professional reviews for some apps. The database prioritizes reviews of popular apps based on the number of reviews in the Apple and Google Play app stores, but requests for an app review can be submitted by contacting PsyberGuide. Many listed apps, however, have incomplete evaluations (e.g., a credibility rating without reviews), and One Mind PsyberGuide’s policy for updating ratings and re-evaluating apps as they evolve is unclear (e.g., some ratings are up to 7 years old). Furthermore, the database only evaluates mental health apps (One Mind PsyberGuide, 2023a, c).

mHealth Index and Navigation Database

The mHealth Index and Navigation Database (MIND) is a searchable database of >600 health apps based on the APA Framework. While most apps are mental health–focused, other domains are included (e.g., fitness, food diary, and physical health tracking apps). In addition to the five APA Framework levels, MIND reviews how data is input into the app and how the app outputs information (Fig. 4, inputs and outputs). MIND was developed by the APA and the Digital Psychiatry Lab at Beth Israel Deaconess Medical Center. It uses 105 objective yes/no questions to make the APA Framework “functionable and actionable for public use” (Division of Digital Psychiatry, 2023). See Division of Digital Psychiatry (2023) for the questions. Subjective questions, such as whether the app is easy to use, and objective questions that are not easily answered by available app data are not included (Lagan et al., 2021a; Fig. 4, gray). Effectiveness is evaluated in domain four (evidence base and clinical foundation) via questions about whether the app does what it claims to do and what feasibility/usability or evidence/efficacy studies exist, as well as their impact factors. MIND does not provide a formal score, but the database is searchable based on end user interests. For example, MIND assesses accessibility and language availability, allowing users to search for apps by specific concerns or keywords (e.g., Spanish, offline, or own your data); a minimal sketch of this kind of attribute-based search follows Fig. 4. MIND thus allows providers and patients to identify the mHealth apps that may be best given the clinical context and the patient’s individual goals. App reviews are completed by volunteer raters who undergo a 3-h online training program, and each review is checked by a member of the Digital Psychiatry Lab before publication. Apps are re-rated every 6 months to keep reviews accurate as apps update.

Fig. 4
figure 4

mHealth Index and Navigation Database’s (MIND) most frequently addressed questions and unaddressed usability considerations, adapted from Lagan et al. (2021a). MIND is based on the APA Framework, assessing app origin and functionality, privacy/security, evidence base and clinical foundation, features and engagement, and interoperability and data sharing. However, it adds to the APA Framework by reviewing how data is input into the app and how the app outputs information. Importantly, MIND does not assess ease of use and usability in its app reviews. Like the APA Framework, reviewers move through the evaluation process from left to right, beginning with app origin and functionality and ending with interoperability and data sharing
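
To illustrate how a database of objective yes/no answers supports the keyword search described above, here is a minimal sketch in Python. The app records and attribute names are hypothetical; MIND's actual 105 questions are available from the Division of Digital Psychiatry (2023).

```python
# Hypothetical MIND-style search: filter apps on objective yes/no attributes.
apps = [
    {"name": "AppA", "spanish": True, "works_offline": True, "own_your_data": False},
    {"name": "AppB", "spanish": False, "works_offline": True, "own_your_data": True},
    {"name": "AppC", "spanish": True, "works_offline": False, "own_your_data": True},
]

def search(records, **required):
    """Return names of apps matching every required yes/no criterion."""
    return [r["name"] for r in records
            if all(r.get(key) == value for key, value in required.items())]

print(search(apps, spanish=True, works_offline=True))  # -> ['AppA']
```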

The Implementation Process

Proper mHealth implementation includes more than simply assessing mHealth against various criteria, and it differs across settings. For example, an evaluation committee may be formed if a clinic or healthcare system plans to implement mHealth, but a committee may not be appropriate if an individual provider is evaluating apps. Camacho et al. (2020) describe one published model for comprehensive evaluation and implementation of mHealth apps at the organizational level (Fig. 5). The Technology Evaluation and Assessment Criteria for Health Apps (TEACH-Apps) model has four stages: pre-conditions, pre-implementation, implementation, and maintenance and evolution (Fig. 5, left to right). The model begins with the identification of end users’ app needs, including language and literacy level, and the creation of a diverse committee of stakeholders. The committee will ideally include people who hold a range of positions across the organization (e.g., healthcare providers, healthcare workers, support staff, quality and safety personnel, peer support specialists). We also recommend the inclusion of people who will use the app whenever possible, consistent with a community-based participatory approach.

Considering the pre-condition findings, pre-implementation involves identifying criteria that reflect the priorities and needs of the end users. The committee then evaluates each app using frameworks and/or scales, such as those described herein, to identify which best aligns with the defined criteria (Camacho et al., 2020). Once inclusion or exclusion decisions are made, the organization moves on to implementation, where committee members test the remaining apps and provide feedback to determine which best fit the setting and population. Once apps are selected, an educational handout (e.g., flyer, webpage) discussing app pros and cons should be created and offered to patients. Since mHealth apps constantly change, the maintenance and evolution phase is ongoing. It is recommended that evaluations and handouts be updated quarterly, or at least twice a year (Camacho et al., 2020).

Fig. 5
figure 5

Technology Evaluation and Assessment Criteria for Health Apps (TEACH-Apps) model, adapted from Camacho et al. (2020). TEACH-Apps guides users through a four-stage evaluation process. In the pre-conditions stage, users consider many apps while gathering stakeholders. Next, pre-implementation eliminates some apps by customizing criteria, conducting an initial evaluation, and creating a training and support plan by utilizing an evaluation framework, an app feature database, and patient groups/clinician trainings, respectively. Hands-on evaluation during the implementation stage requires templates and meeting guides to stay on track and complete the evaluation. Finally, maintenance and evolution requires users to disseminate and use the final mHealth apps and track progress via templates and guides while continually updating current apps and evaluating new apps for use

Conclusions and Guidelines for mHealth Implementation

mHealth has the potential to revolutionize healthcare globally, increase treatment access, make health data more accurate, and improve health outcomes. Yet identifying, evaluating, and recommending mHealth apps to patients is complicated, particularly when added to the regular demands on overburdened healthcare providers and systems, and because app developers are often more focused on profit than on testing their apps for efficacy, security, and reliability. Hundreds of thousands of mHealth apps exist, and more are being added, updated, and deleted regularly. Thus, mHealth apps are constantly in flux, the number of proposed evaluation frameworks and scales is overwhelming, and the time commitment of comprehensive evaluation can be a barrier to implementation. The goal of this paper is to facilitate streamlined mHealth implementation into healthcare by reviewing frameworks, scales, and searchable databases (summarized in Table 2).

Table 2 Summary of criteria assessed in frameworks, scales, and searchable databases

Frameworks and scales rarely assess population-specific issues, so it is important to acknowledge and address these gaps while implementing mHealth within marginalized and international populations. Ideally, evaluators will engage the community during mHealth selection and implementation, obtaining qualitative feedback to help meet population-specific needs. This feedback will better assist evaluators in identifying and implementing culturally appropriate mHealth apps that will be acceptable to the target population (Maar et al., 2017). In addition to cultural appropriateness, mHealth apps should be evaluated for domains specific to the target population, such as literacy, language, and cost (Sharma et al., 2022). The infrastructure, technological or otherwise, needed to implement mHealth should also be considered, especially where mobile devices are less common (Abaza & Marschollek, 2017), to improve the implementation process and increase uptake of recommended mHealth apps.

The process of selecting mHealth apps can impose a significant burden on healthcare systems that are already under-resourced. Furthermore, implementation is nuanced and must be tailored to the population being served. Yet, commitment to a formal evaluation and implementation process can enhance services, alleviate system burden over time, improve health outcomes, and increase patient satisfaction. This becomes particularly beneficial to marginalized and international communities whose access to healthcare is often limited. Considering the noted benefits and this review, we propose seven guidelines and one call to action for mHealth integration into healthcare settings:

1. mHealth apps should be evaluated by an individual provider, committee, or healthcare system before healthcare providers recommend them to the people they serve.

2. The evaluation framework(s) and/or scale(s) used to select mHealth apps for a healthcare setting need to consider data security, safety, and effectiveness.

3. Whenever possible, the selected evaluation framework(s) or scale(s) should consider population-specific needs (e.g., MOLD-US for older populations or THESIS for populations with chronic disease).

4. Language, cultural, cost, and infrastructure requirements should be considered, especially with mHealth integration for international and marginalized populations (Abaza & Marschollek, 2017; Maar et al., 2017; Sharma et al., 2022).

5. mHealth apps must be regularly reviewed as the mHealth app landscape continuously evolves.

6. To expedite the selection process, the MIND database may be a helpful place to start. Be mindful of the most recent review date when using a database.

7. It is the individual providers’ and healthcare organizations’ responsibility to determine whether an mHealth app is appropriate for recommendation.

Call to Action

mHealth evaluation burden can be reduced by a collaborative approach in which professional medical organizations take the lead on app evaluation. We encourage professional medical organizations to form committees dedicated to evaluating mHealth apps and to maintaining and regularly updating an electronic, widely available list of apps specific to their specialties or topic areas. The list should include the strengths and weaknesses of the apps reviewed and any potential conflicts of interest between the organization and app developers. It should be clear when the committee last completed the evaluation of each app on the list.

mHealth is a constantly shifting industry with potential to improve health globally, yet apps are not regulated nor adequately evaluated. Therefore, to help prevent potential harm to the people served, healthcare providers and healthcare systems interested in implementing mHealth into their clinics must dedicate time and resources to evaluating apps prior to making recommendations. The reviewed frameworks, scales, databases, and resulting guidelines are presented to simplify and expedite this essential process. Finally, while the call to action is an ambitious ask, it could greatly benefit healthcare globally, increase treatment access, and improve health outcomes.