Background

Interprofessional collaboration (IPC) has become the key approach to comprehensive care [1], especially in the treatment of multimorbid patients or illnesses that require the involvement of different professions. As “involvement” does not necessarily mean “collaboration,” there is a steadily growing number of projects searching for the “best practice” of IPC [2]. Not only researchers but also clinicians, healthcare professionals, and policy makers are interested in IPC as it seems to be very promising not only for the quality of care but also for economic reasons.

When examining the impact of a complex intervention like IPC, one approach is to focus on patient-reported outcomes (PRO). PRO are an essential part of treatment evaluation and “of most importance to patients and families” [3]. Including patients’ perspectives may lead to a better understanding of patient-centered care, treatment quality, and patients’ treatment decisions and can therefore help to provide the best possible health care. However, most studies still concentrate on objective outcomes, such as mortality, rehospitalizations, length of stay, and healthcare costs.

Pannick et al. [4] reviewed the literature with regard to length of stay, readmission, or mortality rates for general medical wards and found only small effects of IPC. Although they consider PRO to be “valuable,” the effect of IPC on PRO has not been studied. The Cochrane review by Reeves et al. (2017) [1] aimed to assess the impact of IPC on, among other objective outcomes, quality of life, and patient-assessed quality of care. Only one of nine included randomized controlled trials (RCTs) focused on patient-reported quality of care, but due to a very low certainty of evidence, the authors concluded that it is “uncertain” whether IPC improves PRO. Considering the steadily growing role of IPC in the health sector, it is therefore important to (re-)evaluate its impact on PRO regularly, and studies are needed that review the current state of the literature on this topic.

As the inpatient setting represents a place in which different healthcare professions work next to each other, it is probably easier to implement and evaluate interprofessional interventions in inpatient than in outpatient care. Moreover, compared to the ambulatory setting, inpatient care operates in a more controlled setting. For this reason, this systematic review focuses on the question of whether IPC affects PRO in inpatient care and, if so, whether there are any heterogeneous effects of IPC across different medical fields and/or study populations or by type of intervention.

The review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [5] (Additional file 1). This systematic review is registered in PROSPERO (registration number CRD42017073900).

Methods

The full study protocol [2] has been published after undergoing peer review. There are three modifications, all of them methodological extensions, in comparison with the study protocol:

  • We extended the exclusion criteria in accordance with the PICO scheme (addition of “population not suitable,” “study design not suitable,” and “methodical limitations”).

  • The risk of bias of included controlled before-and-after studies (CBAs) is evaluated using the quality criteria of the Cochrane Effective Practice and Organization of Care (EPOC) Group [6] as well as the “Risk Of Bias In Non-randomized Studies of Interventions” (ROBINS-I) tool [7].

  • The extraction includes more variables than previously planned: indication, number of professions involved in the intervention, average treatment intensity (in hours), average length of stay of the intervention group (in days), statistical balance at baseline (y/n), and outcome measure.

Literature search

We developed the search strategies (see Additional file 2) in collaboration with an information specialist as well as a researcher with long-term experience in conducting systematic reviews and developing search strategies. The Peer Review of Electronic Search Strategies (PRESS) Checklist [8] was used to develop the search strategies.

We searched the electronic databases PubMed, Web of Science/Social Science Citation Index (SSCI), CENTRAL (Cochrane Library), Current Contents (LIVIVO), CINAHL (EBSCO), and Embase for records published between 1997 and April 2021. The first literature search was carried out in July 2017 and was updated in July 2019. A third update was performed in April 2021 using the two databases which revealed the most records in the previous searches (SSCI and E). Additional studies were identified through forward and backward citation tracking, manual search of Google Scholar (using the keywords “interprofessional collaboration” alone or in combination with “impact,” “effect,” or “patient-reported outcomes”), and consultation of authors when full texts were not available.

Inclusion criteria (PICO)

Our inclusion criteria are based on the PICO scheme.

Types of participants

The review only includes study populations of patients who received interprofessional interventions in an inpatient setting.

Types of interventions

The inclusion of interventions is based on a pre-specified definition of IPC. Specifically, IPC is defined as a work-sharing cooperation in which professionals from more than one health or social care profession cooperate with the explicit goal of improving the healthcare quality. This definition is adapted from two key publications, the Cochrane review by Reeves et al. [1] and the systematic review by Pannick et al. [4]. According to Reeves et al. [9], interprofessional interventions can be classified into three different types. The interventions described in the included studies are assigned to these individual types of intervention:

  1. “Interprofessional education defined as interventions that included a curriculum with explicitly stated learning objectives/outcomes and learning activities (e.g., seminars and simulation) aimed at improving collaboration”

  2. Interprofessional practice defined as interventions that aimed to improve how professionals interacted in practice through the use of activities such as meetings or checklists

  3. “Interprofessional organization defined as interventions that aimed to promote collaboration by the use of institutional policies, clinical guidelines or the redesign of workspaces” [9]

Types of included studies

As we have learned from studies published at the time of the initial literature search (e.g., the Cochrane Review by Reeves et al. 2017 [1]), PRO of IPC interventions are rarely investigated within randomized controlled studies (see Lidstone et al. 2020 [10] as current evidence of this). Thus, to ensure that all available evidence is reviewed on this question, we decided to include not only RCTs but also non-randomized studies (NRS), CBAs, and interrupted time series (ITS) in this review and to present our results, both in the manuscript and in the tables, with regard to study designs. The EPOC criteria and terminology [11] are used to define the different study types.

Types of outcomes

Studies focusing on PRO, such as overall satisfaction, willingness to recommend, quality of life, or self-reported success of treatment are included in this review, regardless of whether the outcome is defined as primary or secondary.

Exclusion criteria

There were eleven reasons for the exclusion of studies (see Table 1). Studies were excluded if they did not meet the criteria of the PICO scheme, if they were duplicates, animal studies, written in a language other than English or German, or if the full text was not available.

Table 1 Exclusion criteria

As a result of the decisions of the Bologna Conference in 1999, many changes have taken place in the vocational education and professional position of healthcare providers in many countries. Due to these changes, both responsibilities and awareness concerning “collaboration” have shifted. To allow a general comparability between studies from different countries, we decided to limit the search period to the 20 years preceding the initial literature search in 2017. For this reason, the final search period covers the years from 1997 to April 2021 (i.e., 24 years). Additionally, we restricted the countries to those which belong to the World Health Organization’s (WHO) mortality stratum A [12] for reasons of external validity.

Selection of studies

The selection of studies occurred in a two-stage screening process, where the first screening focused on titles and abstracts and the second screening on full texts. Two reviewers (LK and SB) independently screened a random subsample of 10% of the full sample of studies. Since the inter-rater reliability within this subset of studies was sufficiently high (kappa statistic of 0.84), the remaining 90% of the sample was split between the two screeners, with each covering 50% of it. In the update literature searches, LK carried out the first screening. Next, the full texts of all studies included after the first screening were screened independently by LK and SC and were either included or excluded according to the defined criteria (second screening). In case of any disagreements or uncertainties during the screening process, the eligibility of the respective studies was discussed.
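To make the kappa statistic mentioned above more concrete, the following minimal sketch computes Cohen's kappa for two screeners. It assumes the unweighted two-rater form of the statistic; the function name `cohens_kappa` and the screening decisions are hypothetical illustrations and are not the data of this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters decide identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, derived from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical title/abstract decisions for 100 records (NOT the review's data).
reviewer_1 = ["include"] * 8 + ["exclude"] * 92
reviewer_2 = ["include"] * 7 + ["exclude"] + ["include"] + ["exclude"] * 91
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the screeners disagree on 2 of 100 records, yet kappa stays clearly below the raw agreement of 98% because most records are excluded by both screeners and chance agreement is therefore high.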

Quality assessment

Risk of bias (RoB) was assessed by LK using the Cochrane “risk of bias 2” (RoB 2) tool [13] for RCTs as well as the ROBINS-I tool [7] for NRS and CBAs. Moreover, the EPOC quality criteria [6] were used to assess the methodological quality of the included CBA.

Data extraction

All studies included in the second screening were subject to data extraction. LK extracted data on country, setting (medical field), indication, definition of IPC, description of intervention and the authors’ suggested causal mechanism, details of control conditions, number of professions involved (intervention), treatment intensity (hours, mean), length of stay (intervention group, mean, days), study design, outcome measure, study population size, participant demographics, classification of the intervention into one of the three intervention types (interprofessional education, interprofessional practice, interprofessional organization), times of measurement, outcomes (such as overall satisfaction, willingness to recommend, quality of life, or self-reported success of treatment), baseline imbalances, and statistical data for calculation of effect sizes and/or reported effect sizes.

Data synthesis

As there is high clinical heterogeneity in the included studies and only one study with low risk of bias, the authors decided to waive the originally planned quantitative meta-analysis. Therefore, results are presented for each outcome concept using narrative synthesis of effect estimates (unstandardized mean differences (MD), standardized effect estimates (Cohen’s d, Hedges’ g), and/or p-values) as they are reported in the included studies.
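For readers less familiar with the standardized effect estimates named above, the following minimal sketch shows how Cohen's d and Hedges' g are commonly computed from group summary statistics. It assumes the pooled-standard-deviation form of d and the usual approximate small-sample correction for g; the included studies may have used other variants, and all function names and input values are hypothetical.

```python
import math


def cohens_d(mean_ig, sd_ig, n_ig, mean_cg, sd_cg, n_cg):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_ig - 1) * sd_ig ** 2 + (n_cg - 1) * sd_cg ** 2) / (n_ig + n_cg - 2)
    return (mean_ig - mean_cg) / math.sqrt(pooled_var)


def hedges_g(d, n_ig, n_cg):
    """Cohen's d multiplied by the approximate small-sample correction factor."""
    # Correction 1 - 3 / (4 * df - 1) with df = n_ig + n_cg - 2.
    correction = 1.0 - 3.0 / (4.0 * (n_ig + n_cg) - 9.0)
    return d * correction


# Hypothetical summary statistics (NOT taken from any included study).
d = cohens_d(mean_ig=55.0, sd_ig=10.0, n_ig=30, mean_cg=50.0, sd_cg=10.0, n_cg=30)
g = hedges_g(d, n_ig=30, n_cg=30)
print(f"d = {d:.3f}, g = {g:.3f}")
```

The correction shrinks d slightly toward zero, and the difference between d and g becomes negligible as sample sizes grow, which is one reason why the two estimates are often read side by side in narrative syntheses.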

Results

Search results

The systematic searches yielded 10,213 records (see Fig. 1). After the first screening, there were 338 records eligible for the second screening. Twenty-two studies (16 RCTs [14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29], five NRS [30,31,32,33,34], and one CBA [35]) were included as a result of the second screening and subject to data extraction. Studies excluded in the second screening can be found in the supplementary appendix (see Additional file 3).

Fig. 1 Flow diagram

Study characteristics

The included studies enrolled between 20 [19] and 1531 [15] patients and were conducted in Australia [19, 28], Denmark [14, 34], France [31], Great Britain [17, 22], Germany [21, 23, 27, 29, 32, 33], Italy [24], the Netherlands [35], Norway [18], Switzerland [30], and the USA [15, 16, 20, 25, 26]. Five studies focus on patients with chronic pain [21, 23, 30, 32, 33], four on patients undergoing palliative care [16, 19, 20, 26], and two each on patients with neurological diseases (Parkinson’s disease (PD) [24], multiple sclerosis [14]), cancer [27, 31], or severe mental illness [34, 35]. Patients with cognitive impairment in old age [17], patients with fibromyalgia [18], general medical patients [25], patients with anorexia nervosa [29], critical care survivors [28], homeless patients [22], and patients in old age [15] are the subject of one study each. Based on the categorization by Reeves et al. [9] (see above), nine studies describe an intervention that can be categorized as an “interprofessional practice” intervention [15, 16, 19, 22, 25, 26, 28,29,30], and one study [24] evaluates an “interprofessional organization” intervention. The remaining studies assess the effect of interventions containing elements of at least two [14, 18, 20, 23, 27, 31,32,33, 35] or even all three types of interventions [17, 21, 34]. The number of professions involved varies from two [22, 25] to 10 [14]. Only one study (Gade et al. 2008 [16]) provides tentative evidence of a suggested causal mechanism. No information concerning the number of professions could be extracted from four studies [23, 27, 30, 32]. The observational period between baseline t0 and follow-up t1 ranges from 2 and 3 days [16] (control and intervention group, respectively) to 26 weeks [14]. The study characteristics, including indications, details of intervention and control, number of patients, time points, and both outcomes and outcome measures, can be found in Table 2; the complete extraction sheet is in Additional file 4.

Table 2 Characteristics of included studies

Risk of bias

The results of risk of bias assessment are detailed in the Additional file 5 and summarized in Tables 3 and 4.

Table 3 Risk of bias in RCTs using the risk of bias 2 tool
Table 4 Risk of bias in NRS and CBA using the ROBINS-I tool

RCTs

Only two studies [21, 27] are considered to have a “low” RoB, while twelve studies [14,15,16,17, 19, 20, 22, 23, 25, 26, 28, 29] are considered to have a “high” RoB. Two RCTs [18, 24] were rated as raising “some concerns” (see Table 3).

NRS and CBA

The RoB was not rated as “low” for any NRS; instead, it was rated as “moderate” [33], “serious” [30, 31, 34], or “critical” [32] (see Table 4). Likewise, the CBA by Deenik et al. [35] is classified as having “serious” RoB. Three EPOC criteria are rated as “not done,” two as “done,” and one as “not clear.”

Relationship between IPC and PRO

Whether IPC affects PRO is evaluated using 59 outcome measures (see Table 5). As not all outcome measures are publicly available and/or not all questions are explicitly presented within the respective manuscripts, an overall statement regarding the questionnaire scaling is not possible. However, information on whether full or partial questionnaires were used and whether validation studies were cited can be found in Additional file 6. The following nine outcomes were defined inductively during the extraction process (see Table 6): quality of life (QoL), coping, satisfaction, functional ability and health status, pain, psychiatric morbidity, managing one’s own health care, therapeutic relationship, and treatment success.

Table 5 Outcome measures and concepts
Table 6 Outcomes and risk of bias

As we decided to waive the originally planned quantitative meta-analysis, we describe the reported effect estimates within the manuscript text and, moreover, summarize them within structured tables of results across studies. Furthermore, results are presented regarding the differences between groups that occurred between baseline (t0) and follow-up (t1). Some studies also present results on second follow-up. However, as this only applies to a small number of studies, these results can be found in the complete extraction sheet in Additional file 4. Due to differences in the amount of reported adjusted MD, standardized effect sizes (ES), or p-values (between groups), we decided to present the effect estimates of the two outcomes with the most reported effect estimates in tables within the main manuscript (i.e., QoL, coping). The effect estimates of the remaining outcomes are reported in Additional files 7, 8, 9, 10, 11, 12 and 13.

QoL has been assessed with 17 questionnaires (ten generic, seven disease-specific) applied in nine RCTs [14, 16, 17, 20, 22,23,24, 26, 28], four NRS [30, 31, 33, 34], and one CBA [35] (see Table 7). ES could only be extracted from the NRS by Angst et al. [30] (Short Form (SF)-36) and Semrau et al. [33] (SF-12). Here, estimated treatment effects are small and positive, but the corresponding confidence sets can rule out neither relatively small nor large positive or negative effects. Seven studies [14, 17, 22, 24, 26, 33,34,35] report unstandardized MD, and four studies [16, 20, 23, 31] only present p-values. The majority of MD do not show any positive or negative effect of IPC. However, positive and statistically significant effects of IPC were reported in four studies focusing on patients with multiple sclerosis [14], PD [24], acute heart failure in palliative care [26], and severe mental illness [34].

Table 7 Reported adjusted unstandardized mean differences, standardized effect sizes, and p-values (between groups) in studies measuring QoL

Five studies (three RCTs [18, 21, 23], two NRS [30, 33]) evaluate the effect of IPC on coping using five outcome measures in total: FESV (pain management questionnaire), PRCQ-C (pain-related cognitions questionnaire for children), ASES (Arthritis Self-Efficacy Scale), AEQ (avoidance-endurance questionnaire), and CSQ (coping strategies questionnaire) (see Table 8). The indications are largely comparable across study types, with CLBP (chronic low back pain) and pediatric chronic pain in two RCTs [21, 23] and CLBP and CP (chronic pain) in the NRS. Three studies [18, 30, 33] report positive, but partly statistically nonsignificant, effects. Hechler et al. [21] do not report adjusted MD, standardized effect sizes, or p-values, whereas Mangels et al. [23] report significant p-values between groups.

Table 8 Reported adjusted unstandardized mean differences, standardized effect sizes, and p-values (between groups) in studies measuring coping between baseline (t0) and follow-up (t1)

Five RCTs [15, 16, 19, 25, 27] as well as two NRS [31, 34] assessed satisfaction using two unknown outcome measures, one disease-specific questionnaire (EORTC IN-PATSAT32, EORTC Inpatient Satisfaction with Cancer Care Questionnaire), and six generic questionnaires (CSQ-8 (client satisfaction questionnaire), MCOHPQ (Modified City of Hope Quality of Life Patient Questionnaire), Picker Questionnaire, Press Ganey, HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems), and QPP (quality of care from the patients’ perspective)). None of these studies reports standardized ES. Singer et al. [27] state statistically nonsignificant odds ratios, whereas O’Leary et al. [25] and Marcussen et al. [34] report positive and partly statistically significant adjusted MD for general medical patients and patients with severe mental illness, respectively (see Additional file 7). Counsell et al. [15] (patients in old age (> 70 years old)) as well as Gade et al. [16] (patients in palliative care) report significant differences between groups at t1 (i.e., post treatment). However, no further effect estimates were reported in these studies, and nonsignificant p-values are reported in Cheung et al. [19] (patients with preterminal or terminal condition in palliative care) as well as Brédart et al. [31] (patients with cancer).

Functional ability has been assessed in seven RCTs [15, 17, 18, 21, 23, 24, 28] and one NRS [33] using 12 different outcome measures (see Additional file 8). Three studies [21, 23, 28] do not report any MD, ES, or p-value but only raw mean scores and standard deviations. One study [15] provides p-values but no ES estimates. Positive, but statistically nonsignificant, effects were reported by Goldberg et al. [17] (MD 0.5 (95% CI: −5.2, 6.2), p = 0.87; patients with cognitive impairment in old age) as well as Hamnes et al. [18] (ES = 0.15, p = 0.265, Cohen’s d; patients with fibromyalgia). Semrau et al. [33] provide impact estimates of IPC on patients with CLBP. Adjusted MD and ES are estimated to be positive (i.e., in favor of the experimental group), but only one estimate is significantly different from zero (FFbH-R (Hannover Functional Ability Questionnaire-back pain), MD 0.91 (95% CI: −1.43, 3.24); FFkA (Freiburg Questionnaire of physical activity), MD 0.63 (95% CI: 0.12, 1.13)). Moreover, Monticone et al. [24] provide evidence of a statistically significant reduction in MDS-UPDRS (part 3) (Italian Movement Disorder Society Unified Parkinson’s Disease Rating Scale) (MD 24.5 (SE 3.2); inverted scale), i.e., a desirable effect favoring the treated group (patients with PD).
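How a reported MD, its 95% confidence interval (CI), and the p-value relate to each other can be illustrated with a short calculation. The following minimal sketch assumes a symmetric CI based on the normal approximation, which the included studies do not necessarily use (t-based intervals give slightly different values); the function names are hypothetical, and the inputs are the estimates quoted above from Goldberg et al. [17] and Semrau et al. [33].

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def se_from_ci(lower, upper, z=Z_95):
    """Standard error implied by a symmetric, normal-approximation CI."""
    return (upper - lower) / (2.0 * z)


def two_sided_p(estimate, se):
    """Two-sided p-value of a z-test of the estimate against zero."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2.0))


def excludes_zero(lower, upper):
    """True if the interval lies entirely above or entirely below zero."""
    return lower > 0.0 or upper < 0.0


# Goldberg et al. [17]: MD 0.5 (95% CI: -5.2, 6.2), reported p = 0.87.
se_goldberg = se_from_ci(-5.2, 6.2)
p_goldberg = two_sided_p(0.5, se_goldberg)
print(f"approximate p = {p_goldberg:.2f}")

# Semrau et al. [33]: only the FFkA interval excludes zero.
ffbh_r_significant = excludes_zero(-1.43, 3.24)
ffka_significant = excludes_zero(0.12, 1.13)
```

Under these assumptions the recomputed p-value for Goldberg et al. is close to the reported value, and the interval check reproduces the statement that only one of the two estimates by Semrau et al. differs significantly from zero.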

Pain was assessed using the ESAS (Edmonton Symptom Assessment Scale) and the FPS-R (Faces Pain Scale–Revised) in RCTs [21, 26] and the WHYMPI (West Haven-Yale Multidimensional Pain Inventory) and German Pain Questionnaire in NRS [30, 33] (see Additional file 9). One study [21] reports neither MD nor standardized ES or p-values. Semrau et al. [33] describe a positive, but statistically nonsignificant, effect on pain in patients with CLBP using the German Pain Questionnaire (ES = −0.013, p = 0.755, Cohen’s d; inverted scale). Positive and statistically significant effects have been reported by Sidebottom et al. [26] (MD 3.69 (95% CI: 3.39, 3.99), p < 0.001; patients within palliative care and with acute heart failure) as well as Angst et al. [30] (ES = 0.09, p = 0.034 and ES = 0.18, p = 0.559, Hedges’ g; patients with CP).

The PHQ-9 (Patient Health Questionnaire), BDI (Beck Depression Inventory), GHQ-20 (General Health Questionnaire), DASS-21 (Depression Anxiety Stress Scale), EDE-Q (Eating Disorder Examination Questionnaire), and DIKJ (Depression Inventory for Children and Adolescents) were used for evaluating psychiatric morbidity in seven RCTs [18, 20, 21, 23, 26, 28, 29]. Moreover, the HADS (Hospital Anxiety and Depression Scale), ADS (General Depression Scale, “Allgemeine Depressions-Skala”), K10 (Kessler Psychological Distress Scale), and SCL-90-R (Symptom Checklist-90-R) were used in three NRS [30, 32, 34]. Overall, significant unstandardized MD are presented by Sidebottom et al. [26] (MD = 1.42 (1.12, 1.73), p < 0.001; patients with acute heart failure within palliative care) as well as by Mangels et al. [23], who focused on patients with CLBP and report a significant between-group difference at t1 (p < 0.01). The remaining studies report either no [21, 28, 29, 32] or statistically nonsignificant [18, 20, 30, 34] estimates of treatment effects (see Additional file 10).

Managing one’s own health care was assessed in the RCT by Hamnes et al. [18] using the EC-17 (Effective Musculoskeletal Consumer Scale), which revealed a positive effect in patients with fibromyalgia (ES = 0.24, Cohen’s d; MD = 4.26 (95% CI: 0.8, 7.7)) (see Additional file 11).

In contrast, in the RCT by O’Leary et al. [25], a positive, but statistically nonsignificant, effect of IPC on general medical patients was found for the therapeutic relationship, assessed using the PAM-SF (Patient Activation Measure – Short Form) (MD = 0.69 (95% CI: −2.82, 4.19), p = 0.58) (see Additional file 12).

Treatment success was evaluated in the RCTs by Monticone et al. [24] (patients with PD) and Ziser et al. [29] (patients with anorexia nervosa). Neither MD nor standardized ES are reported in these studies. However, the between-group p-value revealed a significant difference between the intervention group (IG) and the control group (CG) at an 8-week follow-up in Monticone et al. [24] (p < 0.001; see Additional file 13).

Due to highly heterogeneous (or unobserved) intervention characteristics, medical fields, and/or study populations, no further conclusions can be drawn regarding varying effects by these aspects.

Discussion

The aim of this systematic review was to study whether IPC affects PRO in inpatient care and, if so, whether these effects vary by type of intervention, indication, and/or study population. In order to answer these questions, we systematically searched six electronic databases and Google Scholar, tracked citations of included studies, and contacted relevant authors. The search yielded 10,213 records, of which 22 studies fulfilled the inclusion criteria in a two-step screening process. Most of the included RCTs are considered to have a high RoB [14,15,16,17, 19, 20, 22, 23, 25, 26, 28, 29]. Likewise, the RoB of NRS and CBA is mostly rated as serious [30, 31, 34, 35]. Only two studies [21, 27] have a low RoB. To summarize, while some studies do not report effect estimates, and some of the reported effects appear to be imprecisely estimated, the overall results indicate that IPC may affect PRO positively across all outcomes. Nevertheless, there are also some studies that do not report any effect. Moreover, due to heterogeneity, neither the RoB nor the type of intervention, medical field, or study population allows further conclusions on heterogeneous impacts of IPC on PRO.

To our knowledge, this is the first systematic attempt to evaluate the effectiveness of IPC on PRO including RCTs, NRS, and CBAs as well as all three types of IPC interventions and a multitude of indications. By using a purposely broad search strategy and inclusion criteria, we explicitly attempted to investigate which outcomes and indications have already been studied in order to contribute to an overview of the current state of the literature. Due to the breadth of the research question, the systematic search strategy was very sensitive and yielded a large number of records. It is therefore surprising that only 22 studies were included in this review. In accordance with Pannick et al. [4], we were also unable to show a clear effect of IPC on PRO. The Cochrane review by Reeves et al. [1] aimed to assess the impact of “interprofessional practice” interventions on both objective outcomes and PRO, as well as clinical process and efficiency outcomes. They also concluded that the heterogeneity of studies does not allow for a meta-analysis or a clear conclusion on the effect of IPC interventions.

While screening the literature, it became obvious that a clear and generally valid definition of IPC seems to be lacking. Many different terms were used to describe IPC interventions, such as interdisciplinary [15, 16, 21, 25, 30, 33] or interprofessional [34], multidisciplinary [14, 18, 23, 24, 27,28,29, 31, 32, 35] or comprehensive [20]/enhanced [17]/intensive [19]/complex [22]/integrating [26] care. Since we were aware of this before finalizing our search, we were able to address this circumstance in our search strategy. In addition, we were careful to apply a broad definition of IPC in advance so that the definitions and synonyms of the study authors could be subsumed under it. Nevertheless, this does not change the fact that the different wording can lead to difficulties in the classification of interventions and can make it difficult to reliably assess the effects of IPC in a comparative context. As a result, the classification of the interventions into the three types of interventions was not easily applicable, since in most cases combined interventions were used. For this reason, IPC as a multicomponent intervention is difficult to delineate, which in turn makes it difficult to study its relative effectiveness.

In addition, the definition of PRO measures seems to vary as well [36]. For example, the question of whether an outcome is a PRO is easier to answer for satisfaction than for functional outcomes, such as physical function. Whereas “satisfaction” cannot be assessed without asking a patient, an outcome such as “physical functioning” (e.g., the “mastery of activities of daily living”) can be reported by the patients themselves but can also be observer administered. This circumstance had to be considered in the selection of literature. Therefore, we decided to include all assessments in which the patients were asked to answer the question(s) and to exclude all observer-administered outcome measures. This is in line with the definition of the Food and Drug Administration (FDA), which defines a PRO as “any report of the status of a patient’s health condition that comes directly from the patient without interpretation of the patient’s response by a clinician or anyone else” [3]. However, proxy answers were allowed to avoid systematic exclusion of study populations who are not able to answer the questions themselves (old age, cognitive impairment, pediatric). Since relatives are closely involved in the treatment process and usually also play a decisive role in treatment decisions, they can equally be regarded as recipients of the healthcare services. There are two studies in which proxy answers were included in the analysis. Firstly, Goldberg et al. [17] asked patients with cognitive impairment in old age (> 65 years old) as well as their proxies to report QoL (EQ-5D, EuroQol-5D). No statistically significant effects were found, neither in self-reports nor in proxy answers. Secondly, Hechler et al. [21] included patients aged 9 to 17 years and, among others, evaluated the “functional ability and health status” using the P-PDI (Pediatric Pain Disability Index). Whereas children aged 11 and older answered the questionnaire themselves, parents answered the P-PDI for children under 11. Here, the description of results did not distinguish between self-reports and proxy reports. Both studies have been marked with “proxy completed” in Table 2.

Nevertheless, as definitions of PRO measures vary across studies, one outcome measure can be observer administered in one study and patient reported in another, which points to the important role of sufficient validation in the respective application and study population. In the included studies of this review, there were 15 assessments in which scales were only implemented partially [14, 21, 24,25,26, 30,31,32,33,34] (see Additional file 6), and references to validation studies are missing in three studies [15, 16, 25].

This review has several limitations. First of all, our results are limited by the fact that the included studies are conceptually heterogeneous and have a high risk of bias, which was assessed by only one person. Only two studies [21, 27] have a low risk of bias, many different terms were used to describe IPC, and many different outcome measures were used to assess PRO. The included studies took place in ten different countries (Switzerland, Germany, Great Britain, Australia, Netherlands, Denmark, France, Italy, Norway, and USA) with different healthcare systems and different vocational trainings and professional roles. Additionally, only one study [21] reported treatment effects which were adjusted for multiple hypothesis testing, so that the reported unadjusted effects may be subject to type I error inflation. Therefore, quantitative meta-analysis was not feasible, and the description of results is limited to the effect sizes which were reported in the studies. The results within some studies are ambiguous as well, for example, in cases where an outcome was assessed with several outcome measures. For this reason, the question of whether interprofessional collaboration affects PRO cannot be answered conclusively. Nonetheless, most of the reported effect estimates suggest a positive effect of interprofessional interventions on PRO.
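The concern about type I error inflation raised above can be illustrated numerically. The following minimal sketch assumes independent tests, which is a simplification because outcomes measured in the same patients are typically correlated; the function names and the numbers of tests are hypothetical and do not describe any included study.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical numbers of outcome measures tested at alpha = 0.05 each.
for n_tests in (1, 5, 10):
    fwer = familywise_error_rate(0.05, n_tests)
    threshold = bonferroni_threshold(0.05, n_tests)
    print(f"{n_tests:>2} tests: FWER = {fwer:.3f}, Bonferroni threshold = {threshold:.4f}")
```

Under this simplifying assumption, testing ten outcome measures without adjustment already carries a probability of roughly 40% of at least one false-positive finding, which is why unadjusted significant effects should be interpreted with caution.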

Secondly, psychometric properties of PRO measures as well as minimal important differences (MIDs) were not considered in presentation of results, although they are important when it comes to assessing whether the respective effects are also relevant from the patients’ perspective. However, we recorded which study reports validation studies to the outcome measures used and present our records in Additional file 6.

Previous reviews sought to measure the effect of IPC by focusing on objective patient outcomes [4, 37], collaborative behavior and team satisfaction [38], or specific settings and indications [39,40,41,42]. Our aim was to add the effects on PRO to the existing knowledge on the effectiveness of IPC. Even though it remains challenging to make a clear statement, this systematic review shows the current state of what has been established so far and thus points to the following research implications:

  • Methodologically rigorous studies are needed to contribute to the current state of literature and enable a reliable statement with regard to IPC. Specifically, randomized controlled trials reporting the underlying definition of IPC as well as the psychometric properties of PRO measures along with corresponding MIDs would be desirable. Especially in cases in which only single parts of questionnaires are used, the validity and reliability of measurement scales should be discussed.

  • As our review highlighted the importance of standardized terminology, future studies are needed that focus on the definition and conceptualization of IPC.

  • This review focused on inpatient care. For a comprehensive overview, the outpatient setting should be subject of future research.

Conclusion

Twenty-two studies were included in this systematic review. There was a broad variety of different definitions of IPC, and studies covered a wide range of populations, interventions, indications, and outcomes. Thus, the high clinical heterogeneity and high RoB made it impractical to aggregate the treatment effect estimates statistically. While heterogeneous effects depending on indication and outcome may be possible in the broader set of studies, the results considered here are indicative of a generally positive effect of IPC on PRO, irrespective of these observable study characteristics. Future methodologically rigorous studies are needed to answer the question of the effectiveness of IPC on PRO.