Introduction

Vascular malformations are benign congenital anomalies of the vascular system that are categorized based on the type of vascular channels involved: arterial, venous, lymphatic or capillary vessels, singularly or combined [1,2,3,4]. The disease impact varies, including cosmetic concerns, pain and functional impairment, depending on the size and location of the lesion. Multiple treatment options are available, including compressive stockings, sclerotherapy and surgery; however, evidence is lacking to support clinical decision-making.

In the current literature, there is no uniformity in the outcomes that are measured to evaluate treatment effectiveness (‘outcome domains’) or in the measurement methods or tools used to measure these outcomes (also known as ‘outcome measurement instruments’ or simply ‘instruments’). For this reason, outcomes of different clinical trials cannot be pooled, which hampers the development of evidence-based guidelines.

The Outcome Measures for VAscular MAlformations (OVAMA) group initiated the OVAMA project aiming to develop an international ‘Core Outcome Set’ (COS) for adult as well as pediatric patients with peripheral congenital soft tissue vascular malformations for measuring outcomes of therapeutic interventions globally. A COS is an agreed minimum set of outcomes that should be measured and reported in all clinical trials of a specific disease or trial population [5]. In an earlier e-Delphi study, the OVAMA consensus group reached consensus on the core outcome domains for patients with venous, lymphatic, and arteriovenous malformations (VM, LM, and AVM respectively): radiological assessment, physician-assessed location-specific signs, patient-reported pain, overall severity of symptoms, health-related quality of life (HRQoL), patient satisfaction with treatment and outcome, and adverse events (Online Resource 1). For each unique type of vascular malformation, specific physician- and patient-reported signs and symptoms were included separately. Furthermore, recurrence and appearance were recommended outcome domains based on the e-Delphi study but require further discussion before final inclusion in the COS [6]. However, it is unclear which instruments are most suitable for measuring the core outcome domains in adults and children.

This review, as part of the OVAMA project (Online Resource 2), aims to identify the outcome domains and instruments for vascular malformations that were used in previous prospective studies, and to assess the quality of the available patient- and physician-reported outcome instruments, to inform the selection process of instruments to measure the core outcome domains in future studies.

Materials and methods

The OVAMA project was registered in the Core Outcome Measures in Effectiveness Trials (COMET) database (http://www.comet-initiative.org/), designed following the Harmonizing Outcome Measures for Eczema (HOME) roadmap, and embedded within the Cochrane Skin Group—Core Outcome Set Initiative (CSG-COUSIN) [7]. We followed the guidelines of the PRISMA-P statement [8], the Core Outcome Set—STAndards for Reporting (COS-STAR) [9], and the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) ‘Protocol for Systematic Reviews of Measurement Properties’ [10]. The study protocol was registered in PROSPERO (42017056242).

Literature searches, study selection and data extraction

This systematic review consisted of three literature searches (Online Resource 3), as described below.

All searches were performed with the help of a clinical librarian. The PubMed function ‘Similar Articles’ and reference lists of all included articles were screened for additional studies. Study selection and data extraction were performed by two independent reviewers. Disagreements were resolved by consensus.

Search 1: identification and description of outcome domains and instruments used in previously published studies

In the first search, prospective studies evaluating treatment outcomes in vascular malformations were identified to collect all outcome domains and instruments that were previously used. We searched MEDLINE, EMBASE and CENTRAL for studies measuring treatment effectiveness, using search terms for peripheral vascular malformations. We included prospective studies evaluating treatments that enrolled at least 20 patients (all ages) with all singular and combined types of vascular malformations. Capillary, visceral, bone and central nervous system vascular malformations were excluded as these were outside the scope of the OVAMA project. Articles published before 1996 were excluded, since the current classification and terminology for vascular anomalies was established in 1996 by the International Society for the Study of Vascular Anomalies (ISSVA) [11]. Data on study characteristics, outcome domains and instruments were extracted.

Searches 2 and 3: evaluation of the quality of the identified outcome measurement instruments

The second search was performed in MEDLINE and EMBASE to identify development and validation studies of patient- and physician-reported disease-specific instruments for vascular malformations, using search terms for vascular malformations and a validated PubMed search filter for finding studies on measurement properties, developed by Terwee et al. [12].

The third search aimed to identify development and validation studies of patient- and physician-reported outcome measurement instruments that were used in previously published prospective studies on vascular malformations, as identified in search 1, but were not specifically developed and validated for vascular malformations. To explore their potential applicability in vascular malformations, we additionally searched for studies on measurement properties in patient populations that share clinical similarities with vascular malformations, predefined as: ‘benign vascular diseases’, ‘benign lymphatic diseases’ and ‘benign soft tissue tumors’. The expert group considered these groups of health conditions to have the greatest similarity to vascular malformations in terms of clinical appearance, signs, symptoms and potential disease burden. Search terms for the identified patient- and physician-reported instruments were combined with terms for the predefined clinically similar diseases, and the abovementioned PubMed filter for studies on measurement properties [12]. Only studies reporting on at least one measurement property of an instrument that was developed for vascular malformations, or for a clinically similar condition but previously used for vascular malformations, were included. Studies not reporting on measurement properties and studies focusing on health conditions other than vascular malformations or the predefined clinically similar conditions were excluded.

Data on characteristics of the included instruments, study samples and the study results concerning measurement properties were extracted.

Evaluation of the methodological quality of the included studies

The methodological quality of the included studies on measurement properties was evaluated by two authors independently using the COSMIN checklist (www.cosmin.nl) [13, 14]. With this checklist, we evaluated, for each included study separately, which measurement properties were investigated (following the COSMIN taxonomy, Online Resource 4) and whether the methods used to do so were appropriate. Several items per measurement property were rated on a 4-point rating scale ranging from ‘excellent’ to ‘poor’. The overall score for each measurement property was determined by the ‘worst score counts’ principle [15]. As a gold standard for patient-reported outcome measures is lacking, criterion validity was not considered. Data on interpretability and feasibility were collected if presented in the included studies. Two reviewers independently extracted data on the measurement properties from the selected articles and evaluated methodological quality of the studies. Discrepancies were discussed with a third reviewer until consensus was reached.
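The ‘worst score counts’ principle can be illustrated with a minimal sketch: the overall score for a measurement property equals the lowest rating given to any of its checklist items. The function name and the example item ratings below are hypothetical and serve only as illustration; they are not taken from the included studies.

```python
# Ordinal order of the 4-point rating scale, from worst to best.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def worst_score_counts(item_ratings):
    """Return the lowest rating among the items of one measurement property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # The position in RATING_ORDER defines which rating is 'worse'.
    return min(item_ratings, key=RATING_ORDER.index)


# Hypothetical item ratings for one measurement property in one study.
example_items = ["excellent", "good", "poor", "good"]
print(worst_score_counts(example_items))  # prints "poor"
```

In this example a single item rated ‘poor’ determines the overall score, even though the remaining items were rated ‘good’ or ‘excellent’.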

Evaluation of quality of the measurement properties

Two authors independently evaluated the quality of the measurement properties by rating the results of the analyses on measurement properties in each included study based on the criteria for good measurement properties as recommended by Terwee et al. [16, 17] (Online Resource 5). The study results were independently rated by two reviewers as ‘positive’ (+), ‘negative’ (−) or ‘indeterminate’ (?) for each measurement property.

Best evidence synthesis

The best evidence synthesis is aimed at reaching a conclusion about the overall quality of each of the measurement properties of the included instruments. For this purpose, the quality assessments of the measurement properties based on the study results of the included studies were combined and adjusted for the methodological quality of the studies by applying levels of evidence as recommended by the COSMIN group [16], taking into account the number of studies, the methodological quality of the studies and the consistency of the study results on measurement properties across studies.

For each measurement property, the methodological quality of the study (poor, fair, good or excellent) and the direction of the study results of the analyses on this measurement property (negative, indeterminate or positive result) were combined into the best evidence synthesis (Table 1): +++, ++, +: positive rating indicating “adequate” measurement property; ?: unknown rating indicating indeterminate measurement property; − − −, − −, −: negative rating indicating “inadequate” measurement property; ±: conflicting findings; NI: not interpretable (due to indeterminate result of analysis); NA: not available (analysis was not performed for this measurement property).

Table 1 Levels of evidence for the overall quality of a measurement property (www.cosmin.nl) [47]
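As an illustration only, the combination of methodological quality and direction of results can be sketched as follows. The contents of Table 1 are not reproduced in this text, so the decision rules below are an assumption based on the commonly cited COSMIN levels of evidence: strong for one study of excellent quality or multiple studies of good quality, moderate for one study of good quality or multiple studies of fair quality, limited for one study of fair quality, and unknown when only studies of poor quality are available. The function name, the use of ASCII "+" and "-" as rating symbols, and the choice to ignore indeterminate results when determinate results exist are likewise assumptions.

```python
QUALITY_RANK = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}


def best_evidence(studies):
    """Combine (methodological_quality, result) pairs for one measurement property.

    result is "+" (positive), "-" (negative) or "?" (indeterminate).
    The rules are assumed, not copied from Table 1.
    """
    if not studies:
        return "NA"  # analysis was not performed
    # Assumption: studies of poor methodological quality do not count as evidence.
    usable = [(QUALITY_RANK[q], r) for q, r in studies if QUALITY_RANK[q] > 0]
    if not usable:
        return "?"  # unknown: only studies of poor quality
    directions = {r for _, r in usable if r != "?"}
    if not directions:
        return "NI"  # not interpretable: only indeterminate results
    if len(directions) > 1:
        return "±"  # conflicting findings
    sign = directions.pop()
    ranks = [q for q, r in usable if r == sign]
    n_good_or_better = sum(1 for q in ranks if q >= 2)
    if 3 in ranks or n_good_or_better >= 2:
        level = 3  # strong evidence
    elif n_good_or_better == 1 or len(ranks) >= 2:
        level = 2  # moderate evidence
    else:
        level = 1  # limited evidence
    return sign * level


# Hypothetical input: two studies of fair quality, both with positive results.
print(best_evidence([("fair", "+"), ("fair", "+")]))  # prints "++"
```

Under these assumed rules, a single study of poor quality yields ‘?’ regardless of its result, which matches the way unknown evidence is described for studies of poor methodological quality.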

Results

Search 1: identification of outcome domains and instruments

In 26 of the 27 studies identified by search 1 (Fig. 1), the authors exclusively used outcomes that were recommended or selected as core outcome domains in the OVAMA e-Delphi study [6], albeit inconsistently across studies, measuring at least one of the following: adverse events [100% of studies], radiological assessment [56%], appearance [52%], patient-reported symptoms including pain [37%], patient satisfaction with treatment and/or outcome [26%], physician-reported signs [15%], HRQoL [15%] and recurrence [7%]. In one study, ‘healthcare costs’ was the primary outcome, categorized under the domain practical issues [4%] [18], which was not selected as a core domain in the OVAMA e-Delphi study. All instruments used for each outcome domain are listed in Table 2. None of these were disease-specific instruments for vascular malformations. Published ‘named’ instruments were only available for the assessment of HRQoL. Instruments for other outcome domains were unnamed questionnaires created for one-time use by the authors of the respective study.

Fig. 1

Flowcharts of study selection. Search 1 (left): the identification of instruments previously used in prospective studies on vascular malformations. Search 2 and 3 (right): the identification of development and/or validation studies of disease-specific instruments for vascular malformations (II), and development and/or validation studies for instruments previously used in prospective studies on vascular malformations (III)

Table 2 All instruments used in prospective studies on vascular malformations, categorized per outcome domain

Searches 2 and 3: evaluation of the quality of the identified instruments

The searches for development and validation studies on instruments used for vascular malformations yielded 4446 articles (vascular malformations n = 3170; similar diseases n = 1276). The major reasons for exclusion were failure to report on measurement properties and a focus on unrelated health conditions. Twenty-two studies were included [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40], evaluating six different instruments (Fig. 1), of which only one was a disease-specific instrument for vascular malformations. The other included instruments were four generic instruments developed or partly validated for other clinically similar diseases (two including a disease-specific module) and one disease-specific instrument for varicose veins, all previously used in vascular malformation studies. Characteristics of the selected instruments and study populations are presented in Tables 3 and 4, respectively. The methodological quality of the studies, the quality of the measurement properties and the best evidence synthesis are presented in Table 5. Details on the COSMIN ratings and the evaluation of the measurement properties per included study can be found in Online Resource 6.

Table 3 Characteristics of the identified ‘named’ patient- and physician-reported outcome measurement instruments that were developed or previously used for vascular malformations, for which studies on measurement properties were available
Table 4 Characteristics of the test populations of the included studies
Table 5 Methodological quality, quality assessment of measurement properties and best evidence synthesis per instrument and per disease

Lymphatic malformation function instrument (LMF)

The LMF is a disease-specific questionnaire to assess functional and clinical signs and subsequent impact on daily life in pediatric patients with cervicofacial LMs. We found strong evidence for adequate content validity [40]. Because it was not described how missing values were handled, there was limited evidence for adequate internal consistency, for inadequate structural validity, and for inadequate hypotheses testing. The evidence for test–retest reliability was unknown, as the analysis was only performed in seventeen patients and therefore the methodological quality of the study on reliability was poor [27]. Data on interpretability, responsiveness to changes over time or feasibility were lacking.

20-item chronic venous insufficiency QoL questionnaire (CIVIQ-20)

The CIVIQ-20 is a disease-specific HRQoL questionnaire for patients with chronic venous insufficiency (CVI), which was validated in patients with CVI [22, 24,25,26, 29, 30, 37] and in (isolated) varicose veins [39]. In CVI, moderate evidence was found for adequate internal consistency and reliability. The evidence for measurement error was rated ‘indeterminate’, as the minimal important change was not defined [22], although this is crucial for concluding if changes in score can be attributed to true changes in the construct. For chronic venous insufficiency, there was moderate evidence for adequate content validity and structural validity.

The evidence for adequate hypotheses-testing and responsiveness was rated as moderate (CVI) and limited (varicose veins), as the statistical methods used were suboptimal. Ceiling effects were found in three items, which may limit the instrument’s ability to detect changes in health. The number of missing items was low (0–9.4%).

Short-Form-36 health survey version 2 (SF36v2)

The SF-36 is a generic HRQoL questionnaire [20, 21, 28, 32,33,34,35,36, 38]. Evidence for adequate internal consistency varied from unknown (venous leg ulcers and lymphedema of the lower limb) [20, 35, 36], limited (varicose veins) [21, 34] to strong (deep venous thrombosis (DVT)) [38]. The evidence for adequate reliability was considered limited for varicose veins and moderate for DVT, as information on missing items was lacking. The evidence for structural validity could not be interpreted due to the lack of information about explained variance [34, 38]. Finally, limited to moderate evidence for adequate responsiveness was found in lymphedema of the lower limb [35] and varicose veins [32, 33], respectively. There was conflicting evidence for venous leg ulcers [20, 28, 36], and in DVT the evidence was not interpretable since predefined hypotheses about the expected results were lacking [38]. In the studies describing interpretability [20, 35], floor- and ceiling effects were found in three of eight dimensions. Feasibility aspects were not reported.

EuroQoL-5-dimensions (EQ-5D)

The EQ-5D is a generic HRQoL questionnaire, for which only three validation studies were performed in our target populations. In venous leg ulcers, limited evidence was available for inadequate hypotheses-testing for construct validity, as no predefined hypotheses were stated for the expected correlations [31], whereas in lymphedema of the lower limb, limited evidence was found for adequate hypotheses-testing validity [35]. The evidence for adequate responsiveness was moderate in venous leg ulcers and limited in lymphedema of the lower limb, because the statistical methods applied were suboptimal or not appropriate [28, 31, 35]. No floor or ceiling effects were found [35]. Feasibility aspects were not reported.

Pediatric QoL inventory (PedsQL) -NF1, adults

The PedsQL-NF1 is a disease-specific HRQoL questionnaire for patients with neurofibromatosis type 1 [23]. There was strong evidence for adequate internal consistency and moderate evidence for adequate content validity. Evidence for structural validity is unknown because no explained variance was presented in the included study [23]. Evidence for adequate hypotheses-testing validity was rated limited, as hypotheses were formulated in retrospect and not predefined. Feasibility, measured by the percentage of missing responses, was 4.8% for all subscales [23]. Interpretability was not assessed.

PedsQL-NF1, children, adolescents and young adults

Finally, strong evidence for adequate content validity was found for the PedsQL-NF1 in children, adolescents and young adults [19, 23]. Information on other measurement properties is lacking.

Discussion

All eight outcome domains which were agreed on as the core outcome domains in the OVAMA e-Delphi study were assessed in previous prospective studies. However, the outcome domains are not measured consistently throughout studies, and the instruments used differ markedly as well.

The LMF is the only partially validated disease-specific instrument available that has been developed as a composite instrument to assess signs and their subsequent impact on daily life in pediatric patients with cervicofacial LMs. We found strong evidence for adequate content validity in patients with cervicofacial lymphatic malformations, which is promising. Yet, future studies in larger samples should further investigate the internal consistency, reliability, measurement error, structural validity, hypotheses-testing for construct validity, cross-cultural validity and responsiveness of this instrument. The LMF was only published recently and has not yet been used in prospective studies [27]. Responsiveness has not been investigated, although it is a crucial aspect when evaluating treatment outcome. A disadvantage is that the instrument is applicable only to a subset of pediatric patients, namely those with cervicofacial LMs. To broaden its applicability, further validation of this instrument in other types of cervicofacial vascular malformations would be necessary.

The other included validation studies only concerned HRQoL instruments that were either generic or developed for other conditions with clinical similarities to vascular malformations. Since these instruments were not validated for vascular malformations, we evaluated the measurement properties of these instruments in similar clinical populations. The generalizability of these results may be debatable, but it reflects the best available evidence for exploring which instruments show the greatest promise for use in vascular malformation research.

For assessing HRQoL, the SF-36 is a promising measure for adult patients, as its measurement properties are well investigated in diseases that are clinically similar to vascular malformations. Fewer validation studies in smaller patient populations with clinically similar diseases were available for the EQ-5D. The SF-10, FACT-G and FACIT were previously used for vascular malformations, but have not been validated for this condition or a similar disease. They may be equally applicable to vascular malformations in terms of item relevance, comprehensiveness and comprehensibility, but to date there is no evidence to support this. For children with vascular malformations, the PedsQL was the only HRQoL instrument used. This instrument was investigated in children and parents with neurofibromatosis and seems favorable with regard to the measurement properties analyzed so far [19, 23].

The CIVIQ-20 [41] has adequate measurement properties for patients with venous insufficiency and therefore it may be worthwhile to further investigate this instrument for vascular malformations of the lower extremities. This may also apply to other instruments for similar diseases that have not yet been used for vascular malformations, like VEINES-QoL [42] or the Nottingham Health Profile [43] for varicose veins. However, the face and content validity of these instruments may be suboptimal for capturing the most important aspects for patients with vascular malformations, or may only be applicable to a specific type or location of the vascular malformation.

Interestingly, all included validation studies used classical test theory (CTT) methods, as opposed to item response theory (IRT). In CTT, measurement properties are assessed at the instrument level and depend on the items and study sample used, whereas IRT has an item-level focus [44]. The individual validation of items in IRT-based instruments facilitates computer-adaptive testing, in which the items presented adapt to the respondent’s previous answers [45], and linkage with existing item banks such as the ‘Patient-Reported Outcomes Measurement Information System’ (PROMIS) [46].

There were no validation studies available for instruments measuring the patient- or physician-reported core domains for vascular malformations (pain, overall severity of symptoms, and patient satisfaction with treatment and outcome), nor for the recommended domains appearance and recurrence. Radiologic assessment of the vascular malformation fell outside the scope of this study, as we focused on patient- and physician-reported instruments. Further research is required to determine which radiologic imaging modalities are suitable for measuring treatment outcome.

It seems necessary to develop a new disease-specific instrument for vascular malformations, or a disease-specific attribution module that can be used alongside a generic instrument, to adequately cover all previously established core domains. In general, disease-specific instruments are also more likely to pick up small differences in quality of life caused by disease burden than the broad generic instruments.

Conclusion

This study provides information on the available evidence for the quality of patient- and physician-reported outcome measurement instruments that have been developed or have previously been used for peripheral vascular malformations. The LMF is the only available disease-specific instrument for assessing signs and life impact in children with cervicofacial LMs. The identified generic HRQoL instruments, of which SF-36 (for adults) and PedsQL (for children) seem the most widely applicable, most investigated and promising in terms of measurement properties, may be used but it remains unclear if these instruments are responsive to treatment-induced changes in health in patients with vascular malformations. Further research into measurement properties may therefore be necessary to assess if the instruments that were identified in this systematic review are suitable for inclusion in the COS. It is likely that new disease-specific instruments need to be developed to adequately cover all core domains for vascular malformations.

The results of this review will be used as input for the future consensus meeting with all stakeholders aiming to reach consensus on the core outcome measurement instruments.