Introduction

Breast cancer is the most common cancer among women worldwide. According to the 2020 Global Cancer Burden data released by the International Agency for Research on Cancer (IARC), the number of newly diagnosed breast cancer cases in Chinese women surpassed that of lung cancer for the first time, making breast cancer the leading newly diagnosed cancer among women in China1. Many countries have established national systems using a framework of process and outcome indicators to monitor the performance of healthcare institutions and improve breast cancer treatment2,3. For instance, under the leadership of the Dutch National Breast Cancer Working Group (NABON) and the NABON Breast Cancer Audit (NBCA), the Netherlands developed a set of indicators to assess the quality of breast cancer treatment4. In China, our research team developed a set of breast cancer care quality indicators tailored to the Chinese healthcare system, reflecting differences in clinical practices and healthcare systems5. These indicators are used to monitor and enhance breast cancer care quality in China, minimize management disparities across healthcare institutions, and ensure treatment quality. Despite years of quality assessment efforts, significant differences in breast cancer treatment persist between China and other high-income countries, as well as among medical institutions in different regions of China.

Public reporting of quality indicator results can promote hospitals’ engagement in quality improvement activities and reduce differences in care quality among hospitals6,7. Most studies evaluating and comparing treatment quality rely on outcome indicators because outcome-based data are relatively clear and easily accessible8,9. In recent years, however, many scholars have found that relying solely on outcome indicators such as recurrence rates cannot accurately measure the quality of care10. First, traditional outcome indicators, such as mortality and recurrence rates, have gradually declined due to recent advances in breast cancer treatment11, so the reliability of using outcome indicators alone to measure quality of care remains questionable. Second, when the number of patients and incidence rates differ substantially between hospitals, especially those with small sample sizes, the statistics are unstable and the signal-to-noise ratio is small12. Differences in outcome indicators may therefore be highly contingent and may not accurately measure the quality of care in hospitals, so better indicators are needed13. Compared with outcome indicators, process indicators are more sensitive to whether patients receive appropriate care, allow more timely intervention, and are easier to measure, making them an important tool for assessing and improving quality of care. Moreover, many medical processes are closely related to improvements in patient prognosis14. A reliable process indicator of breast cancer treatment quality would help decision-makers better understand the practical benefits of the interventions they might implement. For individual patients, meaningful clinical improvements can be made within available resources, improving women’s tumor outcomes and quality of life.

In summary, despite significant advances in breast cancer treatment, assessing the quality of care in hospitals remains challenging. First, because there is no external benchmark against which to objectively compare the quality of care across hospitals, hospital care quality can only be ranked through relative comparisons between hospitals. A metric selected for comparing and publicly reporting the quality of breast cancer care should therefore be able to demonstrate differences between hospitals and reliably distinguish the quality of care of healthcare organizations15. Currently, data are lacking to support the use of any specific metric for comparing the overall quality of breast cancer care between organizations. In addition, current research has focused on outcome metrics, which often fail to provide direct and actionable insights into quality of care, largely ignoring the potential of process metrics to reflect quality of care more directly. We use rankability16 to measure the extent to which crude hospital performance comparisons reflect genuine differences in treatment quality. Our study aims to investigate whether process indicators provide a more reasonable assessment of hospital quality of care than outcome indicators.

Materials and methods

Design

This study is a retrospective analysis of data sourced from the Specific Disease Medical Service Quality Management and Control System in China. It includes 113,942 breast cancer patients treated in 298 hospitals between January 2019 and April 2023. The study aims to evaluate the effectiveness of process indicators compared to outcome indicators in assessing the quality of breast cancer care. To achieve this, we calculated the rankability of 11 process indicators and used these as weights to create a new composite indicator. The indicators were then compared using the observed-to-expected (O/E) ratio categories. Additionally, a sensitivity analysis was conducted using bootstrap sampling to determine the impact of different years on the results.

Study population

The data for this study were obtained from the Specific Disease Medical Service Quality Management and Control Systems in China. This system aims to collect and manage quality indicator (QI) data and serves as the receiving end of a nationwide, web-based, voluntary quality reporting platform for hospitals at all levels, which has been operational since 2009. The system collects retrospective data from secondary and tertiary hospitals across 31 provinces, autonomous regions, and municipalities in China. Studies have shown that this data system covers 74% of tertiary hospitals in China17. The number of single diseases monitored by the system has gradually increased from the initial 6 to 51, with breast cancer data being included in the system's collection scope since 2019. Hospitals voluntarily apply for monitoring system accounts and report data based on provincial evaluation requirements or the focus areas of hospital quality management. Data are reported by trained clinical data collectors using standardized definitions. Each participating hospital designates a surgeon or nurse responsible for reporting data through secure online forms. From this dataset, demographic and clinical characteristics relevant to this study were selected.

A cohort of patients with breast cancer diagnosed between January 1, 2019, and April 30, 2023, was identified using International Classification of Diseases, 10th revision (ICD-10) codes. We restricted the sample to hospitals with more than 50 patients that continuously reported data from 2019 to 2023. The final sample included 113,942 breast cancer patients from 298 hospitals. The exclusion criteria are shown in eFigure 1. The need for ethical approval and informed consent was waived by the ethics committee because the data were completely anonymous and collected for hospital nursing quality management. All methods were performed in accordance with the relevant guidelines and regulations.

Definition of evaluation indices

Our research team developed a set of breast cancer care quality indicators tailored to the differences in clinical practices and healthcare systems in China. These indicators are designed to fit the Chinese healthcare system and are used to monitor and enhance the quality of breast cancer care. Additionally, countries with medical systems and professional cultures similar to China's can share these quality assessment and improvement indicators. Process indicators depict how clinicians provide care to patients, covering a wide range of activities from diagnosis to follow-up, and indicate whether appropriate care has been provided to eligible patients. We identified 11 process indicators to record breast cancer treatment quality. Each parameter was considered present in a particular patient when the diagnostic or therapeutic procedure was conducted and was indicated, or was not conducted and was not indicated. The outcome indicator was the proportion of patients with postoperative complications. Quality indicators are described in a numerator/denominator format (Table 1, eTable 1).

Table 1 Definition of quality indicators.

Analytical method

Rankability

We chose rankability as a summary measure of the reliability of hospital rankings, a metric commonly used in the Netherlands and proposed by van Houwelingen et al.18. When comparing the treatment quality of hospitals directly using process or outcome indicators, crude comparisons often overlook random variation among hospitals. Additionally, differences in case mix can introduce confounding bias. Both issues can produce apparent variation between hospitals and inaccuracies in evaluating hospital treatment quality, and therefore need to be adjusted for. Rankability measures the signal-to-noise ratio, where the signal represents the true differences in treatment quality between hospitals and the noise represents imprecision arising from factors such as hospital sample sizes.

Rankability calculation is based on two components, variation within and between hospitals, computed using Fixed Effects (FE) and Random Effects (RE) models, respectively. In the FE model, hospitals are considered unique entities representing only themselves. The square of the median standard error (SE) of the hospital-level variable coefficients was used to estimate the variation within hospitals, reflecting the precision of the hospital coefficients. In the RE model, where hospitals are treated as random factors, the estimates obtained are called RE estimates, also known as “empirical Bayes estimates”. Since the hospital adherence rate follows a binomial distribution, the logit function was used as the link function to explore the variation in the composite and process indicators. We classified rankability as follows: low (< 0.5), moderate (0.5–0.75), and high (> 0.75).
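The calculation described above can be sketched in a few lines. The following is an illustrative Python sketch (the study's own analyses were done in SAS and R), assuming the usual definition of rankability, \(\rho = \tau^{2}/(\tau^{2} + s^{2})\), where \(\tau^{2}\) is the between-hospital variance from the RE model and \(s^{2}\) is the squared median standard error of the hospital coefficients from the FE model:

```python
import numpy as np

def rankability(tau2, hospital_ses):
    """Rankability rho = signal / (signal + noise).

    tau2: between-hospital variance from the random-effects (RE) model.
    hospital_ses: standard errors of the hospital coefficients from the
        fixed-effects (FE) model; the square of their median estimates
        the within-hospital noise.
    """
    s2 = float(np.median(hospital_ses)) ** 2
    return tau2 / (tau2 + s2)

# Toy example: equal signal and noise gives rho = 0.5 ("moderate").
rho = rankability(tau2=1.0, hospital_ses=[1.0, 1.0, 1.0])
```

Under the thresholds above, a value of 0.5 would be classed as moderate rankability, values below 0.5 as low, and values above 0.75 as high.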

Rankability-weighting method

Breast cancer treatment processes are influenced by numerous factors, making it challenging to evaluate them using a single process indicator. It is common to combine individual indicators into a comprehensive performance metric. We developed a new method for calculating comprehensive indicators. The rankability-weighting method is a new comprehensive indicator weighting method proposed in this study. This method is conceptually similar to the coefficient of variation method. It utilizes each indicator’s rankability (\(\rho\)) value as its weighting value, considering \(\rho\) as a coefficient measuring the amount of information. The \(\rho\) value can reflect the ability to differentiate hospital treatment quality differences. A larger \(\rho\) value indicates a greater relative degree of variation, implying that the indicator carries more information and, consequently, holds a higher weight. Conversely, if the variation is small, indicating that the indicator cannot effectively differentiate treatment quality differences between hospitals, the information it carries is lower; thus, its weight is smaller. Compared to traditional weighting methods, such as objective weighting, the coefficient of variation method, and the mean square deviation method, which determine weights based on the internal variability of indicators, the rankability-weighting method considers both inter-and intrahospital variations. It assesses the magnitude of assignment based on the degree of treatment quality variation at the hospital level. The calculation formulas are as follows:

$$\omega_{n} = \frac{{\rho_{n} }}{{\rho_{1} + \rho_{2} + \cdots + \rho_{n} }}$$
(1)
$$Score = \frac{{\omega_{1} \cdot \sum {Z_{1} + } \omega_{2} \cdot \sum {Z_{2} + \cdots + \omega_{n} \cdot \sum {Z_{n} } } }}{{\omega_{1} \cdot \sum {M_{1} + } \omega_{2} \cdot \sum {M_{2} + \cdots + \omega_{n} \cdot \sum {M_{n} } } }}$$
(2)

where \({\omega }_{n}\) represents the weight calculated from the indicator’s \(\rho\) value; Z and M denote the numerator and denominator of the indicator, respectively; and the subscripts \(\text{1,2},\dots ,n\) index the indicators.

As the composite scores are continuous variables, a linear model was applied, with no need to transform the dependent variable onto a logit scale. After the composite scores were transformed into binary variables based on the median of the scores, their rankability \(\rho\) was computed in the same way as for the individual indicators.
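Equations (1) and (2) can be sketched as follows. This is an illustrative Python sketch only (the indicator values are hypothetical, and the study's own computations were performed in SAS and R):

```python
import numpy as np

def rankability_weights(rhos):
    """Eq. (1): each indicator's weight is its rho value divided by
    the sum of all indicators' rho values."""
    rhos = np.asarray(rhos, dtype=float)
    return rhos / rhos.sum()

def composite_score(weights, numerators, denominators):
    """Eq. (2): weighted sum of indicator numerators divided by the
    weighted sum of indicator denominators for one hospital."""
    w = np.asarray(weights, dtype=float)
    Z = np.asarray(numerators, dtype=float)
    M = np.asarray(denominators, dtype=float)
    return (w * Z).sum() / (w * M).sum()

# Hypothetical hospital with two indicators (rho = 0.9 and 0.3):
w = rankability_weights([0.9, 0.3])              # -> [0.75, 0.25]
score = composite_score(w, [60, 40], [100, 100])  # -> 0.55
```

Because the weights sum to one, the score reduces to a simple adherence rate when all indicators have the same denominator and numerator, and high-\(\rho\) indicators dominate otherwise.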

Indicator performance comparison

The O/E ratio is the observed-to-expected ratio, commonly used in medicine and public health to compare observed outcomes with expected outcomes. It is often employed to compare the observed and expected outcomes of hospitals or other healthcare units and to evaluate their performance. Based on the O/E ratios, we classified hospitals performing better than expected as the “better group” (O/E > 1) and those performing worse than expected as the “worse group” (O/E < 1). We classified hospitals by their O/E ratios for the process and outcome indicators, enabling a comparison of the ability of the two types of indicators to describe hospital performance.
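The classification rule can be sketched as follows. This is a minimal illustrative sketch; in practice, the expected count comes from a case-mix-adjusted model, which is not shown here:

```python
def classify_hospital(observed, expected):
    """Assign a hospital to the 'better' group (O/E > 1) or the
    'worse' group (O/E < 1) based on the observed vs. expected
    number of patients meeting the indicator."""
    oe = observed / expected
    if oe > 1:
        return "better"
    if oe < 1:
        return "worse"
    return "as expected"

# Hypothetical hospital: 92 adherent patients observed, 85 expected.
group = classify_hospital(92, 85)   # -> "better"
```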

Sensitivity analysis

The number of hospitals differed by year. To identify the impact of these changes on the interpretation of the data, we performed bootstrap sampling, randomly selecting 50% of the reporting hospitals each year; sampling was replicated 500 times. The stability of the RE and FE model estimates was visualized by plotting graphs for the sampled hospitals. All statistical analyses were conducted using SAS 9.2 (SAS Institute Inc., Cary, NC, USA) and R (version 4.4.0, R Foundation for Statistical Computing, Vienna, Austria).
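The resampling scheme can be sketched as follows. This is an illustrative Python sketch rather than the original SAS/R code; for simplicity, the between-hospital variance of crude adherence rates stands in for the model-based \({\tau }^{2}\):

```python
import numpy as np

def bootstrap_between_hospital_variance(rates, n_reps=500, frac=0.5, seed=0):
    """Draw frac (here 50%) of hospitals without replacement, n_reps
    (here 500) times, and recompute the between-hospital variance of
    the adherence rates for each replicate."""
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    k = max(2, int(frac * len(rates)))
    return np.array([
        rng.choice(rates, size=k, replace=False).var(ddof=1)
        for _ in range(n_reps)
    ])

# Hypothetical adherence rates for eight hospitals:
estimates = bootstrap_between_hospital_variance(
    [0.95, 0.88, 0.70, 0.92, 0.60, 0.85, 0.78, 0.99])
```

Plotting the replicate estimates over the study years then shows whether they form a tight cluster, which is the stability check used here.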

Results

Population and hospital characteristics

A total of 113,942 patients who underwent breast cancer treatment in 298 hospitals between January 1, 2019, and April 30, 2023, were evaluated. The patient and hospital characteristics are presented in Table 2. The average age was 53.19 ± 11.26 years, with a higher proportion of women aged over 46 years (75.34%). In terms of medical insurance, the majority utilized urban residents’ reimbursement methods. The patients primarily accessed treatment through outpatient services. Most hospitals were public institutions, predominantly tertiary level A hospitals, with specialized hospitals accounting for 33.8% (Table 2).

Table 2 Patient and hospital characteristics.

Adherence to performance measures of hospitals

Large gaps were observed in adherence rates and in the magnitude of change across indicators. Figure 1 presents adherence rates for each indicator at each hospital. The completion rate for recording tumor size in the pathology report (QI11) was the highest, with an average of 98.13%. The proportion of patients with breast cancer who underwent breast-conserving surgery (QI4) was the lowest (17.35%). The proportion of patients treated with trastuzumab in whom heart function was monitored every three months (QI7) varied the most among hospitals, with a standard deviation of 30.08%. Adherence rates for QI5, QI7, QI8, and QI10 spanned the full range from 0 to 100% across hospitals (eTable 2).

Figure 1
figure 1

Adherence to performance measures of hospitals. This graph shows adherence rates for the process indicators (QI1–QI11) and the outcome indicator across hospitals. Each radial plot represents an indicator, and each point on the plot corresponds to the adherence rate for that indicator at a particular hospital. Concentric circles, from innermost to outermost, represent adherence rates from 0 to 100%.

The rankability of indicators and the rankability-weighting composite score

To compute the rankability of all the process indicators, the FE and RE models were adjusted for age, payment method, hospital grade, and hospital category as covariates. The highest \(\rho\) value was for QI1 (\(\rho\) = 0.919), indicating high rankability. The outcome indicator had very low rankability (\(\rho\) = 0.011). Although adherence rates for QI7 varied widely between hospitals, the rankability of this indicator was relatively low owing to substantial within-hospital uncertainty. Therefore, it is not advisable to directly equate the magnitude of variation in adherence rates with differences in the quality of care between hospitals.

The composite scores were calculated using the novel approach introduced in this study, which incorporates the rankability of the included indicators as weights. The composite scores obtained with this method were converted into binary variables based on the median values, and the rankability of the binary composite score was 0.883, indicating that the rankability-weighted composite score has high rankability (Table 3).

Table 3 Rankability and weighting factor of each indicator.

Association between O/E ratio and hospital performance categories

The most commonly used performance metric for hospital performance comparisons is the O/E ratio, which categorizes each hospital’s performance. Many hospitals received different performance assessments when classified using the outcome indicator versus the composite score (Figure 2, Table 4). The number of hospitals in the better group according to the composite score and QI1 was greater than that according to the outcome indicator. This means that more hospitals performed better than expected when process indicators were used to categorize hospital performance. In addition, compared with the classification based on the outcome indicator, 113 hospitals (37.92%) received different performance classifications under the composite score and 140 (46.98%) under QI1. The results in Table 4 also show that more hospitals shifted to more favorable categories when high-rankability process indicators, rather than the outcome indicator, were used to evaluate hospital performance.

Figure 2
figure 2

Association between hospital volume and observed-to-expected (O/E) ratio. This scatterplot illustrates the distribution of hospitals based on the O/E ratios for the hospital composite indicator, QI1, and the outcome indicator. x-axis: the O/E ratio, where a value greater than 1 indicates better-than-expected performance and a value less than 1 indicates poorer-than-expected performance. y-axis: the number of cases in a hospital. Red dots: hospitals with O/E ratios less than 1, categorized in the “worse group,” indicating that these hospitals performed worse than expected. Blue dots: hospitals with O/E ratios greater than 1, categorized in the “better group,” indicating that these hospitals performed better than expected.

Table 4 Comparison of hospital classifications.

Bootstrap sampling method

Given the different numbers of hospitals in different years, we estimated the trends of \({\tau }^{2}\) (RE model) and \({\sigma }^{2}\) (FE model) using the bootstrap method. The \({\tau }^{2}\) trends of most indicators were relatively stable and approximately linear, with the trends of the 500 samples forming a single cluster (Figs. 3, 4), while the \({\sigma }^{2}\) trends of most indicators formed one or two clusters (eFigures 2, 3).

Figure 3
figure 3

QI estimates from the RE model (\({\tau }^{2}\)) by the bootstrap method. This graph shows the distribution of \({\tau }^{2}\) estimates for each quality indicator (QI) over 500 bootstrap samples. x-axis (QI): the quality indicators, numbered 1 to 11. y-axis (\({\tau }^{2}\)): the \({\tau }^{2}\) estimates, which reflect the differences between hospitals. Red dots: the full-sample \({\tau }^{2}\) value for each quality indicator. Blue dashes: the median estimated \({\tau }^{2}\) for each quality indicator. The proximity of the red dots to the blue dashes confirms the robustness and consistency of the bootstrap sampling results.

Figure 4
figure 4

Trends in the QI RE model estimates (\({\tau }^{2}\)) from 2019 to 2023 by the bootstrap method. This graph shows the trend in the estimated \({\tau }^{2}\) of each quality indicator (QI) from 2019 to 2023 based on 500 bootstrap samples. Each line represents the \({\tau }^{2}\) estimates of one bootstrap sample for an indicator across years. The clustering of the lines for each quality indicator suggests that the bootstrap sampling results are robust and stable over time.

Discussion

This study aimed to quantify the reliability of process indicators for evaluating the quality of breast cancer treatment and to identify a process indicator that could measure high-quality breast cancer care in healthcare institutions. We used rankability to assess the reliability of multiple process indicators, an outcome indicator, and a composite score for ranking hospitals. This population-based exploratory study of patients with breast cancer demonstrated that process indicators are more reliable than the traditional outcome indicator for ranking healthcare institutions on quality of care. We found that among individual process indicators, preoperative cytology or histological examination was the best indicator of differences in care quality among hospitals. In addition, our proposed composite score based on the rankability-weighting method is an equally reliable measure of treatment quality.

Our study demonstrates that process indicators are more effective than outcome indicators in assessing the quality of breast cancer care in hospitals. This finding is significant for clinical practice and healthcare quality improvement. First, process indicators provide immediate and actionable insights into care quality. This aligns with previous studies, which have shown that process indicators can assess care quality and reveal differences between treatment centers19 and are important for improving clinical outcomes and enhancing patient safety20,21,22. In particular, process indicators are more sensitive to short-term changes in quality, supporting the work of N. E. Kolfschoten et al., who emphasized the responsiveness of process indicators in quality assessment23. Second, our results indicate significant variability in adherence rates among hospitals. Implementing standardized process indicators can help mitigate these disparities, ensure consistent care quality, and ultimately lead to better patient outcomes24. Among all process indicators, the proportion of breast cancer patients diagnosed by cytology and/or histology before surgery (QI1) had a high \(\rho\) value and can serve as a valid, rankable indicator of the quality of care in healthcare organizations.
We believe several factors explain the between-hospital differences in the use of preoperative cytology and histology. First, these examinations usually require specialized equipment and trained personnel; the experience of the operator and interpreter affects the speed and accuracy of the pathology results, some healthcare institutions may lack these resources, and the uneven distribution of medical resources prevents widespread implementation of this test25. Second, the current standard non-surgical diagnosis of breast lesions involves a “triple assessment” comprising physical examination, imaging (mammography and/or ultrasound), and pathology (cytology and histology)26; clinical procedures and standards differ across hospitals, and some hospitals may not include this test in their standard operating procedures, which may affect its utilization rate. Third, some patients may lack sufficient knowledge about this test and prefer non-invasive methods such as ultrasound, so doctors determine its use according to the patient’s needs.

This study introduced a novel objective weighting method for composite indicators using the rankability-weighting method. Unlike traditional methods, such as the coefficient of variation, the rankability-weighting method comprehensively considers variations between and within hospitals27. It determines the weight of each indicator based on the extent of treatment quality variation at the hospital level. It quantifies the differences caused by significant fluctuations in indicator occurrence rates owing to small sample sizes or variations in the case mix28. The rankability-weighting method posits that the rankability coefficient of an indicator represents the extent of treatment quality variation between hospitals. The higher the rankability coefficient (\(\rho\)), the more it reflects differences in treatment quality between hospitals. Consequently, indicators with higher \(\rho\) values have greater weight, indicating they carry more information. This was further reflected in the measurements of the composite indicators.

A major strength of our study is the large sample size, which enhances the generalizability of our findings. A new objective weighting method for composite indicators was introduced. In addition, the use of a robust statistical method, bootstrap sampling, enhances the reliability of the results. However, there are some limitations to consider. As a retrospective study, our analysis was subject to inherent biases in data collection and reporting. Reliance on data from the Specific Disease Medical Service Quality Management and Control Systems in China may limit the applicability of our findings to other healthcare settings. Different countries have unique healthcare systems, patient populations, and cultural backgrounds, which may affect the implementation and validity of these indicators. Therefore, applying these indicators in a wider range of settings requires careful consideration of their cultural adaptation. In addition, the randomness of repeated sampling may introduce resampling bias, especially with large data volumes, so the skewness of the sampling distribution inevitably increases (Figs. 3, 4, eFigures 2, 3).

Future work

To further validate and expand our findings, future research should focus on assessing the effectiveness and adaptability of process indicators across different countries and cultural contexts. It is essential to explore how these indicators can be adjusted to fit various clinical practices and healthcare systems, and investigate effective strategies for implementing these process indicators in diverse healthcare settings. Developing related policies and guidelines to ensure broad acceptance and application of these indicators is also crucial. Additionally, the application of indicators should be viewed more as a dynamic process than a static one. As adherence to certain indicators becomes sufficiently uniform, it is important to transition to new indicators to continuously drive quality improvement. This dynamic approach helps ensure that we are always assessing and improving the most relevant aspects of care. Therefore, future research should focus on developing and introducing new process indicators to address the evolving clinical practices and patient needs. While our study primarily focuses on short-term effects, future research should also evaluate the impact of process indicators on long-term patient outcomes and overall care quality.

Conclusion

Our findings indicate that process indicators are more appropriate than outcome indicators for assessing hospital breast cancer care quality, particularly in the short term, as they offer a more immediate and actionable reflection of care quality. By focusing on process indicators, healthcare providers can better identify specific areas for improvement and implement targeted interventions, driving continuous quality improvement efforts.