Introduction

This paper examines flight schedules, defined as the entitlement of an aircraft to use specific infrastructure and services at a predetermined date and time for arrival at or departure from a given airport. As a fundamental and highly valued public resource for airline operations, flight schedules are especially critical at busy airports, namely primary and secondary coordinated airports1. These airports have moved from a single, uniform cap on schedule capacity (also known as coordination parameters or declared capacity) to publishing hourly coordination parameters throughout the day. These parameters delineate the number of flight schedules permissible within specific 1-h or 15-min intervals, typically expressed as the planned count of takeoffs and landings for each coordination interval2.

In practice, coordination parameters set the upper limits for schedule allocation at airports, embodying the supply side’s assurance capability. Despite formal adherence to the schedule parameters set by authorities, structural delays are common in actual operations. From a scheduling perspective, the root cause of these delays is often the disparity between the timetable used for capacity assessment and the actual flight schedule after capacity approval, even in a stable operational environment. This discrepancy, sometimes as high as 10–30%, significantly affects strategic management and tactical traffic-flow control decisions3.

Single-airport schedule allocation is fundamentally a resource-constrained distribution problem, predominantly managed through administrative means grounded in operations research. Yet establishing a robust declared capacity for an individual airport remains a persistent challenge: the declared capacity must balance maximizing capacity utilization against maintaining service quality, such as minimizing flight delays or ensuring punctuality. In practice, this means incorporating weather-related flight delays into the schedule coordination parameters, thereby obtaining parameters that minimize delays and enhance efficiency.

Traditionally, there are three primary methodologies for predicting flight delays: statistical inference4, simulation5 and network modeling6, and machine-learning-based approaches7. While statistical inference handles complex data structures and simulation provides valuable operational insight, machine learning offers stronger predictive capability and efficiency8: it thrives in big-data environments and delivers high prediction accuracy. Notably, most delay-prediction research presupposes a fixed schedule profile, but determining a practical flight schedule profile that meets a given service level is an inverse problem in itself9.

This paper aims to unravel the intrinsic relationship between delays and schedule profiles, gleaning insights from historical flight data and integrating weather variables. In a stable operational scenario, we explore the range of profile distribution from an operational efficiency viewpoint and propose an optimized profile structure. This approach not only enhances the understanding of schedule dynamics but also contributes to more effective and efficient airport schedule management.

Algorithm Introduction

K-means clustering algorithm

The K-means clustering algorithm partitions n samples into k distinct clusters based on their similarity. Its guiding principle is that samples within each cluster should be highly similar to one another while remaining less similar to samples in other clusters10. This effectively groups data points that share common characteristics, enabling a more organized and insightful analysis. The algorithm flowchart is shown in Fig. 1:

Figure 1

K-means clustering algorithm.
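As a minimal sketch of this step (hypothetical two-dimensional data; clustering actual flight counts follows the same pattern), K-means can be run as follows:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical samples: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# Partition the n samples into k clusters by minimizing within-cluster variance.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_            # cluster assignment of each sample
centers = km.cluster_centers_  # centroid of each cluster
```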

Random forest classification and regression model

Random forest (RF) is an extension of the Bagging (Bootstrap AGGregatING) algorithm and belongs to the family of parallel ensemble learning methods. It builds on the Bagging ensemble, using decision trees as its base learners. What sets random forest apart is the random selection of attributes during decision-tree training; this enhances the algorithm’s ability to handle diverse datasets and improves its predictive accuracy and robustness11,12,13.

Classification model

  1. Input sample set:

    $$D = \left\{ {\left( {x_{1} ,y_{1} } \right),\left( {x_{2} ,y_{2} } \right), \cdots ,\left( {x_{m} ,y_{m} } \right)} \right\}$$
    (1)
  2. Perform the t-th round of bootstrap sampling on the training set containing multiple feature items, drawing \(m\) samples to obtain the sample set \(D_{t}\). Train the t-th decision tree model \(G_{t} \left( x \right)\) on \(D_{t}\), splitting training nodes into left and right subtrees.

  3. Classification typically employs a majority voting method to determine the final result:

    $$H(x) = \arg \mathop {\max }\limits_{Y} \sum\limits_{i = 1}^{k} {I\left( {h_{i} \left( x \right) = Y} \right)}$$
    (2)

    where \(H\left( x \right)\) is the final output of the classification model, \(h_{i} \left( x \right)\) is the output of a single decision tree, and \(Y\) is a candidate value of the output variable. The indicator \(I\left( {h_{i} \left( x \right) = Y} \right)\) equals 1 when tree \(i\) predicts \(Y\) and 0 otherwise.
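The majority-vote rule of Eq. (2) can be sketched directly; the per-tree outputs below are hypothetical:

```python
from collections import Counter

def majority_vote(tree_outputs):
    """Return the class Y maximizing sum_i I(h_i(x) = Y)."""
    return Counter(tree_outputs).most_common(1)[0][0]

# Hypothetical outputs h_i(x) from k = 5 trees for one sample:
votes = [1, 0, 1, 1, 0]
prediction = majority_vote(votes)  # class 1 wins with 3 of 5 votes
```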

Regression model

  1. Input sample set:

    $$D = \left\{ {\left( {x_{1} ,y_{1} } \right),\left( {x_{2} ,y_{2} } \right), \cdots ,\left( {x_{m} ,y_{m} } \right)} \right\}$$
    (3)
  2. Construct a binary regression tree. Select the optimal splitting variable \(h\) and splitting point \(j\)8 by solving:

    $$\mathop {\min }\limits_{h,j} \left[ {\mathop {\min }\limits_{{c_{1} }} \sum\limits_{{x_{i} \in R_{1} \left( {h,j} \right)}} {\left( {y_{i} - c_{1} } \right)^{2} } + \mathop {\min }\limits_{{c_{2} }} \sum\limits_{{x_{i} \in R_{2} \left( {h,j} \right)}} {\left( {y_{i} - c_{2} } \right)^{2} } } \right]$$
    (4)

Traverse the variable \(h\), scan the splitting point \(j\) for a fixed splitting variable \(h\), and select the \(\left( {h,j} \right)\) that minimizes the value of the above formula.

  3. Divide the region using the selected pair \(\left( {h,j} \right)\) and determine the corresponding output values \(T_{m}\):

    $$R_{1} \left( {h,j} \right) = \left\{ {x\left| {x^{\left( h \right)} \le j} \right.} \right\}$$
    (5)
    $$R_{2} \left( {h,j} \right) = \left\{ {x\left| {x^{\left( h \right)} > j} \right.} \right\}$$
    (6)
    $$T_{m} = \frac{1}{{N_{m} }}\sum\limits_{{x_{i} \in R_{m} \left( {h,j} \right)}} {y_{i} } ,\quad x \in R_{m} ,\;m = 1,2$$
    (7)
  4. Repeat steps (2) and (3) for the two subregions until the stopping criteria are met.

  5. Divide the input space into N regions \(R_{1} ,R_{2} , \ldots ,R_{N}\) and generate the decision tree.
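Steps (2) and (3) amount to an exhaustive search over the pair \((h,j)\); a minimal sketch on hypothetical data:

```python
import numpy as np

def best_split(X, y):
    """Search the (h, j) pair minimizing the summed squared error of Eq. (4)."""
    best_h, best_j, best_err = None, None, np.inf
    for h in range(X.shape[1]):            # traverse each splitting variable h
        for j in np.unique(X[:, h]):       # scan candidate splitting points j
            left, right = y[X[:, h] <= j], y[X[:, h] > j]
            if len(left) == 0 or len(right) == 0:
                continue
            # c_1, c_2 are the region means, which also give T_m in Eq. (7)
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_h, best_j, best_err = h, j, err
    return best_h, best_j

# One feature with an obvious break between 2.0 and 10.0:
X = np.array([[1.0], [2.0], [10.0], [11.0]])
y = np.array([1.0, 1.2, 5.0, 5.2])
h, j = best_split(X, y)  # splits at j = 2.0 on variable h = 0
```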

Data construction, processing, and analysis

In this study, we select and analyze actual operational data from the airport, covering 19 relevant feature items. Each record includes critical information such as the flight number, departure and arrival airports, actual takeoff time, and planned takeoff time. To illustrate the dataset, several example records are presented in Table 1. These data form the backbone of our analysis, offering a detailed snapshot of real-world flight operations.

Table 1 Example of actual operational data for some flights.

Weather data for this study was sourced from [Xihe Energy](https://xihe-energy.com/#climate), from which we selected historical weather data relevant to the airport, comprising 13 feature items. Each entry records weather conditions including the date, time, temperature, air pressure, humidity, precipitation, meridional and zonal winds, surface wind speed, wind direction, and several measures of radiation (surface horizontal, scattered, and direct)14,15. A selection of this data is presented in Table 2. This dataset is integral to our analysis: the operational data describe flight status and execution, while the weather data capture the meteorological conditions that may affect those flights.

Table 2 Some weather data examples.

The actual operational data from the airport and the corresponding weather information underwent careful data cleaning: redundant records were removed and missing values were filled in, ensuring the integrity and reliability of the dataset. Because the features differ in dimension and scale, which could influence the experimental outcomes, we applied min–max normalization. This brings all feature vectors onto a comparable scale, improving the accuracy and efficacy of our analytical models:

$$x{\prime} = \frac{{x - x_{\min } }}{{x_{\max } - x_{\min } }}$$
(8)

where \(x^{\prime}\) is the normalized value, \(x\) is the original value, and \(x_{\min }\) and \(x_{\max }\) are the minimum and maximum of the feature. Min–max normalization maps the original data onto the interval \(\left[ {0,1} \right]\).
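A minimal sketch of Eq. (8) applied column-wise (the sample values are hypothetical):

```python
import numpy as np

def min_max(x):
    """Map each feature column onto [0, 1] via (x - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# Two features with very different scales end up on the same [0, 1] scale:
scaled = min_max([[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]])
```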

Experiment and analysis

Construction of flight schedule profile

In this study, a sample analysis is performed using historical data from Harbin Taiping International Airport, covering the period from April 9, 2018 to April 4, 2023, with 118,832,507 records in total. The data span the pre-epidemic peak season, the low-traffic period during the epidemic, and the post-epidemic recovery, providing wide coverage and a realistic fit. The analysis is conducted on an hourly basis, focusing on the frequency of takeoffs and landings within each hour. This provides a detailed overview of airport activity patterns and a granular understanding of operational dynamics. The hourly counts of takeoffs and landings are presented in Table 3. This representation offers insight into the airport’s operational trends and serves as a foundation for further in-depth analyses.

Table 3 Number of inbound and outbound flights per time period.

This research investigates the distribution patterns of the airport’s arrival and departure schedule structure. To this end, we use the counts of arrivals and departures across each of the 24 h of the day as the clustering variables and apply the K-means algorithm to group the arrival and departure flights into distinct clusters. As part of this analysis, the sum of squared errors (SSE) is calculated using formula (9). This allows a nuanced understanding of the schedule distribution and helps identify operational patterns and potential areas for efficiency gains in airport scheduling:

$$SSE = \sum\limits_{j = 1}^{K} {\sum\limits_{{x \in c_{j} }}^{{}} {\left( {x - m_{j} } \right)^{2} } }$$
(9)

where \(K\) is the number of clusters, \(c_{j}\) is the set of data samples within cluster \(j\), and \(m_{j}\) is the centroid of cluster \(c_{j}\), that is, the mean position of the cluster. The line graph of the number of clusters versus the SSE value is shown in Fig. 2:

Figure 2

K-means clustering number and SSE value line graph.
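The SSE-versus-K curve behind Fig. 2 can be reproduced in outline as follows (hypothetical data with four true clusters; scikit-learn’s `inertia_` attribute is exactly the SSE of Eq. (9)):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D data containing four well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (30, 2)) for c in (0, 4, 8, 12)])

# inertia_ is the sum of squared distances of samples to their centroids (SSE).
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 9)]
# The "elbow" is the K at which successive drops in SSE flatten out.
```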

In accordance with the “elbow method,” as depicted in Fig. 2, the SSE value progressively diminishes as the number of clusters increases, but the rate of decrease visibly slows. Beyond a certain point, the reduction in SSE slows markedly, indicating an optimal clustering balance. This trend shows that choosing 4 clusters strikes a judicious equilibrium between clustering effectiveness and model complexity, yielding four distinct types of arrival and departure flight schedule structures. The average values of these structures are illustrated in Fig. 3:

Figure 3

Four clustering results and their average values of time profile structure in a day.

Analyzing the line graph depicting the hourly arrival and departure schedule structure post-clustering, distinct patterns emerge among the clusters. Cluster 3 is characterized by consistently higher flight frequencies, whereas Cluster 2 exhibits lower frequencies. Meanwhile, Clusters 1 and 4 demonstrate moderate frequencies. Taking into account the historical time span of the data, these four distinct schedule profiles encapsulate the characteristic flight schedules of Harbin Airport. They effectively represent the varying operational phases: the pre-pandemic peak season, the subdued period during the pandemic, and the distinct non-pandemic winter-spring and summer-autumn flight seasons. This analysis not only highlights the adaptability of the airport’s scheduling but also provides valuable insights into its response to varying seasonal and extraordinary circumstances.

Random forest flight delay prediction model

Classification model

Here we define a specific hyperparameter search space and proceed to construct a random forest classifier. To fine-tune the model for optimal performance, we employ the Optuna framework for hyperparameter optimization16. The best combination identified through this process includes a configuration of 100 trees in the forest, a maximum tree depth of 5, a minimum of 4 samples required at leaf nodes, and a minimum of 3 samples required to split a node.

To assess the model’s predictive capability, we plot the receiver operating characteristic (ROC) curve. This curve is an essential tool for visualizing the performance of binary classifiers, plotting the true positive rate (sensitivity) on the y-axis against the false positive rate (1 − specificity) on the x-axis. The ROC curve is constructed over a range of cut-off points (decision thresholds), enabling a clear view of the method’s accuracy.

The model’s effectiveness is further quantified by the area under the curve (AUC), the integral area under the ROC curve. An AUC above 0.5 indicates better-than-random performance, with values approaching 1 denoting higher precision: values between 0.5 and 0.7 represent low accuracy, between 0.7 and 0.9 moderate accuracy, and above 0.9 high accuracy. The AUC can be calculated with the following formula, providing a reliable metric of the model’s efficacy:

$$AUC = \frac{1}{2}\sum\limits_{i = 1}^{m - 1} {\left( {x_{i + 1} - x_{i} } \right)\left( {y_{i} + y_{i + 1} } \right)}$$
(10)

where \(x_i\) is the false positive rate and \(y_i\) the true positive rate at the i-th threshold. The ROC curves for the training and test sets, annotated with the calculated AUC values, are shown in Figs. 4 and 5:

Figure 4

Random forest ROC curve.

Figure 5

Support vector machine ROC graph.
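The trapezoidal rule of Eq. (10) can be implemented directly (the ROC points below are hypothetical):

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """Eq. (10): trapezoidal area under the ROC curve."""
    fpr, tpr = np.asarray(fpr, float), np.asarray(tpr, float)
    return 0.5 * np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]))

# Hypothetical ROC points ordered from (0, 0) to (1, 1):
auc = auc_trapezoid([0.0, 0.2, 0.5, 1.0], [0.0, 0.6, 0.9, 1.0])  # 0.76
```

A perfect classifier, whose ROC passes through (0, 1), yields an AUC of 1.0 under the same formula.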

The results of a comparative analysis between the random forest model and the support vector machine (SVM) model reveal a clear distinction in performance on the test set. It is evident that the random forest classification model exhibits preferable generalization capabilities when predicting new data, outperforming the SVM in this regard. To further validate this finding, flight operation data was input into both the established random forest model and the SVM classification model, serving as a test set for delay prediction.

The comparative results were telling: the random forest model achieved an accuracy of 98%, surpassing the SVM’s 96%. This higher accuracy signifies the random forest model’s greater predictive reliability and efficiency. Based on these findings, the random forest model was chosen as the classification prediction model for data training in our study, underscoring its robustness and suitability for complex predictive tasks in flight operation scenarios.

Regression model

To rigorously evaluate the random forest regression model, we partitioned flight and weather data as analyzed earlier, into a training set and a test set, allocating 20% of the data to the test set. This strategic division allows for a comprehensive assessment of the model’s predictive performance. We then established the random forest model, conducting a series of experiments with varying values of n_estimators (the number of decision trees in the forest) and Max_depth (the maximum depth of each decision tree).

Through iterative testing and optimization, we fine-tuned the parameters of the random forest regression model. The final configuration consists of 200 decision trees, each with a maximum depth of 12, and a random seed of 420. This combination was chosen to balance model complexity against generalization capability. The model is evaluated using \(R^2\) (coefficient of determination):

$$R^{2} = 1 - \frac{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \hat{y}_{i} } \right)^{2} } }}{{\sum\limits_{i = 1}^{n} {\left( {y_{i} - \overline{y}} \right)^{2} } }}$$
(11)

where \(R^{2}\) lies in the range [0,1], \(y_i\) is the actual value of the i-th sample, \(\hat{y}_{i}\) is the predicted value of the i-th sample, \(n\) is the number of samples, and \(\overline{y}\) is the mean of the dependent variable. The closer \(R^{2}\) is to 1, the better the model’s fit.

MAPE (mean absolute percentage error) is used to measure the average relative error between predicted values and actual values:

$$MAPE = \frac{{\sum\limits_{i = 1}^{n} {\left( {\left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right|} \right)} }}{n}$$
(12)

where MAPE is in the range \([0, + \infty )\), with lower values indicating greater accuracy in the model’s predictions.
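Both metrics follow directly from Eqs. (11) and (12); a minimal sketch on hypothetical values:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error of Eq. (12)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

y_true, y_pred = [10.0, 20.0, 30.0], [12.0, 18.0, 33.0]
score = r2(y_true, y_pred)    # 1 - 17/200 = 0.915
error = mape(y_true, y_pred)  # mean of (0.2, 0.1, 0.1)
```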

Table 4 provides examples of some predicted values and their corresponding actual values:

Table 4 Comparison of sample values obtained from partial test set and real set.

We now compare the random forest regression model with the SVM regression model, using the same training and test sets. Results are shown in Table 5:

Table 5 Comparison between partial test set and real set.

The findings presented in Table 5 clearly indicate that the random forest regression model surpasses the support vector regression (SVR) in terms of predicting delays. This superior performance underscores the effectiveness of the random forest model in handling the complexities of delay prediction. Consequently, we have selected the random forest regression prediction model as our preferred tool for forecasting delays, particularly in scenarios involving future weather data. This decision is based on the model’s demonstrated accuracy and robustness, making it an invaluable asset for reliable delay prediction.

Construction of flight schedule profile driven by flight delay prediction

Establishment of the partial least squares regression prediction model

Flight delay refers to situations where the landing time of a flight (the actual arrival time of the aircraft at the gate) is delayed by more than 15 min compared to the planned landing time (as per the flight schedule), or where a flight is canceled17,18,19,20,21,22,23.

In Sect. "Regression model", we organized and filtered the delay time data predicted by the model, specifically focusing on instances where the delay time exceeded 15 min. This process enabled us to isolate and select the most pertinent data for our analysis. Subsequently, we calculated the number of delayed flights for each hour, applying the K-means clustering algorithm to categorize these figures across the different hours of the day, much like the approach we adopted in Sect. "Construction of flight schedule profile".

This methodical clustering resulted in the identification of four distinct types of 24-h arrival and departure delay flight schedule structures. These structures, each with their unique characteristics and patterns, have been systematically illustrated in Fig. 6. This classification not only provides a clearer understanding of delay trends throughout the day but also offers valuable insights for optimizing flight schedules to mitigate delays:

Figure 6

Time structure of arrival and departure delayed flights.

The graph reveals distinct patterns among the clusters. Cluster 4 is characterized by having the highest frequency of flights, indicating periods of peak activity. In contrast, Cluster 3 exhibits lower flight frequencies, suggesting less busy intervals. Meanwhile, Clusters 1 and 2 display moderate frequencies, representing more balanced operational periods.

To provide a comprehensive overview, we take the average values from each cluster as a baseline. These averages are then integrated with the average values of arrival and departure flights derived from Section "Construction of flight schedule profile". This holistic approach allows for a more nuanced understanding of the flight schedule dynamics throughout the day. The consolidated results, which blend both the frequency and timing of flights, are succinctly summarized in Fig. 7. This visualization not only aids in identifying patterns of flight schedules but also serves as a valuable tool for optimizing operational efficiency and managing airport traffic flow:

Figure 7

Summary time structure.

The graph provides a revealing look into the operational dynamics at the airport. It illustrates that the trends of departure flights and their corresponding delays closely mirror each other, as do the trends for arrival flights and their associated delays. This parallelism highlights a consistent pattern in both arrival and departure aspects of airport operations.

Furthermore, the graph indicates a significant interplay between arrival and departure flights. A notable observation is that high volumes of arrival flights often correlate with a lower number of departures during the same intervals. This inverse relationship suggests a strong mutual influence and constraint between the two, revealing a high degree of correlation in their operational patterns.

To delve deeper into this relationship, partial least squares regression is employed. This method is particularly adept at analyzing scenarios where the independent variables are highly correlated, allowing for a nuanced understanding of the regression relationship between independent and dependent variables. By employing this approach, we can more accurately model and predict the complex interactions within airport operations. The detailed findings of this analysis, which shed light on the intricate dynamics between arrival and departure flights and their delays, are systematically compiled and presented in Table 6:

Table 6 Model regression coefficient test.

Therefore, the regression relationship expression between the dependent variable \(Y\) and all independent variables \(X\) can be obtained:

$$Y_{1} = - 0.158X_{1} + 0.321X_{2} + 0.879X_{3}$$
(13)

where \(Y_{1}\) denotes departure flights at Harbin Taiping International Airport, \(X_{1}\) arrival flights, \(X_{2}\) delayed departure flights, and \(X_{3}\) delayed arrival flights at the same airport.

Construction of flight schedule profile satisfying service level

To commence our analysis, the first step involves downloading the weather data for a specific flight season. This data is then fed into the random forest flight delay prediction model for training purposes. The outcome of this process is a detailed prediction of flight delay times, broken down into hourly intervals for each day. Subsequently, we focus on identifying and filtering out the delayed flights, followed by counting the number of delayed arrivals and departures within each time period.

The second phase of the analysis entails conducting dimensionality reduction and clustering on the weather data. For dimensionality reduction, we employ principal component analysis (PCA), a technique that simplifies the complexity of the data while preserving its essential patterns. The results of the PCA are visually represented in a line graph that displays the weight proportion of each principal component, as illustrated in Fig. 8. This graph offers insights into the relative importance of different components in the dataset.

Figure 8

Line chart of weight proportion.

Additionally, we explore the relationship between the sum of squared errors (SSE) and the number of clusters (K). This relationship is crucial in determining the optimal number of clusters for our analysis. A graph plotting SSE against different values of K is presented in Fig. 9. This graph is instrumental in identifying the ‘elbow point,’ which indicates the most suitable number of clusters for effectively segmenting the data. Together, these steps provide a comprehensive framework for analyzing and understanding the impact of weather on flight delays:

Figure 9

SSE and k-value relationship diagram.

In our approach to dimensionality reduction using principal component analysis (PCA), a crucial step is to observe the variance explanation ratio of each principal component. This ratio indicates how much of the total variance in the dataset is captured by each component. By focusing on this metric, we can determine the significance of each principal component in representing the dataset’s variability.

To optimize the dimensionality reduction process, we retain only the first few principal components that exhibit higher variance explanation ratios. These components are the most informative, capturing the essence of the dataset while reducing its complexity. By selecting principal components with the greatest explanatory power, we ensure that the most critical features are preserved, thereby maintaining the integrity of the data after dimensionality reduction.

This selective retention of principal components effectively balances the need for simplifying the data with the necessity of retaining its most meaningful aspects. The resultant reduced-dimensional feature set, comprising these principal components, forms the core of our subsequent analyses, providing a more focused and efficient dataset for further exploration and modeling.
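A sketch of this retention rule (a hypothetical 13-feature weather matrix; the 85% cumulative-variance cutoff is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical standardized weather data: 200 observations, 13 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))
X[:, 1] = X[:, 0] + rng.normal(0, 0.1, 200)  # correlated columns load on one component

pca = PCA().fit(X)
ratios = pca.explained_variance_ratio_  # variance explanation ratio per component
n_keep = int(np.searchsorted(np.cumsum(ratios), 0.85)) + 1  # smallest count reaching 85%
X_reduced = PCA(n_components=n_keep).fit_transform(X)       # reduced feature set
```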

The selection of the optimal number of clustering centers in K-means is a critical step, guided by the graph of the sum of squared errors (SSE) against the number of clusters K in Fig. 9. From this graph, we look for the point where the decrease in SSE begins to plateau or slow down significantly. This point, often referred to as the elbow, indicates a balance between too few and too many clusters; choosing the number of clusters at the elbow ensures that each cluster is meaningful and distinct without overcomplicating the model.

For this study, based on the observed trend in the SSE graph, we determine the number of clustering centers to be 5. Alongside this, we select the first 6 principal components identified through PCA, as these components have the highest variance explanation ratios, thus encapsulating the most significant features of the weather data.

With these parameters established, we proceed to perform K-means clustering on the weather data, using the 6 principal components as features and setting up 5 clustering centers. The clustering process groups the data into meaningful categories based on the weather patterns. The results of this clustering, which reveal the distinct weather profiles and their corresponding groupings, are visually represented and presented in Fig. 10. This step is pivotal in understanding the diverse weather conditions and their potential impact on flight operations:

Figure 10

Weather clustering result graph.

Following the clustering outcomes, we focus on identifying specific weather conditions classified as either adverse or favorable, and their respective impact on the average number of delayed arrival and departure flights. Adverse weather conditions are defined as those with precipitation exceeding 3 mm per hour and wind speeds greater than 17.2 m per second. Conversely, favorable weather conditions are characterized by no precipitation (0 mm per hour) and wind speeds less than 5 m per second.

Once these weather conditions are identified from the clustering results, we calculate the average number of delayed flights—both arrivals and departures—associated with each of these conditions. This calculation provides a clear indication of how different weather conditions affect flight punctuality and scheduling.

Additionally, the average number of delayed flights corresponding to other, more neutral weather conditions is determined. This data serves as a benchmark for what we term the ‘normalized schedule structure.’ It essentially represents the typical delay scenario under average weather conditions, providing a baseline against which the extremes of adverse and favorable weather can be compared.

The summarized findings, including the impact of both adverse and favorable weather on flight delays, along with the normalized schedule structure, are clearly illustrated in Fig. 11. This visual representation offers valuable insights into how varying weather conditions influence flight scheduling and delays, enabling more informed decision-making in airline and airport operations:

Figure 11

The time structure of arrival and departure delay flights under different weather characteristics.

To further our analysis, we integrate the results derived from the earlier steps, along with the data on planned arrival and departure flights, into the partial least squares (PLS) regression prediction model. This model is particularly adept at handling complex datasets with multiple variables and discovering underlying relationships.

By processing this combined data through the PLS regression model, we are able to predict the schedule structure of arrival and departure flights under various time characteristics. This prediction takes into account not only the planned flight schedules but also the influence of weather conditions and their associated impact on flight delays.

The outcomes of this predictive analysis offer a comprehensive view of how the schedule structure for arrivals and departures is likely to vary across different times, factoring in both the standard scheduling plans and the potential deviations caused by varying weather conditions. These nuanced insights are crucial for effective flight schedule planning and management.

The results of this predictive modeling, depicting the schedule structure under different temporal and environmental conditions, are clearly presented in Fig. 12. This visual representation serves as a valuable tool for understanding and anticipating the dynamic nature of flight scheduling, allowing for more efficient and responsive airport and airline operations:

Figure 12

Time structure of arrival and departure movements under different weather characteristics.

The analysis yields distinct 24-h arrival and departure schedule structures under varying weather conditions. In the graphical representation of these structures, each time slot is bounded from above and below by curves derived from the three weather scenarios. The lower-limit curve represents the schedule parameters under adverse weather conditions, characterized by factors such as high precipitation and strong winds; the upper-limit curve represents those under favorable conditions, such as clear skies and mild winds.

The area enclosed between these two curves represents the schedule structure under normalized weather conditions—essentially, the typical scenario that falls between the extremes of adverse and favorable weather. This delineation is crucial as it suggests that reducing the number of flights during adverse weather conditions, as compared to favorable weather, can mitigate delays to a certain degree.
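Assuming the two limit curves are available as hourly vectors, keeping a planned schedule inside the normalized band amounts to clipping it between them; the curve shapes and numbers below are invented purely for illustration:

```python
import numpy as np

# Hypothetical hourly schedule parameters over a 24-h day (movements/hour).
hours = np.arange(24)
favorable = 40 + 6 * np.sin((hours - 6) * np.pi / 12)  # upper-limit curve
adverse = favorable - 12                               # lower-limit curve

# A candidate planned schedule; clipping it into the band between the two
# curves yields an arrangement consistent with the normalized structure.
planned = np.full(24, 42.0)
feasible = np.clip(planned, adverse, favorable)
print(feasible.round(1))
```

Slots where the plan exceeds the favorable-weather curve are pulled down to it, which is exactly the delay-mitigating reduction the band suggests for adverse conditions.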

Moreover, the flight arrangements that fall within this defined range are indicative of a balance being struck between maintaining a level of flight normalcy and meeting market demand. This range serves as a guideline for airports to formulate their planned schedule structures. It provides a strategic framework for optimizing flight schedules, taking into account the fluctuating nature of weather conditions and their impact on flight operations, thus aiding airports in enhancing operational efficiency and passenger satisfaction.

Conclusion

The construction of a schedule structure in civil aviation schedule management is a delicate equilibrium between market demand and the availability of scheduling resources. It also reflects the operational assurance capabilities of the aviation system. One of the primary challenges lies in developing a viable 18–24 h schedule profile for busy airports that aligns with the declared capacity, accommodates schedule changes, and maintains an acceptable level of flight normalcy.

A crucial aspect of this challenge is establishing a link between flight delays and the frequency of takeoffs and landings. In this context, the use of random forest classification and regression models for flight delay prediction has demonstrated superior fitting accuracy and precision compared to support vector machines. The regression model, in particular, formulates a mathematical relationship between departure flights, arrival flights, and delays.
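The flavor of this comparison can be reproduced on synthetic data; the nonlinear delay function, sample sizes, and hyperparameters below are assumptions for illustration, not the study's actual data or tuning:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(42)

# Synthetic hourly samples: [arrival movements, departure movements].
# Delays grow nonlinearly once total traffic approaches an assumed capacity of 50.
X = rng.uniform(0, 40, size=(500, 2))
y = np.maximum(0, X.sum(axis=1) - 50) ** 1.5 + rng.normal(0, 2, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
svr = SVR(kernel="rbf").fit(X_tr, y_tr)  # default hyperparameters, no scaling

print(f"RF  R^2: {rf.score(X_te, y_te):.3f}")
print(f"SVR R^2: {svr.score(X_te, y_te):.3f}")
```

On piecewise-nonlinear delay responses like this, the forest's ensemble of axis-aligned splits adapts readily, while an untuned SVR underfits; this mirrors, but does not reproduce, the fitting-accuracy gap reported above.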

By employing clustering techniques, complex weather conditions are categorized into adverse, favorable, and normalized weather scenarios. This categorization allows for the derivation of delay profiles under each of these conditions. Subsequently, the regression model outlines the upper and lower limits of delays, along with the expected schedule structure for each weather scenario.
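The categorization step can be sketched with k-means on standardized weather features; the three synthetic regimes below merely stand in for real meteorological records, and the particular feature set (precipitation, wind speed, visibility) is an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical hourly weather features: [precipitation mm/h, wind m/s, visibility km].
# Three synthetic regimes stand in for adverse / normalized / favorable weather.
adverse   = rng.normal([8.0, 14.0, 1.5], 1.0, size=(60, 3))
normal    = rng.normal([1.0,  6.0, 6.0], 1.0, size=(60, 3))
favorable = rng.normal([0.0,  2.0, 10.0], 1.0, size=(60, 3))
weather = np.vstack([adverse, normal, favorable])

# Standardize the features, then cluster the hours into three weather scenarios.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(weather)
)
print(np.bincount(labels))  # roughly 60/60/60
```

Each resulting cluster label is then joined back onto the hourly delay records, which is what makes the per-scenario delay profiles of the previous section computable.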

However, it should be noted that the differentiation between adverse, favorable, and normalized weather in this study is relatively coarse, being based primarily on clustering results. Future research can refine the characterization of normalized weather, providing a more nuanced and comprehensive basis for constructing flight schedule structures.

Looking ahead, in a stable operational environment, more detailed delay predictions and capacity profile constructions can be achieved through refined weather clustering. This enhancement would enable the determination of reasonable daily flight levels for airports across different flight seasons. Such advancements would be invaluable in assisting schedule management for authorities and air traffic flow management during various strategic and tactical periods. They would also provide airlines and airports with detailed decision-making information for assessing delay risks, allocating capacity and resources, and adjusting schedule arrangements.

Overall, this approach lays a refined and comprehensive foundation for future research, significantly aiding in the effective management of flight schedules, air traffic flow, and resource allocation. It holds particular promise in assessing and mitigating delay risks and in adjusting schedules to optimize operational efficiency.