
1 Introduction

Production scheduling problems are essential for optimizing manufacturing processes and ensuring effective resource utilization. In other words, scheduling defines where and when production operations will be performed [1]. Production scheduling aims to optimize resource utilization, minimize the makespan, reduce global setup time, and satisfy customer demands [2]. According to Lawler et al. [3], scheduling problems most often fall into the category of non-deterministic polynomial-time (NP) problems. To address the complex nature of production scheduling problems, advanced techniques have been developed, including mathematical optimization models, heuristic algorithms, and machine learning (ML) approaches. Important inputs that these methods take into account include setup matrices, processing times, quantities to be produced, material availability, due dates, technical information, and the production capabilities of the production lines being modelled. Additionally, the use of real-time data from the shop floor and of artificial intelligence (AI) techniques can improve adaptability in dynamic manufacturing environments. At the same time, using AI techniques together with real-time shop floor data raises new challenges, such as real-time decision-making.

A digital twin (DT) is a technology that allows real-time monitoring and optimization of a physical environment, process, or asset [4]. DT technology also enables the user to gain insights into the model’s accuracy [5]. Using data from sensors and Internet of Things (IoT) devices, a DT creates a digital replica of the physical environment that shares the characteristics and interactions of its physical counterpart [6]. DTs have been used in many different domains, including healthcare, urban planning, and manufacturing [7]. Across a variety of industries, the combination of real-time data and a digital replica of the physical environment supports decision-making and allows industries to continuously improve their performance [8, 9].

Asset Administration Shell (AAS) technology was introduced within the Reference Architecture Model Industry 4.0 (RAMI 4.0) [10] and has become a ground-breaking concept for how assets are managed and used in manufacturing [11]. AASs are standardized models that allow industries to couple physical assets (i.e., machines, production systems, or tools) with their digital counterparts, providing a framework to control and monitor the physical assets. AI scheduling agents have been used coupled with the AAS concept in the literature [9, 12, 13]. AI scheduling agents are intelligent autonomous systems that take production system information as input in order to plan resource allocation tasks [14]. They play an important role because they can generate an efficient schedule in real time. Coupled with the AAS and DT technologies, AI scheduling agents can be used for real-time decision-making or for predictions [14, 15].

A multi-agent system (MAS) is a system composed of many autonomous agents that interact with each other [15]. A MAS provides decentralized and collaborative decision-making by allowing different agents to cooperate. Each agent in the MAS has its own capabilities and decision-making abilities. A MAS is used to solve complex problems that a single agent cannot solve on its own. Following the idea of divide and conquer, the problem is split into subproblems, each agent solves a subproblem, and the resulting solutions are adaptable, robust, and able to handle real-time uncertainties. MAS is also combined with DT, AAS, and AI scheduling agents. In particular, AASs can enable interaction between different agents, where the agents are modeled as different assets.

The contributions of this work are the use of the DT to accurately simulate and validate the developed AI agents, as well as to train some of them; the use of the AAS technology to exchange data between the DT and the AI agents within the MAS; and the AI scheduling agents themselves, which were developed and modeled based on the bicycle industry’s requirements and challenges.

This chapter is organized as follows: the first section introduces the concepts of digital twins (DT), Asset Administration Shells (AAS), the scheduling problem, and artificial intelligence (AI) applications; the second section discusses related works; the third section explains the proposed MAS framework and the optimization tools that have been developed; the fourth section describes the case study in which the proposed framework is implemented; finally, the last section concludes this work and discusses future work.

2 Related Works

The manufacturing sector has experienced an evolution thanks to Artificial Intelligence (AI). AI offers innovative methods for increasing productivity, quality, and efficiency [16]. Due to its capacity to handle complicated issues, analyze large amounts of information, and make precise predictions, AI approaches are increasingly utilized in manufacturing. Numerous manufacturing processes, including quality assurance, preventive maintenance, supply chain management, and production scheduling, have benefited from the application of AI approaches [17].

Two of the most important tasks in the industrial sector are production planning and scheduling. To create effective and efficient production plans or schedules, a variety of strategies, methods, and technologies have been developed and deployed. Production planning involves deciding in advance what to do and how to do it. Scheduling, on the other hand, entails allocating resources or manufacturing facilities to handle work orders. Effective production scheduling lowers production costs, boosts productivity, and ultimately improves customer satisfaction. Due to their capacity to handle complicated scheduling issues and offer precise solutions, artificial intelligence (AI) systems have been gaining prominence in production scheduling. Machine learning (ML) is one of the most frequently used AI approaches [5]. More effective production planning and scheduling algorithms have been created using genetic algorithms, artificial neural networks, and reinforcement learning.

Heuristics are one approach to solving the dynamic flexible job-shop scheduling problem [18]. Another popular technique, genetic algorithms, has been utilized in several studies to improve production scheduling by describing it as a combinatorial optimization problem [19]. Nevertheless, the rise of Industry 4.0 has made ML techniques an attractive alternative for addressing manufacturing difficulties, thanks to the availability of data, powerful processing, and plenty of storage capacity. Neural networks and deep learning have gained more attention in recent years [20]. Additionally, reinforcement learning (RL), which uses experience to improve scheduling policies, has been proposed for production scheduling. In RL, the scheduling policy is represented as a function that maps the current state of the system to an action [21]. In conclusion, future research may concentrate on combining several AI techniques to develop more potent and effective production scheduling algorithms.

Interest in the digital twin (DT) is growing from both an academic and an industry perspective. However, the definition of the concept in the scientific literature lacks distinctiveness. A DT provides virtual representations of systems along their lifecycle, so that decisions and optimizations can be based on the same data, which is updated in parallel with the physical system [22]. A DT can be briefly described as a framework or concept that combines the physical, real environment with the digital, virtual one through the use of novel interconnection methods and technological innovations [23]. This physical-to-virtual connection, mapping real processes and assets to their digital representatives, can be characterized as twinning.

One of the main technologies used, in order to realize most of DT implementation approaches, is simulation [24]. As already mentioned, the idea of DT is to build a virtual version of a real system. This replica can be used to simulate and forecast how the physical system will respond to certain situations. Thus, one of the best methods to construct a virtual representation of the physical system seems to be simulation, which enables engineers to test and improve the system before it is built, lowering costs and increasing efficiency. Digital twin and simulation technology are being used more and more frequently in sectors such as manufacturing and aerospace, exhibiting their ability to completely change how complex systems are created and optimized [25].

Furthermore, digital twin implementation methods can support decision-making related to the scheduling task for a production system with potential uncertainties [26]. A crucial aspect for the development of a digital twin is the achievement of a high level of standardization and interoperability with systems outside the digital environment. The digital twin simulates some of the behaviors of the physical environment, and thus requires some kind of seamless information exchange with the physical entities and the information they provide. OPC UA is a standard that can provide standardization in the data exchange between the digital twin and production hardware, achieving real-time monitoring and control, interconnectivity, security, and access control, as well as data modelling and semantics [27].

The Asset Administration Shell (AAS) can also be used to standardize the description and management of assets. The digital twin technology can exchange information with the asset via the establishment of a common information language [28]. In addition, the AAS and OPC UA are complementary standards that can both be used to define the framework and protocol for that communication [29]; it is worth noting that the AAS builds on a collection of standards, namely IEC 62875, DIN SPEC 91345, IEC 62541 (OPC UA), and RAMI 4.0. In cases where the digital twin represents a higher-level system such as a production line, a station, or a production system, it is usually composed of multiple assets and thus multiple AAS models. From the digital twin side, the AAS can be the middleware for exchanging information with the assets or managing their behavior. It is important, however, to highlight that there is no standard way of describing an asset using the AAS; although the metamodel is always the same, there is freedom in the submodels and submodel elements selected to describe each asset. It is thus usual to exploit additional information modelling standards or frameworks to define the specific components and information structures within the AAS metamodel, e.g., ISA-88, ISO 22400, and ISA-95.

A digital twin is not only about simulating the environment but also about making decisions on the next actions, which can then be applied to the physical environment. Simulation on its own cannot address this issue, and AI agents are one way to meet this challenge. Multi-agent systems are preferred over centralized software components in cases where the problem is too hard to be solved by a monolithic software component. It is a decentralized approach that breaks the problem into subproblems, and each agent has access only to the subproblems compatible with its skills. In the case of production scheduling, this is a useful approach as it enables different types of scheduling problems to be solved by different AI methods, based on which method best satisfies the requirements. AI is a broad term, and in scheduling in particular the most common methods are heuristics, metaheuristics, mathematical optimization, machine learning, reinforcement learning, and policy-making.

Mathematical optimization, also referred to as mathematical programming, relies on an optimization model consisting of input sets and parameters, decision variables, constraints/expressions, and an objective function. Based on the constraints and the objectives, the model may be classified as a linear, nonlinear, convex, integer, or mixed-integer problem, each with different types of algorithms to optimize the objectives. As such, the algorithm used to find a feasible and accurate solution is just as crucial for the quality of the solution as the model itself. The algorithms may be exact or heuristic-based, while metaheuristic methods are also popular for various optimization problems.

Heuristics have been deployed to solve various production scheduling optimization problems. A combination of constructive heuristics and an iterated greedy algorithm was used to solve the distributed blocking flowshop scheduling problem (DBFSP) and led to makespan minimization [30]. Montiel et al. (2017) proposed an approach for the stochastic optimization of mine production schedules with the use of heuristics, implementing iterative improvement by swapping periods and destinations of the mining blocks to create the final solution [31]. Heuristics can also be successfully deployed to optimize the scheduling task with the aim of reducing total energy consumption [32]. Jélvez et al. (2020) proposed a new hybrid heuristic algorithm to solve the Precedence Constrained Production Scheduling Problem (PCPSP) for an open-pit mining industry [33].

Heuristic and metaheuristic algorithms focus on an intelligent search of the solution space, which does not guarantee the quality of the solution and, for complex optimization problems, may require considerable computation time. Deep learning methods, on the other hand, do not depend on searching the solution space but rather predict the solution based on patterns from historical information. Although in most cases the results are obtained quickly, they are not necessarily of high quality; in practice this depends on the deep learning model used and on the quality and quantity of the dataset. In some cases, there is also a dataset shortage, which makes the problem even more difficult to solve. Researchers may address this problem through a digital replica of the system that is able to simulate the behavior of the actual system in a realistic manner. This can support the development of reinforcement learning methods that use the simulation as a reward retrieval plugin, or the extraction of artificial datasets that can then be used in supervised learning models to learn and adapt to the actual system implementation. Deep reinforcement learning, in particular, has shown great potential in recent years for dealing with complex scheduling optimization problems. Researchers have focused on the implementation of deep reinforcement learning techniques for production scheduling problems where data is scarce and the problem exhibits high complexity. The Job-Shop Scheduling Problem (JSSP) is one of the most common production scheduling optimization problems that the scientific community has tried to solve with deep reinforcement learning. Zhang et al. (2020) developed a deep reinforcement learning agent able to select priority dispatch rules to solve the JSSP [19]. Liu et al. (2020) followed a similar deep reinforcement learning approach to solve both the static and dynamic JSSP [34]. Beyond the JSSP alone, there have also been solutions for the optimization of the whole production system with the use of deep Q-learning, a very popular deep reinforcement learning technique over the last decade [35].

While all these technologically innovative techniques have helped to develop smarter and more efficient systems and tools, such solutions can also be integrated into the actual production system through a digital twin (DT), helping to increase productivity. Villalonga et al. (2021) proposed a framework for dynamic scheduling with the use of digital twins to represent actual production assets in order to enhance decision-making [36]. Zhang et al. (2021) use the digital twin concept to gather real-time data from the shop floor and realize effective dynamic production scheduling [37]. For real-time decision-making, the implementation of a digital twin shows great potential, since uncertain and dynamic events are addressed effectively. Dynamic interactive scheduling methods can be enhanced and strengthened by the use of DT [26, 38]. However, the digital twin concept can also be implemented to support production scheduling in an offline mode, such as the offline simulation of a production system. This gives the ability to train scheduling agents in more dynamic environments and to respond to uncertainties even when they have not yet been identified. Nevertheless, a main challenge in implementing production scheduling solutions and digital twins is the lack of a well-defined data model. A solution to this issue can be offered by the Asset Administration Shell (AAS) concept. The AAS is essentially a method to represent data in a defined architecture [13, 39]. While for other problems there has been some effort in the literature to implement the AAS concept, in production scheduling it remains largely unexplored.

The need to explore and adopt well-defined standards for production optimization agents becomes clear when different production agents must cooperate to form a multi-agent system. Researchers from a variety of fields have given multi-agent systems (MASs) a great deal of attention as a way to break down complicated problems into smaller jobs. Individual tasks are assigned to agents, which are autonomous entities. Using a variety of well-defined inputs, each agent chooses the most appropriate plan of action to complete its task [40]. Agents make decisions based on the information provided by the environment they are integrated in and choose their actions proactively or reactively [41]. In manufacturing, multi-agent systems have gathered the attention of many researchers in recent years. MAS can limit the complexity of order scheduling in production systems through a cooperative multi-agent system for production control optimization [42]. A similar approach was followed for the implementation of decentralized scheduling algorithms in a test-bed environment [43]. A scheduling strategy to assist a manufacturing system experiencing learning and forgetting was supported by a multi-agent system that carried out the scheduling tasks of conventional production systems in close to real time, with a simulation utilized for validation [44].

While multi-agent system implementation methods have been explored in recent years, further investigation is required to address open challenges. For example, the use of standards in a scheduling multi-agent system is crucial in order to develop systems that can easily be transformed into “plug & play” applications. In addition, agents that control or implement different applications and software should follow a hierarchical implementation to achieve better multi-agent system utilization and agent distribution. Lastly, if external applications are controlled through multi-agent system functionality, Application Programming Interfaces (APIs) and standards are almost inevitable for properly integrating the scheduling MAS with the actual production system. The implementation of the scheduling multi-agent system proposed in this work addresses the aforementioned issues and enables a more flexible implementation of scheduling algorithms, with different functionalities and heterogeneous optimization techniques.

3 Multi-Agent System Framework

3.1 System Architecture

The architecture presented in Fig. 1 merges numerous Industry 4.0 technologies within a single framework with the goal of providing quality decision-making support for the production manager in his/her daily tasks. Specifically, the framework uses (1) a user interface for production manager interaction, (2) a multi-agent system for decentralized production scheduling, (3) a production digital twin for performance validation, and (4) the Asset Administration Shell concept for the description of production information and agents as assets within the I4.0 environment.

Fig. 1
A workflow diagram. The physical production system interacts with the planning user interface, which consists of the digital twin, multi-agent system, and production orders. These interact with the simulator A P I and M A S A P I modules and the A A S platform.

Framework Architecture of all modules and their interactions

The first aspect of the proposed framework is defining the information exchange mechanisms and the corresponding information model for passing data between the different components. This is one of the interoperability issues associated with enterprise software, as it is common to use different information formats and structures for the same information context. In this architecture the AAS is used to represent production information, such as work orders, the process plan, and production resources. However, the AAS is a metamodel, and although it specifies some abstract modelling objects and interaction mechanisms, it does not specify the detailed model to be used for the description of the asset. In other words, there could be more than one AAS description for the same asset, structuring the same asset components and behaviors differently. To this end, choosing the “right” information model for describing the production data, so that standardization over the information exchange is achieved, is a whole separate topic. It is not within the scope of this framework, and although the AAS is used for exchanging information between the ERP software and the agents, the underlying model is not standardized.
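For illustration only, the snippet below sketches how a work order could be described as a simplified, AAS-like submodel in Python. The field names and values (e.g., ProductCode, ProcessPlan) are hypothetical and simplified with respect to the normative AAS metamodel; they merely indicate how production information can be nested into submodel elements and queried by the other components of the framework.

```python
# Illustrative, simplified AAS-like submodel for a work order.
# Field names and values are hypothetical, not the normative AAS metamodel.
work_order_submodel = {
    "idShort": "WorkOrder_0042",
    "semanticId": "urn:example:submodel:work-order",
    "submodelElements": [
        {"idShort": "ProductCode", "valueType": "string", "value": "WHEEL-26-FRONT"},
        {"idShort": "Quantity", "valueType": "int", "value": 120},
        {"idShort": "DueDate", "valueType": "date", "value": "2024-05-17"},
        {"idShort": "ProcessPlan", "valueType": "string",
         "value": "lacing -> truing -> tire fitting"},
    ],
}

def get_element_value(submodel: dict, id_short: str):
    """Return the value of a submodel element identified by its idShort, if present."""
    for element in submodel["submodelElements"]:
        if element["idShort"] == id_short:
            return element["value"]
    return None

print(get_element_value(work_order_submodel, "Quantity"))  # -> 120
```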

As displayed in Fig. 1, information from the enterprise resource planning (ERP) system is described within AASs for the corresponding work orders that the manager is called to satisfy within the following production period. This information is retrieved from the user interface (UI), allowing the user (in this case the production manager) to review the workload of the upcoming days. The connection between the AAS and the ERP is performed via an ERP-to-AAS connector, so that the proposed UI platform depends on the AAS model rather than on the specific ERP information model structure. Beyond visualizing production information, the UI is also an enabler for the interaction of the user with the MAS as well as with the production digital twin. It is important to highlight that, unlike in other systems, the integration of decision-making results into the actual system is not a trivial task. In practice, human intervention is required to review and apply the production plan.

The exchange of information between the UI and the MAS is achieved via a MAS API, which in practice is a way of passing and receiving data regarding the production workload and status. The MAS is responsible for handling the data and providing scheduling decisions to the user for the current production scenario. Multiple AI agents were developed to address this problem, each one providing its own benefits to the user. The reason for using more than one agent for a scheduling problem lies in the complexity of the problem and the user requirements. Scheduling problems are widely diverse with respect to the environment, constraints, and objectives, and accordingly each optimization method is usually compatible with only a small portion of the overall set of scheduling problems. To this end, there cannot be a monolithic approach capable of addressing all production scheduling problems without falling short of the user requirements. To address this issue, the concept of a meta-scheduling agent was proposed, which in practice is a compound of multiple AI scheduling agents, each one providing different optimization attributes.

An AAS was developed for the description of the agent, which was retrieved by the MAS framework in order to deploy the corresponding entities and bring the algorithms to life. The AAS model for the meta-agent consisted of a toolbox of optimization methods, with the description of connection dependencies as well as the capabilities and skills provided by each specific method. During initialization, individual entities were spawned within the MAS, each one carrying a specific set of skills (operations) corresponding to the AAS operations. It is important to highlight, though, that the deployment of the agent within the MAS may differ from the actual algorithm runtime. Specifically, the MAS operates within a single framework, which is usually a local installation of all the partial components, while in this case it is better to deploy the algorithms remotely. Figure 2 illustrates this aspect for an example case of a scheduling agent. It can be seen that a scheduling agent AAS may contain more than one scheduling method, which are spawned as individual agents within the MAS framework. On top of that, a meta-scheduling agent is spawned within this framework to support the scheduler selection and orchestration process within the MAS. The scheduling algorithms, however, may be deployed on different remote servers depending on the case. When a scheduling operation is requested from one of the schedulers, the AAS interfaces support the communication between the agent installation and the actual algorithm.

Fig. 2
A workflow model presents the A A S platform, which consists of schedulers 1, 2, and 3. They interact with the service modules 1, 2, and 3 and agent schedulers via initialization and call schedulers 2 and 3 in the M A S framework.

MAS implementation based on the AAS description for the agent, showing interaction between agents and services/schedulers as well as the agent spawn procedure

Given the previous architecture, it is important to clarify the need for the meta-agent as well as the requirement for generating multiple agent entities within the MAS framework. In essence, the notion of an agent as an independent entity is useful when some kind of communication is achieved within a network of agents. This type of communication is usually achieved via a MAS framework implementation, which allows all message exchanges and events to be broadcast seamlessly between the inner entities. As such, a MAS framework facilitates the interaction between the agents; however, the implementation of the agent logic does not have to reside within the same software component as the MAS. This is because the MAS is usually a single software component, with all of its agents and events operating within the same software container. It thus makes sense not to include complex computational processes (such as scheduling) within the same resources.

To this end, the actual optimization processes are kept apart from the agent interfaces within the MAS. The reason that the scheduling agent AAS containing multiple scheduling methods is not spawned as a single agent in the MAS is the easier management of the different scheduling operations. Although this is mostly a design decision, it is easier to distribute a network of agents, each one responsible for a specific scheduling method; otherwise, all scheduling requests, independent of the method selected, would flow through the same agent, making it less efficient to work on two different scheduling requests in parallel. The meta-agent is thus present to support the selection of the algorithm based on the scheduling problem and to allocate the optimization process to the different agents. In practice, this specific agent is aware of the different scheduling methods available within the system and is capable of analyzing the request before selection.

In order to accurately assign the scheduling problems to the scheduling algorithms, a problem classification method based on three notations was used: environment, constraints, and objectives. This notation method is widely used in the description of scheduling problems and is able to classify any type of problem. The environment expresses the production system equipment and process flow, the constraints express job-related characteristics or specific equipment/buffer requirements, and the objectives concern the criteria that the scheduler needs to optimize. The following are some examples for each case:

  • Environment: job shop, flexible job shop, parallel machines, single machine, flow shop, flexible flow shop, conveyor line, batch machine (e.g., oven), and so on.

  • Constraints: job release time, block (no buffer capacity before the machine), deadline, sequence-dependent setup time, recirculation, stochastic processing time, and so on.

  • Objectives: makespan, flowtime, tardiness, energy consumption, and so on.

In order to classify the problem based on these notations automatically, the meta-agent was enriched with different rules (per characteristic per notation) that check whether the specific type of problem complies with the corresponding conditions. For example, to be identified as a job shop, a schedule request has to contain exactly one route per product and no alternatives. As a result, when the agent was given a schedule request that did not specify the type of scheduler to use, these rules were applied and the scheduler that complied with them was selected. In some cases, more than one scheduler would comply with the rules and more than one response might be produced.
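As an indicative sketch of how such rules could be implemented, the Python fragment below classifies a schedule request and selects the compliant schedulers. The request structure (products, routes, machines, operations) and the rule set are hypothetical; they only illustrate the rule-per-characteristic mechanism described above.

```python
# Hypothetical sketch of the meta-agent's rule-based scheduler selection.
# The request structure (products/routes/machines/operations) is illustrative only.

def is_job_shop(request: dict) -> bool:
    # A job shop is assumed to have exactly one route per product and no alternatives.
    return all(len(product["routes"]) == 1 for product in request["products"])

def is_parallel_machines(request: dict) -> bool:
    # Single-operation jobs that may run on any of several machines.
    return (len(request["machines"]) > 1
            and all(len(route["operations"]) == 1
                    for product in request["products"]
                    for route in product["routes"]))

RULES = {
    "job_shop_scheduler": is_job_shop,
    "parallel_machine_scheduler": is_parallel_machines,
}

def select_schedulers(request: dict) -> list[str]:
    """Return every scheduler whose rules accept the request;
    more than one scheduler may comply, producing more than one response."""
    return [name for name, rule in RULES.items() if rule(request)]
```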

It is also important to highlight that a specific information structure was used within a request for providing the production data, and similarly the scheduling outputs were contained within a specific scheduling response. The structure of this information may vary based on the implementation and is thus not specified in this section. Different alternative standards can also be used, while in some cases a specific ERP data model could be utilized as well. In any case, this is another important aspect that is not specified within the chapter. The methodology, however, remains the same, with the exception that the problem classification should be applied to a different model.

The digital twin was the final component of the architecture, ensuring that information is validated in a close-to-reality scenario and that the system performance is approved by the user. The production schedule was received by the MAS and then sent (on demand) to the digital twin in order to calculate its performance (see Figs. 3 and 4). This step was executed before the schedule was displayed in detail to the user, as there could be multiple competing schedules available for a single case from different schedulers. The reason for using a digital representation of the production system was to give the user the ability to evaluate the resulting schedule.

Fig. 3
A simulation model represents the layout of the wheel assembly line. The parts storage leads to M O 1, M O 2, M O 3, and M O 4, followed by assembly lines 1, 2, 3, and 4, T F 1, T F 2 1, T F 2 2, T F 3, T F 4 1, and T F 4 2, which further lead to the warehouse.

Simulation model of the wheels assembly department utilized as a digital twin to apply the schedule outcome from the MAS and observe performance

Fig. 4
A simulation model represents the layout of the painting department. It consists of a hanger station, preparation of components, and buffers, followed by 12 decal stations and a coating line. From there, the components are sent to the warehouse.

Simulation model of the painting department utilized as a digital twin to apply the schedule outcome from the MAS and observe performance

3.2 Paint Shop Scheduling Agents

3.2.1 Mathematical Optimization

The paint shop scheduling agents were designed to provide solutions to the Paint Shop Scheduling Problem (PSSP) as it can be found in the literature. This problem addresses the sequence of the items entering the painting line of the factory in order to optimize the performance indicators. It differs from other scheduling problems in that it usually handles greater detail in the combination and sorting of items before they enter the line. The line itself is usually a moving conveyor of carriers with specific spatial constraints, setup delays due to color changes, and a constant speed. The objective is to find the optimal combination of the items within the “bill of material” of the products and to sequence them in order to comply with the desired performance.

Figure 5 illustrates the PSSP in a simplified way. As can be seen, the goal is to create a schedule (a sequence and combination of items) for entering the painting line so as to maximize the utilization of the line, which is reflected in a reduced makespan for the system. There are some requirements, however, that the decision-making system needs to comply with in order to be in line with the physical characteristics of the system. The following aspects were taken into consideration:

  • The conveyor speed is constant and the carriers are equally spaced along the line. This ensures that the input and output rate of the line is also constant.

  • Each carrier has a fixed capacity (100%), identical for all carriers, which cannot be exceeded.

  • Items with different colors cannot be placed within the same carrier. This is because, in most cases, the items of a carrier are painted all together within the painting cabins.

  • When two different colors enter the line consecutively, a setup delay is required, expressed as a number of empty carriers, so that the operators have time to set up the new color.

  • Each item type occupies a specific percentage of the carrier and can be mixed with others as long as the maximum capacity is not violated.

  • If an item cannot fit into one carrier, the next consecutive carrier is used to hold the remaining portion of the item. It is assumed, however, that no item requires more than two carriers.
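To make these rules concrete, the following sketch checks a candidate carrier sequence against them in a simplified form (single color per carrier, carrier capacity, and setup gaps expressed as empty carriers). The data layout is hypothetical, and items that span two carriers are assumed to have been split beforehand.

```python
# Simplified feasibility check of a carrier sequence against the rules above.
# Each carrier is a list of (item_type, color) tuples; `size` maps an item type
# to the carrier fraction it occupies; `setup_gap[(c1, c2)]` gives the number of
# empty carriers required between colors c1 and c2. Items spanning two carriers
# are assumed to have been split into per-carrier portions already.

def is_feasible(carriers, size, setup_gap):
    last_color, empty_run = None, float("inf")
    for carrier in carriers:
        if not carrier:                       # empty carrier counts toward the setup gap
            empty_run += 1
            continue
        colors = {color for _, color in carrier}
        if len(colors) > 1:                   # only one color per carrier is allowed
            return False
        color = colors.pop()
        if (last_color is not None and color != last_color
                and empty_run < setup_gap[(last_color, color)]):
            return False                      # not enough empty carriers for the color change
        if sum(size[item] for item, _ in carrier) > 1.0:
            return False                      # carrier capacity (100%) exceeded
        last_color, empty_run = color, 0
    return True
```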

Fig. 5
A 3-D model diagram features a conveyor carrying items from the input buffer unit to the output buffer unit. The items are evenly spaced along the conveyor line. The output speeds of the items are labeled 0.2 and 0.01 items per second.

PSSP graphical representation

Based on the above considerations, the following mathematical formulation can be created:

Sets:

  • P Set of production orders that need to be painted

  • I Set of different items (types) that need to be painted

  • C Set of color codes

Parameters:

  • \( q_{p,i} \) Quantity of items \( i\in I \) included in production order \( p\in P \), \( q_{p,i}\in {\mathbb{Z}}_{\ge 0} \)

  • \( c_i \) Fraction of a carrier’s capacity required to carry item \( i\in I \), \( c_i\in {\mathbb{R}}_{>0} \)

  • \( d_{c,{c}^{\prime}} \) Setup delay (in carriers) required between items of two different colors \( \left(c,{c}^{\prime}\right)\in C\times C \), \( d_{c,{c}^{\prime}}\in {\mathbb{Z}}_{\ge 0} \)

  • \( f_{p,c} \) Defines whether production order \( p\in P \) requires color \( c\in C \), \( f_{p,c}\in \left\{0,1\right\} \)

  • \( w_p \) Weight (priority) of production order \( p\in P \), \( w_p\in {\mathbb{R}}_{\ge 0} \)

Auxiliary variables:

  • \( s_i \) Defines whether item \( i\in I \) has a size larger than one carrier, \( s_i\in \left\{0,1\right\} \)

  • \( a_{c,t} \) Defines whether color \( c\in C \) has entered the line at time \( t\in {\mathbb{Z}}_{\ge 0} \), \( a_{c,t}\in \left\{0,1\right\} \)

  • \( e_{p,t} \) Defines whether an item of production order \( p\in P \) has entered the line at time \( t\in {\mathbb{Z}}_{\ge 0} \), \( e_{p,t}\in \left\{0,1\right\} \)

Decision variable:

  • \( x_{p,i,t} \) The number of items \( i\in I \) from production order \( p\in P \) that enter the line at time \( t \), \( x_{p,i,t}\in {\mathbb{Z}}_{\ge 0} \)

Counters:

  • \( n_t \) Number of timesteps available in the schedule

  • \( n_i \) Number of item types in set \( I \)

  • \( n_p \) Number of production orders in set \( P \)

  • \( n_c \) Number of colors in set \( C \)

Constraints:

First, in a feasible schedule, we need to ensure that all items enter the line exactly once at some point during the production time. This is ensured by the following linear equality:

$$ \sum_{t=0}^{\infty }{x}_{p,i,t}={q}_{p,i}\bullet \left(1+{s}_i\right),\forall p\in P,\forall i\in I\vspace*{-15pt} $$
$$ \textrm{Number}\ \textrm{of}\ \textrm{constraints}:{n}_p\cdot {n}_i $$

The limitation that no carrier is allocated more items than its capacity can hold is achieved via the following inequality:

$$ \sum_{\forall p\in P}\sum_{\forall i\in I}{c}_i\bullet \left(1-{s}_i\frac{1}{2}\right)\bullet {x}_{p,i,t}\le 1,\forall t\in {\mathbb{Z}}_{\ge 0}\vspace*{-15pt} $$
$$ \textrm{Number}\ \textrm{of}\ \textrm{constraints}:{n}_t $$

In addition, when an item requires more than one carrier, it needs to be placed in two consecutive carriers. This can be ensured by the following nonlinear expression:

$$ \sum_{t=0}^N\sum_{\forall p\in P\ }\sum_{\forall i\in I}{s}_i\left({x}_{p,i,t}\bullet {x}_{p,i,t+1}\right)\ge 1\vspace*{-15pt} $$
$$ \textrm{Number}\ \textrm{of}\ \textrm{constraints}:1 $$

In order to transition to the linear version of the above expression, we start from the logical expression:

$$ \left({x}_{p,i,t}>0\right)\wedge \left(\left({x}_{p,i,t+1}>0\right)\vee \left({x}_{p,i,t-1}>0\right)\right) $$

Then, we define two auxiliary variables to carry the outputs of the above logical expressions:

$$ {z}_{p,i,t}=\left({x}_{p,i,t+1}>0\right)\vee \left({x}_{p,i,t-1}>0\right)\vspace*{-15pt} $$
$$ {y}_{p,i,t}=\left({x}_{p,i,t}>0\right)\wedge {z}_{p,i,t} $$

Then, we use the linear expressions for ∧ and ∨ operators:

$$ \left.\begin{array}{c}{z}_{p,i,t}\ge \kern0.5em {x}_{p,i,t+1}\\ {}{z}_{p,i,t}\ge {x}_{p,i,t-1}\\ {}{z}_{p,i,t}\le \kern0.5em {x}_{p,i,t+1}+{x}_{p,i,t-1}\\ {}{y}_{p,i,t}\le {x}_{p,i,t}\\ {}{y}_{p,i,t}\le {z}_{p,i,t}\\ {}{y}_{p,i,t}\ge {x}_{p,i,t}+{z}_{p,i,t}-1\end{array}\right\}\forall p\in P,\forall i\in I,\forall t\in {\mathbb{Z}}_{\ge 0} \mid {s}_i>0 $$
$$ \textrm{Number}\ \textrm{of}\ \textrm{constraints}:6{n}_t\cdot {n}_p\cdot {n}_i $$

In addition, when the color changes between subsequent items, the setup delay must be applied. This can be achieved by the following linear inequalities:

$$ {a}_{c,t}\ge \frac{\sum_{\forall p}\sum_{\forall i}\left({x}_{p,i,t}\bullet {f}_{p,c}\right)}{\sum_{\forall p}\sum_{\forall i}{\textrm{q}}_{p,i}},\forall t\in {\mathbb{Z}}_{\ge 0},\forall c\in C\vspace*{-15pt} $$
$$ {a}_{c,t}+{a}_{{c}^{\prime},{t}^{\prime}}\le 1+\frac{\left|t-{t}^{\prime}\right|}{d_{c,{c}^{\prime}}+1},\forall \left(c,{c}^{\prime}\right)\in C\times C,\forall t\in {\mathbb{Z}}_{\ge 0},\forall {t}^{\prime}\in \left[t,t+{d}_{c,{c}^{\prime}}\right] \mid c\ne {c}^{\prime}\vspace*{-15pt} $$
$$ \textrm{Number}\ \textrm{of}\ \textrm{constraints}:{n}_c\cdot {n}_t+{n}_t\cdot d\cdot {\left({n}_c-1\right)}^2 $$

The same constraint can be achieved via the following nonlinear equation:

$$ \sum_{t=0}^N\left({a}_{c,t}\bullet \sum_{{t}^{\prime}=t}^{t+{d}_{c,{c}^{\prime}}}\left(1-{a}_{{c}^{\prime},{t}^{\prime}}\right)\right)=0,\forall \left(c,{c}^{\prime}\right)\in C\times C \mid c\ne {c}^{\prime}\vspace*{-15pt} $$
$$ \textrm{Number}\ \textrm{of}\ \textrm{constraints}:{\left({n}_c-1\right)}^2 $$

Objective function:

\( L_{\textrm{total}} \) The total flowtime of the production:

$$ {L}_{\textrm{total}}=\sum_{t=0}^{\infty}\sum_{\forall c\in C}\left(t\bullet {a}_{c,t}\right) $$

\( L_{\textrm{weighted}} \) The total weighted flowtime of the painting line:

$$ {L}_{\textrm{weighted}}=\sum_{t=0}^{\infty}\sum_{\forall p\in P}\left(t\bullet {w}_p\bullet {e}_{p,t}\right) $$
$$ \textrm{where}\kern0.75em {x}_{p,i,t}\le {q}_{p,i}\bullet {e}_{p,t},\forall p\in P,\forall i\in I,\forall t\in {\mathbb{Z}}_{\ge 0} $$

\( \lambda_{i,k} \) The output (production) rate for an item type over a specific interval, which can be defined as a moving average over the series of allocations for that item:

$$ {\lambda}_{i,k}=\frac{\sum_{t=k\ L}^{\left(k+1\right)L}\sum_{\forall p\in P}{x}_{p,i,t}}{L},k=0,1,2,..,{n}_t $$

Figure 5 shows where the requirements for this objective come from and what the implications of omitting it could be. In this example, failing to produce the items at the average rates at which they later depart from the buffer will cause an overflow; as illustrated, the circle item needs to be produced at a much slower output rate than the cube item.

This requirement can be applied in more than one way. The user may require it as a constraint for the scheduler, meaning that the rate is never exceeded, which is expressed by the following inequality; alternatively, it can be pursued via the objective function, which tries to approach a specific value but does not guarantee that the final results will not deviate from it.

$$ {\lambda}_{i,k}\le {\lambda}_i^{\textrm{desired}},k=0,1,2,..,N,\forall i\in I $$
$$ \min \left\{\sum_{\forall i\in I}\sum_{k=0}^{\infty }{\left({\lambda}_{i,k}-{\lambda}_i^{\textrm{desired}}\right)}^2\right\} $$
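To make the rate definition concrete, the short sketch below computes \( \lambda_{i,k} \) from a given allocation tensor using non-overlapping windows of length L; the array shapes and window length are assumptions for illustration.

```python
import numpy as np

# Sketch: compute the windowed output rate lambda_{i,k} from an allocation tensor
# x of shape (n_p, n_i, n_t), using non-overlapping windows of length L.
def output_rates(x: np.ndarray, L: int) -> np.ndarray:
    n_p, n_i, n_t = x.shape
    n_k = n_t // L                            # number of complete windows
    per_item = x.sum(axis=0)                  # aggregate over production orders -> (n_i, n_t)
    windows = per_item[:, :n_k * L].reshape(n_i, n_k, L)
    return windows.sum(axis=2) / L            # (n_i, n_k): items per timestep in each window
```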

The above mathematical formulation is a very complex optimization problem to solve for a real-scale production problem. As shown in the results section, real-scale industrial problems may require scheduling up to 20,000 items from different orders, colors, and types, making the problem extremely difficult to solve in a reasonably short time frame. Thus, following the previous mathematical formulation, three different versions were formed, each utilizing the expressions presented above differently.

  • The first model is the nonlinear version (MINLP) of the problem, which applies the nonlinear constraints presented above. This allows a lower number of constraints (and thus lower memory utilization), but produces a very complex solution space that most of the time requires more sophisticated optimization algorithms and incurs a higher computational delay.

  • The linear version (MILP) was also considered, where only the linear constraints are utilized, improving the computational demand but increasing the memory requirements of the computational resources.

  • The last one was a simpler form of the linear version (two-stage MILP) in which constraint #4 is removed from the model, running the optimization only for mixing the items of orders that require the same color. This allows a much faster response, since there are no setup constraints to apply in the schedule. In a second stage, once the allocation of items is achieved, the optimization process is repeated, but this time it schedules the sequence of colors so as to minimize the setup delay. In this way, the model manages to reduce the solution space and the number of constraints. The drawback of this model, however, is that it decreases the flexibility of the solution as a trade-off for lower CPU time, because it is not capable of providing good solutions for the production rate issue.
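As a minimal sketch of the linear core of the formulation (assuming every item fits in a single carrier, i.e., \( s_i=0 \), and omitting the color and setup-delay constraints), the model could be expressed with an off-the-shelf MILP library such as PuLP; the data in the example is purely illustrative.

```python
import pulp

# Minimal sketch of the MILP core: each item enters the line exactly once and
# carrier capacity is never exceeded. Color/setup constraints are omitted and
# the data below is illustrative only.
P = ["p1", "p2"]                       # production orders
I = ["rim", "hub"]                     # item types
T = range(12)                          # carrier slots (timesteps)
q = {("p1", "rim"): 4, ("p1", "hub"): 2, ("p2", "rim"): 3, ("p2", "hub"): 3}
c = {"rim": 0.5, "hub": 0.25}          # fraction of a carrier occupied per item

model = pulp.LpProblem("pssp_core", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (P, I, T), lowBound=0, cat="Integer")

for p in P:
    for i in I:
        # Every item of every order enters the line exactly once.
        model += pulp.lpSum(x[p][i][t] for t in T) == q[p, i]

for t in T:
    # Carrier capacity (100%) cannot be exceeded at any timestep.
    model += pulp.lpSum(c[i] * x[p][i][t] for p in P for i in I) <= 1

# Flowtime-like objective: allocate items as early as possible.
model += pulp.lpSum(t * x[p][i][t] for p in P for i in I for t in T)

model.solve(pulp.PULP_CBC_CMD(msg=False))
for t in T:
    load = {(p, i): int(x[p][i][t].value()) for p in P for i in I if x[p][i][t].value()}
    if load:
        print(t, load)
```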

Figure 6 clearly shows the implications of the different models on the CPU time consumed by the computational resources. The graph has a logarithmic scale on the Y-axis, and it is clearly shown that neither the MILP nor the MINLP can outperform the simplified MILP, which can cover a very high number of items (100 orders usually amount to about 18,000 items) in a relatively short period of time (20 min).

Fig. 6
A line graph plots duration versus the number of orders. The lines are plotted for simplified linear M I P, linear M I P, and non-linear M I P. The lines for linear and non-linear M I P follow a steep, increasing trend, while the line for simplified linear M I P follows a gradual rising trend.

Actual diagram from example case displaying the CPU delay differences of the modeling approaches as the number of orders increase

Indices:

  • Discrete sets: \( n_p,{n}_i,{n}_c,{n}_t \)

  • Decision variables: \( {n_p}^2+{n}_i\left({n}_p+1\right) \)

  • Input variables: \( {n_p}^2+{n}_i\left({n}_p+1\right) \)

  • Auxiliary variables: \( {n}_i+{n}_t\left({n}_i+{n}_c\right) \)

  • Constraints: \( {n}_p{n}_i+6{n}_t{n}_i{n}_p+{n}_c{n}_t+{n}_td\left({n}_c+1\right) \)

3.2.2 Data-Driven Optimization

To avoid these long CPU delays and the demanding RAM utilization, data-driven (i.e., ML) approaches were investigated in order to predict the output of the scheduler instead. First, a feed-forward neural network (FFNN) was developed, which takes as input information about the workload (i.e., orders, items, colors, and sizes) and produces the sequence of orders/items allocated to the painting line. The input layer of the model was based on the same parameters that were used for the mathematical formulation of the MIP models, resulting in the encoded input vector (\( \overline{x} \)) below, while the decision variable \( x_{p,i,t} \) was described by the output vector (\( \overline{y} \)) (Figs. 7 and 8):

Fig. 7
An illustration represents the transformation of the table into a 1 D vector using an input layer encoding mechanism for variables, p 1, p 2, and p 3. The values of all 3 are in a single array with 24 cells.

Input layer (vector) encoding mechanism displaying an example for how the tables are reshaped into a single dimensional vector

Fig. 8
An illustration represents the transformation of the table into a 1 D vector, which presents the output encoded by allocating per product per item type per timestep for the variables, p 1, p 2, and p 3. The values of i 1, t 1, p 1, p 2, and p 3 are in a single array with 27 cells.

Output layer (vector) encoding mechanism displaying the output can be encoded into the allocation per product per item type per timestep

$$ \overline{x}=\left[\left(\right(\left(\boldsymbol{q}\left[p\right]\left[i\right]\forall i\in I\right),\left(\boldsymbol{f}\left[p\right]\left[c\right]\forall c\in C\right)\left)\forall p\in P\right),\boldsymbol{c}\left[i\right],\boldsymbol{d}\left[c\right]\left[{c}^{\prime}\right]\right] $$
$$ \overline{y}=\left[\boldsymbol{x}\left[\forall p\right]\left[\forall i\right]\left[\forall t\right]\right] $$
$$ {L}_{\overline{x}}={p}_{\textrm{max}}{i}_{\textrm{max}}+{p}_{\textrm{max}}{c}_{\textrm{max}}+{i}_{\textrm{max}}+{\left({c}_{\textrm{max}}\right)}^2 $$
$$ {L}_{\overline{y}}={i}_{\textrm{max}}\ {t}_{\textrm{max}}\ {p}_{\textrm{max}} $$

\( {L}_{\overline{x}} \) and \( {L}_{\overline{y}} \) in the above equations represent the dimensions of the input and output layers, respectively. In contrast to the model-based methodologies, neural networks consist of a static number of I/O parameters, which contradicts the arbitrary scale of scheduling problems. To address this issue, the encoded input considers a prefixed maximum number of orders (\( p_{\max} \)), items (\( i_{\max} \)), color codes (\( c_{\max} \)), and production duration (\( t_{\max} \)). For cases where fewer than these maximum numbers are provided as input, the encoder generates additional orders to fill the I/O layers of the neural network (NN), but sets their item quantities to zero, which has no effect on the allocation process; for cases with more than the maximum numbers, the model is unable to encode the input. The number of neurons per layer and the total number of trainable parameters of the whole NN model are given by the following formulas:

$$ \textrm{neurons}\#={2}^k{L}_{\overline{x}} $$
$$ \textrm{params}\#={\sum}_{\forall k:\textrm{layer}}\left[{2}^k{L}_{\overline{x}}\left({2}^{k-1}{\textrm{L}}_{\overline{y}}+1\right)\right] $$
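As an indicative sketch (not the exact network used in this work), a feed-forward model with the encoded input and output sizes above could be defined as follows; the maximum dimensions, number of hidden layers, and layer widths are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical maximum dimensions used by the encoder (illustrative values).
p_max, i_max, c_max, t_max = 10, 6, 4, 200
L_x = p_max * i_max + p_max * c_max + i_max + c_max ** 2   # encoded input length
L_y = i_max * t_max * p_max                                # encoded output length

class SchedulerFFNN(nn.Module):
    """Sketch of the feed-forward scheduler; widths are derived from the input size."""
    def __init__(self, hidden_layers: int = 2):
        super().__init__()
        dims = [L_x] + [2 ** k * L_x for k in range(1, hidden_layers + 1)] + [L_y]
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        layers[-1] = nn.Identity()          # no activation on the output layer
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = SchedulerFFNN()
x_bar = torch.zeros(1, L_x)                 # an encoded (zero-padded) workload vector
y_bar = model(x_bar)                        # predicted allocation vector of length L_y
print(y_bar.shape)
```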

Another data-driven approach was developed that treats the scheduling output as a time series from which the next allocations can be predicted based on known previous values of the sequence. As such, the whole production schedule can be generated in a recursive manner, reducing the model’s prediction variables, which improves the accuracy and avoids limitations regarding the problem’s scalability. Similar to the above-mentioned approach, the ultimate objective is the prediction of \( x_{p,i,t} \) for all the given orders, items, and timesteps; however, in this model a prediction is applied for only one timestep at a time and is repeated over the whole output sequence. The input features of the LSTM neural network consist of a dynamic and a static part. The dynamic part, as presented below, contains the features that change when moving along the time axis.

The allocation of all items over a specific timestep (carrier) is given by the following vector:

$$ \overline{\boldsymbol{x}}\left[{t}_i\right]=\left[\boldsymbol{x}\left[\forall p\right]\left[\forall i\right]\left[{t}_i\right]\right],{t}_i\in \mathbb{N} $$

The following defines a variable that provides the number of remaining items of an order at that given timestep, given the sequence of previous allocation selections:

$$ \boldsymbol{Q}\left[p\right]\left[i\right]\left[{t}_i\right]=\boldsymbol{q}\left[p\right]\left[i\right]-\sum_{t=0}^{t_i}\boldsymbol{x}\left[p\right]\left[i\right]\left[t\right] $$

Given the above formula, the following vector is defined:

$$ \overline{\boldsymbol{Q}}\left[{t}_i\right]=\left[\boldsymbol{Q}\left[\forall p\right]\left[\forall i\right]\left[{t}_i\right]\right],{t}_i\in \mathbb{N} $$

Moreover, similar to the feed-forward NN model, some static information of the workload revealing the colors, sizes, and setup delays must also be provided. The final configuration of the input layer is shown in Fig. 9.

Fig. 9
An architecture presents the L S T M R N N input and output layer design overview, 4 input derivations, and 3 output representations on line graphs.

Overview of LSTM RNN I/O layers design and how the specific input is derived as well as how the output is represented

Figure 9 presents the format of the I/O model required for the LSTM. Unlike the MIP models, this method raises a problem in defining the first allocations (\( \overline{\boldsymbol{x}}\left[0:L\right] \)), as it requires historical information over a window (L), which is not available when facing a totally new schedule request. This problem is even more apparent in the training procedure, as multiple scheduling results from different workloads are merged together into a single sequence to train the LSTM model. This issue was addressed by adding L timesteps at the beginning of each schedule, where L is the window of previous allocations that the model uses for the prediction. These timesteps contain zero allocations of items and are responsible only for filling the input layer of the LSTM neural network model (Fig. 10).

Fig. 10
Five line graphs plot the number of items versus time. Graphs a, c, d, and e have a fluctuating rectangular wave pattern between 0 and 20 seconds and remain flat thereafter. Graph b plots fluctuations between 25 and 30 seconds and remains flat in rest times.

Each graph (row) shows the total number of allocations from a product over time. Each graph contains two lines for display purposes
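The padding step described above can be sketched as follows; the window length, feature dimension, and random example schedules are assumptions used only to show how the training sequence and the (history, next allocation) pairs could be built.

```python
import numpy as np

# Sketch: prefix each schedule with L zero-allocation timesteps before concatenating
# them into a single training sequence, so that every real timestep has a full
# history window of length L available to the LSTM.
L = 8                                        # history window (hypothetical)
feature_dim = 30                             # flattened x[p][i] allocation vector per timestep

def pad_and_concat(schedules):
    """schedules: list of arrays of shape (n_timesteps, feature_dim)."""
    padded = [np.vstack([np.zeros((L, feature_dim)), s]) for s in schedules]
    return np.vstack(padded)

def make_windows(sequence):
    """Yield (history, target) pairs: L previous allocations -> next allocation."""
    for t in range(L, len(sequence)):
        yield sequence[t - L:t], sequence[t]

training_sequence = pad_and_concat([np.random.randint(0, 2, (40, feature_dim)),
                                    np.random.randint(0, 2, (25, feature_dim))])
X, y = zip(*make_windows(training_sequence))
print(np.array(X).shape, np.array(y).shape)  # (73, 8, 30) (73, 30) for this example
```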

3.3 Deep Reinforcement Learning Scheduling Agent

A deep reinforcement learning (DRL) agent was selected to solve the dynamic scheduling problem (DSP). According to Chien and Lan [45], the DSP is subject to a number of uncertainties, including machine failures, the arrival of new urgent jobs, and changes to job due dates. In the literature there are several articles on the DSP [46,47,48,49]. The DRL agent is combined with DNNs and a deep Q-network to approximate the state-action value function [50, 51]. The proposed DRL agent is coupled with a discrete event simulator (DES) for training and testing the DRL model. In detail, the DES used is Witness Horizon from Lanner [52]. The DRL agent and the DES communicate via an API provided by Witness Horizon. In addition to the API, text files were used to exchange data between the DES and the DRL agent (see Fig. 11). The concept used for the DRL agent is to propose task allocations to resources via the use of dispatch rules. In the literature there are several research works that study the use of RL agents combined with dispatch rules [53].

Fig. 11
An illustrated flow diagram represents the cyclic operation of the job allocations by the D R L agent in the discrete event simulation, which sends back the production status to the D R L agent connected with U I.

DRL agent operation architecture

Q-learning is an off-policy temporal difference algorithm based on the idea of the Q-function [54]. In the following equation, \( Q^{\pi}\left(s_t, a_t\right) \) is the expected return of the discounted sum of rewards at state \( s_t \) when taking action \( a_t \):

$$ {Q}^{\pi}\left({s}_t,{a}_t\right)\kern0.5em =\kern0.5em {\max}_{\pi }E\ \left[\ {r}_{t+1}+\gamma\ {r}_{t+2}+{\gamma}^2\ {r}_{t+3}+\dots |{s}_t=s,{a}_t=a,\pi \right] $$

The main concept of Q-learning is to use the Bellman equation as a value iteration update. At a decision point t, the agent in a state \( s_t\in S \) selects an action \( a_t\in A \) according to a policy π. Taking the action \( a_t \), the agent reaches a state \( s_{t+1} \) with transition probability \( p\left(s_{t+1}|s_t, a_t\right)\in P\left(S\times A\to S\right) \) and receives a reward \( r_t\in R \). Additionally, γ is the discount factor at each timestep t, and a is the learning rate, with 0 < a ≤ 1. The objective of the agent is to find the optimal policy π that maximizes the expected sum of rewards. Q-learning has some limitations when the environment is huge. For that reason, the deep Q-network (DQN) concept was used: by coupling RL with deep learning techniques, Q-tables can be replaced with a Q-function approximator with weights [55]. To solve the DSP, given the huge environment, the DRL DQN concept was used. Let us denote by \( Q\left(s, a;\theta_i\right) \) the approximate value computed by a deep neural network, where \( \theta_i \) are the weights of the Q-network at iteration i. The experiences are denoted as \( e_t=\left(s_t, a_t, r_t, s_{t+1}\right) \) and at each time t are stored in a dataset \( D_t=\left\{e_1,\dots, e_t\right\} \). Choosing an instance uniformly at random from the pool of stored instances, a Q-learning update is applied to each experience \( \left(s, a, r, {s}^{\prime}\right)\sim U(D) \):

$$ {L}_i\left({\theta}_i\right)={E}_{\left(s,a,r,{s}^{\prime}\right)\sim U(D)}\ \left[{\left(r+\gamma\ {\max}_{a^{\prime}}Q\left({s}^{\prime},{a}^{\prime};{\theta}_i^{-}\right)-Q\left(s,a;{\theta}_i\right)\right)}^2\right] $$

\( \theta_i \) are the weights of the Q-network at iteration i and \( {\theta}_i^{-} \) are the network weights used to compute the target at iteration i. The target network parameters (\( {\theta}_i^{-} \)) are updated with the Q-network parameters every c steps, where c is a constant.
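For illustration, one DQN update step corresponding to the loss above could be sketched as follows; the network architecture, replay memory handling, and hyperparameters are assumptions and do not reproduce the exact agent developed in this work.

```python
import random
import torch
import torch.nn as nn

# Sketch of a single DQN update: sample (s, a, r, s') uniformly from the replay
# memory D, compute the target with the frozen target network (theta_i^-), and
# take a gradient step on the Q-network (theta_i). States are stored as float
# tensors, actions as long tensors, rewards as float tensors.
state_dim, n_actions, gamma = 16, 5, 0.95

q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())           # theta_i^- <- theta_i
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(replay_memory, batch_size=32):
    batch = random.sample(replay_memory, batch_size)      # (s, a, r, s') ~ U(D)
    s, a, r, s_next = map(torch.stack, zip(*batch))
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)  # Q(s, a; theta_i)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Every c update steps, the target network is synchronized:
# target_net.load_state_dict(q_net.state_dict())
```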

The state is a tuple of features that characterizes a given input. In this chapter, the state contains the status of the resources (down, busy, or available), the status of the tasks (waiting, pending, on-going, or finished), and finally a list with the quantities of the product orders. Moreover, an action corresponds to the dispatch rule that is selected by the DRL agent to propose the allocation of tasks to resources.
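A possible encoding of this state tuple and of the dispatch-rule actions is sketched below; the concrete dispatch rules and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical encoding of the state tuple and the dispatch-rule actions described above.
class DispatchRule(Enum):
    FIFO = 0
    SHORTEST_PROCESSING_TIME = 1
    EARLIEST_DUE_DATE = 2
    CRITICAL_RATIO = 3

@dataclass
class ShopState:
    resource_status: list[int]    # per resource: 0 = down, 1 = busy, 2 = available
    task_status: list[int]        # per task: 0 = waiting, 1 = pending, 2 = on-going, 3 = finished
    order_quantities: list[int]   # remaining quantity per product order

    def to_features(self) -> list[float]:
        """Flatten the state tuple into the DRL agent's input vector."""
        return [float(v) for v in (self.resource_status + self.task_status + self.order_quantities)]
```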

3.4 Heuristic Optimization

The hierarchical scheduler is a decision-making module for extracting an efficient order of the required tasks [56]. The problem that the scheduler solves is the resource allocation problem [57], which seeks an optimal allocation of discrete resource units to a set of tasks. The heuristic algorithm is based on the scientific research of [58]. It is based on the depth-of-search concept, i.e., the number of layers for which the search method looks ahead. The main control parameters are the decision horizon (DH), the sampling rate (SR), and the maximum number of alternatives (MNA). At each decision point, a decision tree is created based on the DH, SR, and MNA. Figure 12 shows the nodes A1…AN, each representing a decision point where a task is assigned to an operator. The proper selection of MNA, DH, and SR allows the identification of a good solution. For example, it is proven in [59] that the probability of identifying an alternative of good quality (i.e., with a utility value within a range Δ of the highest utility value) increases with MNA and Δ. The pseudocode of the algorithm is defined as follows [60]:

Fig. 12
A tree diagram with a root node leads to a maximum number of alternatives. The first two layers of nodes are marked as the decision horizon from which a single node leads to a larger number of nodes labeled sample rate.

Search methodology example in tree-diagram showing the generation of different branches and layers based on the MNA, DH, and SR

Algorithm:

Three adjustable parameters, MNA, SR, DH

Initialize: MNA, SR, DH

while full schedule is not generated

  Generate MNA-alternative-branches of allocations for DH-steps in the future.

  for each branch in alternatives:

   Generate SR sub-branches of allocations from DH-step and forward.

   Calculate average score of SR sub-branches on each MNA branch.

  Select alternative with the highest score.

  Store allocations of the alternative for up to DH-steps

return: best alternative

For each decision tree, the algorithm returns a list of valid task-resource allocations [61,62,63,64]. MNA controls the breadth and DH the depth of the search, respectively, whereas SR is used to direct the search toward alternatives that can provide better-quality solutions. Thus, the quality of the solution depends on the selection of MNA, DH, and SR.
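As a rough illustration of the pseudocode above, the following Python sketch shows how MNA, DH, and SR could drive the look-ahead search. The expand, score, and is_complete callbacks stand in for the use-case-specific allocation and utility logic; they are assumptions of this sketch and not part of the published algorithm, and the SR sub-branches are approximated by roll-outs of DH further steps.

import statistics

def build_schedule(initial_state, mna, dh, sr, expand, score, is_complete):
    # expand(state, steps): returns (allocations, new_state) for one feasible
    #   sequence of `steps` task-resource allocations (problem-specific)
    # score(state): utility value of a state (problem-specific)
    # is_complete(state): True when a full schedule has been generated
    schedule, state = [], initial_state
    while not is_complete(state):
        best_branch, best_avg = None, float("-inf")
        for _ in range(mna):                      # breadth: MNA alternative branches
            branch_alloc, branch_state = expand(state, dh)   # DH steps ahead
            # SR sub-branches from the DH-th step onward, averaged per branch
            samples = [score(expand(branch_state, dh)[1]) for _ in range(sr)]
            avg = statistics.mean(samples)
            if avg > best_avg:                    # keep the highest-scoring alternative
                best_avg, best_branch = avg, (branch_alloc, branch_state)
        schedule.extend(best_branch[0])           # commit allocations up to DH steps
        state = best_branch[1]
    return schedule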

4 Case Study

The proposed multi-agent scheduling framework was implemented, validated, and evaluated for a case study from the bicycle production industry. For this work, and for the deployment of AI scheduling agents in a production environment capable of producing optimized long- or short-term schedules, two departments were chosen: the painting and the wheels assembly department. As already mentioned, there were different types of scheduling agents. The purpose of using a multi-agent system for the deployment of various scheduling applications is twofold. First, with the realization of a multi-agent system, a solution can be integrated without affecting the other entities of the production system; an algorithm can be developed separately, as a stand-alone application within the multi-agent system. Second, a multi-agent system offers the possibility of automated cooperation between different applications to coordinate multiple assets or functionalities. To combine both benefits, namely a multi-agent system capable of solving different scheduling problems and of coordinating its assets to solve more than one scheduling problem at once, the implementation method proposed in this work follows the deployment of different scheduling algorithms integrated in a multi-agent system.

The multi-agent system for the scheduling agents was developed with JANUS, an efficient and flexible agent-oriented programming framework that allows easy and fast deployment of virtual assets. The JANUS multi-agent framework is compatible with the SARL programming language as well as with Java. In this multi-agent system, four main concepts need to be defined before the deployment of any agent: agents, events, capacities, and skills. An agent instance encapsulates all the operating sequences required for a specific batch of functionalities and operations to be executed when the agent needs to operate. Agents’ communication and behavior are controlled by events, which are predefined patterns that allow all the agents in the framework to interact with one another. A capacity is an abstract description of a capability that is implemented in skills; it defines reusable capabilities of agent patterns without defining implementation details. Lastly, a skill is a concrete way of implementing a capacity, which allows implementations to be exchanged or modified, based on own or adapted skills, without modifying the agent’s behavior or the template agent’s characteristics. To realize the scheduling multi-agent system with the JANUS framework, the scheduling agents are modeled as JANUS agents, capable of being spawned and operating under the control of a meta-agent, which is the orchestrator agent inside the multi-agent system. The scheduling agents have specific skills related to their problem-solving algorithms, and the meta-agent concept was integrated in order to realize automated and distributed cooperation of the different agents inside the multi-agent system whenever there is a scheduling request. The user is able to interact with the multi-agent system through the backend of a UI developed for the visualization of the scheduling tasks.

In practice, the meta-agent receives the scheduling request from the UI. This scheduling request is modeled in an AAS, as already described in previous sections, and the meta-agent is able to spawn the corresponding scheduling agent to solve a particular scheduling problem. A scheduling agent is the parent “class” in JANUS that implements events, skills, and methods, and can also contain local variables. Each of the scheduling agents accommodated three MAS events:

  • “Initialization,” where the scheduling agent has been spawned by the meta-agent during the initialization of the framework and waits for a scheduling request notification from the meta-agent. During the initialization of the agent, specific scheduling agent parameters are defined and initialized so that the agent is able to serve a specific type of scheduling request in the future.

  • “Scheduling request,” where the meta-agent is requested to notify the corresponding scheduling agent in order for the required scheduling computation to be performed. After this event call, specific skills and operations are performed by a scheduling agent in order for the scheduling algorithm to calculate the schedule.

  • “Schedule response,” where the output of the scheduling task is emitted to all other agents of the multi-agent system. When the scheduling agent finishes its operation, this event notifies every other agent in the framework that listens to it.

When a scheduling request reaches the multi-agent system, the meta-agent is responsible for identifying the correct scheduler based on the request from the AAS. In addition to the scheduling task information, this AAS contains information indicating the required scheduling agent, i.e., the agent capable of performing the scheduling task based on some predefined characteristics. After the scheduling request, the meta-agent performs simple filtering on the information provided in the AAS to choose the corresponding scheduling agent. Each scheduling agent has its own input format. Since the JANUS meta-agent is responsible for the orchestration of the scheduling task, it passes the information to the appropriate scheduling agent, which is capable of running the correct algorithm to compute the schedule. A simplified sketch of this routing is given below.
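The agents themselves are implemented in SARL on top of JANUS; purely to illustrate the routing and event flow described above, the following Python sketch mimics the meta-agent dispatching an AAS-modeled request to the matching scheduling agent. The class names, the scheduler_type field, and the solve callback are hypothetical and do not reflect the actual SARL code.

from dataclasses import dataclass, field

@dataclass
class SchedulingRequest:
    # simplified stand-in for the AAS-modeled scheduling request
    scheduler_type: str                        # e.g. "heuristic", "mip", "drl"
    payload: dict = field(default_factory=dict)

class SchedulingAgent:
    # mirrors the three MAS events: initialization, scheduling request, schedule response
    def __init__(self, scheduler_type, solve):
        self.scheduler_type = scheduler_type   # set during "Initialization"
        self.solve = solve                     # problem-specific algorithm (skill)
    def on_scheduling_request(self, request):
        schedule = self.solve(request.payload)
        return self.on_schedule_response(schedule)
    def on_schedule_response(self, schedule):
        # in JANUS this would be emitted as an event to all listening agents
        return {"scheduler": self.scheduler_type, "schedule": schedule}

class MetaAgent:
    # spawns the scheduling agents at initialization and routes requests to them
    def __init__(self, agents):
        self.agents = {a.scheduler_type: a for a in agents}
    def on_scheduling_request(self, request):
        # simple filtering of the AAS information to pick the right agent
        return self.agents[request.scheduler_type].on_scheduling_request(request)

With this structure, a request such as SchedulingRequest("mip", {"orders": [...]}) would be routed to the MIP scheduling agent, whose response is then made available to the rest of the framework.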

As mentioned above, one reason for realizing a multi-agent system is that a scheduler can be developed as a stand-alone application. Hence, to allow each scheduling agent to perform its scheduling skills without developing the algorithmic part inside the multi-agent system framework, interfaces were utilized to invoke the scheduling algorithms through the scheduling agents’ skills. A REST API was used for the agents to reach the scheduler’s endpoint and pass information to the algorithm. On the other hand, since the resulting response is not expected at a fixed point in time, a RabbitMQ message exchange channel was utilized for the schedulers’ responses. Of course, this is a design decision, and other protocols could be used to pass information between the different entities; a sketch of this interaction follows.
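The following is a minimal sketch of how such a skill could forward a scheduling task to a scheduler’s REST endpoint and wait for the asynchronous result on a RabbitMQ queue, using the requests and pika libraries. The endpoint URL, queue name, and message format are assumptions of the sketch, not the project’s actual interfaces.

import json
import pika          # RabbitMQ client
import requests      # HTTP/REST client

SCHEDULER_ENDPOINT = "http://scheduler-service:8080/schedule"   # hypothetical endpoint
RESPONSE_QUEUE = "scheduler.responses"                          # hypothetical queue name

def submit_scheduling_task(task: dict) -> None:
    # push the scheduling task to the scheduler's REST endpoint
    response = requests.post(SCHEDULER_ENDPOINT, json=task, timeout=10)
    response.raise_for_status()

def listen_for_schedule(on_schedule) -> None:
    # consume the scheduler's response whenever it becomes available
    connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
    channel = connection.channel()
    channel.queue_declare(queue=RESPONSE_QUEUE, durable=True)
    def _callback(ch, method, properties, body):
        on_schedule(json.loads(body))                 # hand the schedule to the agent
        ch.basic_ack(delivery_tag=method.delivery_tag)
    channel.basic_consume(queue=RESPONSE_QUEUE, on_message_callback=_callback)
    channel.start_consuming()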

In the case study, three different scheduling agents were implemented in the multi-agent system, each one with its own characteristics and functionalities. To validate the aforementioned multi-agent system implementation, the schedulers developed and utilized were the following: (1) heuristic multi-objective scheduling framework, (2) mixed integer programming (MIP) model optimizer for production scheduling [65], and (3) a deep reinforcement learning (DRL) scheduler for dynamic production scheduling. The first two agents were utilized to solve the scheduling problem for the painting department of the bicycle industry, while the third one was utilized to solve the scheduling problem of the wheels assembly department. The first two agents were deployed with the goal of optimizing the scheduling sequence of a painting line. The DRL agent was deployed with the goal of solving the dynamic scheduling problem of a production system with uncertainties included.

The scheduling agents were deployed in the multi-agent system to support the scheduling task of the bicycle production system. Nevertheless, since the application should be used by a production manager in an industrial environment, a UI was required. The UI was developed with the aim of showcasing the scheduling task with all the mandatory assets in an efficient and user-friendly manner. The UI consists of the scheduling task formulation tab, where the production manager selects the orders that need to be scheduled and chooses the corresponding scheduler. There is also a feature with which the user can run all the supported schedulers and compare the results before actually applying a schedule in the real production system. Results are shown in another tab as a common table of the scheduled production orders. In addition, the user can view production KPIs through the digital twin tab, where a discrete event simulation (DES) run of the resulting schedule is performed.

To validate the performance of the whole framework, DES was utilized. Two DES models were developed, representing the production environments of the two departments of the bicycle production system. These DES models were used to showcase the results of the scheduling requests handled by the multi-agent framework, as well as for the actual operation of the agent solving the dynamic scheduling problem. The heuristic and the MIP schedulers were deployed for the painting department, whereas the DRL scheduler was deployed for the wheels assembly department. The JANUS multi-agent system spawned all three scheduling agents once the necessary information for accessing them was provided within the AAS definition, after the scheduling request had been formulated in the UI. As such, the user could choose any of the scheduling agents and, using the toolbox of schedulers provided in the UI, address similar or different kinds of problems. The user sent scheduling operations to the multi-agent system in an abstract manner, without the need to specify the corresponding problem. Upon arrival of the scheduling request, the meta-agent was responsible for spawning the required scheduling agent. Seamless integration between the SARL software and the individual schedulers was achieved.
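To give a flavor of how a proposed schedule can be replayed in a DES model to derive KPIs, the following toy SimPy sketch processes a scheduled order sequence through two stations and reports the makespan. The station names, processing times, single-capacity resources, and order identifiers are invented for illustration and are far simpler than the DES models of the two departments.

import simpy

PROCESS_TIME = {"paint_booth": 12, "drying": 30}   # hypothetical minutes per station

def run_schedule_in_des(schedule):
    # replay a scheduled order sequence and return simple KPIs
    env = simpy.Environment()
    stations = {name: simpy.Resource(env, capacity=1) for name in PROCESS_TIME}
    completion = {}
    def process_order(order_id):
        for name, station in stations.items():
            with station.request() as req:
                yield req                                # wait for the station
                yield env.timeout(PROCESS_TIME[name])    # processing time
        completion[order_id] = env.now
    for order_id in schedule:                            # orders in the scheduled sequence
        env.process(process_order(order_id))
    env.run()
    return {"makespan": max(completion.values()), "completion_times": completion}

kpis = run_schedule_in_des(["order_17", "order_05", "order_23"])
print(kpis)    # makespan and per-order completion times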

The resulting framework implementation showed great potential for achieving multi-agent scheduling optimization. The UI (Figs. 13 and 14) allows the user to evaluate the resulting schedule through the use of DES. Production KPIs are presented, and by evaluating the system performance on each occasion, one can decide whether the resulting schedule is efficient. Manual tests were performed in collaboration with the production manager, and the results were validated for their accuracy and precision. Hence, the proposed scheduling multi-agent system implementation for the bicycle production environment can effectively handle the workload distribution among its different scheduling agents in order to propose the most appropriate production sequence.

Fig. 13
A screenshot of the multi-agent system tab has an entry field provided for painting orders. The table below has details of the production number, quantity, and painting date, with checkboxes to select. The right side of the table lists different agents.

UI multi-agent system tab

Fig. 14
A screenshot of the digital twin tab has a line graph that plots parts versus date in an increasing trend. On the right side, a bar chart plots the values for the maximum parts per buffer.

UI digital twin tab

Table 1 summarizes the agent results from testing the framework over some real-life examples of the industrial use case. These results do not directly translate into business KPIs; they are the log output of the schedulers. This is why the digital twin component is necessary, to reflect how these solutions fit into the overall production scenario and to inspect the performance.

Table 1 Results from the different agents used to solve the use-case scheduling problems

5 Conclusion

In conclusion, the multi-agent system (MAS), digital twin (DT), Asset Administration Shell (AAS) concept, and artificial intelligence (AI) technology are part of Industry 4.0, and more and more researchers and industrial experts aim to combine these technologies. Digital manufacturing is an important step for industries and researchers, with many gaps and challenges still to overcome. Digitalization will enable automation, real-time decision-making, flexibility, and adaptability in industries, and will increase their efficiency. This work proposes a MAS framework, developed for the bicycle industry, that uses the concepts of AAS and DT for the production scheduling problem. Mathematical optimization, deep reinforcement learning, heuristic, and deep learning algorithms have been developed to address the identified problems. The key contribution of this work is the use of the DT to accurately simulate the production environment and increase the efficiency of the developed AI agents. The AAS concept is also used to guarantee interoperable data transfer within the MAS and to fully parameterize the agents and the production environment on the simulator. Future research directions could include the continuous exploitation of the integration of DT and AI.