
1 Introduction

Industry 4.0, introduced in 2011, aimed to revolutionize manufacturing by incorporating advanced technologies to achieve operational efficiency and productivity gains [24]. However, as the focus shifted from technology-driven advancements to a more human-centric approach, the concept of Operator 4.0 emerged. Operator 4.0 envisions a workforce assisted by systems that alleviate physical and mental stress while maintaining production objectives [14]. This shift in perspective laid the foundation for the development of Industry 5.0, a value-driven manufacturing paradigm that places worker well-being at the forefront of the production process [30].

Industry 5.0 encompasses two main visions: one involving human–robot collaboration and the other centered around a bioeconomy utilizing renewable biological resources. As researchers explore the potential of Industry 5.0, it becomes crucial to investigate how technology can support the industry while prioritizing worker safety and productivity. This necessitates meeting the human needs outlined in the Industrial Human Needs Pyramid, ranging from workplace safety to the fulfillment of human potential through a trustworthy relationship between humans and machines.

In this context, wearable sensor-based human activity recognition (HAR) emerges as a vital component of Industry 5.0. By continuously and unobtrusively monitoring workers’ activities, wearable sensors enable a synergistic collaboration between humans and machines [23, 29]. This collaboration enhances productivity while empowering workers to unleash critical thinking, creativity, and domain knowledge. Simultaneously, machines autonomously assist with repetitive tasks, reducing waste and costs.

To foster the development of trustworthy coevolutionary relationships between humans and machines, interfaces must consider the unique characteristics of employees and organizational goals. An example of such collaboration is evident in the use of cobots, where machines share physical space, perceive human presence, and perform tasks independently, simultaneously, sequentially, or in a supportive manner [8].

Within this framework, this book chapter focuses on the crucial intersection of wearable sensor-based HAR and Industry 5.0. In particular, we explore the transformative role of sensor-based HAR in promoting worker safety and optimizing productivity in manufacturing environments. We study the intricacies of wearable sensor technologies, sensor data fusion techniques, and advanced machine learning algorithms to effectively capture and interpret workers’ activities in real time. By utilizing wearable sensors, manufacturers can gain real-time insights into the physical movements and behaviors of workers, enabling them to identify potential safety hazards, proactively intervene in hazardous situations, and implement preventive measures.

Moreover, the applications of sensor-based HAR extend beyond worker safety. The data collected from wearable sensors can facilitate the optimization of manufacturing processes by identifying bottlenecks, streamlining workflows, and minimizing errors [29]. By analyzing workers’ activities, manufacturers can uncover insights into the ergonomics of workstations, leading to improvements in job design and a reduction in musculoskeletal disorders. Furthermore, the knowledge derived from sensor-based HAR can inform training programs, enabling targeted interventions and skill development to enhance worker efficiency and job satisfaction. In addition, the integration of HAR using wearable sensors in manufacturing environments aligns with the growing interest in smart manufacturing and Industry 4.0 [23, 29]. In the manufacturing line, HAR can be utilized to quantify and evaluate worker performance [28], understand workers’ operational behavior [1], and support workers’ operations with industrial robots [22]. Activity recognition for worker safety in the manufacturing line is becoming increasingly important, as it provides the ability to quickly identify workers’ needs for assistance and to prevent industrial accidents.

In this book chapter, we study the intricacies of wearable sensor technologies and their integration into manufacturing environments. We explore various sensor modalities, including inertial sensors, motion sensors, and body capacitance sensors, and discuss their relevance in capturing a comprehensive view of workers’ activities. We also examine sensor data fusion techniques to effectively integrate and interpret data from multiple wearable devices, enabling a holistic understanding of workers’ actions. The outcomes of this research shed light on the transformative capabilities of wearable sensor technologies and open new avenues for future research in this field, aligning with the principles of the Industry 5.0 paradigm.

To demonstrate the practical application of sensor-based worker activity recognition, we present a use case in a smart factory testbed. In this use case, we deploy and test HAR using wearable sensors to predict workers’ movement intentions and plan optimal routes for mobile robots in a collaborative environment. By anticipating workers’ actions, collision risks between workers and robots can be minimized, ensuring both high production levels and worker safety.

To further improve the performance of HAR, we explore deep learning techniques such as adversarial learning [26, 27] and contrastive learning [9, 12]. These approaches have shown promise in enhancing activity recognition by leveraging additional information and improving the generalization capabilities of the models.

In addition to traditional wearable sensor modalities, we introduce the use of body capacitance sensing as an alternative modality for HAR. Body capacitance sensing captures the electric field feature between the body and the environment, providing unique insights into body movement and environmental variations [6]. By combining body capacitance sensing with inertial measurement unit (IMU)-based activity recognition, we aim to extend the sensing capabilities and improve the accuracy of activity recognition systems.

Throughout this chapter, we showcase real-world applications of sensor-based worker activity recognition in manufacturing environments. From real-time monitoring of workers’ movements to detecting unsafe actions and alerting workers and supervisors, the potential benefits are substantial. We also discuss the limitations and potential ethical considerations associated with wearable sensor systems, emphasizing the importance of privacy, data security, and worker consent.

In the following sections, we provide a comprehensive review of related works in Sect. 2. Section 3 presents the wearable sensor-based worker activity recognition in a manufacturing line with IMU sensors and body capacitance sensing module, including data fusion approaches and a use case in a smart factory testbed. Section 4 introduces deep learning techniques for improving the performance of human activity recognition, such as adversarial learning and contrastive learning. Finally, Sect. 5 concludes this book chapter.

2 Background

The Internet of Things (IoT) has revolutionized various industries, including manufacturing, by enabling the integration of physical devices and digital systems. In the context of smart factories, IoT technologies play a crucial role in creating intelligent and automated production environments. With the advancement of sensing technologies, it has become increasingly feasible to recognize human activities using various sensors such as IMU sensors [26], microphones [5], cameras [20], and magnetic sensors [19]. These sensors capture valuable data about human movements, interactions, and environmental factors, which can be utilized to enhance productivity, safety, and efficiency in smart factory environments.

Supervised machine learning techniques have been widely used for recognizing activities using labeled training data. However, collecting labeled data can be time-consuming and labor-intensive. To overcome this challenge, researchers have explored methods that reduce the effort required for data collection. Among the popular approaches are transfer learning, activity modeling, and clustering techniques. These methods leverage existing knowledge or unsupervised learning to recognize activities with minimal labeled data.

The concept of smart manufacturing, often associated with Industry 4.0, has gained significant attention in recent years [21]. Researchers have focused on recognizing and supporting factory work using sensor technologies [2, 3]. In the manufacturing domain, the use of sensors for activity recognition has been extensively explored. For example, Koskimaki et al. [15] utilized a wrist-worn IMU sensor and a K-Nearest Neighbors model to classify activities in industrial assembly lines. Maekawa et al. [17] proposed an unsupervised method for lead time estimation of factory work using signals from a smartwatch with an IMU sensor. These studies demonstrate the potential of sensors in recognizing worker activities in manufacturing environments, particularly with the application of machine learning techniques.

Sensor-based HAR is a crucial aspect of wearable technology. The IMU has been the dominant sensor in wearable devices, providing motion-sensing capabilities. However, the IMU’s ability is limited to capturing the wearer’s movement patterns and does not account for body–environment and body–machine interactions, which play critical roles in security and safety in manufacturing environments. To extend the motion-sensing ability of wearables, researchers have explored alternative sensing sources, such as body capacitance [6]. Body capacitance describes the electric field between the body and the environment, and variations in this field can provide valuable information for pattern recognition. For example, capacitive sensing has been used to detect touch patterns between human fingers and different objects. This alternative sensing approach benefits from low cost and low power consumption and extends the sensing ability beyond IMU-based activity recognition.

Machine learning models play a crucial role in wearable activity recognition, enabling the extraction of meaningful patterns and features from sensor data [13]. Classical machine learning methods extract handcrafted features from sensor data, such as time-domain and frequency-domain features [16]. These methods rely on expert knowledge and domain-specific feature engineering. In recent years, deep learning-based methods have gained significant attention in activity recognition [18]. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), and hybrid models have been developed to capture temporal correlations and learn sensor representations for improved activity recognition accuracy. Additionally, multitask learning and generative adversarial learning have been introduced to address different data distribution problems and enhance recognition performance [4, 7]. These advancements in machine learning techniques have paved the way for more accurate and robust activity recognition in wearable devices.

To bridge the gap between wearable sensor-based HAR and the principles of Industry 5.0, this book chapter offers unique contributions and addresses existing research gaps. It presents a comprehensive case study on wearable sensor-based worker activity recognition in a manufacturing line with a mobile robot. The investigation includes the exploration and comparison of sensor data fusion approaches using neural network models to effectively handle the multimodal sensor data obtained from wearable devices. Additionally, several deep learning-based techniques are introduced to enhance the performance of human activity recognition. By harnessing the potential of wearable sensors for human activity recognition, this chapter provides valuable insights into improving worker safety on the manufacturing line, aligning with the principles of the Industry 5.0 paradigm.

3 Wearable Sensor-Based Worker Activity Recognition in Manufacturing Line

3.1 Use Case at the SmartFactory Testbed

The SmartFactory Testbed, developed by the Technology Initiative SmartFactory KL, is a collaborative effort between the Department of Machine Tools and Controls (WSKL) at the TU Kaiserslautern and the Innovative Factory Systems (IFS) research unit at the German Research Center for Artificial Intelligence (DFKI). This non-profit organization focuses on advancing manufacturing technologies with industry specialists known as factory innovators.

The SmartFactory Testbed offers unique manufacturer-independent demonstrations that enable the development, testing, and deployment of innovative ICT technologies in a realistic industrial production environment. It serves as a vital test environment, particularly for the European GAIA-X subproject SmartMA-X, where flexible production systems can be arranged and integrated in highly customized configurations, enhancing the dynamism of the industrial environment.

In this chapter, we utilize the SmartFactory Testbed as a real-world deployment setting to evaluate and validate the Human Action Recognition module within a specific use case referred to as “Human Action Recognition and Prediction in the Respective Environment.” The use case focuses on a worker’s activity pipeline, including their presence in the pilot area and collaboration with different modules and robots across 20 diverse activities. The accurate prediction of human actions within the production lines is crucial, especially when workers interact with moving robots or when robots generate paths to avoid potential collisions in the layout. By leveraging information about the worker’s location and their next anticipated action, the robot’s movement path can be manipulated to minimize the risk of collisions. This proactive approach ensures the safety and reliability of the industrial environment, particularly when humans are present.

3.2 Data Acquisition

In this chapter, we conducted an experiment involving 12 volunteers from diverse cultural backgrounds and genders, who wore a wearable sensing prototype that we designed, as well as an Apple Watch, currently the top-selling smartwatch [10], while performing various tasks that simulated typical worker scenarios during their daily work. These tasks included opening and closing doors, walking, checking parts inside a module, and interacting with a touch screen. To ensure robust results, the experiment consisted of five sessions, each lasting between 2 and 3 minutes. Some sessions were conducted and recorded in an order different from that of the workflow chart. Prior to participation, all volunteers signed an agreement in compliance with the policies of the university’s committee for the protection of human subjects. The experiment was video recorded to enable further confidential analysis, including ground truth activity annotation. Both the observer and the participants adhered to an ethical and hygienic protocol in accordance with public health guidelines.

Figure 1 illustrates the wearable sensors attached to the participants. The prototype sensors were placed on both wrists, while the Apple Watches were attached to both wrists and an iPhone mini was attached to the left arm. The data collected from the wearable sensing prototype consisted of 10 channels per sensor, resulting in a total of 20 channels: 3 channels of acceleration, 3 channels of gyroscope, 3 channels of magnetometer data, and 1 channel of body capacitance data. The data collected using the Apple Watch and iPhone mini comprised 9 channels per sensor, totaling 27 channels: 3 channels of acceleration, 3 channels of gyroscope, and 3 channels of magnetometer data. To synchronize the recorded video data, left wrist sensor data, and right wrist sensor data, we performed five claps at the start and end of each session. Based on the video data, we manually annotated the users’ activities, resulting in 12 different activities, including a Null class. The prototype sensors provided IMU sensor data and body capacitance data at a sampling rate of 25 Hz, while the Apple Watch provided only IMU sensor data at a sampling rate of 100 Hz. This experiment allowed us to assess the performance of the sensor hardware and collect a dataset suitable for developing and testing algorithms for human intention recognition.

Fig. 1

The wearable sensors attached to the participants. (a) The wearable sensing prototype we designed on both wrists and (b) Apple Watches on both wrists and the iPhone mini on the left arm
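The five claps used for synchronization appear as sharp spikes in the acceleration signals. The chapter does not detail how these events are located, so the following is only a minimal sketch of one possible approach, assuming each stream’s acceleration is available as a NumPy array; the variable names (left_acc, right_acc) and the peak threshold are illustrative assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def clap_times(acc_xyz, rate_hz, height=3.0, min_gap_s=0.25):
    """Return timestamps (in seconds) of clap-like spikes in the acceleration magnitude."""
    magnitude = np.linalg.norm(acc_xyz, axis=1)  # acc_xyz: (num_samples, 3)
    peaks, _ = find_peaks(magnitude, height=height, distance=int(min_gap_s * rate_hz))
    return peaks / rate_hz

# Offset between the two wrist streams, estimated from the first of the five claps.
# left_acc and right_acc are hypothetical (num_samples, 3) acceleration arrays at 25 Hz.
offset_s = clap_times(left_acc, rate_hz=25)[0] - clap_times(right_acc, rate_hz=25)[0]
```

The same clap events, read off the video recording, can then be used to align the sensor streams with the video for annotation.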

Table 1 presents the data distribution of activities in the prototype sensor and Apple Watch datasets. The prototype sensor dataset involved data collection for a total duration of 142 minutes and 24 seconds, while the Apple Watch dataset had a total duration of 170 minutes and 3 seconds. To focus on preventing collisions between workers and mobile robots, we decided to collect additional sensor data from the Apple Watches for the walking class after collecting data from the prototype sensor. As a result, the Apple Watch dataset had a higher proportion of walking class data compared to the prototype sensor dataset.

Table 1 Comparison of the data distribution of activities in the prototype sensor and Apple Watch datasets

To evaluate the prototype sensor hardware and neural network models for worker activity recognition, we annotated the user’s activities based on the workflow of the use case at the SmartFactory Testbed. The workflow classified the sensor data into 12 activities, including Null, opening/closing doors, checking machines, walking, pressing buttons, and placing keys back.

In our application, each instance is a sliding window of sensor data. For the Apple Watch data, we used a window length of 100 samples (1 second) and a step size of 4 samples (0.04 seconds). For the prototype sensor data, we employed a window length of 25 samples (1 second) and a step size of 1 sample (0.04 seconds). That is, both cases use 1 second of data with a 0.04-second step, which maximizes the number of windows while keeping the step size suitable for both sensor setups.
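As a concrete illustration of this windowing scheme, the sketch below segments a continuous recording into overlapping windows and labels each window with the majority annotation of the samples it covers. The input arrays (proto_data, watch_data, and the per-sample label arrays) are placeholders for the recorded sessions.

```python
import numpy as np

def sliding_windows(data, labels, window_len, step):
    """Segment a (num_samples, num_channels) array into overlapping windows.

    Each window is labeled with the majority label of the samples it covers.
    """
    windows, window_labels = [], []
    for start in range(0, len(data) - window_len + 1, step):
        end = start + window_len
        windows.append(data[start:end])
        values, counts = np.unique(labels[start:end], return_counts=True)
        window_labels.append(values[np.argmax(counts)])
    return np.stack(windows), np.array(window_labels)

# Prototype sensor: 25 Hz, 20 channels -> 1 s windows of 25 samples, step 1 (0.04 s)
proto_X, proto_y = sliding_windows(proto_data, proto_labels, window_len=25, step=1)
# Apple Watch: 100 Hz, 27 channels -> 1 s windows of 100 samples, step 4 (0.04 s)
watch_X, watch_y = sliding_windows(watch_data, watch_labels, window_len=100, step=4)
```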

3.3 Worker Activity Recognition Results

To process the collected sensor data, we used convolutional neural network (CNN) models [25]. For training and validation of the neural network models, we employed a leave-one-session-out scheme: in each fold, one session was allocated for testing, another for validation, and the remaining three sessions for training.
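The exact architecture and training procedure are described in [25] and [26]; the sketch below shows, in PyTorch, one plausible 1D-CNN classifier together with the leave-one-session-out split described above. The layer sizes, the rule for picking the validation session, and the session_ids variable are illustrative assumptions rather than the settings actually used.

```python
import torch
import torch.nn as nn

class ConvHAR(nn.Module):
    """Illustrative 1D-CNN for windowed sensor data (channels x time)."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                      # x: (batch, channels, time)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)

# Leave-one-session-out: each fold tests on one session, validates on another,
# and trains on the remaining three (session_ids holds one session id per window).
sessions = sorted(set(session_ids))
for test_s in sessions:
    val_s = sessions[(sessions.index(test_s) + 1) % len(sessions)]
    train_idx = [i for i, s in enumerate(session_ids) if s not in (test_s, val_s)]
    val_idx = [i for i, s in enumerate(session_ids) if s == val_s]
    test_idx = [i for i, s in enumerate(session_ids) if s == test_s]
    # ... build DataLoaders from these index lists and train a ConvHAR instance ...
```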

Table 2 presents the comparison results between Apple Watches and the prototype sensing module, which combined IMU and body capacitance sensors, for the worker’s activity recognition at the smart factory testbed. The evaluation metrics used include accuracy, macro F1 score [26], and the accuracy of the walking class, which is critical for preventing potential collisions between workers and mobile robots. The results demonstrate that the prototype sensing module outperformed the Apple Watch in terms of accuracy, macro F1 score, and walking class accuracy.

Table 2 Comparison results between Apple Watches and the prototype sensing module combining IMU and body capacitance sensors in terms of the testing accuracy and macro F1 score. The numbers are expressed in percent and represented as mean ± std
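Assuming the per-fold predictions are available, the metrics reported in Table 2 can be aggregated along the following lines; fold_results and the WALKING label index are hypothetical names used only for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

WALKING = 3  # hypothetical index of the walking class

# fold_results holds one (y_true, y_pred) pair per leave-one-session-out fold
accs = [accuracy_score(t, p) for t, p in fold_results]
f1s = [f1_score(t, p, average="macro") for t, p in fold_results]
# the walking-class accuracy corresponds to the recall of that single class
walk = [recall_score(t, p, labels=[WALKING], average="macro") for t, p in fold_results]

for name, vals in [("accuracy", accs), ("macro F1", f1s), ("walking acc.", walk)]:
    print(f"{name}: {100 * np.mean(vals):.2f} +/- {100 * np.std(vals):.2f} %")
```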

4 Deep Learning Techniques for Human Activity Recognition Improvement

In this section, we focus on exploring advanced deep learning techniques to improve the performance of worker activity recognition in the manufacturing line for worker safety, building upon the findings presented in the previous section. Two specific techniques, namely adversarial learning and contrastive learning, are discussed in detail. These techniques offer innovative approaches to improve the accuracy and robustness of the recognition system, addressing the challenges associated with complex and dynamic industrial environments [9, 27].

4.1 Adversarial Learning

Despite the successful digitalization of worker activities through wearable sensors and their recognition by simple CNN models, achieving generalization to unseen workers remains a significant challenge. Numerous studies have demonstrated that individuals perform the same activities in different ways: the differences are large enough to make user recognition feasible [26], but they pose a challenge for activity recognition, as illustrated in Fig. 2. This discrepancy becomes evident when evaluating performance by leaving out subjects rather than leaving out sessions.

Fig. 2

Challenges in activity recognition: accounting for diverse behavior patterns across individuals

To deal with this problem, we present an adversarial learning-based method for user-invariant HAR in this subsection, as illustrated in Fig. 3. Inspired by generative adversarial networks (GANs) [11], adversarial learning has been introduced as a technique to enhance the model’s ability to discriminate between different worker activities and improve generalization. This technique, described in [26] and [27], employs four independent networks: a feature extractor, a reconstructor, an activity classifier, and a subject discriminator. The feature extractor maps sensor data to a common embedding feature space, while the reconstructor reconstructs the original signal from the embedding features. The activity classifier predicts activity labels based on the embedding features, and the subject discriminator differentiates between subjects based on the embedding features. The feature extractor and subject discriminator are trained by adversarial learning, with the subject discriminator aiming to distinguish subjects and the feature extractor aiming to deceive the subject discriminator. The method also incorporates a reconstruction loss to minimize the difference between the original and reconstructed signals, along with a classification loss to train the feature extractor and activity classifier using activity labels. Additionally, the proposed method employs maximum mean discrepancy (MMD) regularization to align the distributions among source and target subjects, thereby enhancing the generalization of the embedding feature representation.

Fig. 3

The overall framework of the user-invariant HAR method using adversarial learning
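A condensed PyTorch sketch of one training step of this framework is given below. It follows the four-network structure of Fig. 3 but uses simple alternating updates rather than the exact optimization procedure of [26, 27]; the network instances (encoder, decoder, act_clf, subj_disc), the loss weights, and the way the MMD term is applied are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# encoder, decoder, act_clf, and subj_disc are assumed nn.Module instances playing the
# roles of the feature extractor, reconstructor, activity classifier, and subject
# discriminator of Fig. 3; the learning rates and loss weights below are illustrative.
opt_main = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters())
                            + list(act_clf.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(subj_disc.parameters(), lr=1e-3)

def gaussian_mmd(a, b, sigma=1.0):
    """Gaussian-kernel maximum mean discrepancy between two sets of embeddings."""
    def k(x, y):
        return torch.exp(-torch.cdist(x, y) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def train_step(x, activity_y, subject_y, lambda_rec=1.0, lambda_adv=0.1):
    # (1) Update the subject discriminator on detached embeddings.
    z = encoder(x).detach()
    loss_disc = F.cross_entropy(subj_disc(z), subject_y)
    opt_disc.zero_grad(); loss_disc.backward(); opt_disc.step()

    # (2) Update encoder, decoder, and activity classifier; the adversarial term enters
    #     with a negative sign so that the encoder learns to fool the discriminator.
    z = encoder(x)
    loss_cls = F.cross_entropy(act_clf(z), activity_y)   # activity classification loss
    loss_rec = F.mse_loss(decoder(z), x)                 # signal reconstruction loss
    loss_adv = F.cross_entropy(subj_disc(z), subject_y)  # subject discrimination loss
    loss = loss_cls + lambda_rec * loss_rec - lambda_adv * loss_adv
    # An MMD term, e.g. gaussian_mmd() between embeddings of different subjects, can be
    # added to `loss` to further align the per-subject embedding distributions.
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    return loss_cls.item(), loss_rec.item(), loss_adv.item()
```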

The user-invariant HAR method has demonstrated improvements of up to 7% in accuracy and 28% in macro F1 score compared to the baseline using the CNN model [26]. By using this adversarial learning technique, we expect to improve worker activity recognition in the manufacturing line for worker safety. For further details on the methods and experiments, please refer to [26] and [27].

4.2 Contrastive Learning

We tackle a common challenge in wearable HAR, where the sensor locations typically used for everyday wear provide inadequate information for accurate activity recognition. For instance, in our manufacturing work scenario, we found that sensors placed on the wrist and arm are not optimal for recognizing activities like walking, which would be better detected with an IMU sensor placed on the leg. This is a well-known limitation in HAR, where sensors deployed for long-term everyday use often result in poor or noisy data for the intended application.

To address this issue, we propose a method outlined in our work [9] that aims to improve the representation of the deployed sensors during training. The idea is to leverage additional sensors that are available only during the training phase to build a better representation of the target (deployed) sensor. This approach allows us to capture more relevant information about the activities being recognized: even if the target sensor alone does not provide sufficient information, its representation can be improved by guiding it through contrastive learning with the source sensor.

In our proposed method, we collect temporally paired data from both the source and target sensors. The source sensor refers to a sensor that is available during training but not during deployment, while the target sensor is the sensor that will be deployed for activity recognition in real-world scenarios. Through contrastive learning, we learn a mapping between the representations of the source and target sensors. This process enables us to capture the relationship and similarities between the activities observed by the source and target sensors, enhancing the representation of the target sensor.

The training of representations and the mapping between them are depicted in Fig. 4. Each sensor’s data are processed separately through deep neural network encoders to obtain their respective representations. The translation between representations is facilitated by translation networks, which learn to transform the source sensor representation to align with the target sensor representation. This contrastive learning step can be performed using unlabeled data, where the network learns to align the representations based on the temporal correspondence between the sensors.

Fig. 4

Step 1: Training representations with paired sensor data by contrastive learning
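A minimal PyTorch sketch of this first step is given below. It uses an InfoNCE-style contrastive loss over a batch of temporally paired windows, which is one common choice and not necessarily the exact objective of [9]; the encoder and translation-network instances (enc_src, enc_dst, trans_src2dst) are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.1):
    """Contrastive (InfoNCE) loss: each translated source window should match its
    temporally paired target window and differ from the other windows in the batch."""
    a = F.normalize(anchor, dim=1)
    p = F.normalize(positive, dim=1)
    logits = a @ p.t() / temperature                 # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# enc_src and enc_dst encode the source and target sensor windows;
# trans_src2dst maps source representations into the target representation space.
def contrastive_step(x_src, x_dst, optimizer):
    r_src = trans_src2dst(enc_src(x_src))            # translated source representation
    r_dst = enc_dst(x_dst)                           # target representation
    loss = info_nce(r_src, r_dst)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```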

Once the representations are learned, we proceed to the next step using labeled data from both sensors. In this step, we train our activity classifier using data either from the target sensor in its learned representation or from the source sensor by translating it to the target representation using the translation network. This joint training process allows the classifier to learn to recognize activities based on the enhanced representations from both sensors. The overall training process is illustrated in Fig. 5. For evaluation, we utilize only the target sensor data, as it represents the real-world deployment scenario. By applying the learned representations and the trained classifier to the target sensor data, we can more accurately recognize and classify activities in practical settings. The evaluation process is illustrated in Fig. 6.

Fig. 5

Step 2: Training representations and classifier by minimizing classification loss
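Continuing the sketch from the previous step, the classifier can then be trained on labeled windows from either sensor, with source windows first translated into the target representation space; at test time only the target-sensor path is used, as in Fig. 6. The classifier instance and the is_source flag are illustrative.

```python
import torch
import torch.nn.functional as F

# enc_src, enc_dst, and trans_src2dst are the networks trained in step 1;
# classifier is an assumed nn.Module mapping representations to activity logits.
def classification_step(x, y, is_source, optimizer):
    r = trans_src2dst(enc_src(x)) if is_source else enc_dst(x)
    loss = F.cross_entropy(classifier(r), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# Step 3 (deployment): only the target sensor is available.
def predict(x_dst):
    with torch.no_grad():
        return classifier(enc_dst(x_dst)).argmax(dim=1)
```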

Fig. 6

Step 3: Testing with data of the target sensor only in order to evaluate the method

We evaluated our method on two benchmark datasets for HAR: PAMAP2 and Opportunity. The results demonstrated significant improvements in activity recognition performance. We achieved an average macro F1 score increase ranging from 5 to 13 percentage points across all activities compared to traditional approaches that only rely on the target sensor. Notably, in specific scenarios where the source sensor provides highly informative data compared to the target sensor (e.g., recognizing walking with an ankle sensor as the source and a wrist sensor as the target), we observed even greater improvements, reaching up to 20 to 40 percentage points for certain activity classes.

This method has important implications for real-world applications beyond IMU sensors. By transferring knowledge between sensors through contrastive learning, our approach enables the development of more robust and accurate HAR systems. This has potential applications in various domains, including human–robot collaboration in manufacturing lines, where HAR plays a crucial role in improving productivity, quality assurance, and worker safety.

5 Conclusion

In this book chapter, we explored the role of wearable sensor-based HAR in promoting worker safety and optimizing productivity in manufacturing environments. We discussed the importance of worker safety and the potential benefits of using wearable sensors to monitor and recognize workers’ activities on the manufacturing line. We presented a case study on wearable sensor-based worker activity recognition in a manufacturing line with a mobile robot, using sensor data fusion approaches and neural network models. By combining data from different sensor modalities, such as inertial sensors and body capacitance sensors, we were able to capture a comprehensive view of workers’ activities and improve the accuracy of activity recognition. Furthermore, we introduced several deep learning-based techniques to enhance the performance of HAR, including adversarial learning and contrastive learning. These approaches have shown promise in improving activity recognition accuracy and generalization capabilities.

The use case in the SmartFactory Testbed demonstrated the practical application of sensor-based worker activity recognition in a real-world manufacturing environment. By accurately recognizing workers’ activities and predicting their movement intentions, collision risks between workers and robots can be minimized, ensuring both worker safety and high production levels. Throughout this chapter, we discussed the potential benefits of sensor-based HAR beyond worker safety, including process optimization, ergonomics improvement, and worker training. We also highlighted the challenges and limitations associated with wearable sensor systems, such as privacy concerns and data security.

While our studies demonstrate the potential of wearable sensor-based HAR in manufacturing environments, we acknowledge that there are still challenges to overcome. These challenges include ensuring the robustness and reliability of sensor data, addressing issues related to real-time processing and inference, and managing privacy concerns in the collection and storage of personal data.

The results of this research can be applied in manufacturing practice by providing real-time insights into workers’ activities. By accurately recognizing workers’ activities and predicting their movement intentions, manufacturers can identify safety hazards, optimize processes, and enhance overall operational efficiency. The implementation of wearable sensor-based HAR systems can lead to improved worker safety, reduced workplace accidents, enhanced productivity, and more efficient resource allocation.

In conclusion, wearable sensor-based HAR holds significant potential for improving worker safety and productivity in manufacturing environments. By leveraging the wealth of data collected from wearable sensors and utilizing advanced machine learning techniques, manufacturers can gain real-time insights into workers’ activities, identify safety hazards, optimize processes, and enhance overall operational efficiency. Future research in this field should focus on addressing the challenges and limitations of wearable sensor systems and exploring novel sensing modalities and machine learning approaches to further improve the performance of HAR in manufacturing environments.