Abstract
To solve the problem of identifying different types of car couplers during the operation of the automatic uncoupling robot of a tippler, a method for recognizing the coupler operating handle based on the YOLOv5 model is proposed. The method selects YOLOv5n, the simplest model in the YOLOv5 series, as the baseline detection network; its concise overall structure effectively reduces the number of model parameters while maintaining detection accuracy. The YOLOv5n model is used for feature extraction and target recognition on a dataset containing two types of coupler handles, top acting and down acting, greatly reducing the time required for training and testing while achieving very high recognition accuracy. Compared with the commonly used SSD300 and Faster R-CNN models, it shows significant advantages in parameter quantity, computational complexity, inference speed, and weight file size.
1 Introduction
As an important tool for unloading goods, tipplers are widely used in the coal transportation industry. A tippler works by clamping a carriage of the train and rotating it through a certain angle, thereby dumping out the coal. However, the train carriages are connected by couplers, so it is crucial to remove the couplers promptly and reliably before the tippler operates. The operating handles for uncoupling train couplers are divided into two types, top acting and down acting, and different handles require different uncoupling operations. The removal of train couplers was initially performed manually, but because manual uncoupling is inefficient and dangerous for workers, research on automatic uncoupling robots has emerged. Compared with manual labor, automatic uncoupling robots can adapt to harsher working environments and achieve higher work efficiency. The ability to accurately and quickly locate the coupler and identify its type has a significant impact on the success rate and efficiency of the uncoupling robot. Therefore, research on target detection of the coupler operating handle is of great significance.
The development of target detection methods falls into two stages: the stage of traditional methods and the stage of deep learning methods. Traditional target detection methods use exhaustive search to select candidate target locations, which is time-consuming and suffers from a high false detection rate in practical applications, so they have gradually been replaced by deep learning methods. Deep learning detection methods are mainly divided into two types. One type first generates a series of candidate boxes and then performs feature extraction and target position judgment on them; these are known as two-stage algorithms, such as R-CNN and Faster R-CNN. The other type does not generate candidate boxes separately but completes target localization and feature extraction together in an end-to-end manner; these are called one-stage algorithms, such as YOLO and SSD [1,2,3]. Both types can effectively complete various complex object detection tasks and are widely used. He et al. [4] introduced an attention module, a balance module, and a context module into Mask R-CNN to construct an intelligent detection model for welding quality inspection of subway car bodies; experiments showed that its detection accuracy is 4.5 percentage points higher than that of the traditional Mask R-CNN algorithm. Zhang et al. [5] applied the Faster R-CNN algorithm to equipment recognition and status detection in power rooms and achieved good results in both image and video tests, proving the effectiveness of the algorithm. Li et al. [6] combined deep learning with unmanned aerial vehicle (UAV) bridge crack detection, using UAVs to obtain high-quality images of long-span bridges and a Faster R-CNN algorithm based on transfer learning to identify cracks with high accuracy and efficiency. Dai et al. [7] proposed a multi-task detector based on Faster R-CNN that uses an improved ResNet-50 architecture to estimate distance and detect pedestrians during autonomous driving; combining infrared cameras and LiDAR with the detector, it achieved good results in real nighttime road scenes. Luo et al. [8] optimized the Faster R-CNN algorithm and combined it with feature enhancement methods to detect different vehicles, effectively addressing vehicle detection in complex traffic environments. Li et al. [9] combined YOLOv4 with YOLO-GGCNN for object recognition, enabling a robotic arm to selectively grasp objects in unknown environments. Jiang et al. [10] used YOLO models to extract features from videos and images captured by infrared cameras and achieved good results in complex environments, verifying the effectiveness of YOLO models for UAV target detection tasks. Mushtaq et al. [11] applied deep learning to the identification of important aerospace vehicle components, using the YOLOv5 algorithm to identify and classify different components on an assembly line with high detection accuracy. Ji et al. [12] improved the traditional YOLOv4 algorithm by introducing an extended perception module and an attention mechanism module and by improving the CIoU loss function; evaluation on public datasets verified that it detects small targets better than other models.
The above methods achieve good detection results, but they do not consider model complexity and timeliness. To achieve high detection accuracy, most existing target detection models have complex structures, which increases computation time, raises hardware requirements, and lengthens inference time in practical applications. The detection objects studied in this article are only two types of coupler operating handles, a relatively simple target detection task that does not require an extremely complex model structure or calculation pipeline. Using heavyweight models for such a task wastes time and burdens the hardware. In response, this article adopts YOLOv5n, the simplest model in the YOLOv5 series, as the detection model, which maintains accuracy while providing fewer parameters and faster operation, giving it high engineering application value for simple target detection tasks.
This article is organized as follows. Section 2 introduces the detection method used in this article. Section 3 presents the experimental process and analyzes the target recognition results for the coupler handle. Section 4 concludes the article.
2 YOLOv5 Model
YOLO series algorithms are representative one-stage target detection methods. Owing to their fast detection speed, strong generalization ability, and high robustness, they have been widely used in various object detection scenarios. YOLOv5 is one of the most widely used detection methods; compared with the earlier YOLOv4, it has fewer parameters, faster computation, and higher accuracy [13]. Its model structure is shown in Fig. 1. According to model complexity, the YOLOv5 series provides five official models: YOLOv5x, YOLOv5l, YOLOv5m, YOLOv5s, and YOLOv5n [14,15,16]. The YOLOv5n model has the fastest computation speed and the smallest number of parameters, making it well suited to simpler target recognition applications.
The Input section is the input end of the model, used to feed in the original image dataset. It mainly performs mosaic image augmentation, adaptive anchor box calculation, and adaptive image scaling [17, 18]. The mosaic method randomly selects several different images and stitches them together into one large image, improving the diversity of the training images and the robustness of the network. Adaptive anchor box calculation automatically computes the most suitable anchor box parameters for the input images, improving detection accuracy. Adaptive image scaling standardizes the size of the original image, reducing information redundancy in the input and improving inference speed.
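The adaptive image scaling step ("letterbox" resizing) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes HWC uint8 images and uses nearest-neighbour resizing via index maps to avoid an OpenCV dependency, whereas YOLOv5 itself resizes with cv2 interpolation.

```python
import numpy as np

def letterbox(img, new_shape=(640, 640), pad_value=114):
    """Resize with unchanged aspect ratio, padding the remainder
    with a neutral gray (YOLOv5-style adaptive image scaling)."""
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)   # scale so the image fits
    nh, nw = round(h * r), round(w * r)
    # nearest-neighbour resize via index maps (stand-in for cv2.resize)
    rows = (np.arange(nh) / r).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / r).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # center the resized image on the padded canvas
    top = (new_shape[0] - nh) // 2
    left = (new_shape[1] - nw) // 2
    out = np.full((new_shape[0], new_shape[1], img.shape[2]),
                  pad_value, dtype=img.dtype)
    out[top:top + nh, left:left + nw] = resized
    return out, r, (left, top)
```

The returned scale `r` and offsets are kept so that predicted boxes can later be mapped back onto the original image.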
The Backbone section consists of the CBS, C3_1, and SPPF modules and is mainly used for image feature extraction [19]. The CBS and C3_1 modules repeatedly process the input images, gradually extracting useful information. The SPPF module achieves local feature fusion and to some extent addresses the multi-scale nature of the target; compared with the original SPP module, it computes faster and more efficiently. In the overall Backbone framework, the first convolutional layer usually takes one of two forms: Focus or CBS. The Focus module has stronger feature extraction capability, but its relatively complex structure increases computational cost, whereas the CBS module is simple and better suited to simple target recognition tasks. Therefore, the model in this article uses a CBS module as the first convolutional layer, which simplifies the model structure and improves computational efficiency.
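The CBS and SPPF blocks described above can be sketched in PyTorch as follows. This is a minimal illustration with assumed layer widths; the real YOLOv5n backbone also interleaves C3 bottleneck modules, which are omitted here.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU: the basic convolutional block of YOLOv5."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast: three chained 5x5 max-pools give the
    same receptive fields as the parallel 5/9/13 pools of the original SPP,
    but with less computation."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = CBS(c_in, c_mid)
        self.cv2 = CBS(c_mid * 4, c_out)
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        # concatenate the original and the three pooled maps, then fuse
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```

Because each pooling stage reuses the previous one, SPPF fuses multi-scale context at a fraction of the cost of running three large pools in parallel.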
The Neck section continues to use the traditional FPN + PAN structure, as shown in Fig. 2. FPN can transmit deep semantic features to shallow layers, while PAN can conversely transmit shallow localization information to deep layers. The effective combination of the two can enhance the ability of network feature extraction and fusion. Additionally, the Neck section introduces C3_2 modules to further enhance the network's feature fusion capability [20].
The Head section is the output end of the model; it makes predictions at three different scales [21].
3 Experiments and Result Analysis
3.1 Experimental Introduction
Experimental Dataset
The automatic uncoupling robot must recognize and locate the coupler operating handle during the uncoupling process, which requires it to learn and memorize different types of handles. The experimental dataset was collected from different connecting parts of train carriages, covering coupler operating handles of different models against different backgrounds. The operating handles are divided into two types, top acting and down acting, as shown schematically in Figs. 3 and 4. The "LabelImg" tool was used to annotate all images, with the two coupler types labeled "Top acting handle" and "Down acting handle", respectively. Finally, the annotated images were randomly divided into training, validation, and test sets in a ratio of 8:1:1, with no image shared between the three sets [22, 23].
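An 8:1:1 random split with disjoint subsets can be reproduced with a short helper like the following (the function name and seed are illustrative, not from the paper):

```python
import random

def split_dataset(image_paths, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle annotated image paths and split them into disjoint
    train/val/test subsets according to the given ratios."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)   # fixed seed -> reproducible split
    n = len(paths)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]       # remainder goes to the test set
    return train, val, test
```

Because each image appears in exactly one subset, the test results in Sect. 3.3 are evaluated on images the network has never seen during training.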
Experimental Evaluation Indicators
In terms of model effectiveness evaluation, this article selects P (precision), R (recall), and mAP (mean average precision) as indicators. Precision represents the proportion of correctly predicted positive samples among all samples predicted as positive, as shown in Eq. (1), where TP is the number of positive samples correctly predicted as positive and FP is the number of negative samples incorrectly predicted as positive [24]:

P = TP / (TP + FP)  (1)

Recall represents the proportion of actual positive samples that are predicted correctly, as shown in Eq. (2), where FN is the number of positive samples incorrectly predicted as negative:

R = TP / (TP + FN)  (2)

Plotting P on the vertical axis against R on the horizontal axis yields the PR curve. AP (average precision) is the area enclosed under the PR curve, as shown in Eq. (3); the larger the AP value, the better the model's data processing performance:

AP = ∫₀¹ P(R) dR  (3)

At present, mAP is commonly used to measure detection performance; it is the average of the AP values over all categories [25, 26], as shown in Eq. (4), where n_class is the number of categories. The larger the mAP value, the better the detection effect:

mAP = (1 / n_class) Σ AP_i  (4)
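As a concrete check of these definitions, the indicators can be computed in a few lines of Python. The function names are illustrative, and the rectangular integration of the PR curve is only a sketch; real evaluators such as YOLOv5's typically use an interpolated AP.

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 1) and recall (Eq. 2) from raw detection counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r

def average_precision(recalls, precisions):
    """AP (Eq. 3): area under the precision-recall curve,
    approximated by rectangular integration over sorted recall points."""
    ap, prev_r = 0.0, 0.0
    for r, p in sorted(zip(recalls, precisions)):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_average_precision(ap_per_class):
    """mAP (Eq. 4): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For the two handle classes in this article, n_class = 2, so mAP is simply the mean of the two per-class AP values.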
3.2 Network Training
The dataset is input into the network model for training. During training, the learning rate is set to 0.01, the batch size to 8, and the number of epochs to 200. The curves of the loss values over the training iterations are shown in Fig. 5, where box_loss is the bounding box regression loss, obj_loss is the objectness loss (whether the bounding box contains an object), cls_loss is the classification loss, train denotes the training set, and val denotes the validation set. As shown in the figure, the loss curves of both the training and validation sets converge effectively, ultimately approaching 0, indicating that this method reaches its best effect within 200 iterations.
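With the ultralytics/yolov5 code base, a setup like the one above could be described by a dataset file along these lines. The file name, paths, and directory layout are illustrative assumptions, not taken from the paper; only the class names, batch size, epoch count, and learning rate (the repository's default initial rate of 0.01) come from the text.

```yaml
# coupler.yaml — hypothetical dataset description for the two handle classes
path: datasets/coupler        # dataset root (illustrative)
train: images/train           # 8/10 of the annotated images
val: images/val               # 1/10 of the annotated images
test: images/test             # 1/10 of the annotated images
nc: 2                         # number of classes
names: ["Top acting handle", "Down acting handle"]

# Training would then be launched from the yolov5 repository, e.g.:
#   python train.py --weights yolov5n.pt --data coupler.yaml \
#       --epochs 200 --batch-size 8
```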
The precision, recall, and mAP values are calculated and their change curves plotted, as shown in Fig. 6. As shown in the figure, after 200 iterations the precision, recall, and mean average precision curves all converge and closely approach 1, indicating that the method in this article can classify, locate, and recognize different types of train coupler operating handles with excellent results.
Figure 7 shows the confusion matrix of the validation results. As shown in the figure, the recognition accuracy for both the top acting handle and down acting handle categories is 1, and no handles are confused with the background category. This indicates that the method can effectively recognize both types of coupler handles and localizes them well, without confusing the handle with other background parts.
3.3 Visualization of Test Results
The trained model is used to detect and recognize the coupler operating handle images in the test set; the results are shown in Fig. 8. The figure shows that the method can accurately locate different styles of coupler operating handles with high confidence scores, which again demonstrates its good engineering application value.
3.4 Comparison of Different Models
To verify the advantages of the YOLOv5n method for detecting the coupler operating handle, it was compared with SSD300 and Faster R-CNN in terms of detection accuracy, parameter quantity, computational complexity, prediction speed, and saved weight file size. For detection accuracy, the mAP change curves of the three methods are drawn in Fig. 9. As shown in the figure, at a threshold of 0.5 the mAP values of YOLOv5n, SSD300, and Faster R-CNN all rise rapidly with iteration and eventually converge, reaching best values of 99.50%, 99.95%, and 96.88%, respectively. Although the mAP of YOLOv5n is not the highest among the three models, it fully meets the accuracy requirements of the task. The parameter quantity, computational complexity, and prediction speed of the three methods are listed in Table 1. As the table shows, compared with SSD300 and Faster R-CNN, YOLOv5n has the fewest parameters and the lowest computational complexity, and therefore requires less hardware computing power during operation. In terms of prediction speed, the FPS of YOLOv5n is higher than that of the other two algorithms, indicating the fastest inference during prediction and quicker identification of task targets in practical applications. For model weight storage, Fig. 10 compares the optimal weight file sizes of the three methods in a bar chart: YOLOv5n produces the smallest weight file, making it easier to deploy on resource-constrained platforms such as embedded or mobile devices, and the smaller file also supports faster inference, completing prediction tasks more quickly.
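The parameter-count and FPS comparisons behind Table 1 can be measured generically. The sketch below uses a toy network purely for demonstration; the actual comparison would apply the same two helpers to the full YOLOv5n, SSD300, and Faster R-CNN models.

```python
import time
import torch
import torch.nn as nn

def count_parameters(model):
    """Total number of learnable parameters (the parameter-quantity column)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def measure_fps(model, input_shape=(1, 3, 640, 640), n_runs=20):
    """Average frames per second over repeated forward passes (CPU, batch 1)."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        model(x)                              # warm-up pass
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        elapsed = time.perf_counter() - start
    return n_runs / elapsed

# Tiny stand-in network for demonstration (not an actual detector)
toy = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.SiLU(),
                    nn.Conv2d(8, 2, 1))
```

Measured FPS depends heavily on hardware, batch size, and input resolution, so comparisons are meaningful only when all models are timed under identical conditions.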
Based on the above analysis, it can be concluded that compared to other models, the YOLOv5n model is more suitable for task scenarios with simple targets and high timeliness requirements such as coupler handle detection.
4 Conclusions
To solve the problem of identifying and locating the coupler operating handle during the uncoupling process of the automatic uncoupling robot, a coupler handle detection method based on the YOLOv5 model is proposed. The method uses the relatively simple YOLOv5n model as the baseline detection model, ensuring detection accuracy for simple objects while effectively reducing the number of model parameters. The YOLOv5n model is used to detect and analyze different types of coupler handles. The experimental results show that the method offers high detection accuracy, fast speed, a small weight file, and low hardware requirements, and therefore has high engineering application value.
References
Liu JJ, Xiong L, Sun J, Liu Y, Zhang R, Lin HK (2023) A method for rotor speed measurement and operating state identification of hydro-generator units based on YOLOv5. Machines 11:758. https://doi.org/10.3390/machines11070758
Dong XD, Yan S, Duan CQ (2022) A lightweight vehicles detection network model based on YOLOv5. Eng Appl Artif Intell 113:104914. https://doi.org/10.1016/j.engappai.2022.104914
Qu Z, Gao LY, Wang SY, Yin HN, Yi TM (2022) An improved YOLOv5 method for large objects detection with multi-scale feature cross-layer fusion network. Image Vis Comput 125:104518. https://doi.org/10.1016/j.imavis.2022.104518
He DQ, Ma R, Jin ZZ, Ren RC, He SQ, Xiang ZY, Chen YJ, Xiang WB (2023) Welding quality detection of metro train body based on ABC mask R-CNN. Measurement 216:112969. https://doi.org/10.1016/j.measurement.2023.112969
Zhang QY, Chang XR, Meng ZN, Li Y (2021) Equipment detection and recognition in electric power room based on faster R-CNN. Procedia Comput Sci 183:324–330. https://doi.org/10.1016/j.procs.2021.02.066
Li RX, Yu JY, Li F, Yang RT, Wang YD, Peng ZH (2023) Automatic bridge crack detection using unmanned aerial vehicle and Faster R-CNN. Constr Build Mater 362:129659. https://doi.org/10.1016/j.conbuildmat.2022.129659
Dai XB, Hu JP, Zhang HM, Shitu A, Luo CL, Osman A, Sfarra S, Duan YX (2021) Multi-task faster R-CNN for nighttime pedestrian detection and distance estimation. Infrared Phys Technol 115:103694. https://doi.org/10.1016/j.infrared.2021.103694
Luo JQ, Fang HS, Shao FM, Zhong Y, Hua X (2021) Multi-scale traffic vehicle detection based on faster R-CNN with NAS optimization and feature enrichment. Defence Technol 17(4):1542–1554. https://doi.org/10.1016/j.dt.2020.10.006
Li Z, Xu BL, Wu D, Zhao K, Chen SW, Lu ML, Cong JL (2023) A YOLO-GGCNN based grasping framework for mobile robots in unknown environments. Expert Syst Appl 225:119993. https://doi.org/10.1016/j.eswa.2023.119993
Jiang CC, Ren HZ, Ye X, Zhu JS, Zeng H, Nan Y, Sun M, Ren X, Huo HT (2022) Object detection from UAV thermal infrared images and videos using YOLO models. Int J Appl Earth Obs Geoinf 112:102912. https://doi.org/10.1016/j.jag.2022.102912
Mushtaq F, Ramesh K, Deshmukh S, Ray T, Parimi C, Tandon P, Jha PK (2023) Nuts&bolts: YOLO-v5 and image processing based component identification system. Eng Appl Artif Intell 118:105665. https://doi.org/10.1016/j.engappai.2022.105665
Ji SJ, Ling QH, Han F (2023) An improved algorithm for small object detection based on YOLO v4 and multi-scale contextual information. Comput Electr Eng 105:108490. https://doi.org/10.1016/j.compeleceng.2022.108490
Sun TY, Xing HS, Cao SX, Zhang YH, Fan SY, Liu P (2022) A novel detection method for hot spots of photovoltaic (PV) panels using improved anchors and prediction heads of YOLOv5 network. Energy Rep 8:1219–1229. https://doi.org/10.1016/j.egyr.2022.08.130
Yar H, Khan ZA, Ullah FUM, Ullah W, Baik SW (2023) A modified YOLOv5 architecture for efficient fire detection in smart cities. Expert Syst Appl 231:120465. https://doi.org/10.1016/j.eswa.2023.120465
Chen SF, Yang DZ, Liu J, Tian Q, Zhou FT (2023) Automatic weld type classification, tacked spot recognition and weld ROI determination for robotic welding based on modified YOLOv5. Robot Comput-Integr Manufact 81:102490. https://doi.org/10.1016/j.rcim.2022.102490
Cheng SX, Zhu YS, Wu SH (2023) Deep learning based efficient ship detection from drone-captured images for maritime surveillance. Ocean Eng 285:115440. https://doi.org/10.1016/j.oceaneng.2023.115440
Mahaur B, Mishra KK, Kumar A (2023) An improved lightweight small object detection framework applied to real-time autonomous driving. Expert Syst Appl 234:121036. https://doi.org/10.1016/j.eswa.2023.121036
Chen J, Bao E, Pan JY (2022) Classification and positioning of circuit board components based on improved YOLOv5. Procedia Comput Sci 208:613–626. https://doi.org/10.1016/j.procs.2022.10.085
Hamzenejadi MH, Mohseni H (2023) Fine-tuned YOLOv5 for real-time vehicle detection in UAV imagery: architectural improvements and performance boost. Expert Syst Appl 231:120845. https://doi.org/10.1016/j.eswa.2023.120845
Yuan JX, Zheng XF, Peng LW, Qu K, Luo H, Wei LL, Jin J, Tan FL (2023) Identification method of typical defects in transmission lines based on YOLOv5 object detection algorithm. Energy Rep 9:323–332. https://doi.org/10.1016/j.egyr.2023.04.078
Lamane M, Tabaa M, Klilou A (2022) Classification of targets detected by mmWave radar using YOLOv5. Procedia Comput Sci 203:426–431. https://doi.org/10.1016/j.procs.2022.07.056
Cai W, Zhao JY, Zhu M (2020) A real time methodology of cluster-system theory-based reliability estimation using k-means clustering. Reliab Eng Syst Saf 202:107045. https://doi.org/10.1016/j.ress.2020.107045
Liu YX, Wu WB, Zhang X, Wan ST (2023) Fault detection method of bearings based on HHO-CNN. J Hebei Univ (Nat Sci Edn) 43(6):571–583. https://kns.cnki.net/kcms/detail/13.1077.N.20231108.1518.006.html
Zhao C, Shu X, Yan X, Zuo X, Zhu F (2023) RDD-YOLO: a modified YOLO for detection of steel surface defects. Measurement 214:112776. https://doi.org/10.1016/j.measurement.2023.112776
Liu G, Hu YX, Chen ZY, Guo JW, Ni P (2023) Lightweight object detection algorithm for robots with improved YOLOv5. Eng Appl Artif Intell 123:106217. https://doi.org/10.1016/j.engappai.2023.106217
Roy AM, Bhaduri J (2023) DenseSPH-YOLOv5: an automated damage detection model based on DenseNet and Swin-Transformer prediction head-enabled YOLOv5 with attention mechanism. Adv Eng Inform 56:102007. https://doi.org/10.1016/j.aei.2023.102007
Acknowledgements
This work is supported by National Natural Science Foundation of China (52105098), Natural Science Foundation of Hebei Province (E2021502038) and the Fundamental Research Funds for the Central Universities (2023MS130).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
Cite this paper
Liu, Z. et al. (2024). Recognition Method for Train Coupler Handle Based on YOLOv5 Model. In: Halgamuge, S.K., Zhang, H., Zhao, D., Bian, Y. (eds) The 8th International Conference on Advances in Construction Machinery and Vehicle Engineering. ICACMVE 2023. Lecture Notes in Mechanical Engineering. Springer, Singapore. https://doi.org/10.1007/978-981-97-1876-4_88
Print ISBN: 978-981-97-1875-7
Online ISBN: 978-981-97-1876-4