Introduction

The importance of identifying wood species is being emphasized globally to prevent the distribution of illegally harvested timber1,2. Conventional wood identification based on light microscopy is difficult for the public to access because it requires experimental techniques and a high level of wood anatomical knowledge3,4,5,6,7. Therefore, various species identification protocols, such as DNA barcoding, principal component analysis, and comparison of extractive components, are being discussed worldwide to replace traditional species identification methods8,9,10. Recently, deep learning-based species classification has become the focus among these alternatives. This method lowers the barrier to species identification by reducing the specialized knowledge, experimental technique, and analysis time that the conventional identification process requires11.

The deep learning-based species identification method converts visual traits, such as anatomical features on the wood surface, into feature maps, recognizes repetitive patterns as unique features of the species, and uses them as classification indices12,13,14. The diversity of the constituent cells observed on the wood surface enables convolutional layers to recognize and learn more diverse patterns, structures, and features during feature extraction. This allows the network to generalize across a wide range of features, classify species more precisely, and achieve higher classification performance. Hardwood exhibits more diverse and detailed anatomical features than softwood15,16, so a greater variety of feature maps can be generated from its anatomical traits during feature extraction. This implies that hardwoods can be classified more precisely than softwoods.

Six oak species—Quercus serrata, Quercus dentata, Quercus mongolica, Quercus variabilis, Quercus aliena, and Quercus acutissima—are distributed over an area of 975,181 ha in Korea, accounting for approximately 22.3% of the total forest area17. Oakwood accounts for approximately 10.8% of the total domestic timber distribution volume18 and plays a crucial role across various sectors of the Korean wood industry, from low-value applications such as firewood, pulp, and fiberboard resources to high-value uses such as charcoal, engineered wood, and wood slabs.

These six oak woods have similar anatomical characteristics, making it difficult to classify them without specialized knowledge. In conventional species identification, oak species have been classified based on anatomical characteristics such as the arrangement of growth rings, development of large vessels in the earlywood, tyloses, and broad rays (more than 10 seriates in width and over 1 mm in height)19,20,21,22,23,24,25. However, the conventional procedure based on wood anatomy can generally identify oak wood only to the genus level, owing to the similarity of anatomical characteristics26.

In the mid-2000s, studies on the application of computer vision and artificial neural networks to identify wood species began to be reported. In 2004, Clark classified wood species based on a multilayer perceptron (MLP) and suggested using it as an auxiliary method to conventional wood species identification27. In 2007, Tou et al. compared the wood species classification performance using a small dataset of five wood species from Malaysia based on a gray-level co-occurrence matrix (GLCM) and an MLP28. In 2009, Esteban et al. used an artificial neural network to classify two Juniperus species with similar anatomical characteristics with a high accuracy of over 90%29. In the 2010s, artificial neural networks and computer vision-based methods for wood species classification developed in more sophisticated and diverse directions. In 2012, Ma et al. reported excellent classification performance of over 90% using a near-infrared spectroscopy (NIR) dataset of wood species based on a backpropagation artificial neural network and a regression neural network model30. In 2013, Yadav et al. classified 25 hardwood species with excellent accuracy of over 90% using a GLCM and an MLP and compared the performance obtained when the dataset was divided into various ratios31. In 2017, Esteban et al. again used an artificial neural network to report that Pinus sylvestris and Pinus nigra, which were difficult to classify due to similar anatomical characteristics, could be classified with an accuracy of over 80%32. In the late 2010s, wood species identification studies based on convolutional neural networks began to be reported, and active research is being conducted to this day7,33,34.

Recently, several studies have been conducted on deep learning technology to identify wood species in Korea. Kwon et al. reported the classification performance of softwood species using convolutional neural network architectures, including LeNet and miniVGGNet35,36. We also conducted a study on the classification of commercial softwood species using convolutional neural networks to explore the potential of deep learning for species identification37. Subsequently, we reported the performance of species classification using the bark of domestic oak species as a follow-up study38.

However, no studies have been conducted on classifying oak species using convolutional neural networks (CNNs) with wood datasets. Therefore, the aim of this study was to verify the potential of CNN-based species classification using a wood dataset of six Korean oak species to provide a convenient identification method for their efficient utilization. In this context, the factors influencing classification performance were statistically analyzed to investigate the optimal performance conditions, and important classification factors were analyzed through visualization.

Materials and methods

Materials

Six Korean oak species were harvested from the research forest of Kangwon National University, located across Chuncheon and Hongcheon in Gangwon State. After the species were confirmed by an expert in dendrology, three stems of each species were obtained, and sample discs were collected at breast height from each stem. Comprehensive information about the samples is presented in Table 1.

Table 1 Sample Information.

Image and dataset preparation

Wood samples were collected from the normal parts of the mature wood to avoid the influence of juvenile wood and abnormal parts. For each species, five to eight blocks (R × T × L = 15–20 × 10 × 10 mm) were randomly separated from each disc using a cutter knife and hand hammer. More than 20 wood blocks were obtained for each species.

Sample preparation for optical microscopy was performed by following the guidelines of conventional methods39,40. The wood blocks were softened in a boiling mixture of glycerin and water (a 1:4 ratio) and sliced into approximately 30 μm thick slices using a sliding microtome (Lab Model, WSL, Birmensdorf, Switzerland). The sections were stained with a 1% safranin solution, dehydrated using an ethanol series, cleared with xylene, and mounted on permanent slides using Canada balsam.

The cross-section was observed using a digital camera (IMTCam 6.3MP camera, IMT, British Columbia, Canada) connected to an optical microscope (ECLIPSE E600, NIKON, Tokyo, Japan) with 4× (Plan Fluor model; NA 0.13; WD 17.2, NIKON, Tokyo, Japan) and 10× (Plan Fluor model; NA 0.30; WD 16.0, NIKON) objective lenses to collect the dataset. Micrographs were obtained and analyzed using IMT i-Solution Lite image analyzer (Version 26.1, IMT Inc., Burnaby, British Columbia, http://www.imt-digital.com/).

Two types of datasets (whole-part and earlywood) were prepared to analyze the effect of micrograph capture location on the performance of the artificial neural networks. The whole-part dataset comprised both earlywood and latewood and was captured using the 4× objective lens. The earlywood dataset was captured using the 10× objective lens. Micrographs with defects such as cracks and contamination were excluded. Finally, 150 micrographs were collected per species for each collected part.

In this study, all methods were performed in accordance with the relevant guidelines and regulations.

Dataset pretreatment

Before training, the RGB values were rescaled by a factor of 1/255 so that they ranged from zero to one. Dataset augmentation directly increases the size of a training dataset by applying random transformations to its images, which improves the generalization performance of neural networks41. This study examined and analyzed the impact of dataset augmentation on classification accuracy. Various modules and libraries such as tf.image, albumentations, torchvision, Augmentor, and imgaug are now widely used for dataset augmentation. However, we augmented the dataset using the ImageDataGenerator class to maintain consistency with the Keras library employed for the neural network model in this study. Each 150-image, non-augmented dataset was split into 80% for training and 20% for testing. Data augmentation was applied only to the training dataset and was set to a tenfold increase. The options applied included a rotation range of 10°, a width shift range of 10%, a height shift range of 10%, a zoom range of 20%, horizontal flip, vertical flip, and fill mode set to 'nearest'. These settings expanded the training dataset from 120 images per species before augmentation to over 1,300 images per species after augmentation.
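The split and augmentation settings described above can be sketched as follows. This is a minimal illustration rather than the script used in the study: the keyword names follow the Keras ImageDataGenerator constructor, whereas the constant names, and the reading of "tenfold increase" as ten augmented copies added to each original image, are assumptions.

```python
# Values taken from the text; the constant names are illustrative only.
IMAGES_PER_SPECIES = 150
TRAIN_FRACTION = 0.8
AUGMENTED_COPIES_PER_IMAGE = 10  # assumed reading of "tenfold increase"

# Keyword arguments as they could be passed to Keras:
#   ImageDataGenerator(**AUGMENT_KWARGS)
AUGMENT_KWARGS = {
    "rescale": 1.0 / 255,        # maps 8-bit RGB values into [0, 1]
    "rotation_range": 10,        # degrees
    "width_shift_range": 0.1,    # fraction of image width
    "height_shift_range": 0.1,   # fraction of image height
    "zoom_range": 0.2,
    "horizontal_flip": True,
    "vertical_flip": True,
    "fill_mode": "nearest",
}

# 80/20 split of the non-augmented images of one species.
n_train = round(IMAGES_PER_SPECIES * TRAIN_FRACTION)
n_test = IMAGES_PER_SPECIES - n_train

# Augmentation is applied to the training images only.
n_train_augmented = n_train * (1 + AUGMENTED_COPIES_PER_IMAGE)

print(n_train, n_test, n_train_augmented)
```

Under this assumed reading, the training set of one species grows from 120 to 1,320 images, which is consistent with the "over 1,300" stated in the text, while the test set stays at 30 images.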

The effects of other verification conditions, such as the composition of the dataset and type of optimization function, on the classification accuracy were also investigated by augmenting the training dataset and comparing it with the non-augmented dataset.

The micrographs in the dataset had a size of 3072 × 2084 pixels (6,402,048 pixels) and were resized to 224 × 224 pixels (50,176 pixels) to conserve system resources during training. As the resolution of the individual images decreased, the actual area represented by each pixel increased from approximately 1.3 µm²/pixel to approximately 160 µm²/pixel (15.5 × 10.3 µm/pixel).
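The resizing arithmetic can be checked with a few lines of Python. This is a rough sketch in which the variable names are illustrative. Because the 1.3 µm² starting value is rounded in the text, the computed area per pixel only approximates the reported value of about 160 µm².

```python
SRC_W, SRC_H = 3072, 2084   # original micrograph size in pixels
DST_W, DST_H = 224, 224     # network input size in pixels
SRC_PIXEL_AREA_UM2 = 1.3    # approximate value quoted in the text

src_pixels = SRC_W * SRC_H
dst_pixels = DST_W * DST_H

# Each resized pixel covers scale_x by scale_y original pixels.
scale_x = SRC_W / DST_W
scale_y = SRC_H / DST_H
area_ratio = src_pixels / dst_pixels

# Area of the specimen represented by one resized pixel.
dst_pixel_area_um2 = SRC_PIXEL_AREA_UM2 * area_ratio

print(f"{src_pixels:,} -> {dst_pixels:,} pixels")
print(f"scale: {scale_x:.2f} x {scale_y:.2f}, area ratio: {area_ratio:.1f}")
print(f"area per resized pixel: about {dst_pixel_area_um2:.0f} um^2")
```

The unequal horizontal and vertical scale factors reflect the fact that a 3072 × 2084 image is squeezed into a square input, which is why the resized pixel is not square.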

Verification factors influencing classification performance

The classification performance and the factors influencing it were analyzed using the CNN architecture. The convolutional layer extracts features from an image and transforms them into feature maps. Therefore, as the number of convolutional layers in a convolutional neural network increases, the features of the image can be extracted more deeply, leading to improved accuracy. However, increasing the number of convolutional layers has also been associated with an increase in loss42. Therefore, three or four convolutional layers were applied, depending on how the classification accuracy and loss varied under each verification condition. The number of convolutional layers for each verification condition is shown in Table 2.

Table 2 Applied conditions for verification in this study.

Dropout and batch normalization techniques were partially applied to the convolutional and fully connected layers to normalize the verification data. Finally, the Softmax activation function was applied.

Three verification variables (optimizer, dataset augmentation, and collected part of the dataset) were set to analyze the factors affecting the classification performance of the convolutional neural network for hardwood species. Three optimizers, stochastic gradient descent (SGD), adaptive moment estimation (Adam), and root mean square propagation (RMSProp), were used to compare classification performance by optimizer type. The classification accuracy was also compared with and without augmentation of the training dataset. In addition, to analyze the difference in classification performance according to the collected part of the micrographs, two datasets were constructed, one of low-magnification (4× objective lens) images covering the entire cross-section and one of high-magnification (10× objective lens) earlywood-centered images, and the classification accuracy of the trained models was compared.
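The combinations of these three variables can be enumerated as a simple grid. The sketch below assumes a full factorial design (every optimizer with every dataset part, with and without augmentation); the dictionary keys and label format are illustrative.

```python
from itertools import product

OPTIMIZERS = ("SGD", "Adam", "RMSProp")
DATASET_PARTS = ("whole-part", "earlywood")
AUGMENTATION = (False, True)

# One entry per verification condition (assumed full factorial design).
conditions = [
    {"part": part, "optimizer": optimizer, "augmented": augmented}
    for part, optimizer, augmented in product(
        DATASET_PARTS, OPTIMIZERS, AUGMENTATION
    )
]

for condition in conditions:
    label = "augmented" if condition["augmented"] else "non-augmented"
    print(f'{condition["part"]}-{condition["optimizer"]} ({label})')
```

Under this assumption there are twelve conditions in total, six for each dataset part.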

The gradient-weighted class activation mapping (Grad-CAM) technique was used to verify the factors affecting species classification43,44. The Grad-CAM technique extracts the output of the feature map in the convolutional layer from the input image and works on the principle of multiplying the gradient average of the class for all channels, creating a spatial map of how much the input image activates the class45. Therefore, the Grad-CAM technique is widely used to analyze influential factors in the process of classifying labels based on machine learning and deep learning46,47,48. An arbitrary image per species was selected, and the Grad-CAM technique was applied to each verification condition to analyze the area recognized as a common classification indicator and the classification indicators by species.
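The principle described above can be illustrated with a small, framework-free sketch. It assumes that the feature maps of a convolutional layer and the gradients of the class score with respect to them are already available (in practice they come from the deep learning framework). The final ReLU follows the standard Grad-CAM formulation and is not spelled out in the text; the function name and toy values are made up for illustration.

```python
def grad_cam(feature_maps, gradients):
    """Return a Grad-CAM heatmap scaled to [0, 1].

    feature_maps, gradients: lists of K channels, each an H x W list of
    floats. gradients[k][i][j] is the derivative of the class score with
    respect to feature_maps[k][i][j].
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # Channel weight = spatial average of that channel's gradients.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]

    # Weighted sum over channels, then ReLU to keep positive evidence only.
    heatmap = [
        [
            max(0.0, sum(w * fmap[i][j]
                         for w, fmap in zip(weights, feature_maps)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the micrograph.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Toy input: two 2 x 2 channels with made-up activations and gradients.
toy_maps = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
toy_grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
toy_heatmap = grad_cam(toy_maps, toy_grads)
print(toy_heatmap)
```

In the toy input, the first channel supports the class and the second opposes it, so only locations where the first channel dominates remain highlighted.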

The Softmax function was used as the activation function to classify the six oak species using the results from the artificial neural networks. A categorical cross-entropy function was applied as a multiclassification loss function for compilation.
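For reference, the Softmax output and the categorical cross-entropy loss for a single image can be written in plain Python as below. The logits are made-up numbers, not outputs of the trained model, and the helper names are illustrative.

```python
import math

SPECIES = (
    "Q. serrata", "Q. dentata", "Q. mongolica",
    "Q. variabilis", "Q. aliena", "Q. acutissima",
)


def softmax(logits):
    """Convert raw network outputs into class probabilities."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: -sum(y * log(p)) over the classes."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))


# Made-up logits for one image whose true label is Q. mongolica.
logits = [0.2, 0.1, 2.5, 0.3, 0.0, -0.4]
target = [0, 0, 1, 0, 0, 0]

probs = softmax(logits)
loss = categorical_cross_entropy(target, probs)
predicted = SPECIES[probs.index(max(probs))]
print(predicted, round(loss, 3))
```

With a one-hot target, the loss reduces to the negative log of the probability assigned to the true species, so it approaches zero as that probability approaches one.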

Statistical analysis

The Pearson correlation coefficients among the variables were analyzed using a bivariate correlation analysis in IBM SPSS Statistics for Windows (Version 26.0, IBM, Armonk, New York, USA, https://www.ibm.com/spss/). Nominal variables, including the type of optimizer (SGD, Adam, and RMSProp), dataset collection region (whole and earlywood), and augmentation for analysis, were applied, whereas accuracy and loss were applied as scale variables. Furthermore, homogeneous subsets among the results were obtained using Duncan's post hoc analysis with one-way ANOVA.
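As an illustration of how a nominal variable can enter a Pearson correlation, the sketch below codes augmentation as 0 or 1 and correlates it with accuracy. The dummy coding is an assumption, since the text does not state how the variables were coded in SPSS, and the records are made-up values rather than results from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up records: (augmented? 0/1, test accuracy). Not study data.
records = [(0, 0.62), (0, 0.66), (0, 0.70), (1, 0.81), (1, 0.84), (1, 0.86)]
augmented = [flag for flag, _ in records]
accuracy = [acc for _, acc in records]

r_augmentation = pearson_r(augmented, accuracy)
print(f"r = {r_augmentation:.3f}")
```

A positive coefficient here means that accuracy tends to be higher in the group coded 1, which is how the signed values reported for each factor can be read.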

Results and discussion

Comparison of classification performance by the applied condition

Classification performance of oak species using CNN

Figure 1 shows the results of comparing the classification accuracy and loss in the test phase of the CNN architecture using the whole-part and earlywood-part datasets of the six oak woods. As the number of epochs increased in both dataset conditions, the loss decreased and the accuracy increased. This trend appeared in the process of updating weights and biases repeatedly with increasing epochs of the CNN49, indicating proper performance in learning and classification through the CNN.

Figure 1

Verification results of CNN architecture using the whole-part and earlywood-part datasets of the six oak woods. (a) loss in the whole-part dataset condition. (b) loss in the earlywood-part dataset condition. (c) accuracy in the whole-part dataset condition. (d) accuracy in the earlywood-part dataset condition.

In the verification results of the whole-part dataset, the classification accuracy and loss under the Adam and RMSProp optimizers stabilized rapidly within 10–20 epochs, whereas those under SGD stabilized more gradually. The learning speed of an optimizer generally depends on its operational principle50,51. The conditions trained with the augmented dataset tended to stabilize in the range of 80–100 epochs regardless of the optimizer. In particular, the SGD condition with the non-augmented dataset showed the most gradual stabilization, with a nearly linear slope. In the testing phase, the augmented dataset tended to stabilize relatively quickly, in the range of 20–40 epochs, compared with the non-augmented dataset for all optimizer conditions, whereas the Adam and RMSProp conditions using the non-augmented dataset showed a pattern of overfitting after 20–40 epochs. In particular, the final classification accuracy with the augmented dataset was nearly 20% higher than that with the non-augmented dataset, which is attributed to the improvement in the model’s generalization performance owing to the increased diversity of the dataset52.

In the verification results of the earlywood-part dataset, by contrast, the fluctuation of classification accuracy and loss was more pronounced than in the whole-part dataset. The classification accuracy and loss stabilized at approximately 40 epochs, regardless of the optimizer and dataset augmentation. However, when the augmented dataset was applied, the loss was lower and the classification accuracy was higher than when the non-augmented dataset was applied. Meanwhile, the difference in classification accuracy and loss between the augmented and non-augmented dataset conditions was considerably smaller than in the whole-part dataset condition. This implies that, with the earlywood dataset, the non-augmented dataset can achieve performance similar to that of the augmented dataset, and it can be interpreted that the convolutional layers extract various features from earlywood, achieving excellent generalization performance even with a small dataset52.

Anatomical factors affecting wood species classification performance

Grad-CAM analysis of whole-part dataset

Table 3 shows the weights of the parts recognized as classification indicators by the Grad-CAM technique, which was applied to classify the six oak species based on cross-sectional images containing both earlywood and latewood. The Grad-CAM analysis of the oak cross-sections identified the earlywood vessels and well-developed broad rays over 10 seriates as common classification indicators in most species. The arrangement of earlywood vessels influenced the classification of Q. acutissima, whereas the area composed of only fibers, without broad-ray tissue and axial parenchyma cells, was involved in the classification of Q. aliena. The arrangement of earlywood vessels and distribution of axial parenchyma cells around the latewood affected the classification of Q. dentata. The fiber area without axial parenchyma cells and broad rays in the cross-section was identified as a classification indicator for Q. mongolica. The arrangement of earlywood vessels, axial parenchyma cells, and broad rays were identified as classification factors for Q. serrata, whereas axial parenchyma cells adjacent to vessels around the broad rays did not affect its classification. Most traits, such as the arrangement of vessels, axial parenchyma cells, and fibers, were confirmed as classification indicators in Q. variabilis. However, the parenchyma cells distributed around the vessels did not affect the classification.

Table 3 Analysis of the classification factors of six oak species using the whole-part micrographs.

In the whole-part dataset, species classification based on the convolutional neural network was affected by the arrangement of pores, broad rays, and axial parenchyma cells.

Table 4 lists the weights of the parts recognized as classification indicators by the Grad-CAM technique using the earlywood dataset. Compared with the whole-part images, the arrangement of earlywood vessels, which is a major characteristic, was more clearly observed, and classification indicators, such as wood fiber and axial parenchyma, were found around the earlywood vessels. Oak species grow rapidly from spring to summer owing to seasonal factors53, which leads to significant development of earlywood in the xylem. The earlywood therefore contributes prominently to species classification and is regarded as a determining factor for classification among the oak species in this study. Tyloses in the earlywood vessels and the axial parenchyma cells around the earlywood vessels were also identified as classification indicators of Q. acutissima. The classification accuracy of Q. aliena was affected by earlywood vessels, tyloses in earlywood vessels, and axial parenchyma cells around latewood vessels. Tyloses in earlywood vessels and axial parenchyma cells around the earlywood vessels were also classification indicators of Q. dentata. Q. mongolica was affected by its overall structural components, such as the arrangement of earlywood vessels, fibers, and axial parenchyma cells. Q. serrata was characterized by a lower occurrence rate of tyloses in earlywood vessels than the other species, and axial parenchyma cells did not affect its classification. Although the arrangement of earlywood vessels was confirmed as a classification indicator in Q. variabilis, the axial parenchyma cells around the earlywood vessels were excluded from the classification indicators.

Table 4 Analysis of the classification factors of six oak species using the earlywood micrographs.

The results suggested that species classification based on the convolutional neural network using an earlywood dataset was affected by the arrangement of the pores, broad rays, and axial parenchyma cells.

Statistical analysis

Correlation among the factors

Table 5 presents the correlations between the variables applied to the test process of the CNN architectures. In the whole-epochs verification, the loss tended to decrease as the number of epochs increased, whereas the accuracy tended to increase with the number of epochs.

Table 5 Correlation of the factors influencing classification performance.

Among the optimizers in the whole-epochs condition, Adam (0.127**) and SGD (−0.160**) had the highest and lowest impact on classification accuracy, respectively, whereas RMSProp did not show a significant difference in classification accuracy. Dataset augmentation showed a relatively higher impact (0.351**) on classification accuracy than the other factors. The accuracy tended to decrease with the application of the whole-part or non-augmented dataset, whereas it increased with the application of the earlywood-part or augmented dataset. The loss showed the opposite tendency to that of accuracy.

In the last-five-epochs condition, the effects of epochs, optimizer, and dataset composition (whole-part, earlywood) on classification accuracy disappeared, and the impact of dataset augmentation more than doubled, from 0.351** to 0.747**. This increase is likely attributable to the variation in classification accuracy and the minimization of loss after the convergence point was reached. Dataset augmentation could therefore be a major factor affecting classification performance.

Homogeneous subset

Table 6 presents the results for the homogeneous subsets among the conditions of the dataset based on the verification results shown in Fig. 1.

Table 6 Comparison of average loss and accuracy among optimizers.

In the results of whole-epochs verification, losses were classified into multiple subsets. The first identified subset included conditions such as whole-part-SGD, earlywood-Adam, and earlywood-RMSProp, which utilized an augmented dataset. The second subset included four conditions: two conditions for the augmented dataset, whole-part-RMSProp and earlywood-SGD, and two conditions for the non-augmented dataset, earlywood-SGD and earlywood-RMSProp. In the third subset, three conditions were classified: earlywood-SGD and earlywood-Adam for the non-augmented dataset and earlywood-SGD for the augmented dataset. The fourth subset was identified according to three conditions: whole-part-Adam and earlywood-Adam for the non-augmented dataset and earlywood-SGD for the augmented dataset. The fifth subset identified three conditions: whole-part-SGD, whole-part-Adam, and earlywood-Adam, which utilized a non-augmented dataset.

Classification accuracy was divided into two major subsets. The first homogeneous subset consisted of five of the six earlywood dataset conditions, excluding the Adam-augmented condition. The second homogeneous subset also consisted of five conditions: the whole-part-SGD, earlywood-SGD, earlywood-Adam, and earlywood-RMSProp conditions that utilized an augmented dataset, and the earlywood-RMSProp condition that utilized a non-augmented dataset. Based on these results, it was concluded that applying the earlywood dataset produced similar results without a significant impact of the other conditions.

In the last-five-epochs verification results, most conditions during the test phase had a classification accuracy of around 70%. Only the whole-part-SGD, whole-part-Adam, whole-part-RMSProp, and earlywood-SGD conditions that applied an augmented dataset had a classification accuracy of over 80%. The highest classification accuracy, approximately 85.7%, was obtained with the whole-part-Adam condition that applied an augmented dataset. However, there was no significant difference in classification accuracy between the whole-part-Adam condition and the other conditions with a classification accuracy of over 80%.

Table 7 presents the results of the homogeneous subset analysis between the indicators using the average accuracy in the final five epochs of the test phase, as shown in Table 6. Data augmentation directly affected the classification accuracy, and the classification accuracy before and after augmentation was verified as a significantly independent subset. There was no significant difference in classification accuracy between the whole-part and earlywood datasets. The classification accuracy according to the optimizers SGD, Adam, and RMSProp was classified as a homogeneous subset with no difference.

Table 7 Comparison of average accuracy in the final five epochs per factor in the test phase.

Conclusions

The results of the CNN classification trained with the datasets of the six Korean oak species are as follows:

The classification accuracy ranged from 61.5 to 85.7% for the whole-part dataset and from 71.6 to 83.3% for the earlywood-part dataset based on the final five epochs. The whole-part dataset had a larger deviation for each condition than the earlywood-part dataset. However, the whole-part dataset exhibited excellent classification accuracy in the augmented dataset. In particular, the classification accuracy in the augmented condition increased significantly compared with that in the non-augmented condition.

The arrangement of pores, broad rays, and axial parenchyma was verified as a species classification factor from Grad-CAM analysis results.

The factors affecting the classification accuracy included the epoch, optimizer type, and dataset augmentation and composition. Epochs showed the highest influence (0.533**), followed by dataset augmentation (0.351**), the Adam optimizer (0.127**), the earlywood-part dataset (0.070**), the whole-part dataset (− 0.070**), and the SGD optimizer (− 0.160**).

Based on the final five epochs, dataset augmentation was proven to have a significant effect on classification accuracy, with a value of 0.747**, indicating a strong correlation.

Four validation conditions using the augmented dataset showed classification accuracies of over 80%: three with the whole-part dataset, Adam (85.7%), RMSProp (84.9%), and SGD (81.9%), and one with the earlywood dataset, SGD (83.3%).

It was concluded that a whole-part dataset with augmented conditions should be used for training, and Adam or RMSProp optimizers can be used to obtain the best classification accuracy for the six Korean oak wood species.