Background

Black flies (Diptera: Simuliidae) are medically important haematophagous insects for humans, domestic animals and wildlife, due to their pestiferous biting habits and vectorial roles in transmitting various parasites. They are the sole vector of the filarial nematode Onchocerca volvulus, which causes river blindness, the second leading infectious cause of blindness in the world [1]. They also transmit other Onchocerca species, Mansonella filarial parasites and Leucocytozoon and Trypanosoma protozoa [2, 3]. In contrast, black flies also function as beneficial organisms in aquatic ecosystems, where the larvae process fine particulate organic matter into larger food pellets, serve as food for other aquatic organisms and act as bioindicators of water quality [4].

Southeast Asia harbours nearly 20% of the world’s species of black flies, providing excellent opportunities for research on these minute creatures. Extensive morphotaxonomic research on black flies in Indonesia began in the late 1990s, leading to a total of 143 species reported from the country to date [5, 6]. The rich black fly biodiversity in Indonesia reflects its strategic location in the tropical belt between the Pacific and Indian Oceans and between the Asian and Australian continents. All Indonesian black flies are in the genus Simulium Latreille and are classified in five subgenera: Gomphostilbia Enderlein, Morops Enderlein, Nevermannia Enderlein, Simulium Latreille and Wallacellum Takaoka. The species are further assigned to 27 species groups [6, 7]. Nevertheless, various biological aspects of black flies in Indonesia, including their vectorial roles and biting habits, remain to be explored. Exceptions include S. (G.) atratum, which bites domestic fowls in Java [8], and S. (N.) aureohirtum, which is autogenous [9, 10].

Black flies are traditionally identified using morphological keys, such as those by Adler, Currie [11], Crosskey [12], Shelley [13], Takaoka [14], Takaoka and Davies [15], Takaoka and Davies [16] and Takaoka, Sofian-Azirun [17]. Chromosome-based analyses also drive black fly taxonomy and have revealed cryptic diversity in many morphospecies [2]. These two methods, however, are sometimes insufficient for the rapid and accurate species identification that is crucial for biological research and vector control. Morphologically similar species often cannot be differentiated in one or more life stages, and chromosomal identifications are typically applicable only in the larval stage. Both methods also require a high level of expertise [18, 19].

The DNA barcoding approach has shown promise as a molecular taxonomic tool for black flies. Many DNA barcoding studies, based on the mitochondrial cytochrome c oxidase subunit I (COI) gene, demonstrate high levels of correct species identifications, which are usually consistent with morphotaxonomic and chromosomal studies. COI-based barcoding has demonstrated a considerable success level (> 90% of sampled taxa) in distinguishing species of black flies from Thailand [20, 21]. The molecular approach is also helpful in revealing cryptic diversity in morphospecies thought to be single species. Thailand, in particular, has been actively reporting species complexes such as S. (G.) angulistylum Takaoka & Davies [22], S. (N.) feuerborni [23, 24] and S. (S.) fenestratum [25] through integrated initiatives of barcoding and cytogenetics. Coupled with other taxonomic approaches, DNA barcoding also complements the description of cryptic species. Some notable examples include the description of S. (N.) pairoti from S. (N.) feuerborni [26] and the naming of S. (S.) nobile cryptic species in Peninsular Malaysia as S. (S.) vanluni [27]. Additionally, S. (S.) rufibasis Brunetti in Japan and Korea was revised as S. (S.) yamatoense Takaoka, Adler & Fukuda after the morphological, chromosomal and molecular re-examinations of the species [28]. Molecular research on these simuliids is also ongoing in Malaysia and Vietnam and is expected to add to the growing body of knowledge in this area.

Although several genetic studies have been conducted on black flies in Indonesia, including S. (N.) feuerborni, S. (S.) nobile and S. (S.) timorense [24, 29, 30], the genetics of other Indonesian black flies is understudied. We, therefore, used the mitochondrial COI gene to delimit species boundaries for 55 species of black flies from Indonesia.

Methods

Sample collection

Samples were collected from eight provinces in Indonesia between 2014 and 2017 (Table 1). Aquatic stages of black flies (larvae and pupae) attached to grasses, leaves, twigs, plant roots and rocks were collected by hand using fine forceps. Pupae were individually kept alive in vials until adult emergence. The adults, together with their pupal exuviae and cocoons, were fixed in 80% ethanol for identification at the subgenus, species group or species level. The methods of collection and identification followed those of Adler, Currie [11] and Takaoka [14].

Table 1 Black flies (n = 27) of Indonesia included in the present study of COI barcoding, with collection data and GenBank accession numbers

DNA extraction, polymerase chain reaction (PCR) and sequencing

One to four adults were selected randomly and dissected for each species before DNA extraction. Genomic DNA was extracted from the dissected parts (thorax or hind leg), using the NucleoSpin® Tissue Mini Kit (Macherey–Nagel, Düren, Germany), according to the manufacturer’s protocol. A conventional polymerase chain reaction (PCR) was then performed to amplify the target region of the cytochrome c oxidase subunit I (COI) gene, using the standard DNA barcoding primers LCO1490 (5′-GGTCAACAAATCATAAAGATATTGG-3′) and HCO2198 (5′-TAAACTTCAGGGTGACCAAAAAATCA-3′) [31]. Each PCR reaction mixture contained 1 µl DNA template, 12.5 µl MyTaq Red Mix 2 × mastermix (Bioline Reagents, Meridian Bioscience, Cincinnati, Ohio, USA), 0.4 µM forward primer, 0.4 µM reverse primer and distilled water up to 25 µl. The PCR amplifications were performed on an Applied Biosystems Veriti 96-Well Thermal Cycler (Applied Biosystems, Inc., Foster City, CA, USA). PCR reaction conditions and temperature profiles followed those of Rivera and Currie [19]: denaturation at 96 °C for 1 min and 94 °C for 1 min, primer annealing at 55 °C for 1 min, 35 cycles of amplification at 72 °C for 1.5 min and 7 min at 72 °C. PCR products were visualized on a 1.5% agarose gel pre-stained with SYBR Safe dye (Invitrogen Corp., Carlsbad, CA, USA), with a 100-bp DNA ladder (GeneDireX, Inc., Taiwan) as the DNA band size standard. Lastly, the PCR amplicons were sent to Apical Scientific Sdn Bhd (Selangor, Malaysia) for sequencing.
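As a worked illustration of the reaction mixture described above, the following minimal Python sketch computes per-reaction volumes. The primer stock concentration of 10 µM is an assumption made only for this example (the stock concentration is not stated in the text), and the function name pcr_mix is hypothetical.

```python
def pcr_mix(total_ul=25.0, template_ul=1.0, mastermix_ul=12.5,
            primer_final_um=0.4, primer_stock_um=10.0):
    """Return per-reaction volumes (in µl) for a two-primer PCR mix.

    primer_stock_um is an ASSUMED stock concentration (10 µM); the
    other defaults are the values given in the Methods text.
    """
    # Dilution rule C1 * V1 = C2 * V2, solved for the stock volume V1.
    primer_ul = primer_final_um * total_ul / primer_stock_um
    # Water makes up whatever volume remains.
    water_ul = total_ul - template_ul - mastermix_ul - 2 * primer_ul
    if water_ul < 0:
        raise ValueError("components exceed the total reaction volume")
    return {
        "template_ul": template_ul,
        "mastermix_ul": mastermix_ul,
        "forward_primer_ul": primer_ul,
        "reverse_primer_ul": primer_ul,
        "water_ul": water_ul,
    }


if __name__ == "__main__":
    for component, volume in pcr_mix().items():
        print(f"{component}: {volume:.2f}")
```

With a different primer stock, only primer_stock_um needs to change; the water volume adjusts so that the total remains 25 µl.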

Data analyses

Publicly available COI sequences of other related black fly species were retrieved from the NCBI GenBank database and included in analyses. A total of 204 black fly COI sequences representing 55 species from 14 species groups were analysed, with 86 of the sequences generated in the present study. Representative sequences were deposited in the NCBI GenBank database under accession numbers OQ117897–OQ117982 and the Global Biodiversity Information Facility (GBIF) database with other relevant information. The COI sequences were aligned in Unipro UGENE software using MUSCLE [32] and were trimmed to 452 bp in BioEdit software [33]. Before phylogenetic analyses, model selection was performed using kakusan4 to determine the most suitable nucleotide substitution model [34]. Trees were constructed based on the COI sequences via maximum-likelihood (ML) and Bayesian inference (BI) methods. Parasimulium crosskeyi (GenBank accession number: FJ524489) [21] was chosen as an outgroup for both tree analyses. The ML tree was generated from RAxML webserver (https://raxml-ng.vital-it.ch/#/) [35] using a generalized time-reversible (GTR) nucleotide substitution model with invariant sites of 0.47 (I), a gamma shape parameter (α) of 0.56 (G), four mean gamma category rates and maximum likelihood search. Bootstrap support was estimated for 100 replicates. The configuration file generated from kakusan4 was used to perform BI tree analysis using MrBayes v3.2.7 [36] on CIPRES Science Gateway v3.3 webserver (https://www.phylo.org/portal2/home.action). The BI analysis adopted the GTR substitution model using gamma-distributed rate variation across sites with shape parameter of 0.767 and invariable sites of 0.466. The posterior probability distribution of trees was estimated from two independent Markov chain Monte Carlo (MCMC) simulations of five million generations until the average standard deviation of split frequencies reached < 0.01. The first 25% of all runs was discarded as burn-in.
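The trimming step mentioned above was done in BioEdit; the exact procedure is not described. As a rough sketch of one common approach, assuming that the alignment is trimmed to the columns covered by every sequence, the hypothetical helper below removes leading and trailing columns in which any sequence has only terminal gaps. The sequences are toy data, not barcodes from this study.

```python
def trim_to_shared_window(aligned):
    """Trim aligned sequences to the columns covered by every sequence.

    `aligned` maps sequence names to gapped strings of equal length,
    with '-' as the gap character. Internal gaps are kept.
    """
    if len({len(s) for s in aligned.values()}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    # Latest first non-gap position across all sequences.
    start = max(len(s) - len(s.lstrip("-")) for s in aligned.values())
    # Earliest end of non-gap content across all sequences.
    end = min(len(s.rstrip("-")) for s in aligned.values())
    if start >= end:
        raise ValueError("sequences share no common window")
    return {name: s[start:end] for name, s in aligned.items()}


if __name__ == "__main__":
    toy = {
        "seq_a": "--ACGTACGT--",
        "seq_b": "ACGTAC-TACGT",
        "seq_c": "---GTACGTAC-",
    }
    for name, seq in trim_to_shared_window(toy).items():
        print(name, seq, len(seq))
```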

Species delimitation analyses, including Assemble Species by Automatic Partitioning (ASAP) [37], Generalized Mixed Yule Coalescent (GMYC) [38] and single Poisson Tree Processes (PTP) [39], were also performed. ASAP analysis was performed in the webserver version (https://bioinfo.mnhn.fr/abi/public/asap/). The Jukes-Cantor (JC69), Kimura (K80) ts/tv and simple distance models were tested. Results with genetic distances between 0 and 0.03 were highlighted. The GMYC analysis used an ultrametric tree inferred in BEAST v2.6.7, with the input file prepared in BEAUti2 using a GTR + G + I model, a Yule prior and a relaxed log-normal clock model. The analysis was run for 40 million generations, sampling every 1000 generations. The output file was inspected in Tracer v1.6 to ensure that the effective sample sizes (ESS) of all parameters exceeded 200. The output tree was then summarized in TreeAnnotator v2.6.7 with a 20% burn-in. Data were analysed using a single threshold model in the SPLITS software package [40] available in the R v3.3.0 program. The single PTP analysis was performed in the mPTP webserver (https://mptp.h-its.org/#/tree) with the tree obtained from RAxML as the input file, PTP selected as the model and default settings, including the default p-value. The intra- and interspecific genetic distances were calculated based on an uncorrected p-distance model with variance estimation using the bootstrap method for 1000 replicates in MEGA11 software [41]. Lastly, the efficacy of COI sequences for species identification was tested using the best match (BM) and best close match (BCM) methods in TaxonDNA software. The criterion for successful identifications based on the BM method was that all conspecifics had the smallest distance to the query sequence, whereas the BCM method required that the smallest distance be within the 95th percentile of overall intraspecific distances [42]. The cut-off threshold for the BCM method, calculated using the adhoc R package [43], was 1.9%.
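The BM and BCM criteria can be illustrated with a short Python sketch. This is a simplified reading of the criteria as described above, not the TaxonDNA implementation; the function names and the 20-bp toy sequences are hypothetical, and only the 1.9% threshold is taken from the text.

```python
def p_distance(a, b):
    """Uncorrected p-distance: share of differing sites among sites
    where both aligned sequences have an unambiguous base."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def match_outcome(query_id, seqs, species, threshold=None):
    """Classify one query under BM (threshold=None) or BCM (threshold set)."""
    dists = {other: p_distance(seqs[query_id], seqs[other])
             for other in seqs if other != query_id}
    best = min(dists.values())
    # BCM only: the closest hit must fall within the distance threshold.
    if threshold is not None and best > threshold:
        return "no match"
    hit_species = {species[o] for o, d in dists.items() if d == best}
    if hit_species == {species[query_id]}:
        return "correct"
    if species[query_id] in hit_species:
        return "ambiguous"
    return "incorrect"


if __name__ == "__main__":
    seqs = {
        "a1": "ACGTACGTACGTACGTACGT",
        "a2": "ACGTACGTACGTACGTACGT",
        "a3": "ACGTACGTACGTACGTACGA",
        "b1": "TTGTACGAACGTCCGTACGT",
    }
    species = {"a1": "sp_A", "a2": "sp_A", "a3": "sp_A", "b1": "sp_B"}
    for query in seqs:
        print(query,
              match_outcome(query, seqs, species),
              match_outcome(query, seqs, species, threshold=0.019))
```

In the toy data, the singleton b1 has no conspecific in the reference set, so its nearest sequence necessarily belongs to another species; this mirrors how a lack of conspecifics in a database can lead to incorrect or unmatched identifications.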

Results

Phylogenetic analysis based on COI barcodes

Both ML and BI trees showed similar topologies. The only difference was in the placement of the S. (S.) eximium clade. Simulium (Simulium) eximium grouped with the S. (S.) iridescens group in the ML tree, whereas it clustered with the S. (S.) multistriatum group in the BI tree. Only the ML tree is shown; the BI tree is provided as a supplementary figure (see Additional file 1).

Three major clades were formed in the tree, corresponding to (i) subgenus Simulium, (ii) subgenera Gomphostilbia and Nevermannia and (iii) Simulium (Gomphostilbia) tahanense. Overall, most nominal species formed clades in their respective subgenera and species groups, consistent with morphotaxonomic studies, except for S. (G.) tahanense, which formed a distinct clade with strong bootstrap and posterior probability values.

Subgenus Simulium Latreille

All species groups of the subgenus Simulium were monophyletic (Figs. 1, 2). Simulium nebulicola was the only member of the S. nebulicola group represented in our study. It formed a distinct clade from other Simulium species groups with high interspecific distances. Simulium eximium formed a strongly supported clade, whereas S. iridescens was paraphyletic with the S. javaense clade nested in its clade. In the S. multistriatum group, S. bullatum formed a strongly supported distinct subclade. Simulium fenestratum formed two subgroups representing the only species in a distinct Indonesia group and a Thailand group that included the remaining members of the S. multistriatum group (S. chainarongi, S. chaliowae and S. ubonae). Within the S. striatum group, S. argyrocinctum was paraphyletic, with S. baliense nested within its clade. Simulium chiangmaiense, S. nakhonense and S. wangkwaiense formed a non-monophyletic cluster with low genetic distances among these taxa. In the S. nobile group, one sequence of S. vanluni was distinct from the others that formed a separate clade of S. vanluni. The S. nobile clade was nested within the S. timorense clade, with low interspecific distances (minimum = 1.11%), making the S. timorense clade paraphyletic. However, in the BI tree, the S. nobile and S. timorense clades were well separated. In the S. tuberosum group, S. jianshiense and S. keningauense each formed a monophyletic clade, whereas S. tani was divided into two subgroups.

Fig. 1

ML tree showing species of black flies from Indonesia in the subgenus Simulium Latreille, which was constructed from COI sequences. Bootstrap and posterior probability values of > 50% and > 0.50, respectively, are shown on the branches. Branches with bootstrap and posterior probability values > 70% and > 0.70, respectively, are considered well supported. New sequences generated in the study are in bold. Grey bars indicate the respective operational taxonomic units recognized by the three species delimitation analyses (i.e. ASAP, GMYC and PTP, in order). ASAP: Assemble Species by Automatic Partitioning; GMYC: Generalized Mixed Yule Coalescent; PTP: Poisson Tree Processes

Fig. 2

Continued ML tree showing species of black flies from Indonesia in the subgenus Simulium Latreille, which was constructed from COI sequences. Bootstrap and posterior probability values of > 50% and > 0.50, respectively, are shown on the branches. Branches with bootstrap and posterior probability values > 70% and > 0.70, respectively, are considered well supported. New sequences generated in the study are in bold. Grey bars indicate the respective operational taxonomic units recognized by the three species delimitation analyses (i.e. ASAP, GMYC and PTP, in order). ASAP: Assemble Species by Automatic Partitioning; GMYC: Generalized Mixed Yule Coalescent; PTP: Poisson Tree Processes

Subgenus Nevermannia Enderlein

In subgenus Nevermannia, two clades formed representing the S. feuerborni species group and the S. ruficorne species group (Fig. 3). Members of the S. feuerborni group were divided into two subgroups, showing the paraphyly of S. feuerborni with other taxa. The two subgroups corresponded to S. feuerborni from Indonesia and Thailand, which were non-monophyletic with other members of the S. feuerborni group (S. fruticosum, S. ledangense, S. pairoti and S. pumatense). In the S. ruficorne group, S. aureohirtum was divided into two subgroups of which one subgroup had sequences of S. wayani nested within.

Fig. 3

ML tree showing species of black flies from Indonesia in the subgenus Nevermannia Enderlein, which was constructed from COI sequences. Bootstrap and posterior probability values of > 50% and > 0.50, respectively, are shown on the branches. Branches with bootstrap and posterior probability values > 70% and > 0.70, respectively, are considered well supported. New sequences generated in the study are in bold. Grey bars indicate the respective operational taxonomic units recognized by the three species delimitation analyses (i.e. ASAP, GMYC and PTP, in order). The double asterisk (**) on the two grey bars of the ASAP analysis indicates these two bars represent the same taxonomic unit. ASAP: Assemble Species by Automatic Partitioning; GMYC: Generalized Mixed Yule Coalescent; PTP: Poisson Tree Processes

Subgenus Gomphostilbia Enderlein

The nominal species of the subgenus Gomphostilbia formed two clades: a major clade with subgenus Nevermannia clustering with the Simulium epistum group and a strongly supported distinct S. tahanense clade of the S. batoense group (Figs. 4, 5). Other members of the S. batoense group were monophyletic.

Fig. 4

ML tree showing species of black flies from Indonesia in the subgenus Gomphostilbia Enderlein, which was constructed from COI sequences. Bootstrap and posterior probability values of > 50% and > 0.50, respectively, are shown on the branches. Branches with bootstrap and posterior probability values > 70% and > 0.70, respectively, are considered well supported. New sequences generated in the study are in bold. Grey bars indicate the respective operational taxonomic units recognized by the three species delimitation analyses (i.e. ASAP, GMYC and PTP, in order). ASAP: Assemble Species by Automatic Partitioning; GMYC: Generalized Mixed Yule Coalescent; PTP: Poisson Tree Processes

Fig. 5

Continued ML tree showing species of black flies from Indonesia in the subgenus Gomphostilbia Enderlein, which was constructed from COI sequences. Bootstrap and posterior probability values of > 50% and > 0.50, respectively, are shown on the branches. Branches with bootstrap and posterior probability values > 70% and > 0.70, respectively, are considered well supported. New sequences generated in the study are in bold. Grey bars indicate the respective operational taxonomic units recognized by the three species delimitation analyses (i.e. ASAP, GMYC and PTP, in order). ASAP: Assemble Species by Automatic Partitioning; GMYC: Generalized Mixed Yule Coalescent; PTP: Poisson Tree Processes

The S. asakoae group was not monophyletic. It had a member of the S. ceylonicum group (S. rangatense) clustering with one of its members (S. sunapii). Nonetheless, the high genetic distance (8.85%) between S. rangatense and S. sunapii suggests that they are distinct species. Other taxa of the S. asakoae group formed a monophyletic clade except for S. puaense, which contained S. maehongsonense in the ML tree. In the BI tree, however, all members of the S. asakoae group were monophyletic.

In the S. ceylonicum group, S. sheilae was paraphyletic because its clade included S. trangense. This clade was further divided into three subclades: (i) Malaysia and Indonesia, (ii) Thailand and (iii) Indonesia + S. trangense. The S. epistum species group formed four subclades: (i) S. cheongi (Malaysia), S. atratum and S. floresense; (ii) S. merapiense; (iii) S. lehi; (iv) S. sarawakense and S. cheongi (Indonesia). All involved taxa were monophyletic, except for S. cheongi. Simulium chumpornense and S. sumbaense of the S. varicorne group formed a paraphyletic clade, clustering with subclade iv of the S. epistum group.

Genetic distances

The maximum intraspecific genetic distance ranged from 0% in S. (N.) ledangense, S. (N.) wayani and S. (S.) chainarongi to 13.94% in S. (G.) cheongi. Out of 55 morphospecies, 11 exhibited high intraspecific divergences, with mean and maximum values reported as follows: S. (G.) gyorkosae (2.18%; 3.32%), S. (G.) sheilae (5.75%; 9.51%), S. (G.) cheongi (7.93%; 13.94%), S. (G.) floresense (1.88%; 3.76%), S. (N.) feuerborni (7.04%; 10.62%), S. (N.) aureohirtum (4.36%; 7.96%), S. (S.) eximium (2.51%; 3.76%), S. (S.) iridescens (2.14%; 3.32%), S. (S.) fenestratum (2.58%; 4.42%), S. (S.) argyrocinctum (2.80%; 3.54%) and S. (S.) tani (5.32%; 7.74%) (Table 2). Among these species, S. (N.) feuerborni, S. (S.) fenestratum and S. (S.) tani are known to be species complexes.

Table 2 Species of black flies (n = 55) included for barcoding analyses (n = 204 COI sequences), with the mean and maximum intraspecific divergence values (%) of each species

Interspecific genetic distances ranged from 0 to 19.25%, with an average of 13.22%. Low levels of minimum interspecific distance were noted in the following species pairs, suggesting that the individuals of the two species in each pair are closely related or perhaps conspecific: S. (N.) aureohirtum and S. (N.) wayani (0.66%), S. (S.) iridescens and S. (S.) javaense (0.66%), S. (S.) chainarongi and S. (S.) ubonae (0.88%), S. (S.) chaliowae and S. (S.) fenestratum (0.22%), and S. (S.) fenestratum and S. (S.) ubonae (0.66%). Table S1 shows the intraspecific and interspecific genetic distances of each species (see Additional file 2).
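The distance summaries reported above (mean and maximum intraspecific distance per species, minimum interspecific distance per species pair) were computed in MEGA11. The following Python sketch shows the underlying arithmetic on toy 20-bp sequences; it omits the bootstrap variance estimation, and the function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def p_distance(a, b):
    """Uncorrected p-distance over sites where both bases are unambiguous."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_summary(seqs, species):
    """Return (intra, inter_min), with all distances in percent.

    intra maps species -> (mean, max), or None for singletons.
    inter_min maps (species_1, species_2) -> minimum distance.
    """
    groups = {}
    for seq_id, sp in species.items():
        groups.setdefault(sp, []).append(seqs[seq_id])
    intra = {}
    for sp, members in groups.items():
        d = [p_distance(a, b) for a, b in combinations(members, 2)]
        intra[sp] = (100 * mean(d), 100 * max(d)) if d else None
    inter_min = {}
    for sp1, sp2 in combinations(sorted(groups), 2):
        inter_min[(sp1, sp2)] = 100 * min(
            p_distance(a, b) for a in groups[sp1] for b in groups[sp2])
    return intra, inter_min


if __name__ == "__main__":
    seqs = {
        "x1": "ACGTACGTACGTACGTACGT",
        "x2": "ACGTACGTACGTACGTACGA",
        "x3": "TCGTACGTACGTACGTACGA",
        "y1": "TTGTACGAACGTCCGTACGT",
    }
    species = {"x1": "sp_X", "x2": "sp_X", "x3": "sp_X", "y1": "sp_Y"}
    intra, inter_min = distance_summary(seqs, species)
    print(intra)
    print(inter_min)
```

A species represented by a single sequence yields no intraspecific value, which is why singletons limit the assessment of intraspecific variation.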

Species delimitation analyses

For ASAP analysis, a few subsequent partitions other than the “best” one with the lowest ASAP score and the threshold distance were considered while choosing the final species partition [37]. The fifth partition with an ASAP score of 11 and threshold distance of 0.034 was chosen among the 10 “best” partitions found by the ASAP analysis using a simple distance substitution model. The distance-based ASAP method and the GMYC method gave comparable results, recovering 44 and 42 operational taxonomic units (OTUs), respectively, whereas the single PTP method recovered 51 OTUs. Overall, all three species delimitation analyses showed good agreement, although the single PTP method identified more putative species than did the other two methods. The non-monophyletic groups, such as the S. (N.) feuerborni and S. (S.) multistriatum groups, were considered by the analyses as single taxonomic units, with their members inseparable. Also, more than one taxonomic unit was detected within the single species that had high intraspecific distances (> 3%), except for S. (S.) iridescens.
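To give an intuition for how a threshold distance such as 0.034 partitions sequences into OTUs, the sketch below applies plain single-linkage clustering with a union-find structure. This is a deliberately simplified stand-in and not the ASAP algorithm, which additionally scores and ranks candidate partitions; the function name and the pairwise distances are hypothetical toy values.

```python
def cluster_otus(ids, pair_dist, threshold):
    """Group sequence ids into OTUs by single linkage.

    Two sequences share an OTU if they are connected by a chain of
    pairwise distances that are each <= threshold.
    """
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (a, b), d in pair_dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    ids = ["s1", "s2", "s3", "s4", "s5"]
    pair_dist = {
        ("s1", "s2"): 0.004, ("s2", "s3"): 0.030, ("s1", "s3"): 0.040,
        ("s4", "s5"): 0.010,
        ("s1", "s4"): 0.120, ("s1", "s5"): 0.120, ("s2", "s4"): 0.120,
        ("s2", "s5"): 0.120, ("s3", "s4"): 0.120, ("s3", "s5"): 0.120,
    }
    print(cluster_otus(ids, pair_dist, threshold=0.034))
```

Because linkage is transitive, s1 and s3 fall in the same OTU even though their direct distance exceeds the threshold; this chaining is one reason why distance-based partitions can lump divergent lineages within a nominal species.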

Species identification efficacy

The percentages of correct species identifications via the best match and best close match methods exceeded 80% (Table 3). Incorrect identifications were associated with non-monophyletic species as follows: S. (N.) aureohirtum, S. (N.) feuerborni, S. (N.) fruticosum, S. (N.) pumatense, S. (S.) argyrocinctum, S. (S.) fenestratum, S. (S.) iridescens and S. (S.) nakhonense. A lack of conspecifics in the database might also have caused ambiguous and incorrect identifications of the following species: S. (G.) johorense, S. (G.) laosense, S. (G.) rangatense, S. (G.) sumbaense, S. (G.) sunapii, S. (S.) baliense and S. (S.) nebulicola.

Table 3 COI identifications of black flies based on best match (BM) and best close match (BCM) methods

Discussion

The relationships among 55 nominal species of black flies in 14 previously established species groups in Indonesia are presented for the first time to our knowledge through DNA barcodes based on the mitochondrial COI gene. The accuracy of the COI gene to identify black fly species in Indonesia is > 84%. Most of the species are shown to be monophyletic in their respective species groups and subgenera with a few exceptions. Possible causes of non-monophyly include inadequate phylogenetic signal, imperfect taxonomy, interspecific hybridization, incomplete lineage sorting and gene paralogy [44].

In the S. batoense group, S. (G.) tahanense forms a single group distinct from other group members. This topology agrees with previous phylogenetic analyses [45, 46]. In fact, S. (G.) tahanense is distinctive not only among species of the S. batoense species group but also among species of the subgenus Gomphostilbia in having an elongate female labrum [47]. This unique characteristic is believed to contribute to its distinctiveness from other taxa. The grouping of S. (G.) rangatense of the S. ceylonicum group with S. (G.) sunapii causes the S. asakoae group to be non-monophyletic. Even so, a high genetic distance of 8.85% was recorded between these two species, each of which is recognized as a distinct species. The grouping might be due to inadequate phylogenetic signal of the COI gene in resolving the two species groups, as shown by Low, Takaoka [48].

Simulium (Gomphostilbia) sheilae from Indonesia is probably a distinct lineage from this nominal species in Malaysia and Thailand, based on our results. In the barcode tree, S. (G.) sheilae is divided into three subclades: (i) Indonesia and Malaysia; (ii) Thailand; (iii) Indonesia, which are regarded as different taxonomic units by the delimitation analyses. Furthermore, S. (G.) sheilae from Indonesia displayed high intraspecific distances (minimum = 3.10%) compared to lineages from Malaysia and Thailand. In addition, a single sample from Indonesia showed a high genetic distance (minimum = 8.63%) compared to other Indonesian sequences, indicating a high level of intraspecific divergence within S. (G.) sheilae in Indonesia. These findings suggest that S. (G.) sheilae in Indonesia may harbour cryptic diversity. Simulium (Gomphostilbia) trangense also has a lower genetic distance from S. (G.) sheilae from Indonesia (minimum = 1.77%) than from Malaysia (minimum = 9.96%) and Thailand (minimum = 6.19%), indicating that S. (G.) trangense is genetically more closely related to S. (G.) sheilae from Indonesia.

Simulium (Nevermannia) feuerborni is a species complex of four chromosomally distinct lineages from Thailand (cytoforms A and B), Malaysia (cytoform C, subsequently named S. (N.) pairoti) and Indonesia (cytoform D), although molecular analysis was not conducted on the Indonesian population in the original studies [23, 24, 26]. Our study supports the distinctiveness of the Indonesian lineage with high divergence values (minimum = 9.29%) reported between Indonesian and Thai lineages. The two lineages are also considered different taxonomic units. In addition, one sequence of Indonesian S. (N.) feuerborni (GenBank accession number: KX525228) has a high genetic distance of 5.09% from other Indonesian sequences, and ASAP and PTP analyses detected two taxonomic units in the Indonesian S. (N.) feuerborni. These genetic results suggest possible cryptic diversity, though further research is needed to clarify these observations.

Similar to the studies by Thaijarern, Sopaladawan [49] and Pramual, Jomkumsing [20], S. (N.) aureohirtum in our study was divided into two lineages, considered different taxa, that are genetically different, with a maximum distance of 7.96%. However, no evidence was found of sibling species in S. (N.) aureohirtum in Thailand [50]. Further analyses are required to determine whether the two lineages are different species [20]. More specimens of S. (N.) aureohirtum from Indonesia should be included in analyses to determine intraspecific variation and genetic relationships with other taxa. In addition, comparisons with S. (N.) aureohirtum from the type locality (Assam, India) are essential in sorting out the taxonomy of this nominal species.

The sequences of S. (N.) wayani were nested within one of the S. (N.) aureohirtum subgroups with low genetic distances (minimum = 0.66%), although S. (N.) aureohirtum is readily distinguished from S. (N.) wayani by the number of pupal gill filaments, suggesting that S. (N.) wayani is closely related to the S. (N.) aureohirtum subgroup. Chromosomal analyses indicate, however, that S. (N.) wayani is closely related to the S. (N.) ornatipes complex of mainland Australia [1], indicating that further barcode studies should include the S. (N.) ornatipes complex. Takaoka [51] inferred that species of the S. ruficorne group dispersed eastward from Sumatra in Indonesia to the Australasian Region while reducing the pupal gill filaments from eight (S. (N.) glattharri Takaoka & Davies) to four (S. (N.) ornatipes) through six (S. (N.) aureohirtum). Simulium (N.) wayani has four pupal gill filaments. Our results support the hypothesis that S. (N.) wayani might have evolved from an ancestral six-filamented population of S. (N.) aureohirtum, proposed by Takaoka, Sofian-Azirun [52], perhaps along with members of the S. (N.) ornatipes complex [1].

As expected from Pramual and Nanork [53], S. (S.) fenestratum was paraphyletic with respect to other members of the S. multistriatum group. The specimen from Indonesia forms a clade separate from the Thailand sequences retrieved from GenBank, although Indonesian S. (S.) fenestratum is genetically closer to two Thailand sequences (GenBank accession numbers: MG734051 and MG734055). The intraspecific variation of S. (S.) fenestratum from Indonesia could not be examined, as only one specimen was available. Simulium (Simulium) ubonae has low interspecific distances compared with other taxa in our study. The genetic distances of S. (S.) ubonae compared with those of S. (S.) chainarongi (0.88%) and one sequence of S. (S.) fenestratum (0.66%) are especially low, indicating S. (S.) ubonae is genetically closer to these two species. This result does not agree with a previous study showing high interspecific distances (minimum = 4.9%) of S. (S.) ubonae [54]. The non-monophyly of S. (S.) chiangmaiense, S. (S.) nakhonense and S. (S.) wangkwaiense in the S. striatum group in our study was expected; a previous study by Pangjanda and Pramual [55] showed that the COI gene was unable to separate these three taxa.

In the S. tuberosum group, S. (S.) tani is a large species complex [56,57,58]; thus, the high intraspecific divergence in our study was expected. Although the single barcode of S. (S.) tani showed high intraspecific distances (minimum = 3.10%) compared to other Thailand sequences, delimitation methods do not classify S. (S.) tani from Indonesia as a separate taxonomic unit. However, due to the availability of only one sample, genetic results provide limited information on the intraspecific variation of S. (S.) tani from Indonesia.

A rough indicator of separate species in the Simuliidae has been suggested as 3% divergence [59]. Accordingly, S. (G.) gyorkosae, S. (G.) cheongi, S. (G.) floresense, S. (S.) eximium, S. (S.) iridescens and S. (S.) argyrocinctum are possible species complexes. All COI sequences of these nominal species, except S. (G.) cheongi, are reported here for the first time. Takaoka and Davies [15] first suspected that S. (S.) iridescens is a species complex because males from West Java differ from those at the type locality in East Java. Morphological differences have also been found between males of S. (G.) gyorkosae from Bali and Lombok [60]. The cytotaxonomy of S. (S.) eximium suggested that it includes two cryptic species [61]. For S. (G.) floresense and S. (S.) argyrocinctum, no morphological or cytogenetic studies indicate possible cryptic diversity. Intraspecific distances of these species, which exceed 3%, hint at possible cryptic diversity, but more study is required. On the other hand, the COI gene strongly suggests that S. (G.) cheongi from Indonesia and Malaysia represents two genetically distinct species, as evidenced by the high genetic divergence between the two lineages and their placements in the tree. The two clades are also recognised as separate taxa. The Malaysian lineage is more closely related to S. (G.) atratum based on their genetic distance and the sister relationship between the two species.

In addition to the species pairs with low levels of interspecific distances described earlier, the two species in the following species pairs group together in the tree and possess low minimum genetic distances between them: S. (G.) sumbaense and S. (G.) chumpornense (2.21%), S. (S.) nobile and S. (S.) timorense (1.11%), and S. (S.) baliense and S. (S.) argyrocinctum (2.21%). The low interspecific distances between S. (S.) nobile and S. (S.) timorense are comparable to those in previous studies [27, 29]. Simulium (Gomphostilbia) sumbaense is assigned to the S. chumpornense subgroup and has a similar arrangement of pupal gill filaments to S. (G.) chumpornense [52]. Similarly, S. (S.) baliense and S. (S.) argyrocinctum are structurally alike in their pupal gill arrangements [60]. Although the members of each of these three species pairs are structurally alike, they are nonetheless separable by other characters. Their low genetic distances suggest that the members of each pair are closely related.

Conclusions

COI-based DNA barcoding is a valuable means of identification of black flies in Indonesia, except for a limited number of taxa, especially nominal species known to be complexes. The separation of these problematic taxa requires other options, such as fast-evolving genes and cytogenetics. Several nominal species were unavailable for in-depth inspection because of limited sampling. For instance, only one sequence was included for the following species, limiting the study of their intraspecific variation: S. (G.) sunapii, S. (G.) rangatense, S. (G.) sumbaense, S. (N.) aureohirtum, S. (S.) fenestratum, S. (S.) nebulicola, S. (S.) baliense and S. (S.) tani. Therefore, more samples should be collected from Indonesia for in-depth studies. Furthermore, no morphological variation was observed in the species that showed high intraspecific divergences; further detailed morphological examinations are thus required to confirm the presence of cryptic diversity. Nevertheless, this research provides a basis for future comprehensive studies on black flies in Indonesia. The deposition of COI sequences into publicly accessible databases also enables the establishment of a novel sequence library for Indonesian black flies. Additionally, the nucleotide database is expected to serve as a reference for species identification and comparative studies of other species of Indonesian black flies that were not included in this study. Overall, our findings establish the groundwork for further utilization of COI barcoding as a rapid and precise method for exploring the diversity of Indonesian black flies.