Background

Broad-scale assessments of animal movement can provide ecological insight for target species (e.g., seasonal migrations, feeding areas, spawning sites) as well as critical information for stock assessments of managed species (e.g., core habitat use). The value of such information has led to the frequent application of acoustic telemetry to evaluate aquatic animal distribution at a regional scale using passive arrays of autonomous receivers. For example, regional distribution has been assessed for various fishes using acoustic telemetry to identify drivers of fish movement and spatial overlap among species [1,2,3]. From a management perspective, acoustic telemetry has provided metrics of occupancy, such as cross-jurisdictional movements of individuals and use of protected areas, to inform stock assessments of economically valuable species [4,5,6,7].

Use of acoustic telemetry has grown exponentially over the last couple of decades, and it has been adapted globally to conduct studies in diverse aquatic habitats (e.g., streams, lakes, and oceans) and among numerous taxa, particularly fishes [8, 9]. In the Laurentian Great Lakes basin alone, the number of reported telemetry projects increased roughly 2.5-fold, from 59 registered projects in fall 2017 to 150 ongoing registered projects in spring 2023 (http://data.glos.us/glatos), with substantial increases in receiver coverage throughout the lakes. Numerous collaborative networks such as the Ocean Tracking Network (OTN) [10] and Great Lakes Acoustic Telemetry Observation System (GLATOS) [11] have been developed to facilitate sharing of data and infrastructure, increasing the spatial extent of telemetry studies by incorporating detections across multiple arrays [11, 12]. Such increases in spatial coverage can enable regional evaluations of aquatic animal occupancy [13]; however, inconsistent detection probability and large gaps in receiver coverage within and among arrays throughout a waterbody may affect the performance of movement model estimates.

Placement of receivers and detection range are two of the most important factors affecting the results of an acoustic telemetry study. Structured receiver arrays that standardize the distance between receivers (i.e., grid arrays) can reduce detection biases and improve estimates of movement and distribution [14, 15]; however, even with a structured array the actual receiver coverage is generally irregular due to the unique characteristics of each receiver position and its surrounding environment. For example, environmental and anthropogenic factors such as boat noise, surface disturbance, ice movement, benthic substrate, and temperature stratification can influence detection range and can cause detectability to differ among receiver stations and within a single location, depending on time of year [16,17,18,19]. Due to the high variability in detection efficiency, consistent monitoring of detection range throughout a study is needed to account for spatial and temporal variance in detectability when interpreting results [20,21,22]. Still, detection efficiency and receiver coverage assessments are commonly absent in acoustic telemetry studies and rarely used to inform array design [20, 23]. In contrast to structured arrays, many studies deploy receivers at strategic locations (e.g., river outlets, reefs, etc.) and have large spatial gaps between receiver positions to maximize the extent of the array, with consequent need to extrapolate movement patterns between widespread detections. Metastudies that evaluate detections across multiple arrays within collaborative networks are also likely to evaluate animal movement throughout areas with inconsistent detection probability due to voids in receiver coverage between individual arrays as well as variability in the array designs associated with different study objectives. Gaps in receiver coverage often reduce the overall number of observations of tagged individuals and can limit data interpretation. 
Multiple analytical approaches have been developed to estimate occupancy based on the limitations of acoustic telemetry data, but these approaches have their own limitations and biases.

Studies investigating regional distribution, specifically, have used various techniques including basic models that incorporate presence data (i.e., raw detections) alone [1, 24, 25], centers of activity (COA) [2, 26], and Brownian bridge movement models [27,28,29]. Estimates of animal distributions based solely on the number of receiver detections can be highly vulnerable to biases associated with uneven receiver distribution and detection probability [15]. Interpolation models, such as COA, typically estimate positions at a set time interval by calculating a mean-position estimate based on the total number of detections at individual receivers within that timestep [26]. While these interpolation models can generate positions away from receiver locations, they are still limited to straight-line movements or shortest paths between detections that can have low accuracy when receiver coverage is sparse [30]. Stochastic models such as dynamic Brownian bridge movement models (dBBMM) use detection data to generate numerous conditional random walks that produce a utilization distribution [28]. In contrast to interpolation models, dBBMMs can extrapolate movement patterns and distribution between detection events based on the shape of the system and an animal’s behavior [31, 32]. Therefore, dBBMMs may provide more realistic estimates of animal movements between detections; however, increasing the distance between adjacent receivers also increases the uncertainty of these analyses as the number of possible paths an animal may travel between receivers increases [23]. Given the complexities of analyzing acoustic telemetry data, resources have been developed to guide researchers on how to identify the appropriate analyses [33, 34] and account for differences among the systems and organisms studied [35,36,37]. However, few studies have compared the biases of different analytical approaches for evaluating regional distribution.

In this study, we provide a framework to compare analytical methods for assessing regional occupancy across spatially distinct zones in systems with variable receiver coverage using simulations within a given receiver array and identify the method with the least bias for that array. We then provide a case study that applies the model with the least bias to a passive acoustic telemetry dataset collected from tagged lake trout (Salvelinus namaycush) in Lake Champlain. We evaluated a subset of available analytical methods that can be used to estimate spatial distributions of fishes from passive acoustic telemetry data to describe regional occupancy while also incorporating time to determine relative regional occupancy of Lake Champlain. Models were selected to cover a range of statistical complexity from minimalist models using raw detections alone to complex calculations that can estimate positions between spatially and temporally distant detections (i.e., Brownian bridge) [34]. However, the framework was developed to be flexible and able to incorporate any movement models for performance comparisons. The models included were (1) a basic model that estimated positions from detections alone; (2) an extension of the basic model using last-observation-carried-forward analysis, which filled in gaps between detections by assuming that an individual remained at a receiver location until it was detected on another receiver; (3) a COA model that provided a weighted average position at set timesteps based on the number of detections at receivers during that time period; (4) a novel approach that generated least-cost paths using linear and non-linear (for obstacle avoidance) interpolated positions between subsequent detections; and (5 and 6) two dBBMMs that used stochastic processes to generate probability distributions along a reconstructed movement path and generated a utilization distribution for each tagged individual, but handled the variance of Brownian motion differently.

Methods

Study system

The simulated and field components of this study were based in Lake Champlain, which lies between New York and Vermont, USA, and has northern waters reaching into Quebec, CAN. The lake is approximately 193 km long and 20 km wide at the widest point with a maximum depth of 122 m and average depth of 20 m. Three islands in the northern third of the lake are connected to each other and the mainland with causeways (i.e., raised roadways built using stone over shallow areas) that create distinct basins. While the causeways have narrow openings (< 55 m) to allow boat and fish passage, they result in near-complete hydrological separation among the adjacent basins. Overall, there are five main basins of the lake: Missisquoi Bay, Northeast Arm, Malletts Bay, the Main Lake, and the South Lake. The Main Lake is largely unobstructed and is subdivided into three geographic regions for management purposes: Main Lake North, Main Lake Central, and Main Lake South. Thus, for this paper, we recognize seven regions in Lake Champlain for determining large-scale distribution (Fig. 1).

Fig. 1 Regions of Lake Champlain with acoustic receiver deployment locations designated by black points

Acoustic receiver array

Thirty acoustic receivers (VR2W, 69 kHz; Vemco, Halifax NS) were deployed during the field study, although not all receivers were deployed for the same duration (Additional file 1). At the start of the study in fall 2013, 12 receivers were placed throughout the North, Central, and South regions of the Main Lake and at causeway openings. An additional 15 receivers were deployed during summer 2014 and three more were deployed in fall 2015 across the northern basins (Fig. 1). One receiver was placed at the northern extent of the South Lake, which is mostly eutrophic and riverine [38] and assumed to be suboptimal habitat for lake trout.

Simulated fish tracks

Simulated tracks were generated and used to create simulated detection histories for hypothetical fish swimming within Lake Champlain using functions in the “glatos” R package [39]. These tracks were then used to compare results from six movement models to known regional occupancy of the virtual data. One hundred tracks were simulated using the ‘crw_in_polygon’ function with each track containing 5000 steps and a fixed step length of 500 m. The tortuosity of each simulated track was determined based on turn angle parameters. Specifically, the change in heading (turn angle) between two subsequent steps was drawn from a normal distribution with mean value of 0 degrees and standard deviation of 25 (function default), 15, or 5 degrees. Turn angle varied among individual tracks to generate a gradient of behaviors from low dispersal (high turn angle) to high dispersal (low turn angle) consistent with previous simulated tracks [15]. Similarly, velocity for each simulation was randomly set as 0.1, 0.5, or 0.9 m/s to further incorporate variable behaviors among the simulations. There were five potential start locations for the simulations (Fig. 1). Each start location was in a different region and within the detection range of a receiver to ensure each simulation would have at least one detection within its corresponding start region to mark the start of its track. Transmissions were generated along the tracks using the function ‘transmit_along_path’ with a fixed transmission delay of 120 s. A detections dataset was generated for each track based on all 31 receiver locations included in the case study and using the ‘detect_transmissions’ function in the “glatos” R package. This function simulates transmitter detections based on an estimated receiver detection range curve calculated using a logistic equation that incorporated known detection efficiency results determined during the field component of the case study [40].
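The track settings above can be illustrated with a minimal Python sketch of a correlated random walk. This is not the 'crw_in_polygon' implementation from the "glatos" package: it ignores the lake polygon entirely, and the function names (`track_duration_days`, `correlated_random_walk`) are hypothetical. It only shows how turn angles drawn from a normal distribution produce a path, and how the fixed step count and step length translate into track duration at each velocity.

```python
import math
import random

STEP_LENGTH_M = 500.0  # fixed step length stated in the text
N_STEPS = 5000         # steps per simulated track stated in the text


def track_duration_days(velocity_m_s, n_steps=N_STEPS, step_length_m=STEP_LENGTH_M):
    """Days needed to swim the whole track at a constant velocity."""
    total_distance_m = n_steps * step_length_m
    return total_distance_m / velocity_m_s / 86400.0


def correlated_random_walk(n_steps, step_length_m, turn_sd_deg, seed=None):
    """Return a list of (x, y) positions in metres, starting at the origin.

    The change in heading at each step is drawn from Normal(0, turn_sd_deg).
    Assumption: no shoreline constraint is applied, unlike the real function.
    """
    rng = random.Random(seed)
    heading = rng.uniform(0.0, 2.0 * math.pi)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        heading += math.radians(rng.gauss(0.0, turn_sd_deg))
        x += step_length_m * math.cos(heading)
        y += step_length_m * math.sin(heading)
        path.append((x, y))
    return path
```

With 5000 steps of 500 m, a velocity of 0.1 m/s gives a track of roughly 289 days and 0.9 m/s gives roughly 32 days, so velocity alone sets the duration of each simulation.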

Movement models

Six movement models were selected to estimate regional occupancy of each simulated track using the generated detection dataset. The first model (Base) generated a residency index dependent solely on the presence data (i.e., raw detections) and therefore limited detected positions to receiver coordinates. The second model (LOCF) used a last-observation-carried-forward approach that determined the time between detections and assigned simulated individuals to the location last detected until they were observed at a different receiver, thus creating an extended residency index compared to the Base model. The total time spent within each region was estimated and used to calculate regional use. A COA model (COA) was used that generated short-term COAs [26] among receivers. Pseudo-positions were estimated at 60-min timesteps based on the number of detections at receivers during each timestep, creating a weighted average location, using the ‘COA’ function from the “V-Track” package in R [41]. The fourth model (Int) generated least-cost paths using linear and non-linear interpolated positions at a set time interval along the shortest path between successive detection locations for each simulation using the ‘interpolate_path’ function from the “glatos” R package [39]. For our dataset, positions were calculated every 60 min. When an individual was only detected by one receiver during a 60-min interval, the position for that individual was placed at the coordinates for that receiver. When there were detections at multiple receiver locations within a time interval, the ‘interpolate_path’ function retained each of those positions and assigned them to that 60-min interval. In these cases, we summarized positions as the first receiver to detect the individual within that time interval, which limited each individual to a maximum of 24 positions each day. 
We also considered using an average location when an individual was detected at multiple receivers, but we found this had a minimal influence on our results and conclusions and thus decided to simplify the analysis by using the first position. When no position data (i.e., detected transmissions) were available for a duration greater than the set interval length, positions were interpolated between the previous and subsequent detected locations and spaced evenly based on the number of timesteps that occurred between the detections. Interpolated positions were estimated using the ‘interpolate_path’ function, which incorporates a mixture of linear and non-linear least-cost paths. The type of path used (i.e., linear vs non-linear) to fill gaps between subsequent detections was determined based on a user-defined threshold (i.e., lnl_thresh argument) applied to a ratio of the linear and non-linear distances between the two detection locations [39]. We set the threshold to calculate linear interpolations for paths when there were no barriers (i.e., land) between positions and non-linear interpolation between positions obstructed by land (lnl_thresh = 0.999). Some interpolated positions (N = 400, 0.12% of all positions) occurred on small islands and along shorelines, but these were unlikely to cause a difference in regional occupancy estimates as the locations were entirely within regions. Non-linear shortest paths were fit along adjacent grid cells from an underlying transition layer and therefore were sensitive to the resolution of that layer. The last two models estimated regional occupancy based on utilization distributions generated from dBBMMs [31]. Briefly, these models estimate a utilization distribution for an individual by modeling its track based on known positions (i.e., receiver locations of detected transmissions) and the time gaps between the known positions. 
The models calculate the variance of the Brownian motion for segments of each track between detection locations to identify behavioral changes (i.e., turn angle, velocity, and/or step length) that are used to inform estimates of the utilization distribution [31]. The first dBBMM (Move) used the ‘brownian.bridge.dyn’ function from the “move” package in R [31, 42] to estimate occupancy probability for each track. The second dBBMM (RSP) was generated using the ‘dynBBMM’ function in the “RSP” package in R [27]. Both dBBMMs incorporated a raster layer of the study system and a location error equal to the average detection range of receivers (250 m) [40] to calculate the utilization distribution. The Move dBBMM removed the variance of the Brownian motion for some segments (≤ 68%) of simulated tracks (n = 30) with large gaps (> 168 h) between detections for the function to operate. The RSP dBBMM function, however, was able to calculate and include variance of the Brownian motion for the entire duration of each track. Utilization distributions calculated by both dBBMMs included cells on land with estimates greater than zero that were removed from analyses. All estimated positions (receiver coordinates and interpolated positions between receivers) generated by the movement models were grouped by the corresponding lake regions where they occurred for statistical analyses of regional occupancy. Model estimates for regional occupancy were standardized by calculating the percent occupancy of each region for individual simulations, generating regional residency indexes for each simulation with values from 0 (no occupancy) to 100 (complete occupancy). Similarly, the actual regional occupancy for each simulated track was calculated as the percentage of time in each corresponding region.
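As an illustration of how the two simplest models differ, the following Python sketch computes a regional residency index (0 to 100) from the same detections in two ways: by detection counts (the Base approach) and by time carried forward from each detection (the LOCF approach). The function names and the input layout, a list of (time in seconds, region) pairs, are assumptions made for this example, as is the explicit `end_time_s` argument that closes the final interval.

```python
from collections import defaultdict


def base_residency(detections):
    """Percent of detections recorded in each region.

    detections: list of (time_s, region) tuples.
    """
    counts = defaultdict(int)
    for _, region in detections:
        counts[region] += 1
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def locf_residency(detections, end_time_s):
    """Percent of time assigned to each region when every detection is
    carried forward until the next detection (or end_time_s for the last)."""
    ordered = sorted(detections)
    durations = defaultdict(float)
    following = ordered[1:] + [(end_time_s, None)]
    for (t0, region), (t1, _) in zip(ordered, following):
        durations[region] += t1 - t0
    total = sum(durations.values())
    return {region: 100.0 * d / total for region, d in durations.items()}
```

For a fish detected twice in Main Lake North and then once in Main Lake Central before a long silence, the Base index favors Main Lake North while the LOCF index assigns most of the time to Main Lake Central, which is the kind of divergence the simulations were designed to quantify.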

Performance of the residency indexes from the six movement models was assessed based on five criteria (Table 1). The first criterion for evaluating performance among models was overall error calculated as a weighted mean of regional occupancy error (MOE; Eq. 1):

$${\text{MOE}}_{\text{s}}=\frac{\sum\nolimits_{r=1}^{n}{\text{aROE}}_{\text{sr}}\times {\text{weight}}_{\text{sr}}}{\sum\nolimits_{r=1}^{n}{\text{weight}}_{\text{sr}}}.$$
(1)

Table 1 Criteria for evaluating performance among model estimates for regional occupancy

For this metric, the absolute regional occupancy error (aROE) was calculated as the absolute difference between actual and modeled occupancy of each region (r) for each simulated track (s), weighted by the true proportion of time spent in that region for a given track (weight_sr). Weighting was used to calculate the average error among regions while accounting for the proportion of time spent in each region. This criterion was used to evaluate accuracy averaged across all regions in the system for comparisons among models. The second criterion was the regional occupancy error (ROE; Eq. 2), which was calculated as the unweighted difference between model estimates (estimate_sr) and actual occupancy (actual_sr) for each simulated track by region. This criterion determined model accuracy for individual regions and determined if models overestimated or underestimated region occupancy. Similarly, the third criterion, aROE (Eq. 3), evaluated accuracy for individual region estimates based on the average error without accounting for the direction of error (i.e., over or underestimated).

$${\text{ROE}}_{\text{sr}}={\text{estimate}}_{\text{sr}}-{\text{actual}}_{\text{sr}},$$
(2)
$${\text{aROE}}_{\text{sr}}=\left|{\text{ROE}}_{\text{sr}}\right|.$$
(3)
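The first three criteria can be sketched in a few lines of Python for a single simulated track. The function names and the dictionary layout (region code mapped to percent occupancy) are assumptions for illustration; the arithmetic follows Eqs. 1 to 3, with the true percent of time in each region used as the weight.

```python
def roe(estimate, actual):
    """Regional occupancy error (Eq. 2): estimate minus actual, per region.

    Regions missing from the estimate are treated as zero occupancy.
    """
    return {region: estimate.get(region, 0.0) - actual[region] for region in actual}


def aroe(estimate, actual):
    """Absolute regional occupancy error (Eq. 3), per region."""
    return {region: abs(err) for region, err in roe(estimate, actual).items()}


def moe(estimate, actual):
    """Weighted mean occupancy error (Eq. 1) for one simulated track.

    Weights are the true occupancy of each region for that track.
    """
    errors = aroe(estimate, actual)
    weighted_sum = sum(errors[region] * actual[region] for region in actual)
    return weighted_sum / sum(actual.values())
```

For example, a track that truly spent 60%, 30%, and 10% of its time in three regions and was estimated at 50%, 45%, and 5% has ROE values of -10, 15, and -5 and an MOE of 11, because the error in the most-used region carries the most weight.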

The fourth criterion was the daily region assignment error, which was measured as the percentage of days in which the most occupied region was incorrectly assigned based on model estimates for each simulation. This criterion evaluated model error based on a coarser assessment of fish distribution that only focused on the most occupied region at the daily scale for each simulated track. Daily region occupancy was estimated based on either the number of positions in each region or duration spent in a region (LOCF model only). Each day was assigned the region that had the highest occupancy for individual simulated tracks. Daily region assignment error was calculated using the number of days that had an incorrect assignment for the most occupied region based on estimates from a given model (days_incorrect) and the total simulated track duration in days (days_total) (Eq. 4):

$${\text{Daily region assignment error}}_{\text{s}}= \frac{{\text{days}}_{\text{incorrect}}}{{\text{days}}_{\text{total}}}\times 100.$$
(4)
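A minimal Python sketch of Eq. 4 follows. The function names are hypothetical, and one detail is an explicit assumption: the text does not say how days without any model assignment were treated, so the sketch leaves them out of days_incorrect by default and exposes a flag to count them as incorrect instead.

```python
from collections import Counter


def most_occupied_region(regions_for_day):
    """Region with the most estimated positions on a given day."""
    return Counter(regions_for_day).most_common(1)[0][0]


def daily_region_assignment_error(true_daily, model_daily, count_missing_as_incorrect=False):
    """Percent of days with an incorrect most-occupied region (Eq. 4).

    true_daily, model_daily: dicts mapping day -> region code.
    Assumption: days absent from model_daily are ignored unless the flag is set.
    """
    days_total = len(true_daily)
    days_incorrect = 0
    for day, true_region in true_daily.items():
        if day not in model_daily:
            if count_missing_as_incorrect:
                days_incorrect += 1
        elif model_daily[day] != true_region:
            days_incorrect += 1
    return 100.0 * days_incorrect / days_total
```

The choice for missing days matters most for models that only produce assignments on days with detections, so any comparison should state which convention was used.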

The final evaluation criterion was within-region occurrence error, which determined the extent to which models overestimated or underestimated the total number of simulated tracks that were present at some point in each region. This criterion was calculated as the difference between the total number of distinct simulated tracks that occurred within a region based on model estimates (n_tracks(m)) and the actual number of distinct tracks that occurred in each region (n_tracks(a)) (Eq. 5).

$$\text{Within-region occurrence error}={n}_{\text{tracks}(\text{m})}-{n}_{\text{tracks}(\text{a})}.$$
(5)
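Eq. 5 amounts to counting, for each region, how many tracks a model places there at least once and subtracting the true count. The Python sketch below assumes a hypothetical input layout in which each track identifier maps to the set of regions it visited.

```python
def within_region_occurrence_error(model_regions_by_track, actual_regions_by_track, regions):
    """Eq. 5 per region: tracks present according to the model minus tracks truly present.

    Both inputs map a track id to the set of region codes visited by that track.
    """
    errors = {}
    for region in regions:
        n_model = sum(1 for visited in model_regions_by_track.values() if region in visited)
        n_actual = sum(1 for visited in actual_regions_by_track.values() if region in visited)
        errors[region] = n_model - n_actual
    return errors


def mean_absolute_occurrence_error(errors):
    """Average of the absolute per-region errors, used to rank models."""
    return sum(abs(err) for err in errors.values()) / len(errors)
```

Positive values indicate that a model placed more tracks in a region than actually entered it, and negative values indicate missed occurrences.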

Lake trout tracking

Lake trout included in the case study were captured and tagged in Lake Champlain as described by [43]. Briefly, 45 adult males and 48 adult females were collected during the fall spawning period in 2013 and 2014 from two locations, one in Main Lake North (N = 60) and one in Main Lake South (N = 33). Each fish was tagged with an acoustic transmitter (V13, 120 s nominal ping delay, 3 years battery life; Vemco, Halifax NS) that was surgically implanted into the body cavity [44]. After surgery, fish were held for either 20 min (N = 62) or 24 h (N = 30) to monitor their health prior to release at the capture location. The acoustic receivers were vertically suspended 2 m above bottom and offloaded twice per year. Receivers were removed from the array beginning in spring 2015 except for one receiver deployed during 2013 near Missisquoi Bay that was lost within the first year.

Detection data were time-corrected for clock drift using VUE software (Vemco, Halifax NS) and filtered to remove potentially erroneous detections [45] by excluding data from transmitters with no other detections from the same receiver within 3600 s (i.e., 1 h) (< 0.1% of detections). Detections during the fall season when fish were tagged were also excluded (minimum of 21 days removed) to avoid movement behaviors influenced by handling and acoustic tagging; this process eliminated eight fish from the dataset. Fish with regularly occurring detections at a single receiver for an extended period and until the end of the recorded time were presumed dead and were also removed from the dataset (N = 4). These data cleaning steps reduced the number of lake trout in the dataset to 81 individuals including 49 fish tagged at the northern site and 32 tagged at the southern site.
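The isolation filter described above can be sketched in Python as follows. The function name and the (transmitter, receiver, time in seconds) tuple layout are assumptions for illustration, and the sketch treats a gap of exactly 3600 s as acceptable, which the text does not specify.

```python
from collections import defaultdict


def filter_isolated_detections(detections, max_gap_s=3600):
    """Keep detections that have another detection of the same transmitter
    on the same receiver within max_gap_s seconds.

    detections: list of (transmitter, receiver, time_s) tuples.
    """
    times_by_pair = defaultdict(list)
    for transmitter, receiver, time_s in detections:
        times_by_pair[(transmitter, receiver)].append(time_s)

    kept = []
    for (transmitter, receiver), times in times_by_pair.items():
        times.sort()
        for i, t in enumerate(times):
            gap_prev = t - times[i - 1] if i > 0 else float("inf")
            gap_next = times[i + 1] - t if i < len(times) - 1 else float("inf")
            if min(gap_prev, gap_next) <= max_gap_s:
                kept.append((transmitter, receiver, t))
    # Return in chronological order for downstream processing.
    return sorted(kept, key=lambda d: (d[2], d[0], d[1]))
```

A lone detection of a transmitter at a receiver is dropped, whereas two detections a few minutes apart at the same receiver are both retained.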

Statistics

We assessed how multiple covariates affected the results of our simulations, based on the five criteria previously defined, using a combination of generalized linear mixed effects models (GLMMs) and non-parametric statistics. For the first three criteria (MOE, ROE, and aROE), we fit GLMMs using the “glmmTMB” package [46] in R. The fixed effects for each of these models were movement model, start region, tortuosity, and velocity and we included the simulated track identification numbers as a random effect. For ROE and aROE, the GLMMs also included region as a covariate and an interaction between movement model and region under the hypothesis that the models would perform differently from each other and some of these differences were dependent on regional detection probability. The GLMMs that assessed MOE and aROE used a Tweedie distribution to account for zero inflation resulting from a large number of regions with zero detections for individual simulated tracks; the GLMM that assessed ROE used a Gaussian distribution as this metric included positive and negative values. All possible GLMMs for assessing each criterion, based on the different combinations of the parameters selected, were generated using the ‘dredge’ function from the “MuMIn” R package [47] and the most parsimonious GLMM for each criterion was chosen as the model with the fewest parameters within two Akaike’s Information Criterion (AIC) units of the model with the lowest AIC value. Model parameters with significant effects were further evaluated by pairwise comparisons using the “emmeans” R package with a Tukey adjustment [48]. The fourth criterion (daily region assignment error) was compared among movement models using a Friedman’s test followed by pairwise Wilcoxon rank sum tests with a Bonferroni adjustment to identify significant differences in error among models. Non-parametric statistics were used for this criterion because the data and residuals were not normally distributed. 
Finally, we compared the fifth criterion (within-region occurrence error) among models by identifying the model with the lowest average error, which was calculated using the absolute within-region occurrence error for each region.
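The selection rule for the most parsimonious GLMM can be expressed as a short Python sketch. It does not reproduce the ‘dredge’ function; it only applies the stated rule to a list of already-fitted candidates. The function name, the tuple layout, and the tie-break on lower AIC among equally small models are assumptions.

```python
def most_parsimonious(candidates, delta=2.0):
    """Pick the candidate with the fewest parameters among those within
    `delta` AIC units of the lowest AIC.

    candidates: list of (name, n_params, aic) tuples.
    Assumption: ties in parameter count are broken by the lower AIC.
    """
    best_aic = min(aic for _, _, aic in candidates)
    close = [c for c in candidates if c[2] - best_aic <= delta]
    return min(close, key=lambda c: (c[1], c[2]))
```

Under this rule a simpler model can be preferred over the model with the lowest AIC, provided the difference in AIC is no more than two units.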

For lake trout field data, regional occupancy was determined daily and then grouped by season, defined by biological relevance (winter (coldest season): December 1–March 31, spring (transitional season): April 1–May 31, summer (stratified season): June 1–September 30, fall (spawning season): October 1–November 30), and year to determine seasonal patterns in regional distribution. Regional occupancy was calculated daily to account for individuals that were not detected for the full duration of a season by weighting the seasonal distribution of individuals by the number of days they were detected during that period. Individual fish were also grouped by tagging location (Main Lake North and Main Lake South) to compare distributions of the two spawning groups. Occupancy estimates from the optimal movement model were converted to proportions and compared among regions using a mixed multinomial logistic regression (Mlogit; ‘gam’ function from the “mgcv” R package [49]) with fish identification as a random effect and sex, region, tagging location, and an interaction between region and tagging location included as fixed effects. We fit all potential models and used AIC to select the most parsimonious model. Estimates for proportional regional occupancy were transformed to slightly compress the range of values and exclude zeros and ones [50]. We did not include additional parameters (i.e., season and year) because we had a limited sample size and we wanted to ensure we could make inferences based on the covariates of most interest. Estimated marginal means and standard error for proportional occupancy were calculated for each tagging location and region combination, and used to determine 95% confidence intervals that were compared for statistically significant differences. All modeling and statistics were conducted using R [51].
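The seasonal grouping used for the field data can be written as a small Python helper. The function name is hypothetical; the date boundaries are those given in the text.

```python
from datetime import date


def season_for(day):
    """Map a calendar date to the biologically defined season used in the text."""
    month = day.month
    if month in (12, 1, 2, 3):
        return "winter"  # coldest season: December 1 to March 31
    if month in (4, 5):
        return "spring"  # transitional season: April 1 to May 31
    if month in (6, 7, 8, 9):
        return "summer"  # stratified season: June 1 to September 30
    return "fall"        # spawning season: October 1 to November 30
```

Because winter spans the calendar-year boundary, any grouping by season and year also needs a convention for which year a given winter belongs to, which this helper leaves to the caller.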

Results

Simulated tracks

Track duration, regional occupancy, and detection percentage were variable due to differences in fish velocity and starting location. Duration of the simulated tracks ranged from 32 to 289 days with all but two tracks, which originated in the Northeast Arm, passing through at least two lake regions (Table 2). Track duration differed due to variation in velocity among the simulated tracks while step length and number were held constant. Overall, the true regional occupancy of the simulated tracks was greatest in Main Lake North and least in Missisquoi Bay (Fig. 2). Average velocity and tortuosity (i.e., standard deviation of turn angle), which were both randomly assigned to the simulated tracks, were consistent across individuals grouped by start regions. The percentage of transmissions detected was low for all individuals (< 4%) and was lowest for simulations that originated in the Northeast Arm (Table 2). This small detection rate is consistent with the low detection probability of each lake region, with an average receiver coverage of approximately 0.5% across all regions based on a detection radius of 250 m (Fig. 1, Additional file 2).

Table 2 Summary statistics of 100 simulated tracks originating from five regions of Lake Champlain. Metrics include track duration, range of unique regions visited, percent of simulated transmissions detected, standard deviation of the turn angle, and velocity. All metrics except for the number of unique regions visited are presented as averages with one standard deviation. Note that limited values were possible for standard deviation of the turn angle (5, 15, or 25 degrees) and velocity (0.1, 0.5, and 0.9 m/s)
Fig. 2 True regional occupancy of 100 simulated fish tracks among seven regions of Lake Champlain (MIB: Missisquoi Bay; NEA: Northeast Arm; MAB: Malletts Bay; MLN: Main Lake North; MLC: Main Lake Central; MLS: Main Lake South; SOL: South Lake) and regional estimates based on six distinct models. Red triangles in the six model plots represent the true mean for each corresponding regional occupancy. All boxplots include minimum values, 1st quartile, median, 3rd quartile, and maximum values excluding outliers

Model comparison

MOE varied significantly among the six occupancy models with the Int model providing the most accurate estimate of occupancy within each region and significantly lower error than all other models (p < 0.001; Tukey). The Move dBBMM was the second-most accurate model with significantly lower error than the remaining four models (p < 0.003; Tukey). There were no differences in MOE among the Base, COA, and RSP models (p > 0.995); however, the LOCF model had significantly more error than both the Base (p = 0.041; Tukey) and RSP models (p = 0.016; Tukey; Fig. 3). The Int model had the lowest maximum MOE among all simulations (40.4%) while the LOCF model had the greatest error for an individual simulation (62.9%) and greatest average error (12.5%; Fig. 3). Simulation velocity and tortuosity had no significant effect on MOE and were excluded from subsequent analyses (GLMM, p > 0.17).

Fig. 3 Mean weighted occupancy error for six different models estimating regional occupancy of 100 simulated tracks. Significant differences in error are denoted by differing letters

Average ROE and aROE both varied considerably among lake regions and movement models, and there were significant interactions between these two variables (p < 0.05; Tukey; Additional files 3 and 4). All other parameters that were assessed (start region, tortuosity, and velocity) had no effect on ROE (p > 0.75; GLMM) nor aROE (p > 0.12). Significant differences in ROE between models by region were associated with Base and COA models overestimating occupancy of Malletts Bay while the LOCF model and RSP dBBMM underestimated occupancy of this region (p < 0.02; Tukey). In addition, the RSP dBBMM overestimated occupancy of Main Lake North and had a significantly different ROE than the Base and COA models which largely underestimated use of the region (p < 0.01; Tukey). The LOCF model largely overestimated occupancy of Main Lake Central with significantly greater ROE for the region than the Base, Int, and Move models (p < 0.04; Tukey; Additional file 3). Overall, the Int model tended to have the lowest average ROE across all regions (Fig. 4). In contrast, the Base model had substantial error for all regions and the COA and RSP models had substantial error for all regions except Main Lake South (Fig. 4). Similar to ROE, aROE was most frequently lowest when calculated by the Int model (Table 3). This included the most accurate estimates for five of the seven regions with significantly lower aROE based on estimated marginal means for almost all pairwise comparisons (p ≤ 0.048; Tukey; Additional file 4). The aROEs for the remaining two regions (Missisquoi Bay and Northeast Arm) were the lowest when calculated by the RSP dBBMM (Table 3). Regionally, all occupancy models tended to underestimate occupancy (i.e., negative ROE) for Missisquoi Bay, which had no receivers present; the Base, LOCF, COA, and Int models were not capable of producing occupancy estimates for this region and therefore use of this region was set to zero for these four models (Fig. 2). 
While all models tended to underestimate occupancy of Missisquoi Bay, the average ROE for the adjacent Northeast Arm was consistently overestimated among models. The only other region that showed a consistent pattern among models was the South Lake, which was underestimated by all models (Fig. 4). For regional comparisons of aROE, error tended to be lowest across models for the South Lake and Main Lake South, and highest for the Northeast Arm and Main Lake North (Table 3).
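The distinction between ROE and aROE reported above can be illustrated with a minimal sketch. It assumes that ROE is the estimated occupancy percentage minus the true percentage for a region (consistent with positive values indicating overestimation) and that aROE is the mean of the absolute per-track errors; the function names and the example values are hypothetical and are not taken from the study.

```python
from statistics import mean

# Region codes follow the abbreviations used in the Fig. 6 caption.
REGIONS = ["MIB", "NEA", "MAB", "MLN", "MLC", "MLS", "SOL"]

def regional_occupancy_error(true_occ, est_occ):
    """Signed error per region (estimated minus true), in percentage points.

    Assumption: a region missing from either dict has 0% occupancy.
    """
    return {r: est_occ.get(r, 0.0) - true_occ.get(r, 0.0) for r in REGIONS}

def average_errors(tracks):
    """tracks: list of (true_occ, est_occ) dict pairs, one pair per track.

    Returns (mean signed ROE, mean absolute aROE), each keyed by region.
    """
    per_track = [regional_occupancy_error(t, e) for t, e in tracks]
    roe = {r: mean(err[r] for err in per_track) for r in REGIONS}
    aroe = {r: mean(abs(err[r]) for err in per_track) for r in REGIONS}
    return roe, aroe

# Hypothetical values for two tracks; the model cannot place positions in MIB.
tracks = [
    ({"MLN": 60.0, "MLC": 30.0, "MIB": 10.0}, {"MLN": 70.0, "MLC": 30.0}),
    ({"MLN": 50.0, "MLC": 50.0}, {"MLN": 40.0, "MLC": 60.0}),
]
roe, aroe = average_errors(tracks)
print(roe["MLN"], aroe["MLN"])  # signed errors cancel, absolute errors do not
```

In this toy case the signed errors for MLN cancel across tracks while the absolute error does not, which is one reason the two metrics can rank models differently.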

Fig. 4

Average regional occupancy error (ROE) for six models estimating occupancy from 100 simulated tracks. Values greater than zero represent overestimated occupancy and values less than zero were underestimates

Table 3 Average absolute regional occupancy error (aROE) for six models estimating occupancy of 100 simulated tracks separated by seven regions of Lake Champlain

Daily region assignment error varied substantially among models with large differences in the total number of days for which each model had estimated positions. Both dBBMMs provided occupancy estimates without a time component and thus were not appropriate or able to provide daily region assignments. The Base and COA models only provided region assignments for days when transmissions were detected, which limited daily assignments to an average of 55% of the simulation dates. Both the LOCF and Int models estimated positions for dates with no detected transmissions; however, the LOCF model provided region assignments until a final defined date (100% of dates) while the Int model only provided daily assignments until the last date a simulation was detected, which included an average of 98% of simulation dates. The Int model was still able to provide the lowest daily region assignment error (9.7 ± 9.8%), which was significantly lower than the error for the other three models (p < 0.001; pairwise Wilcoxon rank sum). The LOCF model assignments had the second-least error (18.5 ± 10.2%) with significantly lower error than the Base (43.7 ± 23.2%) and COA (43.0 ± 23.7%) models (p < 0.001; pairwise Wilcoxon rank sum), which were not different from each other (p = 1.000; pairwise Wilcoxon rank sum; Fig. 5).
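A daily region assignment comparison of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: it assumes the error is the percentage of assigned days whose most frequently occupied region differs from the truth, and it reports coverage (the share of simulated days that received an assignment) separately. All names and example data are hypothetical.

```python
from collections import Counter

def daily_region(positions):
    """positions: iterable of (day, region) -> {day: most frequent region}."""
    by_day = {}
    for day, region in positions:
        by_day.setdefault(day, Counter())[region] += 1
    return {d: c.most_common(1)[0][0] for d, c in by_day.items()}

def assignment_error(true_positions, est_positions):
    """Return (% of assigned days that are wrong, % of true days assigned).

    Assumption: days without a model estimate are excluded from the error
    and only lower the coverage value.
    """
    truth = daily_region(true_positions)
    est = daily_region(est_positions)
    assigned = [d for d in truth if d in est]
    if not assigned:
        return float("nan"), 0.0
    wrong = sum(est[d] != truth[d] for d in assigned)
    return 100.0 * wrong / len(assigned), 100.0 * len(assigned) / len(truth)

# Hypothetical four-day track with several true positions per day.
true_pos = [(1, "MLN"), (1, "MLN"), (2, "MLN"), (3, "MLC"), (3, "MLC"),
            (3, "MLN"), (4, "MLC")]
detection_only = [(1, "MLN"), (3, "MLN")]               # days with detections
gap_filled = [(1, "MLN"), (2, "MLN"), (3, "MLN"), (4, "MLC")]

print(assignment_error(true_pos, detection_only))
print(assignment_error(true_pos, gap_filled))
```

Reporting coverage alongside error makes clear that a model restricted to detection days is evaluated on fewer days than a model that fills gaps.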

Fig. 5

Percentage of days for which the most frequently occupied region was incorrectly assigned by four models applied to 100 simulated tracks. Significant differences in the percentage of incorrect assignments among models are denoted by differing letters

Within-region occurrence error, based on the number of distinct simulated tracks that occurred in each region at any point throughout their track, was highly variable among models and regions. Nearly all models underrepresented the number of distinct simulated tracks that occurred in Missisquoi Bay, and all models had large overrepresentations (≥ 23%) for the number of distinct tracks present in the Northeast Arm and Malletts Bay. In contrast, within-region occurrence error for the Main Lake regions and South Lake was substantially lower (≤ 11%) and the number of distinct tracks in these regions was typically underrepresented. Overall, average within-region occurrence error across regions (calculated using absolute values of the error) was similar among movement models but was lowest for the COA model and greatest for the Move dBBMM (Table 4).
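The within-region occurrence error can be sketched as a count comparison. The example below assumes the error is the estimated number of distinct tracks in a region minus the true number, expressed as a percentage of all tracks, with the overall average taken over absolute values; the normalisation, names, and data are assumptions for illustration only.

```python
def occurrence_error(true_tracks, est_tracks, regions):
    """Compare how many distinct tracks ever occurred in each region.

    true_tracks, est_tracks: {track_id: iterable of regions visited}.
    Returns ({region: signed error in % of all tracks}, mean absolute error).
    """
    n_tracks = len(true_tracks)
    errors = {}
    for region in regions:
        true_n = sum(region in set(v) for v in true_tracks.values())
        est_n = sum(region in set(v) for v in est_tracks.values())
        errors[region] = 100.0 * (est_n - true_n) / n_tracks
    mean_abs = sum(abs(e) for e in errors.values()) / len(errors)
    return errors, mean_abs

# Hypothetical four-track example; the model never places a track in MIB.
regions = ["MIB", "NEA", "MLN", "MLC"]
true_tracks = {"A": ["MLN", "MLC"], "B": ["MLN"],
               "C": ["MLN", "MIB"], "D": ["MLC"]}
est_tracks = {"A": ["MLN", "MLC", "NEA"], "B": ["MLN"],
              "C": ["MLN", "NEA"], "D": ["MLC", "MLN"]}

errors, mean_abs = occurrence_error(true_tracks, est_tracks, regions)
print(errors, mean_abs)
```

Negative values indicate regions where fewer distinct tracks were estimated than truly occurred, as for an unmonitored region.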

Table 4 Within-region occurrence error calculated as the difference between the number of distinct simulated tracks that occurred in seven lake regions based on model estimates and the true number of simulations that were present

Lake trout distribution

Regional occupancy of lake trout was assessed using the Int model given the low error from most of our model comparisons. Lake trout occupancy was consistently highest in the Main Lake regions although multiple fish had estimated positions in Malletts Bay (54% of tagged individuals) and the Northeast Arm (10% of tagged individuals). Only one fish occurred in the South Lake and no fish had estimated positions in Missisquoi Bay. While the Northeast Arm and Malletts Bay typically had low average occupancy estimates for lake trout from both tagging sites, some individual fish had estimated occupancy up to 100% for a given season in these two regions, especially during winter, leading to high variance for the model estimates (Fig. 6). Int model estimates for occupancy of Missisquoi Bay and the South Lake were consistently negligible with no estimates greater than zero for Missisquoi Bay and only one estimate (2%) for the South Lake during a single winter (Fig. 6). Given that no fish had estimated positions within Missisquoi Bay, this region was dropped from Mlogit analyses. Lake trout occupancy varied among the remaining six regions and regional distribution differed significantly based on tagging location (p < 0.001; Mlogit; Fig. 6) but not sex (p = 0.946; Mlogit). Lake trout from both tagging locations had significantly greater predicted occupancy of the Main Lake regions than all other lake regions (Table 5). Occupancy for Malletts Bay and the Northeast Arm was low and comparable for fish tagged at the north site, but Malletts Bay had a significantly greater predicted use than the Northeast Arm for fish tagged at the south site (Table 5). The region with greatest estimated occupancy was dependent on tagging location with significantly higher predictions for the region where fish were tagged (i.e., Main Lake North or Main Lake South) compared to all other regions (Table 5). 
Lake trout tagged in the Main Lake North also had significantly greater predicted occupancy for that region compared to lake trout tagged in the Main Lake South, and individuals tagged in the Main Lake South had significantly greater predicted occupancy for that region as well as greater predicted use for the Main Lake Central and Malletts Bay regions (Table 5). These differences in occupancy between lake trout tagged in the north and south were most pronounced during fall, although the trend was maintained for all other seasons based on Int estimates (Fig. 6). However, the trend was not consistent across all individuals, with five of the north-tagged fish (10%) and three of the south-tagged fish (9%) having no estimated occupancy of their respective tagging region throughout their entire detection period.

Fig. 6

Seasonal percentage of time that lake trout spent in seven regions of Lake Champlain (MIB: Missisquoi Bay; NEA: Northeast Arm; MAB: Malletts Bay; MLN: Main Lake North; MLC: Main Lake Central; MLS: Main Lake South; SOL: South Lake) based on interpolated positions. Lake trout are separated by the location where they were captured and subsequently tagged with acoustic transmitters. Error bars represent the 5th (lower error bar) and 95th (upper error bar) percentiles

Table 5 Estimated lake trout occupancy for seven regions of Lake Champlain separated by tagging location (North and South) and calculated using least-cost interpolated positions (Int)

Discussion

Our comparison among regional occupancy estimation models using the framework we developed demonstrates the importance of model selection to reduce inaccurate interpretations of animal movement. All models were able to capture a similar trend in overall regional occupancy of the simulated tracks (Fig. 2) and therefore may be sufficient to assess general distribution patterns (e.g., most used region); however, differences in model performance likely have a more substantial influence on animal movement interpretations when conducting statistical analyses that evaluate significant correlates of movement patterns. For example, methods that provide more consistent occupancy estimates with less artificial variance caused by biases associated with inconsistent detection probability will have more statistical power to identify significant correlates; in this respect they offer benefits similar to those of standardized receiver arrays (e.g., grids) [15]. Overall, the Int model consistently performed best at estimating occupancy of simulated fish tracks for our receiver array in Lake Champlain based on nearly all criteria. Previous work has demonstrated the value of using shortest paths with interpolated positions to improve occupancy estimates and provide more realistic descriptions of animal movement compared to detection data alone [27]. Another important consideration for practical model selection is computational complexity. The movement models we used varied in their coding complexity, operating time, and processing demand, all of which can be limiting factors when selecting an appropriate model [52]. For example, the Int, Move, and RSP models each include spatial analyses that can be computationally costly and may exceed the capabilities of typical personal computers.

Our model comparison also identified various limitations for each of the movement models we assessed with respect to the limited receiver coverage of our array. All model estimates were impaired by low receiver coverage and large time gaps between detected transmissions. For example, only the two dBBMMs were able to estimate positions in the region with no receivers, Missisquoi Bay, and they still tended to underestimate occupancy of that area. However, this limitation was likely exacerbated in our study by the minimal connectivity between Missisquoi Bay and its adjacent lake region; models capable of estimating animal positions between detection locations (i.e., COA, Int, Move, and RSP models) would likely perform better than we observed for regions without receivers that are located between adjacent regions. The absence of receivers in Missisquoi Bay also caused all models to overestimate occupancy for the adjacent regions, Northeast Arm and Main Lake North, by either having a higher relative proportional use due to fewer total estimated positions (Base and COA models) or inaccurately assigning interpolated positions to the next likely regions based on where they were most frequently observed (LOCF, Int, Move, and RSP). The Int model, however, likely reduced error associated with gaps between detections by restricting interpolated positions during these periods to the proximity of the receivers that most recently detected the individual and separating the positions evenly between receivers based on the spatial and temporal extent of the gaps. The two dBBMMs and the LOCF model were also capable of estimating occupancy that accounted for gaps between detections; however, they were likely less capable of estimating accurate positions during extended periods without detections. 
Previous studies have similarly found that dBBMMs are hindered by large gaps in detections because occupancy estimates become overly broad [23, 53] and these types of models are typically designed for estimating occupancy for periods with frequent known positions [27, 28]. However, dBBMMs can be formatted to exclude track segments with large gaps between detections when calculating the variance of the Brownian motion to provide more refined estimates of occupancy. The removal of motion variance during large gaps between detections and the resulting utilization distribution calculated by the Move dBBMM in this study appear to have reduced overall regional occupancy error compared to the RSP dBBMM; however, excluding these periods also likely reduced estimates for the region without receivers. The LOCF model frequently provided inaccurate spatial estimates during large gaps by assigning individuals to their last known location until observed again, which creates a bias towards regions that have high detection probability or regions that are in close proximity to other regions with low detection probability. Both the Base and COA models were limited to estimating positions only when transmissions were detected.
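The contrast between carrying the last observation forward and spacing positions evenly across a detection gap can be illustrated with a simplified sketch. It is not the Int model itself: straight-line interpolation along a hypothetical one-dimensional lake axis stands in for the least-cost paths described in the text, and the region boundaries, function names, and detections are all assumed for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical 1-D lake axis (km from the south end) split into three regions.
BREAKS = [40.0, 80.0]
NAMES = ["South", "Central", "North"]

def region_of(x):
    """Map a position on the axis to its region name."""
    return NAMES[bisect_right(BREAKS, x)]

def locf(detections, days):
    """Carry the last detected position forward to every later day.

    days must be in ascending order.
    """
    dets = sorted(detections)
    out, last, i = {}, None, 0
    for d in days:
        while i < len(dets) and dets[i][0] <= d:
            last = dets[i][1]
            i += 1
        if last is not None:
            out[d] = last
    return out

def interpolate(detections, days):
    """Space daily positions evenly in time between consecutive detections."""
    dets = sorted(detections)
    out = {}
    for (d0, x0), (d1, x1) in zip(dets, dets[1:]):
        if d1 == d0:
            continue  # same-day detections leave no gap to fill
        for d in days:
            if d0 <= d <= d1:
                out[d] = x0 + (x1 - x0) * (d - d0) / (d1 - d0)
    return out

# One detection in the south on day 0 and one in the north on day 10.
detections = [(0, 15.0), (10, 105.0)]
days = range(13)
locf_counts = Counter(region_of(x) for x in locf(detections, days).values())
int_counts = Counter(region_of(x) for x in interpolate(detections, days).values())
print(locf_counts, int_counts)
```

In this toy example the carried-forward positions never fall in the central region and extend to the final date, whereas the interpolated positions pass through the central region and stop at the last detection, mirroring the qualitative differences described above.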

All model occupancy estimates also had error associated with limitations of the methods we used to generate the simulated detections and with the movement analyses we selected. For example, the methods for generating artificial detection data from the simulated tracks were not able to incorporate the complex influence of causeways on the detection range of adjacent receivers. Our receiver array included stations intentionally positioned near openings of causeways to have a high probability of capturing fish movement between regions [43]. However, the causeways can dramatically reduce detectability of transmissions that occur within the typical detection range of nearby receivers by creating near-complete barriers between tagged fish and receivers positioned on the opposite side of the causeways (although transmissions sent while a tagged fish is perfectly in line with a causeway opening and receiver on the opposite side of the causeway may still be detected). Therefore, tracks were frequently detected across the causeways in our simulations, increasing the number of detections for tracks located in the adjacent region. This error can be seen by the large total occurrence error across all models for the Malletts Bay and Northeast Arm regions (Table 4), which are both partially bordered by causeways. This issue demonstrates the value for researchers of having a general understanding of the features of their study system that can impact its soundscape to identify potentially erroneous results. The models included in our framework were also limited by an inability to incorporate environmental parameters that can provide more accurate occupancy estimates even during periods with infrequent or missing detection data [54, 55]. For example, depth-contour algorithms have been combined with acoustic telemetry data to increase accuracy of fine-scale habitat use and movement patterns [56, 57]. 
Regional assessments of occupancy are likely less influenced by accuracy of fine-scale habitat selection; however, models that incorporate biases for specific locations or environmental factors may improve occupancy estimates, especially when there are large gaps in detections.

Despite the detection biases associated with irregular acoustic receiver coverage and particular geomorphic features of Lake Champlain (e.g., causeways and semi-isolated basins), our case study demonstrates that appropriate movement models can identify complex regional movement patterns derived from occupancy estimates. The significant interactions between release location and regional occupancy suggest that the tagged fish had a general pattern of regional fidelity that was likely associated with spawning locations, and that this behavior was largely maintained throughout the year. Similar patterns were not observed with the simulated tracks, as start region had no effect on regional use. Because regional fidelity was apparent for the lake trout data despite removing detections that occurred during the season fish were tagged, this trend likely represents true behaviors of the tagged animals. Biologically, these results indicate a common spatial segregation among lake trout groups within the lake, even though individuals have the capacity to disperse fully throughout the system. Additionally, our selected model (Int) was able to identify rare behaviors such as prolonged use (> 1 year) of regions that are typically considered unsuitable for the species during extended periods of the year (i.e., summer stratification). Such results can have important applications for identifying habitats that may help support populations or function as possible ecological traps. An obvious limitation of our selected model was the inability to estimate occupancy for the region without receivers, which prevented any conclusions about the occupancy of lake trout for that area. Based on previous research describing the habitat preferences of our study species and the limited suitable habitat available in the un-monitored region [58], we expect that lake trout rarely use the area and, therefore, this limitation likely had a minimal influence on our model error. 
However, such expectations require a general understanding of the habitat preferences for a given study species in addition to the environmental characteristics of the study system, which might not always be available.

Conclusions

The framework we have developed here is intended to help researchers select an appropriate movement model to best address regional occupancy of their study species based on the unique characteristics of their study system and acoustic receiver array. Such analyses provide broad-scale assessments of distribution that can be used to evaluate large movements among regions throughout waterbodies, with the capacity to incorporate detection data from multiple arrays. It is important to note that such analyses are likely too coarse for assessments of habitat use; however, this information provides unique perspectives of occupancy that can inform management and conservation efforts over large areas (e.g., fishery management units and marine protected areas) and advance our basic understanding of movement behaviors. Therefore, regional assessments of aquatic animal distribution will likely continue to be used and become more common as acoustic telemetry applications expand further and collaborative networks continue to enable data and resource sharing.