Background

Aquatic telemetry is a powerful tool for studying the behavior and demographics of fish populations [1,2,3]. The value of this technology stems partly from its ability to track uniquely identifiable fish through time and space without requiring either investigator proximity or repeated handling of the study subject after tagging and release [4]. However, spatiotemporal separation raises the possibility that not all tags are tracking a living subject in the manner intended. Tags may be detected after expulsion or displacement, or the tag may be temporarily transferred to a predator after consumption, an event we refer to as “tag predation” [5,6,7,8,9]. The latter possibility results in the predator’s behavior being recorded instead of that of the study subject. Both possibilities may produce an inaccurate depiction of the behavior and survival of the study subject and may result in biased inferences if false detections are not accounted for in data analysis [10].

Tag predation has been identified as a potential source of bias in some studies of survival and movement of migratory fishes, and a variety of procedures have been used to identify and remove this type of false detection data; see Klinard and Matley [10] for a review of methods used to identify tag predation and other forms of mortality in telemetry data sets. The most direct predator diagnoses arise from physical recovery and examination of the tagged individual. In most studies, tag recovery is incidental, so indirect diagnosis methods are required. Procedures used to flag purported predator detections have sometimes been referred to as “predator filters” [11]. The bulk of existing predator filters may be classified as either behavior-based or signal-based. Behavior-based filters depend on the assumption of behavioral differences between the focal species and likely predator species [e.g., 11,12,13]. Signal-based filters identify predation in the tagging data using changes to the signal generated by the tag; for example, recently developed acoustic tags switch to the “predator signal” when a specially designed coating breaks down in the gastrointestinal tract of a piscivore [14]. This paper addresses behavior-based filters.

Our definition of predator filter is intentionally broad, ranging from simple decision rules to complex algorithms or statistical approaches. We distinguish between a predator filter and subjective data manipulation by requiring predator filters to be formalized, documented, and applied systematically. In all cases, the effectiveness of a given filter is predicated on the assumption that a distinct behavioral contrast exists between the study subject and its predators and that behavioral differences are detectable on the spatiotemporal scale of the data [10].

The ability to identify predator detections in a telemetry investigation varies based on the quantity and quality of data available. Typically, the available data are sequences of tag detection times at various fixed-site or mobile acoustic receivers. Movement metrics derived from each sequence, such as migration rate and direction, form the basis of a behavior-based predator filter. Tag detection data may be augmented by environmental data and detections from past or concurrent predator-tagging studies [15, 16]. In most cases, some knowledge of the behavior and swimming capabilities of the focal species is required, possibly combined with an understanding of predator behavior and gut evacuation rates [17,18,19].

The literature addressing the tag predation problem is limited. The types of predator filters previously used include fine-scale assessment of the acoustic signal pattern in the raw telemetry data file [20], 2-D or 3-D tracking of the tag’s position in space [21], simple or multifaceted decision rules based on expert scientific opinion [11, 22], outputs from multivariate clustering procedures [16], random forest classifications [13], multivariate mixture models [21], or some combination thereof [23]. These filtering approaches vary in their spatiotemporal scale (e.g., 2-D tracks versus detection events), their degree of complexity (e.g., the number and type of metrics used), and their use of auxiliary information such as environmental data or tag recoveries. A key difference among filters lies in whether records are deemed suspicious based on exceedance of explicit biological thresholds (rule-based) as opposed to a statistics-based measure of dissimilarity. Rule-based filters use expert judgement to define criteria necessary to be considered representative of the focal species. A statistics-based filter may generate a set of data-driven rules or may estimate a state (focal species versus predator) probability or assignment based on patterns in the data. Among statistics-based filters, pattern-recognition filters use numerical methods to identify typical behavioral patterns for a population and then flag tags showing aberrant behavior.

The choice of predator filter has the potential to affect study outcome and understanding of the system under investigation. However, diagnosing predated tags can be both subjective and time consuming, and little guidance has been provided on when a predator filter is advised, which predation identification procedure to use, and the degree to which study inference may be affected by the predator filter. A comparison among filter types is warranted because of the variable effort required by the different approaches and the potential impact on study findings.

In this paper, we discuss considerations in designing and implementing a predator filter and demonstrate several predator filters using a case study of subyearling Chinook salmon Oncorhynchus tshawytscha in the San Joaquin River of California, United States. Our context is the common case of studying survival of juvenile migratory fishes in a regulated river using implanted microacoustic tags [24,25,26]. In such studies, tagged fish released at one end of the study area migrate past a series of interrogation arrays and their migration survival is characterized using a Cormack–Jolly–Seber (CJS) survival model [27, 28]. Here, we assume tagged fish are migrating downstream, although the factors we consider are applicable to other types of telemetry studies. We outline issues in identifying tag predation, limitations to diagnosing predator detections, and potential impacts on survival estimation. We demonstrate two variations on both rule-based and statistics-based predator filters: a simple rule-based filter, a complex rule-based filter, and pattern-recognition filters with and without the inclusion of auxiliary movement data from intentionally tagged predators. We present novel approaches for assigning detection-level predation events as part of a pattern-recognition filter. We compare the behavior of alternative filtering approaches based on the number and pattern of purported predator detections among the smolt tags and the known predator tags, and examine the consequences of each filter using local- and region-scale survival estimates. Finally, we provide guidance on selection of predation identification methods.

Methods

Study area

The Sacramento–San Joaquin River delta (“Delta”) of California, United States, is a tidally influenced dendritic inland estuary that connects the productive agricultural region of the Central Valley of California to the San Francisco Bay and the Pacific Ocean (Fig. 1). Four runs of Chinook salmon and anadromous rainbow trout Oncorhynchus mykiss (steelhead) migrate through the Delta and all have suffered severe declines in the last century [29]. Two runs of salmon and Central Valley steelhead are listed as endangered or threatened under the U.S. Endangered Species Act (1973), and the other two runs are listed as Species of Special Concern by the state of California [30]. The Delta also hosts populations of non-native piscivorous fish that feed on native fishes including juvenile salmonids; species suspected to be important predators of juvenile salmonids in the Delta include striped bass Morone saxatilis, largemouth bass Micropterus salmoides, smallmouth bass Micropterus dolomieu, and channel catfish Ictalurus punctatus [31,32,33,34,35]. Additionally, the Delta provides municipal and agricultural water for millions of California citizens; the challenge of managing water resources as well as protected fishes has focused research attention on the magnitude and sources of mortality of juvenile salmonids in the Delta.

Fig. 1

Map of the San Joaquin River and Sacramento–San Joaquin Delta with release site, acoustic telemetry stations (single and dual arrays), and gaging/environmental monitoring stations used in the 2016 acoustic telemetry study

Researchers have used acoustic telemetry and a dense array of fixed-site telemetry receivers to study salmonid survival in the Delta for more than a decade, and the possibility of tag predation within these studies has been recognized as a potential complication [20]. The combination of both resident and transient predators, complex hydrodynamic and migratory pathways, and closely spaced telemetry stations increases the likelihood of observing tags after a predation event. Tidal influence in the Delta further complicates the problem, as incoming tides can reverse the flow of the river for extended periods twice a day. These factors make the salmonid telemetry studies in the Delta ideal candidates for predator filters and an appropriate testing ground for novel analytical approaches. We focused our case study on a subset of the data collected in a multiyear salmon study in the southern Delta.

In 2016, an acoustic tagging study of juvenile fall-run Chinook salmon was implemented to estimate survival of these subyearling fish as they emigrated from the San Joaquin basin through the southern Delta (see Additional file 1). In April of 2016, 648 hatchery-reared fall-run Chinook salmon were surgically implanted with Vemco V4-180 kHz microacoustic transmitters (“tags”) and released into the San Joaquin River at Durham Ferry, located approximately 20 river km (rkm) upstream of the Delta entrance at Mossdale Bridge (“Mossdale”) and 195 rkm upstream of the Golden Gate Bridge (Fig. 1). Surgery, fish handling methods, and fish release methods followed the standard operating procedure outlined in Liedtke et al. [36]. The tags were monitored on fixed-site acoustic receivers located throughout the lower river and Delta (Fig. 1, Table 1). An in-tank tag-life study was conducted in April and May 2016 using 45 randomly selected tags. Additionally, telemetry detection data were available from 37 predator specimens (24 largemouth bass, 5 striped bass, 4 channel catfish, and 4 white catfish Ameiurus catus). These known tagged predators had been part of a group of 300 predatory fish tagged with Vemco V9 long-lived acoustic tags in prior telemetry studies in 2014 and 2015 [37, 38].

Table 1 Telemetry stations and regions of lower San Joaquin River and Delta used in predator filter in 2016 Chinook salmon case study

The southern Delta is a complex habitat of interconnected river channels that includes multiple migration routes (Fig. 1). Survival in the San Joaquin River from the region near Mossdale to the Turner Cut Junction (37 rkm from Mossdale) has been found to be variable and positively related to river discharge entering the Delta from upstream [39]. This region also has a sizeable population of non-native predatory fishes [35]. Misclassification of the predation status of detected tags in this region has the potential to bias survival estimates and the estimated relationships between survival and covariates such as river discharge. For this reason, the spatial focus for this paper was the mainstem San Joaquin River from Durham Ferry downstream to the Turner Cut Junction. This region included the distributary point of Old River (“head of Old River”) where it branches off from the San Joaquin 5 km downstream of Mossdale.

Tag detections on fixed-site acoustic receivers were used both to assign predation status at the time of the detection events and to estimate survival. The acoustic receivers were grouped into telemetry stations, each consisting of one or more receivers to ensure complete coverage of the river channel. Some stations used paired lines of receivers (“dual array”) to enable or enhance detection probability estimation (Figs. 1, 2). Telemetry stations were located throughout the San Joaquin River from 1 km upstream of the Durham Ferry release site (station A1) to 3 km downstream of the Turner Cut Junction (A20) and in key channels off the mainstem San Joaquin River: at Rough and Ready Island near Stockton, CA (R1), in Turner Cut (T1, 1.2 km from the junction), in Old River within 1.0 km of its source (B1), and in Old and Middle rivers within 0.6 km of the source of Middle River (B2, C1). Survival was estimated using detections from all San Joaquin River stations downstream of the release site and from the B1, R1, and T1 stations (“core area”). The predator filter used detections from all stations but included detections from outside the core area (stations A1, B2, and C1) only if the tag was subsequently detected in the core area.
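The inclusion rule for stations outside the core area can be illustrated with a short sketch. It assumes each tag's history has already been reduced to a time-ordered list of station codes; the set of outside-core stations (A1, B2, C1) comes from the text, while the function name `filter_core_detections` is hypothetical.

```python
# Stations outside the core area, as listed in the text.
OUTSIDE_CORE = {"A1", "B2", "C1"}


def filter_core_detections(stations):
    """Keep outside-core detections only if the tag is detected later in the core area.

    `stations` is the chronologically ordered list of station codes for one tag.
    Core-area detections are always kept.
    """
    kept = []
    seen_core_later = False
    # Walk backward in time so we know whether a core detection follows each event.
    for code in reversed(stations):
        if code not in OUTSIDE_CORE:
            seen_core_later = True
            kept.append(code)
        elif seen_core_later:
            kept.append(code)
    kept.reverse()
    return kept
```

For example, a tag seen at A2 and then only near the head of Middle River would retain just its A2 detection, whereas a later B1 detection would restore the B2 and C1 events.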

Fig. 2

Schematic of mark–recapture model used to estimate survival in 2016 acoustic telemetry study in the San Joaquin River. DF: release site at Durham Ferry. Horizontal lines indicate telemetry stations, parallel lines indicate dual array stations. Parameters: S: survival probability, P: detection probability, Y: route selection probability

Data processing

The raw telemetry detection data were first filtered for false positive signal detections (e.g., multipath) and compiled into receiver-specific detection events by researchers at the U.S. Geological Survey (USGS) in Sacramento, California, United States. The USGS Great Lakes Science Center further processed the receiver detection event data into station-specific detection event data in which data from receivers within the same station were pooled. Consecutive detection events at a given telemetry station were separated either by detection at a different station or by a time gap of ≥ 12 h without detection at the same station; a 12-h cutoff was selected to accommodate small-scale movements in response to the approximately 12-h tidal cycle in the Delta. The remaining data processing and analysis were performed by the University of Washington. Water velocity, river stage, and river discharge observations from river gaging stations located throughout the study area were retrieved from two state databases: the California Data Exchange Center (https://cdec.water.ca.gov/selectQuery.html) and the California Water Data Library (www.water.ca.gov/waterdatalibrary). The river data were recorded at 15-min intervals; these data were cleaned for obvious errors as described in [40].
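The station-level event definition can be sketched as follows. This is a minimal illustration assuming the input is one tag's list of (station, timestamp) detections already pooled across receivers; the dictionary layout and the name `build_events` are hypothetical.

```python
from datetime import datetime, timedelta

# Gap that ends an event at the same station (chosen in the text to span one tidal cycle).
GAP = timedelta(hours=12)


def build_events(detections):
    """Pool one tag's (station, time) detections into station-level detection events.

    A new event starts when the station changes or when >= 12 h have passed
    since the previous detection at the same station.
    """
    events = []
    for station, t in sorted(detections, key=lambda d: d[1]):
        current = events[-1] if events else None
        if current and current["station"] == station and t - current["end"] < GAP:
            current["end"] = t  # extend the ongoing event
            current["n"] += 1
        else:
            events.append({"station": station, "start": t, "end": t, "n": 1})
    return events
```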

The detection event data were passed through each of four distinct predator filter procedures: (1) a simple rule-based filter; (2) a complex rule-based filter; (3) a pattern-recognition (cluster analysis) filter restricted to detections of salmon tags (“smolt-only”); and (4) a pattern-recognition filter augmented with telemetry data from known predators (“multispecies”). All four filters assigned predator status to individual detection events, but the methods for assigning these events differed between the two primary approaches (rule-based versus pattern recognition). Additionally, tag predation could be assigned at either the beginning or end of a detection event for the two rule-based filters but only at the beginning of a detection event for the two pattern-recognition filters. The simple rule-based filter consisted of a small set of binary rules based on several movement metrics, any violation of which signaled tag predation and a predator classification for the tag from that point onward. The complex rule-based filter used additional metrics and a score-based system that classified tags as predated if they showed evidence of unexpected behavior across multiple criteria. It was also the only spatially explicit filter used, meaning that thresholds for some metrics depended on the region within the study area. The two pattern-recognition filters used common multivariate statistical methods to identify deviant clusters of tags based on patterns of observed metrics. The smolt-only pattern-recognition filter used statistical procedures to identify groups of suspicious tags based on the observed variation in the metrics data. The multispecies pattern-recognition filter used behavioral similarity with known predators as a basis for flagging suspicious smolt tags. Each predator filter was developed and implemented independently, producing four subsets of the detection events that were classified as coming from predators, one for each predator filter.
The outputs from the four filters were compared using the total number of tags classified as predated by the end of the detection history and the number of first-time predator classifications that occurred at stations in various regions of the study area: upstream of Mossdale (stations A1–A5); Mossdale through the head of Old River (A6–A8/B1); Undine Road to Howard Road (A9–A13); Weston Ranch (Stockton) to Navy Bridge/Rough and Ready Island (A14–A17/R1); Calaveras River to the Turner Cut Junction (A18–A20/T1); and the head of Middle River (B2, C1) (Fig. 1).

The output of each filter was further processed by truncating each detection sequence at the last detection event preceding the purported tag predation. The resulting four sets of filtered detection event data were used for survival estimation for the entire study area and for the regions named above. For each predator filter, the filtered data were converted into detection histories that represented the final fate of the tagged fish at each river junction and telemetry station. The detection histories were analyzed in a CJS release–recapture model that included route selection at the head of Old River (Fig. 2) to estimate reach-specific and regional survival. Sparse detection data at the R1 and T1 telemetry stations prevented estimation of detection probabilities unique to those sites, and so detections at those sites were pooled with detections at stations A17 and A20, respectively. Tag survival data from the in-tank tag-life study were modeled using the 2013 version of the four-parameter vitality model [41, 42] and the resulting tag survival probabilities were used to adjust fish survival estimates for premature tag failure [43]. Survival was also estimated using unfiltered observation data for comparison to the filtered data. Results from the five data sets (four filtered and one unfiltered) were compared using estimates of cumulative survival from the release site downstream and estimates of regional survival.
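The truncation step can be sketched as below, assuming a filter reports its first predator call as an (event index, "start" or "end") pair, or None for an unflagged tag. The function name and this call format are illustrative assumptions rather than the format used in the study.

```python
from typing import List, Optional, Tuple, TypeVar

T = TypeVar("T")


def truncate_at_predation(events: List[T], call: Optional[Tuple[int, str]]) -> List[T]:
    """Keep only detection events that precede the purported tag predation.

    A "start" call places predation at the beginning of the flagged event, so that
    event is dropped; an "end" call places it at the end, so the event is kept.
    """
    if call is None:
        return list(events)
    i, where = call
    return list(events[:i]) if where == "start" else list(events[: i + 1])
```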

Data processing and filtering were conducted in R [44] using the “cluster” package (v. 2.1.0) for silhouette estimates and the “shipunov” package (v. 1.14) to assess bootstrap cluster stability. The release–recapture model was fit using maximum likelihood in the software Program USER [45].

Simple rule-based filter

The simple rule-based predator filter was constructed using expert opinion to define five rules for classifying detections as coming from either salmon or predators. Metrics and/or rules were derived from methods reported in other salmon survival studies in the Delta. The metrics were: (1) distance traveled upstream on a single upstream trip [22]; (2) migration rate during each transition between detection events [11, 46]; (3) water velocity and river discharge at the start of each detection event [11, 23]; (4) cumulative average distance traveled per day since release [23]; and (5) total time spent in the vicinity of a telemetry station (cumulative sum of near-field residence time; see Complex rule-based filter) [23]. Each metric was computed for each detection event. Water velocity and river discharge at the start of each event were interpolated from the nearest 15-min velocity and flow observations at the closest gaging station.
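The interpolation of gaged data to the start of an event might look like the sketch below. The text does not specify the interpolation scheme, so linear interpolation between the two bracketing 15-min observations is an assumption here, as is clamping to the nearest observation outside the record; `interpolate_at` is a hypothetical name.

```python
from bisect import bisect_left
from datetime import datetime


def interpolate_at(obs_times, obs_values, t):
    """Linearly interpolate a gaged series (e.g., 15-min water velocity) at time t.

    Assumes obs_times is sorted ascending. Times outside the record are clamped
    to the first or last observation.
    """
    k = bisect_left(obs_times, t)
    if k == 0:
        return obs_values[0]
    if k == len(obs_times):
        return obs_values[-1]
    t0, t1 = obs_times[k - 1], obs_times[k]
    v0, v1 = obs_values[k - 1], obs_values[k]
    w = (t - t0).total_seconds() / (t1 - t0).total_seconds()  # fraction of the interval elapsed
    return v0 + w * (v1 - v0)
```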

Detection events were classified as representing predators if any of the following conditions were met: (1) the distance traveled upstream on a single trip was ≥ 16 km [22]; (2) an upstream transition had migration rate > 3 km h−1 and was not immediately followed by a downstream transition [46]; (3) the tag was observed to move upstream against the direction of river flow [22, 23]; (4) the cumulative average distance traveled per day since release was < 1 km d−1; and (5) the total time spent in the vicinity of a station was > 36 h. Rule violations resulted in predator classifications at either the beginning (rules 1–4) or end (rule 5) of the detection event. Upstream movement against the direction of river flow was defined by water velocity > 0.15 m s−1 at the start of an upstream detection event. For telemetry stations A1–A5, water velocity data were unavailable and river discharge > 28.3 m3 s−1 (1000 ft3 s−1) at the VNS gaging station defined upstream movement against the flow. Neither velocity nor river discharge data were available for the Lathrop telemetry station (A8).
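The five rules can be sketched as a single pass over a tag's detection events. The thresholds come from the text; the `Event` fields are hypothetical precomputed metrics, and two conventions are assumptions: positive velocity means downstream flow, and the last event, having no next transition, counts as "not followed by a downstream transition." Stations without flow data (e.g., A8) cannot trigger rule 3 in this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Event:
    # Hypothetical per-event metrics; field names are illustrative only.
    upstream_trip_km: float         # distance moved upstream on the current upstream trip
    upstream: bool                  # transition into this event was upstream
    rate_kmh: float                 # migration rate of that transition
    next_downstream: bool           # next transition is downstream (False if none)
    velocity_ms: Optional[float]    # water velocity at event start (positive = downstream)
    discharge_cms: Optional[float]  # discharge at event start (used where velocity is missing)
    km_per_day: float               # cumulative average distance per day since release
    nearfield_h: float              # cumulative near-field residence time through this event


def against_flow(ev: Event) -> bool:
    """Rule 3: upstream movement while the river is flowing downstream."""
    if not ev.upstream:
        return False
    if ev.velocity_ms is not None:
        return ev.velocity_ms > 0.15
    if ev.discharge_cms is not None:  # e.g., stations A1-A5, discharge at VNS
        return ev.discharge_cms > 28.3
    return False  # no flow data: rule cannot be evaluated


def simple_filter(events) -> Optional[Tuple[int, str]]:
    """Return (event index, "start" or "end") of the first predator call, or None."""
    for i, ev in enumerate(events):
        start_rules = (
            ev.upstream_trip_km >= 16.0,                                    # rule 1
            ev.upstream and ev.rate_kmh > 3.0 and not ev.next_downstream,  # rule 2
            against_flow(ev),                                              # rule 3
            ev.km_per_day < 1.0,                                           # rule 4
        )
        if any(start_rules):
            return i, "start"
        if ev.nearfield_h > 36.0:                                          # rule 5
            return i, "end"
    return None
```

All events after the returned call would then be treated as predator detections.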

Complex rule-based filter

The complex rule-based filter augmented the threshold approach of the simple rule-based filter by using additional metrics, adding a spatial dimension to the rule set, and classifying tags using a score-based system. A similar complex rule-based filter has been used for multiple years of telemetry studies of Chinook salmon in the Delta [11]. The filter used 24 metrics to characterize the behavior of tagged fish and compared observed metrics to criteria determined by expert opinion, literature review, past filters used for studies in this system, and calibration based on examination of the 2016 detection data. Physical recovery of five acoustic-tagged salmon in trawl sampling near Mossdale (A6) informed criteria for the 2016 filter (unpublished data). The criteria were spatially defined and fell under nine categories, each of which included one or more metrics: residence time on three spatiotemporal scales (near, mid, and far field), travel time since release, reach-specific migration rate, migration rate scaled by water velocity and fish length (body lengths per second, BLPS), upstream-directed transitions, movements against the direction of water flow, and regional patterns of movement. More details on the metrics and filter criteria are provided in Additional file 1: Tables S1 and S2.

A score was assigned to each detection event, representing the number of criteria categories violated for that event (range = 0–9). Violations of one or more criteria within a single category increased the event’s score by 1. Detection events that earned total scores ≥ 2 were assigned a predator classification, as were all subsequent detections of the tag. The first predator classification could be assigned at either the beginning or the end of a detection event. Assignments to the beginning of an event resulted from violations of the criteria for travel time, migration rate, BLPS, upstream transitions, movements against flow, and unexpected transitions. Assignments to the end of an event resulted from violations of the criteria for residence time and, in some cases, apparent movements against flow or unexpected transitions.
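The scoring step can be sketched as follows, assuming the category-level violations for each event have already been evaluated against the spatially defined criteria. How a score of 2 that mixes start-type and end-type categories is placed within an event is not stated in the text; this sketch assumes the call goes at the start only when start-type categories alone reach the threshold, and otherwise at the end. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def complex_filter(event_violations: List[Dict[str, str]],
                   threshold: int = 2) -> Optional[Tuple[int, str]]:
    """Score each event by the number of criteria categories it violates.

    event_violations[i] maps each violated category name to where in event i the
    violation becomes evident ("start" or "end"). Multiple criteria in one
    category add only one point because the dict is keyed by category.
    Returns (event index, "start" or "end") of the first predator call, or None.
    """
    for i, violated in enumerate(event_violations):
        score = len(violated)
        if score < threshold:
            continue
        n_start = sum(1 for when in violated.values() if when == "start")
        # Assumption: call at the start only if start-type violations reach the threshold.
        return (i, "start") if n_start >= threshold else (i, "end")
    return None
```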

Smolt-only pattern-recognition filter

The pattern-recognition filter used procedures similar to past studies [e.g., 16, 23] and expanded on these methods in a novel way. We first used hierarchical cluster analysis to identify individual tags that exhibited suspicious behavior during their detection history. We then followed with a post hoc analysis to identify the specific detection events when tags first exhibited sufficient abnormal behavior to be classified as predated.

Similar to the rule-based filters, the pattern-recognition filter relied on a set of behavioral metrics. We initially computed a list of candidate metrics at the detection level taken from those used in the complex rule-based filter, and then aggregated these metrics using one or more summary statistics. Each summarized metric reflected a hypothesis about a specific behavioral scenario that might indicate predator-like behavior. For example, migration rate was summarized using the tag-level mean and maximum, based on the hypothesis that deviations in mean migration rate or a stark difference between mean and maximum might reflect a sudden change in movement behavior indicative of tag predation. See Additional file 1: Table S3 for a complete description of metrics and hypotheses. We omitted metrics not suited for a pattern-recognition analysis using conventional multivariate statistical methods, such as categorical fields or those with missing data [47]. We used plots of data distributions and correlograms to determine which variables to remove and applied log transformations to a subset of variables to reduce their skew. The final data set contained 20 explanatory variables. We standardized each metric by subtracting the mean and dividing by the standard deviation.
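The transformation and standardization of tag-level metrics can be sketched as below. Two choices are assumptions not stated in the text: `log1p` is used so that zero-valued metrics remain defined, and the sample standard deviation (n − 1 denominator, as in R's `scale`) is used. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def standardize(columns, log_cols=()):
    """Log-transform selected metrics, then z-score every metric.

    columns: dict mapping metric name -> list of tag-level values (same tag order).
    log_cols: metric names to transform with log1p (assumes nonnegative values).
    """
    out = {}
    for name, values in columns.items():
        vals = [math.log1p(v) for v in values] if name in log_cols else list(values)
        m, s = mean(vals), stdev(vals)  # sample standard deviation
        out[name] = [(v - m) / s for v in vals]
    return out
```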

We then implemented a cluster analysis using the Ward hierarchical clustering method based on the squared Euclidean distance matrix [16, 23, 48]. This unsupervised learning algorithm partitioned the tags based on similarities in multivariate patterns of the observed metrics. We used two common statistical techniques to determine the optimal number of clusters: a discontinuity (“elbow”) in the plot of the proportions of variance explained and the full average silhouette statistic [49]. After selecting the number of clusters, we assigned predated status to cluster members based on associated behavior patterns and the assumption that a minority of tags were likely to have been consumed before the end of their detection histories. Let \({T}_{S}\) indicate the subset of tags classified by the cluster analysis as smolts (non-predated) and \({T}_{P}\) the subset of tags classified as predated by the end of the study.
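To make the clustering criterion concrete, the sketch below implements naive agglomerative clustering with Ward's minimum-variance rule: at each step it merges the two clusters whose union least increases the total within-cluster sum of squares, computed as n_a n_b / (n_a + n_b) times the squared Euclidean distance between centroids. It also returns the proportion of variance explained by a partition, the quantity examined for the elbow. This is an illustrative O(n³) sketch, not the R routine used in the study; the silhouette statistic is omitted, and all names are hypothetical.

```python
from itertools import combinations


def _centroid(pts):
    d = len(pts[0])
    return [sum(p[j] for p in pts) / len(pts) for j in range(d)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(points, k):
    """Merge clusters by Ward's criterion until k remain; return one label per point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca = _centroid([points[i] for i in clusters[a]])
            cb = _centroid([points[i] for i in clusters[b]])
            na, nb = len(clusters[a]), len(clusters[b])
            cost = na * nb / (na + nb) * _sq_dist(ca, cb)  # increase in within-cluster SS
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels


def variance_explained(points, labels):
    """Proportion of the total sum of squares explained by the partition."""
    grand = _centroid(points)
    total = sum(_sq_dist(p, grand) for p in points)
    within = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        c = _centroid(members)
        within += sum(_sq_dist(p, c) for p in members)
    return 1.0 - within / total
```

Plotting `variance_explained` against k from 1 upward would show the discontinuity used to choose the number of clusters.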

We followed the cluster analysis with a principal components analysis (PCA) in order to examine the behavior of the clustering algorithm and the relative importance of specific metrics for defining clusters. PCA uses the same reference space as the Ward cluster analysis and both methods are based on minimizing (Ward) or decomposing (PCA) the variance of a distance matrix [50]. We used the correlation coefficients between the first few principal component axes and the scaled metrics (i.e., loadings) to identify variables most influential in spreading the data in multidimensional space and used resampling procedures to examine clustering stability [51].

After classifying tags into smolt and predator categories, we then used a recursive series of PCA ordinations to identify the detection event for each tag in the \({T}_{P}\) set that suggested tag predation. For each increment \(i\) in the PCA series, we subset the raw event-level data to the first \(i\) detection events for each tag, where \(i\) ranged from 1 through the maximum number of detection events among all tags; all behavioral metrics were summarized and standardized on this incomplete data set and the principal components were computed. We plotted and tracked the relative ordination position of all tags using the value of the first and second principal components. Because all tags were in live smolts at the time of release, the cloud of points begins clustered and then shifts in space as cumulative metrics evolve for each tag with additional detection events, slowing as they settle into their final positions based on the full data. We tracked tags that were classified as predators (\({T}_{P}\)) and identified the detection event when each tag first moved outside of the bivariate distribution of expected smolt behaviors as defined by the first and second principal component axes. For each increment \(i\), we estimated that bivariate distribution using a multivariate normal distribution defined by the mean ordination \(\left({\overline{x} }_{1},{\overline{x} }_{2}\right)\) and covariance of the first two principal component values for the \({T}_{S}\) tags (i.e., classified as smolts). A significant deviation from that distribution was defined as departure from the ellipse defined by the 95% confidence interval contour about the mean ordination \(\left({\overline{x} }_{1},{\overline{x} }_{2}\right)\) based on the Chi-square distribution with two degrees of freedom [52]. 
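The ellipse test at a single increment can be sketched as a squared Mahalanobis distance compared with the 95% quantile of the chi-square distribution with two degrees of freedom, which has the closed form −2 ln(0.05) ≈ 5.991. The sketch assumes the (PC1, PC2) scores for increment i have already been computed by any PCA routine; the use of the sample covariance and the function names are assumptions.

```python
import math

# 95% quantile of chi-square with 2 df: its CDF is 1 - exp(-x / 2).
CHI2_2DF_95 = -2.0 * math.log(1.0 - 0.95)


def ellipse_params(smolt_scores):
    """Mean and 2x2 sample covariance of (PC1, PC2) scores for the T_S tags."""
    n = len(smolt_scores)
    m1 = sum(s[0] for s in smolt_scores) / n
    m2 = sum(s[1] for s in smolt_scores) / n
    s11 = sum((s[0] - m1) ** 2 for s in smolt_scores) / (n - 1)
    s22 = sum((s[1] - m2) ** 2 for s in smolt_scores) / (n - 1)
    s12 = sum((s[0] - m1) * (s[1] - m2) for s in smolt_scores) / (n - 1)
    return (m1, m2), (s11, s12, s22)


def outside_ellipse(point, mean, cov, cutoff=CHI2_2DF_95):
    """True if the point's squared Mahalanobis distance exceeds the cutoff."""
    (m1, m2), (s11, s12, s22) = mean, cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = point[0] - m1, point[1] - m2
    # d' * inv(Sigma) * d, using the closed-form inverse of a 2x2 matrix.
    q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return q > cutoff
```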
Because the method is applied recursively, sequentially increasing the number of detection events with each increment, and uses movement beyond a bivariate ellipse as the criterion for estimating the tag predation event, we refer to this event-level assignment method as the “recursive ordination ellipse” (ROE) method.

Because the orientation of the 95% ellipse in the ordination may be unstable for small values of \(i\), we defined a minimum number of events (“event minimum”) required before a tag could be considered abnormal. We defined the event minimum as the increment \({i}_{0}\) \(\left({i}_{0}\le 5\right)\) when major rotations of the ROE ellipse ceased and the range of observed values on the principal component axes stabilized. Tags were classified as newly predated upon the first event \(i\) \(\left(i\ge {i}_{0}\right)\) when their ordination was outside the ROE ellipse boundary. Only tags in the \({T}_{P}\) subset were assigned predation events.
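Given the per-increment ellipse results for one tag in the T_P subset, the event assignment with an event minimum reduces to a short search. This sketch assumes the ellipse test has already been run at every increment i = 1, 2, … for that tag; the function name is hypothetical.

```python
from typing import Optional, Sequence


def roe_predation_event(outside_by_increment: Sequence[bool], i0: int) -> Optional[int]:
    """First 1-based increment i >= i0 at which a T_P tag lies outside the ROE ellipse.

    outside_by_increment[i - 1] is the ellipse test for the tag using its first i events.
    Returns None if the tag never leaves the ellipse once the event minimum is reached.
    """
    for i, outside in enumerate(outside_by_increment, start=1):
        if i >= i0 and outside:
            return i
    return None
```

Early excursions (i < i0), when the ellipse orientation is still unstable, are ignored by construction.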

Multispecies pattern-recognition filter

The multispecies pattern-recognition filter used mostly the same methods as the smolt-only pattern-recognition filter but augmented the smolt-tag data with detection data from 37 acoustic-tagged known predatory fish. We used the same behavioral metrics as in the smolt-only pattern-recognition filter, now defined for both the smolt tags and the known predator tags (i.e., summarizing and standardizing the combined detections of smolt and known predator tags). Because the predator tags were initially released up to two years prior to the 2016 smolt study, we assigned them a virtual release date at the time of the first smolt tag detection in 2016 and a virtual release location equal to the site of their first detection during the smolt study. As in the smolt-only version, we assumed that even if a large number of tagged smolts were consumed during the study, a small (but potentially influential) minority of those tags would be detected after consumption. Thus, we determined the number of clusters by jointly minimizing both the total number of clusters and the heterogeneity of tag identities within clusters. We were particularly interested in defining clusters characterized by normal behavior patterns (closer to the origin of the PCA axes) in which smolt tags appeared as a large majority (i.e., > 90%). All smolt tags assigned to clusters disproportionately composed of known predator tags (> 10%) were classified as predated by the end of their detection history (\({T}_{P}\) subset of smolt tags); smolt tags assigned to the remaining clusters were classified as smolts (\({T}_{S}\)). We used the ROE method to identify the detection event at which each \({T}_{P}\) tag was first classified as predated, as described above.
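The cluster-composition rule can be sketched as follows, assuming cluster labels have already been assigned to the combined set of smolt and known-predator tags. The 10% cutoff comes from the text; the function name and input layout are hypothetical.

```python
from collections import Counter


def classify_smolt_tags(labels, is_known_predator, max_pred_share=0.10):
    """Split smolt tags into T_S and T_P by the known-predator share of their cluster.

    labels: cluster label per tag (smolt and known-predator tags together).
    is_known_predator: parallel list of booleans.
    Returns (T_S, T_P) as sets of indices of smolt tags only.
    """
    size = Counter(labels)
    preds = Counter(lab for lab, p in zip(labels, is_known_predator) if p)
    t_s, t_p = set(), set()
    for idx, (lab, pred) in enumerate(zip(labels, is_known_predator)):
        if pred:
            continue  # known predator tags are not classified here
        share = preds[lab] / size[lab]
        (t_p if share > max_pred_share else t_s).add(idx)
    return t_s, t_p
```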

Testing on known predator data

We tested each of the four predator filters on the set of 37 known predator tags. A useful predator filter should have a high probability of classifying known predator tags as predated at some point during their detection history. We implemented both rule-based filters on the predator tags using a virtual release date and location defined as described above and computed the percentage of predator tags diagnosed as predated. The predator tags were already included in the multispecies pattern-recognition filter; for this filter and the smolt-only pattern-recognition filter, we computed the percentage of predator tags that left the 95% ellipse in the ROE method.

Results

A total of 177 tags were classified as predated by one or more predator filtering approaches, ranging from 66 tags flagged by the smolt-only pattern-recognition filter to 139 flagged by the complex rule-based filter (Table 2). Detection, travel time, and survival results from the filtering approaches are described below, including the degenerate no-filter approach.

Table 2 Summary of tag detections at key locations and output of rule-based (RB) and pattern-recognition (PR) predator filters for 2016 San Joaquin Chinook salmon case study

No filter

Of the 648 acoustic tags implanted into juvenile salmon and released at Durham Ferry (A0) in 2016, 618 were detected downstream of the release site, 6 were detected upstream of the release site (A1), and 30 were not detected after release. Three of the tags detected at A1 were also detected downstream; the remaining three tags detected at A1 were excluded from analysis. A total of 250 tags were detected at Lathrop (A8) and 26 tags were detected at the Turner Cut Junction (A20, T1). A total of 32 tags were detected entering Old River (B1), and 1 of these tags was later observed in the San Joaquin River at Lathrop or downstream. A total of 22 tags were detected near the head of Middle River (B2, C1); 1 of these tags was subsequently detected at B1 or in the San Joaquin River, and the remaining 21 tags were excluded from analysis.

Travel time from tag activation to A20 or T1 ranged from 4.7 d to 40.4 d (2.4 d to 37.9 d from release) and averaged 6.7 d (harmonic mean; Fig. 3). The time to tag failure in the tag-life study ranged from 27.6 d to 46.0 d (Fig. 3), and fish survival estimates were adjusted for premature tag failure. Without filtering for possible tag predation, the total probability of survival from Durham Ferry to the Turner Cut Junction was estimated at 0.044 (\(\widehat{\mathrm{SE}}\) = 0.009; Table 3). Cumulative survival decreased markedly from A6 (Mossdale) to A12 (Brandt Bridge) and was largely stable from A12 to A17/R1 (Navy Bridge/Rough and Ready Island). Survival rate per km was lowest between A7 (River Islands Parkway) and A9 (Undine Road); this region included the head of Old River (Fig. 4).
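The idea behind adjusting for premature tag failure can be sketched with a simplified estimator: divide apparent survival by the average probability that a tag was still transmitting at the observed arrival times. This is an assumption-laden illustration, not the study's procedure. The study fitted a four-parameter vitality model, which is not reproduced here; the Weibull curve below, its parameters, and the arrival times are hypothetical placeholders.

```python
import math
from statistics import fmean, harmonic_mean


def weibull_tag_survival(t_days, shape=8.0, scale=38.0):
    """HYPOTHETICAL tag-survival curve: P(tag still transmitting at t days).

    Placeholder only; the study used a four-parameter vitality model instead.
    """
    return math.exp(-((t_days / scale) ** shape))


def adjust_for_tag_failure(apparent_survival, arrival_days,
                           tag_survival=weibull_tag_survival):
    """Divide apparent survival by the mean probability that a tag was still
    active at the observed arrival times (days since tag activation)."""
    p_tag_active = fmean(tag_survival(t) for t in arrival_days)
    return apparent_survival / p_tag_active


# Illustrative arrival times (days since activation); not study data.
arrivals = [4.7, 5.5, 6.0, 7.2, 9.8, 40.4]
print(round(harmonic_mean(arrivals), 2))
print(round(adjust_for_tag_failure(0.044, arrivals), 4))
```

Because tag survival is close to 1 for the first few weeks, the adjustment matters mainly when some arrivals fall near the tail of the tag-life distribution, as the latest arrivals in Fig. 3 do.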

Fig. 3

Arrival timing distribution to the Turner Cut Junction (stations A20, T1) for salmon tags classified by filtering approach (bold and colored lines), observed tag survival from the tag-life study (open dots), and fitted tag survival curve using the four-parameter vitality model. Filled icons indicate last arrival at A20/T1 for the filtering approach; the final detection time observed from the no-filter approach and the pattern-recognition filters coincided. Failure times for tag-life data were offset by the average delay between tag activation and fish release (2.3 d, based on the salmon tags released). The rug plot (horizontal axis) represents the arrival timing of unfiltered salmon tags

Table 3 Regional and total survival estimates for alternative predator filtering approaches for 2016 San Joaquin River Chinook salmon data processed using no filter, simple or complex rule-based (RB) filter, and smolt-only or multispecies pattern-recognition (PR) filter
Fig. 4

Cumulative survival probability estimates from release to each telemetry station in the core area (A1 – A20/T1, B1) classified by filtering approach. The horizontal axis is scaled by the distance from release; the vertical axis is shown on the log2 scale to highlight survival < 0.250. A0 = release at Durham Ferry, A20/T1 = Turner Cut Junction. Error bars represent 95% confidence intervals; shading represents 95% confidence interval for no-filter approach

Rule-based filters

The simple and complex rule-based filters classified 93 tags (14% of those released) and 139 tags (21% of those released), respectively, as predated at some point after release (Fig. 5a and b). The telemetry stations in the region from Mossdale through the head of Old River (stations A6 to A8/B1) had the largest number of first-time predator classifications (39 for simple and 67 for complex rule-based filter), followed by the next region downstream (Undine Road to Howard Road, A9 to A13) (Fig. 6a and b, Table 2).

Fig. 5

Classifications of smolt tags as predators (black bars) or smolts (gray bars) by the end of the study of acoustic-tagged Chinook salmon smolts according to four filtering approaches: simple rule-based filter (a), complex rule-based filter (b), smolt-only pattern-recognition filter (c), and multispecies pattern-recognition filter (d). White bars in a, b, and c are placeholders for the known predator tags used in the multispecies pattern-recognition filter (d). Tags are ordered according to a Ward hierarchical cluster analysis of behavioral metrics from the multispecies pattern-recognition filter (dendrogram shown). The red dashed line indicates the distance threshold used to divide the multispecies dataset into five clusters (numbered 1–5); clusters 1, 4, and 5 were classified as predated

Fig. 6

Comparison of locations where predation events were assigned for tags implanted in Chinook salmon smolts in the 2016 case study in the San Joaquin Delta according to four alternative predator filter methods

For the simple rule-based filter, the behavior that resulted in the largest number of predator assignments was long residence time near a receiver (filter category 5), followed by upstream transitions against river flow (category 3) and low average distance per day (category 4; Table 4). The criteria for length and migration rate of upstream transitions (categories 1 and 2) were not violated for any tags. For the complex rule-based filter, the near-field and mid-field residence time criteria accounted for the largest number of first-time predator classifications, followed by migration rate and far-field residence time. The time since release and the length- and velocity-adjusted migration rate (body lengths per second) categories accounted for the fewest predator classifications (Table 5).
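A minimal sketch of how the five simple rule-based categories could be evaluated along one tag's detection history is shown below. All thresholds, the `DetectionEvent` fields, and the `flow_downstream` input are hypothetical; the study's actual criteria and values are not reproduced here. The function returns the first event and category that trigger a predator classification, which mirrors how Table 4 tallies first-time classifications.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# HYPOTHETICAL thresholds for illustration only; not the study's criteria.
MAX_UPSTREAM_KM = 5.0         # category 1: length of an upstream transition
MAX_UPSTREAM_KM_PER_H = 1.0   # category 2: rate of an upstream transition
MIN_KM_PER_DAY = 2.0          # category 4: average net distance per day
MAX_RESIDENCE_H = 12.0        # category 5: residence time near one receiver


@dataclass
class DetectionEvent:
    rkm: float             # distance downstream of release (km)
    arrival_h: float       # first detection, hours since release
    departure_h: float     # last detection, hours since release
    flow_downstream: bool  # net flow was downstream during the arriving transition (hypothetical input)


def simple_rule_filter(events: List[DetectionEvent]) -> Optional[Tuple[int, int]]:
    """Return (event index, category) of the first rule violation, or None."""
    for i, ev in enumerate(events):
        if i > 0:
            prev = events[i - 1]
            moved_km = ev.rkm - prev.rkm  # negative means upstream
            hours = ev.arrival_h - prev.departure_h
            if moved_km < 0:
                if -moved_km > MAX_UPSTREAM_KM:
                    return i, 1
                if hours > 0 and -moved_km / hours > MAX_UPSTREAM_KM_PER_H:
                    return i, 2
                if ev.flow_downstream:  # moved upstream against river flow
                    return i, 3
        days = ev.arrival_h / 24.0
        if days >= 1.0 and ev.rkm / days < MIN_KM_PER_DAY:
            return i, 4
        if ev.departure_h - ev.arrival_h > MAX_RESIDENCE_H:
            return i, 5
    return None
```

The order of the checks is a design choice: transition-based categories (1 to 3) are evaluated at the start of an event and residence time (category 5) at its end, so the reported category can depend on that ordering when several rules are violated at once.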

Table 4 Summary of predation classifications from simple rule-based filter: number of tags first flagged as predated at telemetry stations in each region, by filter category met
Table 5 Summary of predation classifications from complex rule-based filter: number of tags first flagged as predated at telemetry stations in each region, by filter category violated

Both rule-based filters successfully diagnosed 100% of the 37 known predator tags as predated at some point during their detection history. The criteria most pivotal to the predator diagnosis depended on the filter. The average distance traveled per day was the most important criterion for the simple rule-based filter, flagging 73% of the known predator tags because the distance traveled each day was too low. For the complex rule-based filter, the reach-specific rate of travel (migration rate) criterion was violated for 95% of the predator tags and the mid-field residence time criterion was violated for 89% of the predator tags. Collectively, the three residence time criteria were violated for 100% of the predator tags.
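The complex rule-based filter combines several criteria into a predator score, and the Discussion cites a threshold of score > 2 for a predator diagnosis. The sketch below assumes, purely for illustration, that each violated criterion contributes an equal weight of 1 at a given detection event; the real weights and the way violations accumulate in the study may differ. Criterion names follow the categories listed in the Results.

```python
from typing import Dict, List, Optional, Set

# HYPOTHETICAL equal weights; the study's actual scoring scheme is not reproduced.
CRITERION_WEIGHTS: Dict[str, float] = {
    "near_field_residence": 1.0,
    "mid_field_residence": 1.0,
    "far_field_residence": 1.0,
    "migration_rate": 1.0,
    "time_since_release": 1.0,
    "body_lengths_per_second": 1.0,
}
SCORE_THRESHOLD = 2.0  # a tag is flagged when its score exceeds 2


def predator_score(violated: Set[str]) -> float:
    """Sum the weights of the criteria violated at one detection event."""
    return sum(CRITERION_WEIGHTS[name] for name in violated)


def first_flagged_event(violations: List[Set[str]]) -> Optional[int]:
    """Index of the first detection event whose score exceeds the threshold."""
    for i, violated in enumerate(violations):
        if predator_score(violated) > SCORE_THRESHOLD:
            return i
    return None


history = [set(), {"migration_rate"},
           {"near_field_residence", "mid_field_residence", "migration_rate"}]
print(first_flagged_event(history))  # 2
```

With equal weights, three simultaneous violations are needed to exceed the threshold; unequal weights would let a single severe criterion dominate, which is one way the scoring could give more leverage to some metrics than others.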

Travel time from release to the Turner Cut Junction (A20/T1) ranged from 2.4 d to 18.0 d (harmonic mean = 4.1 d) using the detection data after passing it through the simple rule-based filter, and averaged 3.9 d (range = 2.4 d to 10.5 d) for detections passed through the complex rule-based filter. Regional survival estimates using the simple and complex filters ranged from 0.311 (\(\widehat{\mathrm{SE}}\) = 0.029) and 0.318 (\(\widehat{\mathrm{SE}}\) = 0.030), respectively, from the head of Old River to Weston Ranch in Stockton (A8–A14) to 0.671 (\(\widehat{\mathrm{SE}}\) = 0.023) and 0.659 (\(\widehat{\mathrm{SE}}\) = 0.023), respectively, in the segment from Mossdale through the head of Old River (A6–A8/B1). Overall survival from release to the Turner Cut Junction was estimated at 0.040 (\(\widehat{\mathrm{SE}}\) = 0.008) using the simple and at 0.044 (\(\widehat{\mathrm{SE}}\) = 0.009) using the complex rule-based filter (Table 3). For both filters, cumulative survival patterns largely followed those of the unfiltered data (Fig. 4).

Pattern-recognition filters

The smolt-only and multispecies pattern-recognition filters classified 66 tags (10% of those released) and 89 smolt tags (14%), respectively, as predated by the end of their detection histories (Table 2). Both pattern-recognition filters diagnosed the largest number of predated tags in the region from Undine Road to Howard Road (stations A9–A13; 25 events for smolt-only and 28 for multispecies). The neighboring regions had the next highest counts of predated tag events for both of these filters: from Mossdale through the head of Old River (A6–A8/B1) and from Weston Ranch to the Navy Bridge/Rough and Ready Island stations (A14–A17/R1; Fig. 6c and d; Table 2).

The 66 tags diagnosed as predated by the smolt-only pattern-recognition filter made up the smaller of the two clusters identified in that cluster analysis. Although increasing the number of clusters accounted for more variance in the data, there was no obvious optimal cluster number on the basis of variance explained (Additional file 1: Fig. S1a), and the silhouette method indicated two clusters were sufficient (Additional file 1: Fig. S1b). The first two principal components in the PCA accounted for 56% of the variance (Table 6). The ordination plot for the first two principal component axes indicated that the minority cluster (i.e., the 66 \({T}_{P}\) tags) was associated with metrics involving upstream movement (Fig. 7). The majority cluster (i.e., non-predated tags) showed little evidence of upstream movement and spanned a gradient of migration rates; high average migration rate was negatively correlated (180° difference in the angle of loadings) with long residence times and many repeated visits to arrays. The third principal component mainly reflected differences in downstream migration rate and did not further separate clusters visible in the two-dimensional plot.
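The general workflow described here (standardize tag-level metrics, run a PCA, cut a Ward hierarchical clustering tree, and compare candidate cluster numbers with the mean silhouette width) can be sketched with NumPy and SciPy. The synthetic data below are invented stand-ins for the behavioral metrics, and the helper names are illustrative; the study's own distance choices, metric transformations, and software are not reproduced.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def standardize(X):
    """Center each behavioral metric and scale it to unit variance (z-scores)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca(Z):
    """PCA of standardized data via SVD: scores, loadings, explained-variance ratios."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U * s, Vt.T, s**2 / np.sum(s**2)


def ward_labels(Z, k):
    """Cut a Ward hierarchical clustering tree into k clusters (labels 1..k)."""
    return fcluster(linkage(Z, method="ward"), t=k, criterion="maxclust")


def mean_silhouette(Z, labels):
    """Average silhouette width using Euclidean distances."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    widths = []
    for i, lab in enumerate(labels):
        same = labels == lab
        n_same = same.sum()
        if n_same == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        a = D[i, same].sum() / (n_same - 1)  # self-distance is 0, so excluded
        b = min(D[i, labels == other].mean()
                for other in np.unique(labels) if other != lab)
        widths.append((b - a) / max(a, b))
    return float(np.mean(widths))


# Synthetic stand-in for tag-level metrics: 90 "typical" tags, 10 aberrant ones.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 4)),
               rng.normal(5.0, 1.0, size=(10, 4))])
Z = standardize(X)
scores, loadings, explained = pca(Z)
silhouette = {k: mean_silhouette(Z, ward_labels(Z, k)) for k in range(2, 6)}
best_k = max(silhouette, key=silhouette.get)
sizes = np.bincount(ward_labels(Z, best_k))[1:]
print(best_k, sorted(sizes.tolist()), round(float(explained[:2].sum()), 2))
```

The silhouette criterion is only one of several possible rules for choosing the number of clusters; as the Discussion notes, this choice, along with the metric set and transformations, remains a researcher decision.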

Table 6 Correlations between principal component weights and behavioral metrics as part of two pattern-recognition procedures
Fig. 7

Ordination plot of smolt tags (points) along the first and second principal component axes based on summarized tag-level metrics for the smolt-only pattern-recognition filter. Tag classification as smolt or predator was based on partition of data into two clusters in Ward hierarchical cluster analysis. Labeled vectors are relative PCA loadings of observed behavioral metrics from the full smolt-tag data set

The cluster analysis for the multispecies pattern-recognition filter partitioned the full set of smolt and predator tags into five clusters; the two largest were classified as smolt clusters and the remaining three as predated. Five clusters were selected to jointly minimize the total number of clusters and the number of mixed clusters containing large majorities of smolt tags. The dendrogram in Fig. 5 shows the groupings at different levels of clustering; the great majority of known predator tags (35 of 37) are grouped at the top and bottom. The separation between groups of known predators reflected different patterns of behavior. The predator-rich grouping at the top of the dendrogram (cluster 1: 20 smolt tags, 21 predator tags) was separated from the two majority-smolt-tag clusters in the middle (clusters 2 and 3). Examination of the PCA ordination plot (not shown) and correlations between principal component weights and behavioral metrics (Table 6) indicated that cluster 1 was characterized by abnormally long residence times at telemetry stations. The fourth cluster included all 66 tags flagged as suspicious in the smolt-only procedure and 3 additional smolt tags. The admixture of 12 known predator tags with smolt tags in this cluster provided additional evidence of tag predation (Fig. 5d, cluster 4). The smallest cluster (cluster 5) contained only 2 tags (striped bass and white catfish), both of which deviated markedly from the norm because of their long detection sequences (> 35 events) and high frequency of upstream transitions. The 89 smolt tags in clusters 1, 4, and 5 were classified as predated (Fig. 5d). More details on the cluster analysis results are provided in Additional file 1.

The ROE method performed better for the smolt-only analysis than for the multispecies analysis. In the smolt-only analysis, the ordination points representing the 66 predated tags left the ellipse across a range of detection events and did not return, and only 4 of the points moved outside the ellipse prior to the event minimum (\({i}_{0}=\) 3; Additional file 1: Fig. S2a). In the multispecies analysis, of the 89 smolt tags classified as predators in the cluster analysis (\({T}_{P}\)), 19 (21%) crossed the ellipse boundary before the event minimum (\({i}_{0}=\) 4) and 24 (27%) crossed the ellipse boundary more than once (Additional file 1: Fig. S2b). The 19 tags that left the ellipse before the event minimum were classified as predated at the fourth detection event (\(i=\) 4) by default. Many of the points associated with known tagged predators also left the ellipse when only a few detection events had been summarized and in similar directions, thus increasing confidence that smolt tags exiting the ellipse early were aberrant in the same way. Tags that returned to the ellipse after initial exit suggested that aberrant behavior was only temporary for some purported predated tags.
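The event-assignment step can be sketched as a 95% ellipse test on the sequence of ordination points for one tag, where each point summarizes that tag's detections up to event \(i\). This sketch assumes the ellipse is a bivariate-normal region defined by the mean and covariance of a reference set of PCA scores, so a point lies outside it when its squared Mahalanobis distance exceeds the 95% chi-square quantile with 2 degrees of freedom (\(-2\ln 0.05 \approx 5.991\)); the exact construction used in the ROE method may differ. Following the text, an exit before the event minimum \({i}_{0}\) is assigned to \({i}_{0}\).

```python
import numpy as np

# 95% quantile of the chi-square distribution with 2 df: -2 ln(0.05) ~ 5.991
CHI2_2DF_95 = -2.0 * np.log(0.05)


def ellipse_params(reference_scores):
    """Center and covariance of a reference cloud of 2-D PCA scores (assumed)."""
    ref = np.asarray(reference_scores, dtype=float)
    return ref.mean(axis=0), np.cov(ref, rowvar=False)


def outside_ellipse(point, center, cov):
    """True if the point's squared Mahalanobis distance exceeds the 95% cutoff."""
    d = np.asarray(point, dtype=float) - center
    return float(d @ np.linalg.solve(cov, d)) > CHI2_2DF_95


def first_predation_event(trajectory, center, cov, i0):
    """1-based detection event of the first ellipse exit, moved up to i0 if
    earlier; None if the trajectory never leaves the ellipse."""
    for i, point in enumerate(trajectory, start=1):
        if outside_ellipse(point, center, cov):
            return max(i, i0)
    return None


center, cov = np.zeros(2), np.eye(2)
path = [[0.0, 0.0], [0.5, 0.5], [3.0, 0.0], [0.0, 0.0]]
print(first_predation_event(path, center, cov, i0=3))  # 3
print(first_predation_event(path, center, cov, i0=4))  # 4
```

Note that the sketch records only the first exit; a tag that later returns inside the ellipse, as 24 multispecies \({T}_{P}\) tags did, keeps its first-exit assignment.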

The two pattern-recognition filters successfully diagnosed 89% (smolt-only) and 100% (multispecies) of the 37 known predator tags as predated during the 2016 salmon study. Although two largemouth bass tags were assigned to a “smolt” cluster in the multispecies pattern-recognition filter (Fig. 5d), they both exited the ellipse in the ROE method and so were finally classified as predated. Both of those tags were classified as smolts in the ROE component of the smolt-only pattern-recognition filter (i.e., remained within the ellipse throughout their detection history), along with two other largemouth bass tags. These four bass tags tended to have short residence times, short detection histories, or partially directed downstream movement similar to behavior expected for a migrating smolt. However, one of these four tags was detected only at A11 in nine detection events over a period of two months, behavior that is unexpected for a migrating smolt; the lack of known predator tags in the smolt-only pattern-recognition filter may have resulted in a reduced ability to classify such behavior as aberrant.

The maximum travel time observed from release to the Turner Cut Junction (A20/T1) was 37.9 d using detection events that successfully passed undiagnosed through either of the pattern-recognition filters. Total survival to A20/T1 was estimated at 0.036 (\(\widehat{\mathrm{SE}}\) = 0.008; Table 3) using both filters. Additionally, cumulative survival for both filters largely followed the patterns from the unfiltered data until station A12 (Brandt Bridge), after which survival for the smolt-only analysis began to diverge from the unfiltered estimates. The largest difference in cumulative survival between the smolt-only pattern-recognition filter and the no-filter option was observed for A17/R1 (Navy Bridge/Rough and Ready Island; difference = 0.079, \(\widehat{\mathrm{SE}}\) = 0.034; Fig. 4). The cumulative survival estimates from the multispecies filter were consistently slightly lower (non-significant difference) than from the smolt-only filter from Brandt Bridge (A12) through the San Joaquin Shipping Channel (A19) (Fig. 4).

Comparison of filters

The filtering approach used affected both the number and composition of tags considered suspicious (Table 2, Fig. 5). More than twice as many tags were classified as suspect using the complex rule-based filter (n = 139) compared to the smolt-only pattern-recognition approach (n = 66). A total of 32 tags were classified as predated by all four filtering methods and 54 tags were classified as predated based on only one of the filters. Overall, the four approaches agreed on classifications for 77% of the 618 tags observed. Although the composition of flagged tags differed across filtering methods, subsets of suspicious tags shared some similar attributes, such as upstream transitions and longer, more variable residence times. For example, the proportion of tags with at least two upstream transitions was only 1% for the full population of tags, compared with 6% for tags flagged by the complex rule-based filter, 12% for the smolt-only pattern-recognition filter, and 9% for the other two filters. The harmonic mean of the near-field residence time of tags flagged as predators by the smolt-only pattern-recognition filter was considerably higher at 0.84 h (\(\widehat{\mathrm{SE}}\) = 0.20 h) than the population mean of 0.06 h (\(\widehat{\mathrm{SE}}\) = 0.01 h).
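A harmonic mean and its standard error can be computed as sketched below. The standard error here uses the delta method on the mean of reciprocals, \(H = 1/\bar{y}\) with \(y = 1/x\), so that \(\mathrm{SE}(H) \approx H^{2}\,\mathrm{SE}(\bar{y})\); this is an assumption for illustration, and the study may have used a different estimator (e.g., a bootstrap). The residence times are invented.

```python
import math
from statistics import mean, stdev


def harmonic_mean_with_se(values):
    """Harmonic mean and a delta-method standard error.

    H = 1 / mean(1/x);  SE(H) ~= H**2 * SE(mean(1/x)).
    """
    recips = [1.0 / v for v in values]
    m = mean(recips)
    h = 1.0 / m
    se_m = stdev(recips) / math.sqrt(len(recips))
    return h, h * h * se_m


# Illustrative near-field residence times (h); not study data.
h, se = harmonic_mean_with_se([0.05, 0.4, 1.2, 2.5, 0.9])
print(f"harmonic mean = {h:.2f} h (SE = {se:.2f} h)")
```

The harmonic mean is pulled toward the smallest values, which is why a few very short residence times keep the population value low even when some tags linger for hours.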

There was generally higher agreement in predated tag classifications between related methods. For the rule-based filters, 94% (87 of 93) of the tags flagged by the simple version were also flagged by the complex version (Fig. 5a and b). The complex rule-based filter flagged an additional 52 tags as suspicious, primarily from Mossdale through the head of Old River (A6–A8/B1; Table 2, Fig. 6). Likewise, all 66 tags flagged by the smolt-only pattern-recognition filter were also identified by the multispecies pattern-recognition filter (Fig. 5c and d). The additional data provided by the known predators in the multispecies version resulted in flagging another 23 salmon tags. Overall, the two rule-based filters agreed on the outcome of 560 tags (91% of those observed), and the two pattern-recognition filters agreed on the outcome of 595 tags (96%) (Fig. 5).
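The overlap and agreement figures above follow from simple set arithmetic, sketched below under the assumption that both filters are applied to the same 618 observed tags and that each tag is either flagged or not. The tag IDs are synthetic integers arranged only to reproduce the reported counts (93 and 139 flagged, 87 shared; 66 of 89 shared); they are not real tag codes.

```python
def overlap_fraction(flagged_a, flagged_b):
    """Share of tags flagged by filter A that filter B also flagged."""
    return len(flagged_a & flagged_b) / len(flagged_a)


def n_agree(flagged_a, flagged_b, observed):
    """Number of observed tags given the same classification by both filters."""
    return len(observed) - len((flagged_a ^ flagged_b) & observed)


# Synthetic tag IDs arranged to match the reported counts (not real tag codes).
observed = set(range(618))
simple_rb = set(range(93))                             # 93 flagged
complex_rb = set(range(87)) | set(range(93, 93 + 52))  # 139 flagged, 87 shared
smolt_pr = set(range(66))                              # 66 flagged
multi_pr = set(range(89))                              # 89 flagged, all 66 shared

print(round(overlap_fraction(simple_rb, complex_rb), 2))  # 0.94
print(n_agree(simple_rb, complex_rb, observed))           # 560
print(n_agree(smolt_pr, multi_pr, observed))              # 595
```

Disagreements are exactly the tags in the symmetric difference of the two flagged sets (6 + 52 = 58 for the rule-based pair, 23 for the pattern-recognition pair), so agreement is 618 minus that count.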

The detection event where a tag’s first predator classification was assigned varied spatially among filters (Fig. 6). The predator classifications from the two rule-based filters were concentrated more around the head of Old River (A6–A8), whereas those from the pattern-recognition filters tended to concentrate in the San Joaquin River downstream of Old River (A9–A11). All filters flagged a high proportion of suspicious detections in the region between Banta Carbona and Frewert Road (A5–A11).

When translated to survival, the differences observed among the four filtering approaches and the degenerate “no filter” approach varied by spatial scale and region (Fig. 4). On the reach scale defined by adjacent telemetry stations (Fig. 1), the maximum absolute difference in survival estimates among the filtering approaches ranged from < 0.001 for the reach from Banta Carbona to Mossdale (A5–A6) to 0.112 for the reach immediately preceding Garwood Bridge (A15–A16). In the latter reach, survival estimates ranged from 0.875 (\(\widehat{\mathrm{SE}}\) = 0.039) for the smolt-only pattern-recognition filter to 0.987 (\(\widehat{\mathrm{SE}}\) = 0.013) for the complex rule-based filter. The reach from Navy Drive Bridge to the Calaveras River station (A17–A18) had the highest relative variation in survival estimates of all reaches (coefficient of variation [CV] = 5.5%); survival in this reach was estimated at 0.710–0.739 (\(\widehat{\mathrm{SE}}\) ≤ 0.061) for the two pattern-recognition filters, compared to 0.654–0.658 (\(\widehat{\mathrm{SE}}\) ≤ 0.055) for the two rule-based filters and the no-filter approach (Additional file 1: Table S5).

On the regional scale, the reaches from Durham Ferry to Mossdale (A0–A6) were least sensitive to filter choice, with survival estimates of 0.642–0.644 (\(\widehat{\mathrm{SE}}\) = 0.019) for all filtering methods (CV = 0.1%; Table 3). The region with the highest sensitivity to filter choice was the head of Old River to the Weston Ranch station near Stockton (A8–A14), where survival estimates ranged from 0.234 (\(\widehat{\mathrm{SE}}\) = 0.027) for the multispecies pattern-recognition filter to 0.318 (\(\widehat{\mathrm{SE}}\) = 0.030) for the complex rule-based filter, and the CV of the survival estimates was 13.7%. The pattern-recognition filters tended to produce lower survival estimates until the upstream Stockton reaches (A14), after which the rule-based filters had lower estimates (Table 3, Fig. 4). The spatial variation in filter rankings acted to stabilize the cumulative survival estimates on larger spatial scales, so total survival estimates from release to the Turner Cut Junction differed little in absolute terms, ranging from 0.036 (\(\widehat{\mathrm{SE}}\) = 0.008) for the pattern-recognition filters to 0.044 (\(\widehat{\mathrm{SE}}\) = 0.009) for the complex rule-based filter and the “no-filter” approach. Although the variation was small on the absolute scale, the low magnitude of the survival estimates overall resulted in a relatively high CV of 10.0% among survival estimates on this spatial scale. Closer examination of the 2016 data demonstrated that the majority (66%) of tags had detection histories that simply ended before reaching the Turner Cut Junction and did not receive a predated classification from any filter.
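The coefficient of variation among filter-specific estimates can be checked with a few lines of Python, assuming the sample (n − 1) standard deviation. Using the rounded total-survival estimates reported in the Results for the five approaches, this reproduces the 10.0% quoted above; with the population (n) standard deviation the same numbers would give about 8.9%, which suggests the n − 1 form, although that is an inference from rounded values.

```python
from statistics import mean, stdev


def coefficient_of_variation(estimates):
    """Sample coefficient of variation: (n - 1) standard deviation / mean."""
    return stdev(estimates) / mean(estimates)


# Rounded total-survival estimates (release to Turner Cut Junction) from the Results.
total_survival = {
    "no filter": 0.044,
    "simple rule-based": 0.040,
    "complex rule-based": 0.044,
    "smolt-only pattern-recognition": 0.036,
    "multispecies pattern-recognition": 0.036,
}
cv = coefficient_of_variation(list(total_survival.values()))
print(f"CV = {cv:.1%}")  # CV = 10.0%
```

The example also illustrates the point made in the text: an absolute spread of only 0.008 translates into a sizable CV because the mean survival is itself small.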

The filtering approach had little effect on the distribution of observed travel time through the system for fish that completed travel within approximately one week of release, as demonstrated by the first 75% of the cumulative arrival distributions to the Turner Cut Junction (Fig. 3). The primary effect of the predator filters on perceived travel time was observed for the last 10% of the arriving fish. The rule-based filters explicitly considered either long travel times or long residence times and thus removed the last tag detection(s) at the Turner Cut Junction, reducing the maximum observed travel time from approximately 38 d to 18 d with the simple filter and to 10.5 d with the complex filter. The pattern-recognition filters, on the other hand, were more sensitive to measures of upstream-directed movements than to long travel times (Fig. 7); they removed more Turner Cut Junction detections between approximately days 8 and 12 but did not remove the latest detections at that site.

Discussion

Predation of study fish is a common occurrence and is likely the proximate cause of mortality in many studies of juvenile fish. Although some studies assume negligible mortality, whether from predation or other sources (e.g., behavior studies), survival studies are designed to characterize the magnitude of mortality and the factors that contribute to it. The large majority of tagging studies require that the measures in the data set represent only live study fish rather than a mix of live study fish, predators, and deposited tags. While a variety of methods have been used previously to diagnose tag predation, this is the first study to compare results of different methods and assess the sensitivity of study results to the diagnosis process used. For this data set, we observed considerable variability in predator status classification depending on the predator filter but minimal effect on survival estimates (e.g., Fig. 4) and measures of average travel time. The largest effects were seen in estimates of the upper quantiles of the travel time distribution (e.g., Fig. 3).

Some spatial variability between filter types was anticipated because the rule-based filters were constructed to assign some predator classifications at the start of detection events (i.e., predation event occurred in the reach between adjacent stations) and some at the end of detection events (i.e., predation event occurred in the vicinity of the station) depending on the filter criteria that were violated, whereas the pattern-recognition filters were constructed to assign all predator classifications in the reaches between stations (Fig. 6). Spatial variability in outcomes may also have arisen because of variable weighting placed on certain behaviors or different behavior patterns observed in different regions. Although none of the filters inherently weights any metrics over others, the different methods of diagnosing predation (e.g., predator score > 2 for the complex rule-based filter versus clustering approach for the pattern-recognition filters) may give more leverage to some metrics or criteria than to others. Additionally, some behavior that is flagged by the rule-based filters may not be flagged by the pattern-recognition filters if that behavior is common in the data set. The cluster analysis and the ROE approach of the pattern-recognition filters rely on the implicit assumption that a majority of tags are carried by live smolts, which is not assumed by the rule-based filters. Similarly, values of metrics that are extreme in one region may be common in other regions even for the focal species (e.g., longer salmon residence times may be common in the more tidally influenced regions but rare upstream), and thus some events classified as predated by the rule-based filters may evade flagging in the pattern-recognition filters and vice versa.

Despite the variability in predated tag classifications among the filters, the resulting absolute differences in survival estimates in our case study tended to be small and within the sampling error. When translated to the relative scale, however, the differences between the various filtered and unfiltered estimates of cumulative survival were larger, ranging up to 19% for survival to the Turner Cut Junction (A20/T1) and 31% for survival to the Navy Bridge/Rough and Ready Island region (A17/R1) (pattern-recognition filters, Fig. 4). Survival to A17/R1 also had the largest variability among filtered estimates (CV = 16.7%). These results are comparable to the large differences in relative survival observed in a 2010 study of subyearling Chinook salmon in the region, in which the estimate of total Delta survival from Mossdale to Delta exit at Chipps Island (55 km downstream of A20/T1) was reduced by 50% when a similar complex rule-based filter was applied to the data set (0.11 vs 0.05 [11]).

The four predator filters considered here entailed different tradeoffs in their level of complexity, subjectivity, and interpretability and in the effort required by the researcher. Higher levels of complexity resulted from using more data (e.g., multispecies versus smolt-only tag data), increased degrees of freedom represented by more metrics (e.g., complex versus simple rule-based filters), or more decisions that also represented a higher level of effort and subjectivity (e.g., complex rule-based filter compared to the others). On the other hand, the rule-based filters resulted in clearly defined state assignments whereas the cluster analysis in the pattern-recognition filters required further interpretation to yield state assignments (e.g., the ROE method).

The issue of subjectivity looms large in adopting any predator filter approach and making inferences from its results. In some cases, particular tag movements may be well beyond the swimming capabilities of the focal species and the choice to omit some or all of a tag record is obvious. However, there are likely to be many more cases where subjective judgements can significantly influence findings. The two primary filtering approaches considered here varied in their demands on subject matter knowledge by the researcher and their degree of subjectivity. The metrics and criteria in the rule-based filters were based entirely upon expert knowledge gleaned from literature review, consultation with biologists, and personal experience with similar data sets. Pattern-recognition methods that use multivariate statistics or machine learning, on the other hand, prioritize letting signals emerge from the data rather than requiring explicit biological judgements from the researcher. A distinct benefit of the pattern-recognition approach lies in its ability to isolate sets of abnormal behavior metrics without requiring the researcher to identify precise numeric criteria associated with smolt-like behavior. Thus, it may appear that pattern-recognition filters or similar statistics-based filters remove all subjectivity from the exercise. However, subjectivity remains in selecting the metrics to include, how to scale or transform the metrics, how many clusters to use, and how to interpret the clusters. In our analysis, we were confronted with several options of distance measures, clustering algorithms, statistics for choosing an optimal number of clusters, and ordination approaches. Thus, although pattern-recognition approaches remove the researcher’s responsibility for setting numeric thresholds, there remains a multitude of decisions that may affect the outcome.

There are limitations to any predator filter’s ability to correctly identify all predator detections. Even a well-designed filter based on realistic and defensible understanding of the behavior of the focal species and predators may miss some predator detections and/or misclassify some valid detections. A predated tag that has a short detection history may have too few opportunities to demonstrate aberrant behavior to trigger a predator classification. Highly variable behavior in the focal species will also make it difficult to accurately distinguish between predated and non-predated tags. Other factors contributing to variability in filter outcome include the composition, mobility, and home range of the predator community, the density of the telemetry network and the spatiotemporal scale of inference, and the amount of data available.

The availability of predator data in particular may affect the outcome of the multispecies pattern-recognition filter and filter testing. This case study used opportunistic detections of four species of piscivorous fish that had been tagged with long-lived acoustic tags one or two years prior to the study. All four predator filters performed well on these predator data, indicating that the filters were sensitive to predator behavior patterns. However, the long delay between predator tagging and the 2016 study resulted in a small and possibly non-representative predator data set: only 37 individual predators were represented, 24 (65%) of which were largemouth bass and most of which were detected in the segment between the A6 and A12 arrays. Predators that exhibit spring migrations may be more likely than resident fish to pass through these predator filters without being flagged. A larger and more representative sample of the full predator population present during the 2016 study may have demonstrated unidentified weaknesses of the filters or resulted in different predator classifications from the multispecies filter; it may also have rectified some of the spatial differences in survival estimates between the multispecies filter and the rule-based filters.

Use of a formal predator filter may be expected to be more important in some settings than in others. Our case study suggests that it is more important when studying survival over smaller spatial scales, for intermediate levels of survival, or to characterize maximum travel time. If the predators’ home range lies entirely within the extent of a single reach for which survival is estimated, then tag predation will have no effect on survival estimation because predators are not expected to pass a monitoring station. It is when a predator’s range includes one or more monitoring sites that tag predation may bias survival estimation and a predator filter is recommended. Additionally, formal methods may be more necessary in settings in which individuals of the focal species may legitimately move in the direction opposite their eventual target, such as a tidal region with reverse flows; in simpler settings, transitions in an unexpected direction may be easily interpreted as evidence of predation, whereas in more complex settings it is necessary to compare observations with prevailing environmental conditions. For example, formal predator filters are often omitted from studies of juvenile salmon migratory survival through the 40- to 100-km riverine reaches in the Columbia River; this region has no reverse flows, and the predator community is dominated by avian species and piscine species such as northern pikeminnow Ptychocheilus oregonensis, which exhibit mostly small-scale movements near dams or movements upriver during the spring juvenile salmonid emigration [53]. In this setting, upstream-directed movement may be easily interpreted as evidence of predation. Formal predator filters are more common in studies of salmon migratory survival through the 1- to 10-km tidally influenced reaches in the Sacramento–San Joaquin River delta, where reverse flows occur and striped bass may move about the entire south Delta and between the Delta and the Pacific Ocean [54].

Guidance on applying predator filters

The choice of predator filter depends on the study objectives, the type and source of available data, and existing knowledge of the system. Simple rule-based filters are the only reasonable choice when few metrics are available. Even in studies without ancillary behavioral or environmental data, researchers can at least rely on estimates of movement rate and direction derived from the timing of site detections. The upper limits of swimming ability are usually well documented and can form the basis for sensible, albeit conservative, behavioral rules for study subjects. When there is little knowledge of the study species and its environment but more metrics are available (more than four), pattern-recognition filters are recommended. These data-driven approaches can decompose and describe patterns of covariance in the absence of strong biological justification for more complex rule-based methods. Furthermore, cluster analysis and PCA can easily accommodate a large number of metrics, even if some of them are correlated. However, multivariate techniques should still be applied judiciously, because results may be sensitive to which metrics are included and how they are transformed. Additionally, care should be taken in interpreting multivariate outcomes if high rates of tag predation are anticipated, because the majority patterns detected may then reflect predators rather than the focal species.
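
The pattern-recognition workflow can be sketched in Python with NumPy: standardize per-tag metrics, project them onto principal components, cluster the scores, and flag the smaller cluster as candidate predator tags. The three metrics, their values, the choice of k = 2, and the minority-cluster rule are hypothetical simplifications, not the assignment procedure used in our case study. The minority rule embodies exactly the assumption cautioned against above: if most tags were predated, the flagged cluster would be the focal fish.

```python
import numpy as np


def pca_scores(metrics: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Standardize each metric column, then project tags onto the leading principal components."""
    z = (metrics - metrics.mean(axis=0)) / metrics.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T


def kmeans(points: np.ndarray, k: int = 2, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-first seeding; returns one label per row."""
    centers = [points[0]]
    for _ in range(1, k):
        # Seed the next center at the point farthest from all existing centers.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep an empty cluster's old center rather than averaging zero points.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical per-tag metrics: [mean speed (m/s), upstream moves, max residence (h)].
metrics = np.array([
    [1.00, 0, 2.0], [1.10, 0, 3.0], [0.90, 1, 2.0], [1.20, 0, 2.5],
    [1.00, 0, 3.5], [0.95, 0, 2.2], [0.20, 4, 40.0], [0.30, 3, 35.0], [0.25, 5, 50.0],
])
labels = kmeans(pca_scores(metrics), k=2)
minority = np.bincount(labels, minlength=2).argmin()
flagged = labels == minority  # candidate predator tags under the minority assumption
print(flagged)
```

Standardizing before PCA keeps metrics with large numeric ranges, such as residence time in hours, from dominating the components; how metrics are transformed remains a choice to which results may be sensitive.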

If the study system is well understood, then a more complex rule-based filter may be justified, and the choice between a rule-based and a pattern-recognition filter depends on the needs of the study. A complex rule-based filter is suitable only when the system is well understood and many metrics are available. At present, such filters are better suited than the other approaches for adding a spatial component to the filtering process, and they may also be better at identifying predation in data sets in which most detections come from predated tags. Rule-based filters are also better suited than pattern-recognition filters for reporting when in a detection history a predation event occurred and which specific threshold was exceeded. The ROE method for assigning tag predation events in the pattern-recognition filter has a compelling statistical justification but is not as easily interpreted. A “predator score” from a complex rule-based filter may even serve as a proxy for the probability that a tag has been predated. However, researchers should weigh carefully whether the additional labor of constructing and calibrating this type of filter is worthwhile.
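
One way a rule-based filter can report both which threshold was exceeded and a summary score is sketched below in Python. The three metrics, their thresholds, and the equal-weight "fraction of rules exceeded" score are hypothetical; real thresholds must come from knowledge of the focal species, predators, and monitoring network, and such a score is not a calibrated probability unless validated against tags of known fate.

```python
from dataclasses import dataclass


@dataclass
class TagMetrics:
    max_speed_bl_s: float   # fastest inter-site movement, in focal-fish body lengths per second
    upstream_moves: int     # transitions against the expected migration direction
    max_residence_h: float  # longest continuous residence at one site, in hours


# Hypothetical thresholds for illustration only.
RULES = {
    "speed": lambda m: m.max_speed_bl_s > 6.0,
    "direction": lambda m: m.upstream_moves >= 2,
    "residence": lambda m: m.max_residence_h > 48.0,
}


def predator_score(m: TagMetrics) -> tuple[float, list[str]]:
    """Return the fraction of rules exceeded and the names of the rules that fired."""
    fired = [name for name, rule in RULES.items() if rule(m)]
    return len(fired) / len(RULES), fired


score, fired = predator_score(TagMetrics(max_speed_bl_s=8.0, upstream_moves=3, max_residence_h=10.0))
print(score, fired)
```

Keeping each rule named makes the reason for every classification auditable, which is the interpretability advantage over pattern-recognition filters noted above.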

Even in settings in which less complicated filtering methods may be suitable, we recommend that researchers clearly document their assumptions and filtering procedures. Unstated assumptions about what constitutes realistic behavior can influence results, and their effects may cascade through subsequent analyses. Furthermore, ad hoc or undocumented procedures risk investigator drift during implementation, in which a rule for omitting detections becomes more or less strict as it is applied across the complete set of tags and detections. Thus, we recommend identifying and documenting a well-defined procedure to diagnose predation among detection data. The methods should be tailored to the study’s focal species, likely predator species, setting, season, and monitoring network. We also recommend that study results be examined for a range of filtering assumptions, at minimum comparing outcomes using the researcher’s best understanding of the true state of the detection data (presumably the filtered data) with outcomes from data that are either entirely or partially unfiltered.
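
The recommended comparison between filtered and unfiltered data can be sketched in Python. This hypothetical example computes a naive reach-by-reach fraction surviving under each filtering assumption; it assumes perfect detection at every site, which formal survival models such as Cormack–Jolly–Seber models do not, so it illustrates only the bookkeeping of a sensitivity check. The site names, tag IDs, and the cut_points convention are illustrative assumptions.

```python
from typing import Optional


def naive_reach_survival(
    histories: dict[str, list[str]],
    sites: list[str],
    cut_points: Optional[dict[str, int]] = None,
) -> list[float]:
    """Fraction of tags detected at each site that are also detected at the next site.

    cut_points maps a tag ID to the index of its first detection attributed to a
    predator; that detection and all later ones are dropped. Assumes perfect detection.
    """
    cut_points = cut_points or {}
    kept = {
        tag: set(h[:cut_points[tag]] if tag in cut_points else h)
        for tag, h in histories.items()
    }
    estimates = []
    for up, down in zip(sites, sites[1:]):
        at_up = [tag for tag, seen in kept.items() if up in seen]
        estimates.append(
            sum(down in kept[tag] for tag in at_up) / len(at_up) if at_up else float("nan")
        )
    return estimates


histories = {
    "t1": ["A", "B", "C"],
    "t2": ["A", "B"],
    "t3": ["A", "B", "C"],  # suppose a filter attributes t3's detection at C to a predator
    "t4": ["A"],
}
sites = ["A", "B", "C"]
unfiltered = naive_reach_survival(histories, sites)
filtered = naive_reach_survival(histories, sites, cut_points={"t3": 2})
print(unfiltered, filtered)
```

Here a single reclassified tag halves the apparent survival through the B-to-C reach (from 2/3 to 1/3), illustrating why reporting both outcomes is informative when sample sizes are small.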

More sophisticated modeling techniques and new tag technologies could provide avenues for addressing the tag predation problem more robustly. Modeling techniques designed to account for unobservable or partially observable states, such as hidden Markov and state–space models [55,56,57], could be used to propagate uncertainty in predation status through to the final study results. Additionally, specialized microacoustic transmitters now exist that change their output after they have been predated (predator tags [58,59]). For example, manufacturers have designed an acoustic tag that transmits an alternative signal after a polymer coating has broken down in a predator’s gut [60]. Such predator tags remove the dependence of predation detection on behavioral differences between the focal species and its predators. Even with these specialized tags, however, it remains challenging to identify correctly where and when the predation event occurred, in part because of variation in digestion rate and the time delay between the predation event and the onset of the predation signal (trigger time) [14].
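
To show how a hidden Markov model expresses predation status as a probability rather than a hard classification, the following Python sketch runs a normalized forward (filtering) pass over a two-state model: state 0 means the tag is in a live focal fish, and state 1 means it is in a predator, treated as absorbing. The movement classes, the per-step predation probability, and the emission probabilities are all hypothetical; a full analysis would estimate them, and would typically use forward-backward smoothing so that later observations also inform earlier states.

```python
import numpy as np


def forward_filter(obs: list[int], init: np.ndarray, trans: np.ndarray, emit: np.ndarray) -> np.ndarray:
    """Normalized forward pass: row t holds P(state_t | obs_0..obs_t)."""
    alpha = init * emit[:, obs[0]]
    alpha = alpha / alpha.sum()
    out = [alpha]
    for o in obs[1:]:
        # Predict with the transition matrix (rows = from-state), then weight by the emission.
        alpha = (alpha @ trans) * emit[:, o]
        alpha = alpha / alpha.sum()
        out.append(alpha)
    return np.array(out)


# Hypothetical movement classes per step: 0 = downstream, 1 = upstream, 2 = stationary.
init = np.array([1.0, 0.0])            # every tag starts in a live focal fish
trans = np.array([[0.9, 0.1],          # 10% chance per step of being eaten
                  [0.0, 1.0]])         # predated is absorbing
emit = np.array([[0.80, 0.05, 0.15],   # focal fish: mostly downstream
                 [0.30, 0.40, 0.30]])  # predator: no directional preference
posterior = forward_filter([0, 0, 1, 1, 2], init, trans, emit)
print(posterior[:, 1].round(3))        # probability the tag is in a predator at each step
```

Carrying these probabilities into a survival analysis, instead of thresholding them, is one way to propagate predation uncertainty to the final results.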

Conclusions

Technological advances in telemetry hold great potential for more complete and timely information on individual fish of a widening range of species, life stages, and sizes. With the ability to tag smaller fish and the use of telemetry to study survival over smaller spatiotemporal scales comes the increased risk that tag predation may bias study results. The magnitude of the bias will be apparent only when researchers intentionally investigate the outcomes of one or more predator filters. Rule-based filters have great flexibility but require a high degree of researcher judgement; pattern-recognition filters are more automated but depend heavily on the individual tags and metrics included. The tag predation problem and the data-filtering approaches used to address it warrant serious attention by investigators and further research. We encourage researchers to articulate their assumptions and filtering rules and to report the robustness of their results to the filtering procedure invoked. Such practices should be part of accepted methodology to generate repeatable, defensible science in fisheries research.