
1 Introduction

Over the past decade, deep neural networks (DNNs) have successfully solved challenging artificial intelligence problems [47, 70]. Abstractly, these models can be thought of as providing interfaces to real-world data—e.g., they can provide object classes [30, 47], detections [59, 60], and trajectories [10, 11, 83]. Then, these predictions are processed by programs, e.g., to identify driving patterns [5], events in TV broadcasts [28], or animal behaviors [67].

However, writing such programs can be challenging since they must still account for the fuzziness of real data. To do so, these programs typically include real-valued parameters that need to be manually tuned by the user. For example, consider a query over car trajectories designed to identify instances where one car turns in front of another. This query must capture the shape of the trajectory of both the turning car and the car crossing the intersection. In addition, the user must select the appropriate maximum duration from the first car changing lanes to the second car crossing the intersection. Even an expert would require significant experimentation to determine good parameter values; in our experience, it can take up to an hour to tune the parameters for a single query.

We focus on programs that query databases of trajectories output by an object tracker [5, 7, 8, 28, 40, 41, 42, 54]. Given a video, the tracker predicts the positions of objects in each frame (e.g., cars, people, or mice), as well as associations between detections of the same object across successive frames. Applications often require subsequent analysis of these trajectories. For example, in autonomous driving, when a risky scenario is encountered, engineers typically search for additional examples of that driving pattern to improve their planner [63, 64, 66]—e.g., cars driving too close [82] or stopping in the middle of the road [6]. Object tracking has also been used to track robots [58, 81], animals for behavioral analysis [12, 67, 75], and basketball players for sports analytics [67, 85].

We propose an algorithm for synthesizing queries over object trajectories given just a handful of input-output examples. A query takes as input a representation of a trajectory as a sequence of states (e.g., position, velocity, and acceleration) in successive frames of the video, and outputs whether the trajectory matches the query. Our query language is based on regular expressions—in particular, a query is a composition of a user-extensible set of predicates using the sequencing, conjunction, and iteration operators. For instance, trajectories might correspond to cars in a video; Fig. 1 shows a query for identifying cars turning at an intersection. As we discuss in Sect. 6, the full query language semantics is rich enough to subsume (variants of) Kleene algebras with tests (KAT) [46] and signal temporal logic (STL) [50]; however, such generality is seldom needed, so we use a pared-down query language that works well in practice.

Our algorithm performs enumerative search over the space of possible queries to identify ones that are consistent with the given examples. A key challenge in our setting is that our predicates have real-valued parameters that must also be synthesized. Thus, our strategy enumerates sketches, which are partial programs that only contain holes corresponding to real-valued parameters. For each sketch, we search over the space of real-valued parameters, while using an efficient pruning strategy to reduce the search space. At a high level, we use a quantitative semantics to directly compute “boundary parameters” at which a given example switches from being labeled positive to negative. Then, depending on the target label, we can prune the entire region of the search space on one side of these boundary parameters. We prove that this synthesis strategy comes with soundness and (partial) completeness guarantees.

We implement our approach in a system called Quivr. Our implementation focuses on videos from fixed-position cameras. While our language and synthesis algorithm are general, the predicates we design are tailored to specific settings. We evaluate Quivr on identifying driving patterns in traffic videos, including ones inspired by recent work on autonomous driving [63, 64, 66], on behavior detection in a dataset of mouse trajectories [72], and on a synthetic task from the temporal logic synthesis literature [44]. We demonstrate how both our parameter pruning strategies and our query evaluation optimizations lead to substantial reductions in the running time of our synthesizer.

In summary, our contributions are:

  • A language for querying object trajectories (Sect. 3) and an algorithm for synthesizing such queries from examples (Sect. 4).

  • An efficient parameter pruning approach based on a novel quantitative semantics (Sect. 4), yielding a \(5.0{\times }\) speedup over the state-of-the-art quantitative pruning technique from the temporal logic synthesis literature.

  • An implementation of our approach in Quivr, and an evaluation of Quivr on identifying driving behaviors in traffic camera video and mouse behaviors in a dataset of mouse trajectories (Sect. 5), demonstrating substantially better accuracy than neural network baselines.

Fig. 1.

(a) A video frame from a traffic camera, along with object trajectories (red) and manually annotated lanes (black). (b) The trajectories selected by the query (bottom), which selects cars turning at the intersection. (Color figure online)

2 Overview

We consider a hypothetical scenario where an engineer is designing a control algorithm for an autonomous car and would like to identify certain driving patterns in video data. We show how they can use our framework to synthesize a query to identify car trajectories that exhibit a given behavior.

Video Data. Traffic cameras are a rich source of driving behaviors [5, 13, 61]; one dataset used in our evaluation is YTStreams [7], which includes video from several such cameras. Figure 1(a) shows a single frame from such a video; we have used an object tracker [83] to identify all car trajectories (in red).

Predicates. Quivr assumes it is given a set of predicates that match portions of trajectories exhibiting behaviors of interest; during synthesis, it considers queries composed of these predicates. In Fig. 1(a), the engineer has manually annotated the lanes of interest in this video (black), to specify four InLaneK predicates that select trajectories of cars driving in each lane K visible in the video. Predicates may be configured by real-valued parameters. For example,

$$\begin{aligned} \langle \textsf {InLane1} \rangle \wedge \langle \textsf {DispLt}_{\theta } \rangle \end{aligned}$$

searches for trajectories where the car stays in lane 1 for a period of time and the car has a displacement at most \(\theta \) between the beginning and end of that period. Note that atomic predicates, like \(\langle \textsf {DispLt}_{\theta } \rangle \), can match multiple time-steps, whereas in formalisms like regular expressions and temporal logic, atomic predicates are over single time-steps. A key feature of our framework is that the set of available predicates is highly extensible, and the user can provide their own. See Sect. 5.1 for the predicates we use in our evaluation.

Synthesis. To specify a driving pattern, the engineer provides a small number of initial positive and negative examples of trajectories; then, Quivr synthesizes a query that correctly labels these examples. In Fig. 1(b), we show the result of executing the query shown, which is synthesized to identify left turns in the data. Often, there are multiple queries consistent with the initial examples. While it may be hard for users to sift through the video for positive examples, it is usually easy for them to label a given trajectory. Thus, to disambiguate, Quivr asks the user to label additional trajectories [19, 36, 62].

Fig. 2.

A single match (top) for the multi-object query (bottom) which captures one car, A, turning into a lane behind another car, B, that is in that lane. The trajectories change color from red to green as a function of time. As can be seen, the car making the right turn does so just after the car going straight passes through the intersection. (Color figure online)

Multi-object Queries. So far, we have focused on queries that identify trajectories by processing each trajectory in isolation. A key feature of our framework is that users can express queries over multiple trajectories—for example,

$$\begin{aligned} \big (\langle \textsf {InLane1}(B) \rangle \wedge \langle \textsf {ChangeLane2To1}(A) \rangle \big ) \,;\,\langle \textsf {InFront}(A,B) \rangle . \end{aligned}$$

This query says that car B is in lane 1 while car A changes from lane 2 to lane 1, and car A ends up in front of car B. Note that the predicates now include variables indicating which object they refer to, and the predicate \(\textsf {InFront}(A,B)\) refers to multiple objects. An example of a pair of trajectories selected by a multi-object query is shown in Fig. 2.

3 Query Language

We describe our query language for matching object trajectories in videos. Our system first preprocesses the video using an object tracker to obtain trajectories, which are sequences \(z=(x_0,x_1,...,x_{n-1})\) of states \(x_i\in \mathcal {X}\). Then, a query Q in our language maps each trajectory z to a value in \(\mathbb {B}=\{0,1\}\) indicating whether Q matches z. Our language is similar to both STL and KAT. One key difference is that predicates are over arbitrary subsequences of z rather than single states x. In the main paper, we consider a simpler language, but in Appendix A we show how it can be extended to subsume both STL and KAT.

Trajectories. We begin by describing the input to a query in our language, which is the representation of one or more concurrent object trajectories in a video.

Consider a space \(\mathcal {S}\) corresponding to a single object detection in a single video frame—e.g., \(s\in \mathcal {S}\subseteq \mathbb {R}^6\) might encode the 2D position, velocity, and acceleration of s in image coordinates. When considering m concurrent objects, let the space of states \(\mathcal {X} = \mathcal {S}^m\), and then a trajectory \(z\in \mathcal {Z} = \mathcal {X}^*\) is a sequence \(z=(x_0,x_1,...,x_{n-1})\) of states of length \(|z|=n\); we write \(z_i=x_i\) for the i-th state of z. We use the notation \(z_{i:j}=(z_i,z_{i+1},...,z_{j-1})\) to denote a subtrajectory of z.

Predicates. We assume a set of predicates \(\varPhi \) is given, where each predicate \(\varphi \in \varPhi \) matches trajectories \(z\in \mathcal {Z}\); we use \(\textsf {sat}_{\varphi }(z)\in \mathbb {B}=\{0,1\}\) to indicate that \(\varphi \) matches z. As discussed below, queries in our language compose these predicates to match more complex patterns.

Next, predicates in our language may have real-valued parameters that must be specified. We denote such a predicate \(\varphi \) with parameter \(\theta \in \mathbb {R}\) by \(\varphi _\theta \). To enable our synthesis algorithm to efficiently synthesize these real-valued parameters, we leverage the monotonicity in all such predicates we have used in our queries. In particular, we assume that the semantics of these predicates have the form

$$\begin{aligned} \llbracket \varphi _{\theta }\rrbracket (z){:}{=}\mathbbm {1}(\iota _{\varphi }(z)\ge \theta ), \end{aligned}$$

where \(\iota _\varphi :\mathcal {Z}\rightarrow \mathbb {R}\) is a scoring function. We also assume that the range of \(\iota _\varphi \) is bounded (which can be achieved with a sigmoid function, if necessary). For example, for the predicate \(\textsf {DispLt}_{\theta }\), we have \(\iota _{\textsf {DispLt}}(z)=-\Vert z_0 - z_{n-1} \Vert \). Thus, \(\iota _{\textsf {DispLt}}(z)\ge \theta \) says the total displacement is at most \(-\theta \). We describe the predicates we include in Sect. 5.1; they can easily be extended.
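To make this concrete, the thresholded form of a parametric predicate can be sketched in a few lines of Python (our own illustration, with states simplified to 2D positions; `score_disp_lt` plays the role of \(\iota _{\textsf {DispLt}}\)):

```python
import math

def score_disp_lt(z):
    """Scoring function iota for a DispLt-style predicate: the negated
    displacement between the first and last states of the subtrajectory."""
    (x0, y0), (xn, yn) = z[0], z[-1]
    return -math.hypot(xn - x0, yn - y0)

def sat(score, theta, z):
    """Satisfaction semantics of a parametric predicate: 1(iota(z) >= theta)."""
    return 1 if score(z) >= theta else 0

# This trajectory has total displacement 5, so DispLt_theta matches it
# iff theta <= -5, i.e. iff the displacement is at most -theta.
z = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
```

Since `sat` compares the score against \(\theta \), the predicate becomes monotonically harder to satisfy as \(\theta \) grows, which is exactly the monotonicity our synthesis algorithm exploits (Sect. 4).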

Syntax. The syntax of our language is

$$\begin{aligned} Q&\,{::=}\, \varphi \mid Q \,;\,Q \mid Q^k \mid Q \wedge Q, \end{aligned}$$

where \(Q^k=Q;Q;...;Q\) (k times). That is, the base case is a single predicate \(\varphi \), and queries can be composed using sequencing (\(Q\,;\,Q\)) and conjunction (\(Q\wedge Q\)). Operators for disjunction, negation, Kleene star, and STL’s “until” are discussed in Appendix A.2. We describe constraints imposed on our language during synthesis in Sect. 4.7.

Semantics. The satisfaction semantics of queries have type \(\llbracket \cdot \rrbracket : \mathcal {Q} \rightarrow \mathcal {Z} \rightarrow \mathbb {B}\), where \(\mathcal {Q}\) is the set of all queries in our language, \(\mathcal {Z}\) is the set of trajectories, and \(\mathbb {B}=\{0,1\}\). In particular, \(\llbracket Q \rrbracket (z)\in \mathbb {B}\) indicates whether the query Q matches trajectory z. The semantics are defined in Fig. 3. The base case of a single predicate \(\varphi \) checks whether \(\varphi \) matches z; conjunction \(Q_1 \wedge Q_2\) checks if both conjuncts match; and sequencing \(Q_1\,;\,Q_2\) checks if z can be split into \(z=z_{0:k}z_{k:n}\) in a way that \(Q_1\) matches \(z_{0:k}\) and \(Q_2\) matches \(z_{k:n}\). The semantics can be evaluated in time \(O(\left| Q \right| \cdot n^2)\).

Fig. 3.

Satisfaction semantics of our query language; \(z\in \mathcal {Z}\) is a trajectory of length n and \(\varphi \in \varPhi \) are predicates. Iteration (\(Q^k\)) can be expressed as repeated sequencing.
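As an illustration, here is a direct (unmemoized) transcription of these semantics into Python, with queries as nested tuples and predicates as Boolean functions over subtrajectories; `vel_gt` is a hypothetical single-time-step predicate of our own. Memoizing over subtrajectory endpoints recovers the \(O(\left| Q \right| \cdot n^2)\) bound.

```python
def matches(q, z):
    """Satisfaction semantics: does query q match the whole trajectory z?
    q is ('pred', f) with f a predicate over subtrajectories,
    ('and', q1, q2), or ('seq', q1, q2)."""
    op = q[0]
    if op == 'pred':
        return 1 if q[1](z) else 0
    if op == 'and':
        return 1 if matches(q[1], z) and matches(q[2], z) else 0
    if op == 'seq':
        # q1;q2 matches z iff z splits as z[0:k], z[k:n] with both matching.
        return 1 if any(matches(q[1], z[:k]) and matches(q[2], z[k:])
                        for k in range(len(z) + 1)) else 0
    raise ValueError(op)

def vel_gt(theta):
    # Hypothetical single-time-step predicate: velocity at least theta.
    return ('pred', lambda z: len(z) == 1 and z[0] >= theta)

q = ('seq', vel_gt(0.5), vel_gt(0.6))  # a slower step, then a faster step
```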

4 Synthesis Algorithm

We describe our algorithm for synthesizing queries consistent with a given set of examples. It performs a syntax-guided enumerative search over the space of possible queries [3]. In more detail, it enumerates sketches, which are partial programs where only parameter values are missing. For each sketch, it uses a quantitative pruning strategy to compute the subset of the input parameters for which the resulting query is consistent with the given examples. A key contribution is how our algorithm uses quantitative semantics for quantitative pruning.

4.1 Problem Formulation

Partial Queries. A partial query is in the grammar

$$\begin{aligned} Q \,{::=}\,\; ?? \mid \varphi _{??} \mid \varphi \mid Q \,;\,Q \mid Q^k \mid Q \wedge Q. \end{aligned}$$

Note that there are two kinds of holes: (i) a predicate hole \(h=\;??\) that can be filled by a sub-query Q, and (ii) a parameter hole \(h=\varphi _{??}\) that can be filled by a real value \(\theta _h\in \mathbb {R}\). We denote the predicate holes of Q by \(\mathcal {H}_{\varphi }(Q)\), the parameter holes by \(\mathcal {H}_{\theta }(Q)\), and let \(\mathcal {H}(Q)=\mathcal {H}_{\varphi }(Q)\cup \mathcal {H}_{\theta }(Q)\). A partial query Q is a sketch (denoted \(Q\in \mathcal {Q}_{\text {sketch}}\)) [71] if \(\mathcal {H}_{\varphi }(Q)=\varnothing \), and is complete (denoted \(Q\in \bar{\mathcal {Q}}\)) if \(\mathcal {H}(Q)=\varnothing \). For example, for \(Q=\langle \textsf {DispLt}_{??1} \rangle \wedge ??2\), we have \(\mathcal {H}_{\theta }(Q)=\{??1\}\) and \(\mathcal {H}_{\varphi }(Q)=\{??2\}\). (We label each hole \(h=\;??i\) with an identifier \(i\in \mathbb {N}\) to distinguish them.)

Refinements and Completions. Given query \(Q\in \mathcal {Q}\), predicate hole \(h\in \mathcal {H}_{\varphi }(Q)\), and production \(R=Q\rightarrow f(Q_1,...,Q_k)\) we can fill h with R (denoted \(Q'=\textsf {fill}(Q,h,R)\)) by replacing h with f(??1, ..., ??k), where each ??i is a fresh hole, and similarly given a parameter hole \(h\in \mathcal {H}_{\theta }(Q)\) and a value \(\theta _h\in \mathbb {R}\). We call \(Q'\) a child of Q (denoted \(Q\rightarrow Q'\)). Next, we call \(Q''\) a refinement of Q (denoted \(Q\xrightarrow {*}Q''\)) if there exists a sequence \(Q\rightarrow ...\rightarrow Q''\); if furthermore \(Q''\in \bar{\mathcal {Q}}\), we say it is a completion of Q. For example, we have

$$\begin{aligned} ??1&\rightarrow \; ??2 \,;\,??3 \rightarrow \langle \textsf {InLane1} \rangle \,;\,??3 \rightarrow \ldots . \end{aligned}$$

Here, \(\langle \textsf {InLane1} \rangle \,;\,??3\) is a child (and refinement) of \(??2 \,;\,??3\) obtained by filling ??2 with \(Q\rightarrow \langle \textsf {InLane1} \rangle \)—i.e.,

$$\begin{aligned} \langle \textsf {InLane1} \rangle \,;\,??3 =\textsf {fill}(??2 \,;\,??3, ??2, Q\rightarrow \langle \textsf {InLane1} \rangle ). \end{aligned}$$

Parameters. We let \(\theta \in \mathbb {R}^{|\mathcal {H}_\theta (Q)|}\) denote a choice of parameters for each \(h\in \mathcal {H}_\theta (Q)\), let \(\theta _h\in \varTheta _h\subseteq \mathbb {R}\) denote the parameter for hole h, and let \(Q_{\theta }\) denote the query obtained by filling each \(h\in \mathcal {H}_{\theta }(Q)\) with \(\theta _h\). Note that if \(Q\in \mathcal {Q}_{\text {sketch}}\), then \(Q_{\theta }\in \bar{\mathcal {Q}}\) is complete. For example, consider the sketch

$$\begin{aligned} Q=\langle \textsf {DispLt}_{??1} \rangle \wedge \langle \textsf {MinLength}_{??2} \rangle . \end{aligned}$$

This query has two holes, so its parameters are \(\theta \in \mathbb {R}^2\). If \(\theta = (3.2, 5.0)\), then \(\theta _{??1}=3.2\) is used to fill hole ??1 and \(\theta _{??2}=5.0\) is used to fill ??2. In particular,

$$\begin{aligned} Q_{\theta }=\langle \textsf {DispLt}_{3.2} \rangle \wedge \langle \textsf {MinLength}_{5.0} \rangle . \end{aligned}$$

Query Synthesis Problem. Given examples \(W\subseteq \mathcal {W}=\mathcal {Z}\times \mathbb {B}\), where \(\mathbb {B}=\{0,1\}\), our goal is to find a query \(Q\in \bar{\mathcal {Q}}\) that correctly labels these examples—i.e.,

$$\begin{aligned} \psi _W(Q){:}{=}\bigwedge _{(z,y)\in W}(\llbracket Q\rrbracket (z)=y). \end{aligned}$$

Thus, \(\psi _W(Q)\) indicates whether Q is consistent with the labeled examples W. Our goal is to devise a synthesis algorithm that is sound and complete—i.e., it finds a query that satisfies \(\psi _W(Q) = 1\) if and only if one exists.

4.2 Algorithm Overview

Our algorithm enumerates sketches \(Q\in \mathcal {Q}_{\text {sketch}}\); for each one, it tries to compute parameter values \(\theta \) such that the completed query \(Q_{\theta }\) is consistent with W—i.e., \(\psi _W(Q_\theta )=1\). It can either stop once it has found a consistent query, or identify additional queries that are consistent with W. Algorithm 1 shows this high-level strategy—at each iteration, it selects a sketch Q, determines a region B of the parameter space containing consistent parameters \(\theta \in B\), and adds \((Q, B)\) to a list of consistent queries that solve the synthesis problem.

The key challenge is searching over the space of continuous parameters \(\theta \) for a given sketch Q such that \(Q_\theta \) is consistent with W. For efficiency, we rely heavily on pruning the search space. At a high level, consider evaluating a single candidate parameter \(\theta \) on a single example \((z,y)\in W\)—i.e., check whether \(\llbracket Q_\theta \rrbracket (z)=y\). If this condition does not hold, then we can not only prune \(\theta \) from the search space, but also a significant fraction of additional candidates. For instance, suppose \(\llbracket Q_\theta \rrbracket (z)=1\) but \(y=0\); if \(\theta '\le \theta \) (in all components), then by a monotonicity property we prove for our semantics, we also have \(\llbracket Q_{\theta '}\rrbracket (z)=1\). Thus, we can also prune \(\theta '\).

Previous work has leveraged this property to prune the search space [49, 53, 78]. Using a strategy based on binary search, for a given example \((z,y)\in W\), we can identify “boundary” parameters \(\theta \) to accuracy \(\varepsilon \) in \(O(\log (1/\varepsilon ))\) steps—i.e., compute \(\theta \) for which \(\llbracket Q_{\theta -\vec {\varepsilon }}\rrbracket (z)=1\) and \(\llbracket Q_{\theta +\vec {\varepsilon }}\rrbracket (z)=0\).
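For a single parameter hole, this binary search can be sketched as follows (our own illustration; `match` stands in for evaluating \(\llbracket Q_\theta \rrbracket (z)\) and is assumed monotonically decreasing in \(\theta \)):

```python
def boundary_binary_search(match, lo, hi, eps):
    """Approximate the boundary parameter: returns theta such that
    match(theta - eps) == 1 and match(theta + eps) == 0, assuming match
    is monotonically decreasing with match(lo) == 1 and match(hi) == 0.
    Uses O(log((hi - lo) / eps)) evaluations of match."""
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if match(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Toy example: the query matches exactly when theta <= 0.7.
theta = boundary_binary_search(lambda t: t <= 0.7, 0.0, 1.0, 1e-6)
```

On this toy `match`, the search converges to the boundary 0.7 to within \(\varepsilon \).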

Our algorithm avoids this binary search process, which can lead to a significant speedup in practice. The key idea is to devise a quantitative semantics for queries that directly computes \(\theta \); in fact, this quantitative semantics is closely related to robust temporal logic semantics, where the conjunction and disjunction of the satisfaction semantics are replaced with minimum and maximum, respectively.
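For intuition, here is a minimal quantitative semantics in Python for the special case where all parameter holes share a single parameter \(\theta \) (our own simplification of the semantics in Sect. 4.6): each predicate returns its score \(\iota _\varphi (z)\), conjunction becomes a minimum, and the disjunction over splits in sequencing becomes a maximum; the resulting value is the largest \(\theta \) at which \(Q_\theta \) still matches.

```python
NEG_INF = float('-inf')

def quant(q, z):
    """Quantitative semantics: the largest shared parameter theta for which
    Q_theta matches z.  q is ('pred', iota) with iota a scoring function
    over subtrajectories, ('and', q1, q2), or ('seq', q1, q2)."""
    op = q[0]
    if op == 'pred':
        return q[1](z)
    if op == 'and':
        return min(quant(q[1], z), quant(q[2], z))
    if op == 'seq':
        return max(min(quant(q[1], z[:k]), quant(q[2], z[k:]))
                   for k in range(len(z) + 1))
    raise ValueError(op)

def vel_score(z):
    # Hypothetical VelGt score: defined only on single time steps.
    return z[0] if len(z) == 1 else NEG_INF

q = ('seq', ('pred', vel_score), ('pred', vel_score))
```

On the trajectory `[0.5, 0.8]`, `quant` returns 0.5 in a single evaluation: \(Q_{(\theta ,\theta )}\) matches it iff \(\theta \le 0.5\), so 0.5 is the boundary along the diagonal direction, with no binary search needed.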

Algorithm 1.

4.3 Pruning with Boundary Parameters

We begin by describing how “boundary parameters” can be used to prune a portion of the search space over parameters. First, for any candidate parameters \(\theta \), we can prune parameters \(\theta '\le \theta \) (if \(\llbracket Q_\theta \rrbracket (z)=1\) and \(y=0\)) or \(\theta '\ge \theta \) (if \(\llbracket Q_\theta \rrbracket (z)=0\) and \(y=1\)). Pruned regions of the parameter space take the form of hyper-rectangles, which we call boxes. For convenience, let \(\vec {\infty } {:}{=}(\infty , \ldots , \infty )\).

Definition 1

Given \(x, y \in \bar{\mathbb {R}}^d\), where \(\bar{\mathbb {R}}=\mathbb {R}\cup \{\pm \infty \}\), a box is an axis-aligned half-open hyper-rectangle \(\lfloor x, y \rceil {:}{=}\{v \in \mathbb {R}^d \mid x_i < v_i \le y_i \text { for all } i\}\).
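In code, this definition amounts to a pair of corner vectors with a membership test (a small helper of our own, with `float('inf')` standing in for \(\pm \infty \)):

```python
from dataclasses import dataclass
from typing import Tuple

INF = float('inf')

@dataclass(frozen=True)
class Box:
    """Half-open axis-aligned box: { v | lo_i < v_i <= hi_i for all i }."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def contains(self, v):
        return all(l < x <= h for l, x, h in zip(self.lo, v, self.hi))

    def is_empty(self):
        return any(l >= h for l, h in zip(self.lo, self.hi))

b = Box((-INF, -INF), (0.5, 0.6))
```

Here `b.contains((0.5, 0.6))` holds but `b.contains((0.5, 0.61))` does not, reflecting the half-open convention.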

The key property ensuring that parameters prune boxes of the search space is that the semantics are monotonically decreasing in \(\theta \).

Lemma 1

Given sketch Q, trajectory z, and two candidate parameters \(\theta ,\theta '\in \mathbb {R}^d\) such that \(\theta \le \theta '\) component-wise, we have \(\llbracket Q_\theta \rrbracket (z)\ge \llbracket Q_{\theta '} \rrbracket (z)\).

The proof follows by structural induction on the query semantics: the base case follows since the semantics \(\mathbbm {1}(\iota _\varphi (z)\ge \theta _k)\) for predicates is monotonically decreasing in \(\theta _k\), and the inductive case follows since conjunction and disjunction are monotonically increasing in their inputs (so the composed semantics remains monotonically decreasing in \(\theta _k\)). Below, we show how monotonicity ensures that we can prune whole regions of the search space if we find boundary parameters.

As an example, suppose we have two trajectories, \(z_0\) of a car driving quickly and then slowly, and \(z_1\) of a car driving slowly and then quickly, and that we are trying to synthesize a query for \(W=\{(z_0,0),(z_1,1)\}\). For simplicity, we assume both \(z_0=(0.9,0.6)\) and \(z_1=(0.5,0.8)\) have just two time steps each, with a single component representing velocity. Furthermore, we assume there is just a single predicate \(\langle \textsf {VelGt}_{\theta } \rangle \) matching time steps where the velocity is at least \(\theta \), where \(\theta \) is a real-valued parameter. Since \(\langle \textsf {VelGt}_{\theta } \rangle \) matches single time steps, the satisfaction semantics is 0 except on trajectories of length 1, so:

$$\begin{aligned} \iota _{\textsf {VelGt}}((z_0)_{0:1})&= 0.9&\iota _{\textsf {VelGt}}((z_0)_{1:2})&= 0.6&\iota _{\textsf {VelGt}}((z)_{i:i})&= -\infty \\ \iota _{\textsf {VelGt}}((z_1)_{0:1})&= 0.5&\iota _{\textsf {VelGt}}((z_1)_{1:2})&= 0.8&\iota _{\textsf {VelGt}}((z)_{0:2})&= -\infty \end{aligned}$$

Consider the sketch \(Q = \langle \textsf {VelGt}_{??1} \rangle \,;\,\langle \textsf {VelGt}_{??2} \rangle \). We can see that the candidate parameters (0.5, 0.6) satisfy \(\llbracket Q_{(0.5,0.6)} \rrbracket (z_1)=1\):

$$\begin{aligned} \llbracket Q_{(0.5, 0.6)} \rrbracket ((z_1)_{0:n})&= \bigvee _{k=0}^2 \llbracket \langle \textsf {VelGt}_{0.5} \rangle \rrbracket ((z_1)_{0:k}) \wedge \llbracket \langle \textsf {VelGt}_{0.6} \rangle \rrbracket ((z_1)_{k:n})\\&= \llbracket \langle \textsf {VelGt}_{0.5} \rangle \rrbracket ((z_1)_{0:1}) \wedge \llbracket \langle \textsf {VelGt}_{0.6} \rangle \rrbracket ((z_1)_{1:2})\\&= \mathbbm {1}(0.5 \ge 0.5) \wedge \mathbbm {1}(0.8 \ge 0.6) \\&= 1, \end{aligned}$$

where the second equality holds because \(\langle \textsf {VelGt}_{\theta } \rangle \) matches only length-1 trajectories, so the \(k=0\) and \(k=2\) cases evaluate to 0. Since the semantics are monotonically decreasing, we have \(\llbracket Q_\theta \rrbracket (z_1)=1\) for any \(\theta \in \left\lfloor (-\infty , -\infty ), (0.5, 0.6) \right\rceil \).

Notice, however, that if we were to move any \(\vec {\varepsilon } > 0\) upward, we would have \(\llbracket Q_{(0.5 + \varepsilon _1, 0.6 + \varepsilon _2)} \rrbracket (z_1) = \mathbbm {1}(0.5 \ge 0.5 + \varepsilon _1) \wedge \mathbbm {1}(0.8 \ge 0.6 + \varepsilon _2) = 0\). So we know \(\llbracket Q_\theta \rrbracket (z_1)=0\) for any \(\theta \in \left\lfloor (0.5, 0.6), (\infty , \infty ) \right\rceil \). This is because (0.5, 0.6) lies on the boundary between \(\{\theta '\mid \llbracket Q_{\theta '} \rrbracket (z)=0\}\) and \(\{\theta '\mid \llbracket Q_{\theta '} \rrbracket (z)=1\}\). This boundary plays a key role in our algorithm.
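The calculation above can be replayed mechanically (a self-contained check of our own; `vel_gt` renders \(\langle \textsf {VelGt}_{\theta } \rangle \) as a predicate over subtrajectories):

```python
def vel_gt(theta):
    # Matches a subtrajectory iff it is one time step with velocity >= theta.
    return lambda z: len(z) == 1 and z[0] >= theta

def seq(p1, p2, z):
    # Sequencing: z matches p1;p2 iff some split z[:k], z[k:] matches both.
    return any(p1(z[:k]) and p2(z[k:]) for k in range(len(z) + 1))

z0 = [0.9, 0.6]  # fast then slow (negative example)
z1 = [0.5, 0.8]  # slow then fast (positive example)
```

Note that \(Q_{(0.5,0.6)}\) also matches the negative example \(z_0\) (since \(0.9\ge 0.5\) and \(0.6\ge 0.6\)), so these boundary parameters for \(z_1\) do not by themselves solve the synthesis problem; this is where the pruning pairs of Sect. 4.4 come in.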

Definition 2

Given a sketch Q with d parameter holes and a trajectory z, we say \(\theta \in \mathbb {R}^d \cup \{\bot , \top \}\) is a boundary parameter if one of the following holds:

  • \(\theta \in \mathbb {R}^d\) and \(\llbracket Q_\theta \rrbracket (z) = 1\), but \(\llbracket Q_{\theta '} \rrbracket (z) = 0\) for all \(\theta ' \in \left\lfloor \theta , \vec {\infty } \right\rceil \)

  • \(\theta = \bot \) and \(\llbracket Q_{\theta '} \rrbracket (z) = 0\) for all \(\theta ' \in \left\lfloor -\vec {\infty }, \vec {\infty } \right\rceil \)

  • \(\theta = \top \) and \(\llbracket Q_{\theta '} \rrbracket (z) = 1\) for all \(\theta ' \in \left\lfloor -\vec {\infty }, \vec {\infty } \right\rceil \)

In the first case, by monotonicity, we also have \(\llbracket Q_{\theta '} \rrbracket (z)=1\) for all \(\theta '\in \left\lfloor -\vec {\infty }, \theta \right\rceil \); thus, \(\theta \) lies on the boundary between parameters \(\theta '\) where \(Q_{\theta '}\) evaluates to 1 and those where it evaluates to 0. The second and third cases are where \(Q_{\theta '}\) always evaluates to 0 and 1, respectively.

Given a boundary parameter \(\theta \) for an example \((z,y)\in W\), we can prune \(\left\lfloor \theta , \vec {\infty } \right\rceil \) if \(y=1\) or \(\left\lfloor -\vec {\infty }, \theta \right\rceil \) if \(y=0\). Intuitively, boundary parameters provide optimal pruning along a fixed direction in the parameter space. Thus, our algorithm focuses on computing boundary parameters for pruning.
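Concretely, the pruned region induced by one boundary parameter can be written as follows (a sketch of our own; boxes are half-open pairs of corner vectors, matching Definition 1):

```python
INF = float('inf')

def pruned_region(theta, y):
    """Given a boundary parameter theta (a d-tuple) for an example with
    label y, return the half-open box (lo, hi] of parameters that are
    inconsistent with that example: strictly above theta the query misses
    a positive example; at or below theta it matches a negative one."""
    d = len(theta)
    if y == 1:
        return (theta, (INF,) * d)
    return ((-INF,) * d, theta)

def in_box(v, box):
    lo, hi = box
    return all(l < x <= h for l, x, h in zip(lo, v, hi))
```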

In Fig. 4(a), if \(\theta _1\) is a boundary parameter for \(z_1\), we know that the blue region satisfies \(z_1\), and thus is consistent with the label 1, while the red region dissatisfies \(z_1\), and thus is inconsistent with the label 1. Similarly, in Fig. 4(b), if \(\theta _0\) is a boundary parameter for \(z_0\), we know that the red region satisfies \(z_0\), and thus is inconsistent with the label 0, while the blue region dissatisfies \(z_0\), and thus is consistent with the label 0.

Fig. 4.

(a) shows a boundary parameter, \(\theta _1\), for \(z_1\), and a region that is inconsistent with \(z_1\) and can be pruned (red), as well as a region that is consistent with it (blue). (b) similarly shows a boundary parameter \(\theta _0\) for \(z_0\). (c) shows the pruning pair composed of \(\theta _0\) and \(\theta _1\), a region consistent with both (blue), and regions inconsistent with either (red). (d) is the same as (c), but if \(\theta _0\) and \(\theta _1\) swapped places. The labels \(b_0\) through \(b_8\) denote analogous boxes in (c) & (d). (e) shows how, if (d) were the result of the first step of search and \(b_6\) were chosen next, search could proceed. (f) shows ground truth consistent (blue) and inconsistent (red) regions that the search process in (d) & (e) might converge toward. (Color figure online)

4.4 Pruning with Pairs of Boundary Parameters

To extend pruning to the entire dataset W, we could simply prune the union of the individual pruned regions for each \((z,y)\in W\). However, one important feature of our approach is that we can also establish regions of the parameter space where the parameters are guaranteed to be consistent with W. To formalize this idea, we introduce the concept of a “pruning pair”: a pair of boundary parameters that may allow us to find such a consistent region.

Definition 3

Given a sketch Q and a dataset W, writing \(W^+\) (resp. \(W^-\)) for the trajectories labeled positive (resp. negative) in W, a pair of boundary parameters \(\theta ^-, \theta ^+ \in \mathbb {R}^d \cup \{\bot , \top \}\) is a pruning pair for Q and W if all of the following hold:

  • \(\theta ^+\) is a boundary parameter for some \(z \in W^+\) and, for all \(z'\in W^+\) such that \(z' \ne z\), we have \(\llbracket Q_{\theta ^+} \rrbracket (z') = 1\).

  • \(\theta ^-\) is a boundary parameter for some \(z \in W^-\) and, for all \(z'\in W^-\) such that \(z' \ne z\), we have \(\llbracket Q_{\theta ^-} \rrbracket (z') = 0\).

  • \(\theta ^- < \theta ^+\) or \(\theta ^- \ge \theta ^+\) (i.e., \(\theta ^-\) and \(\theta ^+\) are comparable; since these are vectors compared componentwise, neither case need hold in general).

If \(\theta ^- < \theta ^+\), the pruning pair \((\theta ^-, \theta ^+)\) is consistent, and inconsistent otherwise.

Our algorithm searches for pruning pairs along a fixed direction—i.e., it considers a curve \(L\subseteq \mathbb {R}^d\) and looks for the following pruning pair along L:

$$\begin{aligned} \theta ^+=\sup \left\{ \theta \in L\biggm \vert \bigwedge _{z\in W^+}\llbracket Q_\theta \rrbracket (z)=1\right\} , \quad \theta ^-=\inf \left\{ \theta \in L\biggm \vert \bigwedge _{z\in W^-}\llbracket Q_\theta \rrbracket (z)=0\right\} . \end{aligned}$$

Intuitively, \(\theta ^+\) is the largest \(\theta \) that correctly classifies all positive examples, and conversely for \(\theta ^-\). We restrict to curves L that are monotonically increasing in all components, in which case the supremum and infimum are well defined since L comes with a total ordering (from its smallest point to its largest) that is consistent with the standard partial order on \(\mathbb {R}^d\). Then, \((\theta ^-,\theta ^+)\) form a pruning pair: \(\theta ^+\) is a boundary parameter for some \(z\in W^+\), since if we take \(\theta ^+\) to be any larger, then we must have \(\llbracket Q_\theta \rrbracket (z)=0\) for some \(z\in W^+\), and similarly for \(\theta ^-\).

Given a curve L, we can compute an approximation to \(\theta ^+\) and \(\theta ^-\) via binary search. However, our algorithm avoids the need to do so by directly computing \(\theta ^+\) and \(\theta ^-\) using a quantitative semantics, which we describe in Sect. 4.6.
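As a baseline, the two endpoints along L can be approximated by binary search over a parameterization \(t\mapsto L(t)\) of the curve (our own sketch; `match` is a hypothetical evaluator for \(\llbracket Q_\theta \rrbracket (z)\), and the quantitative semantics of Sect. 4.6 computes the same endpoints without this search):

```python
def sup_t(pred, eps=1e-6):
    """Largest t in [0, 1] satisfying pred, assuming pred is monotonically
    decreasing in t and pred(0) holds."""
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if pred(mid):
            lo = mid
        else:
            hi = mid
    return lo

def inf_t(pred, eps=1e-6):
    """Smallest t in [0, 1] satisfying pred, assuming pred is monotonically
    increasing in t and pred(1) holds."""
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if pred(mid):
            hi = mid
        else:
            lo = mid
    return hi

def pruning_pair(point, match, positives, negatives):
    """theta_plus: largest point on the curve matching all positives;
    theta_minus: smallest point rejecting all negatives."""
    t_plus = sup_t(lambda t: all(match(point(t), z) for z in positives))
    t_minus = inf_t(lambda t: all(not match(point(t), z) for z in negatives))
    return point(t_minus), point(t_plus)

# Toy 2D instance on the diagonal: Q_theta matches z iff z >= theta pointwise.
match = lambda th, z: z[0] >= th[0] and z[1] >= th[1]
tm, tp = pruning_pair(lambda t: (t, t), match, [(0.5, 0.8)], [(0.9, 0.6)])
```

In this toy instance \(\theta ^-\approx (0.6,0.6)\ge \theta ^+\approx (0.5,0.5)\), so the pair is inconsistent: along this direction there is no parameter separating the two examples.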

Figure 4(c) shows how the pair of boundary parameters \(\theta _0\) for \(z_0\) and \(\theta _1\) for \(z_1\) (where L is the diagonal line) prunes the parameter space. The blue region is guaranteed to be consistent with W, as it is the intersection of the region below \(\theta ^+\), which must satisfy \(\llbracket Q_\theta \rrbracket (z_1)=1\), and the region above \(\theta ^-\), which must satisfy \(\llbracket Q_\theta \rrbracket (z_0)=0\). Conversely, the red regions are inconsistent with either \(z_0\) or \(z_1\), and therefore with W. Thus, the red regions can be pruned, whereas the blue regions are solutions to our synthesis problem. Note that the red region is the union of the red regions in Fig. 4(a) and (b), whereas the blue region is the intersection of the blue regions in Fig. 4(a) and (b).

This pattern holds for any consistent pruning pair (\(\theta ^- < \theta ^+\)); if instead the pair is inconsistent (\(\theta ^- \ge \theta ^+\)), then the resulting pattern is illustrated in Fig. 4(d); in this case, we can prune the red regions as before, but there is no blue region of solutions. In general, for a d-dimensional parameter space, a pruning pair divides the parameter space into \(3^d\) boxes (i.e., for each dimension, the box can be below, in line with, or above the center box). The regions below \(\theta ^-\) and above \(\theta ^+\) can be pruned, and the region between \(\theta ^-\) and \(\theta ^+\) (if one exists) contains synthesis solutions. Precisely, it follows from the definitions and monotonicity that:

Lemma 2

Every \(\theta \in \left\lfloor -\vec {\infty }, \theta ^- \right\rceil \) and \(\theta \in \left\lfloor \theta ^+, \vec {\infty } \right\rceil \) is inconsistent with W, and every \(\theta \in \left\lfloor \theta ^-, \theta ^+ \right\rceil \) is consistent with W.

The remaining boxes need to be further analyzed by our algorithm.
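The box subdivision induced by a pruning pair can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions (boxes as `(lo, hi)` coordinate tuples); it conservatively leaves every mixed corner box unknown, whereas the algorithm in the next section additionally classifies the \(B_{\text {extra}}\) boxes as inconsistent when the pair is inconsistent.

```python
from itertools import product

def split_box(lo, hi, theta_minus, theta_plus):
    """Split the box [lo, hi] using a pruning pair on its diagonal into
    3^d sub-boxes, classified per Lemma 2. Returns (pruned, consistent,
    unknown) as lists of (lo, hi) tuples."""
    d = len(lo)
    a = [min(m, p) for m, p in zip(theta_minus, theta_plus)]
    b = [max(m, p) for m, p in zip(theta_minus, theta_plus)]
    pair_consistent = all(m < p for m, p in zip(theta_minus, theta_plus))
    cut = [[lo[i], a[i], b[i], hi[i]] for i in range(d)]
    pruned, consistent, unknown = [], [], []
    # index k in {0, 1, 2} per dimension: below a, between a and b, above b
    for idx in product(range(3), repeat=d):
        blo = tuple(cut[i][idx[i]] for i in range(d))
        bhi = tuple(cut[i][idx[i] + 1] for i in range(d))
        if any(l >= h for l, h in zip(blo, bhi)):
            continue  # empty slab in some dimension
        box = (blo, bhi)
        if all(k == 0 for k in idx) or all(k == 2 for k in idx):
            pruned.append(box)  # below theta^- or above theta^+
        elif all(k == 1 for k in idx):
            (consistent if pair_consistent else pruned).append(box)
        else:
            unknown.append(box)  # corner boxes: analyze further
    return pruned, consistent, unknown
```

For d = 2 with a consistent pair, this yields the pattern of Fig. 4(c): two pruned corner boxes, one consistent center box, and six unknown boxes.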

figure d

4.5 Pruning Parameter Search Algorithm

Next, we describe Algorithm 2, which searches over the space of parameters to fill a sketch Q for a given dataset W. The algorithm uses a subroutine that takes a box and returns a pruning pair in that box, which we describe in Sect. 4.6. Given this subroutine, our algorithm maintains a work-list of “unknown” boxes (i.e., unknown whether parameters in these boxes are consistent or inconsistent with W). At each iteration, it pops a box from the work-list (in first-in-first-out order), uses the given subroutine to find a pruning pair inside that box, applies the pruning procedure described in the previous section, and then adds each new unknown box to the work-list.

For the last step, the current box b is divided into \(3^d\) smaller boxes. The box \(b_{\text {center}}{:}{=}\left\lfloor \min \{\theta ^-,\theta ^+\}, \max \{\theta ^-,\theta ^+\} \right\rceil \) is pruned (added to \(B_{\text {inc}}\)) if the pair \((\theta ^-,\theta ^+)\) is inconsistent, and contains solutions to the synthesis problem otherwise (added to \(B_{\text {con}}\)). The boxes \(b_{\text {lower}}=\left\lfloor -\vec {\infty }, \min \{\theta ^-,\theta ^+\} \right\rceil \) and \(b_{\text {upper}}=\left\lfloor \max \{\theta ^-,\theta ^+\}, \vec {\infty } \right\rceil \) are always pruned. The boxes in \(B_{\text {incomp}}\) are the remaining corner boxes of b, and always have indeterminate consistency (added to \(B_{\text {unk}}\)). The remaining boxes in \(B_{\text {extra}}\) are indeterminate if \((\theta ^-,\theta ^+)\) is consistent, and inconsistent otherwise. In our example, if the first step of the algorithm yielded Fig. 4(d), then the second step might pop \(b_6\) and yield Fig. 4(e).

The following soundness result follows directly from Lemma 2.

Theorem 1

In Algorithm 2, every \(\theta \in B_{{\text {con}}}\) is consistent with W for Q, and every \(\theta \in B_{{\text {inc}}}\) inconsistent.

In addition, the algorithm is complete for almost all parameters:

Theorem 2

The Lebesgue measure of \(\left\{ \theta \in b \mid b \in B_{{\text {unk}}}\right\} \rightarrow 0\) as \(N\rightarrow \infty \).

See Appendix D.1 for the proof. In other words, all parameters outside a subset of measure zero are eventually classified as consistent or inconsistent; intuitively, the parameters that may never be classified are the ones along the decision boundary. This result holds since at any search depth, the fraction of the parameter space pruned can be lower-bounded.

4.6 Computing Pruning Pairs via Quantitative Semantics

The pruning algorithm depends on the ability to compute, given a box b, a pruning pair \((\theta ^-, \theta ^+)\) on the restriction of the parameter space to b. Recall that \(\theta ^+\) must be a boundary parameter for some \(z^+ \in W^+\) and must satisfy \(\llbracket Q_{\theta ^+} \rrbracket (z)=1\) for all other \(z \in W^+\), and \(\theta ^-\) must be a boundary parameter for some \(z^- \in W^-\), and must satisfy \(\llbracket Q_{\theta ^-} \rrbracket (z)=0\) for all other \(z \in W^-\).

Given a box \(b=\left\lfloor \theta _{\text {min}}, \theta _{\text {max}} \right\rceil \), our algorithm takes \(L\subseteq \mathbb {R}^d\) to be the diagonal from \(\theta _{\text {min}}\) to \(\theta _{\text {max}}\) and computes the pruning pair along L. We can naïvely use binary search: for \(\theta ^+\), we search for the parameters where \(\bigwedge _{z \in W^+} \llbracket Q_\theta \rrbracket (z)\) transitions from 0 to 1, and similarly for \(\theta ^-\) and \(\bigwedge _{z \in W^-} \lnot \llbracket Q_\theta \rrbracket (z)\).

Instead, by leveraging a quantitative semantics, we can directly compute \(\theta ^+\) and \(\theta ^-\), thereby reducing computation time substantially. Given a sketch Q, trajectory z, parameter \(v \in \mathbb {R}^d\), and positive vector \(u \in \mathbb {R}_{>0}^d\), we devise a quantitative semantics \(\llbracket Q \rrbracket ^{q}_{v,u}(z) \in \bar{\mathbb {R}}\) such that the parameter \(\llbracket Q \rrbracket ^{q}_{v,u}(z) \cdot u + v\) is a boundary parameter. Intuitively, this semantics computes, starting at v, how many u-sized steps must be taken to reach the boundary. (For the uses in our algorithm, the number of steps is always in [0, 1].) Then, for a box \(b = \left\lfloor \theta _{\text {min}}, \theta _{\text {max}} \right\rceil \), we can take \(v = \theta _{\text {min}}\) and \(u = \theta _{\text {max}}-\theta _{\text {min}}\), and compute

$$\begin{aligned} \theta ^+ = \left( \min _{z \in W^+} \llbracket Q \rrbracket ^{q}_{v,u}(z)\right) \cdot u + v, \qquad \theta ^- = \left( \max _{z \in W^-} \llbracket Q \rrbracket ^{q}_{v,u}(z)\right) \cdot u + v. \end{aligned}$$

We define the quantitative semantics in Fig. 5. The base case of \(\varphi _{??}\) adjusts and rescales \(\iota _\varphi \) by v and u, and the other cases replace conjunction and disjunction in the satisfaction semantics with minimum and maximum. We have the following key result (where \(\infty \cdot u {:}{=}\top \), \(-\infty \cdot u {:}{=}\bot \), \(\top + v {:}{=}\top \), and \(\bot + v {:}{=}\bot \)):

Theorem 3

For a sketch Q, trajectory z, parameter \(v \in \mathbb {R}^d\), and positive vector \(u \in \mathbb {R}_{>0}^d\), we have that \(\llbracket Q \rrbracket ^{q}_{v,u}(z) \cdot u + v\) is a boundary parameter of z for Q.

See Appendix D.2 for the full proof. For intuition, consider \(\theta _{\text {min}}=\vec {0}\) and \(\theta _{\text {max}}=\vec {1}\) (i.e., the current box \(b\subseteq \mathbb {R}^d\) is the unit hypercube). Then, \(v=\vec {0}\) and \(u=\vec {1}\), so \(\llbracket Q \rrbracket ^{q}_{v,u}\) reduces to the standard max-min quantitative semantics for temporal logic [25].

Fig. 5.
figure 5

The quantitative semantics of our language, taking a sketch Q, trajectory z, parameter \(v \in \mathbb {R}^d\), and positive vector \(u \in \mathbb {R}_{>0}^d\). n is the length of z.

Now, if we consider the satisfaction semantics of a base predicate \(\llbracket \varphi _{\theta _i} \rrbracket =\mathbbm {1}(\iota _{\varphi }(z)\ge \theta _i)\), then the value of \(\theta _i\) where the semantics flips is just \(\iota _\varphi (z)\). So any parameter with i-th component \(\iota _\varphi (z)\) is a boundary parameter, and since L has the same slope in all dimensions, the boundary parameter along L is \(\iota _\varphi (z) \cdot \vec {1} + \vec {0} = \llbracket \varphi _{??i} \rrbracket ^{q}_{\vec {0},\vec {1}}(z) \cdot \vec {1} + \vec {0}\).

In the inductive cases, it suffices to show that we can replace conjunction and disjunction with minimum and maximum in the semantics. Since the satisfaction semantics is monotonically decreasing, as we move upward along L, at some point we will transition from 1 to 0. A conjunction becomes 0 when either conjunct becomes 0, so the transition will occur when we hit the first of the conjuncts’ transition points (their minimum). Dually, a disjunction becomes 0 when both disjuncts become 0, so we will transition at the last of the disjuncts’ transition points (their maximum).

Finally, the intuition behind u and v is that they “preprocess” the parameters so that we evaluate along the diagonal of the current box instead of \(\left\lfloor \vec {0}, \vec {1} \right\rceil \).
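For the propositional fragment (base predicates, conjunction, disjunction), the recursion of Fig. 5 can be sketched as follows. This is an illustrative Python sketch with a hypothetical AST encoding; the full semantics also handles sequencing and iteration, which are omitted here.

```python
def qsem(node, z, v, u):
    """Quantitative semantics sketch: returns the diagonal coordinate t
    such that v + t*u is a boundary parameter of z. A node is either
    ('pred', iota, i) for a base predicate iota(z) >= theta_i, or
    ('and'/'or', left, right)."""
    kind = node[0]
    if kind == 'pred':
        _, iota, i = node
        # 1(iota(z) >= theta_i) flips exactly where theta_i = iota(z)
        return (iota(z) - v[i]) / u[i]
    _, left, right = node
    l, r = qsem(left, z, v, u), qsem(right, z, v, u)
    return min(l, r) if kind == 'and' else max(l, r)

def pruning_pair(node, pos, neg, v, u):
    """theta+ and theta- as diagonal coordinates for the box [v, v+u]."""
    t_plus = min(qsem(node, z, v, u) for z in pos)
    t_minus = max(qsem(node, z, v, u) for z in neg)
    return t_minus, t_plus
```

On the unit box (\(v=\vec {0}\), \(u=\vec {1}\)) this is the max-min computation described above; a single bottom-up pass per trajectory replaces the logarithmically many full evaluations that binary search would need.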

4.7 Implementation

We implement our approach in a system called Quivr. It begins by running Algorithm 1 on a small number of labeled examples.

Active Learning. With a small number of examples, there are typically many queries that are consistent with the labels yet disagree on the labels of the remaining data. To disambiguate, we use an active learning strategy, asking the user to label specific trajectories that we choose, which are then added to our set of labeled examples. Queries that are not consistent with the new label are discarded. The labeling process continues until the set of consistent queries agrees on the labels of all unlabeled data.

When choosing the trajectory \(z^*\) to query the user for next, we select the one on which the set of consistent queries C disagrees most—i.e.,

$$z^* = {\mathop {\hbox {arg min}}\limits _{z\in Z}}\left| J(z) - \frac{1}{2}\right| ,$$

where

$$J(z){:}{=}|C|^{-1}\sum _{Q_{\theta }\in C}\mathbbm {1}\left( \psi _{(z,y)}(Q_{\theta })\right) $$

is the fraction of consistent queries that predict a positive label for trajectory z.

Search Implementation. In some cases, searching for consistent parameters may take a very long time. To improve performance, we impose a timeout: for each sketch, we pause search once either (i) we find some consistent box of parameters or (ii) we have exceeded 25 steps. In both cases, we save the sets of consistent, inconsistent, and unknown boxes. At each step of active learning, the newly labeled example may render previously consistent parameters inconsistent, so we mark all consistent boxes as unknown. We then resume search, again until (i) we find some consistent box (which may be the same one we had before), or (ii) we again exceed 25 steps.

Note that while this timeout may cause us to query the user more often than is strictly necessary, it does not affect either the soundness or completeness of our approach, as we continue search after querying the user.

Complete Query Selection. Active learning and evaluation of \(F_1\) scores (in Sect. 5) both require complete queries with specific parameters, rather than sketches with boxes of parameters. Since the set C of consistent queries is infinitely large, we instead use one query for each sketch that is known to have consistent parameters (sketches where search timed out are thus not included). For those sketches, we pick the middle of the box of known-consistent parameters.
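The midpoint choice can be sketched in one line; this illustrative helper assumes boxes are stored as `(lo, hi)` vector pairs.

```python
def representative_query(box):
    """Return the center of a known-consistent box of parameters, used
    as the single complete query for that sketch."""
    lo, hi = box
    return tuple((l + h) / 2.0 for l, h in zip(lo, hi))
```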

5 Evaluation

We demonstrate how our approach can be used to synthesize queries to solve interesting tasks: in particular, we show that (i) given just a few initial examples, it can synthesize queries that achieve good performance on a held-out test set, and (ii) our optimizations significantly reduce the synthesis time.

5.1 Experimental Setup

Datasets. We evaluate on two datasets of object trajectories: YTStreams [7], consisting of video and extracted object trajectories from fixed-position traffic cameras, and MABe22 [72], consisting of trajectories of up to three mice interacting in a laboratory setting. We also evaluate on a synthetic maritime surveillance task from the STL synthesis literature [44]. On YTStreams, we use two traffic cameras, one in Tokyo and one in Warsaw, and we consider single cars or pairs of cars. On MABe22, we consider pairs of mice. For the predicates used, see Table 1 for YTStreams, Appendix Table 5 for MABe22, and Appendix Table 6 for maritime surveillance.

Table 1. The predicates used for the YTStreams dataset.

Tasks. On YTStreams, we manually wrote 5 ground truth queries. Several queries apply to multiple configurations (e.g., different pairs of lanes), resulting in 10 queries total (tasks H-Q in Table 2). The real-valued parameters were chosen manually, by visually examining whether they were selecting the desired trajectories. These queries cover a wide range of behaviors; for instance, they can capture behaviors such as human drivers making unprotected turns, an important challenge for autonomous cars [64], as well as cars trying to pass [66]. We show examples of trajectories selected by three of our multi-object queries in Fig. 6. MABe22 describes 9 queries for scientifically interesting mouse behavior. We implemented the 6 most complex to use as ground truth queries (tasks A-F in Appendix Table 7). The maritime surveillance task has trajectory labels and so does not need a ground truth query (task G).

Table 2. Ground-truth queries for the YTStreams dataset. “IDs” indicates which tasks are instances of a given query. Multiple instantiations correspond to different lanes being used for “lane 1” and “lane 2”. The first is a one-object Shibuya query, the second is a one-object Warsaw query, and the rest are two-object Warsaw queries.
Fig. 6.
figure 6

Trajectories selected by multi-object queries. Each image shows two objects; the color of each one changes from red to green to denote the progression of time. Left: Unprotected right turn into lane with oncoming traffic. Middle: Bottom car drives faster than the top one and passes it. Right: One car driving closely behind the other. (Color figure online)

Synthesis. For each task, we divide the set Z of all trajectories into a train set \(Z_\text {train}\) and a test set \(Z_\text {test}\), using trajectories in the first half of the video for training, and those in the second half for testing. We randomly sample a set of initial labeled examples W from \(Z_{\text {train}}\), with 2 samples being positive and 10 being negative, and then actively label 25 additional examples from \(Z_{{\text {train}}}\). For YTStreams and MABe22, labels are from the ground truth query. For tractability, we limit search to sketches with at most three predicates, at most two of which may have parameters. In most cases, this excludes the ground truth from the search space.

5.2 Accuracy of Synthesized Queries

We show that Quivr synthesizes accurate queries from just a few labeled examples. We evaluate the \(F_1\) score of the synthesized queries on \(Z_{\text {test}}\). Recall that our algorithm returns a list C of consistent queries; we report the median \(F_1\) score across \(Q\in C\).

Baselines. We compare to (i) an ablation where we replace our active learning strategy with an approach that labels z uniformly at random from the remaining unlabeled training examples; (ii) an LSTM [16, 33] neural network; and (iii) a transformer neural network [26, 29, 77]. Because neural networks perform poorly on such small datasets, we pretrain the LSTM on an auxiliary task, namely, trajectory forecasting [43]. Then, we freeze the hidden representation of the learned LSTM, and use these as features to train a logistic regression model on our labeled examples. The neural network baselines do active learning by selecting among the unlabeled trajectories the one with the highest predicted probability of being positive.

Table 3. \(F_1\) score after n steps of active learning, with our algorithm for selecting tracks to label (“Q”), an active learning ablation (“R”), an LSTM (“L”), and a transformer (“T”). For Q and R, there may be many queries consistent with the labeled data, so the median \(F_1\) score is reported. Bold indicates best score at a given number of steps.

Results. We show the \(F_1\) score of each of the 17 queries in Table 3 after 0, 5, 10, and 25 steps of active learning. After just 10 steps, our approach provides \(F_1\) score above 0.99 on 10 of 17 queries, and after 25 steps, it yields an \(F_1\) score above 0.9 on all but 2 queries. Thus, Quivr is able to synthesize accurate queries with relatively little user input. The neural networks achieve poor performance, particularly on the more difficult queries.

5.3 Synthesis Running Time

Table 4. Running time (seconds) of synthesis (mean ± standard error) using binary search (B) and quantitative semantics (Q) running on CPU and GPU, with 25 steps of active learning.

Next, we show that quantitative pruning and using a GPU each significantly reduce synthesis time, evaluating total running time for 25 steps of active learning.

Ablations. We compare to two ablations: (i) using the binary search approach of [53] to find pruning pairs, rather than using our quantitative semantics, and (ii) evaluating the matrix semantics (Appendix A.1) on a CPU rather than a GPU.

Results. In Table 4, we report the running time of our algorithms on a CPU (\(2{\times }\) AMD EPYC 7402 24-Core) and a GPU (\(1{\times }\) NVIDIA RTX A6000). For binary search, on average, the GPU is \(7.6{\times }\) faster than the CPU. On a GPU, using the quantitative semantics rather than binary search offers another \(5.0{\times }\) speed-up.

6 Related Work

Monotonicity for Parameter Pruning. We build on [49] for our parameter pruning algorithm. Their approach has been applied to synthesizing STL formulas for sequence classification by first enumerating sketches and then using monotonicity to find parameters, similar to our binary search baseline [53]. We replace binary search with our novel strategy based on quantitative semantics, leading to a \(5.0{\times }\) speedup. There is also work building on [49] to create logically-relevant distance metrics between trajectories by taking the Hausdorff distance between parameter satisfaction regions (which they call “validity domains”), with applications to clustering [78]. For logics like STL, our quantitative semantics could provide a speedup to their approach.

Synthesis of Temporal Logic Formulas. More broadly, there has been work synthesizing parameters in a variant of STL by discretizing the parameter space and then walking the satisfaction boundary [24]; in one dimension, their approach becomes binary search, inheriting its shortcomings. There has been work on synthesizing STL formulas that are satisfied by a closed-loop control model [38], but they assume the ability to find counterexample traces for incorrect STL formulas, which is not applicable to our setting. Another approach is to synthesize parameters in STL formulas using gradient-based optimization [35] or stochastic optimization [45], but we found these methods to be ineffective in our setting, and they do not come with either soundness or completeness guarantees. There is work using decision trees to synthesize STL formulas [1, 14, 44, 48], but these operate on a restricted subset of STL, namely Boolean combinations of a fixed set of template formulas. This restriction prevents these approaches from synthesizing temporal structure, which is a key component of the queries in our domains. Finally, there has been work on active learning of STL formulae using decision trees [48], but it assumes the ability to query for equivalence between a particular STL formula and the ground truth, which is not possible in our setting.

Synthesizing Constants. There is work on synthesizing parameters of programs using counterexample-guided inductive synthesis and different theory solvers, including Fourier-Motzkin variable elimination and an SMT solver [2]. Though our synthesis objective can be encoded in the theory of linear arithmetic, the encoding is extremely large, and we have found such solvers to be ineffective in practice.

Querying Video Data. There has been recent work on querying object detections and trajectories in video data [5, 7, 8, 28, 40,41,42, 54]. The main difference is our focus on synthesis; in addition, these approaches center on SQL-like operators (e.g., select, inner-join, and group-by) over predefined predicates; such operators cannot capture compositions like the sequencing and iteration in our language, which are necessary for identifying more complex behaviors.

Neurosymbolic Models. There has been recent work on leveraging program synthesis in the context of machine learning. For instance, there has been work on using programs to represent high-level structure in images [21,22,23, 74, 84], for reinforcement learning [9, 34, 79, 80], and for querying websites [18]; in contrast, we use programs to classify trajectories. The most closely related work is on synthesizing functional programs operating over lists [67, 76]. Our language includes key constructs not included in their languages. Most importantly, we include sequencing; in their functional language, such an operator would need to be represented as a nested series of if-then-else operators. In addition, their language does not support predicates that match subsequences; while such a predicate could be added, none of their operators can compose such predicates.

Quantitative Synthesis. There has been work on program synthesis with quantitative properties—e.g., on synthesis for producing optimized code [37, 57, 65], for approximate computing [15, 52], for probabilistic programming [56], and for embedded control [17]. These approaches largely focus on search-based synthesis, either using constraint optimization [52], continuous optimization [17], enumerative search [15, 57], or stochastic search [37, 56, 65]. While we leverage ideas from this literature, our quantitative semantics based pruning strategy is novel.

Quantitative Semantics. Our quantitative semantics is similar to the “robustness degree” [25] of a temporal logic formula. The difference is that, by adjusting the denotations of the base predicates, our quantitative semantics gives a parameter on the satisfaction boundary. More broadly, there has been work on quantitative semantics for temporal logic for robust constraint satisfaction [20, 25, 73], and to guide reinforcement learning [39]. There has been work on quantitative regular expressions (QREs) [4], though in general, QREs cannot be efficiently evaluated due to their nondeterminism, and our language is restricted to ensure efficient computation. There has been work on synthesizing QREs for network traffic classification [68], using binary search to compute decision thresholds. Similarly, there has been work using the Viterbi semiring to obtain quantitative semantics for Datalog programs [69], which they use in conjunction with gradient descent to learn the rules of the Datalog program. In contrast, we use our quantitative semantics to efficiently prune the parameter search space in a provably correct way. Finally, there has been work on using GPUs to evaluate regular expressions [55]; however, they focus on regular expressions over strings.

Query Languages. Our language is closely related to both signal temporal logic (STL) [50] and Kleene algebras with tests (KAT) [46]. In particular, it can straightforwardly be extended to subsume both (see Appendix A for details), and our pruning strategy applies to this extended language. The addition of Kleene star, required to subsume KAT, worsens the evaluation time. STL has been used to monitor safety requirements for autonomous vehicles [32]. Spatio-Temporal Perception Logic (STPL) is an extension of STL that supports spatial reasoning [31]. Many of its operators are monotone, and thus would benefit from our algorithm. Scenic [27] is a DSL for creating static and dynamic driving scenes, but its focus is on generating scenes rather than querying for behaviors.

7 Conclusion

We have proposed a novel framework called Quivr for synthesizing queries over video trajectory data. Our language is similar to KAT and STL, but supports conjunction and sequencing over multi-step predicates. Given only a few examples, Quivr efficiently synthesizes trajectory queries consistent with those examples. A key contribution of our approach is the use of a quantitative semantics to prune the parameter search space, yielding a \(5.0{\times }\) speedup over the state-of-the-art. In our evaluation, we demonstrate that Quivr effectively synthesizes queries to identify interesting driving behaviors, and that our optimizations dramatically reduce synthesis time.