Abstract
Motion planning for automated vehicles (AVs) in mixed traffic, where AVs share the road with human-driven vehicles, is a challenging task. To reduce the complexity, state-of-the-art planning approaches often assume that the future motion of surrounding vehicles can be predicted independently of the AV’s plan. This separation can lead to suboptimal, overly conservative behavior, especially in highly interactive traffic situations. In this work, we introduce a motion planning algorithm to generate interaction-aware behavior for highly interactive scenarios. The presented algorithm is based upon a reformulation of a bi-level optimization problem, which frames interactions between a human driver and an AV as a Stackelberg game. In contrast to existing works, the algorithm can account for general nonlinear state and input constraints. Further, we introduce mechanisms to integrate cooperation and courtesy into motion planning to prevent overly aggressive driving behavior.
1 Introduction
When automated vehicles (AVs) first enter traffic, they will not drive in isolation but share the road with predominantly human drivers. Thus, interacting with them is crucial for smooth and efficient operation. This is especially important in interactive situations where the actions of multiple vehicles are tightly coupled. For instance, a driver on a highway might decide to slow down so that another driver can merge, or a driver might start to nudge into the adjacent lane, hoping that the driver behind will slow down and open a gap.
A key aspect to master such scenarios with AVs is to consider interactions with human drivers. However, to reduce the computational complexity of motion planning, most state-of-the-art planners follow a structure that overlooks these mechanisms. In particular, they follow a predict-then-plan scheme: motion planning is separated into a prediction step, in which the future motion of surrounding drivers is predicted, and a subsequent planning step, in which the motion of the AV is determined. During planning, surrounding vehicles are treated as moving objects with an immutable trajectory.
While this separation poses a useful simplification for many traffic scenarios, it can lead to situations similar to the frozen robot problem [34], a state in which the predictions of other traffic participants block all paths, so that the planner can no longer find a solution toward its goal. Fig. 1 illustrates this issue for a merge scenario. Following a predict-then-plan structure, the AV, in blue, first predicts the future motion of surrounding vehicles and plans its trajectory in a subsequent step. Fig. 1(a) shows the result in low traffic. Following the same principle in high traffic, shown in Fig. 1(b), the planner is unable to find a collision-free trajectory onto the highway, and the AV stops at the end of the lane.
Some approaches are already able to overcome the structural limitation of planners following a predict-then-plan scheme by solving the prediction and planning task simultaneously. These planners can be categorized into the following three classes: Forward simulation methods, multi-agent methods, and game-theoretic methods.
Forward Simulation Methods: One technique to generate interaction-aware behavior is via forward simulation. Here, the current traffic scene is simulated for different actions of the AV. Transition models are used to describe how the environment changes due to the actions of the AV and further how other drivers react to these changes in the environment. We refer to such techniques as forward simulation methods. Most sampling-based planning methods that consider interactions can be associated with this category [11, 27]. An important group among the sampling-based planning methods are methods based on partially observable Markov decision process (POMDP), e.g. as presented in [17].
The behavior of other agents is often modeled with specific driver models such as the Intelligent Driver Model (IDM) [35] or the Minimum Overall Braking Induced by Lane change (MOBIL) model [18]. An example where the IDM is used to determine the reaction of other vehicles is presented in [11]. Here, to generate the behavior for the AV, multiple candidate trajectories are simulated and evaluated based on the effect imposed on others. In forward simulation methods, the influence the AV exerts on others is not explicitly given but must be determined by trying out several actions and forward-simulating the traffic scene for each of them.
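As an illustration of this principle, the following sketch forward-simulates a following human driver with the IDM for two candidate AV acceleration profiles and compares the effect on the human. The function names and all parameter values are illustrative, not taken from [11] or [35]:

```python
import math

def idm_accel(v, v_lead, gap, v0=15.0, T=1.5, s0=2.0, a_max=1.5, b=2.0, delta=4):
    """Intelligent Driver Model: longitudinal acceleration of a follower."""
    dv = v - v_lead                                  # closing speed
    s_star = s0 + v * T + v * dv / (2 * math.sqrt(a_max * b))  # desired gap
    return a_max * (1 - (v / v0) ** delta - (s_star / max(gap, 0.1)) ** 2)

def forward_simulate(x_av, v_av, a_av, x_hd, v_hd, dt=0.2):
    """Roll out the scene for one candidate AV acceleration profile a_av."""
    for a_k in a_av:
        a_hd = idm_accel(v_hd, v_av, x_av - x_hd)    # HD reacts to the AV
        x_av += v_av * dt
        v_av = max(0.0, v_av + a_k * dt)
        x_hd += v_hd * dt
        v_hd = max(0.0, v_hd + a_hd * dt)
    return x_hd, v_hd

# Compare two candidate AV actions by their effect on the human driver
v_after_brake = forward_simulate(30.0, 10.0, [-1.0] * 25, 0.0, 10.0)[1]
v_after_cruise = forward_simulate(30.0, 10.0, [0.0] * 25, 0.0, 10.0)[1]
```

Evaluating several such rollouts and scoring them with a cost on the follower's state is, in essence, how forward simulation methods expose the AV's influence on others.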
Multi-Agent Methods: In multi-agent methods, the separate prediction of other vehicles is replaced by planning coupled trajectories. Therefore, the traffic scene is modeled as a multi-agent planning problem with the underlying assumption that all traffic participants behave towards optimizing a joint objective [2, 4, 5, 8, 19, 22, 32]. The AV then solves the multi-agent problem assuming that other agents will also roughly follow their part of the plan. Varying weights can be used to model different levels of cooperation or incorporate asymmetries in the traffic scene [4, 7]. To cope with uncertainties in the behavior of humans, these methods are combined with tracking approaches to estimate if humans roughly follow the same model [5, 32].
Game-Theoretic Methods: In real traffic, the assumption that each driver is behaving towards optimizing a common objective might not be valid, since some drivers are only interested in optimizing their own driving. To model interactions among agents with different objectives, a game-theoretic perspective might be more suitable. Several game-theoretic methods have already been used successfully, e.g., for lane change, merge, intersection, roundabout, and overtaking scenarios [6, 10, 12, 13, 21, 29,30,31, 38, 39]. For instance, in [30], human-like driving behavior, such as slowing down before intersections or nudging into the adjacent lane during a lane change, could be generated.
Apart from these driving applications, game-theoretic methods have been used for agile maneuvering of multiple ground vehicles in close proximity [40], and automated car racing [23, 25, 37, 39], where it is shown that game-theoretic planners yield complex strategies such as blocking and faking and significantly outperform baseline MPC planners.
In game-theoretic formulations, there is no optimal solution in the traditional sense; depending on the game’s structure, different solutions, also referred to as equilibria, are possible. Therefore, an important feature to categorize game-theoretic methods is the type of solution they solve for. In the literature, Nash and Stackelberg equilibria are distinguished. A Nash equilibrium describes a set of strategies where no individual agent can benefit from unilaterally changing its strategy, given that all other agents stick to their strategies. This type of equilibrium has been investigated, e.g., in [3, 10, 14, 20, 37, 40].
Compared to a Nash equilibrium, a Stackelberg equilibrium involves turn-taking and, therefore, an asymmetry in the decision-making process. It is typically modeled for a two-player game, where one player is the leader and the other is the follower. The leader chooses its strategy first, and the follower then optimizes its strategy as the best response to the leader’s strategy. In contrast, the Nash solution can be seen as the best response from everyone to everyone else, without hierarchical turn-taking. Stackelberg equilibria are considered in [6, 12, 23, 29, 30, 33, 41, 42].
In this work, we present a model based on a game-theoretic formulation that directly captures interactions between an AV and a human driver (HD) as a Stackelberg game. This algorithm enables AVs to be aware of how their actions influence other drivers and thereby allows generating interaction-aware driving behavior. In contrast to existing works, the algorithm can account for general nonlinear state and input constraints. Additionally, we present mechanisms to integrate cooperation and courtesy into interaction-aware methods to prevent overly aggressive driving behavior, which has been reported as an issue of existing approaches.
2 Problem Statement
To derive a model to directly capture interactions, we consider a system with one AV, referred to as the leader L, and one HD, referred to as the follower F. The system’s state at time t is given by the leader’s and follower’s state \(\textbf{x}^L_t,\textbf{x}^F_t\in \mathcal {X}\), where \(\mathcal {X}\) is the set that contains all possible states. The leader’s and follower’s actions are described by their trajectories \(\xi _{L}(t),\xi _{F}(t):[0,\mathcal {T}]\rightarrow \mathcal {X}\). Further, each agent has its individual objective function denoted by \(J_{L}\) and \(J_{F}\).
The objective is minimized subject to the vehicle’s initial state \(\xi (0) = \textbf{x}_{0} \) and the evolution of the state described by the trajectory, which is only allowed to pass through the set of feasible states \(\mathcal {X}_{\text {feasible}}(t) \subseteq \mathcal {X}\). \(\mathcal {X}_{\text {feasible}}(t)\) encodes, for instance, collision avoidance. Additionally, system dynamics and bound constraints can be enforced by \(D(\xi (t), \dot{\xi }(t),\ddot{\xi }(t), \dots ) = 0\). The set of all feasible trajectories \(\xi (t)\) is denoted by \(\Xi \).
In contrast to traditional multi-agent systems, we assume a turn-taking structure, where the follower optimizes its trajectory as a response to the leader’s trajectory. To do so, the follower predicts the leader’s future motion \(\tilde{\xi }_{L}\) and then plans by minimizing its objective function \(J_{F}\) considering these predictions. Therefore, the follower’s optimal trajectory can be described as:
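Sketched from the definitions above (the follower minimizes \(J_F\) over the feasible trajectories \(\Xi \), given its prediction \(\tilde{\xi }_{L}\)), this best response takes the form:

\[
\xi _{F}^{*} = \mathop {\mathrm {arg\,min}}_{\xi _{F} \in \Xi } \; J_{F}\bigl (\xi _{F}, \tilde{\xi }_{L}\bigr )
\]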
For simplicity, we assume that for short time horizons, a human can predict the trajectory of the AV sufficiently well, such that the prediction \(\tilde{\xi }_{L}\) can be assumed to be the actual trajectory \(\xi _{L}\) of the AV. Hence, the optimal trajectory of the follower as a function of the leader’s trajectory \(\xi _{L}\) is given as:
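With \(\tilde{\xi }_{L} = \xi _{L}\), this becomes (a sketch consistent with the preceding formulation):

\[
\xi _{F}^{*}(\xi _{L}) = \mathop {\mathrm {arg\,min}}_{\xi _{F} \in \Xi } \; J_{F}\bigl (\xi _{F}, \xi _{L}\bigr )
\]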
With this link between the leader’s actions and the follower’s actions the optimal trajectory for the AV can be stated as:
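Substituting the follower's best response into the leader's problem yields, sketched in the same notation:

\[
\xi _{L}^{*} = \mathop {\mathrm {arg\,min}}_{\xi _{L} \in \Xi } \; J_{L}\bigl (\xi _{L}, \xi _{F}^{*}(\xi _{L})\bigr )
\]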
Equation (3) gives the leader the ability to reason about how its actions will influence the follower’s response and is therefore the fundamental model which enables interaction-aware planning.
3 Bi-level Formulation
The derived model, in (3), describes a Stackelberg game, where the leader decides on its behavior first and the follower optimizes its behavior given the decision of the leader (Fig. 2). If the follower’s best response to the leader’s actions can be stated in closed form, (3) can be solved as a standard optimal control problem (OCP). However, this is, in general, not the case since \(\xi _{F}^*\) is the outcome of an OCP itself. This results in a nested or bi-level optimization problem. Further, solving the underlying Stackelberg game would require planning until \(\mathcal {T}\), which is the end of an interaction. However, the end of an interaction is not trivial to determine and requires the consideration of a varying time horizon.
In the following, we propose an approximate solution to (3) based on model predictive control (MPC), where we solve the problem on a receding horizon with a fixed length T, execute the first action, and then replan. We utilize multiple shooting and discretize the time horizon \(t \in [0,T]\) into \(N = T / \tau \) intervals, where \(\tau \) denotes the duration of one time step. To improve readability, we subsume the state and input sequences of the leader and follower as \(\textbf{x}:=[\textbf{x}_0,\ldots ,\textbf{x}_N]^T\) and \(\textbf{u}:=[\textbf{u}_0,\ldots ,\textbf{u}_{N-1}]^T\). In the following, the resulting nonlinear programs (NLPs) of the follower and the leader are stated. The equality constraints \(\textbf{h}\) represent constraints imposed by the system dynamics, while the inequality constraints \(\textbf{g}\) collect bound constraints, collision constraints, and dynamic constraints.
3.1 NLP of the Follower
The follower’s NLP is parametrized by the leader’s states and inputs \((\textbf{x}^L,\textbf{u}^L)\) and can be formulated as:
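In the discretized notation, a sketch of (4) consistent with the constraint functions introduced above is:

\[
\begin{aligned}
\min _{\textbf{x}^F,\, \textbf{u}^F} \quad & J_{F}\bigl (\textbf{x}^F, \textbf{u}^F; \textbf{x}^L, \textbf{u}^L\bigr ) \\
\text {s.t.} \quad & \textbf{h}_{F}\bigl (\textbf{x}^F, \textbf{u}^F\bigr ) = \textbf{0}, \\
& \textbf{g}_{F}\bigl (\textbf{x}^F, \textbf{u}^F; \textbf{x}^L, \textbf{u}^L\bigr ) \le \textbf{0}
\end{aligned}
\]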
3.2 NLP of the Leader
The leader’s bi-level optimization problem can be stated as:
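A sketch of the bi-level structure, with the follower's NLP (4) appearing as the lower-level problem in (5d):

\[
\begin{aligned}
\min _{\textbf{x}^L,\, \textbf{u}^L,\, \textbf{x}^F,\, \textbf{u}^F} \quad & J_{L}\bigl (\textbf{x}^L, \textbf{u}^L, \textbf{x}^F, \textbf{u}^F\bigr ) \\
\text {s.t.} \quad & \textbf{h}_{L}\bigl (\textbf{x}^L, \textbf{u}^L\bigr ) = \textbf{0}, \quad \textbf{g}_{L}\bigl (\textbf{x}^L, \textbf{u}^L, \textbf{x}^F, \textbf{u}^F\bigr ) \le \textbf{0}, \\
& \bigl (\textbf{x}^F, \textbf{u}^F\bigr ) \in \mathop {\mathrm {arg\,min}} \bigl \{\, J_{F} \;:\; \textbf{h}_{F} = \textbf{0},\; \textbf{g}_{F} \le \textbf{0} \,\bigr \}
\end{aligned}
\]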
Formulating the follower’s optimization problem as a constraint, (5d), ensures that only optimal solutions for the follower are considered feasible solutions for the leader.
4 Single-Level Representation
To efficiently solve (5), we need to reformulate the bi-level optimization problem into a regular, single-level problem. Therefore, we assume that the follower will act optimally with respect to its own objective function (4). With this assumption, we can replace the inner optimization problem with its necessary conditions for optimality.
If the follower’s problem is convex, the Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient for optimality. However, due to the combinatorial nature of driving, it is, in general, non-convex, e.g., due to nonlinear collision avoidance constraints or a non-convex cost function. To obtain locally optimal solutions, we convexify the follower’s problem around an initial guess, which at the same time encodes the considered homotopy class. For the convexification, the constraints are linearized, and the cost function is approximated by a second-order Taylor expansion.
By replacing the follower’s optimization problem with its KKT conditions in (5), we obtain the following single-level optimization problem:
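A sketch of the resulting problem (6), assembled from the constraints named in the surrounding text:

\[
\begin{aligned}
\min _{\textbf{x}^L,\, \textbf{u}^L,\, \textbf{x}^F,\, \textbf{u}^F,\, \boldsymbol{\lambda },\, \boldsymbol{\mu }} \quad & J_{L}\bigl (\textbf{x}^L, \textbf{u}^L, \textbf{x}^F, \textbf{u}^F\bigr ) \\
\text {s.t.} \quad & \textbf{h}_{L} = \textbf{0}, \quad \textbf{g}_{L} \le \textbf{0}, \\
& \nabla _{(\textbf{x}^F, \textbf{u}^F)} \mathcal {L} = \textbf{0}, \\
& \textbf{h}_{F_{\text {lin}}} = \textbf{0}, \quad \textbf{g}_{F_{\text {lin}}} \le \textbf{0}, \\
& \boldsymbol{\mu } \ge \textbf{0}, \quad \boldsymbol{\mu } \bot \textbf{g}_{F_{\text {lin}}}
\end{aligned}
\]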
with the Lagrangian
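of the convexified follower problem, which, consistent with the multipliers and convexified terms defined below, can be sketched as:

\[
\mathcal {L} = J_{F_{\text {con}}} + \boldsymbol{\lambda }^T \textbf{h}_{F_{\text {lin}}} + \boldsymbol{\mu }^T \textbf{g}_{F_{\text {lin}}}
\]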
Here, \(\boldsymbol{\lambda }\) and \(\boldsymbol{\mu }\) are the KKT multipliers and \(\textbf{h}_{F_{\text {lin}}}, \textbf{g}_{F_{\text {lin}}}\) and \(J_{F_{\text {con}}}\) are the constraints and objective after the convexification. For the reformulation we assume sufficient regularity of the follower’s NLP, differentiability of \(\textbf{h}_{F}\) and \(\textbf{g}_{F}\), and the cost function \(J_{F}\) to be twice differentiable.
4.1 Solving the Complementarity Constraints
The leader’s NLP in (6) forms an instance of a mathematical program with complementarity constraints (MPCC). Due to the complementarity constraints \(\boldsymbol{\mu } \bot \textbf{g}_{F_{\text {lin}}}\), MPCCs are non-smooth and non-convex. They are particularly challenging to solve because at every feasible point, ordinary constraint qualifications (CQs) such as the LICQ or the Mangasarian-Fromovitz CQ are violated [9]. Therefore, to solve (6), we reformulate the complementarity constraints using relaxation methods [16], as shown in (7).
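A common relaxation of this type replaces each complementarity pair by a bounded product (a sketch; recall \(\textbf{g}_{F_{\text {lin}}} \le \textbf{0}\) and \(\boldsymbol{\mu } \ge \textbf{0}\)):

\[
-\mu _i \, g_{F_{\text {lin}},i} \le \epsilon , \quad \forall i
\]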
With \(\epsilon > 0\), a regularized NLP is obtained, and CQs can be satisfied again. The smaller \(\epsilon \) is chosen, the closer any feasible solution is to achieving complementarity. However, if \(\epsilon \) is chosen too small, the problem may become numerically unstable, and the solver may fail to find a feasible solution at all.
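The effect of the relaxation can be seen on a toy problem (a minimal sketch, not the leader's NLP; a brute-force grid search stands in for the NLP solver): minimize \((x-2)^2 + (y-2)^2\) subject to \(x, y \ge 0\) and the relaxed complementarity \(xy \le \epsilon \).

```python
def solve_relaxed(eps, n=400):
    """Grid search over [0, 4]^2 for: min (x-2)^2 + (y-2)^2
    s.t. x, y >= 0 and the relaxed complementarity x*y <= eps.
    eps = 0 recovers the exact complementarity constraint x*y = 0."""
    best_cost, best_xy = float("inf"), (0.0, 0.0)
    for i in range(n + 1):
        for j in range(n + 1):
            x, y = 4.0 * i / n, 4.0 * j / n
            if x * y <= eps:                       # relaxed constraint
                cost = (x - 2.0) ** 2 + (y - 2.0) ** 2
                if cost < best_cost:
                    best_cost, best_xy = cost, (x, y)
    return best_xy, best_cost

(x1, y1), c1 = solve_relaxed(1.0)   # relaxed: optimum at (1, 1), cost 2
(x0, y0), c0 = solve_relaxed(0.0)   # exact: optimum on an axis, cost 4
```

Loosening \(\epsilon \) enlarges the feasible set and lowers the attainable cost; as \(\epsilon \rightarrow 0\), the relaxed optima approach an exactly complementary solution, which is what makes the relaxation numerically attractive.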
5 Application to Motion Planning for AVs
So far, the derived model is a general formulation of how interactions between a robot and a human can be considered during motion planning or decision-making. In the following, we show how to apply the bi-level algorithm to motion planning for AVs. The section starts by stating the OCP used for trajectory optimization of an AV. This OCP contains the system dynamics, bound constraints, as well as an objective function to encode desirable driving behavior. For the purpose of the evaluation in Sect. 6, we assume that a good approximation of a human objective function is provided. Such a function could be obtained, e.g., via inverse reinforcement learning.
5.1 Trajectory Optimization for AVs
The OCP used for trajectory optimization can be stated as:
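Sketched from the description below, with (8f)–(8h) summarized as box constraints on states and inputs (their exact form is not restated here):

\[
\begin{aligned}
\min _{\textbf{x},\, \textbf{u}} \quad & J_{\text {base}}(\textbf{x}, \textbf{u}) && \text {(8a)} \\
\text {s.t.} \quad & \textbf{x}_{k+1} = f(\textbf{x}_{k}, \textbf{u}_{k}), && \text {(8b)} \\
& \textbf{x}_{0} = \hat{\textbf{x}}, && \text {(8c)} \\
& \textbf{g}_{\text {dyn}}(\textbf{x}_{k}, \textbf{u}_{k}) \le \textbf{0}, && \text {(8d)} \\
& \textbf{g}_{\text {col}}(\textbf{x}_{k}) \le \textbf{0}, && \text {(8e)} \\
& \textbf{x}_{k} \in [\textbf{x}_{\text {min}}, \textbf{x}_{\text {max}}], \quad \textbf{u}_{k} \in [\textbf{u}_{\text {min}}, \textbf{u}_{\text {max}}] && \text {(8f)–(8h)}
\end{aligned}
\]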
The objective function \(J_{\text {base}}\) is used to generate desirable driving behavior. The equality constraints (8b) enforce the vehicle dynamics. Further, (8c) ensures that the trajectory is planned from the current state \(\hat{{\textbf {x}}}\). The inequality constraints (8d)–(8h) are used for collision avoidance and to account for physical limitations of the real system.
5.1.1 Vehicle Model
To describe the dynamics of the vehicle (8b), the kinematic single-track model is used. The vehicle state at time k, \(\textbf{x}_k = [x_k,y_k,\psi _k, v_k]^T\), is described by the lateral and longitudinal position (x, y) of the vehicle’s center of gravity, the orientation \(\psi \), and the absolute velocity v. Together with the input \(\textbf{u}_k = [\delta _k, a_k]^T\) consisting of steering angle \(\delta \) and acceleration a, the dynamics of a vehicle are given by (Fig. 3):
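For the state and input defined above, the continuous-time kinematic single-track dynamics read (a standard formulation; cf. [28]):

\[
\dot{x} = v \cos (\psi + \beta ), \qquad
\dot{y} = v \sin (\psi + \beta ), \qquad
\dot{\psi } = \frac{v}{l_r} \sin (\beta ), \qquad
\dot{v} = a
\]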
Here, \(\beta \) is the slip angle which is given by \(\beta = \arctan \left( \frac{l_r}{l}\tan (\delta )\right) \).
Further, l is the wheelbase, and \(l_r\) is the distance between the center of gravity and the rear axis. To obtain the discrete dynamics model \(\textbf{x}_{k+1} = f(\textbf{x}_{k},\textbf{u}_{k})\) in (8b) we use a fourth-order Runge-Kutta method.
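A minimal sketch of this discretization (the helper names and the parameter values for l, \(l_r\), and \(\tau \) are illustrative only, not taken from the chapter):

```python
import math

def dynamics(x, u, l=2.7, lr=1.35):
    """Kinematic single-track model, state [x, y, psi, v], input [delta, a]."""
    px, py, psi, v = x
    delta, a = u
    beta = math.atan(lr / l * math.tan(delta))   # slip angle
    return [v * math.cos(psi + beta),
            v * math.sin(psi + beta),
            v / lr * math.sin(beta),
            a]

def rk4_step(x, u, tau=0.2):
    """One explicit fourth-order Runge-Kutta step x_{k+1} = f(x_k, u_k)."""
    k1 = dynamics(x, u)
    k2 = dynamics([xi + tau / 2 * ki for xi, ki in zip(x, k1)], u)
    k3 = dynamics([xi + tau / 2 * ki for xi, ki in zip(x, k2)], u)
    k4 = dynamics([xi + tau * ki for xi, ki in zip(x, k3)], u)
    return [xi + tau / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
            for xi, a1, a2, a3, a4 in zip(x, k1, k2, k3, k4)]
```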
To ensure the validity of the kinematic single-track model [28], \({\textbf {g}}_{\text {dyn}}\) (8d) are introduced to limit the lateral acceleration as follows:
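One common formulation of such a limit (a sketch, using the kinematic approximation \(a_{\text {lat}} \approx v^2 \tan (\delta )/l\); the bound \(a_{\text {lat,max}}\) is an assumed parameter) is:

\[
-a_{\text {lat,max}} \le \frac{v_k^2 \tan (\delta _k)}{l} \le a_{\text {lat,max}}
\]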
To also limit the jerk, the following constraints on the acceleration change are introduced:
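With the acceleration treated as an input, the jerk can be approximated by finite differences over one time step, sketched as:

\[
j_{\text {min}} \le \frac{a_{k+1} - a_{k}}{\tau } \le j_{\text {max}}
\]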
Here, \(j_{\text {min}}\) and \(j_{\text {max}}\) are the minimum and maximum allowed jerk values.
5.1.2 Collision Avoidance
The collision avoidance constraints (8e) are formulated pairwise between vehicles. Hereby, the shape of one vehicle is approximated by a finite number of circles, and the shape of the second vehicle is approximated with a superellipse, as illustrated in Fig. 4. Compared to regular ellipses, superellipses provide a more accurate approximation of the vehicle’s rectangular shape [24].
Collision avoidance between a point \(\textbf{p} = [x,y]^T\) and a superellipse defined by the semi-major axis a, the semi-minor axis b, and the order \(n \in \mathbb {N}\) can be formulated as:
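In coordinates aligned with the superellipse (center at the origin, axes along the vehicle), this constraint can be sketched as:

\[
\left| \frac{x}{a} \right| ^{n} + \left| \frac{y}{b} \right| ^{n} \ge 1
\]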
Similarly, collision avoidance between a circle with radius r and a superellipse can be formulated as a point mass constraint on the center point of the circle \(\mathbf {p_c} = [x_c,y_c]^T\). Therefore, \(\mathbf {p_c}\) needs to be outside the Minkowski sum of the superellipse and a circle with radius \(\frac{r}{2}\), see Fig. 5.
To maintain an efficient formulation, the Minkowski sum is approximated by an enlarged superellipse. In the case of a superellipse of order \(n=4\), enlarging the semi-major and semi-minor axes by the radius r is a sufficient over-approximation, see Fig. 5.
Henceforth, the collision avoidance constraints can be stated as:
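With the enlarged axes from above, a per-circle constraint can be sketched as

\[
\left| \frac{\tilde{x}_{c}}{a + r} \right| ^{n} + \left| \frac{\tilde{y}_{c}}{b + r} \right| ^{n} \ge 1,
\]

where \((\tilde{x}_c, \tilde{y}_c)\) is the circle center expressed in the superellipse frame. A corresponding numerical check (function names, poses, and sizes are illustrative):

```python
import math

def in_superellipse_frame(px, py, cx, cy, psi):
    """Transform a world point into the superellipse's body frame."""
    dx, dy = px - cx, py - cy
    return (math.cos(psi) * dx + math.sin(psi) * dy,
            -math.sin(psi) * dx + math.cos(psi) * dy)

def collision_free(circle_centers, r, ellipse_pose, a, b, n=4):
    """True if every disc of radius r satisfies the enlarged-axes
    over-approximation of the Minkowski-sum constraint."""
    cx, cy, psi = ellipse_pose
    for px, py in circle_centers:
        x, y = in_superellipse_frame(px, py, cx, cy, psi)
        if abs(x / (a + r)) ** n + abs(y / (b + r)) ** n < 1.0:
            return False
    return True
```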
5.1.3 Objective Function
The objective function (8a) consists of three components, \(J_{{\textbf {x}}}\), \(J_{{\textbf {u}}}\), \(J_{\dot{{\textbf {u}}}}\), penalizing deviations from a desired state \(\textbf{x}_{\text {ref}} = [x_{k,\text {ref}}, y_{k,\text {ref}}, \psi _{k,\text {ref}}, v_{k,\text {ref}}]^T\), any control effort, and any changes in control, respectively. The function can be stated as:
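A sketch of the three components, with \(\Delta \textbf{x}_k\) the deviation from \(\textbf{x}_{\text {ref}}\) (the velocity deviation measured along the road as \(\Delta v\), see below) and \(\Delta \textbf{u}_k = \textbf{u}_k - \textbf{u}_{k-1}\) (with \(\textbf{u}_{-1} = \hat{\textbf{u}}\)):

\[
J_{\text {base}} = \underbrace{\sum _{k=0}^{N} \Delta \textbf{x}_k^T\, Q\, \Delta \textbf{x}_k}_{J_{\textbf{x}}} + \underbrace{\sum _{k=0}^{N-1} \textbf{u}_k^T R_u\, \textbf{u}_k}_{J_{\textbf{u}}} + \underbrace{\sum _{k=0}^{N-1} \Delta \textbf{u}_k^T R_{\dot{u}}\, \Delta \textbf{u}_k}_{J_{\dot{\textbf{u}}}}
\]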
With the velocity vector \(\textbf{v} = [v\cos (\psi +\beta ), v\sin (\psi +\beta )]^T\) and the road tangential unit vector \(\textbf{t}\), \(\Delta v = \textbf{v} \cdot \textbf{t} - v_{\text {ref}}\) measures the difference between the current velocity along the road and the reference velocity \(v_{\text {ref}}\). Further, \({\hat{\textbf{u}}}\) is the control input from the previous step. Finally, \(R_u\), \(R_{\dot{u}}\), and Q are weighting matrices used to model the desired driving behavior.
6 Evaluation
The efficacy of the proposed bi-level algorithm is evaluated in two different settings. First, the ability of the AV to deliberately influence the HD’s state through its driving behavior is investigated. These experiments make use of the direct link between the AV’s actions and the HD’s response which the bi-level approach provides.
Since in real driving applications the goal of the AV is to drive efficiently and comfortably rather than to influence the state of other vehicles, the focus of the second part is to demonstrate how the approach can be used to plan interaction-aware, cooperative driving behavior.
Apart from the efficacy, the algorithm’s runtime is analyzed followed by a discussion highlighting the advantages and limitations of the algorithm.
6.1 Base Scenario
We evaluate our approach in multi-lane scenarios as shown in Fig. 6, where the AV is depicted in blue and the HD is depicted in gray. For the purpose of these experiments, the AV is considered the leader, and the HD is considered the follower. In the following, we will use the terms leader and AV as well as follower and HD interchangeably.
Both vehicles have a width of \({2.0\,\mathrm{\text {m}}}\) and a length of \({4.0\,\mathrm{\text {m}}}\). Collision avoidance is implemented using a superellipse of order \(n = 4\) for the leader and two circles for the follower. Further parameters are given in Table 1.
The follower directly uses the cost function (13) for its trajectory optimization with the weights and vehicle characteristics given in Table 1. The leader’s NLP is also based on (13) but additionally considers the KKT conditions of the follower’s NLP as constraints, as stated in (6). Further, the leader’s objective function is augmented with additional cost terms to set scenario-specific incentives.
If not stated otherwise, the initial and reference states listed in Table 2 are used for the leader and follower.
6.2 Influence the Human’s State
The following two experiments investigate the leader’s ability to influence the follower’s state. To provide the appropriate incentives, the leader’s objective function is augmented with \(J_\text {influence}\). The leader’s objective is, therefore, the following weighted sum:
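Consistent with the weight ratio given below, this weighted sum can be sketched as:

\[
J_{L} = w_{L}\, J_{\text {base}} + w_{\text {influence}}\, J_{\text {influence}}
\]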
Henceforth, a ratio of \(\frac{w_{\text {influence}}}{w_L} = 10^7\) is used.
6.2.1 Slow Down the Human
In this experiment, the leader’s goal is to slow down the follower to a certain velocity, \(v_{\text {ref}}^F\). To incentivize this behavior, deviations of the follower’s velocity to \(v_{\text {ref}}^F\) are penalized. The scenario-specific \(J_{\text {influence}}\) is therefore set to
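A sketch of this penalty, summing the squared deviation of the follower's along-road velocity over the horizon:

\[
J_{\text {influence}} = \sum _{k=0}^{N} \bigl ( \textbf{v}_k^F \cdot \textbf{t} - v_{\text {ref}}^F \bigr )^2
\]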
with \(\textbf{v} = [v\cos (\psi +\beta ), v\sin (\psi +\beta )]^T\) and \(\textbf{t}\) as the road tangential unit vector.
The results for a desired velocity of \(v_\text {ref}^F = 5\,\mathrm{\frac{{\text {m}}}{{\text {s}}}}\) are illustrated in Fig. 7. As can be seen, the leader changes to the left lane to get in front of the follower. Despite its interest in driving fast, the leader starts to brake, forcing the follower to slow down. To prevent the follower from overtaking, the leader drives close to the center of the road.
6.2.2 Push the Human to the Adjacent Lane
In this experiment, the ability to also influence the follower in the lateral direction is investigated. Therefore, a 3-lane road is considered, see Fig. 8.
The leader’s goal is to enforce a lane change of the human to the adjacent left lane. This incentive is encoded by setting \(J_{\text {influence}}\) to penalize deviations of the follower’s lateral position to a reference \(y_\text {ref}^F\) as:
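A sketch of this penalty in terms of the follower's lateral position:

\[
J_{\text {influence}} = \sum _{k=0}^{N} \bigl ( y_k^F - y_{\text {ref}}^F \bigr )^2
\]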
Figure 8 shows the behavior for \(y_\text {ref}^F = {8.5\,\mathrm{\text {m}}}\), which corresponds to the center of the leftmost lane. To push the follower to the left, the leader changes lanes and slows down, almost coming to a full stop. The leader thereby blocks the middle lane, which forces the follower to also slow down to avoid a collision. To continue, the follower starts an overtaking maneuver. At the same time, the leader accelerates again to stay next to the follower, blocking it from changing back to its original lane.
6.3 Interaction-Aware Trajectory Optimization
In real traffic, the primary goal of the AV is to drive comfortably and efficiently rather than to change the state of surrounding vehicles in a certain way. Therefore, the behavior generated when planning trajectories with the proposed interaction-aware algorithm is investigated next in different lane change scenarios. To better show the effect of the planned behavior, the desired velocity of the follower is increased to \(v_\text {ref}^F = 15\,\mathrm{\frac{{\text {m}}}{{\text {s}}}}\). Throughout the scenarios, the leader aims to perform a lane change to the left.
6.3.1 Efficient Planning
We start by formulating the leader’s objective in an egocentric way, similar to how it is formulated for planners following a predict-then-plan scheme. Here, the leader solely considers attributes of its own trajectory, formulated by only optimizing \(J_\text {base}\).
The resulting trajectories are shown in Fig. 9. As can be seen, the leader plans a very efficient lane change without any acceleration. However, as a response, the follower has to brake harshly to avoid a collision, see Fig. 10. This aggressive cut in is a result of the leader knowing that the follower will react, which the leader then exploits to further optimize its own driving behavior.
This example shows that interactive behavior not only occurs when the leader is incentivized to alter the state of the follower but also emerges out of efficiency.
6.3.2 Cooperative Interaction-Aware Planning
The proposed interaction-aware model gives the leader the ability to anticipate the follower’s reaction. When naively using an egocentric objective function, the leader exploits the follower’s response and generates an overly aggressive behavior, as demonstrated in the previous example.
To mitigate this effect, the impact imposed on others must be considered in the objective function of the leader. Therefore, a formulation based on a cooperative cost function that includes the leader’s and the follower’s cost in the leader’s objective is considered in the following:
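With a blending parameter \(\alpha \), such a cooperative objective can be sketched as:

\[
J_{L}^{\text {coop}} = (1 - \alpha )\, J_{L} + \alpha \, J_{F}
\]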
In this formulation, the variable \(\alpha \in [0,1]\) determines to which extent the leader’s and the follower’s cost are considered. Therefore, \(\alpha \) provides a way to design different driving behaviors, ranging from overly aggressive to overly conservative.
The impact the parameter \(\alpha \) has on the generated behavior is investigated in the following. Therefore, we consider a scenario including a mandatory lane change for the leader, see Fig. 11. The different \(\alpha \)-dependent acceleration and velocity profiles for \(\alpha = 0.0\), \(\alpha = 0.5\) and \(\alpha = 0.99\) are illustrated in Fig. 12.
In detail, for \(\alpha = 0.0\), the leader does not accelerate, and all the discomfort has to be borne by the follower. This represents the aggressive, egocentric behavior presented in the previous experiment. With a larger \(\alpha \), the leader increases its acceleration until reaching the acceleration limits. In the case of \(\alpha = 0.99\), the leader mostly considers the follower’s cost and tries to interfere with the follower’s optimal plan as little as possible. This value of \(\alpha \) generates a very conservative behavior similar to a predict-then-plan approach. With \(\alpha = 0.5\), the leader’s and the follower’s cost are considered equally, which leads to an approximately equal distribution of discomfort. Note, however, that besides adjusting the acceleration during the lane change, the leader also adapts its stationary velocity depending on \(\alpha \).
6.3.3 Courtesy Constraints
The cooperative cost formulation presented in the previous experiment has the side effect that for \(\alpha > 0.0\), the leader permanently drives faster than its desired velocity \(v_\text {ref}^L\). For some scenarios, e.g., overtaking a slow-moving truck on the highway, a temporarily increased velocity might be acceptable or even desirable for traffic efficiency. However, in most situations, a vehicle in front does not adapt its velocity to the desires of rear traffic.
An alternative to the cooperative cost formulation is introducing courtesy constraints. With these constraints, the leader’s impact on others can be limited without altering the leader’s objective function.
In this experiment, we introduce a constraint such that the leader is allowed to cause at most a deceleration of \(a_{\text {limit}}\) for the follower. To enforce this, the following constraints are added to the leader’s NLP:
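A sketch of these constraints, bounding the follower's acceleration from below at every step of the horizon:

\[
a_{\text {limit}} \le a_{k}^{F}, \quad k = 0, \dots , N-1
\]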
Here, \(a_{k}^F\) is the acceleration of the follower.
The effect of the courtesy constraint with \(a_{\text {limit}} = -2.0\,\mathrm{\frac{{\text {m}}}{{\text {s}}^2}}\) on the considered merging scenario is illustrated in Fig. 13. By introducing the constraint, the leader accelerates during the lane change, which successfully limits the induced deceleration to \(-{2.0\,\mathrm{\frac{{\text {m}}}{{\text {s}}^2}}}\). The velocity profiles are shown in Fig. 13b. Compared to the cooperative cost formulation, the leader returns to its desired velocity of \({10.0\,\mathrm{\frac{{\text {m}}}{{\text {s}}}}}\) after the successful merge.
Both the cooperative cost and the courtesy constraint method have traffic scenarios for which they are particularly suited. E.g., when overtaking a slower vehicle on the highway, the cooperative cost formulation might be more suitable, as it leads to a temporary increase in velocity for the duration of the overtaking maneuver. In contrast, for a merging scenario or a permanent lane change, the courtesy constraint method might be the better choice, since the leader returns to its desired velocity after the merge is completed.
6.4 Runtime Experiments
The presented method for interaction-aware trajectory optimization computes an open-loop solution for the AV. More precisely, the control inputs are functions of time and not of the state. To adapt to unforeseen changes in the environment, the algorithm needs to run in an MPC fashion. For MPC, a sufficiently high update rate is crucial. Therefore, we analyze the performance of the algorithm with a proof of concept MPC implementation.
The MPC was implemented in Python. All necessary derivatives were calculated using the open-source software CasADi [1]. CasADi utilizes automatic differentiation to calculate the derivatives accurately; compared to finite difference methods, automatic differentiation is faster and more accurate. Further, IPOPT [36] was used to solve the formulated NLP. IPOPT is a general-purpose solver for large-scale nonlinear problems. We cold started the IPOPT solver with a feasible solution of the desired driving maneuver, which we obtained by sequentially solving a single vehicle NLP, as in (8), first for the leader and then for the follower. This initialization was only performed for the very first iteration of the planner. All subsequent iterations were warm started with the solution of the previous iteration. To get a better initial guess, the previous solution was shifted by the duration between the planning iterations.
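The shifting initialization can be sketched as follows (assuming, for illustration, a shift by exactly one step of the horizon; the function name and the tail-extension rule are not taken from the chapter):

```python
def shift_warm_start(x_prev, u_prev, f):
    """Shift the previous MPC solution by one step to warm-start the next
    solve: drop the first element and extend the tail, here by repeating
    the last input and propagating the discrete model f once."""
    u0 = u_prev[1:] + [u_prev[-1]]                  # shifted input sequence
    x0 = x_prev[1:] + [f(x_prev[-1], u_prev[-1])]   # shifted state sequence
    return x0, u0
```

On top of this guess, the KKT multipliers of the previous solve can be reused as well, which typically reduces the number of IPOPT iterations per MPC step.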
The timing results were obtained by considering a merging scenario, with the two most relevant methods for the application to real traffic, namely, the cooperative cost function method, with \(\alpha = 0.5\), and the courtesy constraints method, with \(a_{\text {limit}} = -2.0\,\mathrm{\frac{{\text {m}}}{{\text {s}}^2}}\). We simulate each method for \({9.0\,\mathrm{\text {s}}}\). A horizon length of \(N = 30\) steps is considered for the MPC. Further parameters were taken from Table 1. The runtime results are obtained by running the MPC implementation 100 times on the merging scenario with both methods. The mean solve times over the 100 simulation runs are shown in the histogram in Fig. 14. Additionally, the mean and standard deviation of the mean solve times are listed in Table 3. All timing results were obtained on an Intel Core i7-8565U CPU with a clock rate of 1.80 GHz.
Even though the experiments were conducted with an MPC implementation that leaves great potential for improvements, we could already demonstrate our algorithm’s real-time capability with mean solve times of \({96.82\,\mathrm{\text {m}\text {s}}}\) and \({83.85\,\mathrm{\text {m}\text {s}}}\), respectively. The presented results can be considered a conservative estimate of the achievable performance. However, in the future, this could be greatly improved by utilizing tailored solvers and implementing the approach in a high-performance programming language, e.g., C++.
7 Algorithm Discussion
A core assumption that we made to obtain the model for interaction-aware planning, stated in (3), is that the human does not try to influence the AV but rather reacts to its actions. According to [30, 31], this is a valid assumption for a wide range of interactive scenarios. Further, compared to a Nash equilibrium, it may even be the better model of how humans act in interactive situations, since, outside of games such as chess, humans typically do not solve games in their everyday lives [15].
The formulated NLP (6) is a non-convex and non-smooth problem. As such, one cannot expect to find globally optimal solutions. Instead, we use derivative-based optimization methods to find local optima. These methods require an initial guess, which determines the considered homotopy, as solutions of local methods typically lie in the same homotopy as the initial guess. In the context of automated driving, homotopies are often thought of as maneuvers. Thus, we use the initial guess to encode the desired driving maneuver. In our experiments, we empirically observed that initializing with a rough but feasible guess of the desired maneuver is sufficient to reliably solve the problem. To take multiple maneuvers into consideration, it is advisable to combine the presented approach with a global method. For example, a behavior planner at a higher level of abstraction, based on an arbitration scheme as in [26], could be used to generate good initial guesses.
The focus of the experiments was to analyze the capabilities and the performance of the proposed bi-level planner. As such, the algorithm was evaluated in a tailored simulation environment, where one important modeling assumption was that the human driver is always attentive. However, in real traffic this is not always the case: human drivers are sometimes distracted and do not respond to the actions of the AV. Therefore, the presented algorithm needs to be combined with an intention estimation, e.g., as presented in [5], to cope with inattentive drivers.
8 Conclusion
In this chapter, we presented an algorithm that generates interaction-aware trajectories for AVs. The interaction between a human driver (HD) and an AV is modeled as a Stackelberg game, in which the human responds rationally to the AV's actions while optimizing their own objective. This leads to a nested optimization problem, which we approximate by an MPC based on a bi-level optimization formulation. To solve it, we reformulated the problem into a single-level representation, exploiting the assumption that the human acts optimally with respect to their objective function. We solve the resulting NLP using derivative-based optimization methods. The presented algorithm solves the interaction-aware trajectory optimization problem in a continuous state and input space. Further, in contrast to existing methods, general nonlinear state and input constraints can be considered, which allows for an accurate dynamics model.
The algorithm enables the AV to anticipate how surrounding HDs will react to its actions. This gives the AV the possibility to deliberately influence the state of the human. Here, simply encoding the desired effect into the AV's objective function is enough to generate complex, interaction-rich behavior, without the need for hand-designed decision heuristics. Moreover, interactive behavior does not only occur when incentivized in the AV's objective function, but also emerges naturally from optimizing the AV's behavior.
However, care must be taken to prevent the AV from exploiting interactions to further optimize its own objective, thereby generating overly aggressive driving behavior. To prevent such behavior, the AV's objective is extended to also consider the costs of the HD.
As an alternative to modifying the AV's objective function, we presented a strategy to establish courtesy in the planning algorithm via additional constraints. These constraints allow a motion planner to utilize an egocentric objective function, provided that the negative impact imposed on other vehicles is limited.
The experiments demonstrated the efficacy of our algorithm and suggest that the algorithm can be used in challenging interactive driving scenarios. Further, we could achieve real-time performance even with an unoptimized proof-of-concept implementation.
References
Andersson, J.A.E. et al.: CasADi: a software framework for nonlinear optimization and optimal control. Math. Program. Comput. 11(1), 1–36 (2019). ISSN: 1867-2949, 1867-2957. https://doi.org/10.1007/s12532-018-0139-4
Bansal, S., et al.: Collaborative planning for mixed-autonomy lane merging. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, Spain, Oct. 2018, pp. 4449–4455 (2018). https://doi.org/10.1109/IROS.2018.8594197
Britzelmeier, A., Dreves, A., Gerdts, M.: Numerical solution of potential games arising in the control of cooperative automatic vehicles. In: 2019 Proceedings of the Conference on Control and Its Applications (CT). Proceedings. Society for Industrial and Applied Mathematics, Philadelphia, Jan. 1, 2019, pp. 38–45 (2019). https://doi.org/10.1137/1.9781611975758.7
Burger, C., Lauer, M.: Cooperative multiple vehicle trajectory planning using MIQP. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC). Maui, Hawaii, USA, Nov. 2018, pp. 602–607 (2018). https://doi.org/10.1109/ITSC.2018.8569776
Burger, C., Schneider, T., Lauer, M.: Interaction aware cooperative trajectory planning for lane change maneuvers in dense traffic. In: 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). Rhodes, Greece, Sept. 2020, pp. 1–8 (2020). https://doi.org/10.1109/ITSC45102.2020.9294638
Burger, C., et al.: Interaction-aware game-theoretic motion planning for automated vehicles using bi-level optimization. In: 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). Macau, China, Oct. 2022, pp. 1–6 (2022)
Burger, C., et al.: Rating cooperative driving: a scheme for behavior assessment. In: 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC). Yokohama, Japan, Oct. 2017, pp. 1–6 (2017). https://doi.org/10.1109/ITSC.2017.8317794
de Campos, G.R., Falcone, P., Sjöberg, J.: Autonomous cooperative driving: a velocity-based negotiation approach for intersection crossing. In: 16th International IEEE Conference on Intelligent Transportation Systems (ITSC 2013). The Hague, Netherlands, Oct. 2013, pp. 1456–1461 (2013). https://doi.org/10.1109/ITSC.2013.6728435
Chen, H., Kremling, H., Allgöwer, F.: Nonlinear predictive control of a benchmark CSTR. In: Proceedings of the 3rd European Control Conference, Rome, Italy, pp. 3247–3252 (1995)
Dreves, A., Gerdts, M.: A generalized nash equilibrium approach for optimal control problems of autonomous cars. Optim. Control Appl. Methods 39(1), 326–342 (2018). ISSN: 1099-1514. https://doi.org/10.1002/oca.2348. https://onlinelibrary.wiley.com/doi/abs/10.1002/oca.2348 (visited on 12/30/2022)
Evestedt, N., et al.: Interaction aware trajectory planning for merge scenarios in congested traffic situations. In: 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). Rio de Janeiro, Brazil, Nov. 2016, pp. 465–472 (2016). https://doi.org/10.1109/ITSC.2016.7795596
Fisac, J.F., et al.: Hierarchical game-theoretic planning for autonomous vehicles. Oct. 12, 2018. arXiv:1810.05766 [cs, math]. http://arxiv.org/abs/1810.05766 (visited on 12/30/2022)
Fridovich-Keil, D., et al.: Efficient iterative linear-quadratic approximations for nonlinear multi-player general-sum differential games (2020). arXiv:1909.04694 [cs, eess]. http://arxiv.org/abs/1909.04694 (visited on 12/30/2022)
Fridovich-Keil, D., et al.: Efficient iterative linear-quadratic approximations for nonlinear multi-player general-sum differential games. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). Paris, France, May 2020, pp. 1475–1481 (2020). https://doi.org/10.1109/ICRA40945.2020.9197129
Hedden, T., Zhang, J.: What do you think i think you think?: Strategic reasoning in matrix games. Cognition 85(1), 1–36 (2002). ISSN: 0010-0277. https://doi.org/10.1016/S0010-0277(02)00054-9
Hoheisel, T., Kanzow, C., Schwartz, A.: Theoretical and numerical comparison of relaxation methods for mathematical programs with complementarity constraints. Math. Program. 137(1), 257–288 (2013). ISSN: 1436-4646. https://doi.org/10.1007/s10107-011-0488-5
Hubmann, C., et al.: A belief state planner for interactive merge maneuvers in congested traffic. In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC). Maui, USA, Nov. 2018, pp. 1617– 1624 (2018). https://doi.org/10.1109/ITSC.2018.8569729
Kesting, A., Treiber, M., Helbing, D.: General lane-changing model MOBIL for car-following models. Transp. Res. Rec. J. Transp. Res. Board 1999, 86–94 (2007)
Kretzschmar, H., et al.: Socially compliant mobile robot navigation via inverse reinforcement learning. Int. J. Robot. Res. 35(11), 1289–1307 (2016). ISSN: 0278-3649, 1741-3176. https://doi.org/10.1177/0278364915619772
Le Cleac’h, S., Schwager, M., Manchester, Z.: ALGAMES: a fast augmented lagrangian solver for constrained dynamic games. Autonom. Robot. 46(1), 201–215 (2022). ISSN: 1573-7527. https://doi.org/10.1007/s10514-021-10024-7
Le Cleac’h, S., Schwager, M., Manchester, Z.: LUCIDGames: online unscented inverse dynamic games for adaptive trajectory prediction and planning. In: IEEE Robotics and Automation Letters, p. 1 (2021). ISSN: 2377-3766, 2377-3774. https://doi.org/10.1109/LRA.2021.3074880. https://ieeexplore.ieee.org/document/9410364/ (visited on 12/30/2022)
Lenz, D., Kessler, T., Knoll, A.: Tactical cooperative planning for autonomous highway driving using Monte-Carlo tree search. In: 2016 IEEE Intelligent Vehicles Symposium (IV). Gothenburg, Sweden, June 2016, pp. 447–453. https://doi.org/10.1109/IVS.2016.7535424
Liniger, A., Lygeros, J.: A noncooperative game approach to autonomous racing. IEEE Trans. Control Syst. Technol. 28(3), 884–897 (2020). ISSN: 1558-0865. https://doi.org/10.1109/TCST.2019.2895282
Ne, S., et al.: Implementing conditional inequality constraints for optimal collision avoidance. J. Aeronaut. Aerospace Engin. 06(03) (2017). ISSN: 21689792. https://doi.org/10.4172/2168-9792.1000195
Notomista, G., et al.: Enhancing game-theoretic autonomous car racing using control barrier functions. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). Paris, France: IEEE, May 2020, pp. 5393–5399 (2020). ISBN: 978-1-72817-395-5. https://doi.org/10.1109/ICRA40945.2020.9196757
Orzechowski, P.F., Burger, C., Lauer, M.: Decision-making for automated vehicles using a hierarchical behavior-based arbitration scheme. In: IEEE Intelligent Vehicles Symposium (IV). Las Vegas, NV, USA, Oct. 2020, pp. 767–774 (2020). https://doi.org/10.1109/IV47402.2020.9304723
Paden, B., et al.: A survey of motion planning and control techniques for self-driving urban vehicles. IEEE Trans. Intell. Veh. 1(1), 33–55 (2016). ISSN: 2379-8858. https://doi.org/10.1109/TIV.2016.2578706
Polack, P., et al.: The kinematic bicycle model: a consistent model for planning feasible trajectories for autonomous vehicles? In: 2017 IEEE Intelligent Vehicles Symposium (IV). Los Angeles, USA, June 2017, pp. 812– 818 (2017). https://doi.org/10.1109/IVS.2017.7995816
Sadigh, D., et al.: Information gathering actions over human internal state. In: 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Oct. 2016, pp. 66–73 (2016). https://doi.org/10.1109/IROS.2016.7759036
Sadigh, D., et al.: Planning for autonomous cars that leverage effects on human actions. In: Robotics: Science and Systems XII. Robotics: Science and Systems 2016. Robotics: Science and Systems Foundation (2016). ISBN: 978-0-9923747-2-3. https://doi.org/10.15607/RSS.2016.XII.029. http://www.roboticsproceedings.org/rss12/p29.pdf (visited on 12/30/2022)
Sadigh, D., et al.: Planning for cars that coordinate with people: leveraging effects on human actions for planning and active information gathering over human internal state. Autonom. Robot. 42(7), 1405–1426 (2018). ISSN: 1573-7527. https://doi.org/10.1007/s10514-018-9746-1. https://doi.org/10.1007/s10514-018-9746-1 (visited on 12/30/2022)
Schulz, J., et al.: Estimation of collective maneuvers through cooperative multi-agent planning. In: 2017 IEEE Intelligent Vehicles Symposium (IV). Los Angeles, USA, June 2017, pp. 624–631 (2017). https://doi.org/10.1109/IVS.2017.7995788
Sun, L., et al.: Courteous autonomous cars. In: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Madrid, Spain, Oct. 2018, pp. 663–670 (2018). https://doi.org/10.1109/IROS.2018.8593969
Trautman, P., Krause, A.: Unfreezing the robot: navigation in dense, interacting crowds. In: 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. Taipei, Taiwan, Oct. 2010, pp. 797–803 (2010). https://doi.org/10.1109/IROS.2010.5654369
Treiber, M., Hennecke, A., Helbing, D.: Congested traffic states in empirical observations and microscopic simulations. Phys. Rev. E 62(2), 1805–1824 (2000). ISSN: 1063-651X, 1095-3787. https://doi.org/10.1103/PhysRevE.62.1805
Wächter, A., Biegler, L.T.: On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106(1), 25–57 (2006). ISSN: 1436-4646. https://doi.org/10.1007/s10107-004-0559-y
Wang, M., et al.: Game theoretic planning for self-driving cars in competitive scenarios. In: Robotics: Science and Systems XV. Freiburg, Germany: Robotics: Science and Systems Foundation, June 22 (2019). ISBN: 978-0-9923747-5-4. https://doi.org/10.15607/RSS.2019.XV.048
Wang, M., et al.: Game-theoretic planning for risk-aware interactive agents. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, Las Vegas, pp. 6998–7005 (2020). ISBN: 978-1-72816-212-6. https://doi.org/10.1109/IROS45743.2020.9341137. https://ieeexplore.ieee.org/document/9341137/ (visited on 12/30/2022)
Wang, M., et al.: Game-theoretic planning for self-driving cars in multivehicle competitive scenarios. IEEE Trans. Robot. 37(4), 1313–1325 (2021). ISSN: 1552-3098, 1941-0468. https://doi.org/10.1109/TRO.2020.3047521
Williams, G., et al.: Best response model predictive control for agile interactions between autonomous ground vehicles. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). Brisbane, Australia, May 2018, pp. 2403–2410 (2018). https://doi.org/10.1109/ICRA.2018.8462831
Yoo, J.H., Langari, R.: A stackelberg game theoretic driver model for merging. In: ASME 2013 Dynamic Systems and Control Conference. Palo Alto, California, USA: American Society of Mechanical Engineers Digital Collection, Mar. 6 (2014). https://doi.org/10.1115/DSCC2013-3882
Yoo, J.H., Langari, R.: Stackelberg game based model of highway driving. In: ASME 2012 5th Annual Dynamic Systems and Control Conference Joint with the JSME 2012 11th Motion and Vibration Conference. Florida, USA: American Society of Mechanical Engineers Digital Collection, Sept. 17, 2013, pp. 499–508 (2013). https://doi.org/10.1115/DSCC2012-MOVIC2012-8703
Acknowledgements
The authors thank the German Research Foundation (DFG) for funding this work within the priority program “SPP 1835—Cooperative Interacting Automobiles” (CoInCar).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2024 The Author(s)
Burger, C., Yan, S., Burgard, W., Stiller, C. (2024). Interaction-Aware Motion Planning as a Game. In: Stiller, C., Althoff, M., Burger, C., Deml, B., Eckstein, L., Flemisch, F. (eds) Cooperatively Interacting Vehicles. Springer, Cham. https://doi.org/10.1007/978-3-031-60494-2_7
Print ISBN: 978-3-031-60493-5
Online ISBN: 978-3-031-60494-2