Abstract
Electromobility has profound economic and ecological impacts on human society. Much of the mobility sector's transformation is catalyzed by digitalization, which enables many stakeholders, such as vehicle users and infrastructure owners, to interact with each other in real time. This article presents a new concept based on deep reinforcement learning to optimize agent interactions and decision-making in a smart mobility ecosystem. The algorithm performs context-aware, constrained optimization that fulfills on-demand requests from each agent, and it can learn from the surrounding environment until the agent interactions reach an optimal equilibrium point in a given context. The methodology implements an automatic template-based approach via a continuous integration and delivery (CI/CD) framework using a GitLab runner and transfers computationally intensive tasks to a high-performance computing cluster without manual intervention.
Keywords
- Continuous integration and delivery (CI/CD)
- Electric vehicles (EVs)
- Autonomous vehicles (AVs)
- Deep reinforcement learning (DRL)
- Charging station (CS)
- Context-aware (CA)
1 Introduction
Over the past decade, the penetration of non-gasoline vehicles such as electric, hybrid, and plug-in hybrid vehicles in Germany has multiplied (Fig. 12.1). At present, there are close to 240,000 registered electric and plug-in hybrid vehicles in Germany. Moreover, according to the German Association of Energy and Water Industries (BDEW), as of March 2020, there were 27,730 publicly available charging stations serving Germany's electric and plug-in hybrid vehicle fleet (VDA, 2020). Today, the vast majority of electricity-powered vehicles in Germany are private cars. As the German government has ambitious plans to encourage electric mobility in the coming years, the penetration of electricity-powered vehicles should grow steadily.
Electrification has profound economic and ecological impacts on the future of the mobility sector. New business models arise based on the provision of different mobility services and shared economy. Moreover, we see the emergence of a complete ecosystem around mobility due to the converging trends of the physical and digital domains and the innovation and business models (Dia, 2019; Barreto et al., 2020).
This article focuses on the digital domain, which provides the playing field for the stakeholders, e.g., electric vehicles (EVs), autonomous vehicles (AVs), charging stations, the smart grid, and fleet management, to interact seamlessly. The efficiency of these complex interactions between the stakeholders determines the optimality of the outcome received by each participating stakeholder. This article presents an application based on deep reinforcement learning to optimize agent interactions and decision-making in an IoT-enabled smart mobility ecosystem. The optimization objective of agent interactions is the aggregate utility of all interacting agents, described using the weighted sum approach. The algorithm also defines a set of soft and hard constraints. The hard constraints must always be adhered to by the agents; the soft constraints, however, may be violated conditionally under extreme circumstances. The methodology adheres to the automatic template-based approach via a CI/CD framework using a GitLab runner.
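The weighted-sum objective with hard and soft constraints can be sketched as follows. This is a minimal illustration; the agent names, weights, and constraint bounds are assumptions for the example, not values from this chapter.

```python
# Sketch of the weighted-sum aggregate utility and hard/soft constraint
# handling described above. Weights and bounds are illustrative assumptions.

def aggregate_utility(utilities, weights):
    """Optimization objective: U = sum_i w_i * u_i over all agents."""
    return sum(w * u for w, u in zip(weights, utilities))

def is_feasible(action, hard_constraints, soft_constraints, emergency=False):
    """Hard constraints always hold; soft ones may be relaxed under
    extreme circumstances (the 'emergency' flag)."""
    if not all(c(action) for c in hard_constraints):
        return False
    return emergency or all(c(action) for c in soft_constraints)

# Example: a charging action judged by one hard and one soft constraint.
action = {"power_kw": 45.0}
hard = [lambda a: a["power_kw"] <= 50.0]  # charger's physical limit
soft = [lambda a: a["power_kw"] <= 40.0]  # grid operator's preferred limit

print(is_feasible(action, hard, soft))                  # False: soft limit violated
print(is_feasible(action, hard, soft, emergency=True))  # True: soft limit relaxed
print(aggregate_utility([0.8, 0.6], [0.5, 0.5]))        # EV user + grid utilities
```

The key design point is the asymmetry: violating a hard constraint always rejects an action, while soft constraints act as preferences that can be overridden in an emergency.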
Previous research work has addressed specific aspects of optimal agent interactions in a smart mobility landscape. For example, Lin et al. (2016) present a linear programming model of an optimal routing problem that takes into consideration charging station location and the cost. Chen et al. (2018) present a weighted sum multi-objective optimization model that takes into account the different preferences of the user. The mixed-integer quadratic programming model presented in this study is solved using a commercial optimizer such as CPLEX or Gurobi. Bessler and Grønbæk (2012) use a heuristic-based approach. The algorithm evaluates the optimal routing plan for EVs by considering the distance to the charging stations in the vicinity, the traffic situation, and the feasible charging patterns. However, to our understanding, the simultaneous consideration of multiple stakeholder perspectives has not been done in the past.
2 Architecture
The overall objective in this section is to describe the proposed architecture in detail, together with the continuous integration and continuous delivery (CI/CD)-based approach (Sharif et al. 2020b). We also explain how the algorithm processes contextual information to meet the necessary optimality conditions for each of the stakeholders. Four types of stakeholders are identified with the concept of smart mobility in mind: the EV end-user, the grid operator, the fleet operator, and the charging station maintainer, all of which are explained thoroughly in our previous publication (Sharif et al. 2020a). Two of them, the EV end-user and the grid operator, are the focus of the smart mobility use case in this article.
As shown on the left-hand side of Fig. 12.2 of the proposed architecture, each stakeholder provides a set of individual inputs (Xi, Xj … Xm), which are "daily travel activities, routing suggestion(s), car battery, and environment," with associated actions (Ai, Aj … Am). The rewards (Ri, Rj … Rm) are the computed output(s), such as "charging type, distance to the charging station, charging cost, etc." Furthermore, some inputs may be shared by more than one stakeholder, albeit with different priorities and constraints.
These sets of information are then processed by a deep reinforcement learning algorithm based on the Bellman equation, in which the system learns the Q-value of state s and action a: Q(s, a) must equal the instantaneous reward r acquired as a result of that action plus the Q-value of the best feasible next action a′ taken from the next state s′, multiplied by a discount factor γ, i.e., Q(s, a) = r + γ · max_a′ Q(s′, a′). The discount factor is a hyper-parameter with range γ ∈ (0, 1].
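The Bellman recursion above can be made concrete with a tabular Q-learning sketch. The toy five-state chain environment and the hyper-parameter values below are illustrative assumptions, not the chapter's smart mobility environment; the update rule, however, is the standard Q-learning form of the Bellman equation.

```python
import random

# Tabular Q-learning sketch of the Bellman update described above:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# Toy environment: a 5-state chain; reward 1 for reaching the rightmost state.

N_STATES, ACTIONS = 5, [0, 1]          # action 0: move left, 1: move right
GAMMA, ALPHA, EPS = 0.9, 0.5, 0.3      # discount in (0, 1], step size, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Deterministic transition; reward only at the rightmost (goal) state."""
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(300):                   # training episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s_next, r = step(s, a)
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s_next

# The greedy policy should learn to move right from every non-terminal state.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```

In the full system, the tabular Q would be replaced by a deep neural network approximating Q(s, a) over the high-dimensional stakeholder state space.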
We further need to decide how much weight to assign to short-term versus long-term rewards (Tang et al., 2020; Nguyen et al., 2020a; François-Lavet et al., 2018). The right-hand side of Fig. 12.2 shows a specific output generated for each environment. For example, the EV end-user acquires the best schedule and routing selection according to the needs of the car's battery and the environment, with appropriate personal convenience. Grid operators acquire electricity demand forecasts for a specific region based on charging station reservations, which decreases fluctuations in the electric load. The system continuously learns from its environment and observes state(s) by interpolating weights, etc. (Li et al., 2020; Nguyen et al., 2020b; Wang et al., 2013).
The core functionality is exposed as a component to stakeholders from other domains with distinct objectives through a self-developed middleware-as-a-service component (see the right-hand side of Fig. 12.2). This extension allows us to demonstrate the performance of our model at the urban scale, where high-dimensional data and scalable models are required. For example: "Stakeholder 'X' would like to collect information coming from EV end-users, fleet managers, charging stations, and the power grid for a certain area. Use an algorithm to process this data over high-performance computing nodes and suggest the best trade-off for all actors in the ecosystem." To fulfill such a user scenario, we developed our electric-vehicle middleware, which supports a smart mobility use case based on the interaction between the stakeholders depicted in Fig. 12.2, where stakeholders from different domains exchange information according to their objectives. The middleware-as-a-service utility provides a set of services for each application (app. SAx, SAy, SAz, etc.) to interface with high-performance computing nodes such as Nx, Ny, and so on. Each of these services requires high-performance computing nodes to execute its service request, and the results calculated by the algorithm are finally returned to the respective app (Sharif et al., 2017; Amogh Vardhan et al., 2019; Espeholt et al., 2018; Jiang et al., 2019). The initial objective of the algorithm is to find the optimal trade-off scenario for EV charging by considering the conflicting interests of multiple stakeholders acting in the smart mobility ecosystem (Fig. 12.2).
Suppose that the City Council of Stuttgart, Germany, would like to organize an event in which people from all over the country are expected to participate. We continue with the same set of stakeholders. Due to the popularity of the electric car, the event organizer expects that many people from neighboring cities will participate via personal transportation, often EVs. The event organizer needs to distribute resources optimally in terms of mobility, which is a very challenging task (Alyousef et al., 2018). To meet this expectation, organizers require frequent and timely updates of the resource distribution.
We propose an automated mobility service that continuously runs the algorithm(s) using a GitLab CI/CD runner, offloading computation-intensive tasks to a high-performance computing cluster to find the best trade-off for all actors in the ecosystem. In the next section, we present a user scenario that demonstrates the proposed methodology.
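A pipeline of this kind could be expressed as a `.gitlab-ci.yml` along the following lines. This is only a sketch: the job names, the `hpc` runner tag, the image name, and the Slurm submit script are illustrative assumptions, not the authors' actual pipeline configuration.

```yaml
# Illustrative GitLab CI/CD sketch for offloading training to an HPC cluster.
# Job names, the "hpc" runner tag, and job_train.sh are hypothetical.
stages:
  - build
  - train

build-image:
  stage: build
  script:
    - docker build -t ev-optimizer:latest .

train-on-hpc:
  stage: train
  tags:
    - hpc                          # routed to a GitLab runner on the HPC login node
  script:
    - sbatch --wait job_train.sh   # submit the DRL training job and block until done
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
```

The template-based aspect mentioned in the abstract would correspond to reusing such job definitions across projects, so that each new optimization task inherits the same build-then-train pipeline without manual intervention.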
3 User Scenario
Tina lives in Tübingen and owns an electric car. She drives her car to Stuttgart regularly, where she spends much of her time working and networking. Once, on the way to Stuttgart, she notices that her battery is low on charge, and she would like to locate a charging station near her current location P(x, y). She finds the charging stations CS1(x′, y′), CS2(x′, y′), CS3(x′, y′), and CS4(x′, y′), each of which offers different charging options (i.e., fast charging or slow charging) at different costs. The prices of charging the EV (in EUR/kWh) at the four charging stations are a1(t)–a4(t) (see Fig. 12.3). Note that charging prices are given as functions of time to accommodate time-varying electricity prices. To locate the optimal charging location that matches her requirements, she uses the algorithm proposed in this paper. Optimality is a perception that depends entirely on the user's priorities among a set of conflicting interests.
In this example, Tina assigns high priority to requirements such as charging station availability, charging price, distance to the charging station, charging time, and potential service disruptions. Once Tina chooses her priorities, the algorithm processes her requirements and recommends the most appropriate charging location. Moreover, the algorithm can compare the charging station of choice with the other charging stations in the vicinity. The application allows her to make an informed decision about the best charging station that fits her needs and also gives her the option to reserve a charging point ahead of time to confirm availability. Once the reservation is complete, the application ensures that the charging point remains available at the specified time (see Fig. 12.4).
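A simple way to combine Tina's priorities is a weighted sum over normalized criteria. The station attributes and priority weights below are hypothetical numbers invented for illustration; only the station names CS1–CS4 come from the scenario.

```python
# Hypothetical multi-criteria ranking of the stations CS1-CS4 from the
# scenario. All attribute values and weights are illustrative assumptions.

stations = {
    "CS1": {"price_eur_kwh": 0.45, "distance_km": 2.0, "wait_min": 0},
    "CS2": {"price_eur_kwh": 0.39, "distance_km": 5.5, "wait_min": 10},
    "CS3": {"price_eur_kwh": 0.52, "distance_km": 1.2, "wait_min": 0},
    "CS4": {"price_eur_kwh": 0.35, "distance_km": 8.0, "wait_min": 25},
}

# Tina's priorities: lower is better for every criterion here.
weights = {"price_eur_kwh": 0.5, "distance_km": 0.3, "wait_min": 0.2}

def score(attrs):
    """Weighted sum of min-max normalized criteria (lower score = better)."""
    total = 0.0
    for key, w in weights.items():
        vals = [s[key] for s in stations.values()]
        lo, hi = min(vals), max(vals)
        total += w * (attrs[key] - lo) / (hi - lo)
    return total

best = min(stations, key=lambda name: score(stations[name]))
print(best)  # the station with the best trade-off under these weights
```

Changing the weights changes the recommendation, which is exactly the "optimality is a perception" point made above: the same four stations rank differently for a price-sensitive user than for a time-sensitive one.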
On the other hand, the local grid operator monitors the electricity demand forecast, i.e., Objy in Fig. 12.3, variation due to EV charging requirements. For example, Wolfgang works for the local grid operator and is responsible for the uninterrupted electricity supply in his control area. Due to the rapid adoption of EVs and many public charging stations set up to serve those vehicles, he knows that there can be peak times when electricity demand can suddenly increase. He has several strategies to deal with such peak demands; for example, he can activate reserve power supplies or activate a demand response plan. However, without an accurate forecast or a warning in advance, the activation of demand response or reserve power can be more expensive.
The application can provide the grid operator with a forecast of the electricity demand due to vehicle charging for the next 15–60 minutes. Note that the forecast is likely to be more accurate for a 15-minute horizon than for a 60-minute horizon due to uncertainty, which the application takes into account. Based on the grid operator's constraints, the application can also highlight where potential demand-supply bottlenecks may occur. This information helps Wolfgang plan the best course of action in advance to ensure the reliability of the electricity supply. With our application's service, Wolfgang also has an additional action he can take to mitigate supply bottlenecks: advising charging station owners to interrupt their services for incoming service requests. In other words, the service availability status of a charging station can be updated on request from the grid operator, which serves as a "proactive" demand response strategy. Activation of demand response, whether passive or proactive, incurs a cost to the grid operator and a loss of utility to vehicle owners whose services are denied.
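One simple source for such a short-horizon forecast is the charging reservations themselves: each confirmed reservation implies a known power draw over a known interval. The sketch below aggregates hypothetical reservations into average power per 15-minute bucket; the reservation data and bucket scheme are assumptions, not the chapter's forecasting model.

```python
from collections import defaultdict

# Sketch: aggregate confirmed charging reservations into a short-horizon
# demand forecast per 15-minute bucket. Reservation data are hypothetical.

reservations = [
    # (start_minute_from_now, duration_min, power_kw)
    (5, 30, 50.0),    # fast charge starting in 5 minutes
    (20, 45, 11.0),   # slow charge starting in 20 minutes
    (40, 15, 50.0),
]

def demand_forecast(reservations, horizon_min=60, bucket_min=15):
    """Average power (kW) expected in each bucket of the horizon."""
    buckets = defaultdict(float)
    for start, duration, power in reservations:
        # spread each reservation's power over the minutes it is active
        for minute in range(start, min(start + duration, horizon_min)):
            buckets[minute // bucket_min] += power / bucket_min
    return [round(buckets[b], 1) for b in range(horizon_min // bucket_min)]

print(demand_forecast(reservations))  # kW per 15-min bucket over the next hour
```

A real forecast would add a model of unreserved (walk-in) demand and widen its uncertainty bands for the later buckets, matching the observation above that the 15-minute horizon is more accurate than the 60-minute one.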
4 Simulation Environment
The system evaluation is assessed with respect to an EV end-user objective, such as optimal cost, and a grid operator objective, such as energy demand or charging station availability, on behalf of the event organizer, i.e., the City Council. The event organizer would like to take a closer look at how resource demand and supply are distributed optimally among the event's participants.
Revisiting the end-user from the use case example (see Sect. 12.3), Tina’s car has a usable battery capacity of 120 Ah, and it is compatible with both slow- (maximum charging rate of 11 kW) and fast-charging (maximum charging rate of 50 kW) connectors. When Tina decides to look for a charging station, the state of charge of the battery has already degraded to 10%.
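A back-of-the-envelope check makes these simulation parameters concrete. Converting the 120 Ah capacity into energy requires a pack voltage, which the text does not state; the 400 V nominal value below is an assumption, as is the 80% target state of charge.

```python
# Rough charging-time estimate for the simulation parameters above.
# The 400 V nominal pack voltage and the 80% charging target are
# assumptions, not values given in the text.

CAPACITY_AH = 120.0
NOMINAL_V = 400.0                                  # assumed pack voltage
capacity_kwh = CAPACITY_AH * NOMINAL_V / 1000.0    # 48 kWh under this assumption

soc_now, soc_target = 0.10, 0.80
energy_needed_kwh = (soc_target - soc_now) * capacity_kwh

for label, rate_kw in [("slow (11 kW)", 11.0), ("fast (50 kW)", 50.0)]:
    hours = energy_needed_kwh / rate_kw            # ignores charging-rate taper
    print(f"{label}: {hours:.2f} h")
```

Under these assumptions, the fast charger is roughly 4.5 times quicker, which is why the slow/fast choice is one of the conflicting criteria the optimizer weighs against price and distance.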
Figure 12.4 shows a graphical overview of the use case. The EV drives past four different charging stations CS1–CS4. Tina may decide to check for a charging location at any random point along the route denoted by P1–P3, and the optimizer yields a different objective value depending on the context related to that point. Figure 12.5 shows the optimal objective function value calculated for the EV at different locations on the driving route. The different colors represent the different charging stations. In this example, the optimal objective value is also the minimum cost, which, however, is not always true when multiple contradicting concerns and end-user priorities are taken into consideration when evaluating the objective function.
5 Conclusion and Future Work
So far, we have considered only one scenario, with a short time span and two stakeholders, i.e., the EV end-user and the grid operator, in our simulation environment. Moreover, the use case is simulated in a static environment that anticipates the maturity of the system with multiple stakeholder participation. Therefore, the current simulation environment does not include time-varying contexts such as variable pricing and power distribution forecasts. From the software architecture point of view, the service app middleware layer has already been presented in another paper.
The dynamic coupling between the optimal charging resource distribution and electricity network models enables us to define the network capacity as a finite resource in the resource distribution algorithm and observe the state and impacts of the local distribution network during the smart charging process. This future extension enables us to simulate optimal EV charging resource distribution scenarios in combination with other distributed loads and generators in a city. Furthermore, this will be a significant step forward in the field of integrated urban energy system planning.
References
Alyousef, A., Danner, D., Kupzog, F., and de Meer, H. (2018). Design and validation of a smart charging algorithm for power quality control in electrical distribution systems. In Proceedings of the Ninth International Conference on Future Energy Systems, e-Energy ‘18, page 380–382, New York, NY, USA. Association for Computing Machinery.
Amogh Vardhan, K., Jakaraddi, M. G. D., Shetty, J., Chala, A., and Camper, D. (2019). Design and development of IoT plugin for HPCC Systems. In 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA), pages 158–162.
Barreto, L., Amaral, A., and Baltazar, S. (2020). Mobility in the Era of Digitalization: Thinking Mobility as a Service (MaaS). In Jardim-Goncalves, R., Sgurev, V., Jotsov, V., and Kacprzyk, J., editors, Intelligent Systems: Theory, Research and Innovation in Applications, pages 275–293. Springer International Publishing, Cham.
Bessler, S. and Grønbæk, J. (2012). Routing EV users towards an optimal charging plan. World Electric Vehicle Journal, 5(3):688–695.
Chen, T., Zhang, B., Pourbabak, H., Kavousi-Fard, A., and Su, W. (2018). Optimal routing and charging of an electric vehicle fleet for high-efficiency dynamic transit systems. IEEE Transactions on Smart Grid, 9(4):3563–3572.
Dia, H. (2019). Rethinking Urban Mobility: Unlocking the Benefits of Vehicle Electrification, pages 83–98. Springer Singapore, Singapore.
Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., Legg, S., and Kavukcuoglu, K. (2018). IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures.
François-Lavet, V., Henderson, P., Islam, R., Bellemare, M. G., and Pineau, J. (2018). An introduction to deep reinforcement learning. Foundations and Trends® in Machine Learning, 11(3-4):219–354.
Jiang, Z., Gao, W., Wang, L., Xiong, X., Zhang, Y., Wen, X., Luo, C., Ye, H., Zhang, Y., Feng, S., Li, K., Xu, W., and Zhan, J. (2019). HPC AI500: A benchmark suite for HPC AI systems.
Kraftfahrt-Bundesamt (2020). Fahrzeuge.
Li, K., Zhang, T., and Wang, R. (2020). Deep reinforcement learning for multiobjective optimization. IEEE Transactions on Cybernetics, page 1–12.
Lin, J., Zhou, W., and Wolfson, O. (2016). Electric vehicle routing problem. Transportation Research Procedia, 12:508–521. Tenth International Conference on City Logistics 17-19 June 2015, Tenerife, Spain.
Nguyen, N. D., Nguyen, T. T., Nguyen, H., and Nahavandi, S. (2020a). Review, analyze, and design a comprehensive deep reinforcement learning framework. CoRR, abs/2002.11883.
Nguyen, T. T., Nguyen, N. D., Vamplew, P., Nahavandi, S., Dazeley, R., and Lim, C. P.(2020b). A multi-objective deep reinforcement learning framework. Engineering Applications of Artificial Intelligence, 96:103915
Sharif, M., Heendeniya, C. B., Muhammad, A. S., and Lückemeyer, G. (2020a). Context-aware optimal charging distribution using deep reinforcement learning. In Proceedings of the 2020 the 4th International Conference on Big Data and Internet of Things, BDIOT 2020, page 64–68, New York, NY, USA. Association for Computing Machinery.
Sharif, M., Janto, S., and Lueckemeyer, G. (2020b). Coaas: Continuous integration and delivery framework for hpc using gitlab-runner. In Proceedings of the 2020 the 4th International Conference on Big Data and Internet of Things, BDIOT2020, page 54–58, New York, NY, USA. Association for Computing Machinery.
Sharif, M., Mercelis, S., Van Den Bergh, W., and Hellinckx, P. (2017). Towards real-time smart road construction: Efficient process management through the implementation of internet of things. In Proceedings of the International Conference on Big Data and Internet of Thing, BDIOT2017, page 174–180, New York, NY, USA. Association for Computing Machinery.
Tang, Y., Agrawal, S., and Faenza, Y. (2020). Reinforcement learning for integer programming: Learning to cut.
VDA (2020). Electric Mobility: Electric Mobility in Germany.
Wang, G., Xu, Z., Wen, F., and Wong, K. (2013). Traffic-constrained multiobjective planning of electric-vehicle charging stations. IEEE Transactions on Power Delivery, 28(4):2363–2372.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
Cite this chapter
Sharif, M., Heendeniya, C.B., Lückemeyer, G. (2022). ARaaS: Context-Aware Optimal Charging Distribution Using Deep Reinforcement Learning. In: Coors, V., Pietruschka, D., Zeitler, B. (eds) iCity. Transformative Research for the Livable, Intelligent, and Sustainable City. Springer, Cham. https://doi.org/10.1007/978-3-030-92096-8_12
Print ISBN: 978-3-030-92095-1
Online ISBN: 978-3-030-92096-8