Abstract
We consider the verification of multiple expected reward objectives at once on Markov decision processes (MDPs). This enables a trade-off analysis among multiple objectives by obtaining a Pareto front. We focus on strategies that are easy to employ and implement. That is, strategies that are pure (no randomization) and have bounded memory. We show that checking whether a point is achievable by a pure stationary strategy is NP-complete, even for two objectives, and we provide an MILP encoding to solve the corresponding problem. The bounded memory case is treated by a product construction. Experimental results using Storm and Gurobi show the feasibility of our algorithms.
currently affiliated with Vrije Universiteit Brussel.
Research partially supported by F.R.S.-FNRS Grant n\(^{\circ }\) F.4520.18 (ManySynth). Mickael Randour is an F.R.S.-FNRS Research Associate.
Chapter PDF
Similar content being viewed by others
References
Baier, C., Daum, M., Dubslaff, C., Klein, J., Klüppelholz, S.: Energy-utility quantiles. In: NASA Formal Methods, NFM. pp. 285–299 (2014). https://doi.org/10.1007/978-3-319-06200-6_24
Baier, C., Dubslaff, C., Klüppelholz, S.: Trade-off analysis meets probabilistic model checking. In: CSL-LICS. pp. 1:1–1:10. ACM (2014)
Baier, C., Hermanns, H., Katoen, J.: The 10, 000 facets of MDP model checking. In: Computing and Software Science, LNCS, vol. 10000, pp. 420–451. Springer (2019)
Baier, C., Katoen, J.P.: Principles of model checking. MIT Press (2008)
Baier, C., Klein, J., Leuschner, L., Parker, D., Wunderlich, S.: Ensuring the reliability of your model checker: Interval iteration for Markov decision processes. In: CAV (1). LNCS, vol. 10426, pp. 160–180. Springer (2017)
Barrett, L., Narayanan, S.: Learning all optimal policies with multiple criteria. In: (ICML). pp. 41–47 (2008)
Benini, L., Bogliolo, A., Paleologo, G.A., De Micheli, G.: Policy optimization for dynamic power management. Trans. Comp.-Aided Des. Integ. Cir. Sys. 18(6), 813–833 (2006). https://doi.org/10.1109/43.766730
Berthon, R., Randour, M., Raskin, J.: Threshold constraints with guarantees for parity objectives in Markov decision processes. In: ICALP. LIPIcs, vol. 80, pp. 121:1–121:15. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017)
Bouyer, P., González, M., Markey, N., Randour, M.: Multi-weighted Markov decision processes with reachability objectives. In: GandALF. EPTCS, vol. 277, pp. 250–264 (2018)
Bruno, J.L., Downey, P.J., Frederickson, G.N.: Sequencing tasks with exponential service times to minimize the expected flow time or makespan. J. ACM 28(1), 100–113 (1981). https://doi.org/10.1145/322234.322242
Bruyère, V., Filiot, E., Randour, M., Raskin, J.: Meet your expectations with guarantees: Beyond worst-case synthesis in quantitative games. Inf. Comput. 254, 259–295 (2017)
Chatterjee, K., de Alfaro, L., Henzinger, T.A.: Trading memory for randomness. In: QEST. pp. 206–217. IEEE Computer Society (2004)
Chatterjee, K., Kretínská, Z., Kretínský, J.: Unifying two views on multiple mean-payoff objectives in markov decision processes. LMCS 13(2) (2017)
Chatterjee, K., Majumdar, R., Henzinger, T.A.: Markov decision processes with multiple objectives. In: STACS. LNCS, vol. 3884, pp. 325–336. Springer (2006)
Chen, T., Kwiatkowska, M.Z., Parker, D., Simaitis, A.: Verifying team formation protocols with probabilistic model checking. In: CLIMA. pp. 190–207 (2011)
Dehnert, C., Junges, S., Katoen, J.P., Volk, M.: A Storm is coming: A modern probabilistic model checker. In: CAV. LNCS, vol. 10427. Springer (2017)
Delgrange, F., Katoen, J.P., Quatmann, T., Randour, M.: Simple strategies in multi-objective MDPs (technical report). CoRR abs//1910.11024 (2019), http://arxiv.org/abs/1910.11024
Delgrange, F., Katoen, J.P., Quatmann, T., Randour, M.: Evaluated artifact for this paper. figshare (2020). https://doi.org/10.6084/m9.figshare.11569485
von Essen, C., Giannakopoulou, D.: Probabilistic verification and synthesis of the next generation airborne collision avoidance system. STTT 18(2), 227–243 (2016)
Etessami, K., Kwiatkowska, M.Z., Vardi, M.Y., Yannakakis, M.: Multi-objective model checking of Markov decision processes. Logical Methods in Computer Science 4(4) (2008). https://doi.org/10.2168/LMCS-4(4:8)2008
Feng, L., Wiltsche, C., Humphrey, L.R., Topcu, U.: Controller synthesis for autonomous systems interacting with human operators. In: ICCPS. pp. 70–79. ACM (2015)
Forejt, V., Kwiatkowska, M.Z., Norman, G., Parker, D.: Automated verification techniques for probabilistic systems. In: SFM. LNCS, vol. 6659, pp.53–113. Springer (2011)
Forejt, V., Kwiatkowska, M.Z., Norman, G., Parker, D., Qu, H.: Quantitative multi-objective verification for probabilistic systems. In: TACAS. LNCS, vol. 6605, pp. 112–127. Springer (2011)
Forejt, V., Kwiatkowska, M.Z., Parker, D.: Pareto curves for probabilistic model checking. In: ATVA. LNCS, vol. 7561, pp. 317–332. Springer (2012)
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA (1979)
Gleixner, A., Bastubbe, M., Eifler, L., Gally, T., Gamrath, G., Gottwald, R.L., Hendel, G., Hojny, C., Koch, T., Lübbecke, M.E., Maher, S.J., Miltenberger, M., Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D., Schlösser, F., Schubert, C., Serrano, F., Shinano, Y., Viernickel, J.M., Walter, M., Wegscheider, F., Witt, J.T., Witzig, J.: The SCIP Optimization Suite 6.0. Technical report, Optimization Online (July 2018), http://www.optimization-online.org/DB_HTML/2018/07/6692.html
Gurobi Optimization, L.: Gurobi optimizer reference manual (2019), http://www.gurobi.com
Hartmanns, A., Junges, S., Katoen, J., Quatmann, T.: Multi-cost bounded reachability in MDP. In: TACAS (2). LNCS, vol. 10806, pp. 320–339. Springer (2018)
Junges, S., Jansen, N., Wimmer, R., Quatmann, T., Winterer, L., Katoen, J., Becker, B.: Finite-state controllers of POMDPs using parameter synthesis. In: UAI. pp. 519–529. AUAI Press (2018)
Kwiatkowska, M., Norman, G., Parker, D.: PRISM 4.0: Verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) Proc. 23rd International Conference on Computer Aided Verification (CAV’11). LNCS, vol. 6806, pp. 585–591. Springer (2011)
Kwiatkowska, M.Z., Norman, G., Parker, D.: The PRISM benchmark suite. In: QEST. pp. 203–204 (2012). https://doi.org/10.1109/QEST.2012.14
Lacerda, B., Parker, D., Hawes, N.: Multi-objective policy generation for mobile robots under probabilistic time-bounded guarantees. In: ICAPS. pp. 504–512. AAAI Press (2017)
Lizotte, D.J., Bowling, M., Murphy, S.A.: Linear fitted-Q iteration with multiple reward functions. J. Mach. Learn. Res. 13, 3253–3295 (2012)
Perny, P., Weng, P.: On finding compromise solutions in multiobjective Markov decision processes. In: ECAI. FAIA, vol. 215, pp. 969–970. IOS Press (2010)
Pia, A.D., Dey, S.S., Molinaro, M.: Mixed-integer quadratic programming is in NP. Math. Program. 162(1-2), 225–240 (2017)
Puterman, M.L.: Markov Decision Processes. John Wiley and Sons (1994)
Qiu, Q., Wu, Q., Pedram, M.: Stochastic modeling of a power-managed system: Construction and optimization. In: ISLPED. pp. 194–199. ACM (1999)
Quatmann, T., Junges, S., Katoen, J.: Markov automata with multiple objectives. In: CAV (1). LNCS, vol. 10426, pp. 140–159. Springer (2017)
Randour, M., Raskin, J., Sankur, O.: Variations on the stochastic shortest path problem. In: VMCAI. Lecture Notes in Computer Science, vol. 8931, pp.1–18. Springer (2015)
Randour, M., Raskin, J., Sankur, O.: Percentile queries in multi-dimensional Markov decision processes. FMSD 50(2-3), 207–248 (2017)
Roijers, D.M., Vamplew, P., Whiteson, S., Dazeley, R.: A survey of multi-objective sequential decision-making. JAIR 48, 67–113 (2013)
Scheftelowitsch, D., Buchholz, P., Hashemi, V., Hermanns, H.: Multi-objective approaches to Markov decision processes with uncertain transition parameters. In: VALUETOOLS. pp. 44–51. ACM (2017)
Srinivasan, M.: Nondeterministic polling systems. Management Science 37(6), 667–681 (1991). https://doi.org/10.1287/mnsc.37.6.667
Wiering, M.A., de Jong, E.D.: Computing optimal stationary policies for multi-objective Markov decision processes. In: ADPRL. pp. 158–165 (2007). https://doi.org/10.1109/ADPRL.2007.368183
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2020 The Author(s)
About this paper
Cite this paper
Delgrange, F., Katoen, JP., Quatmann, T., Randour, M. (2020). Simple Strategies in Multi-Objective MDPs. In: Biere, A., Parker, D. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2020. Lecture Notes in Computer Science(), vol 12078. Springer, Cham. https://doi.org/10.1007/978-3-030-45190-5_19
Download citation
DOI: https://doi.org/10.1007/978-3-030-45190-5_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45189-9
Online ISBN: 978-3-030-45190-5
eBook Packages: Computer ScienceComputer Science (R0)