Abstract
Process mining includes the automated discovery of processes from event logs. Based on observed events (e.g., activities being executed or messages being exchanged) a process model is constructed. One of the essential problems in process mining is that one cannot assume to have seen all possible behavior. At best, one has seen a representative subset. Therefore, classical synthesis techniques are not suitable as they aim at finding a model that is able to exactly reproduce the log. Existing process mining techniques try to avoid such “overfitting” by generalizing the model to allow for more behavior. This generalization is often driven by the representation language and very crude assumptions about completeness. As a result, parts of the model are “overfitting” (allow only for what has actually been observed) while other parts may be “underfitting” (allow for much more behavior without strong support for it). None of the existing techniques enables the user to control the balance between “overfitting” and “underfitting”. To address this, we propose a two-step approach. First, using a configurable approach, a transition system is constructed. Then, using the “theory of regions”, the model is synthesized. The approach has been implemented in the context of ProM and overcomes many of the limitations of traditional approaches.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
van der Aalst W.M.P.: The application of Petri nets to workflow management. J. Circ. Syst. Comput. 8(1), 21–66 (1998)
van der Aalst, W.M.P., van Dongen, B.F., Günther, C.W., Mans, R.S., Alves de Medeiros, A.K., Rozinat, A., Rubin, V., Song, M., Verbeek, H.M.W., Weijters, A.J.M.M.: ProM 4.0: comprehensive support for real process analysis. In: Kleijn, J., Yakovlev, A., (eds.) Application and Theory of Petri Nets and Other Models of Concurrency (ICATPN 2007). Lecture Notes in Computer Science, vol. 4546, pp. 484–494. Springer, Berlin (2007)
van der Aalst W.M.P., Alves de Medeiros A.K., Weijters A.J.M.M.: Genetic process mining. In: Ciardo, G., Darondeau, P. (eds) Applications and Theory of Petri Nets 2005. Lecture Notes in Computer Science, vol. 3536, pp. 48–69. Springer, Berlin (2005)
van der Aalst W.M.P., Reijers H.A., Weijters A.J.M.M., van Dongen B.F., Alves de Medeiros A.K., Song M., Verbeek H.M.W.: Business process mining: an industrial application. Inf. Syst. 32(5), 713–732 (2007)
van der Aalst, W.M.P., Rubin, V., van Dongen, B.F., Kindler, E., Günther, C.W.: Process mining: a two-step approach using transition systems and regions. BPM Center Report BPM-06-30, BPMcenter.org (2006)
van der Aalst W.M.P., van Dongen B.F., Herbst J., Maruster L., Schimm G., Weijters A.J.M.M.: Workflow mining: a survey of issues and approaches. Data Knowl. Eng. 47(2), 237–267 (2003)
van der Aalst W.M.P., Weijters A.J.M.M., Maruster L.: Workflow mining: discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9), 1128–1142 (2004)
Agrawal, R., Gunopulos, D., Leymann, F.: Mining process models from workflow logs. In: Proceedings of the Sixth International Conference on Extending Database Technology, pp. 469–483 (1998)
Badouel E., Bernardinello L., Darondeau P.: The synthesis problem for elementary net systems is NP-complete. Theor. Comput. Sci. 186(1–2), 107–134 (1997)
Badouel E., Darondeau P.: Theory of regions. In: Reisig, W., Rozenberg, G. (eds) Lectures on Petri Nets I: Basic Models. Lecture Notes in Computer Science, vol. 1491, pp. 529–586. Springer, Berlin (1998)
Bergenthum R., Desel J., Lorenz R., Mauser S.: Process mining based on regions of languages. In: Alonso, G., Dadam, P., Rosemann, M. (eds) International Conference on Business Process Management (BPM 2007). Lecture Notes in Computer Science, vol. 4714, pp. 375–383. Springer, Berlin (2007)
Cook J.E., Wolf A.L.: Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7(3), 215–249 (1998)
Cortadella J., Kishinevsky M., Kondratyev A., Lavagno L., Yakovlev A.: Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers. IEICE Trans. Inf. Syst. E 80-D(3), 315–325 (1997)
Cortadella, J., Kishinevsky, M., Lavagno, L., Yakovlev, A.: Synthesizing Petri Nets from state-based models. In: Proceedings of the 1995 IEEE/ACM International Conference on Computer-Aided Design (ICCAD ’95), pp. 164–171. IEEE Computer Society, Los Alamitos (1995)
Cortadella J., Kishinevsky M., Lavagno L., Yakovlev A.: Deriving Petri nets from finite transition systems. IEEE Trans. Comput. 47(8), 859–882 (1998)
Datta A.: Automating the discovery of As-Is business process models: probabilistic and algorithmic approaches. Inf. Syst. Res. 9(3), 275–301 (1998)
Desel, J., Esparza, J.: Free choice Petri nets. In: Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, Cambridge (1995)
Desel J., Reisig W.: The synthesis problem of Petri nets. Acta Inf. 33(4), 297–315 (1996)
Desel, J., Reisig, W., Rozenberg, G.: (eds) Lectures on Concurrency and Petri Nets. Lecture Notes in Computer Science, vol. 3098, Springer-Verlag, Berlin (2004)
van Dongen B.F., van der Aalst W.M.P.: Multi-phase process mining: building instance graphs. In: Atzeni, P., Chu, W., Lu, H., Zhou, S., Ling, T.W. (eds) International Conference on Conceptual Modeling (ER 2004). Lecture Notes in Computer Science, vol. 3288., pp. 362–376. Springer, Berlin (2004)
van Dongen, B.F., van der Aalst, W.M.P.: Multi-phase mining: aggregating instances graphs into EPCs and Petri nets. In: Marinescu, D. (ed.) Proceedings of the Second International Workshop on Applications of Petri Nets to Coordination, Workflow and Business Process Management, pp. 35–58. Florida International University, Miami, Florida, USA (2005)
Dumas M., van der Aalst W.M.P., ter Hofstede A.H.M.: Process-Aware Information Systems: Bridging People and Software through Process Technology. Wiley, London (2005)
Ehrenfeucht A., Rozenberg G.: Partial (Set) 2-structures—part 1 and part 2. Acta Inf. 27(4), 315–368 (1989)
Grigori D., Casati F., Castellanos M., Dayal U., Sayal M., Shan M.C.: Business process intelligence. Comput. Ind. 53(3), 321–343 (2004)
Günther C., van der Aalst W.M.P.: A generic import framework for process event logs. In: Eder, J., Dustdar, S. (eds) Business Process Management Workshops, Workshop on Business Process Intelligence (BPI 2006). Lecture Notes in Computer Science, vol. 4103, pp. 81–92. Springer, Berlin (2006)
Herbst, J.: A machine learning approach to workflow management. In: Proceedings 11th European Conference on Machine Learning. Lecture Notes in Computer Science, vol. 1810, pp. 183–194. Springer, Berlin (2000)
IDS Scheer, ARIS Process Performance Manager (ARIS PPM): Measure, Analyze and Optimize Your Business Process Performance (whitepaper). IDS Scheer, Saarbruecken, Gemany. http://www.ids-scheer.com (2002)
Kindler E., Rubin V., Schäfer W.: Process mining and Petri net synthesis. In: Eder, J., Dustdar, S. (eds) Business Process Management Workshops. Lecture Notes in Computer Science, vol 4103., pp. 105–116. Springer, Berlin (2006)
Lorenz R., Bergenthum R., Desel J., Mauser S.: Synthesis of Petri nets from finite partial languages. In: Basten, T., Juhás, G., Shukla, S.K. (eds) International Conference on Application of Concurrency to System Design (ACSD 2007)., pp. 157–166. IEEE Computer Society, Los Alamitos (2007)
Lorenz, R., Juh’as, G.: How to synthesize nets from languages: a survey. In: Henderson, S.G., Biller, B., Hsieh, M., Shortle, J., Tew, J.D., Barton, R.R. (eds.) Proceedings of the Winter simulation Conference (WSC 2007), pp. 637–647. IEEE Computer Society, Los Alamitos (2007)
de Medeiros, A.K.A.: Genetic Process Mining. PhD thesis, Eindhoven University of Technology, Eindhoven (2006)
de Medeiros A.K.A., van der Aalst W.M.P., Weijters A.J.M.M.: Workflow mining: current status and future directions. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. Lecture Notes in Computer Science, vol. 2888, pp. 389–406. Springer, Berlin (2003)
zur Mühlen, M., Rosemann, M.: Workflow-based process monitoring and controlling—technical and organizational issues. In: Sprague, R. (ed.) Proceedings of the 33rd Hawaii International Conference on System Science (HICSS-33), pp. 1–10. IEEE Computer Society Press, Los Alamitos (2000)
Reisig, W., Rozenberg, G. (eds): Lectures on Petri Nets I: Basic Models Lecture Notes in Computer Science, vol. 1491. Springer-Verlag, Berlin (1998)
Rozinat A., van der Aalst W.M.P.: Conformance testing: measuring the fit and appropriateness of event logs and process models. In: Bussler, C. et al. (eds) BPM 2005 Workshops (Workshop on Business Process Intelligence). Lecture Notes in Computer Science, vol. 3812, pp. 163–176. Springer, Berlin (2006)
Rozinat A., van der Aalst W.M.P.: Decision mining in ProM. In: Dustdar, S., Faideiro, J.L., Sheth, A. (eds) International Conference on Business Process Management (BPM 2006). Lecture Notes in Computer Science, vol. 4102, pp. 420–425. Springer, Berlin (2006)
Rozinat A., van der Aalst W.M.P.: Conformance checking of processes based on monitoring real behavior. Inf. Syst. 33(1), 64–95 (2008)
Sayal M., Casati, F., Dayal, U., Shan, M.C.: Business process cockpit. In: Proceedings of 28th International Conference on Very Large Data Bases (VLDBG’02), pp. 880–883. Morgan Kaufmann, USA (2002)
TIBCO. TIBCO Staffware Process Monitor (SPM). http://www.tibco.com, 2005
Weijters A.J.M.M., van der Aalst W.M.P.: Rediscovering workflow models from event-based data using little thumb. Integr. Comput. Aided Eng. 10(2), 151–162 (2003)
Wen L., van der Aalst W.M.P., Wang J., Sun J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Discov. 15(2), 145–180 (2007)
van der Werf, J.M.E.M., van Dongen, B.F., Hurkens, C.A.J., Serebrenik, A.: Process discovery using integer linear programming. In: van Hee, K., Valk, R., (eds.) Proceedings of the 29th International Conference on Applications and Theory of Petri Nets (Petri Nets 2008). Lecture Notes in Computer Science, vol. 5062 , pp. 368–387. Springer, Berlin (2008)
Acknowledgments
This research is supported by EIT, NWO-EW, and the Technology Foundation STW. Moreover, we would like to thank the many people involved in the development of ProM.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution,and reproduction in any medium, provided the original author(s) and source are credited.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Prof. August-Wilhelm Scheer.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
van der Aalst, W.M.P., Rubin, V., Verbeek, H.M.W. et al. Process mining: a two-step approach to balance between underfitting and overfitting. Softw Syst Model 9, 87–111 (2010). https://doi.org/10.1007/s10270-008-0106-z
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10270-008-0106-z