Abstract
Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse them in further research or to reproduce them to verify the claims made. In this paper, we present a collaboration framework designed to share machine learning experiments easily with the community and to organize them automatically in public databases. This enables immediate reuse of experiments for subsequent, possibly much broader, investigation and offers faster and more thorough analysis based on a large set of varied results. We describe how we designed such an experiment database, currently holding over 650,000 classification experiments, and demonstrate its use by answering a wide range of interesting research questions and by verifying a number of recent studies.
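The kind of analysis the abstract describes — querying a shared store of experiment results rather than rerunning everything — can be illustrated with a minimal sketch. The schema below (an `experiments` table with one row per algorithm/dataset run, and the algorithm names) is purely hypothetical and far simpler than the database described in the paper; it only shows how stored results can be aggregated with a single query.

```python
# Minimal sketch of querying an experiment database.
# Assumes a hypothetical schema: one row per (algorithm, dataset, accuracy) run.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments (algorithm TEXT, dataset TEXT, accuracy REAL)"
)
conn.executemany(
    "INSERT INTO experiments VALUES (?, ?, ?)",
    [
        ("J48", "iris", 0.94),
        ("J48", "vote", 0.96),
        ("OneR", "iris", 0.92),
        ("OneR", "vote", 0.95),
    ],
)

# Average predictive accuracy per algorithm across all stored runs:
# one query replaces rerunning every experiment from scratch.
rows = conn.execute(
    "SELECT algorithm, AVG(accuracy) FROM experiments "
    "GROUP BY algorithm ORDER BY algorithm"
).fetchall()
results = [(name, round(acc, 3)) for name, acc in rows]
for name, acc in results:
    print(name, acc)
```

A real experiment database would additionally record algorithm parameters, dataset characteristics, and evaluation procedures, so that queries can control for those factors rather than averaging over them blindly.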
Editor: Carla Brodley.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Vanschoren, J., Blockeel, H., Pfahringer, B. et al. Experiment databases. Mach Learn 87, 127–158 (2012). https://doi.org/10.1007/s10994-011-5277-0