Abstract
Rapidly changing heterogeneous supercomputer architectures pose a great challenge to many scientific communities trying to leverage the latest technology in high-performance computing. Many long-running projects have accumulated large codebases that are not directly compatible with the latest accelerator architectures, and given the limited resources of scientific institutions, developing and maintaining architecture-specific ports is generally unsustainable. To adapt to modern accelerator architectures, many projects rely on directive-based programming models or build the codebase tightly around a third-party domain-specific language or library, introducing external dependencies outside the project's control. This paper tackles the issue by proposing a lightweight application-side adaptor layer for compute kernels and memory management, resulting in a versatile and inexpensive adaptation to new accelerator architectures with few drawbacks. A widely used hydrologic model demonstrates that such an approach, pursued more than 20 years ago, is still paying off on modern accelerator architectures, as evidenced by a substantial performance gain on NVIDIA A100 GPUs, high developer productivity, and a minimally invasive implementation, all while the codebase remains maintainable in the long term.
Change history
12 August 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10596-021-10065-y
Acknowledgements
The work described in this paper has received funding from the Helmholtz Association (HGF) through the project “Advanced Earth System Modeling Capacity (ESM)” and the Pilot Laboratory Exa-ESM. The authors gratefully acknowledge the computing time granted through the ESM test partition on the supercomputer JUWELS at the Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany. The authors also gratefully acknowledge support from the European Commission Horizon 2020 research and innovation program under Grant Agreement No. 824158 (EoCoE-II). Furthermore, the NVIDIA Application Lab at the Jülich Supercomputing Centre is thanked for technical support regarding the CUDA implementation. Finally, the foundations for the ParFlow eDSL were laid by Steven Smith, Rob Falgout, and Chuck Baldwin, all from Lawrence Livermore National Laboratory, USA.
Code availability
ParFlow source code is covered by the GNU Lesser General Public License and is available in a public repository at https://github.com/parflow (last access: 27th October 2020). The commit 974c7bb dated 21st October 2020 was used in this paper.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Contributions
Jaro Hokkanen performed the technical developments, analyses, and wrote the manuscript; Stefan Kollet advised on ParFlow technical issues, contributed to the analyses, and co-wrote the manuscript; Jiri Kraus, Andreas Herten, and Markus Hrywniak provided technical support regarding the implementation, optimization, and the HPC environment; Dirk Pleiter contributed to the analyses and the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: the original publication of the article contained major typesetting and production errors introduced by the publisher. During proofreading, corrections provided by the authors were not honored by the publisher.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hokkanen, J., Kollet, S., Kraus, J. et al. Leveraging HPC accelerator architectures with modern techniques — hydrologic modeling on GPUs with ParFlow. Comput Geosci 25, 1579–1590 (2021). https://doi.org/10.1007/s10596-021-10051-4
Keywords
- High-performance computing (HPC)
- GPU computing
- Distributed memory parallelism
- Accelerator architecture
- Domain-specific language (DSL)