Abstract
High-level ab initio quantum chemical (QC) molecular potential energy surfaces (PESs) are crucial for accurately simulating molecular rotation-vibration spectra. Machine learning (ML) can help alleviate the cost of constructing such PESs, but requires access to the original ab initio PES data, namely potential energies computed on high-density grids of nuclear geometries. In this work, we present a new structured PES database called VIB5, which contains high-quality ab initio data on 5 small polyatomic molecules of astrophysical significance (CH3Cl, CH4, SiH4, CH3F, and NaOH). The VIB5 database is based on previously used PESs, which, however, are either publicly unavailable or lacking key information to make them suitable for ML applications. The VIB5 database provides tens of thousands of grid points for each molecule with theoretical best estimates of potential energies along with their constituent energy correction terms and a data-extraction script. In addition, new complementary QC calculations of energies and energy gradients have been performed to provide a consistent database, which, e.g., can be used for gradient-based ML methods.
Measurement(s) | potential energy surfaces |
Technology Type(s) | quantum chemistry computational methods |
Background & Summary
Many physical and chemical processes of molecular systems are governed by potential energy surfaces (PESs) that are functions of potential energy with respect to the molecular geometry defined by the nuclei1. Accurate ab initio quantum chemical (QC) molecular PESs are essential to predict and understand a multitude of physicochemical properties of interest such as reaction thermodynamics, kinetics2, and simulation of rovibrational spectra3,4,5. As for the latter, PESs of a number of different molecules have been constructed and used in variational nuclear motion calculations to provide accurate rotation-vibration-electronic line lists to aid the characterization of exoplanet atmospheres, amongst other applications6,7,8,9,10,11,12,13,14,15,16.
A global PES covering all relevant regions of nuclear configuration space is needed to simulate rotation-vibration (rovibrational) spectra that approach the coveted spectroscopic accuracy of 1 cm−1 over a broad range of temperatures. This can be achieved by defining the PES on a high-density grid of nuclear geometries with no holes and having the theoretical best estimate (TBE) of energies computed at a very high QC level of theory. The construction of an optimal grid usually involves many steps and human intervention, and often requires a staggeringly large number of grid points, e.g., ca. 100 thousand points even for a five-atom molecule such as methane10. The choice of QC level for TBE calculations is determined by the trade-off between accuracy and computational cost, but typically requires going well beyond the gold-standard17,18,19 CCSD(T)17/CBS (coupled cluster with single and double excitations and a perturbative treatment of triple excitations/complete basis set) limit and needs many QC corrections on top of it. To give a sense of scale, ca. 24 central processing unit (CPU)-hours are required to calculate the TBE energy at each of the ~45 thousand methyl chloride (CH3Cl) grid geometries, amounting to over 100 CPU-years for the construction of its highly accurate ab initio PES20.
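The cost figure quoted above can be checked with a back-of-the-envelope calculation. The sketch below uses only the rounded numbers given in the text (about 45 thousand geometries at about 24 CPU-hours each); the function and variable names are illustrative.

```python
# Rough cost estimate for the CH3Cl TBE grid, using the rounded
# figures quoted in the text (not exact timings).
HOURS_PER_YEAR = 24 * 365.25


def cpu_years(n_points: int, cpu_hours_per_point: float) -> float:
    """Convert a per-point CPU-hour cost into total CPU-years."""
    return n_points * cpu_hours_per_point / HOURS_PER_YEAR


ch3cl_cpu_years = cpu_years(45_000, 24.0)
print(f"Approximate cost: {ch3cl_cpu_years:.0f} CPU-years")
```

With these inputs the estimate lands somewhat above 120 CPU-years, consistent with the "over 100 CPU-years" stated above.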
To reduce the high computational cost, machine learning (ML) has emerged as a powerful approach for constructing full-dimensional PESs21,22,23,24,25,26,27 and the resulting ML PESs can be used22,24,28,29,30,31,32,33,34,35 for performing vibrational calculations. In particular, substantial cost reduction can be achieved by calculating TBE energies only for a small number of existing grid points and then interpolating between them with ML36; such ML grids can be subsequently used for simulating rovibrational spectra with a relatively small loss of accuracy. Importantly, much larger savings in computational cost can be achieved20, when ML is applied to learn various QC corrections using a hierarchical ML (hML) scheme based on Δ-learning37 rather than to learn the TBE energy directly.
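The idea behind Δ-learning can be illustrated with a toy one-dimensional example: an ML model is trained on the difference between an expensive and a cheap level of theory at a few points, and the learned correction is then added to the cheap energies everywhere. The sketch below is a generic illustration and not the hML scheme of ref. 20; the two model potentials, the Gaussian kernel, and all hyperparameters are assumptions chosen only to keep the example self-contained.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between two 1D coordinate arrays."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * sigma**2))


def fit_krr(x_train, y_train, sigma=0.3, lam=1e-8):
    """Kernel ridge regression; returns a callable predictor."""
    k = gaussian_kernel(x_train, x_train, sigma)
    alpha = np.linalg.solve(k + lam * np.eye(len(x_train)), y_train)
    return lambda x: gaussian_kernel(x, x_train, sigma) @ alpha


def e_low(r):
    """Toy stand-in for a cheap level of theory (harmonic well)."""
    return 0.5 * (r - 1.0) ** 2


def e_high(r):
    """Toy stand-in for an expensive level of theory (Morse-like well)."""
    return (1.0 - np.exp(-(r - 1.0))) ** 2


grid = np.linspace(0.6, 2.0, 200)   # full grid (cheap level known everywhere)
train = np.linspace(0.6, 2.0, 12)   # few points with the expensive level

# Learn the correction Delta = E_high - E_low on the small training set.
delta_model = fit_krr(train, e_high(train) - e_low(train))

# Delta-learning prediction: cheap energy plus learned correction.
e_pred = e_low(grid) + delta_model(grid)

baseline_err = np.max(np.abs(e_low(grid) - e_high(grid)))
max_err = np.max(np.abs(e_pred - e_high(grid)))
print(f"max error without correction: {baseline_err:.3e}")
print(f"max error with learned correction: {max_err:.3e}")
```

The correction is typically smoother and smaller in magnitude than the total energy, which is why fewer expensive points are needed to learn it.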
Despite all the above efforts in constructing highly accurate PESs, there is still room for improvement, e.g., via creating denser grids, using higher QC levels, and further development of ML approaches, all of which require access to data. Unfortunately, the raw data containing geometries, TBEs and TBE constituent terms for many published studies are either missing or scattered. Thus, our data descriptor aims to organize these scattered data generated in the previous studies by some of us into a consolidated, structured PES database that we call VIB5. The VIB5 database contains five molecules: CH3Cl7,9,20, CH410, SiH48, CH3F12, and NaOH14. The number of grid points per molecule ranges from 15 thousand to 100 thousand, with more than 300 thousand points altogether (Table 1). In addition, it is known that including energy gradient information can significantly reduce the number of training points needed for ML, which is efficiently exploited in gradient-based ML models38,39. Thus, for this database, we additionally calculate energies and energy gradients at two levels of theory, MP2/cc-pVTZ (second-order Møller–Plesset perturbation theory/correlation-consistent triple-zeta basis set) and CCSD(T)/cc-pVQZ (correlation-consistent quadruple-zeta basis set), and provide the HF (Hartree–Fock) energies calculated with the corresponding basis sets cc-pVTZ and cc-pVQZ.
Our database is complementary to existing databases used for developing ML PES models. Some existing databases contain only energies for equilibrium geometries of various compounds calculated at different levels (from density functional theory [DFT] up to coupled-cluster approaches): QM740, QM7b41, QM942, revised QM943, and ANI-1ccx44. Another database (ANI-145) also contains energies at DFT for off-equilibrium geometries. Energies and energy gradients at DFT are available for equilibrium and off-equilibrium geometries of different molecules in the ANI-1x44 and QM7-X46 databases. The MD-17 dataset38,39 is a popular database with energies and energy gradients for geometries taken from molecular dynamics (MD) trajectories of several small- to medium-sized molecules at DFT and, for a subset of points, at CCSD(T) with different basis sets. PESs generated from MD are, however, likely to have limited coverage of high-energy geometries and many holes, making them inapplicable to some kinds of accurate simulations such as diffusion Monte Carlo calculations, as was pointed out recently47. In contrast to these databases, our database provides reliable, global PESs with QC energies and energy gradients at different levels, including very accurate TBEs of energies going beyond CCSD(T)/CBS, which can be used for ML models trained on data from several levels of theory, such as hML and Δ-learning. Finally, our database comes with a convenient data-extraction script that can be used to pull the required information in a format suitable for, e.g., ML.
Methods
Grid points generation
For each molecule, we take grid points directly from the previous studies by some of the authors. Here we only describe in short how these grid points were generated for the sake of completeness. We refer the reader to the original publications cited for each molecule for further details (see Table 1).
CH3Cl
44819 grid points for CH3Cl were taken from Refs. 7,9,20. A Monte Carlo random energy-weighted sampling algorithm was applied to nine internal coordinates of CH3Cl: the C–Cl bond length r0; three C–H bond lengths r1, r2, and r3; three ∠(HiCCl) interbond angles β1, β2, and β3; and two dihedral angles τ12 and τ13 between adjacent planes containing HiCCl and HjCCl (Fig. 1a). This procedure led to geometries in the range 1.3 ≤ r0 ≤ 2.95 Å, 0.7 ≤ ri ≤ 2.45 Å, 65 ≤ βi ≤ 165° for i = 1, 2, 3 and 55 ≤ τjk ≤ 185° with jk = 12, 13. The grid also includes 1000 carefully chosen low-energy points to ensure an adequate description of the equilibrium region.
CH4
97271 grid points for CH4 were taken from ref. 10. The global grid was built in the same fashion as the grid for CH3Cl. Nine internal coordinates of CH4 are defined as follows: four C–H bond lengths r1, r2, r3 and r4; five ∠(Hj-C-Hk) interbond angles α12, α13, α14, α23, and α24, where j and k label the respective hydrogen atoms (Fig. 1b). The grid points are in the range 0.71 ≤ ri ≤ 2.60 Å for i = 1, 2, 3, 4 and 40 ≤ αjk ≤ 140° with jk = 12, 13, 14, 23, 24.
SiH4
84002 grid points for SiH4 were taken from ref. 8. Nine internal coordinates of SiH4 are defined in the same way as for CH4: four Si–H bond lengths r1, r2, r3 and r4; five ∠(Hj-Si-Hk) interbond angles α12, α13, α14, α23, and α24, where j and k label the respective hydrogen atoms (Fig. 1c). The geometries are in the range 0.98 ≤ ri ≤ 2.95 Å for i = 1, 2, 3, 4 and 40 ≤ αjk ≤ 140° with jk = 12, 13, 14, 23, 24.
CH3F
82653 grid points for CH3F were taken from ref. 12. Nine internal coordinates of CH3F are defined in the same way as CH3Cl: the C–F bond length r0; three C–H bond lengths r1, r2, and r3; three ∠(HiCF) interbond angles β1, β2, and β3; and two dihedral angles τ12 and τ13 between adjacent planes containing HiCF and HjCF (Fig. 1d). This procedure led to geometries in the range 1.005 ≤ r0 ≤ 2.555 Å, 0.705 ≤ ri ≤ 2.695 Å, 45.5 ≤ βi ≤ 169.5° for i = 1, 2, 3 and 40.5 ≤ τjk ≤ 189.5° with jk = 12, 13.
NaOH
15901 grid points for NaOH were taken from ref. 14. Grid points were generated randomly with a dense distribution around the equilibrium region. Three internal coordinates of NaOH are defined as follows: the Na–O bond length rNaO, the O–H bond length rOH, and the interbond angle ∠(NaOH) (Fig. 1e). This procedure led to geometries in the range 1.435 ≤ rNaO ≤ 4.400 Å, 0.690 ≤ rOH ≤ 1.680 Å, and 40 ≤ ∠(NaOH) ≤ 180°.
Theoretical best estimates and constituent terms
For each molecule, we take the TBEs and energy corrections directly from the previous studies by some of us. Here we only briefly describe how these calculations were performed. We refer the reader to the original publications cited for each molecule for details (see Table 1). The TBE is obtained as the sum of several constituent terms: ECBS, ∆ECV, ∆EHO, ∆ESR, and, for most molecules, ∆EDBOC. ECBS is the energy at the complete basis set (CBS) limit. ∆ECV is the core-valence (CV) electron correlation energy correction. ∆EHO is the energy correction from higher-order (HO) coupled cluster terms, and ∆ESR accounts for scalar relativistic (SR) effects. ∆EDBOC is the diagonal Born–Oppenheimer correction; it was calculated for CH3Cl, CH4, CH3F, and NaOH, but not for SiH4 because it has little effect on the vibrational energy levels of this molecule.
The constituent terms were not calculated at the same level of theory across all molecules in the data set. The computational details of the five TBE constituent terms (ECBS, ∆ECV, ∆EHO, ∆ESR, and ∆EDBOC) for the 5 molecules are given below and summarized in Table 2.
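Written out, the composition of the TBE described above is the following sum, with the last term omitted for SiH4:

```latex
E_{\mathrm{TBE}} = E_{\mathrm{CBS}} + \Delta E_{\mathrm{CV}} + \Delta E_{\mathrm{HO}}
                 + \Delta E_{\mathrm{SR}} + \Delta E_{\mathrm{DBOC}}
```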
ECBS
To extrapolate the energy to the CBS limit, the parameterized two-point formula48 \({E}_{CBS}^{C}=\left({E}_{n+1}-{E}_{n}\right){F}_{n+1}^{C}+{E}_{n}\) was used. The CCSD(T)-F12b method49 was used with the two basis sets cc-pVTZ-F12 and cc-pVQZ-F1250. The frozen core approximation was adopted, and the diagonal fixed amplitude ansatz 3C(FIX)51 was employed with a Slater geminal exponent value48 of β = 1.0 a0−1. As for the auxiliary basis sets (ABS), the resolution of the identity OptRI52 basis and the cc-pV5Z/JKFIT53 and aug-cc-pwCV5Z/MP2FIT54 basis sets for density fitting were used for all 5 molecules. These calculations were carried out with either MOLPRO201255 (CH3Cl, CH4, SiH4, CH3F) or MOLPRO201555,56 (NaOH). The coefficients \({F}_{n+1}^{C}\) in the two-point formula, FCCSD-F12b = 1.363388 and F(T) = 1.769474 (taken from ref. 48), were used for all molecules. The extrapolation was not applied to the Hartree–Fock (HF) energy; instead, the HF + CABS (complementary auxiliary basis set) singles correction49 calculated with the cc-pVQZ-F12 basis set was used.
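The two-point extrapolation can be sketched in a few lines. The example below assumes, as the two separate coefficients suggest, that the CCSD-F12b correlation energy and the (T) contribution are extrapolated independently from the cc-pVTZ-F12 (n) and cc-pVQZ-F12 (n+1) values and then added to the unextrapolated HF + CABS singles energy. The function names and argument layout are illustrative and are not taken from the original workflow.

```python
# Coefficients F^C_{n+1} quoted in the text for the TZ/QZ pair.
F_CCSD_F12B = 1.363388
F_T = 1.769474


def two_point_cbs(e_n: float, e_n_plus_1: float, f: float) -> float:
    """E_CBS = (E_{n+1} - E_n) * F + E_n for one energy component."""
    return (e_n_plus_1 - e_n) * f + e_n


def cbs_energy(e_hf_cabs_qz: float,
               e_ccsd_tz: float, e_ccsd_qz: float,
               e_t_tz: float, e_t_qz: float) -> float:
    """Sum of the unextrapolated HF + CABS energy (QZ basis) and the
    separately extrapolated CCSD-F12b and (T) correlation components.
    This decomposition is an assumption of the sketch."""
    return (e_hf_cabs_qz
            + two_point_cbs(e_ccsd_tz, e_ccsd_qz, F_CCSD_F12B)
            + two_point_cbs(e_t_tz, e_t_qz, F_T))
```

For F = 1 the formula simply returns the larger-basis value; F > 1 pushes the estimate past it in the direction of convergence.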
∆ECV
∆ECV was computed at CCSD(T)-F12b/cc-pCVQZ-F1257 for CH3Cl and at CCSD(T)-F12b/cc-pCVTZ-F1257 for the other 4 molecules (CH4, SiH4, CH3F, NaOH). The same ansatz and ABS used for ECBS were employed for calculating ∆ECV but the Slater geminal exponent value was changed: β = 1.5 a0−1 for CH3Cl and β = 1.4 a0−1 for the other 4 molecules. For this term, all-electron calculations were adopted, but with the 1s orbital of Cl frozen for CH3Cl, the 1s orbital of Si frozen for SiH4, and the 1s orbital of Na frozen for NaOH. There is no frozen orbital in all-electron calculations for CH4 and CH3F. As for the software used, see the above ECBS part.
∆EHO
To obtain ∆EHO, the hierarchy of coupled cluster methods was used: ∆EHO = ECCSDT − ECCSD(T) for NaOH, while ∆EHO = ∆ET + ∆E(Q) for the other 4 molecules (CH3Cl, CH4, SiH4, CH3F), with ∆ET = ECCSDT − ECCSD(T) for the full triples contribution and ∆E(Q) = ECCSDT(Q) − ECCSDT for the perturbative quadruples contribution. The frozen core approximation was employed in the calculations. Thus, energy calculations at CCSD(T) and CCSDT were performed for NaOH, while energy calculations at the CCSD(T), CCSDT, and CCSDT(Q) levels of theory were performed for the other 4 molecules. All of these calculations were carried out through the general coupled cluster approach58,59 implemented in the MRCC code (www.mrcc.hu)60 interfaced to CFOUR (www.cfour.de)61. As for the basis sets, aug-cc-pVTZ(+d for Cl)62,63,64,65 & aug-cc-pVDZ(+d for Cl), cc-pVTZ62 & cc-pVDZ, cc-pVTZ(+d for Si)62,63,64,65 & cc-pVDZ(+d for Si), and cc-pVTZ62 & cc-pVDZ were used for CH3Cl, CH4, SiH4, and CH3F, respectively, with the first basis set of each pair used for the full triples and the second for the perturbative quadruples. For NaOH, cc-pVTZ(+d for Na)62,66 was used for the CCSD(T) and CCSDT calculations.
∆ESR
∆ESR was calculated by using either one-electron mass velocity and Darwin (MVD1) terms from the Breit–Pauli Hamiltonian in first-order perturbation theory67 or the second-order Douglas–Kroll–Hess approach68,69. The former method was used for CH3Cl and the latter method was used for the other 4 molecules (CH4, SiH4, CH3F, and NaOH). All-electron calculations (except for the 1s orbital of Cl) were adopted for CH3Cl, while the frozen core approximation was employed for the other 4 molecules. Calculations were performed at CCSD(T)/aug-cc-pCVTZ(+d for Cl)70,71 using the MVD1 approach72 implemented in CFOUR for CH3Cl and at CCSD(T)/cc-pVQZ-DK73 using MOLPRO (same software versions as given in the ECBS part above) for the other 4 molecules.
∆EDBOC
∆EDBOC was computed using the CCSD method74 as implemented in CFOUR. This correction was not included for SiH4. For this term, all-electron calculations were adopted, but with the 1s orbital of Cl frozen for CH3Cl, all electrons correlated for CH4 and CH3F, and the 1s orbital of Na frozen for NaOH. As for the basis set, calculations were performed at aug-cc-pCVTZ (+d for Cl) for CH3Cl, aug-cc-pCVDZ for CH4, aug-cc-pCVDZ for CH3F, and aug-cc-pCVDZ(+d for Na) for NaOH.
Complementary energy and gradient calculations
All complementary ab initio QC energy and gradient calculations for a total of 324592 grid points were performed at two levels of theory, MP275,76/cc-pVTZ62,64,66 and CCSD(T)17,77,78/cc-pVQZ62,64,66, using the CFOUR program package (Versions 1.0 and 2.161; we use CFOUR V2.1 to perform calculations for some grid points of CH3Cl and NaOH that converge to high-energy solutions); see Fig. 2 for the CFOUR input options. In the MP2/cc-pVTZ calculations, we use the default option FROZEN_CORE = OFF so that all electrons and all orbitals are correlated. In the CCSD(T)/cc-pVQZ calculations, the option FROZEN_CORE = ON is used for all molecules so that only the valence electrons are correlated. For CH3Cl, CH4, CH3F and NaOH, SCF_CONV = 10, CC_CONV = 10 and LINEQ_CONV = 8 are set to specify the convergence criteria for the HF-SCF, CC amplitude and linear equations, and CC_PROG = ECC is set to select the ECC coupled cluster program. For SiH4, we adopted the CFOUR default options SCF_CONV = 7, CC_CONV = 7, LINEQ_CONV = 7 and CC_PROG = VCC. We use the GEO_MAXCYC = 1 option to limit the geometry optimization to a single iteration, which yields the gradient at the current nuclear configuration. From these calculations we also extracted HF energies calculated with the corresponding basis sets cc-pVTZ and cc-pVQZ. In addition, for CH3Cl we include MP2/aug-cc-pVQZ energies calculated using MOLPRO201255 as reported in ref. 20.
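The option choices described above can be collected per molecule and level of theory as in the following sketch. It covers only the keywords named in the text; method, basis set, and geometry specifications are deliberately omitted, and the authoritative input is the one shown in Fig. 2. The helper names and the comma-separated rendering are illustrative assumptions.

```python
# Convergence settings as stated in the text.
TIGHT = {"SCF_CONV": 10, "CC_CONV": 10, "LINEQ_CONV": 8, "CC_PROG": "ECC"}
DEFAULTS = {"SCF_CONV": 7, "CC_CONV": 7, "LINEQ_CONV": 7, "CC_PROG": "VCC"}

# Frozen-core choice per level of theory, as stated in the text.
FROZEN_CORE = {"MP2": "OFF", "CCSD(T)": "ON"}


def cfour_options(molecule: str, level: str) -> dict:
    """Return the keyword settings described in the text for one job."""
    options = {"FROZEN_CORE": FROZEN_CORE[level]}
    # SiH4 used the CFOUR defaults; the other four molecules used
    # tighter convergence criteria and the ECC program.
    options.update(DEFAULTS if molecule == "SiH4" else TIGHT)
    # One optimization cycle only, to obtain the gradient at the
    # current geometry.
    options["GEO_MAXCYC"] = 1
    return options


def render_options(options: dict) -> str:
    """Join keywords as KEY=VALUE pairs (illustrative formatting)."""
    return ",".join(f"{key}={value}" for key, value in options.items())


print(render_options(cfour_options("CH4", "CCSD(T)")))
```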
Data Records
All data for the 5 molecules are stored as a database in JSON format in the file named VIB5.json, available for download from https://doi.org/10.6084/m9.figshare.16903288 (ref. 79). The first level of the database contains an item corresponding to each molecule in the order CH3Cl, CH4, SiH4, CH3F, and NaOH. For each molecule, the next level of the database first gives the chemical formula, chemical name, number of atoms, and list of nuclear charges (in the same order as the atoms appear in the items with nuclear coordinates), followed by a description of the properties available for the grid points (property type, levels of theory, units). Finally, the items for each grid point are given, containing nuclear positions in both Cartesian and internal coordinates and the values of properties (energies and energy gradients at different levels of theory, i.e., TBE, TBE constituent terms, complementary data). The JSON keys of the items available for each grid point are listed in Table 3 with a brief description and units. The geometry of each grid point for each molecule can be accessed in Cartesian coordinates by the “XYZ” key and in internal coordinates by the “INT” key. The definition of the internal coordinates used in the database is shown in Fig. 1. The “HF-TZ”, “HF-QZ”, “MP2”, “CCSD-T”, and “TBE” keys can be selected separately to obtain the energy of each grid point at HF/cc-pVTZ, HF/cc-pVQZ, MP2/cc-pVTZ, CCSD(T)/cc-pVQZ, and TBE, respectively. This database also provides the energy gradients in Cartesian coordinates and internal coordinates at the MP2/cc-pVTZ and CCSD(T)/cc-pVQZ levels of theory, which can be accessed through the “MP2_grad_xyz”, “MP2_grad_int”, “CCSD-T_grad_xyz”, and “CCSD-T_grad_int” keys. See Table 3 for the summary and the keys of other properties.
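Reading such a file with the Python standard library could look like the sketch below. The per-point keys ("XYZ", "INT", "MP2", "CCSD-T", "TBE") are the ones named in the text, but the surrounding nesting used here (a molecule key "CH4" holding a "grid_points" list) is a hypothetical stand-in, and all numbers are dummy placeholders rather than database values. The actual layout should be checked against VIB5.json and the accompanying extraction script.

```python
import json

# Tiny mock with the per-point keys named in the text. The nesting
# ("CH4" -> "grid_points") is hypothetical and the values are dummies.
mock_json = """
{
  "CH4": {
    "grid_points": [
      {"XYZ": [[0.0, 0.0, 0.0]], "INT": [1.0], "MP2": 0.0, "CCSD-T": 0.0, "TBE": 0.0},
      {"XYZ": [[0.0, 0.0, 0.1]], "INT": [1.1], "MP2": 1.0, "CCSD-T": 1.0, "TBE": 1.0}
    ]
  }
}
"""


def extract(db: dict, molecule: str, keys: list,
            points_key: str = "grid_points") -> list:
    """Collect the requested properties for every grid point of a molecule.

    `points_key` names the (hypothetical) list of grid-point items.
    """
    return [[point[key] for key in keys] for point in db[molecule][points_key]]


db = json.loads(mock_json)  # for the real file: json.load(open("VIB5.json"))
rows = extract(db, "CH4", ["INT", "TBE"])
for internal_coords, tbe in rows:
    print(internal_coords, tbe)
```

Selecting several energy keys at once (for example "MP2" and "TBE") gives the paired low-level and high-level values needed for multi-level ML schemes.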
Technical Validation
The TBE values and TBE constituent terms were validated by calculating rovibrational spectra and comparing them to experiment in the original peer-reviewed publications cited in the Methods section and Table 1. In brief, an analytical expression for the PES was fitted to the ab initio data and then used in variational calculations of rovibrational energy levels with the nuclear motion program TROVE80. The resulting rovibrational energy levels were then compared to experimental values (when available) to validate the accuracy of the underlying PES. The new complementary data calculated here were validated by making sure that all calculations fully converged. After the database was constructed, we performed additional checks for repeated geometries, which identified grid points with the same geometrical parameters in the CH4 grid. We removed such duplicates from the database, which leads to a slightly reduced number of points (97217) compared to the number reported in the original publication (97271). This pruned grid is used in our final database.
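A check for repeated geometries of the kind described above can be sketched as follows. The criterion actually used for the database is not stated in the text; here two points are treated as duplicates when their internal coordinates agree after rounding to a chosen number of decimals, which is an assumption of this example, as are the function name and the dummy coordinates.

```python
def drop_duplicate_geometries(points: list, decimals: int = 6) -> list:
    """Keep the first occurrence of each geometry.

    Geometries are compared through their internal coordinates ("INT"),
    rounded to `decimals` places (an assumed tolerance).
    """
    seen = set()
    unique = []
    for point in points:
        key = tuple(round(value, decimals) for value in point["INT"])
        if key not in seen:
            seen.add(key)
            unique.append(point)
    return unique


# Dummy grid with one repeated geometry.
grid = [
    {"INT": [1.09, 1.09, 109.5]},
    {"INT": [1.10, 1.09, 109.5]},
    {"INT": [1.09, 1.09, 109.5]},  # repeat of the first point
]
pruned = drop_duplicate_geometries(grid)
print(len(grid) - len(pruned), "duplicate(s) removed")
```

For the CH4 grid, the numbers quoted above correspond to 97271 − 97217 = 54 removed points.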
Usage Notes
We provide a Python script extraction_data.py that can be used to pull the data of interest from VIB5.json (Box 1). It is provided together with the database file at https://doi.org/10.6084/m9.figshare.16903288 (ref. 79).
Code availability
All calculations at the MP2/cc-pVTZ and CCSD(T)/cc-pVQZ levels of theory were performed with the CFOUR software package. TBE and other data were obtained using various software packages (MOLPRO, CFOUR, MRCC) as described in the Methods section.
References
Lewars, E. Computational Chemistry: Introduction to the Theory and Applications of Molecular and Quantum Mechanics 2nd edn (Springer Science+Business Media B.V., 2011).
Upadhyay, S. K. Chemical Kinetics and Reaction Dynamics (Anamaya Publishers, 2006).
Searles, D. J. & von Nagy-Felsobuki, E. I. In Ab Initio Variational Calculations of Molecular Vibrational-Rotational Spectra (Springer-Verlag Berlin Heidelberg, 1993).
Császár, A. G., Czakó, G., Furtenbacher, T. & Mátyus, E. In Annual Reports in Computational Chemistry 3 (Elsevier, 2007).
Bytautas, L., Bowman, J. M., Huang, X. & Varandas, A. J. C. Accurate potential energy surfaces and beyond: chemical reactivity, binding, long-range interactions, and spectroscopy. Adv. Phys. Chem. 2012, 679869 (2012).
Tennyson, J. & Yurchenko, S. N. ExoMol: molecular line lists for exoplanet and other atmospheres. Mon. Not. R. Astron. Soc. 425, 21–33 (2012).
Owens, A., Yurchenko, S. N., Yachmenev, A., Tennyson, J. & Thiel, W. Accurate ab initio vibrational energies of methyl chloride. J. Chem. Phys. 142 (2015).
Owens, A., Yurchenko, S. N., Yachmenev, A. & Thiel, W. A global potential energy surface and dipole moment surface for silane. J. Chem. Phys. 143 (2015).
Owens, A., Yurchenko, S. N., Yachmenev, A., Tennyson, J. & Thiel, W. A global ab initio dipole moment surface for methyl chloride. J. Quant. Spectrosc. Radiat. Transfer 184, 100–110 (2016).
Owens, A., Yurchenko, S. N., Yachmenev, A., Tennyson, J. & Thiel, W. A highly accurate ab initio potential energy surface for methane. J. Chem. Phys. 145 (2016).
Owens, A. & Yurchenko, S. N. Theoretical rotation-vibration spectroscopy of cis- and trans-diphosphene (P2H2) and the deuterated species P2HD. J. Chem. Phys. 150 (2019).
Owens, A., Yachmenev, A., Kupper, J., Yurchenko, S. N. & Thiel, W. The rotation-vibration spectrum of methyl fluoride from first principles. Phys. Chem. Chem. Phys. 21, 3496–3505 (2019).
Owens, A., Conway, E. K., Tennyson, J. & Yurchenko, S. N. ExoMol line lists – XXXVIII. High-temperature molecular line list of silicon dioxide (SiO2). Mon. Not. R. Astron. Soc. 495, 1927–1933 (2020).
Owens, A., Tennyson, J. & Yurchenko, S. N. ExoMol line lists – XLI. High-temperature molecular line lists for the alkali metal hydroxides KOH and NaOH. Mon. Not. R. Astron. Soc. 502, 1128–1135 (2021).
Tennyson, J. et al. ExoMol molecular line lists XXX: a complete high-accuracy line list for water. Mon. Not. R. Astron. Soc. 480, 2597–2608 (2018).
Yurchenko, S. N. & Tennyson, J. ExoMol line lists - IV. The rotation-vibration spectrum of methane up to 1500 K. Mon. Not. R. Astron. Soc. 440, 1649–1661 (2014).
Raghavachari, K., Trucks, G. W., Pople, J. A. & Head-Gordon, M. A fifth-order perturbation comparison of electron correlation theories. Chem. Phys. Lett. 157, 479–483 (1989).
Helgaker, T., Gauss, J., Jørgensen, P. & Olsen, J. The prediction of molecular equilibrium structures by the standard electronic wave functions. J. Chem. Phys. 106, 6430–6440 (1997).
Bak, K. L. et al. The accurate determination of molecular equilibrium structures. J. Chem. Phys. 114, 6548–6556 (2001).
Dral, P. O., Owens, A., Dral, A. & Csányi, G. Hierarchical machine learning of potential energy surfaces. J. Chem. Phys. 152, 204110 (2020).
Behler, J. Neural network potential-energy surfaces in chemistry: a tool for large-scale simulations. Phys. Chem. Chem. Phys. 13, 17930–17955 (2011).
Manzhos, S., Dawes, R. & Carrington, T. Jr. Neural network-based approaches for building high dimensional and quantum dynamics-friendly potential energy surfaces. Int. J. Quantum Chem. 115, 1012–1020 (2015).
Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).
Manzhos, S. & Carrington, T. Jr. Neural network potential energy surfaces for small molecules and reactions. Chem. Rev. 121, 10187–10217 (2020).
Mueller, T., Hernandez, A. & Wang, C. Machine learning for interatomic potential models. J. Chem. Phys. 152, 050902 (2020).
Dral, P. O. In Advances in Quantum Chemistry: Chemical Physics and Quantum Chemistry 81 (Academic Press, 2020).
Dral, P. O. Quantum chemistry in the age of machine learning. J. Phys. Chem. Lett. 11, 2336–2347 (2020).
Schmitz, G., Artiukhin, D. G. & Christiansen, O. Approximate high mode coupling potentials using Gaussian process regression and adaptive density guided sampling. J. Chem. Phys. 150, 131102 (2019).
Gastegger, M., Behler, J. & Marquetand, P. Machine learning molecular dynamics for the simulation of infrared spectra. Chem. Sci. 8, 6924–6935 (2017).
Kamath, A., Vargas-Hernández, R. A., Krems, R. V., Carrington, T. Jr. & Manzhos, S. Neural networks vs Gaussian process regression for representing potential energy surfaces: A comparative study of fit quality and vibrational spectrum accuracy. J. Chem. Phys. 148, 241702 (2018).
Manzhos, S. Machine learning for the solution of the Schrödinger equation. Mach. Learn.: Sci. Technol. 1, 013002 (2020).
Manzhos, S., Yamashita, K. & Carrington, T. Jr. Using a neural network based method to solve the vibrational Schrodinger equation for H2O. Chem. Phys. Lett. 474, 217–221 (2009).
Manzhos, S., Wang, X. G., Dawes, R. & Carrington, T. Jr. A nested molecule-independent neural network approach for high-quality potential fits. J Phys Chem A 110, 5295–5304 (2006).
Manzhos, S. & Carrington, T. Jr. A random-sampling high dimensional model representation neural network for building potential energy surfaces. J. Chem. Phys. 125, 084109 (2006).
Manzhos, S. & Carrington, T. Jr. Using neural networks, optimized coordinates, and high-dimensional model representations to obtain a vinyl bromide potential surface. J. Chem. Phys. 129, 224104 (2008).
Dral, P. O., Owens, A., Yurchenko, S. N. & Thiel, W. Structure-based sampling and self-correcting machine learning for accurate calculations of potential energy surfaces and vibrational levels. J. Chem. Phys. 146, 244108 (2017).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Big data meets quantum chemistry approximations: the Δ-machine learning approach. J. Chem. Theory Comput. 11, 2087–2096 (2015).
Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
Chmiela, S., Sauceda, H. E., Müller, K.-R. & Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 9, 3887 (2018).
Rupp, M., Tkatchenko, A., Müller, K.-R. & von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
Montavon, G. et al. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 15, 095003 (2013).
Ramakrishnan, R., Dral, P. O., Rupp, M. & von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Sci. Data 1, 140022 (2014).
Kim, H., Park, J. Y. & Choi, S. Energy refinement and analysis of structures in the QM9 database via a highly accurate quantum chemical method. Sci. Data 6, 109 (2019).
Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 134 (2020).
Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1, a data set of 20 million calculated off-equilibrium conformations for organic molecules. Sci. Data 4, 170193 (2017).
Hoja, J. et al. QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules. Sci. Data 8, 43 (2021).
Qu, C., Houston, P. L., Conte, R., Nandi, A. & Bowman, J. M. MULTIMODE calculations of vibrational spectroscopy and 1d interconformer tunneling dynamics in Glycine using a full-dimensional potential energy surface. J Phys Chem A 125, 5346–5354 (2021).
Hill, J. G., Peterson, K. A., Knizia, G. & Werner, H.-J. Extrapolating MP2 and CCSD explicitly correlated correlation energies to the complete basis set limit with first and second row correlation consistent basis sets. J. Chem. Phys. 131, 194105 (2009).
Adler, T. B., Knizia, G. & Werner, H.-J. A simple and efficient CCSD(T)-F12 approximation. J. Chem. Phys. 127, 221106 (2007).
Peterson, K. A., Adler, T. B. & Werner, H.-J. Systematically convergent basis sets for explicitly correlated wavefunctions: the atoms H, He, B–Ne, and Al–Ar. J. Chem. Phys. 128, 084102 (2008).
Ten-no, S. Initiation of explicitly correlated Slater-type geminal theory. Chem. Phys. Lett. 398, 56–61 (2004).
Yousaf, K. E. & Peterson, K. A. Optimized auxiliary basis sets for explicitly correlated methods. J. Chem. Phys. 129, 184108 (2008).
Weigend, F. A fully direct RI-HF algorithm: implementation, optimised auxiliary basis sets, demonstration of accuracy and efficiency. Phys. Chem. Chem. Phys. 4, 4285–4291 (2002).
Hättig, C. Optimization of auxiliary basis sets for RI-MP2 and RI-CC2 calculations: Core-valence and quintuple-ζ basis sets for H to Ar and QZVPP basis sets for Li to Kr. Phys. Chem. Chem. Phys. 7, 59–66 (2005).
Werner, H.-J., Knowles, P. J., Knizia, G., Manby, F. R. & Schütz, M. Molpro: a general-purpose quantum chemistry program package. WIREs Comput. Mol. Sci. 2, 242–253 (2012).
Werner, H.-J. et al. The Molpro quantum chemistry package. J. Chem. Phys. 152, 144107 (2020).
Hill, J. G., Mazumder, S. & Peterson, K. A. Correlation consistent basis sets for molecular core-valence effects with explicitly correlated wave functions: the atoms B–Ne and Al–Ar. J. Chem. Phys. 132, 054108 (2010).
Kállay, M. & Gauss, J. Approximate treatment of higher excitations in coupled-cluster theory. J. Chem. Phys. 123, 214105 (2005).
Kállay, M. & Gauss, J. Approximate treatment of higher excitations in coupled-cluster theory. II. Extension to general single-determinant reference functions and improved approaches for the canonical Hartree–Fock case. J. Chem. Phys. 129, 144101 (2008).
MRCC, A string-based quantum chemical program suite written by M. Kállay; see also M. Kállay & P. R. Surján, J. Chem. Phys. 115, 2945 (2001).
Stanton, J. F. et al. CFOUR, a quantum chemical program package http://www.cfour.de (2010).
Dunning, T. H. Jr. Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J. Chem. Phys. 90, 1007–1023 (1989).
Kendall, R. A., Dunning, T. H. Jr. & Harrison, R. J. Electron affinities of the first‐row atoms revisited. Systematic basis sets and wave functions. J. Chem. Phys. 96, 6796–6806 (1992).
Woon, D. E. & Dunning, T. H. Jr. Gaussian basis sets for use in correlated molecular calculations. III. The atoms aluminum through argon. J. Chem. Phys. 98, 1358–1371 (1993).
Dunning, T. H. Jr., Peterson, K. A. & Wilson, A. K. Gaussian basis sets for use in correlated molecular calculations. X. The atoms aluminum through argon revisited. J. Chem. Phys. 114, 9244–9253 (2001).
Prascher, B. P., Woon, D. E., Peterson, K. A., Dunning, T. H. Jr. & Wilson, A. K. Gaussian basis sets for use in correlated molecular calculations. VII. Valence, core-valence, and scalar relativistic basis sets for Li, Be, Na, and Mg. Theor. Chem. Acc. 128, 69–82 (2011).
Cowan, R. D. & Griffin, D. C. Approximate relativistic corrections to atomic radial wave functions*. J. Opt. Soc. Am. 66, 1010–1014 (1976).
Douglas, M. & Kroll, N. M. Quantum electrodynamical corrections to the fine structure of helium. Ann. Phys. 82, 89–155 (1974).
Hess, B. A. Relativistic electronic-structure calculations employing a two-component no-pair formalism with external-field projection operators. Phys. Rev. A 33, 3742–3748 (1986).
Woon, D. E. & Dunning, T. H. Jr. Gaussian basis sets for use in correlated molecular calculations. V. Core-valence basis sets for boron through neon. J. Chem. Phys. 103, 4572–4585 (1995).
Peterson, K. A. & Dunning, T. H. Jr. Accurate correlation consistent basis sets for molecular core-valence correlation effects: The second row atoms Al–Ar, and the first row atoms B–Ne revisited. J. Chem. Phys. 117, 10548–10560 (2002).
Klopper, W. Simple recipe for implementing computation of first-order relativistic corrections to electron correlation energies in framework of direct perturbation theory. J. Comput. Chem. 18, 20–27 (1997).
de Jong, W. A., Harrison, R. J. & Dixon, D. A. Parallel Douglas–Kroll energy and gradients in NWChem: estimating scalar relativistic effects using Douglas–Kroll contracted basis sets. J. Chem. Phys. 114, 48–53 (2001).
Gauss, J., Tajti, A., Kállay, M., Stanton, J. F. & Szalay, P. G. Analytic calculation of the diagonal Born-Oppenheimer correction within configuration-interaction and coupled-cluster theory. J. Chem. Phys. 125, 144111 (2006).
Bartlett, R. J. Many-body perturbation theory and coupled cluster theory for electron correlation in molecules. Annu. Rev. Phys. Chem. 32, 359–401 (1981).
Cremer, D. in Encyclopedia of Computational Chemistry (John Wiley and Sons, Ltd., 1998).
Bartlett, R. J., Watts, J. D., Kucharski, S. A. & Noga, J. Non-iterative fifth-order triple and quadruple excitation energy corrections in correlated methods. Chem. Phys. Lett. 165, 513–522 (1990).
Stanton, J. F. Why CCSD(T) works: a different perspective. Chem. Phys. Lett. 281, 130–134 (1997).
Zhang, L., Zhang, S., Owens, A., Yurchenko, S. N. & Dral, P. O. VIB5 database with accurate ab initio quantum chemical molecular potential energy surfaces. figshare https://doi.org/10.6084/m9.figshare.16903288 (2021).
Yurchenko, S. N., Thiel, W. & Jensen, P. Theoretical ROVibrational Energies (TROVE): a robust numerical approach to the calculation of rovibrational energies for polyatomic molecules. J. Mol. Spectrosc. 245, 126–140 (2007).
Acknowledgements
P.O.D. acknowledges funding by the National Natural Science Foundation of China (No. 22003051), the Fundamental Research Funds for the Central Universities (No. 20720210092), and via the Lab project of the State Key Laboratory of Physical Chemistry of Solid Surfaces. S.N.Y. and A.O. thank STFC for support under grant ST/R000476/1. Their calculations made extensive use of the STFC DiRAC HPC facility supported by BIS National E-infrastructure capital grant ST/J005673/1 and STFC grants ST/H008586/1 and ST/K00333X/1.
Author information
Contributions
P.O.D. conceived the idea of creating a database. L.Z. wrote the original draft of the manuscript. S.Z. performed the complementary calculations and validation, and created the scripts and database files with the assistance of L.Z. and P.O.D. A.O. provided raw data with grids, theoretical best estimates, and energy correction terms, as well as supporting scripts. A.O., S.N.Y. and P.O.D. supervised the project. S.N.Y. and P.O.D. acquired funding for the project. All authors provided critical feedback and helped shape the database collection, calculations, analysis, and manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
About this article
Cite this article
Zhang, L., Zhang, S., Owens, A. et al. VIB5 database with accurate ab initio quantum chemical molecular potential energy surfaces. Sci Data 9, 84 (2022). https://doi.org/10.1038/s41597-022-01185-w