Autonomous molecular design by Monte-Carlo tree search and rapid evaluations using molecular dynamics simulations

Kajita, Seiji; Kinjo, Tomoyuki; Nishi, Tomoki

doi:10.1038/s42005-020-0338-y


Article
Open access
Published: 07 May 2020

Volume 3, article number 77 (2020)

Communications Physics


Seiji Kajita¹,
Tomoyuki Kinjo¹ &
Tomoki Nishi¹


Abstract

Functional materials, especially those that largely differ from known materials, are not easily discoverable because both human experts and supervised machine learning need prior knowledge and datasets. An autonomous system can evaluate various properties a priori, and thereby explore unknown extrapolation spaces in high-throughput simulations. However, high-throughput evaluations of molecular dynamics simulations are unrealistically demanding. Here, we show an autonomous search system for organic molecules implemented by a reinforcement learning algorithm, and apply it to molecular dynamics simulations of viscosity. The evaluation is dramatically accelerated (by three orders of magnitude) using a femto-second stress-tensor correlation, which underlies the glass-transition model. We experimentally examine one of 55,000 lubricant oil molecules found by the system. This study indicates that merging simulations and physical models can open a path for simulation-driven approaches to materials informatics.


Introduction

The development of materials conventionally depends on human sense and trial-and-error synthesis. Such laborious developments are expected to be accelerated by materials informatics (MI)^1,2, which is commonly implemented by virtual screening (see Fig. 1a). After training on existing data, a machine-learning model predicts the target properties of materials based on the features of known materials^{3,4,5,6,7,8,9}. Rapid inference by machine learning extracts the potential candidates from hundreds of thousands of compounds in a material database. This subset of the candidates is then examined experimentally. However, the prediction ability is effective only when the target materials are within an interpolation space coordinated by a supervised dataset. To discover truly new materials, we should explore outside the scope of known materials.

**Fig. 1: Material search schemes in materials informatics.**

An autonomous search scheme beyond the interpolation space is called a closed-loop search¹. The system configuration is illustrated in Fig. 1b. Here, a machine-learning search model accompanies robotics or simulation software. The search model receives feedback from the evaluated properties, and decides the material proposals in the next loop. This search-evaluation loop iterates until the material structure is optimized with respect to a target property. Search algorithms for this purpose are numerous and varied^{10,11,12,13,14}. An example is the artificial neural network in the chemical language SMILES, which generates a continuous latent space of molecules, and seeks the high-scoring molecules by a gradient-based optimization procedure^10,11. Elsewhere, prospective molecular structures were generated by a Bayesian approach using forward and backward predictions in the structure–property relationship¹². To design synthetic strategies and uncover new organic materials, Yang et al. and Segler et al. used a reinforcement learning algorithm called Monte Carlo tree search (MCTS)^13,14,15,16. This algorithm was used in the AlphaGo AI system for the Chinese board game “Go”¹⁷. The MCTS algorithm efficiently searches a tree graph whose nodes represent molecular fragments in SMILES. Its aim is to maximize the prospective reward of molecules^13,14.

However, no matter which search algorithm is used, a long evaluation time is a major bottleneck in the loop. Ab initio calculations provide important material properties such as formation energies and band gaps. These static properties can be obtained at reasonable computational cost only with advanced algorithms and multicore architectures^18,19,20,21. Transport-related properties, such as ion conductivity and viscosity, must be assessed in molecular dynamics (MD) calculations, which simulate the atomic dynamics of molecules. Although the evaluated transport properties are grounded in statistical physics, MD calculations cannot serve as a high-throughput evaluator²², because reliable ensemble averaging requires a huge number of MD steps^23,24. Another important consideration is the accuracy of the empirical force fields. This topic has been actively studied in recent years, with developments of machine-learning potentials trained on appropriate ab initio reference data^{25,26,27,28,29}.

This paper presents an autonomous molecular-design system based on MCTS and MD simulations. As an example of transport properties, we focus on viscosity because viscosity is related to tribological properties^30,31 and its reciprocal value represents a diffusion coefficient. These properties are fundamental in mechanical and chemical engineering, which use oil and electrolytes on a daily basis. Our system performs ultra-fast MD evaluations that alleviate the time-demanding bottleneck of autonomous systems.

We first explain the conventional and proposed fast viscosity evaluations by MD simulations, define the target property, and explain the rules of oil-molecule generation in MCTS. After the closed-loop search, the MI-designed oil molecule is synthesized and its viscosity performance is experimentally examined. Finally, we inductively analyze the obtained large data to guide the development of lubricants. The technical details are provided in the Methods section and Supplementary Notes.

Results

Conventional MD evaluation

One conventional scheme for obtaining transport properties is the Green–Kubo (GK) formalism^32,33. The non-diagonal elements of the stress tensor P_ij are observed in an MD simulation of liquid molecules. The viscosity η is obtained from the dynamical fluctuations of P_ij as

$$\eta = \, \mathop {{{\mathrm{lim}}}}\limits_{t \to \infty } {\mathrm{\Phi }}\left( t \right) \equiv \mathop {{{\mathrm{lim}}}}\limits_{t \to \infty } \left\langle {{\mathrm{\Phi }}\left( {t,t_0} \right)} \right\rangle \\ {\mathrm{\Phi }}\left( {t,t_0} \right) \equiv \, \int_{0}^{t} \frac{1}{{k_{\mathrm{B}}TV}}P_{ij}\left( {t^{\prime} + t_0} \right)P_{ij}\left( {t_0} \right){\mathrm{d}}t^{\prime},$$

(1)

where k_B, T, and V denote Boltzmann’s constant, the temperature, and the volume of the simulation cell, respectively. The operator 〈〉 represents ensemble averaging in the MD calculation (see Fig. 2a), which samples the correlation Φ(t, t₀) with respect to the time origin t₀.
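As a rough illustration of Eq. (1), the running integral Φ(t) can be sketched from a sampled stress series as follows. This is a minimal sketch rather than the authors' implementation: the function name `running_green_kubo`, the trapezoidal integration, and the requirement that the caller supplies consistent units are all assumptions, and the prefactor 1/(k_B T V) simply follows Eq. (1) as printed.

```python
def running_green_kubo(p, dt, volume, temperature, n_lag, k_b=1.380649e-23):
    """Running integral Phi(t) of Eq. (1), averaged over time origins t0.

    p       : sampled values of one stress component P_ij at spacing dt
    n_lag   : number of lag points t' = 0, dt, ..., (n_lag - 1) * dt
    k_b     : Boltzmann constant (SI by default; units are the caller's job)
    Hypothetical helper; names and integration rule are assumptions.
    """
    n_origins = len(p) - n_lag + 1
    if n_origins < 1:
        raise ValueError("stress series is shorter than the requested lag window")
    # Ensemble average <P(t' + t0) P(t0)> over all available time origins t0.
    acf = [
        sum(p[t0 + lag] * p[t0] for t0 in range(n_origins)) / n_origins
        for lag in range(n_lag)
    ]
    # Cumulative trapezoidal integration over t'.
    phi = [0.0]
    for a, b in zip(acf, acf[1:]):
        phi.append(phi[-1] + 0.5 * (a + b) * dt)
    scale = 1.0 / (k_b * temperature * volume)
    return [scale * value for value in phi]
```

In this picture the viscosity is the long-time plateau of the returned list, which is why many time origins are needed before the tail becomes reliable.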

**Fig. 2: Viscosity evaluation in the Green–Kubo scheme.**

The bottleneck in the conventional MD-based evaluation is easily recognized from Φ(t, t₀). Figure 2b shows the density of the sampled Φ(t, t₀) entries in MD simulations of an oil molecule. At long t, the variation among the samplings of the correlation grows, meaning that the far-future state is only loosely associated with the present state. Figure 2c shows the opposite situation, in which the correlations at short times show smaller variations. As evidenced in Eq. (1), viscosity is a long-time correlation, so accurate ensemble averaging requires a huge number of MD steps to collect sufficiently many t₀ samplings. Based on this insight, we suggest that if the viscosity can be predicted from the short-time correlation, the number of sampling MD steps in the viscosity evaluation can be reduced. This is the strategy pursued in this paper.

Fast evaluation

To realize the above idea, we import an elastic concept of liquid viscosity called the shoving model^34,35,36.

This model describes liquid from an atomic viewpoint as shown in Fig. 3. In the liquid state, a component molecule is surrounded by other liquid molecules in a caged space. Driven by thermal fluctuations, each molecule repeatedly collides with its neighbors. After a certain relaxation time, a molecule escapes from the cage by pushing its neighbors away. Through iterations of this local relaxation, all molecules are eventually rearranged and the liquid flows macroscopically. This phenomenological viewpoint suggests that the structural relaxation related to viscosity can be well represented by the energy required to push the surrounding molecules. The energy barrier is then proportional to the shear modulus of the liquid.

**Fig. 3: Schematic of a flow event in the shoving model.**

Combined with transition-state theory³⁷, the shoving model provides an Arrhenius-type equation of viscosity as

$${\mathrm{log}}\eta = \, \alpha \frac{{G_\infty }}{T} + \beta \frac{{G_\infty ^2}}{{T^2}} + \gamma ,\\ G_\infty = \, \mathop {{{\mathrm{lim}}}}\limits_{t \to 0} \frac{d}{{{d}t}}{\mathrm{\Phi }}\left( t \right) = \left\langle {\frac{1}{{k_{\mathrm{B}}TV}}P_{ij}^2\left( {t_0} \right)} \right\rangle,$$

(2)

where α, β, and γ are empirical parameters. Equation (2) demonstrates that viscosity is correlated with the stiffness of the liquid, which is measured under a given instantaneous force. Puosi and Leporini³⁵ and Dyre and Wang³⁶ improved the accuracy of viscosity calculations by a revised formula for the shear modulus $G_\infty ^ \ast \propto {\mathrm{\Phi }}\left( {\delta t} \right)$, where δt is a short-time period of the order of molecular vibrations. In this study, we use an averaged value of Φ as follows:

$$G_\infty ^ \ast \propto \overline {\mathrm{\Phi }} = \frac{{{\int}_0^{\delta t} {{\mathrm{\Phi }}\left( t \right){\mathrm{d}}t} }}{{\delta t}},$$

(3)

and δt is set to 5.0 fs.
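The average in Eq. (3) can be sketched numerically as below. This is only an illustration under stated assumptions: Φ(t) is assumed to be already available on a uniform time grid starting at t = 0 and ending at δt, the trapezoidal rule is an arbitrary choice, and `short_time_average` is a hypothetical name.

```python
def short_time_average(phi, dt):
    """Average of Phi(t) over [0, delta_t], following Eq. (3).

    phi : values Phi(0), Phi(dt), ..., Phi(delta_t) on a uniform grid
    dt  : grid spacing, so that delta_t = (len(phi) - 1) * dt
    Hypothetical helper; the trapezoidal rule is an assumption.
    """
    if len(phi) < 2:
        raise ValueError("need at least two grid points")
    delta_t = (len(phi) - 1) * dt
    integral = sum(0.5 * (a + b) * dt for a, b in zip(phi, phi[1:]))
    return integral / delta_t
```

With δt = 5.0 fs and a sampling interval of 1.0 fs, for instance, `phi` would hold six values.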

The shoving model was originally developed to clarify the atomic mechanism of glass transition. Here, we employ it to accelerate the MD evaluation of viscosity, as described below. Note that as Eq. (2) uses the short-time correlation, we can estimate the viscosity by $\overline {\mathrm{\Phi }}$ instead of the conventional evaluation in Eq. (1).

To improve the accuracy of our evaluation, we modify the original Arrhenius equation in Eq. (2). Van Velzen’s model is a well-known modification of the Arrhenius form. Commonly used in lubrication engineering, this model corrects the viscosity–temperature relation with respect to the boiling point of the liquid^38,39. Combining the van Velzen model with Eqs. (2) and (3), we obtain

$${\mathrm{log}}\,\eta = A\overline {\mathrm{\Phi }} \left( {\frac{1}{T} - \frac{1}{{T_{\mathrm{b}}}}} \right) + B{\bar{\mathrm{\Phi }}}^2\left( {\frac{1}{T} - \frac{1}{{T_{\mathrm{b}}}}} \right)^2 + \, \, {\mathrm{log}}\,\eta _{\mathrm{b}},$$

(4)

where the boiling point T_b of the liquid is estimated directly from a SMILES string via the Joback method⁴⁰, as implemented in the Python library thermo. By fitting Eq. (4) to the experimental viscosities of reference organic molecules (see Methods section), the parameters A, B, and η_b were determined as 7.577 × 10³, 1.607 × 10⁷, and 0.217 cP, respectively. Interestingly, the viscosity at the boiling temperature η_b is known to take a constant value of 0.22 cP for typical organic molecules that contain more than 20 carbons⁴¹. This value is consistent with the fitted value. Note that the accuracy of the proposed approach may degrade for small molecules.
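Equation (4) with the fitted parameters can be written as a small function. The sketch below is illustrative only: the base of the logarithm and the units of $\overline {\mathrm{\Phi }}$ are not restated in this section, so the base is exposed as an explicit argument (base 10 is an assumption), and the function name `viscosity_from_phi` is hypothetical.

```python
import math

def viscosity_from_phi(phi_bar, temperature, t_boil,
                       a=7.577e3, b=1.607e7, eta_b=0.217, log_base=10.0):
    """Viscosity (cP) from Eq. (4) with the fitted A, B, and eta_b.

    phi_bar     : short-time averaged correlation (units as used in the fit)
    temperature : temperature T in kelvin
    t_boil      : boiling point T_b in kelvin (e.g. a Joback estimate)
    log_base    : base of the logarithm in Eq. (4); base 10 is an assumption
    """
    x = phi_bar * (1.0 / temperature - 1.0 / t_boil)
    log_eta = a * x + b * x * x + math.log(eta_b, log_base)
    return log_base ** log_eta
```

By construction the function returns η_b at T = T_b, whatever base is chosen.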

Target property: viscosity index

As a target property for optimization, viscosity alone is too trivial. Viscosity typically increases with the number of constituent atoms of a lubricant molecule, because longer molecules become more entangled in the liquid state than short molecules³⁹. Instead, we target the viscosity index (VI), which indicates the temperature sensitivity of viscosity⁴². Machinery requires high-VI oil for stable mechanical operation in various environments. We use the most widely known VI definition, namely the quantity VI_ASTM given in the American Society for Testing and Materials (ASTM) D 2270 standard^42,43. The VI_ASTM is calculated as

$${\mathrm{VI}}_{\mathrm{{ASTM}}} = 100 \times \frac{{L - \eta _k^{40^ \circ {\mathrm{C}}}}}{{L - H}},$$

(5)

where $\eta _k^T$ is the kinematic viscosity at temperature T. Here, L and H are the kinematic viscosities at 40 °C of reference oils with VI_ASTM = 0 and 100, respectively, that have the same kinematic viscosity at 100 °C as the oil of interest. The reference viscosities can be obtained from a viscosity conversion table^42,44. We used the Python library thermo to calculate VI_ASTM.
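Equation (5) itself is a one-line interpolation once L and H are known. The sketch below shows only that final step; looking up L and H from the conversion table is not reproduced, and the function name `vi_astm` and the numbers in the usage note are hypothetical.

```python
def vi_astm(eta_k_40, l_ref, h_ref):
    """VI_ASTM from Eq. (5).

    eta_k_40 : kinematic viscosity of the oil at 40 degC (mm^2/s)
    l_ref    : 40 degC kinematic viscosity of the VI = 0 reference oil (L)
    h_ref    : 40 degC kinematic viscosity of the VI = 100 reference oil (H)
    Both references share the oil's kinematic viscosity at 100 degC.
    """
    return 100.0 * (l_ref - eta_k_40) / (l_ref - h_ref)
```

For made-up reference values L = 30 and H = 20 mm²/s, an oil with 20 mm²/s at 40 °C scores 100, and one with 30 mm²/s scores 0.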

As a complementary measure of VI performance, we also computed the dynamic viscosity index (DVI)^42,45, because the VI_ASTM is unsuitable for low-viscosity oils⁴⁴. For example, if $\eta _k^{40^ \circ {\mathrm{C}}}$ ≤ 2.0 mm²/s, VI_ASTM is undefined. Moreover, the VI_ASTM underestimates the viscosity susceptibility of low-viscosity oils in the range of $\eta _k^{40^ \circ {\mathrm{C}}}$ ≤ 5.0 mm²/s⁴⁴. To resolve these problems, the DVI was proposed as

$${\mathrm{DVI}} = \, 220 - 7 \times 10^S\\ S = \, - {\mathrm{log}}_{10}\left( {\frac{{{\mathrm{log}}_{10}\left( {\eta ^{40^ \circ {\mathrm{C}}}} \right) + 1.2}}{{{\mathrm{log}}_{10}\left( {\eta ^{100^ \circ {\mathrm{C}}}} \right) + 1.2}}} \right)/{\mathrm{log}}_{10}\left( {\frac{{135 + 40}}{{135 + 100}}} \right),$$

(6)

where η denotes the viscosity. The kinematic viscosity and viscosity are related through η_k = η/ρ, where ρ is the density of the liquid.
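Equation (6) translates directly into code. The sketch below is a plain transcription; the function name `dynamic_viscosity_index` is hypothetical, and the viscosities are assumed to be given in the same unit as used for Eq. (6) in the cited references.

```python
import math

def dynamic_viscosity_index(eta_40, eta_100):
    """DVI from Eq. (6), using the viscosities at 40 degC and 100 degC.

    Both log10(eta) + 1.2 terms must be positive, so each viscosity
    has to exceed 10**-1.2 in the chosen unit.
    """
    ratio = (math.log10(eta_40) + 1.2) / (math.log10(eta_100) + 1.2)
    s = -math.log10(ratio) / math.log10((135.0 + 40.0) / (135.0 + 100.0))
    return 220.0 - 7.0 * 10.0 ** s
```

A liquid whose viscosity did not change between 40 °C and 100 °C would give S = 0 and therefore DVI = 213; a stronger temperature dependence lowers the value.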

An important difference between VI_ASTM and DVI is that the former tracks the variation of η_k, whereas the latter tracks the variation of η. Tribological properties such as oil film thickness and viscous resistance at the sliding interface depend more on the viscosity than on the kinematic viscosity. Therefore, although the VI_ASTM is conventionally used, the DVI is also a good index of the temperature–viscosity sensitivity. These two indices are compared in Supplementary Note 1.

Molecular fragments and rules of the Monte Carlo tree search

The remaining component of the autonomous design system is a search algorithm that generates molecular structures with the optimal target properties. The search algorithm should combine an efficient search strategy suited to the inherent molecular representation with generation rules that meet the material requirements. This study employs the MCTS as the search algorithm, which describes a molecule by a graph structure. The graph nodes describe the user-defined molecular fragments in SMILES^13,14. Oil molecules synthesized and purified from crude oil generally have hydrocarbon chain structures with several branches. To represent such structures, we defined different types of molecular fragments for the main and side chains of the molecules as follows:

In the main chain: CC, OC, C=C, (, $, c1ccccc1$, C1CCCCC1$, =O$
In the side chain: CC, OC, C=C, (,), c1ccccc1), C1CCCCC1), =O)

where $ indicates the end of the molecule. These side-chain fragments can be joined only after a “(” symbol in the main chain. The c1ccccc1, C1CCCCC1, and =O fragments are terminal groups. The initial molecular fragment, called a root node, is C.

We then restricted the generated molecules to lubricants. Unbranched molecules are inappropriate because they have high freezing points and are therefore prone to waxing at the operating temperature. To generate molecules with one or more branches, we rejected unbranched molecules during the rollout operation of MCTS. The branched molecules were then restricted to the allowable viscosity range. An excessively high viscosity increases the fuel consumption, whereas a very low viscosity leads to scuffing. The preferred kinematic viscosity of the base oil of automobile lubricants ranges from 3.0 to 6.0 mm²/s. As viscosity is proportional to the number of constituent atoms³⁹, a typical oil molecule should contain 20–40 carbons⁴⁶. To enforce this within the MCTS rules, we set an ending rule by which fragments with $ can be used only when the total number of C and O atoms is 20 or higher. When this number is 30 or higher, fragments with $ must be used.

In summary, we define three search rules: define the molecular fragments, prohibit the unbranched molecules, and impose the ending condition. The hyperparameters of the MCTS algorithm are given in the Methods section.
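The three rules can be illustrated by a random rollout over the fragment lists. The sketch below is a toy under several assumptions, not the authors' generator: the side-chain list is read as containing the closing symbol ")" but no nested "(", a side chain is forced to close once 30 C and O atoms are reached, atoms are counted by counting the characters C, c, and O, and no chemical validity check is performed (the paper delegates that to RDKit). The names `allowed`, `heavy_atoms`, and `rollout` are hypothetical.

```python
import random

MAIN = ["CC", "OC", "C=C", "(", "$", "c1ccccc1$", "C1CCCCC1$", "=O$"]
# Assumption: side chains may close with ")" but may not open a nested "(".
SIDE = ["CC", "OC", "C=C", ")", "c1ccccc1)", "C1CCCCC1)", "=O)"]

def heavy_atoms(smiles):
    """Count C and O atoms by counting the characters C, c, and O."""
    return sum(smiles.count(ch) for ch in "CcO")

def allowed(smiles, in_side):
    """Fragments permitted next under the ending rule."""
    n = heavy_atoms(smiles)
    if in_side:
        # Assumption: at 30 or more atoms the side chain must be closed.
        return [f for f in SIDE if f.endswith(")")] if n >= 30 else SIDE
    if n >= 30:
        return [f for f in MAIN if f.endswith("$")]      # ending is mandatory
    if n >= 20:
        return MAIN                                       # ending is optional
    return [f for f in MAIN if not f.endswith("$")]       # ending is forbidden

def rollout(rng, root="C"):
    """Random rollout from the root node; unbranched molecules are rejected."""
    while True:
        smiles, in_side = root, False
        while not smiles.endswith("$"):
            frag = rng.choice(allowed(smiles, in_side))
            smiles += frag
            if frag == "(":
                in_side = True
            elif frag.endswith(")"):
                in_side = False
        if "(" in smiles:          # rule: at least one branch
            return smiles[:-1]     # strip the end marker "$"
```

Calling `rollout(random.Random(0))` returns one branched candidate string; in a real pipeline each string would still need a validity check before evaluation.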

Evaluations of viscosity and viscosity index

The feasibility of the closed loop is mainly determined by how far the MD evaluations can be accelerated. As a baseline method, we employed the conventional Einstein–Helfand (EH) scheme³³, which evaluates the viscosity from the mean-squared displacement of P_xy. We emphasize that this baseline was selected for a convenient comparison, because the EH scheme is defined so as to avoid erroneous negative viscosities, unlike the GK scheme. The two schemes are compared in Supplementary Note 2.

Figure 4a compares the viscosities evaluated by the fast evaluation and EH methods with an identical dataset of MD trajectories. The computational details are provided in the Methods section. Under the same sampling conditions, the root-mean-squared error (RMSE) was 3.8 cP in the proposed method, greatly reduced from 19.8 cP in the EH method. A distinctive advantage can be found in the standard deviation (STD) of each MD trajectory. In the present method, the STD is only 3.7% of that of the EH method, so small that the error bars are hidden behind the points in Fig. 4a. We roughly estimated that, to attain the same statistical accuracy as the EH method, the fast evaluation reduces the number of samplings in the MD steps by a factor of approximately (3.7/100)² ∼ 1/1000. The fast evaluation is examined in detail in Supplementary Note 3.
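The estimate can be made explicit under the usual assumption that the samples are independent, so that the standard error of an average over N samples scales as σ/√N. Requiring equal standard errors for the two methods then gives (the subscripts fast and EH are labels introduced here for illustration)

$$\frac{{\sigma _{{\mathrm{fast}}}}}{{\sqrt {N_{{\mathrm{fast}}}} }} = \frac{{\sigma _{{\mathrm{EH}}}}}{{\sqrt {N_{{\mathrm{EH}}}} }}\quad \Rightarrow \quad \frac{{N_{{\mathrm{fast}}}}}{{N_{{\mathrm{EH}}}}} = \left( {\frac{{\sigma _{{\mathrm{fast}}}}}{{\sigma _{{\mathrm{EH}}}}}} \right)^2 = \left( {0.037} \right)^2 \approx 1.4 \times 10^{ - 3},$$

which is of the order of 1/1000.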

**Fig. 4: Plots of calculated versus experimental viscosities and viscosity index.**

Figure 4b compares the VI_ASTM values of the EH and proposed methods. Because the VI_ASTM is very sensitive to slight deviations in kinematic viscosity, the errors in the EH method were unacceptably large for the closed-loop system. In contrast, the VI_ASTM values obtained by the proposed method were sufficiently accurate and efficiently obtained.

Autonomous search

Figure 5a shows the protocol of closed-loop searching. The MCTS proposes the next molecule encoded in SMILES, and then the fast evaluation by MD simulations provides its VI_ASTM as feedback. The search was performed ten times with 5500 evaluation loops per search, giving 54,318 evaluated molecules. Figure 5b shows VI_ASTM and kinematic viscosity histograms of the molecules. Most of the viscosities ranged from 3.0 to 6.0 mm²/s as planned, and several high-VI_ASTM molecules were observed. As indicated by the top-ten molecules in Fig. 5c, the generated structures were highly unusual and unlikely to be synthesized in one or two chemical processing steps. Therefore, we searched the candidate list for high-VI_ASTM molecules that admit an easy synthesis, drawing on suggestions from organic chemists in our institute. Consequently, we took the 83rd-ranked molecule shown in Fig. 5d as a motif, and modified it to the easily synthesized form in Fig. 5e. The modified molecule was prepared by the etherification of farnesyl bromide with 1,5-diphenylpentan-3-ol, which is obtained by the Grignard reaction of 3-phenylpropanal and 2-phenylethylmagnesium bromide⁴⁷. As comparison molecules, we used two major high-VI base oils, produced by hydrocracking of crude oil and by chemical synthesis: YUBASE-4 and SpectraSyn-4, made by SK Lubricants and Exxon Mobil, respectively. The viscosities of these oils were experimentally determined with a Stabinger viscometer SVM™ (Anton Paar Ltd.).

**Fig. 5: Evaluations and structures of the molecules obtained by the molecular design system.**

Table 1 summarizes the properties obtained in the investigation. The calculated DVIs, kinematic viscosities, viscosities, and densities were within 20% of the experimental values. The calculated VI_ASTM was overestimated because it responds strongly to even slight changes in kinematic viscosity (see Supplementary Note 1). The experimental VI_ASTM of the present molecule was 109, smaller than those of the high-VI commercial oils, but still classifiable between the high-VI group (VI_ASTM = 80–110) and the very high-VI group (VI_ASTM > 110) according to Neale⁴⁸. In fact, when measured by the other metric, the DVI, the obtained oil was slightly superior to the market oils.

Table 1 Comparisons of the present molecule and commercial high viscosity-index oils.


Typically, the main components of high-VI oils are structures with a high paraffin ratio. For instance, the poly-alpha-olefin shown in Fig. 5f is a major component of SpectraSyn. Interestingly, our molecule in Fig. 5e is quite unlike the conventional high-VI molecules. This result indicates that it lies beyond the interpolated lubricant space. Nevertheless, engine oils in applications must not only satisfy the viscosity-index requirements but must also deliver high oxidative resistance and a low freezing point at minimal production cost. These additional requirements are not considered in the present test search.

Discussion

As is often mentioned, material data are not big data, and the existing datasets of transport properties are limited. Nevertheless, experts try to deduce a design guideline from such a scarce dataset to develop better materials. For example, after observing synthesized molecules by properly controlled hydrocracking and 13C nuclear magnetic resonance (NMR), researchers deduced that high-VI molecules likely consist of long chains with few branches and rings^46,49,50,51. Owing to the time-intensiveness of the experiments, the hydrocracking and NMR data constituted only several tens of entries. To our knowledge, the present dataset of 55,000 entries is the largest acquired dataset of viscosity properties. In a simple data analysis, we now extract the features from this dataset that are relevant to high-VI molecules, and compare our insights with those reported by the experts.

Figure 6a and b show the correlation heat map and the main structure–property correlations (with values exceeding 0.4), respectively. For the correlation analysis, we selected the VI_ASTM, kinematic viscosity η_k, density ρ, number of constituent atoms N, number of branches N_branch, and the ring ratio R_ring. The positive correlation between the kinematic viscosity and N is well known³⁹. The VI_ASTM was strongly correlated with both η_k and N. To capture molecules with viscosities within the typical range of low-viscosity engine oils, we then restricted the dataset to 4.0 mm²/s ≤ $\eta _k^{100^ \circ {\mathrm{C}}}$ ≤ 5.0 mm²/s. In Fig. 6c, the edge between VI_ASTM and η_k disappears because its correlation was below the threshold magnitude 0.4, but the positive correlation between N and VI_ASTM remained under the viscosity restriction. According to this result, VI_ASTM is an increasing function of N. However, as N is also positively correlated with the viscosity, it cannot be increased indefinitely, but is restricted by the upper limit of the valid viscosity range. Therefore, when increasing N, the viscosity must be simultaneously suppressed. To favor a high-VI_ASTM, we minimized the viscosity of molecules with constant N. Figure 6d shows the major correlations in the dataset of molecules with N = 31. The kinematic viscosities of the restricted molecules were mainly distributed over 4.0–5.0 mm²/s. The nodes R_ring and N_branch were positively correlated with the node η_k, implying that straight-chain fragments are preferable for reducing the viscosity increment.
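The edge selection behind such correlation graphs can be sketched as follows. This is an illustrative reading, not the authors' analysis script: the use of the Pearson coefficient is an assumption (the correlation measure is not named here), and `pearson`, `correlation_edges`, and the toy columns in the usage note are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_edges(columns, threshold=0.4):
    """Keep feature pairs whose correlation magnitude exceeds the threshold.

    columns : dict mapping a feature name to its list of values
    Returns a dict {(name_a, name_b): r} for the retained edges.
    """
    names = list(columns)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                edges[(a, b)] = r
    return edges
```

For a toy table such as `{"N": [1, 2, 3, 4, 5], "eta_k": [2, 4, 6, 8, 10], "rho": [1, -1, 1, -1, 1]}`, only the pair ("N", "eta_k") survives the threshold. Restricting the rows first (for example to a viscosity window or to fixed N) and re-running the same function mirrors the filtered graphs described above.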

Meanwhile, a high VI was observed for molecules with many constituent atoms, few branches, and few rings. This result is consistent with the previously reported experimental insights^46,49,50,51. Note that although N_branch and R_ring negatively influenced the VI_ASTM, they could not describe the VI well, because they were poorly correlated with VI. The VI might be better represented by other features such as molecular configuration, dynamical entanglement, and dipole–dipole interactions. Other critical parameters of VI might be identified by mining the present dataset of 55,000 molecules; for this purpose, the dataset (see Supplementary Data 1) has been made publicly available.

In conclusion, our autonomous search confers two main advantages: (1) efficient design of a high-functioning molecule by referring to a prospective molecule selected from generated candidate molecules, and (2) acquisition of design insights and directions from the generated dataset. A major weakness of this system is the difficulty of evaluating the ease of synthesis, which has been intensively studied elsewhere¹⁴. Nevertheless, as a potentially new scheme of materials development, our MI system comprehensively explores the vast material space in high-speed evaluations. Experts can then modify the extracted prospective materials considering the required stability, safety, and production cost of the target product. Current AI systems for the “Go” game have continuously inspired professional players since demonstrating their ability to defeat the players⁵². This trend may also propagate into materials science, driving further technological developments through human–MI collaborations. Fast evaluation by MD simulations should be generalized to transport properties other than viscosity, such as ion conductivity. Such investigations will be undertaken in our future work.

Methods

Molecular dynamics simulation

The simulations were performed in the open-source MD solver LAMMPS with the TEAM_MS force field, which is provided in the commercial software Direct Force Field (DFF). The TEAM_MS force field was constructed based on the results of ab initio calculations of molecular fragments⁵³. To achieve a thermal equilibrium state, we first ran an NVT calculation with time interval Δt = 0.25 fs followed by an NPT calculation with Δt = 1.0 fs. We then executed a relatively long NVT calculation with Δt = 1.0 fs to sample the non-diagonal elements of the stress tensor P_ij. Table 2 summarizes the conditions of the MD simulations.

Table 2 Conditions of the molecular dynamics (MD) simulations.


Figure 2b, c shows the distributions of Φ(t, t₀) entries, calculated in MD simulations under the “Normal” condition in Table 2. To obtain the distributions, we divided the t₀ samplings into 100 domains, modifying Eq. (1) as

$$\left\langle {{\mathrm{\Phi }}\left( {t,t_0} \right)} \right\rangle = \, \frac{1}{{N_t}}\mathop {\sum}\limits_{n_0 = 1}^{N_t} {{\mathrm{\Phi }}\left( {t,n_0{\mathrm{\Delta }}t - {\mathrm{\Delta }}t} \right)} ,\\ = \mathop {\sum}\limits_{n_1 = 0}^{99} {\frac{1}{{100}}\frac{1}{{N_t/100}}\mathop {\sum}\limits_{n_2 = 1}^{N_t/100} {{\mathrm{\Phi }}\left( {t,\frac{{n_1N_t}}{{100}}{\mathrm{\Delta }}t + n_2{\mathrm{\Delta }}t - {\mathrm{\Delta }}t} \right)} } ,\\ \equiv\, \mathop {\sum}\limits_{n_1 = 0}^{99} \frac{1}{{100}}{\mathrm{\Phi }}^\prime \left( {t,n_1} \right) .$$

We employed the averaged sampling quantity as P_ij ≡ (P_xy + P_yz + P_zx)/3. The MD simulations were repeated five times to increase the number of the MD samplings; therefore, Fig. 2b, c was constructed from 5 × 100 ${\mathrm{\Phi }}^\prime \left( {t,n_1} \right)$ trajectories.
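The division of the t₀ samplings into domains amounts to block averaging. A minimal sketch, assuming the samples for one fixed t are stored in time-origin order and that any remainder that does not fill a complete block is simply dropped (both assumptions, as is the name `block_means`):

```python
def block_means(samples, n_blocks=100):
    """Split consecutive t0 samples into n_blocks domains and average each.

    samples : values Phi(t, t0) for one fixed t, ordered by time origin t0
    Returns one mean per domain, i.e. one Phi'(t, n1) per block index n1.
    """
    size = len(samples) // n_blocks
    if size == 0:
        raise ValueError("fewer samples than blocks")
    return [
        sum(samples[i * size:(i + 1) * size]) / size
        for i in range(n_blocks)
    ]
```

When the number of samples is a multiple of `n_blocks`, the mean of the returned block means equals the full ensemble average, while their spread gives the kind of distribution plotted in Fig. 2b, c.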

Figure 4, which compares the results of the fast evaluation and conventional methods, was constructed from the same five MD trajectories under the “Normal” condition. In this case, we individually set P_xy, P_yz, and P_zx as P_ij and ran the MD simulation five times, thus obtaining 5 × 3 = 15 viscosity samples for each molecule.

The traceless-symmetric part of the stress tensor P_os is known to yield good statistics. The quantity P_os consists of five independent samples P_xy, P_yz, P_zx, (P_xx − P_yy)/2, and (P_yy − P_zz)/2 collected from one MD trajectory^23,24. We used P_os as the sampling quantity in the high-throughput calculations of Fig. 5. The number of molecules in the simulation cell was 120. To reduce the computational cost of the 55,000 evaluations, we decreased the cutoff length of the Coulomb interaction and the number of time steps (“High-throughput” row in Table 2). We confirmed that the high-throughput condition ensures acceptable accuracy for determining the order of the VI_ASTM values of different molecules, as shown in Supplementary Note 4. The data in Table 1 were accurately calculated by sampling the traceless-symmetric quantity under the “Normal” condition.
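Extracting the five components from one stress snapshot can be sketched as below. The function name `traceless_symmetric_components` and the nested-list layout of the tensor are assumptions for illustration; a symmetric tensor is assumed, so only one off-diagonal element of each pair is read.

```python
def traceless_symmetric_components(p):
    """Five independent samples from a 3x3 stress tensor snapshot.

    p : nested sequence [[Pxx, Pxy, Pxz], [Pyx, Pyy, Pyz], [Pzx, Pzy, Pzz]]
    Returns [Pxy, Pyz, Pzx, (Pxx - Pyy)/2, (Pyy - Pzz)/2].
    """
    pxx, pxy = p[0][0], p[0][1]
    pyy, pyz = p[1][1], p[1][2]
    pzx, pzz = p[2][0], p[2][2]
    return [pxy, pyz, pzx, 0.5 * (pxx - pyy), 0.5 * (pyy - pzz)]
```

Each of the five time series obtained this way can be treated as a separate P_ij sample, which multiplies the statistics gathered from a single trajectory.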

Monte Carlo tree search

The reward in MCTS is defined by the upper confidence bound (UCB) score as

$${\mathrm{UCB}} = \overline {{\mathrm{VI}}} _{\mathrm{{ASTM}}}/200 + C\sqrt {2{\mathrm{log}}\left( {n_{\mathrm{{parent}}}/n} \right)} ,$$

(7)

where n and n_parent indicate the numbers of visits at a node and its parent node, respectively^15,16. The quantity $\overline {{\mathrm{VI}}} _{\mathrm{{ASTM}}}$ is obtained by averaging the VI_ASTM values of molecules that are randomly generated from the node, a procedure called random rollout. The rollout number, which refers to the number of randomly generated molecules, was set to 10.

Because VI_ASTM cannot be defined when $\eta _k^{40^ \circ {\mathrm{C}}}$ ≤ 2.0 mm²/s, we set VI_ASTM = 0 in such cases. If the structure of the molecule generated in the rollout phase was chemically invalid, it was automatically detected by the RDKit software and replaced with a new molecule. The bias coefficient C is an arbitrary parameter. We set C = 1, which is theoretically validated when the first term of the right-hand side of Eq. (7) ranges from 0.0 to 1.0 (refs. ^15,16). We then divided $\overline {{\mathrm{VI}}} _{\mathrm{{ASTM}}}$ by its approximately expected maximum, namely, 200.
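The score of Eq. (7) and the fallback for undefined VI_ASTM can be sketched as follows. The code transcribes Eq. (7) as printed, with n inside the logarithm; the textbook UCB1 rule writes the exploration term as sqrt(2 ln(n_parent)/n) instead, and which form the implementation used is not stated here. The names `rollout_reward` and `ucb_score` are hypothetical.

```python
import math

def rollout_reward(vi_astm, eta_k_40):
    """VI_ASTM used as reward; set to 0 where it is undefined (eta_k <= 2.0 mm^2/s)."""
    return 0.0 if eta_k_40 <= 2.0 else vi_astm

def ucb_score(mean_vi, n, n_parent, c=1.0, vi_max=200.0):
    """UCB score following Eq. (7) as printed.

    mean_vi  : rollout-averaged VI_ASTM of the node
    n        : number of visits of the node (n >= 1)
    n_parent : number of visits of its parent (n_parent >= n)
    vi_max   : approximate expected maximum used for normalization (200)
    """
    exploit = mean_vi / vi_max
    explore = c * math.sqrt(2.0 * math.log(n_parent / n))
    return exploit + explore
```

During selection, the child with the highest score would be followed; rarely visited children receive a larger exploration bonus.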

Reference molecules

As the reference models in the MD test, we adopted 12 typical organic molecules. Their structures and abbreviated names are displayed in Fig. 7. Their formal names and viscosity properties are listed in Tables 3 and 4, respectively. In the MD calculations, the numbers of molecules in the simulation cell were 150 for 9nhhd, 9chhd, diiso_seb, and 2m4odp, 120 for 1c2mh and 13cp, and 100 for the remainder. Each simulation cell contained approximately 10,000 atoms.

Table 3 Reference oil molecules.


Table 4 Viscosity properties of the reference oil molecules.


Data availability

The dataset generated during the high-throughput evaluations (54,318 SMILES of the molecules along with the VIs, viscosities, kinematic viscosities, and densities) is available in Supplementary Data 1. The authors declare that all other data supporting the findings of this study are available within the paper and its Supplementary Notes.

References

Tabor, D. P. et al. Accelerating the discovery of materials for clean energy in the era of smart automation. Nat. Rev. Mater. 3, 5–20 (2018).
De Luna, P. et al. Use machine learning to find energy materials. Nature 552, 23–27 (2017).
Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
Sendek, A. D. et al. Machine learning-assisted discovery of solid Li-ion conducting materials. Chem. Mater. 31, 342–352 (2018).
Gomez-Bombarelli, R. et al. Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat. Mater. 15, 1120–1127 (2016).
Narayan, A. et al. Computational and experimental investigation for new transition metal selenides and sulfides: the importance of experimental verification for stability. Phys. Rev. B 94, 045105 (2016).
Lee, J., Ohba, N. & Asahi, R. Discovery of zirconium dioxides for the design of better oxygen-ion conductors using efficient algorithms beyond data mining. RSC Adv. 8, 25534–25545 (2018).
Ohba, N., Yokoya, T., Kajita, S. & Takechi, K. Search for high-capacity oxygen storage materials by materials informatics. RSC Adv. 9, 41811–41816 (2019).
Kajita, S., Ohba, N., Suzumura, A., Tajima, S. & Asahi, R. Discovery of superionic conductors by ensemble-scope descriptor. NPG Asia Mater. 12, 31 (2020).
Gomez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Segler, M. H., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focussed molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2017).
Ikebata, H., Hongo, K., Isomura, T., Maezono, R. & Yoshida, R. Bayesian molecular design with a chemical language model. J. Comput. Aided Mol. Des. 31, 379–391 (2017).
Yang, X., Zhang, J., Yoshizoe, K., Terayama, K. & Tsuda, K. ChemTS: an efficient python library for de novo molecular generation. Sci. Technol. Adv. Mater. 18, 972–976 (2017).
Segler, M. H., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Agrawal, R. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 27, 1054–1078 (1995).
Auer, P., Cesa-Bianchi, N. & Fischer, P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47, 235–256 (2002).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484 (2016).
Hautier, G. et al. Phosphates as lithium-ion battery cathodes: an evaluation based on high-throughput ab initio calculations. Chem. Mater. 23, 3495–3508 (2011).
Studt, F. et al. CO hydrogenation to methanol on Cu-Ni catalysts: theory and experiment. J. Catal. 293, 51–61 (2012).
Nishijima, M. et al. Accelerated discovery of cathode materials with prolonged cycle life for lithium-ion battery. Nat. Commun. 5, 4553 (2014).
Hayashi, H. et al. Discovery of a novel Sn(II)-based oxide β-SnMoO₄ for daylight-driven photocatalysis. Adv. Sci. 4, 1600246 (2017).
Curtarolo, S. et al. The high-throughput highway to computational materials design. Nat. Mater. 12, 191–201 (2013).
Meyer, E. R., Kress, J. D., Collins, L. A. & Ticknor, C. Effect of correlation on viscosity and diffusion in molecular-dynamics simulations. Phys. Rev. E 90, 043101 (2014).
Daivis, P. J. & Evans, D. J. Comparison of constant pressure and constant volume nonequilibrium simulations of sheared model decane. J. Chem. Phys. 100, 541–547 (1994).
Jinnouchi, R., Lahnsteiner, J., Karsai, F., Kresse, G. & Bokdam, M. Phase transitions of hybrid perovskites simulated by machine-learning force fields trained on the fly with Bayesian inference. Phys. Rev. Lett. 122, 225701 (2019).
Schütt, K. T., Sauceda, H. E., Kindermans, P. J., Tkatchenko, A. & Müller, K. R. SchNet-A deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
Chmiela, S., Sauceda, H. E., Müller, K. & Tkatchenko, A. Towards exact molecular dynamics simulations with machine-learned force fields. Nat. Commun. 9, 1–10 (2018).
Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).
Singraber, A., Behler, J. & Dellago, C. Library-based LAMMPS implementation of high-dimensional neural network potentials. J. Chem. Theory Comput. 15, 1827–1840 (2019).
Huang, K. & Szlufarska, I. Green-Kubo relation for friction at liquid-solid interfaces. Phys. Rev. E 89, 032119 (2014).
Washizu, H. & Ohmori, T. Molecular dynamics simulations of elastohydrodynamic lubrication oil film. Lubrication Sci. 22, 323–340 (2010).
Mondello, M. & Grest, G. S. Viscosity calculations of n-alkanes by equilibrium molecular dynamics. J. Chem. Phys. 106, 9327–9336 (1997).
Helfand, E. Transport coefficients from dissipation in a canonical ensemble. Phys. Rev. 119, 1 (1960).
Dyre, J. C. Colloquium: The glass transition and elastic models of glass-forming liquids. Rev. Mod. Phys. 78, 953 (2006).
Puosi, F. & Leporini, D. Communication: correlation of the instantaneous and the intermediate-time elasticity with the structural relaxation in glassforming systems. J. Chem. Phys. 136, 041104 (2012).
Dyre, J. C. & Wang, W. H. The instantaneous shear modulus in the shoving model. J. Chem. Phys. 136, 224108 (2012).
Glasstone, S., Laidler, K. J. & Eyring, H. The Theory of Rate Processes (McGraw-Hill, New York, 1941).
Van Velzen, D., Cardozo, R. L. & Langenkamp, H. A liquid viscosity-temperature-chemical constitution relation for organic compounds. Ind. Eng. Chem. Fundam. 11, 20–25 (1972).
Viswanath, D. S., Ghosh, T. K., Prasad, D. H. L., Dutt, N. V. K. & Rani, K. Y. Viscosity of Liquids: Theory, Estimation, Experiment, and Data (Springer, Netherlands, 2007).
Joback, K. G. & Reid, R. C. Estimation of pure-component properties from group-contributions. Chem. Eng. Commun. 57, 233–243 (1987).
Smith, G. J., Wilding, W. V., Oscarson, J. L. & Rowley, R. L. Correlation of liquid viscosity at the normal boiling point. Proceedings of the Fifteenth Symposium on Thermophysical Properties, Boulder, Colorado, U.S.A.
Zakarian, J. The limitations of the viscosity index and proposals for other methods to rate viscosity-temperature behavior of lubricating oils. SAE Int. J. Fuels Lubr. 5, 1123–1131 (2012).
ASTM D2270-10: Standard practice for calculating viscosity index from kinematic viscosity at 40 and 100 °C. http://ppapco.ir/wp-content/uploads/2019/07/ASTM-D2270-2016.pdf (2016).
Covitch, M. J. An improved method for calculating viscosity index (VI) of low viscosity base oils. J. Test. Eval. 46, 820–825 (2018).
Roelands, C. J. A., Blok, H., Vlugter, J. C. & Eng, M. A new viscosity-temperature criterion for lubricating oils. ASME-ASLE International Lubrication Conference, No. 64-LUB-3, Washington, D.C. (1964).
Lynch, T. R. Process Chemistry of Lubricant Base Stocks (CRC Press, Boca Raton, 2007).
Zhang, Q. C. et al. Modulating the rotation of a molecular rotor through hydrogen-bonding interactions between the rotator and stator. Angew. Chem. Int. Ed. 52, 12602–12605 (2013).
Neale, M. J. Table 2.1 in Lubrication and Reliability Handbook (Newnes, Elsevier, 2001).
Kapur, G. S., Chopra, A., Sarpal, A. S., Ramakumar, S. S. V. & Jain, S. K. Studies on competitive interactions and blending order of engine oil additives by variable temperature 31P-NMR and IR spectroscopy. Tribol. Trans. 42, 807–812 (1999).
Verdier, S., Coutinho, J. A., Silva, A. M., Alkilde, O. F. & Hansen, J. A. A critical approach to viscosity index. Fuel 88, 2199–2206 (2009).
Noh, K., Shin, J. & Lee, J. H. Change of hydrocarbon structure type in lube hydroprocessing and correlation model for viscosity index. Ind. Eng. Chem. Res. 56, 8016–8028 (2017).
Lee, C. S. et al. Human vs. computer go: review and prospect. IEEE Comput. Intell. Mag. 11, 67–72 (2016).
Sun, H. COMPASS: an ab initio force-field optimized for condensed-phase applications—overview with details on alkane and benzene compounds. J. Phys. Chem. B 102, 7338 (1998).

Acknowledgements

S.K. thanks M. Tohyama and T. Ohmori for their useful advice regarding the specification and evaluation of lubricants. S.K. thanks H. Takeuchi and Y. Kikuzawa for assisting with the synthesis of the present oil molecules. This research used the computational resources of the K computer provided by RIKEN through the HPCI System Research project (Project ID: hp180238).

Author information

Authors and Affiliations

Toyota Central R&D Labs, Inc., 41-1, Yokomichi, Nagakute, Aichi, 480-1192, Japan
Seiji Kajita, Tomoyuki Kinjo & Tomoki Nishi

Contributions

S.K. developed the ultra-fast evaluation and MCTS, and operated the closed-loop search and experiment. T.K. developed the GK and EH methods, and selected the proper MD conditions. T.N. provided an idea and technical advice related to MCTS. All authors collectively wrote the manuscript.

Corresponding author

Correspondence to Seiji Kajita.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kajita, S., Kinjo, T. & Nishi, T. Autonomous molecular design by Monte-Carlo tree search and rapid evaluations using molecular dynamics simulations. Commun Phys 3, 77 (2020). https://doi.org/10.1038/s42005-020-0338-y

Received: 03 November 2019
Accepted: 23 March 2020
Published: 07 May 2020
DOI: https://doi.org/10.1038/s42005-020-0338-y
Springer Nature Limited

