Abstract

Statistical (machine-learning, ML) models are increasingly used in computational chemistry as a substitute for more expensive ab initio and parametrizable methods. While ML algorithms are capable of learning physical laws implicitly from data, adding prior physical knowledge improves the results and accelerates training. This thesis covers several aspects of enhancing ML models with quantum-chemical information: representation design, preprocessing of the input data, and loss function choice. The first part focuses on extending the symmetry-adapted Gaussian process regression model of the electron density. First, we study how the choice of density-fitting and training-loss-function metrics impacts the quality of the predictions. We also show that densities predicted by the original model do not integrate to the exact number of electrons, which compromises the extrapolative capabilities, and propose a modified, constrained model along with an a posteriori correction. Then, the framework is applied to the on-top pair density. Using a specialized fitting basis set, we train a model to predict CASSCF-quality on-top pair density and compute the on-top pair ratio to visualize static electron correlation effects. The second part introduces the spectrum of approximated Hamiltonian matrices (SPAHM), a family of physics-based molecular representations. Eigenvalue SPAHM is a global representation built from occupied-orbital eigenvalues of an initial-guess Hamiltonian. SPAHM(a,b) are local representations based on initial-guess-level electron densities attributed to atoms and bonds. These representations distinguish not only different molecules and conformations, but also different spin, charge, and potentially electronic states. The advantages of SPAHM are demonstrated on datasets featuring a wide variation of charge and spin. The last part is devoted to the application of equivariant neural networks to chemical reaction properties. EquiReact, the proposed model, predicts reaction barriers from 3D structures of reactants and products. Its high interpolative and extrapolative capabilities, particularly in the absence of atom-mapping information, are demonstrated on several datasets. Overall, the work presented in this thesis contributes to the global effort to develop, improve, and advance ML-based methods used in computational chemistry.
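
The idea behind eigenvalue SPAHM, a vector of occupied-orbital eigenvalues of an initial-guess Hamiltonian, can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `eigenvalue_spahm`, the toy 3x3 matrix, the assumption of an orthonormal basis (so that a plain symmetric eigenvalue problem suffices), and the zero-padding to a fixed length are all assumptions made for illustration only.

```python
import numpy as np


def eigenvalue_spahm(h_guess, n_occ, length):
    """Toy eigenvalue-based representation (illustrative, hypothetical helper).

    h_guess : symmetric matrix standing in for an initial-guess Hamiltonian,
              assumed here to be expressed in an orthonormal basis.
    n_occ   : number of occupied orbitals to keep.
    length  : fixed output length; unused entries are zero-padded (assumption).
    """
    h_guess = np.asarray(h_guess, dtype=float)
    if not np.allclose(h_guess, h_guess.T):
        raise ValueError("h_guess must be symmetric")
    if not 0 <= n_occ <= min(length, h_guess.shape[0]):
        raise ValueError("n_occ must fit both the matrix size and the output length")

    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
    # so the first n_occ entries are the lowest (occupied) levels.
    eigvals = np.linalg.eigvalsh(h_guess)

    rep = np.zeros(length)
    rep[:n_occ] = eigvals[:n_occ]
    return rep


# Made-up symmetric matrix, not derived from any real molecule.
h_toy = np.array([
    [-1.0, 0.2, 0.0],
    [0.2, -0.5, 0.1],
    [0.0, 0.1, 0.3],
])

# Same toy Hamiltonian, two different occupations.
rep_two_occ = eigenvalue_spahm(h_toy, n_occ=2, length=3)
rep_one_occ = eigenvalue_spahm(h_toy, n_occ=1, length=3)
```

In this sketch the two vectors differ because the number of occupied levels differs, which hints at how such a representation could separate states with different electron counts even for the same geometry.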
