This is a synopsis of the paper by T. Zarrouk, R. Ibragimova, A.P. Bartók and M.A. Caro, “Experiment-Driven Atomistic Materials Modeling: A Case Study Combining X-Ray Photoelectron Spectroscopy and Machine Learning Potentials to Infer the Structure of Oxygen-Rich Amorphous Carbon”, J. Am. Chem. Soc., DOI:10.1021/jacs.4c01897, available as an Open Access PDF from the publisher’s website. All figures are reproduced under the CC-BY 4.0 license.
The atomistic modeling field is rapidly evolving in the midst of a machine learning (ML) and artificial intelligence (AI) driven revolution. Simulations of molecules and materials, involving thousands and even millions of atoms, or molecular dynamics (MD) simulations spanning long simulation times, are now routinely done with (close to) ab initio accuracy. These were unthinkable just a decade ago. Yet, the ultimate test for any theory and simulation is experiment, and achieving experimental agreement and incorporating experimental data directly into these atomistic simulations is necessarily the next frontier in atomistic modeling.
A central goal of atomistic materials modeling is obtaining the atomic-scale structure of materials. This is particularly important (and interesting) when the material lacks crystalline ordering and experimental techniques like X-ray diffraction (XRD) cannot be used to determine its structure. Amorphous and generally disordered materials are one such class of materials, but liquids and interfaces are also relevant here. Through atomistic modeling we can attempt to derive structural models of these materials, e.g., a set of representative structures given in terms of the atomic positions (the Cartesian coordinates of the atoms). After deriving these models, we want to connect the simulation with the experiment, both to verify the soundness of the computational approach and to gain atomistic insight into the structure of the material. One way to do this, indirectly, is by using spectroscopy techniques, like X-ray photoelectron spectroscopy (XPS). On the one hand, we measure the XPS spectrum of the material experimentally; on the other, we make a computational XPS prediction for our candidate structural model. If they agree, we gain confidence that the model resembles reality; if they don’t, we keep looking for better structural models. (Of course, the whole story is more nuanced, but this is the gist of it.)
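To make the "predict, then compare" loop above a bit more concrete, here is a minimal sketch in Python. It assumes that a predicted spectrum can be approximated by summing one Gaussian per atom, centered at that atom's core-electron binding energy, and that agreement is scored by an integrated squared difference. The binding energies, the broadening width `sigma` and the function names are all hypothetical, chosen for illustration; this is not the prediction model used in the paper.

```python
import math

def broadened_spectrum(binding_energies, grid, sigma=0.4):
    """Sum one Gaussian per atom and normalize to unit area on the grid."""
    spectrum = [
        sum(math.exp(-0.5 * ((e - be) / sigma) ** 2) for be in binding_energies)
        for e in grid
    ]
    step = grid[1] - grid[0]
    area = sum(spectrum) * step
    return [s / area for s in spectrum]

def dissimilarity(spec_a, spec_b, step):
    """Integrated squared difference between two spectra on the same grid."""
    return sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)) * step

# Energy grid from 280 to 292 eV (illustrative range and spacing).
step = 0.05
grid = [280.0 + step * i for i in range(241)]

# Hypothetical per-atom binding energies (eV); not taken from the paper.
experiment = broadened_spectrum([284.5] * 6 + [286.5] * 3 + [288.5], grid)
model_good = broadened_spectrum([284.5] * 6 + [286.4] * 3 + [288.6], grid)
model_poor = broadened_spectrum([284.5] * 10, grid)

print("good model:", dissimilarity(experiment, model_good, step))
print("poor model:", dissimilarity(experiment, model_poor, step))
```

A candidate whose motif distribution resembles the reference gets a lower score than one made of a single motif, which is the sense in which a spectrum acts as a structural fingerprint.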
During our previous work on XPS prediction in collaboration with Dorothea Golze, we showed how XPS prediction for amorphous carbon-based materials can be made quantitatively accurate via a combination of electronic structure methods and atomistic machine learning technology. One of the disappointing aspects of that work is that we were unable to generate structural models of oxygenated amorphous carbon (a-COx) whose XPS spectra matched the experimental one. Since we had very good confidence in the accuracy of our computational XPS prediction, the necessary consequence was that the computational models of the structure of a-COx we were working with did not resemble the experimental samples. At the time, these models had been provided by our collaborator Volker Deringer from expensive DFT calculations, and we only had access to three a-COx models with a couple hundred atoms and different oxygen content. I was left intrigued by this issue and decided to train a machine learning potential (MLP) for CO. With this MLP, I could efficiently generate lots (thousands if not millions) of different samples at different conditions and thought: “surely, one of them will give the match with experiment”. But this was not the case! The reason is that the experimental growth is a non-equilibrium process involving energetic deposition of atoms: C atoms are deposited onto a growing film in an oxygen atmosphere, and the oxygen atoms are co-deposited with the carbon atoms into the film (see the nice work by Santini et al.). Direct simulation of the deposition process is very challenging. Our computational generation protocol was based on indirect structure generation routes and favored the formation of thermodynamically stable products: solids with low oxygen content and lots of CO and CO2 molecules.
At this point, we had to think a bit outside the box. The idea that came to mind was something that is known in the literature as reverse Monte Carlo (RMC): the atomic positions are updated and the observable directly comparable to experiment is monitored on the fly, such that moves that increase the agreement between simulation and experiment for said observable are favored. After many steps, computational and experimental observables will agree. There are (at least) two problems with this approach, and both are related to the need for cheap calculations, given the sheer number of individual evaluations of the objective function required in Monte Carlo optimization. First, RMC for materials has traditionally been done using XRD only, because an XRD pattern is cheap to compute given a set of atomic positions. Second, the RMC protocol will not ensure that the generated structures are energetically sound (low in energy), as the only constraint is that the agreement with the experimental observable should improve. Previous work has dealt with this problem in the context of “hybrid” RMC (HRMC), where the optimization is done simultaneously on the observable agreement and the system’s energy via an interatomic potential. Again, the HRMC approach has traditionally been used with “cheap” interatomic potentials, because the objective function (the function that depends on XRD and total energy and is being minimized) has to be evaluated a very large number of times and each evaluation therefore needs to be efficient. But these classical interatomic potentials are not accurate enough to describe the complex chemical bonding taking place in a-COx!
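The hybrid idea can be sketched with a toy Metropolis loop in Python. Everything here is a stand-in chosen for illustration: `energy` is a double-well function rather than an interatomic potential, `observable` is a simple mean rather than an XRD or XPS prediction, and the weight `gamma`, the temperature `kT` and the step size are hypothetical parameters. The objective form (energy plus a weighted squared mismatch) is one common choice and is assumed here; it is not taken from the paper.

```python
import math
import random

def energy(x):
    """Toy stand-in for an interatomic potential: one double well per coordinate."""
    return sum((xi * xi - 1.0) ** 2 for xi in x)

def observable(x):
    """Toy stand-in for a predicted experimental observable."""
    return sum(x) / len(x)

def objective(x, target, gamma):
    """Energy plus a penalty for disagreement with the experimental target."""
    return energy(x) + gamma * (observable(x) - target) ** 2

def metropolis_accept(delta, kT, rng):
    """Always accept downhill moves; accept uphill ones with Boltzmann probability."""
    return delta <= 0.0 or rng.random() < math.exp(-delta / kT)

def hybrid_rmc(x, target, gamma, kT=0.05, steps=20000, max_move=0.2, seed=0):
    """Single-coordinate Metropolis moves on the combined objective."""
    rng = random.Random(seed)
    x = list(x)
    current = objective(x, target, gamma)
    for _ in range(steps):
        i = rng.randrange(len(x))
        old = x[i]
        x[i] = old + rng.uniform(-max_move, max_move)
        trial = objective(x, target, gamma)
        if metropolis_accept(trial - current, kT, rng):
            current = trial
        else:
            x[i] = old  # reject: restore the previous coordinate
    return x

start = [-1.0] * 10  # every coordinate starts in the left well
plain = hybrid_rmc(start, target=0.6, gamma=0.0)     # energy only
driven = hybrid_rmc(start, target=0.6, gamma=200.0)  # energy + observable

print("energy only:   observable =", observable(plain))
print("with penalty:  observable =", observable(driven))
```

With `gamma=0.0` the walker only sees the energy and has no reason to leave its starting well; with a large `gamma` the mismatch term pushes coordinates across the barrier until the observable approaches the target. Each step costs one objective evaluation, which is why both the energy and the observable need to be cheap.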
But now we have both an accurate description of the energy, afforded by our CO MLP, and a quantitatively accurate description of XPS signatures, which together address the issues that had prevented us from applying HRMC to a-COx. The only hurdle left at this point is how to handle the variable number of oxygen atoms in the samples. After all, we do not know how much oxygen is in there; this is actually one of the motivations for doing the simulations in the first place! To tackle this, we can resort to the grand-canonical version of Monte Carlo (GCMC), where a chemical potential is defined for the oxygen atoms so that they can be added to and removed from the simulation.
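For readers unfamiliar with GCMC, the acceptance rules for adding and removing a particle can be written down in a few lines of Python. The sketch below uses the standard textbook form with an ideal-gas reference, where the thermal de Broglie wavelength enters the prefactor; whether our implementation uses exactly this form is not stated in this post, so treat it as an assumption. All numerical values are hypothetical.

```python
import math

def insertion_probability(n, volume, mu, delta_u, kT, thermal_wavelength):
    """Acceptance probability for adding one particle (N -> N + 1).

    delta_u is the energy change U(N + 1) - U(N) caused by the insertion.
    """
    prefactor = volume / (thermal_wavelength ** 3 * (n + 1))
    return min(1.0, prefactor * math.exp((mu - delta_u) / kT))

def removal_probability(n, volume, mu, delta_u, kT, thermal_wavelength):
    """Acceptance probability for deleting one particle (N -> N - 1).

    delta_u is the energy change U(N - 1) - U(N) caused by the removal.
    """
    prefactor = thermal_wavelength ** 3 * n / volume
    return min(1.0, prefactor * math.exp(-(mu + delta_u) / kT))

# Hypothetical values: energies in eV, lengths in angstrom.
kT = 0.0259                 # roughly room temperature
volume = 1000.0
wavelength = 0.25
n_oxygen = 20

for mu in (-5.5, -5.3, -5.1):
    p_in = insertion_probability(n_oxygen, volume, mu, -5.0, kT, wavelength)
    p_out = removal_probability(n_oxygen, volume, mu, 5.0, kT, wavelength)
    print(f"mu = {mu:5.2f} eV  insert: {p_in:.3e}  remove: {p_out:.3e}")
```

Raising the chemical potential makes insertions more likely and removals less likely, so the chemical potential is the knob that controls how much oxygen ends up in the sample.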
With all of these ideas in place, the next step was to team up and get it done. Together with Tigany Zarrouk and Rina Ibragimova from our group, and Albert Bartók from Warwick, we carried out an efficient implementation of these methods (special hats-off to Tigany for implementing this in the TurboGAP code!) and did lots of simulations. All the results and methodology are summarized in our paper, titled “Experiment-driven atomistic materials modeling: A case study combining X-ray photoelectron spectroscopy and machine learning potentials to infer the structure of oxygen-rich amorphous carbon” (and referenced at the beginning of this post). There, we leverage the predictive power of atomistic ML techniques but also their flexibility. We combine the accurate description of the potential energy surface of materials afforded by state-of-the-art MLPs with on-the-fly prediction of XPS. This allows us to make a direct link between the atomistic structure optimization procedure and the experimental structural fingerprint, such that it is the agreement between experiment and simulations that drives the structure optimization. The analysis reveals the elemental composition and atomic motif distribution in a-COx, and points toward a maximum oxygen content in carbon-based materials of about 30%.
While the method is illustrated in our manuscript with XPS as the experimental analytical technique and a-COx as an interesting (and challenging) test-case material, the methodology (which we call the “modified Hamiltonian” approach) is general and we are already extending it to incorporate other techniques, like XRD, Raman spectroscopy, etc. More generally, an ensemble of experimental techniques can be combined, e.g., to overcome one of the known limitations of XPS (and of other individual techniques on their own), namely that more than one atomic motif can contribute in the same spectral region.
In addition, the paper touches on a somewhat sensitive topic for the experimental materials community: the fact that, while XPS is a widely used technique, XPS analyses are often plagued by (incorrect) assumptions and suffer to a certain degree from arbitrariness in the fits. This is particularly true for carbon-based materials. But in pointing out the problem we also point out the solution: to incorporate ML-driven atomistic simulation into the normal workflows of XPS fitting procedures and, in the future, also those of other analytical techniques. This prospect points to a very interesting time ahead in the field of structural and chemical analysis of materials and molecules.
This work was supported financially by the Research Council of Finland and benefitted from computational resources provided by CSC (the Finnish IT Center for Science) and the Aalto University Science-IT project.