Week 11: Final Project Demo 2 (Anakin-ASE)

This week we will run through an example of how to run a simulation using the ANI-1ccx forcefield using “Atomic Simulation Environment” (ASE).

This lab takes advantage of the TorchANI package and largely follows their example found here.

Check out the repository (linked above) and the TorchANI paper for more information.


Overview of today’s demo

As we learned last week, the ANI-1ccx forcefield is trained using high-level coupled-cluster quantum mechanical energies. The ANI-1x dataset was trained on 5 million DFT calculations dataset paper. The ANI-1ccx dataset augmented these with 500,000 QM calculations using CCSD(T)/CBS, which has higher accuracy than DFT but also higher computational cost. The resulting forcefield can accurately predict QM energies for systems including C, H, N and O elements.

Today we will give an example of how to run a system using the ANI-1ccx forcefield. We do not have to train this ourselves (thankfully!) and we use a pre-trained PyTorch model from the TorchANI toolkit.

We will use ASE instead of OpenMM to run the simulations. There are a lot of differences between the ANI simulations and the classical simulations that we are used to. The “Atomic Simulation Environment” package is better suited to run these kinds of simulations.

The steps to run these simulations are as follows:

  1. Build an xyz file that describes your system. This is a remarkably simple text file that only contains the element name and the \(x\), \(y\) and \(z\) coordinates of each atom. There are many possible ways to build this. Here we will use rdkit to generate some molecular conformers and some bash scripts to merge our xyz files together.

  2. Read this into an ASE atoms object and prepare it for simulation.. In this example we will modify the velocities to simulate a molecular collision. There are many other possible numerical “experiments” that you can do!

  3. Run the simulation. The data is saved in a .traj file.

  4. Visualize it! We can visualize our trajectories directly in the notebook using NGLViewer.


Notes of caution

Reactive transition states were not included in the training data! If you, like in this notebook, are designing a simulation that is “reactive”, i.e. if you want to observe a chemical reaction taking place, then keep in mind that the training data mostly focused on structures in and around the ground state. Reactive energy barriers are not likely to be very accurate. However, a study like this could be an important first step in examining the baseline predictions of ANI-1ccx. If you want to continue work in this direction after the class project you should further augment the ANI-1ccx dataset with energy calculations along the reactive pathways.

System size should be limited! While we can handle hundreds of thousands of atoms (or even millions!) in a classical MD simulation, you should be much more conservative here. The current TorchANI implementation (that is available to the public) does not handle large systems well, particularly in terms of memory usage. If you get a memory error then your system is likely too big. We have found in some tests that the limit is between 1000 and 2000 atoms on our hardware. That said, smaller systems can be more fun since it is easier to generate large data sets. In the example below we use an 8-atom system.


Installing packages

## Uncomment and run the lines below if you are on Colab
#import sys
#print(sys.version)
#!wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.10.3-Linux-x86_64.sh
#!bash Miniconda3-py*.sh -bfp /usr/local
!conda config --set always_yes yes
!conda install -c conda-forge ase rdkit nglview torchani

Building the xyz file

Here we are going to make a simple system that has two hydrogen peroxide molecules that we will bash together at high speeds! It is possible that we can see some water molecules form.

from rdkit import Chem
from rdkit.Chem import AllChem
import nglview as nv
import numpy as np

First we make a hydrogen peroxide molecule using rdkit:

perox = Chem.AddHs(Chem.MolFromSmiles('OO'))
perox

Then we generate some “conformers” that contain 3D positions of each atom:

_ = AllChem.EmbedMultipleConfs(perox, useExpTorsionAnglePrefs=True, useBasicKnowledge=True)
view = nv.show_rdkit(perox,conf_id=1)
view
conf0 = perox.GetConformer(0)
conf1 = perox.GetConformer(1)
xyz0 = np.array([[conf0.GetAtomPosition(i).x, conf0.GetAtomPosition(i).y, conf0.GetAtomPosition(i).z] for i in range(4)])
xyz1 = np.array([[conf1.GetAtomPosition(i).x, conf1.GetAtomPosition(i).y, conf1.GetAtomPosition(i).z] for i in range(4)])
print("conf0: ")
print(xyz0)

print("conf1: ")
print(xyz1)

Now let’s move all of the atoms in conf1 to center them at (10,0,0):

for i in range(4):
    pos = conf1.GetAtomPosition(i)
    pos.x += 10
    conf1.SetAtomPosition(i,pos)
    
new_xyz1 = np.array([[conf1.GetAtomPosition(i).x, conf1.GetAtomPosition(i).y, conf1.GetAtomPosition(i).z] for i in range(4)])
new_xyz1

Write both conformers to xyz files.

Chem.rdmolfiles.MolToXYZFile(perox,'perox0.xyz',confId=0)
Chem.rdmolfiles.MolToXYZFile(perox,'perox1.xyz',confId=1)

Combine both into one merged xyz file:

! echo "8" > perox_2mol.xyz
! echo "" >> perox_2mol.xyz
! tail -n 4 perox0.xyz >> perox_2mol.xyz
! tail -n 4 perox1.xyz >> perox_2mol.xyz

Our combined xyz file looks like this:

! cat perox_2mol.xyz

Create ASE atoms object and prepare it for simulation

import torchani

from ase.md.langevin import Langevin
from ase.md.verlet import VelocityVerlet
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
from ase.optimize import BFGS
from ase import units
from ase import io
from ase.io import read, write
from ase import Atoms
from ase.io.trajectory import Trajectory

Read in the xyz file into an atoms object:

my_atoms=io.read("perox_2mol.xyz")
my_atoms.set_pbc((True, True, True))
my_atoms.set_cell([20,20,20])
print(type(my_atoms))
n=len(my_atoms)
print(len(my_atoms), "atoms in the system")

Set the calculator to be ANI1ccx from the torchani package:

calculator = torchani.models.ANI1ccx().ase()
my_atoms.set_calculator(calculator)
print("Done!")

Minimize the energy:

#Using BFGS method for minimizing                                                                                                                                
optimum=BFGS(my_atoms)
optimum.run(fmax=0.001)

Define a function to print the energy and temperature during the simulation:

def print_energy(a=my_atoms):
    Potential_Energy=a.get_potential_energy()/n
    Kinetic_energy=a.get_kinetic_energy()/n

    print('Energy per atom: Epot=%4feV Ekin=%4feV (T=%4.0f K")  '
            'Etot = %.3feV' % (Potential_Energy,
            Kinetic_energy, Kinetic_energy / (1.5 * units.kB), Potential_Energy + Kinetic_energy))

Initialize the velocities via the Maxwell-Boltzmann distribution:

MaxwellBoltzmannDistribution(my_atoms,temperature_K=300)
v = my_atoms.get_velocities()
v

Add a positive x velocity to molecule 0 and a negative x velocity to molecule 1:

impact_vel = 0.3
v[0:4,0] += impact_vel
v[4:8,0] -= impact_vel

my_atoms.set_velocities(v)

Run an MD simulation with constant temperature using the Langevin algorithm:
Time step of 1 fs, temperature = 300K and the friction coefficient to 0.02 / ps.

dyn = VelocityVerlet(my_atoms, timestep=0.5*units.fs, trajectory='collision.traj', loginterval=50)
dyn.attach(print_energy, interval=50)
dyn.run(2000)

Visualize the trajectory with NGLView:

traj=Trajectory('collision.traj')
t = nv.ASETrajectory(traj)
w = nv.NGLWidget(t, viewer="ngl")
w.add_spacefill()
w

What do you see? If you run it again with the same starting positions, but different random velocities, would you see the same thing?

What if you change the impact velocity?

Analysis

Analysis of these trajectories can be a little more complicated than Classical trajectories. This is because we can’t take advantage of a constant molecular topology. Bonds that are present at the beginning of a simulation are not necessarily present at the end!

That said, the traj variable that was read from our collision.traj file is essentially a list of Atoms objects.

traj[0]

We can get distances between atoms as follows:

n = len(traj)
d02 = np.zeros((n))  # distance between H-O in conf 0
d06 = np.zeros((n))  # distance between O in conf 0 and H in conf 1
d07 = np.zeros((n))  # distance between O in conf 0 and other H in conf 1

for i in range(n):
    d02[i] = traj[i].get_distance(0,2)
    d06[i] = traj[i].get_distance(0,6)
    d07[i] = traj[i].get_distance(0,7)
import matplotlib.pyplot as plt

plt.plot(d02,label='original O-H')
plt.plot(d06,label='swapped O-H1')
plt.plot(d07,label='swapped O-H2')
plt.legend()