Solvate and equilibrate a ligand in a box of water

import time

import mdtraj
import nglview
import numpy
import openmm
import openmm.app
import openmm.unit
from openff.toolkit import ForceField, Molecule, unit
from rich.pretty import pprint

from openff.interchange import Interchange
from openff.interchange.components._packmol import RHOMBIC_DODECAHEDRON, pack_box

Construct the topology

In this example we’ll construct a topology consisting of one ligand in a rhombic dodecahedral box with 2100 water molecules. We’ll use a mapped SMILES when creating Molecule objects to ensure the atom ordering matches. (Atom ordering is not strictly a part of SMILES and therefore liable to be changed with updates to RDKit.)

This can be extended or modified by i.e.

  • Replacing this sample ligand with a different ligand of interest - substitute out the ligand SMILES

  • Using a different number of water molecules - substitute out the 2100 used below

  • Adding ions or co-solvents into the box - add more Molecule object as desired

ligand = Molecule.from_mapped_smiles(
    "[H:7][C@:6]1([C:13](=[C:11]([C:9](=[O:10])[O:8]1)[O:12][H:19])[O:14][H:20])[C@:3]([H:4])([C:2]([H:16])([H:17])[O:1][H:15])[O:5][H:18]"
)
water = Molecule.from_mapped_smiles("[H:2][O:1][H:3]")

# Naming the residue is not needed to parameterize the system or run the simulation, but makes visualization easier
for atom in water.atoms:
    atom.metadata["residue_name"] = "HOH"

There are a few ways to convert the information in this trajectory to an Openff Topology object. Since we already know how many of which molecules we want, we’ll use a PACKMOL wrapper shipped with Interchange. The Topology object returned by pack_box contains the ligand, 1000 copies of water, the box vectors we asked for, and the positions generated by PACKMOL.

topology = pack_box(
    molecules=[ligand, water],
    number_of_copies=[1, 1000],
    box_vectors=3.5 * RHOMBIC_DODECAHEDRON * unit.nanometer,
)
topology.n_molecules, topology.box_vectors, topology.get_positions().shape
(1001,
 array([[3.5       , 0.        , 0.        ],
        [0.        , 3.5       , 0.        ],
        [1.75      , 1.75      , 2.47487373]]) <Unit('nanometer')>,
 (3020, 3))

The “Sage” force field line (version 2.x.x) includes TIP3P parameters for water, so we don’t need to use multiple force fields to parametrize this topology. (One could use a different water model provided they accept the risks of using a different one than the force field was optimized with.)

Note that the “Parsley” (version 1.x.x) line did not include TIP3P parameters, so loading in an extra force field was required.

sage = ForceField("openff-2.0.0.offxml")

From here, we can create an Interchange object, which stores the results of applying the force field to the topology. Since the Topology object contained positions and box vectors, we don’t need to set them again - they’re already set on the Interchange object!

interchange: Interchange = Interchange.from_smirnoff(
    force_field=sage, topology=topology
)
interchange.topology.n_atoms, interchange.box, interchange.positions.shape
(3020,
 array([[3.5       , 0.        , 0.        ],
        [0.        , 3.5       , 0.        ],
        [1.75      , 1.75      , 2.47487373]]) <Unit('nanometer')>,
 (3020, 3))

Now, we can prepare everything that OpenMM needs to run and report a brief equilibration simulation:

  • A Simulation object containing

    • An openmm.System

    • A topology in OpenMM’s object model (openmm.app.Topology)

    • Positions and box vectors in OpenMM’s unit solution (openmm.unit.Quantity)

  • A barostat, since we want to use NPT dynamics to relax the box size toward equilibrium

  • An integrator

  • Reporters for the trajectory and simulation data

For convenience, let’s wrap some boilerplate code into a function that can be called again later with different inputs.

def create_simulation(
    interchange: Interchange,
    pdb_stride: int = 500,
    trajectory_name: str = "trajectory.pdb",
) -> openmm.app.Simulation:
    integrator = openmm.LangevinIntegrator(
        300 * openmm.unit.kelvin,
        1 / openmm.unit.picosecond,
        1 * openmm.unit.femtoseconds,
    )

    barostat = openmm.MonteCarloBarostat(
        1.0 * openmm.unit.bar,
        293.15 * openmm.unit.kelvin,
        25,
    )

    simulation = interchange.to_openmm_simulation(
        combine_nonbonded_forces=True,
        integrator=integrator,
        additional_forces=[barostat],
    )

    # https://github.com/openmm/openmm/issues/3736#issuecomment-1217250635
    simulation.minimizeEnergy()

    simulation.context.setVelocitiesToTemperature(300 * openmm.unit.kelvin)
    simulation.context.computeVirtualSites()

    pdb_reporter = openmm.app.PDBReporter(trajectory_name, pdb_stride)
    state_data_reporter = openmm.app.StateDataReporter(
        "data.csv",
        10,
        step=True,
        potentialEnergy=True,
        temperature=True,
        density=True,
    )
    simulation.reporters.append(pdb_reporter)
    simulation.reporters.append(state_data_reporter)

    return simulation
simulation = create_simulation(interchange)

Finally, we can run this simulation. This should take approximately 10-20 seconds on a laptop or small workstation.

Again, let’s wrap this up into a function to avoid copy-pasting code.

def run_simulation(simulation: openmm.app.Simulation, n_steps: int = 5000):
    print("Starting simulation")
    start_time = time.process_time()

    print("Step, volume (nm^3)")

    for step in range(n_steps):
        simulation.step(1)
        if step % 500 == 0:
            box_vectors = simulation.context.getState().getPeriodicBoxVectors()
            print(step, numpy.linalg.det(box_vectors._value).round(3))

    end_time = time.process_time()
    print(f"Elapsed time: {(end_time - start_time):.2f} seconds")

We should observe (small) fluctuations in the density of the simulation, which is running under the NPT ensemble. If the initial density was a good guess, there shouldn’t be significant changes from the initial value around \(30 nm^3\).

run_simulation(simulation)
Starting simulation
Step, volume (nm^3)
0 30.317
500 30.581
1000 30.923
1500 30.961
2000 30.927
2500 30.616
3000 30.678
3500 30.671
4000 30.499
4500 30.468
Elapsed time: 69.22 seconds

Appendix A: visualizing the trajectory

If NGLView is installed, we can use it and MDTraj to load and visualize the PDB trajectory:

view = nglview.show_mdtraj(mdtraj.load("trajectory.pdb"))
view.add_representation("licorice", selection="water")
view

Appendix B: Using the OPC water model

The openff-forcefields package now includes some common water models. After release 2023.05.1, this includes OPC, a 4-site water model from the Amber community. We can load it by simply passing it to the ForceField constructor after Sage. (When loading multiple force fields like this, parameters in each section are appended in order of the files.) We can inspect the virtual site parameter in our new in-memory force field.

force_field_opc = ForceField("openff_unconstrained-2.0.0.offxml", "opc.offxml")
pprint(force_field_opc["VirtualSites"].parameters[0].to_dict())
{
'smirks': '[#1:2]-[#8X2H2+0:1]-[#1:3]',
'epsilon': <Quantity(0.0, 'kilocalorie_per_mole')>,
'type': 'DivalentLonePair',
'match': 'once',
'distance': <Quantity(-0.15939833, 'angstrom')>,
'outOfPlaneAngle': <Quantity(0.0, 'degree')>,
'inPlaneAngle': None,
'charge_increment1': <Quantity(0.0, 'elementary_charge')>,
'charge_increment2': <Quantity(0.679142, 'elementary_charge')>,
'charge_increment3': <Quantity(0.679142, 'elementary_charge')>,
'rmin_half': <Quantity(1.0, 'angstrom')>,
'name': 'EP'
}

We can also get a rough visualization of a single water molecule including the virtual site.

water.generate_conformers(n_conformers=1)
view = force_field_opc.create_interchange(water.to_topology()).visualize(
    include_virtual_sites=True
)
view.clear_representations()
view.add_representation("ball+stick", aspectRatio=1.5)
view

Since we want to use a different force field with the same chemical topology - a ligand in a box of water - we can re-use the same Topology object we prepared earlier, re-using the same functions we defined above!

interchange_opc = Interchange.from_smirnoff(
    force_field=force_field_opc,
    topology=topology,
)
interchange_opc.to_openmm()
<openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x7f28bef74b40> >
simulation = create_simulation(interchange_opc, trajectory_name="trajectory_opc.pdb")

run_simulation(simulation)
Starting simulation
Step, volume (nm^3)
0 30.317
500 30.744
1000 30.935
1500 30.726
2000 31.014
2500 30.897
3000 30.758
3500 31.013
4000 30.62
4500 30.07
Elapsed time: 105.86 seconds
def view_trajectory(
    trajectory: mdtraj.Trajectory,
    show_virtual_sites: bool = False,
) -> nglview.NGLWidget:
    if show_virtual_sites:
        view = nglview.show_mdtraj(trajectory)
    else:
        import numpy

        atom_indices = numpy.where(
            [a.element.mass > 0.0 for a in trajectory[0].topology.atoms]
        )[0]

        view = nglview.show_mdtraj(trajectory.atom_slice(atom_indices))

    view.add_representation("licorice", selection="water")
    return view


# NGLview likes to infer bonds between virtual sites in ways that look messy,
# but you can pass `show_virtual_sites=True` just to ensure they're there
view_trajectory(mdtraj.load("trajectory_opc.pdb"))

Appendix C: Using the TIP4P-FB water model with GROMACS

The openff-forcefields package includes several other common water models. The toolkit provides an API for listing the force fields found in your current installation. We can filter out some force field lines and get a picture of the water models available to us. At the time of writing, TIP4P-FB and OPC3 are also provided, but future releases will likely add more. (See the repo’s documentation for why multiple copies of each force field are provided with slightly different versions.)

from openff.toolkit.typing.engines.smirnoff.forcefield import get_available_force_fields

sorted(
    filter(
        lambda x: not x.startswith(("openff", "smirnoff99", "ff14sb")),
        get_available_force_fields(),
    )
)
['opc-1.0.0.offxml',
 'opc-1.0.1.offxml',
 'opc-1.0.2.offxml',
 'opc.offxml',
 'opc3-1.0.0.offxml',
 'opc3-1.0.1.offxml',
 'opc3.offxml',
 'spce-1.0.0.offxml',
 'spce.offxml',
 'tip3p-1.0.0.offxml',
 'tip3p-1.0.1.offxml',
 'tip3p.offxml',
 'tip3p_fb-1.0.0.offxml',
 'tip3p_fb-1.1.0.offxml',
 'tip3p_fb-1.1.1.offxml',
 'tip3p_fb.offxml',
 'tip4p_ew-1.0.0.offxml',
 'tip4p_ew.offxml',
 'tip4p_fb-1.0.0.offxml',
 'tip4p_fb-1.0.1.offxml',
 'tip4p_fb.offxml',
 'tip5p-1.0.0.offxml',
 'tip5p.offxml']

To use a different force field, such as TIP4P-FB, simply point the ForceField constructor at a different file:

sage_tip4p_fb = ForceField("openff-2.0.0.offxml", "tip4p_fb.offxml")

From here we can create a new Interchange object - only necessary since we’re using a different force field - and use the Interchange.to_gromacs method to write GROMACS files. We’ll also run an energy minimization before writing otu GROMACS files - this could also be done as an extra step with GROMACS, but it’s convient to just call it from the Interchange API here.

out = Interchange.from_smirnoff(
    force_field=sage_tip4p_fb,
    topology=topology,
)

# could take a few seconds
out.minimize()

out.to_gromacs("tip4p")

Now, with GROMACS files written, we can use the bundled run.mdp file to compile and run a GROMACS simulation!

!gmx grompp -p tip4p.top -c tip4p.gro -f run.mdp -o run.tpr -maxwarn 1
!gmx mdrun --deffnm run
                :-) GROMACS - gmx grompp, 2023.4-conda_forge (-:

Executable:   /home/runner/micromamba/envs/openff-docs-examples/bin.AVX2_256/gmx
Data prefix:  /home/runner/micromamba/envs/openff-docs-examples
Working dir:  /home/runner/work/openff-docs/openff-docs/build/cookbook/src/openforcefield/openff-interchange/ligand_in_water
Command line:
  gmx grompp -p tip4p.top -c tip4p.gro -f run.mdp -o run.tpr -maxwarn 1


NOTE 1 [file run.mdp]:
  You have set rlist larger than the interaction cut-off, but you also have
  verlet-buffer-tolerance > 0. Will set rlist using
  verlet-buffer-tolerance.

Setting the LD random seed to -2949122

Generated 300 of the 300 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5

Generated 300 of the 300 1-4 parameter combinations

Excluding 3 bonded neighbours molecule type 'MOL0'

turning H bonds into constraints...

Excluding 3 bonded neighbours molecule type 'MOL1'

turning H bonds into constraints...

Setting gen_seed to -33833221

Velocities were taken from a Maxwell distribution at 300 K

Cleaning up constraints and constant bonded interactions with virtual sites
Analysing residue names:
There are:     1      Other residues
There are:  1000      Water residues
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...
Number of degrees of freedom in T-Coupling group System is 6049.00

The largest distance between excluded atoms is 0.380 nm between atom 2 and 8

Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 300 K

Calculated rlist for 1x1 atom pair-list as 0.914 nm, buffer size 0.014 nm

Set rlist, assuming 4x4 atom pair-list, to 0.900 nm, buffer size 0.000 nm

Note that mdrun will redetermine rlist based on the actual pair-list setup
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 32x32x32, spacing 0.109 0.109 0.109

Estimate for the relative computational load of the PME mesh part: 0.26

This run will generate roughly 1 Mb of data

There was 1 NOTE

GROMACS reminds you: "It Just Tastes Better" (Burger King)
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/pty.py:89: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.
  pid, fd = os.forkpty()
                :-) GROMACS - gmx mdrun, 2023.4-conda_forge (-:

Executable:   /home/runner/micromamba/envs/openff-docs-examples/bin.AVX2_256/gmx
Data prefix:  /home/runner/micromamba/envs/openff-docs-examples
Working dir:  /home/runner/work/openff-docs/openff-docs/build/cookbook/src/openforcefield/openff-interchange/ligand_in_water
Command line:
  gmx mdrun --deffnm run

The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake option.
Reading file run.tpr, VERSION 2023.4-conda_forge (single precision)
Changing nstlist from 10 to 80, rlist from 0.9 to 1.011


Update groups can not be used for this system because atoms that are (in)directly constrained together are interdispersed with other atoms

Using 1 MPI thread
Using 4 OpenMP threads 
starting mdrun 'FOO'
5000 steps,      5.0 ps.
Writing final coordinates.
               Core t (s)   Wall t (s)        (%)
       Time:      146.720       36.680      400.0
                 (ns/day)    (hour/ns)
Performance:       11.780        2.037

GROMACS reminds you: "I am at two with nature." (Woody Allen)
# This view might show erroneous bonds across the periodic boundaries, but individual waters should look fine
view_trajectory(mdtraj.load("run.xtc", top="run.gro"))