Download Notebook View in GitHub Open in Google Colab
Solvate and equilibrate a ligand in a box of water
import time
import mdtraj
import nglview
import numpy
import openmm
import openmm.app
import openmm.unit
from openff.toolkit import ForceField, Molecule, unit
from rich.pretty import pprint
from openff.interchange import Interchange
from openff.interchange.components._packmol import RHOMBIC_DODECAHEDRON, pack_box
Construct the topology
In this example we’ll construct a topology consisting of one ligand in a rhombic dodecahedral box with 2100 water molecules. We’ll use a mapped SMILES when creating Molecule
objects to ensure the atom ordering matches. (Atom ordering is not strictly a part of SMILES and therefore liable to be changed with updates to RDKit.)
This can be extended or modified by i.e.
Replacing this sample ligand with a different ligand of interest - substitute out the ligand SMILES
Using a different number of water molecules - substitute out the
2100
used belowAdding ions or co-solvents into the box - add more
Molecule
object as desired
ligand = Molecule.from_mapped_smiles(
"[H:7][C@:6]1([C:13](=[C:11]([C:9](=[O:10])[O:8]1)[O:12][H:19])[O:14][H:20])[C@:3]([H:4])([C:2]([H:16])([H:17])[O:1][H:15])[O:5][H:18]"
)
water = Molecule.from_mapped_smiles("[H:2][O:1][H:3]")
# Naming the residue is not needed to parameterize the system or run the simulation, but makes visualization easier
for atom in water.atoms:
atom.metadata["residue_name"] = "HOH"
There are a few ways to convert the information in this trajectory to an Openff Topology
object. Since we already know how many of which molecules we want, we’ll use a PACKMOL wrapper shipped with Interchange. The Topology
object returned by pack_box
contains the ligand, 1000 copies of water, the box vectors we asked for, and the positions generated by PACKMOL.
topology = pack_box(
molecules=[ligand, water],
number_of_copies=[1, 1000],
box_vectors=3.5 * RHOMBIC_DODECAHEDRON * unit.nanometer,
)
topology.n_molecules, topology.box_vectors, topology.get_positions().shape
(1001,
array([[3.5 , 0. , 0. ],
[0. , 3.5 , 0. ],
[1.75 , 1.75 , 2.47487373]]) <Unit('nanometer')>,
(3020, 3))
The “Sage” force field line (version 2.x.x) includes TIP3P parameters for water, so we don’t need to use multiple force fields to parametrize this topology. (One could use a different water model provided they accept the risks of using a different one than the force field was optimized with.)
Note that the “Parsley” (version 1.x.x) line did not include TIP3P parameters, so loading in an extra force field was required.
sage = ForceField("openff-2.0.0.offxml")
From here, we can create an Interchange
object, which stores the results of applying the force field to the topology. Since the Topology
object contained positions and box vectors, we don’t need to set them again - they’re already set on the Interchange
object!
interchange: Interchange = Interchange.from_smirnoff(
force_field=sage, topology=topology
)
interchange.topology.n_atoms, interchange.box, interchange.positions.shape
(3020,
array([[3.5 , 0. , 0. ],
[0. , 3.5 , 0. ],
[1.75 , 1.75 , 2.47487373]]) <Unit('nanometer')>,
(3020, 3))
Now, we can prepare everything that OpenMM needs to run and report a brief equilibration simulation:
A
Simulation
object containingAn
openmm.System
A topology in OpenMM’s object model (
openmm.app.Topology
)Positions and box vectors in OpenMM’s unit solution (
openmm.unit.Quantity
)
A barostat, since we want to use NPT dynamics to relax the box size toward equilibrium
An integrator
Reporters for the trajectory and simulation data
For convenience, let’s wrap some boilerplate code into a function that can be called again later with different inputs.
def create_simulation(
interchange: Interchange,
pdb_stride: int = 500,
trajectory_name: str = "trajectory.pdb",
) -> openmm.app.Simulation:
integrator = openmm.LangevinIntegrator(
300 * openmm.unit.kelvin,
1 / openmm.unit.picosecond,
1 * openmm.unit.femtoseconds,
)
barostat = openmm.MonteCarloBarostat(
1.0 * openmm.unit.bar,
293.15 * openmm.unit.kelvin,
25,
)
simulation = interchange.to_openmm_simulation(
combine_nonbonded_forces=True,
integrator=integrator,
additional_forces=[barostat],
)
# https://github.com/openmm/openmm/issues/3736#issuecomment-1217250635
simulation.minimizeEnergy()
simulation.context.setVelocitiesToTemperature(300 * openmm.unit.kelvin)
simulation.context.computeVirtualSites()
pdb_reporter = openmm.app.PDBReporter(trajectory_name, pdb_stride)
state_data_reporter = openmm.app.StateDataReporter(
"data.csv",
10,
step=True,
potentialEnergy=True,
temperature=True,
density=True,
)
simulation.reporters.append(pdb_reporter)
simulation.reporters.append(state_data_reporter)
return simulation
simulation = create_simulation(interchange)
Finally, we can run this simulation. This should take approximately 10-20 seconds on a laptop or small workstation.
Again, let’s wrap this up into a function to avoid copy-pasting code.
def run_simulation(simulation: openmm.app.Simulation, n_steps: int = 5000):
print("Starting simulation")
start_time = time.process_time()
print("Step, volume (nm^3)")
for step in range(n_steps):
simulation.step(1)
if step % 500 == 0:
box_vectors = simulation.context.getState().getPeriodicBoxVectors()
print(step, numpy.linalg.det(box_vectors._value).round(3))
end_time = time.process_time()
print(f"Elapsed time: {(end_time - start_time):.2f} seconds")
We should observe (small) fluctuations in the density of the simulation, which is running under the NPT ensemble. If the initial density was a good guess, there shouldn’t be significant changes from the initial value around \(30 nm^3\).
run_simulation(simulation)
Starting simulation
Step, volume (nm^3)
0 30.317
500 30.766
1000 30.924
1500 30.734
2000 30.565
2500 30.406
3000 30.212
3500 30.151
4000 30.268
4500 30.743
Elapsed time: 70.36 seconds
Appendix A: visualizing the trajectory
If NGLView is installed, we can use it and MDTraj to load and visualize the PDB trajectory:
view = nglview.show_mdtraj(mdtraj.load("trajectory.pdb"))
view.add_representation("licorice", selection="water")
view
Appendix B: Using the OPC water model
The openff-forcefields
package now includes some common water models. After release 2023.05.1, this includes OPC, a 4-site water model from the Amber community. We can load it by simply passing it to the ForceField
constructor after Sage. (When loading multiple force fields like this, parameters in each section are appended in order of the files.) We can inspect the virtual site parameter in our new in-memory force field.
force_field_opc = ForceField("openff_unconstrained-2.0.0.offxml", "opc.offxml")
pprint(force_field_opc["VirtualSites"].to_dict())
{ │ 'VirtualSite': [ │ │ { │ │ │ 'smirks': '[#1:2]-[#8X2H2+0:1]-[#1:3]', │ │ │ 'epsilon': <Quantity(0.0, 'kilocalorie_per_mole')>, │ │ │ 'type': 'DivalentLonePair', │ │ │ 'match': 'once', │ │ │ 'distance': <Quantity(-0.15939833, 'angstrom')>, │ │ │ 'outOfPlaneAngle': <Quantity(0.0, 'degree')>, │ │ │ 'inPlaneAngle': None, │ │ │ 'charge_increment1': <Quantity(0.0, 'elementary_charge')>, │ │ │ 'charge_increment2': <Quantity(0.679142, 'elementary_charge')>, │ │ │ 'charge_increment3': <Quantity(0.679142, 'elementary_charge')>, │ │ │ 'rmin_half': <Quantity(1.0, 'angstrom')>, │ │ │ 'name': 'EP' │ │ } │ ], │ 'version': '0.3', │ 'exclusion_policy': 'parents' }
We can also get a rough visualization of a single water molecule including the virtual site.
water.generate_conformers(n_conformers=1)
view = force_field_opc.create_interchange(water.to_topology()).visualize(
include_virtual_sites=True
)
view.clear_representations()
view.add_representation("ball+stick", aspectRatio=1.5)
view
Since we want to use a different force field with the same chemical topology - a ligand in a box of water - we can re-use the same Topology
object we prepared earlier, re-using the same functions we defined above!
interchange_opc = Interchange.from_smirnoff(
force_field=force_field_opc,
topology=topology,
)
interchange_opc.to_openmm()
<openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x7fbe5f7c0120> >
simulation = create_simulation(interchange_opc, trajectory_name="trajectory_opc.pdb")
run_simulation(simulation)
Starting simulation
Step, volume (nm^3)
0 30.317
500 30.819
1000 30.964
1500 30.71
2000 30.703
2500 30.428
3000 30.535
3500 30.571
4000 30.222
4500 30.472
Elapsed time: 106.72 seconds
def view_trajectory(
trajectory: mdtraj.Trajectory,
show_virtual_sites: bool = False,
) -> nglview.NGLWidget:
if show_virtual_sites:
view = nglview.show_mdtraj(trajectory)
else:
import numpy
atom_indices = numpy.where(
[a.element.mass > 0.0 for a in trajectory[0].topology.atoms]
)[0]
view = nglview.show_mdtraj(trajectory.atom_slice(atom_indices))
view.add_representation("licorice", selection="water")
return view
# NGLview likes to infer bonds between virtual sites in ways that look messy,
# but you can pass `show_virtual_sites=True` just to ensure they're there
view_trajectory(mdtraj.load("trajectory_opc.pdb"))
Appendix C: Using the TIP4P-FB water model with GROMACS
The openff-forcefields
package includes several other common water models. The toolkit provides an API for listing the force fields found in your current installation. We can filter out some force field lines and get a picture of the water models available to us. At the time of writing, TIP4P-FB and OPC3 are also provided, but future releases will likely add more. (See the repo’s documentation for why multiple copies of each force field are provided with slightly different versions.)
from openff.toolkit.typing.engines.smirnoff.forcefield import get_available_force_fields
sorted(
filter(
lambda x: not x.startswith(("openff", "smirnoff99", "ff14sb")),
get_available_force_fields(),
)
)
['opc-1.0.0.offxml',
'opc-1.0.1.offxml',
'opc-1.0.2.offxml',
'opc.offxml',
'opc3-1.0.0.offxml',
'opc3-1.0.1.offxml',
'opc3.offxml',
'spce-1.0.0.offxml',
'spce.offxml',
'tip3p-1.0.0.offxml',
'tip3p-1.0.1.offxml',
'tip3p.offxml',
'tip3p_fb-1.0.0.offxml',
'tip3p_fb-1.1.0.offxml',
'tip3p_fb-1.1.1.offxml',
'tip3p_fb.offxml',
'tip4p_ew-1.0.0.offxml',
'tip4p_ew.offxml',
'tip4p_fb-1.0.0.offxml',
'tip4p_fb-1.0.1.offxml',
'tip4p_fb.offxml',
'tip5p-1.0.0.offxml',
'tip5p.offxml']
To use a different force field, such as TIP4P-FB, simply point the ForceField
constructor at a different file:
sage_tip4p_fb = ForceField("openff-2.0.0.offxml", "tip4p_fb.offxml")
From here we can create a new Interchange
object - only necessary since we’re using a different force field - and use the Interchange.to_gromacs
method to write GROMACS files. We’ll also run an energy minimization before writing otu GROMACS files - this could also be done as an extra step with GROMACS, but it’s convient to just call it from the Interchange API here.
out = Interchange.from_smirnoff(
force_field=sage_tip4p_fb,
topology=topology,
)
# could take a few seconds
out.minimize()
out.to_gromacs("tip4p")
Now, with GROMACS files written, we can use the bundled run.mdp
file to compile and run a GROMACS simulation!
!gmx grompp -p tip4p.top -c tip4p.gro -f run.mdp -o run.tpr -maxwarn 1
!gmx mdrun --deffnm run
:-) GROMACS - gmx grompp, 2024.3-conda_forge (-:
Executable: /home/runner/micromamba/envs/openff-docs-examples/bin.AVX2_256/gmx
Data prefix: /home/runner/micromamba/envs/openff-docs-examples
Working dir: /home/runner/work/openff-docs/openff-docs/build/cookbook/src/openforcefield/openff-interchange/ligand_in_water
Command line:
gmx grompp -p tip4p.top -c tip4p.gro -f run.mdp -o run.tpr -maxwarn 1
NOTE 1 [file run.mdp]:
You have set rlist larger than the interaction cut-off, but you also have
verlet-buffer-tolerance > 0. Will set rlist using
verlet-buffer-tolerance.
Setting the LD random seed to -1895911620
Generated 300 of the 300 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 300 of the 300 1-4 parameter combinations
Excluding 3 bonded neighbours molecule type 'MOL0'
turning H bonds into constraints...
Excluding 3 bonded neighbours molecule type 'MOL1'
turning H bonds into constraints...
Setting gen_seed to -1080101570
Velocities were taken from a Maxwell distribution at 300 K
Cleaning up constraints and constant bonded interactions with virtual sites
Analysing residue names:
There are: 1 Other residues
There are: 1000 Water residues
Analysing residues not classified as Protein/DNA/RNA/Water and splitting into groups...
Number of degrees of freedom in T-Coupling group System is 6049.00
The largest distance between excluded atoms is 0.380 nm between atom 2 and 8
Determining Verlet buffer for a tolerance of 0.005 kJ/mol/ps at 300 K
Calculated rlist for 1x1 atom pair-list as 0.914 nm, buffer size 0.014 nm
Set rlist, assuming 4x4 atom pair-list, to 0.900 nm, buffer size 0.000 nm
Note that mdrun will redetermine rlist based on the actual pair-list setup
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 32x32x32, spacing 0.109 0.109 0.109
Estimate for the relative computational load of the PME mesh part: 0.26
This run will generate roughly 1 Mb of data
There was 1 NOTE
GROMACS reminds you: "Alas, You're Welcome" (Prof. Dumbledore in Potter Puppet Pals)
:-) GROMACS - gmx mdrun, 2024.3-conda_forge (-:
Executable: /home/runner/micromamba/envs/openff-docs-examples/bin.AVX2_256/gmx
Data prefix: /home/runner/micromamba/envs/openff-docs-examples
Working dir: /home/runner/work/openff-docs/openff-docs/build/cookbook/src/openforcefield/openff-interchange/ligand_in_water
Command line:
gmx mdrun --deffnm run
The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake option.
Reading file run.tpr, VERSION 2024.3-conda_forge (single precision)
Changing nstlist from 10 to 80, rlist from 0.9 to 1.01
Update groups can not be used for this system because atoms that are (in)directly constrained together are interdispersed with other atoms
Using 1 MPI thread
Using 4 OpenMP threads
starting mdrun 'FOO'
5000 steps, 5.0 ps.
Writing final coordinates.
Core t (s) Wall t (s) (%)
Time: 83.967 20.992 400.0
(ns/day) (hour/ns)
Performance: 20.584 1.166
GROMACS reminds you: "It's hard to ignore 12 orders of magnitude" (John Urbanic)
# This view might show erroneous bonds across the periodic boundaries, but individual waters should look fine
view_trajectory(mdtraj.load("run.xtc", top="run.gro"))