Mixing Sage and Amber: Full parametrization with OpenMM Force Fields

This example applies SMIRNOFF-format parameters to a BRD4 inhibitor from the living review on binding free energy benchmark systems by Mobley and Gilson. The BRD4 system comes from the accompanying GitHub repository.

This example uses the openmmforcefields package to add a residue template generator to the openmm.app.ForceField to allow Open Force Field small molecule parameters to be generated on the fly when parameterizing a system containing protein, small molecules, ions, and water. This example is meant to illustrate how to apply parameters to a single ligand, but it’s also easy to process many ligands.

Loading the already-parametrized system

# Retrieve protein and ligand files for BRD4 and a docked inhibitor from the benchmark systems GitHub repository
# https://github.com/MobleyLab/benchmarksets
import requests

repo_url = (
    "https://raw.githubusercontent.com/MobleyLab/benchmarksets/master/input_files/"
)
sources = {
    "system.pdb": repo_url
    + "BRD4/prmtop-coords/BRD4-1.pdb",  # complete system (protein+ligand+solvent+ions)
    "ligand.sdf": repo_url + "BRD4/sdf/ligand-1.sdf",  # ligand molecular identity
}
for filename, url in sources.items():
    r = requests.get(url)
    open(filename, "w").write(r.text)
from openmm.app import PDBFile

# Read complete system in OpenMM PDBFile
system_pdb = "system.pdb"
pdbfile = PDBFile(system_pdb)


# We have to remove H1-H2 bonds in waters if they are present
# AMBER's 'ambpdb -conect' adds these H1-H2 bonds, so we must remove them
def fix_water_bonds(topology):
    # TODO: We should create a simpler way to do this within OpenMM's Topology object
    n_bonds_before = sum(1 for bond in topology.bonds())
    from openmm.app.element import hydrogen

    bonds_to_delete = [
        index
        for (index, bond) in enumerate(topology.bonds())
        if ((bond.atom1.element == hydrogen) and (bond.atom2.element == hydrogen))
    ]
    bonds_to_delete.reverse()
    for index in bonds_to_delete:
        topology._bonds.pop(index)
    n_bonds_after = sum(1 for bond in topology.bonds())
    print(f"{n_bonds_before - n_bonds_after} H-H bonds removed")


fix_water_bonds(pdbfile.topology)
11000 H-H bonds removed
from openmm import unit as openmm_unit

from openff.toolkit import Molecule

# Load the definition of the small molecule in the system from an SDF file
ligand = Molecule.from_file("ligand.sdf")
from openmm import app
from openmmforcefields.generators import SMIRNOFFTemplateGenerator

# Create an OpenMM ForceField object with AMBER ff14SB and TIP3P with compatible ions
forcefield = app.ForceField(
    "amber/protein.ff14SB.xml",
    "amber/tip3p_standard.xml",
    "amber/tip3p_HFE_multivalent.xml",
)

# Use the SMIRNOFF residue template generator to load the openff-2.1.0 ("Sage") that knows about the ligand
smirnoff = SMIRNOFFTemplateGenerator(forcefield="openff-2.1.0", molecules=ligand)

# Register the SMIRNOFF template generator
forcefield.registerTemplateGenerator(smirnoff.generator)
# Create a parameterized OpenMM System from the PDB topology without bond constraints so we can convert to other packages
system = forcefield.createSystem(
    pdbfile.topology,
    nonbondedMethod=app.PME,
    constraints=None,
    rigidWater=False,
    removeCMMotion=False,
)

Create a ParmEd Structure object to export to other formats

import parmed

# Create the complex Structure
complex_structure = parmed.openmm.load_topology(pdbfile.topology, system=system)

# Copy over the original coordinates and box vectors
complex_structure.coordinates = pdbfile.positions
complex_structure.box_vectors = pdbfile.topology.getPeriodicBoxVectors()
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
# Save the final PDB file to make sure the conversion worked
complex_structure.save("new-system.pdb", overwrite=True)

Export to AMBER and GROMACS formats

We started off in AMBER format, and presumably may want to continue in that format – so let’s write out to AMBER and GROMACS format:

# Export the Structure to AMBER files
complex_structure.save("complex-openff.prmtop", overwrite=True)
complex_structure.save("complex-openff.inpcrd", overwrite=True)

# Export the Structure to Gromacs files
complex_structure.save("complex-openff.gro", overwrite=True)
complex_structure.save("complex-openff.top", overwrite=True)

That should conclude our work in this example. However, perhaps we should just doublecheck by ensuring we can actually run some dynamics on the combined system without any trouble.

As a test, run some dynamics on the combined system

First, we create an OpenMM system, as we’ve done in other examples here. We can do this, in this case, using ParmEd’s built-in createSystem functionality already attached to the combined Structure. We ask for a reasonable cutoff, constrained hydrogen bonds (note that this keyword argument overrides the fact that we use the unconstrained force field above; the ligand (and all other molecules in the system) will have covalent bonds to hydrogen constrainted), PME, and rigid water:

from openmm import LangevinIntegrator, app
from parmed.openmm import NetCDFReporter

system = complex_structure.createSystem(
    nonbondedMethod=app.PME,
    nonbondedCutoff=9 * openmm_unit.angstrom,
    constraints=app.HBonds,
    rigidWater=True,
)

Next we’ll set up the integrator, a reporter to write the trajectory, pick the timestep, and then go on to minimize the energy and run a very short amount of dynamics after setting the temperature to 300K:

integrator = LangevinIntegrator(
    300 * openmm_unit.kelvin,
    1 / openmm_unit.picosecond,
    0.001 * openmm_unit.picoseconds,
)
simulation = app.Simulation(complex_structure.topology, system, integrator)

# Depending on where your system came from, you may want to
# add something like (30, 30, 30)*Angstrom to center the protein
# (no functional effect, just visualizes better)
# simulation.context.setPositions(complex_structure.positions + np.array([30, 30, 30])*unit.angstrom)

simulation.context.setPositions(complex_structure.positions)

nc_reporter = NetCDFReporter("trajectory.nc", 10)
simulation.reporters.append(nc_reporter)
# Show the initial potential energy
potential_energy = simulation.context.getState(getEnergy=True).getPotentialEnergy()
print(potential_energy)
-397312.59054250474 kJ/mol
# Minimize the energy
simulation.minimizeEnergy()
minimized_coords = simulation.context.getState(getPositions=True).getPositions()
# Run some dynamics
simulation.context.setVelocitiesToTemperature(300 * openmm_unit.kelvin)
simulation.step(1000)
/home/runner/micromamba/envs/openff-docs-examples/lib/python3.10/site-packages/parmed/amber/netcdffiles.py:409: UserWarning: Could not find netCDF4 module. Falling back on scipy implementation, which can significantly slow down simulations if used as a reporter
  warnings.warn('Could not find netCDF4 module. Falling back on '
# Show the final potential energy
potential_energy = simulation.context.getState(getEnergy=True).getPotentialEnergy()
print(potential_energy)
-502725.1744504621 kJ/mol