Vectorized force field representations

Interchange can produce representations of both force field parameters and parametrized systems as NumPy arrays, which might be convenient for any number of computational approaches to force field development. This example explores this feature.

import numpy
from openff.toolkit import ForceField, Molecule
from rich.pretty import pprint

from openff.interchange import Interchange
sage = ForceField("openff-2.0.0.offxml")
molecule = Molecule.from_smiles(r"F\C=C/[F]")
interchange = Interchange.from_smirnoff(sage, [molecule])

pprint(interchange.collections.keys())
dict_keys(['Bonds', 'Constraints', 'Angles', 'ProperTorsions', 'ImproperTorsions', 'vdW', 'Electrostatics'])

An Interchange constructed from a SMIRNOFF force field contains collections for several different types of parameters. For simplicity, let’s look at the bond collection.

collection = interchange.collections["Bonds"]

A Collection stores force field parameters and information about how they relate to the Interchange’s topology. In addition, they include some handy methods for transforming these to vectorized representations.

pprint(collection.get_force_field_parameters())

#      k (kcal/mol/Å),   length (Å)
#
# 0: [#6:1]-[#9:2]
# 1: [#6X3:1]=[#6X3:2]
# 2: [#6X3:1]-[#1:2]
array([[808.77100296,   1.35292621],
[798.31859066,   1.37168856],
[794.50915792,   1.0853585 ]])

Collection.get_force_field_parameters() returns an array with one row per unique force field parameter used and one colum per number in each parameter. For this molecule, that means three rows (C-F, C#C, and C-H chemistries) and two columns (k and length). This matrix scales with the number of unique force field parameters used so it will not generally scale with system size.

pprint(collection.get_system_parameters())

#      k (kcal/mol/Å),   length (Å)
#
# bond0: (0, 1)
# bond1: (1, 2)
# bond2: (1, 4)
# bond3: (2, 3)
# bond4: (2, 5)
array([[808.77100296,   1.35292621],
[798.31859066,   1.37168856],
[794.50915792,   1.0853585 ],
[808.77100296,   1.35292621],
[794.50915792,   1.0853585 ]])

Collection.get_system_field_parameters() returns a similar array but with one row per bond in the topology, including duplicates. Since there are two C-H and two C-F bonds, those parameters each appear twice. This matrix scales with the size of the system (in this case, number of bonds) so it may be large for large systems.

pprint(collection.get_param_matrix())

#      k0  l0   k1  l1   k2  l2
# bond0: k
# bond0: l
# bond1: k
# bond1: l
# bond2: k
# bond2: l
# bond3: k
# bond3: l
# bond4: k
# bond4: l
Array([[1., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0.],
[0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 1.],
[1., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 1.]], dtype=float32)

It may be useful to encode the relationships between force field parameters and where in the topology they’re applied. This is handled by collection.get_param_matrix(), which returns a spare matrix. Each column corresponds to a force field parameter and each row corresponds to a bond that could be associated with each, each dimension being a flattened representation of the above matrices. A 1 indicates that a parameter is applied to that bond, a 0 indicates that it is not. For example, the 1 at [0, 0] indicates that the first bond gets assigned the first k value. The 1 at [7, 1] indicates that the fourth bond gets assigned the first length.

Conveniently, the dot product of this matrix with a flattened view of the force field parameters is equal to the view of the system parameters we saw above.

dotted = numpy.dot(
    interchange["Bonds"].get_param_matrix(),
    interchange["Bonds"].get_force_field_parameters().flatten(),
).reshape((-1, 2))

assert numpy.allclose(dotted, collection.get_system_parameters())

pprint(dotted)
array([[808.77100296,   1.35292621],
[798.31859066,   1.37168856],
[794.50915792,   1.0853585 ],
[808.77100296,   1.35292621],
[794.50915792,   1.0853585 ]])