Molecule conversion to other packages
Molecule conversion spec
Hierarchy data (chains and residues)
Different software packages place different requirements on hierarchy data (namely, chains and residues). For example, in OpenMM, the atoms of a single residue must be contiguous. In RDKit, it is permissible to have an atom with no PDB residue information, whereas in OpenEye the fields must be populated. Most packages expect that any atom with a residue name defined will also have a residue number.
The OpenFF Toolkit does not have these restrictions, and records hierarchy metadata almost entirely for interoperability and user convenience. No code paths in the OpenFF Toolkit consider hierarchy metadata during parameter assignment. While users should expect hierarchy metadata to be handled correctly in simple loading operations and in export to other packages, modifying hierarchy metadata in the OpenFF Toolkit may lead to unexpected incompatibilities with other packages and representations.
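Because edited metadata can conflict with the expectations listed above, it may help to check it before export. The following is a minimal sketch, not part of the OpenFF Toolkit: it assumes each atom's metadata is available as a plain dictionary, and find_hierarchy_issues is a hypothetical helper name. It flags atoms that have a residue name but no residue number, and residues whose atoms are not contiguous.

```python
def find_hierarchy_issues(atoms):
    """Return warnings about per-atom hierarchy metadata.

    `atoms` is a list of dicts that may contain "residue_name",
    "residue_number", and "chain_id". This is a hypothetical stand-in
    for the metadata on OpenFF atoms, not a toolkit API.
    """
    issues = []
    last_seen = {}  # residue identity -> index of the last atom carrying it
    for index, meta in enumerate(atoms):
        name = meta.get("residue_name")
        number = meta.get("residue_number")
        chain = meta.get("chain_id")

        # Atoms without any hierarchy metadata are skipped by this sketch.
        if name is None and number is None and chain is None:
            continue

        # Most packages expect a residue number wherever a name is defined.
        if name is not None and number is None:
            issues.append(f"atom {index}: residue_name set without residue_number")

        # OpenMM expects the atoms of a single residue to be contiguous.
        key = (chain, number, name)
        if key in last_seen and last_seen[key] != index - 1:
            issues.append(f"atom {index}: residue {key} is not contiguous")
        last_seen[key] = index
    return issues
```

An empty list means that neither of the two checks in this sketch was triggered. It does not guarantee that every downstream package will accept the metadata.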
Another consequence of this difference in representations is that hierarchy iterators (like Molecule.residues and Topology.chains) are not accessed during conversion to other packages. Only the underlying hierarchy metadata from the atoms is transferred, and the OpenFF Toolkit makes no attempt to match the behavior of iterators in other packages.
In cases where only some common metadata fields are set (but not others), the following calls happen during conversion TO other packages:
RDKit - We run rdatom.SetPDBMetadata if ANY of residue_name, residue_number, or chain_id are set on the OpenFF Atom. This means that, in cases where only one or two of those fields are filled, the others will be set to the default values in RDKit.
OpenEye - We always run oechem.OEAtomSetResidue(oe_atom, res). If the metadata values are not defined in OpenFF, we assign default values (residue_name="UNL", residue_number=1, chain_id=" ").
OpenMM - OpenMM requires identification of chains and residues when constructing a Topology, and residues must be entirely contained within their parent chain. Our Topology.to_openmm method creates at least one chain for each OpenFF Molecule. Contiguously indexed atoms with the same chain_id value within an OpenFF Molecule will be assigned to a single OpenMM chain. Contiguously indexed atoms with the same residue_name, residue_number, and chain_id will be assigned to the same OpenMM residue.
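The three rules above can be modelled in plain Python. This is an illustrative sketch only, not the toolkit's implementation: atoms are represented as dictionaries, the function names rdkit_style, openeye_style, and openmm_style are hypothetical, and the string "<RDKit default>" is a placeholder rather than a real RDKit value. How atoms with no metadata are grouped for OpenMM is also an assumption here (an unset field is treated as just another value).

```python
from itertools import groupby

FIELDS = ("residue_name", "residue_number", "chain_id")
# Default values stated above for the OpenEye conversion.
OPENEYE_DEFAULTS = {"residue_name": "UNL", "residue_number": 1, "chain_id": " "}


def rdkit_style(meta):
    """Return None if no field is set; otherwise fill in all three fields."""
    if not any(field in meta for field in FIELDS):
        return None  # the atom carries no PDB residue information
    # Unset fields take whatever RDKit uses by default (placeholder here).
    return {field: meta.get(field, "<RDKit default>") for field in FIELDS}


def openeye_style(meta):
    """Always return all three fields, using the defaults for unset ones."""
    return {field: meta.get(field, OPENEYE_DEFAULTS[field]) for field in FIELDS}


def _chain_key(pair):
    # pair is (atom_index, metadata_dict)
    return pair[1].get("chain_id")


def _residue_key(pair):
    return tuple(pair[1].get(field) for field in FIELDS)


def openmm_style(atoms):
    """Group the atoms of one molecule into chains and residues.

    Only runs of contiguously indexed atoms are merged. Returns a list of
    chains, each a list of residues, each a list of atom indices.
    """
    chains = []
    for _, chain_atoms in groupby(enumerate(atoms), key=_chain_key):
        residues = [
            [index for index, _ in residue_atoms]
            for _, residue_atoms in groupby(chain_atoms, key=_residue_key)
        ]
        chains.append(residues)
    return chains
```

In this model, a molecule whose atoms carry chain IDs A, A, A, B, A in index order yields three chains rather than two, because the final atom is not contiguous with the first run of chain A.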