FrozenMolecule
- class openff.toolkit.topology.FrozenMolecule(other=None, file_format: str | None = None, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, allow_undefined_stereo: bool = False)[source]
Immutable chemical representation of a molecule, such as a small molecule or biopolymer.
Examples
Create a molecule from an SDF file
>>> from openff.toolkit.utils import get_data_file_path
>>> sdf_filepath = get_data_file_path('molecules/ethanol.sdf')
>>> molecule = FrozenMolecule.from_file(sdf_filepath)
Convert to OpenEye OEMol object
>>> oemol = molecule.to_openeye()
Create a molecule from an OpenEye molecule
>>> molecule = FrozenMolecule.from_openeye(oemol)
Convert to RDKit Mol object
>>> rdmol = molecule.to_rdkit()
Create a molecule from an RDKit molecule
>>> molecule = FrozenMolecule.from_rdkit(rdmol)
Create a molecule from IUPAC name (requires the OpenEye toolkit)
>>> molecule = FrozenMolecule.from_iupac('imatinib')
Create a molecule from SMILES
>>> molecule = FrozenMolecule.from_smiles('Cc1ccccc1')
Warning
This API is experimental and subject to change.
- __init__(other=None, file_format: str | None = None, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, allow_undefined_stereo: bool = False)[source]
Create a new FrozenMolecule object
- Parameters:
other – If specified, attempt to construct a copy of the molecule from the specified object. As the examples below show, this can be another molecule, a filename or file-like object, an OpenEye OEMol, an RDKit Mol, or a dictionary produced by to_dict().
file_format – If providing a file-like object, you must specify the format of the data. If providing a filename, the format is guessed from the suffix.
toolkit_registry – A registry to use for I/O operations
allow_undefined_stereo – If loaded from a file and False, raises an exception if undefined stereochemistry is detected during the molecule’s construction.
Examples
Create an empty molecule:
>>> empty_molecule = FrozenMolecule()
Create a molecule from a file, given either a filename or a file-like object:
>>> from openff.toolkit.utils import get_data_file_path
>>> sdf_filepath = get_data_file_path('molecules/ethanol.sdf')
>>> molecule = FrozenMolecule(sdf_filepath)
>>> molecule = FrozenMolecule(open(sdf_filepath, 'r'), file_format='sdf')
>>> import gzip
>>> mol2_gz_filepath = get_data_file_path('molecules/toluene.mol2.gz')
>>> molecule = FrozenMolecule(gzip.GzipFile(mol2_gz_filepath, 'r'), file_format='mol2')
Create a molecule from another molecule:
>>> molecule_copy = FrozenMolecule(molecule)
Convert to OpenEye OEMol object
>>> oemol = molecule.to_openeye()
Create a molecule from an OpenEye molecule:
>>> molecule = FrozenMolecule(oemol)
Convert to RDKit Mol object
>>> rdmol = molecule.to_rdkit()
Create a molecule from an RDKit molecule:
>>> molecule = FrozenMolecule(rdmol)
Convert the molecule into a dictionary and back again:
>>> serialized_molecule = molecule.to_dict()
>>> molecule_copy = FrozenMolecule(serialized_molecule)
Methods
__init__([other, file_format, ...]) – Create a new FrozenMolecule object
add_default_hierarchy_schemes([overwrite_existing]) – Adds chain and residue hierarchy schemes.
add_hierarchy_scheme(uniqueness_criteria, ...) – Use the molecule's metadata to facilitate iteration over its atoms.
apply_elf_conformer_selection([percentage, ...]) – Select a set of diverse conformers from the molecule's conformers with ELF.
are_isomorphic(mol1, mol2[, ...]) – Determine if mol1 is isomorphic to mol2.
assign_fractional_bond_orders([bond_order_model, ...]) – Update and store list of bond orders for this molecule.
assign_partial_charges(partial_charge_method) – Calculate partial atomic charges and store them in the molecule.
atom(index) – Get the atom with the specified index.
atom_index(atom) – Returns the index of the given atom in this molecule
bond(index) – Get the bond with the specified index.
canonical_order_atoms([toolkit_registry]) – Produce a copy of the molecule with the atoms reordered canonically.
chemical_environment_matches(query[, ...]) – Find matches in the molecule for a SMARTS string
delete_hierarchy_scheme(iter_name) – Remove an existing HierarchyScheme specified by its iterator name.
enumerate_protomers([max_states]) – Enumerate the formal charges of a molecule to generate different protomers.
enumerate_stereoisomers([undefined_only, ...]) – Enumerate the stereocenters and bonds of the current molecule.
enumerate_tautomers([max_states, ...]) – Enumerate the possible tautomers of the current molecule
find_rotatable_bonds([...]) – Find all bonds classed as rotatable ignoring any matched to the ignore_functional_groups list.
from_bson(serialized) – Instantiate an object from a BSON serialized representation.
from_dict(molecule_dict) – Create a new Molecule from a dictionary representation
from_file(file_path[, file_format, ...]) – Create one or more molecules from a file
from_inchi(inchi[, allow_undefined_stereo, ...]) – Construct a Molecule from an InChI representation
from_iupac(iupac_name[, toolkit_registry, ...]) – Generate a molecule from IUPAC or common name
from_json(serialized) – Instantiate an object from a JSON serialized representation.
from_mapped_smiles(mapped_smiles[, ...]) – Create a Molecule from a SMILES string, ordering atoms from mappings
from_messagepack(serialized) – Instantiate an object from a MessagePack serialized representation.
from_openeye(oemol[, allow_undefined_stereo]) – Create a Molecule from an OpenEye molecule.
from_pdb_and_smiles(file_path, smiles[, ...]) – Create a Molecule from a PDB file and a SMILES string using RDKit.
from_pickle(serialized) – Instantiate an object from a pickle serialized representation.
from_polymer_pdb(file_path[, ...]) – Loads a polymer from a PDB file.
from_qcschema(qca_object[, ...]) – Create a Molecule from a QCArchive molecule record or dataset entry based on attached cmiles information.
from_rdkit(rdmol[, allow_undefined_stereo, ...]) – Create a Molecule from an RDKit molecule.
from_smiles(smiles[, ...]) – Construct a Molecule from a SMILES representation
from_toml(serialized) – Instantiate an object from a TOML serialized representation.
from_topology(topology) – Return a Molecule representation of an OpenFF Topology containing a single Molecule object.
from_xml(serialized) – Instantiate an object from an XML serialized representation.
from_yaml(serialized) – Instantiate from a YAML serialized representation.
generate_conformers([toolkit_registry, ...]) – Generate conformers for this molecule using an underlying toolkit.
generate_unique_atom_names([suffix]) – Generate unique atom names from the element symbol and count.
get_available_charge_methods([toolkit_registry]) – Get the charge methods supported by each wrapper in the specified registry.
get_bond_between(i, j) – Returns the bond between two atoms
is_isomorphic_with(other, **kwargs) – Check if the molecule is isomorphic with the other molecule which can be an openff.toolkit.topology.Molecule or nx.Graph().
nth_degree_neighbors(n_degrees) – Return canonicalized pairs of atoms whose shortest separation is exactly n bonds.
ordered_connection_table_hash() – Compute an ordered hash of the atoms and bonds in the molecule
remap(mapping_dict[, current_to_new, partial]) – Reorder the atoms in the molecule according to the given mapping dict.
strip_atom_stereochemistry(smarts[, ...]) – Delete stereochemistry information for certain atoms, if it is present.
to_bson() – Return a BSON serialized representation.
to_dict() – Return a dictionary representation of the molecule.
to_file(file_path, file_format[, ...]) – Write the current molecule to a file or file-like object
to_hill_formula() – Generate the Hill formula of this molecule.
to_inchi([fixed_hydrogens, toolkit_registry]) – Create an InChI string for the molecule using the requested toolkit backend.
to_inchikey([fixed_hydrogens, toolkit_registry]) – Create an InChIKey for the molecule using the requested toolkit backend.
to_iupac([toolkit_registry]) – Generate IUPAC name from Molecule
to_json([indent]) – Return a JSON serialized representation.
to_messagepack() – Return a MessagePack representation.
to_networkx() – Generate a NetworkX undirected graph from the molecule.
to_openeye([toolkit_registry, aromaticity_model]) – Create an OpenEye molecule
to_pickle() – Return a pickle serialized representation.
to_qcschema([multiplicity, conformer, extras]) – Create a QCElemental Molecule.
to_rdkit([aromaticity_model, toolkit_registry]) – Create an RDKit molecule
to_smiles([isomeric, explicit_hydrogens, ...]) – Return a canonical isomeric SMILES representation of the current molecule.
to_toml() – Return a TOML serialized representation.
to_topology() – Return an OpenFF Topology representation containing one copy of this molecule
to_xml([indent]) – Return an XML representation.
to_yaml() – Return a YAML serialized representation.
update_hierarchy_schemes([iter_names]) – Infer a hierarchy from atom metadata according to the existing hierarchy schemes.
Attributes
amber_impropers – Iterate over all impropers with trivalent centers, reporting the central atom first.
angles – Get an iterator over all i-j-k angles.
atoms – Iterate over all Atom objects in the molecule.
bonds – Iterate over all Bond objects in the molecule.
conformers – Returns the list of conformers for this molecule.
has_unique_atom_names – True if the molecule has unique atom names, False otherwise.
hierarchy_schemes – The hierarchy schemes available on the molecule.
hill_formula – Get the Hill formula of the molecule
impropers – Iterate over all improper torsions in the molecule.
n_angles – Number of angles in the molecule.
n_atoms – The number of Atom objects.
n_bonds – The number of Bond objects in the molecule.
n_conformers – The number of conformers for this molecule.
n_impropers – Number of possible improper torsions in the molecule.
n_propers – Number of proper torsions in the molecule.
name – The name (or title) of the molecule
partial_charges – Returns the partial charges (if present) on the molecule.
propers – Iterate over all proper torsions in the molecule
properties – The properties dictionary of the molecule
smirnoff_impropers – Iterate over all impropers with trivalent centers, reporting the central atom second.
torsions – Get an iterator over all i-j-k-l torsions.
total_charge – Return the total charge on the molecule
- property has_unique_atom_names: bool
True if the molecule has unique atom names, False otherwise.
- generate_unique_atom_names(suffix: str = 'x')[source]
Generate unique atom names from the element symbol and count.
Names are generated from the elemental symbol and the number of times that element is found in the hierarchy element. The character ‘x’ is appended to these generated names to reduce the odds that they clash with an atom name or type imported from another source. For example, generated atom names might begin ‘C1x’, ‘H1x’, ‘O1x’, ‘C2x’, etc.
- Parameters:
suffix – Optional suffix added to atom names. Assists in denoting molecule types
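The naming rule described above can be pictured with a small pure-Python sketch. It is only an illustration of "element symbol + running count + suffix"; the helper name sketch_unique_atom_names is hypothetical and this is not the toolkit's implementation.

```python
from collections import Counter


def sketch_unique_atom_names(symbols, suffix="x"):
    """Name each atom as <element symbol><running count><suffix>.

    `symbols` is a list of element symbols in atom order. This mirrors the
    documented naming pattern only; it does not touch any toolkit objects.
    """
    seen = Counter()
    names = []
    for symbol in symbols:
        seen[symbol] += 1  # how many atoms of this element so far
        names.append(f"{symbol}{seen[symbol]}{suffix}")
    return names


# Two carbons, one oxygen and one hydrogen, in an arbitrary atom order
names = sketch_unique_atom_names(["C", "C", "O", "H"])
print(names)  # ['C1x', 'C2x', 'O1x', 'H1x']
```

Because the count restarts for each element, every generated name is unique within the list, which is the property has_unique_atom_names reports on.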
- strip_atom_stereochemistry(smarts: str, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY)[source]
Delete stereochemistry information for certain atoms, if it is present. This method can be used to “normalize” molecules imported from different cheminformatics toolkits, which differ in which atom centers are considered stereogenic.
- Parameters:
smarts (str) – Tagged SMARTS with a single atom with index 1. Any matches for this atom will have any assigned stereochemistry information removed.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for I/O operations
- to_dict() dict [source]
Return a dictionary representation of the molecule.
- Returns:
molecule_dict – A dictionary representation of the molecule, with the following entries:
name (str): An optional name to be associated with the molecule
atoms (list[dict]): A list of dictionary inputs for Atom.from_dict()
bonds (list[dict]): A list of dictionary inputs for Bond.from_dict()
conformers (list[list[float]]): A list containing the cartesian coordinates of each atom in units of conformers_unit, in the order defined in atoms.
properties (dict): Outputs from a chosen toolkit:
atom_map (dict): Dictionary of atom index (as in the atoms entry) and the mapped index relevant to a mapped canonical SMILES string
**kwargs: Other toolkit-dependent outputs
hierarchy_schemes (dict[dict]): Dictionary where keys (such as "residues" and "chains") represent dictionary outputs from HierarchyScheme.to_dict()
conformers_unit (str, default="angstrom"): Valid unit of length input for the OpenFF Units module.
partial_charges (list[float], default=None): Array of partial charges (in the unit defined by partial_charge_unit) for atoms, in the same order as the atoms entry.
partial_charge_unit (str, default=None): Valid unit of charge input for the OpenFF Units module. If partial_charges is also included, the default is "elementary_charge" instead.
- ordered_connection_table_hash() int [source]
Compute an ordered hash of the atoms and bonds in the molecule
- classmethod from_dict(molecule_dict: dict) FM [source]
Create a new Molecule from a dictionary representation
- Parameters:
molecule_dict – A dictionary representation of the molecule defined by the inputs of Molecule.to_dict().
- Returns:
molecule – A Molecule class instance created from the dictionary representation
- add_default_hierarchy_schemes(overwrite_existing: bool = True)[source]
Adds chain and residue hierarchy schemes.
The Open Force Field Toolkit has no native understanding of hierarchical atom organisation schemes common to other biomolecular software, such as “residues” or “chains” (see Hierarchy data (chains and residues)). Hierarchy schemes allow iteration over groups of atoms according to their metadata. For more information, see HierarchyScheme.
If a Molecule with the default hierarchy schemes changes, Molecule.update_hierarchy_schemes() must be called before the residues or chains are iterated over again or else the iteration may be incorrect.
- Parameters:
overwrite_existing – Whether to overwrite existing instances of the residue and chain hierarchy schemes. If this is False and either of the hierarchy schemes is already defined on this molecule, an exception will be raised.
- Raises:
HierarchySchemeWithIteratorNameAlreadyRegisteredException – When overwrite_existing=False and either the chains or residues hierarchy scheme is already configured.
- add_hierarchy_scheme(uniqueness_criteria: Iterable[str], iterator_name: str) HierarchyScheme [source]
Use the molecule’s metadata to facilitate iteration over its atoms.
This method will add an attribute with the name given by the iterator_name argument that provides an iterator over groups of atoms. Atoms are grouped by the values in their atom.metadata dictionary; any atoms with the same values for the keys given in the uniqueness_criteria argument will be in the same group. These groups have the type HierarchyElement.
Hierarchy schemes are not updated dynamically; if a Molecule with hierarchy schemes changes, Molecule.update_hierarchy_schemes() must be called before the scheme is iterated over again or else the grouping may be incorrect.
Hierarchy schemes allow iteration over groups of atoms according to their metadata. For more information, see HierarchyScheme.
- Parameters:
uniqueness_criteria – The names of Atom metadata entries that define this scheme. An atom belongs to a HierarchyElement only if its metadata has the same values for these criteria as the other atoms in the HierarchyElement.
iterator_name – Name of the iterator that will be exposed to access the hierarchy elements generated by this scheme. Must not match an existing attribute of the Molecule, i.e. atoms, angles, etc.
- Returns:
new_hier_scheme – The newly created HierarchyScheme
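The grouping rule described for add_hierarchy_scheme (atoms with equal metadata values for every key in uniqueness_criteria share a group) can be sketched without the toolkit. The function group_atoms and the metadata keys used here are illustrative assumptions, and plain dictionaries stand in for atom.metadata.

```python
from collections import defaultdict


def group_atoms(atom_metadata, uniqueness_criteria):
    """Group atom indices by the metadata values named in uniqueness_criteria.

    atom_metadata: one metadata dict per atom, in atom order.
    Returns a dict mapping each tuple of criteria values to atom indices.
    """
    groups = defaultdict(list)
    for index, metadata in enumerate(atom_metadata):
        # Atoms land in the same group only if every criterion matches
        key = tuple(metadata.get(name) for name in uniqueness_criteria)
        groups[key].append(index)
    return dict(groups)


# Hypothetical metadata for four atoms spread over two chains
atom_metadata = [
    {"chain_id": "A", "residue_number": "1", "residue_name": "ALA"},
    {"chain_id": "A", "residue_number": "1", "residue_name": "ALA"},
    {"chain_id": "A", "residue_number": "2", "residue_name": "GLY"},
    {"chain_id": "B", "residue_number": "1", "residue_name": "ALA"},
]

residues = group_atoms(
    atom_metadata, ("chain_id", "residue_number", "residue_name")
)
chains = group_atoms(atom_metadata, ("chain_id",))
print(len(residues), len(chains))  # 3 2
```

The sketch also shows why the groups go stale: they are computed once from the metadata, so any later change to the metadata requires regrouping, which is the role update_hierarchy_schemes() plays in the toolkit.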
- property hierarchy_schemes: dict[str, HierarchyScheme]
The hierarchy schemes available on the molecule.
Hierarchy schemes allow iteration over groups of atoms according to their metadata. For more information, see HierarchyScheme.
- Returns:
A dict of the form {str: HierarchyScheme} – The HierarchySchemes associated with the molecule.
- delete_hierarchy_scheme(iter_name: str)[source]
Remove an existing HierarchyScheme specified by its iterator name.
Hierarchy schemes allow iteration over groups of atoms according to their metadata. For more information, see HierarchyScheme.
- Parameters:
iter_name – The iterator name of the hierarchy scheme to remove.
- update_hierarchy_schemes(iter_names: list[str] | None = None)[source]
Infer a hierarchy from atom metadata according to the existing hierarchy schemes.
Hierarchy schemes allow iteration over groups of atoms according to their metadata. For more information, see HierarchyScheme.
- Parameters:
iter_names – Only perceive hierarchy for HierarchySchemes that expose these iterator names. If not provided, all known hierarchies will be perceived, overwriting previous results if applicable.
See also
Molecule.add_hierarchy_scheme, Molecule.delete_hierarchy_scheme, Molecule.hierarchy_schemes, HierarchyScheme
- to_smiles(isomeric: bool = True, explicit_hydrogens: bool = True, mapped: bool = False, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY)[source]
Return a canonical isomeric SMILES representation of the current molecule. A partially mapped SMILES can also be generated for atoms of interest by supplying an atom_map to the properties dictionary.
Note
RDKit and OpenEye versions will not necessarily return the same representation.
- Parameters:
isomeric – return an isomeric SMILES
explicit_hydrogens – return a SMILES string containing all hydrogens explicitly
mapped – return an explicit-hydrogen mapped SMILES. The atoms to be mapped can be controlled by supplying an atom map in the properties dictionary. If no mapping is passed, all atoms will be mapped in order; otherwise, supply an atom map dictionary from the current atom index to the map id, with no duplicates. The map ids (values) should start from 0 or 1.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for SMILES conversion
- Returns:
smiles – Canonical isomeric explicit-hydrogen SMILES
Examples
>>> from openff.toolkit.utils import get_data_file_path
>>> sdf_filepath = get_data_file_path('molecules/ethanol.sdf')
>>> molecule = Molecule(sdf_filepath)
>>> smiles = molecule.to_smiles()
- classmethod from_inchi(inchi: str, allow_undefined_stereo: bool = False, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, name: str = '') FM [source]
Construct a Molecule from an InChI representation
- Parameters:
inchi – The InChI representation of the molecule.
allow_undefined_stereo – Whether to accept InChI with undefined stereochemistry. If False, an exception will be raised if an InChI with undefined stereochemistry is passed into this function.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for InChI-to-molecule conversion
name – An optional name for the output molecule
- Returns:
molecule
Examples
Make cis-1,2-Dichloroethene:
>>> molecule = Molecule.from_inchi('InChI=1S/C2H2Cl2/c3-1-2-4/h1-2H/b2-1-')
- to_inchi(fixed_hydrogens: bool = False, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY) str [source]
Create an InChI string for the molecule using the requested toolkit backend. InChI is a standardised representation that does not capture tautomers unless specified using the fixed hydrogen layer.
For information on InChI, see https://iupac.org/who-we-are/divisions/division-details/inchi/
- Parameters:
fixed_hydrogens – Whether a fixed hydrogen layer should be added to the InChI. If True, this will produce a non-standard, specific InChI string of the molecule.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for molecule-to-InChI conversion
- Returns:
inchi (str) – The InChI string of the molecule.
- Raises:
InvalidToolkitRegistryError – If an invalid object is passed as the toolkit_registry parameter
- to_inchikey(fixed_hydrogens: bool = False, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY)[source]
Create an InChIKey for the molecule using the requested toolkit backend. InChIKey is a standardised representation that does not capture tautomers unless specified using the fixed hydrogen layer.
For information on InChI, see https://iupac.org/who-we-are/divisions/division-details/inchi/
- Parameters:
fixed_hydrogens – Whether a fixed hydrogen layer should be added to the InChI. If True, this will produce a non-standard, specific InChI string of the molecule.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for molecule-to-InChIKey conversion
- Returns:
inchi_key (str) – The InChIKey representation of the molecule.
- Raises:
InvalidToolkitRegistryError – If an invalid object is passed as the toolkit_registry parameter
- classmethod from_smiles(smiles: str, hydrogens_are_explicit: bool = False, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, allow_undefined_stereo: bool = False, name: str = '') FM [source]
Construct a Molecule from a SMILES representation
The order of atoms in the Molecule is unspecified and may change from version to version or with different toolkits. SMILES atom indices (also known as atom maps) are not used to order atoms; instead, they are stored in the produced molecule’s properties attribute, accessible via molecule.properties["atom_map"]. The atom map is stored as a dictionary mapping molecule atom indices to SMILES atom maps. To order atoms according to SMILES atom indices, see Molecule.from_mapped_smiles(), which helpfully raises an exception if any atom map is missing, duplicated, or out-of-range, or else Molecule.remap() for arbitrary remaps.
- Parameters:
smiles – The SMILES representation of the molecule.
hydrogens_are_explicit – If True, forbid the cheminformatics toolkit from inferring hydrogen atoms not explicitly specified in the SMILES.
toolkit_registry – The cheminformatics toolkit to use to interpret the SMILES.
allow_undefined_stereo – Whether to accept SMILES with undefined stereochemistry. If False, an exception will be raised if a SMILES with undefined stereochemistry is passed into this function.
name – An optional name for the output molecule
- Raises:
RadicalsNotSupportedError – If any atoms in the input molecule contain radical electrons.
Examples
Create a Molecule representing toluene from SMILES:
>>> molecule = Molecule.from_smiles('Cc1ccccc1')
Create a Molecule representing phenol from SMILES with the oxygen at atom index 0 (SMILES indices begin at 1):
>>> molecule = Molecule.from_smiles('c1ccccc1[OH:1]')
>>> molecule = molecule.remap(
...     {k: v - 1 for k, v in molecule.properties["atom_map"].items()},
...     partial=True,
... )
>>> assert molecule.atom(0).symbol == "O"
See also
- static are_isomorphic(mol1: FrozenMolecule | _SimpleMolecule | nx.Graph, mol2: FrozenMolecule | _SimpleMolecule | nx.Graph, return_atom_map: bool = False, aromatic_matching: bool = True, formal_charge_matching: bool = True, bond_order_matching: bool = True, atom_stereochemistry_matching: bool = True, bond_stereochemistry_matching: bool = True, strip_pyrimidal_n_atom_stereo: bool = True, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY) tuple[bool, Optional[dict[int, int]]] [source]
Determine if mol1 is isomorphic to mol2.
are_isomorphic() compares two molecules’ graph representations and the chosen node/edge attributes. Connections and atomic numbers are always checked.
If nx.Graphs() are given they must at least have atomic_number attributes on nodes. Other attributes that are_isomorphic() can optionally check…
… in nodes are:
is_aromatic
formal_charge
stereochemistry
… in edges are:
is_aromatic
bond_order
stereochemistry
By default, all attributes are checked, but stereochemistry around pyramidal nitrogen is ignored.
Warning
This API is experimental and subject to change.
- Parameters:
mol1 – The first molecule to test for isomorphism.
mol2 – The second molecule to test for isomorphism.
return_atom_map – Return a dict containing the atomic mapping, otherwise None. Only processed if inputs are isomorphic; will always return None if inputs are not isomorphic.
aromatic_matching – If False, aromaticity of graph nodes and edges is ignored for the purpose of determining isomorphism.
formal_charge_matching – If False, formal charges of graph nodes are ignored for the purpose of determining isomorphism.
bond_order_matching – If False, bond orders of graph edges are ignored for the purpose of determining isomorphism.
atom_stereochemistry_matching – If False, atoms’ stereochemistry is ignored for the purpose of determining isomorphism.
bond_stereochemistry_matching – If False, bonds’ stereochemistry is ignored for the purpose of determining isomorphism.
strip_pyrimidal_n_atom_stereo – If True, any stereochemistry defined around pyramidal nitrogen stereocenters will be disregarded in the isomorphism check.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for removing stereochemistry from pyramidal nitrogens.
- Returns:
molecules_are_isomorphic
atom_map – [dict[int, int]] ordered by mol1 indexing, {mol1_index: mol2_index}. If the molecules are not isomorphic given the input arguments, None is returned instead of a dict.
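To make the returned pair concrete, here is a brute-force sketch of the part of the check that is always performed (connections and atomic numbers), returning the same (bool, {mol1_index: mol2_index}) shape. It tries every permutation, so it is only usable on tiny inputs. The function name and the list/set input format are assumptions for illustration and have nothing to do with how the toolkit performs the match.

```python
from itertools import permutations


def sketch_are_isomorphic(atoms1, bonds1, atoms2, bonds2):
    """Brute-force graph isomorphism on atomic numbers and connectivity.

    atoms: list of atomic numbers, indexed by atom.
    bonds: set of frozenset({i, j}) pairs of atom indices.
    Returns (True, {mol1_index: mol2_index}) or (False, None).
    """
    if sorted(atoms1) != sorted(atoms2) or len(bonds1) != len(bonds2):
        return False, None
    n = len(atoms1)
    for perm in permutations(range(n)):
        # perm[i] is the candidate partner in molecule 2 of atom i
        if any(atoms1[i] != atoms2[perm[i]] for i in range(n)):
            continue
        mapped_bonds = {frozenset(perm[i] for i in bond) for bond in bonds1}
        if mapped_bonds == bonds2:
            return True, dict(enumerate(perm))
    return False, None


# Water written in two different atom orders: O-H-H and H-O-H
water_a = ([8, 1, 1], {frozenset({0, 1}), frozenset({0, 2})})
water_b = ([1, 8, 1], {frozenset({0, 1}), frozenset({1, 2})})
is_iso, atom_map = sketch_are_isomorphic(*water_a, *water_b)
print(is_iso, atom_map)  # True {0: 1, 1: 0, 2: 2}
```

The optional checks listed above (aromaticity, formal charge, bond order, stereochemistry) would correspond to comparing further per-atom and per-bond attributes inside the loop.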
- is_isomorphic_with(other: FrozenMolecule | _SimpleMolecule | nx.Graph, **kwargs) bool [source]
Check if the molecule is isomorphic with the other molecule, which can be an openff.toolkit.topology.Molecule or nx.Graph(). Full matching is done using the options described below.
Warning
This API is experimental and subject to change.
- Parameters:
other – The molecule or graph to compare against.
aromatic_matching – Compare the aromatic attributes of bonds and atoms.
formal_charge_matching – Compare the formal charge attributes of the atoms.
bond_order_matching – Compare the bond order attributes of the bonds.
atom_stereochemistry_matching – If False, atoms’ stereochemistry is ignored for the purpose of determining equality.
bond_stereochemistry_matching – If False, bonds’ stereochemistry is ignored for the purpose of determining equality.
strip_pyrimidal_n_atom_stereo – If True, any stereochemistry defined around pyramidal nitrogen stereocenters will be disregarded in the isomorphism check.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for removing stereochemistry from pyramidal nitrogens.
- Returns:
isomorphic
- generate_conformers(toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, n_conformers: int = 10, rms_cutoff: Quantity | None = None, clear_existing: bool = True, make_carboxylic_acids_cis: bool = True)[source]
Generate conformers for this molecule using an underlying toolkit.
If n_conformers=0, no toolkit wrapper will be called. If n_conformers=0 and clear_existing=True, molecule.conformers will be set to None.
- Parameters:
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for conformer generation
n_conformers – The maximum number of conformers to produce
rms_cutoff – The RMS distance below which two conformers are considered redundant, in which case one is deleted. Precise implementation of this cutoff may be toolkit-dependent. If None, the cutoff is set to be the default value for each ToolkitWrapper (generally 1 Angstrom).
clear_existing – Whether to overwrite existing conformers for the molecule
make_carboxylic_acids_cis – Guarantee all conformers have exclusively cis carboxylic acid groups (COOH) by rotating the proton in any trans carboxylic acids 180 degrees around the C-O bond. Works around a bug in conformer generation by the OpenEye toolkit where trans COOH is much more common than it should be.
Examples
>>> molecule = Molecule.from_smiles('CCCCCC')
>>> molecule.generate_conformers()
- Raises:
InvalidToolkitRegistryError – If an invalid object is passed as the toolkit_registry parameter
- apply_elf_conformer_selection(percentage: float = 2.0, limit: int = 10, toolkit_registry: ToolkitRegistry | ToolkitWrapper | None = GLOBAL_TOOLKIT_REGISTRY, **kwargs)[source]
Select a set of diverse conformers from the molecule’s conformers with ELF.
Applies the Electrostatically Least-interacting Functional groups method to select a set of diverse conformers which have minimal electrostatically strongly interacting functional groups from the molecule’s conformers.
- Parameters:
toolkit_registry – The underlying toolkit to use to select the ELF conformers.
percentage – The percentage of conformers with the lowest electrostatic interaction energies to greedily select from.
limit – The maximum number of conformers to select.
Notes
The input molecule should have a large set of conformers already generated to select the ELF conformers from.
The selected conformers will be retained in the conformers list while unselected conformers will be discarded.
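How percentage and limit interact can be sketched as a two-stage selection: keep the lowest-energy fraction as candidates, then greedily pick mutually distant conformers from them. Everything below is an assumption made for illustration, including rounding the candidate count up to at least one, starting from the lowest-energy conformer, and scoring diversity by the minimum distance to the already-selected set. The real selection is delegated to the toolkit wrapper and may differ in each of these details.

```python
import math


def sketch_elf_selection(energies, distances, percentage=2.0, limit=10):
    """Pick up to `limit` diverse conformers from the low-energy fraction.

    energies: one electrostatic interaction energy per conformer.
    distances: square matrix of pairwise conformer distances (e.g. RMS).
    Returns the indices of the selected conformers.
    """
    n = len(energies)
    if n == 0:
        return []
    # Stage 1: candidates are the lowest-energy `percentage` of conformers
    n_candidates = max(1, math.ceil(n * percentage / 100.0))
    candidates = sorted(range(n), key=lambda i: energies[i])[:n_candidates]
    # Stage 2: greedy max-min diversity, seeded with the lowest energy
    selected = [candidates[0]]
    remaining = candidates[1:]
    while remaining and len(selected) < limit:
        best = max(
            remaining,
            key=lambda i: min(distances[i][j] for j in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Four hypothetical conformers with made-up energies and distances
energies = [5.0, 1.0, 3.0, 2.0]
distances = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 2.0, 0.5],
    [1.0, 2.0, 0.0, 1.5],
    [1.0, 0.5, 1.5, 0.0],
]
print(sketch_elf_selection(energies, distances, percentage=75.0, limit=2))
# [1, 2]
```

With the default percentage of 2.0, a set of four conformers yields a single candidate, which illustrates why the notes ask for a large conformer pool before selection.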
- get_available_charge_methods(toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY) list[str] [source]
Get the charge methods supported by each wrapper in the specified registry.
- Parameters:
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for the calculation.
- assign_partial_charges(partial_charge_method: str, strict_n_conformers: bool = False, use_conformers: Iterable[Quantity] | None = None, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, normalize_partial_charges: bool = True)[source]
Calculate partial atomic charges and store them in the molecule.
assign_partial_charges computes charges using the specified toolkit and assigns the new values to the partial_charges attribute. Supported charge methods vary from toolkit to toolkit, but some supported methods are:
"am1bcc"
"am1bccelf10" (requires OpenEye Toolkits)
"am1-mulliken"
"mmff94"
"gasteiger"
By default, the conformers on the input molecule are not used in the charge calculation. Instead, any conformers needed for the charge calculation are generated by this method. If this behavior is undesired, specific conformers can be provided via the use_conformers argument.
ELF10 methods will neither fail nor warn when fewer than the expected number of conformers could be generated, as many small molecules are too rigid to provide a large number of conformers. Note that only the "am1bccelf10" partial charge method uses ELF conformer selection; the "am1bcc" method only uses a single conformer. This may confuse users as the ToolkitAM1BCC SMIRNOFF tag in a force field file defines that AM1BCC-ELF10 should be used if the OpenEye Toolkits are available.
For more supported charge methods and their details, see the corresponding methods in each toolkit wrapper, listed under See also.
- Parameters:
partial_charge_method – The method to use for partial charge calculation.
strict_n_conformers – Whether to raise an exception if an invalid number of conformers is provided for the given charge method. If this is False and an invalid number of conformers is found, a warning will be raised.
use_conformers – Coordinates to use for partial charge calculation. If None, an appropriate number of conformers will be generated.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for the calculation.
normalize_partial_charges – Whether to offset partial charges so that they sum to the total formal charge of the molecule. This is used to prevent accumulation of rounding errors when the partial charge assignment method returns values at limited precision.
Examples
Generate AM1 Mulliken partial charges. Conformers for the AM1 calculation are generated automatically:
>>> molecule = Molecule.from_smiles('CCCCCC')
>>> molecule.assign_partial_charges('am1-mulliken')
To use pre-generated conformations, use the use_conformers argument:
>>> molecule = Molecule.from_smiles('CCCCCC')
>>> molecule.generate_conformers(n_conformers=1)
>>> molecule.assign_partial_charges(
...     'am1-mulliken',
...     use_conformers=molecule.conformers
... )
- Raises:
InvalidToolkitRegistryError – If an invalid object is passed as the toolkit_registry parameter
See also
openff.toolkit.utils.toolkits.OpenEyeToolkitWrapper.assign_partial_charges, openff.toolkit.utils.toolkits.RDKitToolkitWrapper.assign_partial_charges, openff.toolkit.utils.toolkits.AmberToolsToolkitWrapper.assign_partial_charges, openff.toolkit.utils.toolkits.BuiltInToolkitWrapper.assign_partial_charges
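The effect of normalize_partial_charges can be illustrated with plain floats. This sketch assumes the offset is spread uniformly over all atoms, which is one simple way to make the charges sum to the total formal charge; the helper name is hypothetical and units are left out.

```python
def sketch_normalize_charges(charges, total_formal_charge):
    """Shift every charge by the same offset so the sum matches the target.

    charges: per-atom partial charges as plain floats.
    total_formal_charge: the value the charges should sum to.
    """
    # Residual left over by limited-precision charge assignment
    residual = total_formal_charge - sum(charges)
    offset = residual / len(charges)
    return [charge + offset for charge in charges]


# Hypothetical charges for a neutral three-atom molecule, off by 1e-4
raw = [-0.4, 0.2, 0.1999]
normalized = sketch_normalize_charges(raw, total_formal_charge=0)
print(abs(sum(normalized)) < 1e-12)  # True
```

Each atom moves by the same small amount, so charge differences between atoms are preserved while the rounding residual is removed.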
- assign_fractional_bond_orders(bond_order_model: str | None = None, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, use_conformers: Iterable[Quantity] | None = None)[source]
Update and store list of bond orders for this molecule.
Bond orders are stored on each bond, in the bond.fractional_bond_order attribute.
Warning
This API is experimental and subject to change.
- Parameters:
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for the fractional bond order calculation
bond_order_model – The bond order model to use for fractional bond order calculation. If None, "am1-wiberg" is used.
use_conformers – The conformers to use for fractional bond order calculation. If None, an appropriate number of conformers will be generated by an available ToolkitWrapper.
Examples
>>> from openff.toolkit import Molecule
>>> molecule = Molecule.from_smiles('CCCCCC')
>>> molecule.assign_fractional_bond_orders()
- Raises:
InvalidToolkitRegistryError – If an invalid object is passed as the toolkit_registry parameter
- to_networkx() nx.Graph [source]
Generate a NetworkX undirected graph from the molecule.
Nodes are Atoms labeled with atom indices and atomic elements (via the element node attribute). Edges denote chemical bonds between Atoms.
- Returns:
graph – The resulting graph, with nodes (atoms) labeled with atom indices, elements, stereochemistry and aromaticity flags and bonds with two atom indices, bond order, stereochemistry, and aromaticity flags
Examples
Retrieve the bond graph for imatinib (OpenEye toolkit required)
>>> molecule = Molecule.from_iupac('imatinib')
>>> nxgraph = molecule.to_networkx()
- find_rotatable_bonds(ignore_functional_groups: list[str] | None = None, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY) list[Bond] [source]
Find all bonds classed as rotatable, ignoring any matched to the ignore_functional_groups list.
- Parameters:
ignore_functional_groups – A list of bond SMARTS patterns to be ignored when finding rotatable bonds.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for SMARTS matching
- Returns:
bonds (list[openff.toolkit.topology.molecule.Bond]) – The list of openff.toolkit.topology.molecule.Bond instances which are rotatable.
- property partial_charges
Returns the partial charges (if present) on the molecule.
- Returns:
partial_charges – The partial charges on the molecule’s atoms. Returns None if no charges have been specified.
- property n_atoms: int
The number of Atom objects.
- property n_bonds: int
The number of Bond objects in the molecule.
- property n_angles: int
Number of angles in the molecule.
- property n_propers: int
Number of proper torsions in the molecule.
- property n_impropers: int
Number of possible improper torsions in the molecule.
- atom(index: int) Atom [source]
Get the atom with the specified index.
- Parameters:
index –
- Returns:
atom
- atom_index(atom: Atom) int [source]
Returns the index of the given atom in this molecule
- Parameters:
atom –
- Returns:
index – The index of the given atom in this molecule
- property conformers
Returns the list of conformers for this molecule.
Conformers are presented as a list of Quantity-wrapped NumPy arrays, of shape (n_atoms, 3) and with dimensions of [Distance]. The return value is the actual list of conformers, and changes to the contents affect the original FrozenMolecule.
- property n_conformers: int
The number of conformers for this molecule.
- bond(index: int) Bond [source]
Get the bond with the specified index.
- Parameters:
index –
- Returns:
bond
- property torsions: set[tuple[Atom, Atom, Atom, Atom]]
Get an iterator over all i-j-k-l torsions. Note that i-j-k-i torsions (cycles) are excluded.
- Returns:
torsions
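As an illustration of the i-j-k-l rule stated above, the following standalone sketch enumerates linear four-atom paths from a list of bonded index pairs and skips paths that close back on the first atom. It is a toy over plain integers, not the toolkit's implementation; the function name, the bond lists, and the choice to store each path in one direction only are assumptions.

```python
def linear_torsions(bonds):
    """Enumerate unique i-j-k-l paths over bonded atom indices, skipping i-j-k-i cycles."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    torsions = set()
    for j, k in bonds:  # each bond is a candidate central bond
        for i in neighbors[j] - {k}:
            for l in neighbors[k] - {j}:
                if i == l:
                    continue  # three-membered ring: i-j-k-i is not a torsion
                path = (i, j, k, l)
                # store each path once, regardless of traversal direction
                torsions.add(min(path, path[::-1]))
    return torsions


# Hypothetical heavy-atom skeletons (hydrogens omitted)
butane_carbons = [(0, 1), (1, 2), (2, 3)]
cyclopropane_carbons = [(0, 1), (1, 2), (0, 2)]
butane_torsions = linear_torsions(butane_carbons)
cyclopropane_torsions = linear_torsions(cyclopropane_carbons)
```

The four-carbon chain yields a single path, while the three-membered ring yields none because every candidate path ends on its starting atom.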
- property propers: set[tuple[Atom, Atom, Atom, Atom]]
Iterate over all proper torsions in the molecule
- property impropers: set[tuple[Atom, Atom, Atom, Atom]]
Iterate over all improper torsions in the molecule.
- Returns:
impropers – An iterator of tuples, each containing the atoms making up a possible improper torsion.
- property smirnoff_impropers: set[tuple[Atom, Atom, Atom, Atom]]
Iterate over all impropers with trivalent centers, reporting the central atom second.
The central atom is reported second in each torsion. This method reports an improper for each trivalent atom in the molecule, whether or not any given force field would assign it improper torsion parameters.
Also note that this will return 6 possible atom orderings around each improper center. In current SMIRNOFF parameterization, three of these six orderings will be used for the actual assignment of the improper term and measurement of the angles. These three orderings capture the three unique angles that could be calculated around the improper center, therefore the sum of these three terms will always return a consistent energy.
The exact three orderings that will be applied during parameterization can not be determined in this method, since it requires sorting the atom indices, and those indices may change when this molecule is added to a Topology.
For more details on the use of three-fold (‘trefoil’) impropers, see https://openforcefield.github.io/standards/standards/smirnoff/#impropertorsions
- Returns:
impropers – An iterator of tuples, each containing the indices of atoms making up a possible improper torsion. The central atom is listed second in each tuple.
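The six orderings mentioned above are the permutations of the three substituents around one trivalent center. A minimal sketch with the standard library is shown below; the helper name and atom indices are hypothetical, and the sketch does not attempt to pick the three orderings used during SMIRNOFF parameterization.

```python
from itertools import permutations


def improper_orderings(center, substituents, center_position=1):
    """Return every ordering of a trivalent center's neighbors, with the center at a fixed slot."""
    orderings = []
    for perm in permutations(substituents):
        atoms = list(perm)
        atoms.insert(center_position, center)
        orderings.append(tuple(atoms))
    return orderings


# Hypothetical indices: atom 1 is the trivalent center bonded to atoms 0, 2 and 3
center_second = improper_orderings(1, (0, 2, 3), center_position=1)
center_first = improper_orderings(1, (0, 2, 3), center_position=0)
```

With center_position=1 the tuples follow the central-atom-second convention described here; center_position=0 gives the central-atom-first convention.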
- property amber_impropers: set[tuple[Atom, Atom, Atom, Atom]]
Iterate over all impropers with trivalent centers, reporting the central atom first.
The central atom is reported first in each torsion. This method reports an improper for each trivalent atom in the molecule, whether or not any given force field would assign it improper torsion parameters.
Also note that this will return 6 possible atom orderings around each improper center. In current AMBER parameterization, one of these six orderings will be used for the actual assignment of the improper term and measurement of the angle. This method does not encode the logic to determine which of the six orderings AMBER would use.
- Returns:
impropers – An iterator of tuples, each containing the indices of atoms making up a possible improper torsion. The central atom is listed first in each tuple.
- nth_degree_neighbors(n_degrees)[source]
Return canonicalized pairs of atoms whose shortest separation is exactly n_degrees bonds. Only pairs with increasing atom indices are returned.
- Parameters:
n_degrees (int) – The number of bonds separating atoms in each pair
- Returns:
neighbors – Tuples (of length 2) of atoms that are separated by n_degrees bonds.
Notes
The criterion used here relies on minimum distances; when there are multiple valid paths between atoms, such as atoms in rings, the shortest path is considered. For example, two atoms in “meta” positions with respect to each other in a benzene are separated by two paths, one of length 2 bonds and the other of length 4 bonds. This function would consider them to be 2 apart and would not include them if n_degrees=4 were passed.
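The shortest-path criterion in the note above can be reproduced with a breadth-first search over a bond list. This is a standalone sketch over integer indices, not the toolkit's code; the function name and the six-membered ring are illustrative assumptions.

```python
from collections import deque


def pairs_at_distance(bonds, n_atoms, n):
    """Return index pairs (i < j) whose shortest bond path is exactly n bonds."""
    neighbors = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    pairs = set()
    for start in range(n_atoms):
        # breadth-first search gives the minimum bond count to every reachable atom
        distance = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nxt in neighbors[atom]:
                if nxt not in distance:
                    distance[nxt] = distance[atom] + 1
                    queue.append(nxt)
        pairs.update((start, j) for j, d in distance.items() if d == n and start < j)
    return pairs


# Six-membered carbon ring, hydrogens omitted for brevity
ring = [(i, (i + 1) % 6) for i in range(6)]
two_apart = pairs_at_distance(ring, 6, 2)
four_apart = pairs_at_distance(ring, 6, 4)
```

The "meta" pair (0, 2) appears in two_apart, and four_apart is empty because no two ring atoms are more than three bonds apart by the shortest route.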
- property name: str
The name (or title) of the molecule
- property hill_formula: str
Get the Hill formula of the molecule
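For reference, the Hill convention lists carbon first, hydrogen second, and all other elements alphabetically; when no carbon is present, every element (including hydrogen) is listed alphabetically. The sketch below applies that convention to a list of element symbols. It is a standalone illustration with a hypothetical function name, and it assumes counts of 1 are omitted, as is conventional.

```python
from collections import Counter


def hill_formula_from_symbols(symbols):
    """Build a Hill-ordered formula string from a list of element symbols."""
    counts = Counter(symbols)
    if "C" in counts:
        others = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        # without carbon, everything (hydrogen included) is alphabetical
        order = sorted(counts)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


formula = hill_formula_from_symbols(["C"] * 3 + ["H"] * 8 + ["O"] * 3)
```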
- chemical_environment_matches(query: str, unique: bool = False, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY)[source]
Find matches in the molecule for a SMARTS string
- Parameters:
query – SMARTS string (with one or more tagged atoms).
unique – If True, de-duplicates matches before returning.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for chemical environment matches
- Returns:
matches – A list of tuples, containing the indices of the matching atoms.
Examples
Retrieve all the carbon-carbon bond matches in a molecule
>>> molecule = Molecule.from_iupac('imatinib')
>>> matches = molecule.chemical_environment_matches('[#6X3:1]~[#6X3:2]')
- classmethod from_iupac(iupac_name: str, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, allow_undefined_stereo: bool = False, **kwargs) FM [source]
Generate a molecule from IUPAC or common name
Note
This method requires the OpenEye toolkit to be installed.
- Parameters:
iupac_name – IUPAC name of molecule to be generated
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for name-to-molecule conversion
allow_undefined_stereo – If false, raises an exception if molecule contains undefined stereochemistry.
- Returns:
molecule – The resulting molecule with position
Examples
Create a molecule from an IUPAC name
>>> molecule = Molecule.from_iupac('4-[(4-methylpiperazin-1-yl)methyl]-N-(4-methyl-3-{[4-(pyridin-3-yl)pyrimidin-2-yl]amino}phenyl)benzamide')
Create a molecule from a common name
>>> molecule = Molecule.from_iupac('imatinib')
- to_iupac(toolkit_registry=GLOBAL_TOOLKIT_REGISTRY)[source]
Generate IUPAC name from Molecule
- Returns:
iupac_name – IUPAC name of the molecule
Note
This method requires the OpenEye toolkit to be installed.
Examples
>>> from openff.toolkit.utils import get_data_file_path
>>> sdf_filepath = get_data_file_path('molecules/ethanol.sdf')
>>> molecule = Molecule(sdf_filepath)
>>> iupac_name = molecule.to_iupac()
- classmethod from_topology(topology) FM [source]
Return a Molecule representation of an OpenFF Topology containing a single Molecule object.
- Parameters:
topology – The Topology object containing a single Molecule object. Note that OpenMM and MDTraj Topology objects are not supported.
- Returns:
molecule – The Molecule object in the topology
- Raises:
ValueError – If the topology does not contain exactly one molecule.
Examples
Create a molecule from a Topology object that contains exactly one molecule
>>> from openff.toolkit import Molecule, Topology
>>> topology = Topology.from_molecules(Molecule.from_smiles('[CH4]'))
>>> molecule = Molecule.from_topology(topology)
- to_topology()[source]
Return an OpenFF Topology representation containing one copy of this molecule
- Returns:
topology – A Topology representation of this molecule
Examples
>>> from openff.toolkit import Molecule
>>> molecule = Molecule.from_iupac('imatinib')
>>> topology = molecule.to_topology()
- classmethod from_file(file_path: str | Path | TextIO, file_format=None, toolkit_registry=GLOBAL_TOOLKIT_REGISTRY, allow_undefined_stereo: bool = False) FM | list[FM] [source]
Create one or more molecules from a file
- Parameters:
file_path – The path to the file or file-like object to stream one or more molecules from.
file_format – Format specifier, usually the file suffix (e.g. ‘MOL2’, ‘SMI’). Note that not all toolkits support all formats. Check ToolkitWrapper.toolkit_file_read_formats for your loaded toolkits for details.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for file loading. If a ToolkitRegistry is passed, only the highest-precedence toolkit is used
allow_undefined_stereo – If false, raises an exception if a loaded molecule contains undefined stereochemistry.
- Returns:
molecules – If there is a single molecule in the file, a Molecule is returned; otherwise, a list of Molecule objects is returned.
Examples
>>> from openff.toolkit import Molecule
>>> from openff.toolkit.utils.utils import get_data_file_path
>>> sdf_file_path = get_data_file_path("molecules/toluene.sdf")
>>> molecule = Molecule.from_file(sdf_file_path)
- classmethod from_polymer_pdb(file_path: str | Path | TextIO, toolkit_registry=GLOBAL_TOOLKIT_REGISTRY, name: str = '') FM [source]
Loads a polymer from a PDB file.
Also see Topology.from_multicomponent_pdb(), which can do everything this method can and more.
Currently only supports proteins with canonical amino acids that are either uncapped or capped by ACE/NME groups, but may later be extended to handle other common polymers, or accept user-defined polymer templates. Only one polymer chain may be present in the PDB file, and it must be the only molecule present.
Connectivity and bond orders are assigned by matching SMARTS codes for the supported residues against atom names. The PDB file must include all atoms with the correct standard atom names described in the PDB Chemical Component Dictionary. Residue names are used to assist troubleshooting failed assignments, but are not used in the actual assignment process.
Metadata such as residues, chains, and atom names are recorded in the Atom.metadata attribute, which is a dictionary mapping from strings like “residue_name” to the appropriate value. from_polymer_pdb returns a molecule that can be iterated over with the .residues and .chains attributes, as well as the usual .atoms.
- Parameters:
file_path – PDB information to be passed to OpenMM PDBFile object for loading
toolkit_registry – Either a ToolkitRegistry or a ToolkitWrapper
name – An optional name for the output molecule
- Returns:
molecule
- Raises:
UnassignedChemistryInPDBError – If an atom or bond could not be assigned; the exception will provide a detailed diagnostic of what went wrong.
MultipleMoleculesInPDBError – If all atoms and bonds could be assigned, but the PDB includes multiple chains or molecules.
- to_file(file_path, file_format, toolkit_registry=GLOBAL_TOOLKIT_REGISTRY)[source]
Write the current molecule to a file or file-like object
- Parameters:
file_path – A file-like object or the path to the file to be written.
file_format – Format specifier, one of [‘MOL2’, ‘MOL2H’, ‘SDF’, ‘PDB’, ‘SMI’, ‘CAN’, ‘TDT’]. Note that not all toolkits support all formats
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for file writing. If a Toolkit is passed, only the highest-precedence toolkit is used
- Raises:
ValueError – If the requested file_format is not supported by one of the installed cheminformatics toolkits
Examples
>>> molecule = Molecule.from_iupac('imatinib')
>>> molecule.to_file('imatinib.mol2', file_format='mol2')
>>> molecule.to_file('imatinib.sdf', file_format='sdf')
>>> molecule.to_file('imatinib.pdb', file_format='pdb')
- enumerate_tautomers(max_states=20, toolkit_registry=GLOBAL_TOOLKIT_REGISTRY)[source]
Enumerate the possible tautomers of the current molecule
- Parameters:
max_states – The maximum number of molecules that should be returned
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use to enumerate the tautomers.
- Returns:
molecules – A list of openff.toolkit.topology.Molecule instances not including the input molecule.
- enumerate_stereoisomers(undefined_only: bool = False, max_isomers: int = 20, rationalise: bool = True, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY)[source]
Enumerate the stereocenters and bonds of the current molecule.
- Parameters:
undefined_only – Whether to enumerate only those stereocenters and bonds with undefined stereochemistry, rather than all of them
max_isomers – The maximum number of molecules that should be returned
rationalise – Whether to try to build and rationalise the molecule to ensure it can exist
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use to enumerate the stereoisomers.
- Returns:
molecules – A list of Molecule instances not including the input molecule.
- enumerate_protomers(max_states: int = 0) list [source]
Enumerate the formal charges of a molecule to generate different protomers.
- Parameters:
max_states – The maximum number of protomer states to be returned. If 0, the default, attempt to return all protomers. If set to a non-zero number, the input molecule is not guaranteed to be included in the returned list.
- Returns:
molecules – A list of the protomers of the input molecules, including the input molecule if found by the underlying toolkit’s protomer enumeration tool and not pruned by max_states.
- classmethod from_rdkit(rdmol, allow_undefined_stereo: bool = False, hydrogens_are_explicit: bool = False) FM [source]
Create a Molecule from an RDKit molecule.
Requires the RDKit to be installed.
- Parameters:
rdmol – An RDKit molecule
allow_undefined_stereo – If False, raises an exception if rdmol contains undefined stereochemistry.
hydrogens_are_explicit – If False, RDKit will perform hydrogen addition using Chem.AddHs
- Returns:
molecule – An OpenFF molecule
Examples
Create a molecule from an RDKit molecule
>>> from openff.toolkit import Molecule
>>> from rdkit import Chem
>>> rdmol = Chem.MolFromSmiles("CCO")
>>> molecule = Molecule.from_rdkit(rdmol)
- to_rdkit(aromaticity_model=DEFAULT_AROMATICITY_MODEL, toolkit_registry=GLOBAL_TOOLKIT_REGISTRY) RDMol [source]
Create an RDKit molecule
Requires the RDKit to be installed.
- Parameters:
aromaticity_model – The aromaticity model to use. Only OEAroModel_MDL is supported.
- Returns:
rdmol – An RDKit molecule
Examples
Convert a molecule to RDKit
>>> from openff.toolkit.utils import get_data_file_path
>>> sdf_filepath = get_data_file_path('molecules/ethanol.sdf')
>>> molecule = Molecule(sdf_filepath)
>>> rdmol = molecule.to_rdkit()
- classmethod from_openeye(oemol, allow_undefined_stereo: bool = False) FrozenMolecule [source]
Create a Molecule from an OpenEye molecule.
Requires the OpenEye toolkit to be installed.
- Parameters:
oemol – An OpenEye molecule
allow_undefined_stereo – If False, raises an exception if oemol contains undefined stereochemistry.
- Returns:
molecule – An OpenFF molecule
Examples
Create a Molecule from an OpenEye OEMol
>>> from openff.toolkit import Molecule
>>> from openeye import oechem
>>> oemol = oechem.OEMol()
>>> oechem.OESmilesToMol(oemol, '[H]C([H])([H])C([H])([H])O[H]')
True
>>> molecule = Molecule.from_openeye(oemol)
- to_qcschema(multiplicity=1, conformer=0, extras=None)[source]
Create a QCElemental Molecule.
The kekule structure of the molecule is saved in two places on the returned Molecule:
extras["canonical_isomeric_explicit_hydrogen_mapped_smiles"]
identifiers["canonical_isomeric_explicit_hydrogen_mapped_smiles"]
Warning
This API is experimental and subject to change.
- Parameters:
multiplicity – The multiplicity of the molecule; sets the molecular_multiplicity field for QCElemental Molecule.
conformer – The index of the conformer to use for the QCElemental Molecule geometry.
extras – A dictionary that should be included in the extras field on the QCElemental Molecule. This can be used to include extra information, such as a smiles representation.
- Returns:
qcelemental.models.Molecule – A validated QCElemental Molecule.
Examples
Create a QCElemental Molecule:
>>> import qcelemental as qcel
>>> mol = Molecule.from_smiles('CC')
>>> mol.generate_conformers(n_conformers=1)
>>> qcemol = mol.to_qcschema()
- Raises:
MissingOptionalDependencyError – If qcelemental is not installed, the qcschema can not be validated.
InvalidConformerError – No conformer found at the given index.
- classmethod from_mapped_smiles(mapped_smiles: str, toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, allow_undefined_stereo: bool = False) FM [source]
Create a Molecule from a SMILES string, ordering atoms from mappings
SMILES strings support mapping integer indices to each atom by ending a bracketed atom declaration with a colon followed by a 1-indexed integer.
This method creates a Molecule from such a SMILES string whose atoms are ordered according to the mapping. Each atom must be mapped exactly once; any duplicate, missing, or out-of-range mappings will cause the method to fail.
Warning
This API is experimental and subject to change.
- Parameters:
mapped_smiles (str) – A mapped SMILES string with explicit hydrogens.
toolkit_registry – Cheminformatics toolkit to use for SMILES-to-molecule conversion
allow_undefined_stereo – If false, raise an exception if the SMILES contains undefined stereochemistry.
- Returns:
offmol – An OpenFF molecule instance.
- Raises:
SmilesParsingError – If the given SMILES had no indexing picked up by the toolkits, or if the indexing is missing indices.
RemapIndexError – If the mapping has duplicate or out-of-range indices.
Examples
Create a mapped chlorofluoroiodomethane molecule and check the atoms are placed accordingly:
>>> molecule = Molecule.from_mapped_smiles(
...     "[Cl:2][C@:1]([F:3])([I:4])[H:5]"
... )
>>> assert molecule.atom(0).symbol == "C"
>>> assert molecule.atom(1).symbol == "Cl"
>>> assert molecule.atom(2).symbol == "F"
>>> assert molecule.atom(3).symbol == "I"
>>> assert molecule.atom(4).symbol == "H"
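The relationship between the 1-indexed map numbers and the 0-indexed atom order can also be seen without any cheminformatics toolkit. The sketch below is a toy regular-expression parser for simple bracket atoms only, not a SMILES grammar and not the toolkit's implementation; the function name is hypothetical and ValueError stands in for the toolkit's own exception types.

```python
import re


def atom_order_from_mapped_smiles(mapped_smiles):
    """Toy parser: list element symbols in map-index order (1-indexed maps, 0-indexed result)."""
    # Matches bracket atoms such as [Cl:2] or [C@:1]; anything more complex is out of scope
    found = re.findall(r"\[([A-Z][a-z]?)[^\]:]*:(\d+)\]", mapped_smiles)
    by_map = {}
    for symbol, raw_index in found:
        index = int(raw_index)
        if index in by_map:
            raise ValueError(f"duplicate map index {index}")
        by_map[index] = symbol
    if set(by_map) != set(range(1, len(found) + 1)):
        raise ValueError("map indices must cover 1..n_atoms exactly once")
    # map index 1 becomes atom 0, map index 2 becomes atom 1, and so on
    return [by_map[i] for i in range(1, len(found) + 1)]


symbols = atom_order_from_mapped_smiles("[Cl:2][C@:1]([F:3])([I:4])[H:5]")
```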
- classmethod from_qcschema(qca_object, toolkit_registry=GLOBAL_TOOLKIT_REGISTRY, allow_undefined_stereo: bool = False)[source]
Create a Molecule from a QCArchive molecule record or dataset entry based on attached cmiles information.
If this method is provided a QCElemental Molecule (or dict representation of a Molecule), it will return a single-conformer OpenFF Molecule.
If this method is provided a QCFractal dataset Entry (or dict representation of an Entry), it will return an OpenFF Molecule with at least one conformer, corresponding to the:
.molecule attribute of a SinglepointDatasetEntry (single conformer)
.initial_molecule attribute of an OptimizationDatasetEntry or GridoptimizationDatasetEntry (single conformer)
.initial_molecules attribute of a TorsiondriveDatasetEntry (one or more conformers, in the order that they appear when accessing the initial_molecules attribute on the Entry object)
If these QC molecules have their .id fields populated, the returned OpenFF Molecule will have a dict mapping QC IDs to conformer numbers (offmol.properties["initial_molecules"]).
The data source must also specify the kekule structure of the molecule. Currently the only supported format for this is in the canonical_isomeric_explicit_hydrogen_mapped_smiles field, which will be taken from the following locations, if available, in the following order of priority:
The input’s attributes attribute (set on QCFractal DatasetEntry objects, such as SinglepointDatasetEntry and TorsiondriveDatasetEntry)
The input’s identifiers attribute (set on QCSchema Molecules made after QCFractal 0.50)
The input’s extras attribute (the information was typically set on QCSchema Molecules as part of OpenFF’s QC data submission pipeline before QCFractal 0.50)
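The priority order above amounts to a first-match lookup. The sketch below shows that lookup over a plain dictionary; the record layout and function name are hypothetical stand-ins for the real QCArchive objects, and KeyError stands in for MissingCMILESError.

```python
CMILES_KEY = "canonical_isomeric_explicit_hydrogen_mapped_smiles"


def find_cmiles(record):
    """Return the mapped SMILES from the first location that provides it."""
    for field in ("attributes", "identifiers", "extras"):
        section = record.get(field) or {}
        if CMILES_KEY in section:
            return section[CMILES_KEY]
    raise KeyError(f"no {CMILES_KEY} found in attributes, identifiers, or extras")


# Hypothetical record in which two locations carry a value
record = {
    "identifiers": {CMILES_KEY: "[H:2][O:1][H:3]"},
    "extras": {CMILES_KEY: "[H:1][O:2][H:3]"},
}
cmiles = find_cmiles(record)
```

Because identifiers is checked before extras, the value stored under identifiers is the one returned here.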
A QCElemental Molecule produced from Molecule.to_qcschema can be round-tripped through this method to produce a new, valid Molecule.
- Parameters:
qca_object – A QCArchive molecule record or dataset entry, or dict representation of either.
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for SMILES-to-molecule conversion
allow_undefined_stereo – If false, raises an exception if qca_object contains undefined stereochemistry.
- Returns:
molecule – An OpenFF molecule instance.
Examples
Get Molecule from a QCArchive molecule record:
>>> try:
...     from qcportal import PortalClient
... except ImportError:
...     import pytest
...     pytest.skip("This tests sometimes fails when OpenEye is installed")
>>> client = PortalClient("https://api.qcarchive.molssi.org:443/")
>>> offmol = Molecule.from_qcschema(
...     [*client.query_molecules(molecular_formula="C16H20N3O5")][-1]
... )
>>> offmol.to_hill_formula()
'C16H20N3O5'
Get Molecule from a QCArchive optimization entry:
>>> from qcportal import PortalClient
>>> client = PortalClient("https://api.qcarchive.molssi.org:443/")
>>> optimizations = client.get_dataset(
...     dataset_type="optimization",
...     dataset_name="SMIRNOFF Coverage Set 1",
... )
>>> offmol = Molecule.from_qcschema(optimizations.get_entry('coc(o)oc-0'))
>>> offmol.to_hill_formula()
'C3H8O3'
- Raises:
InvalidQCInputError – If the input record isn’t suitable to be made into an OpenFF Molecule
MissingCMILESError – If the record does not contain the canonical_isomeric_explicit_hydrogen_mapped_smiles.
InvalidConformerError – If the conformer could not be attached.
- classmethod from_pdb_and_smiles(file_path, smiles, allow_undefined_stereo: bool = False, name: str = '') FM [source]
Create a Molecule from a pdb file and a SMILES string using RDKit.
Requires RDKit to be installed.
Warning
This API is experimental and subject to change.
The molecule is created and sanitised based on the SMILES string. A mapping is then found between this molecule and one from the PDB, based only on atomic number and connections. The SMILES molecule is then reindexed to match the PDB, the conformer is attached, and the molecule is returned.
Note that any stereochemistry in the molecule is set by the SMILES, and not the coordinates of the PDB.
- Parameters:
file_path – PDB file path
smiles – a valid smiles string for the pdb, used for stereochemistry, formal charges, and bond order
allow_undefined_stereo – If false, raises an exception if SMILES contains undefined stereochemistry.
name – An optional name for the output molecule
- Returns:
molecule – An OFFMol instance with ordering the same as used in the PDB file.
- Raises:
InvalidConformerError – If the SMILES and PDB molecules are not isomorphic.
- canonical_order_atoms(toolkit_registry=GLOBAL_TOOLKIT_REGISTRY)[source]
Produce a copy of the molecule with the atoms reordered canonically.
Each toolkit defines its own canonical ordering of atoms. The canonical order may change from toolkit version to toolkit version or between toolkits.
Warning
This API is experimental and subject to change.
- Parameters:
toolkit_registry – ToolkitRegistry or ToolkitWrapper to use for canonical atom ordering
- Returns:
molecule – A new OpenFF style molecule with atoms in the canonical order.
- remap(mapping_dict: dict[int, int], current_to_new: bool = True, partial: bool = False)[source]
Reorder the atoms in the molecule according to the given mapping dict.
The mapping dict must be a dictionary mapping atom indices to atom indices. Each atom index must be an integer in the half-open interval [0, n_atoms); i.e., it must be a valid index into the self.atoms list. All atom indices in the molecule must be mapped from and to exactly once unless partial=True is given, in which case they must be mapped no more than once. Missing (unless partial=True), out-of-range (including non-integer), or duplicate indices are not allowed in the mapping_dict and will lead to an exception.
By default, the mapping dict’s keys are the source indices and its values are destination indices, but this can be changed with the current_to_new argument.
The keys of the self.properties["atom_map"] property are updated for the new ordering. Other values of the properties dictionary are transferred unchanged.
Warning
This API is experimental and subject to change.
- Parameters:
mapping_dict – A dictionary of the mapping between indices. The mapping should be indexed starting from 0 for both the source and destination; note that SMILES atom mapping is typically 1-based.
current_to_new – If this is True, then mapping_dict is of the form {current_index: new_index}; otherwise, it is of the form {new_index: current_index}.
partial – If False (the default), an exception will be raised if any atom is lacking a destination in the atom map. Note that if this is True, atoms without entries in the mapping dict may be moved in addition to those in the dictionary. Note that partial maps must still be in-range and not include duplicates.
- Returns:
new_molecule – A copy of the molecule in the new order.
- Raises:
RemapIndexError – When an out-of-range, duplicate, or missing index is found in the mapping_dict.
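The two mapping directions can be illustrated on a plain list. The sketch below covers only the complete (partial=False) case and is not the toolkit's implementation; the function name is hypothetical and ValueError stands in for RemapIndexError.

```python
def remap_sequence(items, mapping, current_to_new=True):
    """Reorder a list with a complete index mapping, mirroring the current_to_new semantics."""
    n = len(items)
    values = list(mapping.values())
    for index in list(mapping) + values:
        if not isinstance(index, int) or not 0 <= index < n:
            raise ValueError(f"index {index!r} is out of range")
    if len(set(values)) != len(values):
        raise ValueError("duplicate index in mapping")
    if len(mapping) != n:
        raise ValueError("every index must be mapped exactly once")

    if not current_to_new:
        # the mapping was given as {new_index: current_index}; invert it
        mapping = {current: new for new, current in mapping.items()}

    reordered = [None] * n
    for current, new in mapping.items():
        reordered[new] = items[current]
    return reordered


atoms = ["C", "H", "O"]
mapping = {0: 2, 1: 0, 2: 1}
forward = remap_sequence(atoms, mapping)  # read as {current: new}
backward = remap_sequence(atoms, mapping, current_to_new=False)  # read as {new: current}
```

The same dictionary produces different orderings depending on how it is read: as {current: new}, "C" moves from position 0 to position 2; as {new: current}, position 0 is filled from current position 2.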
- to_openeye(toolkit_registry: ToolkitRegistry | ToolkitWrapper = GLOBAL_TOOLKIT_REGISTRY, aromaticity_model: str = DEFAULT_AROMATICITY_MODEL)[source]
Create an OpenEye molecule
Requires the OpenEye toolkit to be installed.
- Parameters:
aromaticity_model – The aromaticity model to use. Only OEAroModel_MDL is supported.
- Returns:
oemol – An OpenEye molecule
Examples
Create an OpenEye molecule from a Molecule
>>> molecule = Molecule.from_smiles('CC')
>>> oemol = molecule.to_openeye()
- get_bond_between(i: int | Atom, j: int | Atom) Bond [source]
Returns the bond between two atoms
- Parameters:
i – Atoms or atom indices to check
j – Atoms or atom indices to check
- Returns:
bond – The bond between i and j.
- classmethod from_bson(serialized)
Instantiate an object from a BSON serialized representation.
Specification: http://bsonspec.org/
- Parameters:
serialized – A BSON serialized representation of the object
- Returns:
instance – An instantiated object
- classmethod from_json(serialized: str)
Instantiate an object from a JSON serialized representation.
Specification: https://www.json.org/
- Parameters:
serialized – A JSON serialized representation of the object
- Returns:
instance – An instantiated object
- classmethod from_messagepack(serialized)
Instantiate an object from a MessagePack serialized representation.
Specification: https://msgpack.org/index.html
- Parameters:
serialized – A MessagePack-encoded bytes serialized representation
- Returns:
instance – Instantiated object.
- classmethod from_pickle(serialized)
Instantiate an object from a pickle serialized representation.
Warning
This is not recommended for safe, stable storage since the pickle specification may change between Python versions.
- Parameters:
serialized – A pickled representation of the object
- Returns:
instance – An instantiated object
- classmethod from_toml(serialized)
Instantiate an object from a TOML serialized representation.
Specification: https://github.com/toml-lang/toml
- Parameters:
serialized – A TOML serialized representation of the object
- Returns:
instance – An instantiated object
- classmethod from_xml(serialized)
Instantiate an object from an XML serialized representation.
Specification: https://www.w3.org/XML/
- Parameters:
serialized – An XML serialized representation
- Returns:
instance – Instantiated object.
- classmethod from_yaml(serialized)
Instantiate from a YAML serialized representation.
Specification: http://yaml.org/
- Parameters:
serialized – A YAML serialized representation of the object
- Returns:
instance – Instantiated object
- to_bson()
Return a BSON serialized representation.
Specification: http://bsonspec.org/
- Returns:
serialized – A BSON serialized representation of the object
- to_json(indent=None) str
Return a JSON serialized representation.
Specification: https://www.json.org/
- Parameters:
indent – If not None, will pretty-print with specified number of spaces for indentation
- Returns:
serialized – A JSON serialized representation of the object
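The indent parameter presumably behaves like the indent argument of the standard library's json.dumps. That is an assumption about the implementation, so the sketch below demonstrates the behavior with json directly on a hypothetical dictionary rather than on a molecule.

```python
import json

# Hypothetical stand-in for a molecule's dictionary representation
payload = {"name": "ethanol", "n_atoms": 9}

compact = json.dumps(payload)  # single line, comparable to indent=None
pretty = json.dumps(payload, indent=2)  # one key per line, two-space indentation

# Both forms decode to the same data
restored = json.loads(pretty)
```

Indentation only affects readability and size; the decoded content is identical either way.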
- to_messagepack()
Return a MessagePack representation.
Specification: https://msgpack.org/index.html
- Returns:
serialized – A MessagePack-encoded bytes serialized representation of the object
- to_pickle()
Return a pickle serialized representation.
Warning
This is not recommended for safe, stable storage since the pickle specification may change between Python versions.
- Returns:
serialized – A pickled representation of the object
- to_toml()
Return a TOML serialized representation.
Specification: https://github.com/toml-lang/toml
- Returns:
serialized – A TOML serialized representation of the object
- to_xml(indent=2)
Return an XML representation.
Specification: https://www.w3.org/XML/
- Parameters:
indent – If not None, will pretty-print with specified number of spaces for indentation
- Returns:
serialized – An XML serialized representation.
- to_yaml()
Return a YAML serialized representation.
Specification: http://yaml.org/
- Returns:
serialized – A YAML serialized representation of the object