TorsiondriveDataset
- class openff.qcsubmit.datasets.TorsiondriveDataset(*, qc_specifications={'default': QCSpec(method='B3LYP-D3BJ', basis='DZVP', program='psi4', spec_name='default', spec_description='Standard OpenFF optimization quantum chemistry specification.', store_wavefunction=<WavefunctionProtocolEnum.none: 'none'>, implicit_solvent=None, maxiter=200, scf_properties=[<SCFProperties.Dipole: 'dipole'>, <SCFProperties.Quadrupole: 'quadrupole'>, <SCFProperties.WibergLowdinIndices: 'wiberg_lowdin_indices'>, <SCFProperties.MayerIndices: 'mayer_indices'>], keywords={})}, driver=SinglepointDriver.deferred, priority='normal', dataset_tags=['openff'], compute_tag='openff', dataset_name, dataset_tagline, type='TorsionDriveDataset', description, metadata=Metadata(submitter='docs', creation_date=datetime.date(2024, 11, 7), collection_type=None, dataset_name=None, short_description=None, long_description_url=None, long_description=None, elements=set()), provenance={}, dataset={}, filtered_molecules={}, optimization_procedure=GeometricProcedure(program='geometric', coordsys='dlc', enforce=0.1, epsilon=0.0, reset=True, qccnv=True, molcnv=False, check=0, trust=0.1, tmax=0.3, maxiter=300, convergence_set='GAU', constraints={}), protocols=OptimizationProtocols(trajectory=<TrajectoryProtocolEnum.all: 'all'>), grid_spacing=[15], energy_upper_limit=0.05, dihedral_ranges=None, energy_decrease_thresh=None)[source]
A torsiondrive dataset class which handles submission of settings differently from the basic dataset, and creates torsiondrive datasets in the public or local QCArchive instance.
Important
The dihedral_ranges for the whole dataset can be defined here. If different scan ranges are required on a case-by-case basis, they can instead be defined separately for each torsion in a molecule, in the keywords of the torsiondrive entry.
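As a rough illustration of how a dataset-wide range and a per-entry override might combine, the sketch below resolves the scan range for a single torsion and lists the grid points implied by a given grid_spacing. The helpers resolve_range and grid_points are hypothetical and not part of openff.qcsubmit. The sketch also assumes that a full scan covers -165 to 180 degrees at a spacing of 15, and that a per-entry override lives under a dihedral_ranges key in the entry keywords.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]


def resolve_range(dataset_range: Optional[Range], entry_keywords: Dict) -> Optional[Range]:
    """Prefer a per-entry range when one is given, else use the dataset-wide one."""
    return entry_keywords.get("dihedral_ranges", dataset_range)


def grid_points(grid_spacing: int, scan_range: Optional[Range] = None) -> List[int]:
    """List the dihedral angles (in degrees) visited for one torsion."""
    if scan_range is None:
        # Assumed full-scan convention: (-180, 180] in steps of grid_spacing.
        low, high = -180 + grid_spacing, 180
    else:
        low, high = scan_range
    return list(range(low, high + 1, grid_spacing))


# Dataset-wide default (no range set) versus a per-entry override.
full_scan = grid_points(15)
limited_scan = grid_points(15, resolve_range(None, {"dihedral_ranges": (-60, 60)}))
print(len(full_scan), limited_scan)
```

A restricted range reduces the number of constrained optimizations per torsion, which is the main reason to set it per entry rather than for the whole dataset.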
- Parameters
qc_specifications (Dict[str, openff.qcsubmit.common_structures.QCSpec]) –
driver (qcportal.singlepoint.record_models.SinglepointDriver) –
priority (str) –
compute_tag (str) –
dataset_name (str) –
dataset_tagline (pydantic.v1.types.ConstrainedStrValue) –
type (Literal['TorsionDriveDataset']) –
description (pydantic.v1.types.ConstrainedStrValue) –
metadata (openff.qcsubmit.common_structures.Metadata) –
dataset (Dict[str, openff.qcsubmit.datasets.entries.TorsionDriveEntry]) –
filtered_molecules (Dict[str, openff.qcsubmit.datasets.entries.FilterEntry]) –
optimization_procedure (openff.qcsubmit.procedures.GeometricProcedure) –
protocols (qcelemental.models.procedures.OptimizationProtocols) –
energy_upper_limit (float) –
- Return type
None
- __init__(**kwargs)
Make sure the metadata has been assigned correctly; if not, autofill some information.
Methods
__init__(**kwargs) – Make sure the metadata has been assigned correctly; if not, autofill some information.
add_molecule(index, molecule[, extras, keywords]) – Add a molecule to the dataset under the given index with the passed cmiles.
add_qc_spec(method, basis, program, ...[, ...]) – Add a new qcspecification to the factory which will be applied to the dataset.
Clear out any current QCSpecs.
construct([_fields_set]) – Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data.
copy(*[, include, exclude, update, deep]) – Duplicate a model, optionally choose which fields to include, exclude and change.
coverage_report(force_field[, verbose]) – Returns a summary of how many molecules within this dataset would be assigned each of the parameters in a force field.
dict(*args, **kwargs) – Overwrite the dict method to handle any enums when saving to yaml/json via a dict call.
export_dataset(file_name[, compression]) – Export the dataset to file so that it can be used to make another dataset quickly.
filter_molecules(molecules, component, ...) – Filter a molecule or list of molecules by the component they failed.
from_orm(obj)
get_molecule_entry(molecule) – Search through the dataset for a molecule and return the dataset index of any exact molecule matches.
json(*[, include, exclude, by_alias, ...]) – Generate a JSON representation of the model, include and exclude arguments as per dict().
molecules_to_file(file_name, file_type) – Write the molecules to the requested file type.
parse_file(file_name) – Create a Dataset object from a compressed json file.
parse_obj(obj)
parse_raw(b, *[, content_type, encoding, ...])
remove_qcspec(spec_name) – Remove a QCSpec from the dataset.
schema([by_alias, ref_template])
schema_json(*[, by_alias, ref_template])
submit(client[, ignore_errors, verbose]) – Submit the dataset to a QCFractal server.
to_tasks() – Build a list of QCEngine procedure tasks which correspond to this dataset.
update_forward_refs(**localns) – Try to update ForwardRefs on fields based on this Model, globalns and localns.
validate(value)
visualize(file_name[, columns, toolkit]) – Create a pdf file of the molecules with any torsions highlighted using either openeye or rdkit.
Attributes
components – Gather the details of the components that were run during the creation of this dataset.
filtered – A generator which yields an OpenFF molecule representation for each molecule filtered while creating this dataset.
molecules – A generator that creates an openforcefield.topology.Molecule one by one from the dataset.
n_components – Return the number of components that were run while generating the dataset.
n_filtered – Calculate the total number of molecules filtered by the components used in a workflow to create this dataset.
n_molecules – Calculate the number of unique molecules to be submitted.
n_qc_specs – Return the number of QCSpecs on this dataset.
n_records – Calculate the number of records that will be submitted.
dataset
type
optimization_procedure
grid_spacing
energy_upper_limit
dihedral_ranges
energy_decrease_thresh
- property n_records: int
Calculate the number of records that will be submitted.
- to_tasks()[source]
Build a list of QCEngine procedure tasks which correspond to this dataset.
- add_molecule(index, molecule, extras=None, keywords=None, **kwargs)
Add a molecule to the dataset under the given index with the passed cmiles.
- Parameters
index (str) – The index that should be associated with the molecule in QCArchive.
molecule (Optional[openff.toolkit.topology.molecule.Molecule]) – The instance of the molecule which contains its conformer information.
extras (Optional[Dict[str, Any]]) – The extras that should be supplied into the qcportal.models.Molecule.
keywords (Optional[Dict[str, Any]]) – Any extra keywords which are required for the calculation.
- Return type
None
Note
Each molecule in this basic dataset should have all of its conformers expanded out into separate entries. Thus here we take the general molecule index and increment it.
- add_qc_spec(method, basis, program, spec_name, spec_description, store_wavefunction='none', overwrite=False, implicit_solvent=None, maxiter=200, scf_properties=None, keywords=None)
Add a new qcspecification to the factory which will be applied to the dataset.
- Parameters
method (str) – The name of the method to use, e.g. B3LYP-D3BJ.
basis (Optional[str]) – The name of the basis to use; can also be None.
program (str) – The name of the program to execute the computation.
spec_name (str) – The name the spec should be stored under.
spec_description (str) – The description of the spec.
store_wavefunction (str) – Which parts of the wavefunction should be saved.
overwrite (bool) – If a spec already exists under this name, overwrite it.
implicit_solvent (Optional[Union[openff.qcsubmit.common_structures.PCMSettings, openff.qcsubmit.common_structures.DDXSettings]]) – The implicit solvent settings if it is to be used.
maxiter (pydantic.v1.types.PositiveInt) – The maximum number of SCF iterations that should be done.
scf_properties (Optional[List[openff.qcsubmit.common_structures.SCFProperties]]) – The list of SCF properties that should be extracted from the calculation.
keywords (Optional[Dict[str, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool, List[pydantic.v1.types.StrictFloat]]]]) – Program specific computational keywords that should be passed to the program
- Return type
None
- property components: List[Dict[str, Union[str, Dict[str, str]]]]
Gather the details of the components that were run during the creation of this dataset.
- coverage_report(force_field, verbose=False)
Returns a summary of how many molecules within this dataset would be assigned each of the parameters in a force field.
Notes
Parameters which would not be assigned to any molecules in the dataset will not be included in the returned summary.
- Parameters
force_field (ForceField) – The force field containing the parameters to summarize.
verbose (bool) – If true a progress bar will be shown on screen.
- Returns
A dictionary of the form coverage[handler_name][parameter_smirks] = count, which stores the number of molecules within this dataset that would be assigned to each parameter.
- Return type
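To make the nested coverage[handler_name][parameter_smirks] = count layout concrete, here is a minimal sketch that walks such a dictionary. The coverage values are invented for illustration, and the helpers parameters_exercised and sparsely_covered are hypothetical and not part of openff.qcsubmit.

```python
from typing import Dict, List, Tuple

Coverage = Dict[str, Dict[str, int]]

# Hypothetical report: handler name -> SMIRKS pattern -> number of molecules.
coverage: Coverage = {
    "Bonds": {"[#6X4:1]-[#6X4:2]": 12, "[#6X4:1]-[#1:2]": 12},
    "ProperTorsions": {"[*:1]-[#6X4:2]-[#6X4:3]-[*:4]": 9},
}


def parameters_exercised(report: Coverage) -> Dict[str, int]:
    """Count how many distinct parameters each handler assigned."""
    return {handler: len(params) for handler, params in report.items()}


def sparsely_covered(report: Coverage, min_count: int) -> List[Tuple[str, str, int]]:
    """List (handler, smirks, count) for parameters seen in fewer than min_count molecules."""
    return [
        (handler, smirks, count)
        for handler, params in report.items()
        for smirks, count in params.items()
        if count < min_count
    ]


print(parameters_exercised(coverage))
print(sparsely_covered(coverage, min_count=10))
```

Because unassigned parameters are omitted from the summary, a parameter that is missing from the dictionary should be read as a count of zero.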
- dict(*args, **kwargs)
Overwrite the dict method to handle any enums when saving to yaml/json via a dict call.
- export_dataset(file_name, compression=None)
Export the dataset to file so that it can be used to make another dataset quickly.
- Parameters
- Raises
UnsupportedFiletypeError – If the requested file type is not supported.
- Return type
None
Note
The supported file types are:
json
Additionally, if compression is not explicitly supplied, the file will be compressed automatically according to its final extension:
json.xz
json.gz
json.bz2
Check serializers.py for more details. Right now bz2 seems to produce the smallest files.
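The extension-based behaviour described above can be pictured with a standard-library sketch. This is not the code in serializers.py; the helpers infer_compression, write_json and read_json are hypothetical names, and the sketch assumes the compression argument takes the same strings as the extensions (xz, gz, bz2).

```python
import bz2
import gzip
import json
import lzma
import os
import tempfile
from typing import Any, Optional

# Map each supported extension to the matching standard-library opener.
_OPENERS = {"xz": lzma.open, "gz": gzip.open, "bz2": bz2.open}


def infer_compression(file_name: str, compression: Optional[str] = None) -> Optional[str]:
    """Use the explicit compression if given, else look at the final extension."""
    if compression is not None:
        if compression not in _OPENERS:
            raise ValueError(f"unsupported compression: {compression}")
        return compression
    extension = file_name.rsplit(".", 1)[-1].lower()
    return extension if extension in _OPENERS else None


def write_json(file_name: str, data: Any, compression: Optional[str] = None) -> None:
    opener = _OPENERS.get(infer_compression(file_name, compression), open)
    with opener(file_name, "wt") as handle:
        json.dump(data, handle)


def read_json(file_name: str) -> Any:
    opener = _OPENERS.get(infer_compression(file_name), open)
    with opener(file_name, "rt") as handle:
        return json.load(handle)


# Round-trip a small payload through a bz2-compressed file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.json.bz2")
    write_json(path, {"dataset_name": "example"})
    restored = read_json(path)
```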
- filter_molecules(molecules, component, component_settings, component_provenance)
Filter a molecule or list of molecules by the component they failed.
- Parameters
molecules (Union[openff.toolkit.topology.molecule.Molecule, List[openff.toolkit.topology.molecule.Molecule]]) – A molecule or list of molecules to be filtered.
component_settings (Dict[str, Any]) – The dictionary representation of the component that filtered this set of molecules.
component (str) – The name of the component.
component_provenance (Dict[str, str]) – The dictionary representation of the component provenance.
- Return type
None
- property filtered: openff.toolkit.topology.molecule.Molecule
A generator which yields an OpenFF molecule representation for each molecule filtered while creating this dataset.
Note
Modifying the molecule will have no effect on the data stored.
- get_molecule_entry(molecule)
Search through the dataset for a molecule and return the dataset index of any exact molecule matches.
- Parameters
molecule (Union[openff.toolkit.topology.molecule.Molecule, str]) – The smiles string for the molecule or an openforcefield.topology.Molecule that is to be searched for.
- Returns
A list of dataset indices which contain the target molecule.
- Return type
- property molecules: Generator[openff.toolkit.topology.molecule.Molecule, None, None]
A generator that creates an openforcefield.topology.Molecule one by one from the dataset.
Note
Editing the molecule will not affect the data stored in the dataset, as it is immutable.
- molecules_to_file(file_name, file_type)
Write the molecules to the requested file type.
- Parameters
- Return type
None
Important
The supported file types are:
SMI
INCHI
INCHIKEY
- property n_components: int
Return the number of components that were run while generating the dataset.
- property n_filtered: int
Calculate the total number of molecules filtered by the components used in a workflow to create this dataset.
- property n_molecules: int
Calculate the number of unique molecules to be submitted.
Notes
This method has been improved for better performance on large datasets and has been tested on an optimization dataset of over 10500 molecules.
This function does not calculate the total number of entries in the dataset; see n_records.
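The difference between unique molecules and dataset entries can be pictured with a toy example. It assumes, purely for illustration, that each entry can be reduced to a canonical identifier string for its parent molecule. The entries mapping and both helper functions are hypothetical and do not reflect how the library stores or compares molecules.

```python
from typing import Dict

# Hypothetical entries: dataset index -> canonical SMILES of the parent molecule.
# Pentane appears twice because two of its torsions are scanned separately.
entries: Dict[str, str] = {
    "butane-torsion-0": "CCCC",
    "pentane-torsion-0": "CCCCC",
    "pentane-torsion-1": "CCCCC",
}


def count_unique_molecules(dataset: Dict[str, str]) -> int:
    """Count distinct molecules, however many entries each one has."""
    return len(set(dataset.values()))


def count_entries(dataset: Dict[str, str]) -> int:
    """Count every entry, including repeats of the same molecule."""
    return len(dataset)


print(count_unique_molecules(entries), count_entries(entries))
```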
- property n_qc_specs: int
Return the number of QCSpecs on this dataset.
- classmethod parse_file(file_name)
Create a Dataset object from a compressed json file.
- Parameters
file_name (str) – The name of the file the dataset should be created from.
- remove_qcspec(spec_name)
Remove a QCSpec from the dataset.
- Parameters
spec_name (str) – The name of the spec that should be removed.
- Return type
None
Note
The QCSpec settings are not mutable and so they must be removed and a new one added to ensure they are fully validated.
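The remove-then-add pattern can be sketched with a frozen dataclass standing in for QCSpec. SpecSketch and its validation rule are hypothetical; the real class is a pydantic model with its own checks. The point is only that an immutable object is replaced as a whole, so validation runs again on the new settings.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Dict


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for an immutable QC specification."""

    method: str
    basis: str
    program: str

    def __post_init__(self) -> None:
        # Validation runs once, when the object is built.
        if not self.method:
            raise ValueError("a method name is required")


specs: Dict[str, SpecSketch] = {"default": SpecSketch("B3LYP-D3BJ", "DZVP", "psi4")}

# Editing a field in place is rejected because the object is frozen.
try:
    specs["default"].basis = "def2-SVP"
    mutated = True
except FrozenInstanceError:
    mutated = False

# Instead, remove the old spec and add a freshly validated replacement.
del specs["default"]
specs["default"] = SpecSketch("B3LYP-D3BJ", "def2-SVP", "psi4")
```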
- submit(client, ignore_errors=False, verbose=False)
Submit the dataset to a QCFractal server.
- Parameters
client (qcportal.client.PortalClient) – Instance of a portal client
ignore_errors (bool) – If the user wants to submit the compute regardless of errors, set this to True. Mainly to override basis coverage.
verbose (bool) – Whether progress bars and submission statistics should be printed (True) or not (False).
- Returns
A dictionary of the compute response from the client for each specification submitted.
- Raises
MissingBasisCoverageError – If the chosen basis set does not cover some of the elements in the dataset.
- Return type
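The interplay between the basis coverage check and ignore_errors can be sketched as follows. This is not the library's implementation: MissingCoverageSketch is a local stand-in for the documented MissingBasisCoverageError, check_basis_coverage is a hypothetical helper, and the element sets are invented.

```python
from typing import Set


class MissingCoverageSketch(Exception):
    """Local stand-in for the documented MissingBasisCoverageError."""


def check_basis_coverage(
    dataset_elements: Set[str], covered: Set[str], ignore_errors: bool = False
) -> Set[str]:
    """Return the uncovered elements, raising unless ignore_errors is set."""
    missing = dataset_elements - covered
    if missing and not ignore_errors:
        raise MissingCoverageSketch(f"basis does not cover: {sorted(missing)}")
    return missing


elements = {"C", "H", "I"}
covered = {"C", "H", "N", "O"}  # hypothetical coverage of the chosen basis

# By default a gap in coverage stops the submission.
try:
    check_basis_coverage(elements, covered)
    raised = False
except MissingCoverageSketch:
    raised = True

# With ignore_errors=True the gap is reported but does not raise.
skipped = check_basis_coverage(elements, covered, ignore_errors=True)
```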