BaseDatasetFactory
- class openff.qcsubmit.factories.BaseDatasetFactory(*, qc_specifications={'default': QCSpec(method='B3LYP-D3BJ', basis='DZVP', program='psi4', spec_name='default', spec_description='Standard OpenFF optimization quantum chemistry specification.', store_wavefunction=<WavefunctionProtocolEnum.none: 'none'>, implicit_solvent=None, maxiter=200, scf_properties=[<SCFProperties.Dipole: 'dipole'>, <SCFProperties.Quadrupole: 'quadrupole'>, <SCFProperties.WibergLowdinIndices: 'wiberg_lowdin_indices'>, <SCFProperties.MayerIndices: 'mayer_indices'>], keywords={})}, driver=SinglepointDriver.energy, priority='normal', dataset_tags=['openff'], compute_tag='openff', type='BaseDatasetFactory', workflow=[])[source]
The base factory from which all other dataset factories should inherit.
- Parameters
qc_specifications (Dict[str, openff.qcsubmit.common_structures.QCSpec]) –
driver (qcportal.singlepoint.record_models.SinglepointDriver) –
priority (str) –
compute_tag (str) –
type (Literal['BaseDatasetFactory']) –
workflow (List[Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.fragmentation.RECAPFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]]) –
- Return type
None
- __init__(**data)
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- Parameters
data (Any) –
- Return type
None
Methods
__init__(**data) – Create a new model by parsing and validating input data from keyword arguments.
add_qc_spec(method, basis, program, ...[, ...]) – Add a new QC specification to the factory which will be applied to the dataset.
add_workflow_components(*components) – Validate the given workflow components, then insert them into the workflow.
clear_qcspecs() – Clear out any current QCSpecs.
clear_workflow() – Reset the workflow to be empty.
construct([_fields_set]) – Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data.
copy(*[, include, exclude, update, deep]) – Duplicate a model, optionally choose which fields to include, exclude and change.
create_dataset(dataset_name, molecules, ...) – Process the input molecules through the given workflow, then create and populate the corresponding dataset class, which acts as a local representation of the collection and tasks to be performed in QCArchive.
create_index(molecule) – Create an index for the current molecule.
dict(*args, **kwargs) – Overwrite the dict method to handle any enums when saving to yaml/json via a dict call.
export(file_name) – Export the whole factory to file including settings and workflow.
export_settings(file_name) – Export the current model to file. This includes the workflow along with each component's settings.
export_workflow(file_name) – Export the workflow components and their settings to file so that they can be loaded later.
from_file(file_name) – Create a factory from the serialised model file.
from_orm(obj)
get_workflow_components(component_name) – Find any workflow components with this component name.
import_settings(settings[, clear_workflow]) – Import settings and workflow from a file.
import_workflow(workflow[, clear_existing]) – Instantiate the workflow from a workflow object or from an input file.
json(*[, include, exclude, by_alias, ...]) – Generate a JSON representation of the model, include and exclude arguments as per dict().
parse_file(path, *[, content_type, ...])
parse_obj(obj)
parse_raw(b, *[, content_type, encoding, ...])
provenance(toolkit_registry) – Create the provenance of openff-qcsubmit that created the molecule input data.
remove_qcspec(spec_name) – Remove a QCSpec from the dataset.
remove_workflow_component(component_name) – Find and remove any components via their type attribute.
schema([by_alias, ref_template])
schema_json(*[, by_alias, ref_template])
update_forward_refs(**localns) – Try to update ForwardRefs on fields based on this Model, globalns and localns.
validate(value)
Attributes
n_qc_specs – Return the number of QCSpecs on this dataset.
type
workflow
- classmethod from_file(file_name)[source]
Create a factory from the serialised model file.
- Parameters
file_name (str) –
- provenance(toolkit_registry)[source]
Create the provenance of openff-qcsubmit that created the molecule input data.
- Returns
A dict of the provenance information.
- Parameters
toolkit_registry (openff.toolkit.utils.toolkit_registry.ToolkitRegistry) –
- Return type
Important
We cannot check which toolkit was used to generate the CMILES data, but we know that OpenEye will always be used first when available.
- clear_workflow()[source]
Reset the workflow to be empty.
- Return type
None
- add_workflow_components(*components)[source]
Validate the given workflow components, then insert them into the workflow.
- Parameters
components (Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.fragmentation.RECAPFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]) – One or more workflow components to be validated and added to the current workflow.
- Raises
InvalidWorkflowComponentError – If an attempt is made to add an invalid workflow component to the workflow.
- Return type
None
- get_workflow_components(component_name)[source]
Find any workflow components with this component name.
- Parameters
component_name (str) – The name of the component to be gathered from the workflow.
- Returns
A list of instances of the requested component from the workflow.
- Raises
MissingWorkflowComponentError – If the component could not be found by its component name in the workflow.
- Return type
List[Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.fragmentation.RECAPFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]]
- remove_workflow_component(component_name)[source]
Find and remove any components via their type attribute.
- Parameters
component_name (str) – The name of the component to be removed from the workflow.
- Raises
MissingWorkflowComponentError – If the component could not be found by its component name in the workflow.
- Return type
None
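The add, get, and remove behaviour documented above can be pictured with a small stand-in written in plain Python. This is a minimal sketch, not the factory's implementation: `ToyComponent` and `ToyFactory` are hypothetical names, the two exception classes are local stand-ins for the library errors of the same name, and validating every component before inserting any of them is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List


class InvalidWorkflowComponentError(Exception):
    """Local stand-in for the library error of the same name."""


class MissingWorkflowComponentError(Exception):
    """Local stand-in for the library error of the same name."""


@dataclass
class ToyComponent:
    """Hypothetical component; `type` plays the role of the component name."""

    type: str


class ToyFactory:
    """Hypothetical model of the workflow list, not the real factory."""

    def __init__(self) -> None:
        self.workflow: List[ToyComponent] = []

    def add_workflow_components(self, *components: ToyComponent) -> None:
        # Assumption: check every component first, then insert them all.
        for component in components:
            if not isinstance(component, ToyComponent):
                raise InvalidWorkflowComponentError(repr(component))
        self.workflow.extend(components)

    def get_workflow_components(self, component_name: str) -> List[ToyComponent]:
        found = [c for c in self.workflow if c.type == component_name]
        if not found:
            raise MissingWorkflowComponentError(component_name)
        return found

    def remove_workflow_component(self, component_name: str) -> None:
        # Reuse the lookup so a missing component raises the same error.
        self.get_workflow_components(component_name)
        self.workflow = [c for c in self.workflow if c.type != component_name]

    def clear_workflow(self) -> None:
        self.workflow = []


factory = ToyFactory()
factory.add_workflow_components(ToyComponent("ElementFilter"), ToyComponent("RotorFilter"))
factory.remove_workflow_component("ElementFilter")
remaining = [component.type for component in factory.workflow]
```

After the removal only the `RotorFilter` stand-in is left, and a second lookup of `ElementFilter` raises `MissingWorkflowComponentError`.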
- import_workflow(workflow, clear_existing=True)[source]
Instantiate the workflow from a workflow object or from an input file.
- export_workflow(file_name)[source]
Export the workflow components and their settings to file so that they can be loaded later.
- Parameters
file_name (str) – The name of the file the workflow should be exported to.
- Raises
UnsupportedFiletypeError – If the file type is not supported.
- Return type
None
- export(file_name)[source]
Export the whole factory to file including settings and workflow.
- Parameters
file_name (str) – The name of the file the factory should be exported to.
- Return type
None
- export_settings(file_name)[source]
Export the current model to file. This includes the workflow along with each component's settings.
- Parameters
file_name (str) – The name of the file the settings and workflow should be exported to.
- Raises
UnsupportedFiletypeError – When the file type requested is not supported.
- Return type
None
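The export methods raise `UnsupportedFiletypeError` when the requested file type is not supported. A minimal sketch of that kind of suffix check follows, using only the standard library. The function below is a hypothetical stand-in that handles JSON alone; which suffixes the real factory accepts is not listed here, and the settings dictionary is illustrative.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class UnsupportedFiletypeError(Exception):
    """Local stand-in for the library error of the same name."""


def export_settings(settings: Dict[str, Any], file_name: str) -> None:
    """Write settings to file_name, choosing the writer from the file suffix."""
    suffix = Path(file_name).suffix.lower()
    if suffix == ".json":
        Path(file_name).write_text(json.dumps(settings, indent=2))
    else:
        # Assumption: unknown suffixes are rejected before anything is written.
        raise UnsupportedFiletypeError(f"Unsupported file type: {suffix!r}")


# Round trip through a temporary directory so nothing is left on disk.
settings = {"priority": "normal", "compute_tag": "openff", "workflow": []}
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "factory.json"
    export_settings(settings, str(target))
    restored = json.loads(target.read_text())
```

Deciding on the suffix before writing means an unsupported name such as `factory.txt` fails without creating a file.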
- import_settings(settings, clear_workflow=True)[source]
Import settings and workflow from a file.
- create_dataset(dataset_name, molecules, description, tagline, metadata=None, processors=None, toolkit_registry=None, verbose=True)[source]
Process the input molecules through the given workflow, then create and populate the corresponding dataset class, which acts as a local representation of the collection and tasks to be performed in QCArchive.
- Parameters
dataset_name (str) – The name that will be given to the collection on submission to an archive instance.
molecules (Union[str, List[openff.toolkit.topology.molecule.Molecule], openff.toolkit.topology.molecule.Molecule]) – The list of molecules which should be processed by the workflow and added to the dataset. This can also be a file name, which is unpacked by the openforcefield toolkit.
description (str) – A string describing the dataset. This should detail the purpose of the dataset and outline the selection method of the molecules.
tagline (str) – A short tagline description which will be displayed with the collection name in QCArchive.
metadata (Optional[openff.qcsubmit.common_structures.Metadata]) – Any metadata which should be associated with this dataset. This can be changed from the default after making the dataset.
processors (Optional[int]) – The number of processors available to the workflow. Note that None will use all available processors.
toolkit_registry (Optional[openff.toolkit.utils.toolkit_registry.ToolkitRegistry]) – The openff.toolkit.utils.ToolkitRegistry which declares the available toolkits and the order in which they should be queried for functionality. If None is passed, the default global registry will be used with all installed toolkits.
verbose (bool) – If True, a progress bar for each workflow component will be shown.
- Returns
A dataset instance populated with the molecules that have passed through the workflow.
- Return type
openff.qcsubmit.factories.T
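As a rough mental model of `create_dataset`, the molecules pass through the workflow and whatever survives populates the dataset. The sketch below is hypothetical throughout: molecules are plain SMILES strings, the two component functions are invented for illustration, the dataset is a dictionary, and running the components one after another in list order is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical component type: takes molecules and returns those that pass.
Component = Callable[[List[str]], List[str]]


def drop_bromine(molecules: List[str]) -> List[str]:
    """Toy filter: discard any SMILES string containing bromine."""
    return [m for m in molecules if "Br" not in m]


def drop_duplicates(molecules: List[str]) -> List[str]:
    """Toy filter: keep the first occurrence of each SMILES string."""
    return list(dict.fromkeys(molecules))


def create_dataset(
    dataset_name: str, molecules: List[str], workflow: List[Component]
) -> Dict[str, object]:
    # Assumption: components run one after another, in workflow order.
    for component in workflow:
        molecules = component(molecules)
    # Each surviving molecule becomes one entry keyed by its index string.
    return {"dataset_name": dataset_name, "entries": {m: m for m in molecules}}


toy_dataset = create_dataset(
    "toy-dataset",
    ["CCO", "CCBr", "CCO", "c1ccccc1"],
    [drop_bromine, drop_duplicates],
)
```

Here the brominated molecule and the repeated ethanol entry are dropped, leaving two entries.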
- create_index(molecule)[source]
Create an index for the current molecule.
- Parameters
molecule (openff.toolkit.topology.molecule.Molecule) – The molecule for which the dataset index will be generated.
- Returns
The molecule name, or the canonical isomeric SMILES for the molecule if the name is not assigned or is blank.
- Return type
Important
Each dataset can have a different indexing system depending on the data. In this basic dataset each conformer of a molecule is expanded into its own separately indexed entry. This is handled by the dataset, however, so we just generate a general index for the molecule before adding it to the dataset.
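The fallback rule for the index (use the name, otherwise the canonical isomeric SMILES) can be sketched on plain strings. The helper below is a hypothetical stand-in: the real method takes a Molecule, and treating a whitespace-only name as blank is an assumption of the sketch.

```python
from typing import Optional


def create_index(name: Optional[str], canonical_isomeric_smiles: str) -> str:
    """Return the name, or the SMILES when the name is missing or blank."""
    # Assumption: None, "" and whitespace-only names all count as blank.
    if name is None or not name.strip():
        return canonical_isomeric_smiles
    return name


named_index = create_index("ethanol", "CCO")
unnamed_index = create_index("", "CCO")
```

With these inputs the named molecule is indexed as `ethanol` and the unnamed one falls back to `CCO`.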
- add_qc_spec(method, basis, program, spec_name, spec_description, store_wavefunction='none', overwrite=False, implicit_solvent=None, maxiter=200, scf_properties=None, keywords=None)
Add a new QC specification to the factory which will be applied to the dataset.
- Parameters
method (str) – The name of the method to use, e.g. B3LYP-D3BJ.
basis (Optional[str]) – The name of the basis to use. This can also be None.
program (str) – The name of the program to execute the computation.
spec_name (str) – The name the spec should be stored under.
spec_description (str) – The description of the spec.
store_wavefunction (str) – Which parts of the wavefunction should be saved.
overwrite (bool) – If there is already a spec under this name, overwrite it.
implicit_solvent (Optional[Union[openff.qcsubmit.common_structures.PCMSettings, openff.qcsubmit.common_structures.DDXSettings]]) – The implicit solvent settings if it is to be used.
maxiter (pydantic.v1.types.PositiveInt) – The maximum number of SCF iterations that should be done.
scf_properties (Optional[List[openff.qcsubmit.common_structures.SCFProperties]]) – The list of SCF properties that should be extracted from the calculation.
keywords (Optional[Dict[str, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool, List[pydantic.v1.types.StrictFloat]]]]) – Program specific computational keywords that should be passed to the program
- Return type
None
- dict(*args, **kwargs)
Overwrite the dict method to handle any enums when saving to yaml/json via a dict call.
- property n_qc_specs: int
Return the number of QCSpecs on this dataset.
- remove_qcspec(spec_name)
Remove a QCSpec from the dataset.
- Parameters
spec_name (str) – The name of the spec that should be removed.
- Return type
None
Note
The QCSpec settings are not mutable and so they must be removed and a new one added to ensure they are fully validated.
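Because a stored spec cannot be edited in place, changing one means removing it and adding a replacement. The sketch below models that with a frozen dataclass. `ToySpec` and `ToySpecStore` are hypothetical stand-ins, and raising `ValueError` when a name is reused without `overwrite=True` is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToySpec:
    """Hypothetical immutable spec holding a subset of the real fields."""

    method: str
    basis: Optional[str]
    program: str
    spec_name: str
    spec_description: str


class ToySpecStore:
    """Hypothetical holder of specs keyed by spec_name."""

    def __init__(self) -> None:
        self.qc_specifications: Dict[str, ToySpec] = {}

    @property
    def n_qc_specs(self) -> int:
        return len(self.qc_specifications)

    def add_qc_spec(self, method, basis, program, spec_name,
                    spec_description, overwrite=False) -> None:
        # Assumption: reusing a name without overwrite=True is an error.
        if spec_name in self.qc_specifications and not overwrite:
            raise ValueError(f"A spec named {spec_name!r} already exists")
        # Building a fresh object means every field is checked again.
        self.qc_specifications[spec_name] = ToySpec(
            method, basis, program, spec_name, spec_description
        )

    def remove_qcspec(self, spec_name: str) -> None:
        self.qc_specifications.pop(spec_name, None)


store = ToySpecStore()
store.add_qc_spec("B3LYP-D3BJ", "DZVP", "psi4", "default", "Toy default spec")

# Changing the basis: remove the old spec, then add the new one.
store.remove_qcspec("default")
store.add_qc_spec("B3LYP-D3BJ", "def2-TZVP", "psi4", "default", "Toy default spec")
new_basis = store.qc_specifications["default"].basis
```

Assigning to a field of the stored spec directly fails, which is why the remove-then-add route is needed.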