BaseDatasetFactory

class openff.qcsubmit.factories.BaseDatasetFactory(*, qc_specifications={'default': QCSpec(method='B3LYP-D3BJ', basis='DZVP', program='psi4', spec_name='default', spec_description='Standard OpenFF optimization quantum chemistry specification.', store_wavefunction=<WavefunctionProtocolEnum.none: 'none'>, implicit_solvent=None, maxiter=200, scf_properties=[<SCFProperties.Dipole: 'dipole'>, <SCFProperties.Quadrupole: 'quadrupole'>, <SCFProperties.WibergLowdinIndices: 'wiberg_lowdin_indices'>, <SCFProperties.MayerIndices: 'mayer_indices'>], keywords={})}, driver=SinglepointDriver.energy, priority='normal', dataset_tags=['openff'], compute_tag='openff', type='BaseDatasetFactory', workflow=[])

The base factory from which all other dataset factories should inherit.

Return type

None

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Parameters

data (Any)

Return type

None

Methods

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

add_qc_spec(method, basis, program, ...[, ...])

Add a new QC specification to the factory, which will be applied to the dataset.

add_workflow_components(*components)

Take the workflow components, validate them, and then insert them into the workflow.

clear_qcspecs()

Clear out any current QCSpecs.

clear_workflow()

Reset the workflow to be empty.

construct([_fields_set])

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data.

copy(*[, include, exclude, update, deep])

Duplicate a model, optionally choose which fields to include, exclude and change.

create_dataset(dataset_name, molecules, ...)

Process the input molecules through the given workflow, then create and populate the corresponding dataset class, which acts as a local representation of the collection and tasks to be performed in QCArchive.

create_index(molecule)

Create an index for the current molecule.

dict(*args, **kwargs)

Override the dict method to handle any enums when saving to YAML/JSON via a dict call.

export(file_name)

Export the whole factory to file including settings and workflow.

export_settings(file_name)

Export the current model to file, including the workflow along with each component's settings.

export_workflow(file_name)

Export the workflow components and their settings to file so that they can be loaded later.

from_file(file_name)

Create a factory from the serialised model file.

from_orm(obj)

get_workflow_components(component_name)

Find any workflow components with this component name.

import_settings(settings[, clear_workflow])

Import settings and workflow from a file.

import_workflow(workflow[, clear_existing])

Instantiate the workflow from a workflow object or from an input file.

json(*[, include, exclude, by_alias, ...])

Generate a JSON representation of the model, include and exclude arguments as per dict().

parse_file(path, *[, content_type, ...])

parse_obj(obj)

parse_raw(b, *[, content_type, encoding, ...])

provenance(toolkit_registry)

Create the provenance of openff-qcsubmit that created the molecule input data.

remove_qcspec(spec_name)

Remove a QCSpec from the dataset.

remove_workflow_component(component_name)

Find and remove any components via their type attribute.

schema([by_alias, ref_template])

schema_json(*[, by_alias, ref_template])

update_forward_refs(**localns)

Try to update ForwardRefs on fields based on this Model, globalns and localns.

validate(value)

Attributes

n_qc_specs

Return the number of QCSpecs on this dataset.

type

workflow

classmethod from_file(file_name)

Create a factory from the serialised model file.

Parameters

file_name (str)
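As a minimal stand-alone illustration of the round trip that from_file completes, the sketch below serialises a toy settings object to JSON and rebuilds it with a classmethod. `ToyFactory` and its fields are hypothetical; the real factory is a pydantic model whose on-disk format is not described in this section, so treat this only as a picture of the pattern.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class ToyFactory:
    # Hypothetical stand-in for a factory's serialisable settings.
    compute_tag: str = "openff"
    priority: str = "normal"
    dataset_tags: List[str] = field(default_factory=lambda: ["openff"])

    def export(self, file_name: str) -> None:
        # Serialise every field to JSON.
        Path(file_name).write_text(json.dumps(asdict(self)))

    @classmethod
    def from_file(cls, file_name: str) -> "ToyFactory":
        # Rebuild an instance from the serialised fields.
        return cls(**json.loads(Path(file_name).read_text()))


with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "factory.json")
    original = ToyFactory(compute_tag="my-tag")
    original.export(path)
    restored = ToyFactory.from_file(path)
```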

provenance(toolkit_registry)

Create the provenance of openff-qcsubmit that created the molecule input data.

Returns

A dict of the provenance information.

Parameters

toolkit_registry (openff.toolkit.utils.toolkit_registry.ToolkitRegistry)

Return type

Dict[str, str]

Important

We cannot check which toolkit was used to generate the CMILES data, but we know that OpenEye will always be used first when available.

clear_workflow()

Reset the workflow to be empty.

Return type

None

add_workflow_components(*components)

Take the workflow components, validate them, and then insert them into the workflow.

Parameters

components (Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]) – One or more workflow components to be validated and added to the current workflow.

Raises

InvalidWorkflowComponentError – If an invalid workflow component is attempted to be added to the workflow.

Return type

None
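A minimal sketch of the validate-then-insert behaviour described above, using hypothetical toy classes rather than the real qcsubmit components. The exception class only mimics the documented name, and validating every component before inserting any is a design choice of this sketch, not documented library behaviour.

```python
from typing import List, Tuple, Type


class InvalidWorkflowComponentError(Exception):
    """Hypothetical stand-in for the library exception of the same name."""


class ElementFilter:
    """Toy component (not the real qcsubmit class)."""


class RotorFilter:
    """Toy component (not the real qcsubmit class)."""


# Accepted component classes, analogous to the Union in the signature.
VALID_COMPONENTS: Tuple[Type, ...] = (ElementFilter, RotorFilter)


class ToyFactory:
    def __init__(self) -> None:
        self.workflow: List[object] = []

    def add_workflow_components(self, *components: object) -> None:
        # Validate everything first so one bad component leaves the workflow untouched.
        for component in components:
            if not isinstance(component, VALID_COMPONENTS):
                raise InvalidWorkflowComponentError(
                    f"{type(component).__name__} is not a valid workflow component"
                )
        # Append in the order given; insertion order is preserved.
        self.workflow.extend(components)


factory = ToyFactory()
factory.add_workflow_components(ElementFilter(), RotorFilter())
```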

get_workflow_components(component_name)

Find any workflow components with this component name.

Parameters

component_name (str) – The name of the component to be gathered from the workflow.

Returns

A list of instances of the requested component from the workflow.

Raises

MissingWorkflowComponentError – If the component could not be found by its component name in the workflow.

Return type

List[Union[openff.qcsubmit.workflow_components.conformer_generation.StandardConformerGenerator, openff.qcsubmit.workflow_components.filters.RMSDCutoffConformerFilter, openff.qcsubmit.workflow_components.filters.CoverageFilter, openff.qcsubmit.workflow_components.filters.ElementFilter, openff.qcsubmit.workflow_components.filters.MolecularWeightFilter, openff.qcsubmit.workflow_components.filters.RotorFilter, openff.qcsubmit.workflow_components.filters.SmartsFilter, openff.qcsubmit.workflow_components.state_enumeration.EnumerateTautomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateProtomers, openff.qcsubmit.workflow_components.state_enumeration.EnumerateStereoisomers, openff.qcsubmit.workflow_components.fragmentation.WBOFragmenter, openff.qcsubmit.workflow_components.fragmentation.PfizerFragmenter, openff.qcsubmit.workflow_components.filters.ChargeFilter, openff.qcsubmit.workflow_components.filters.ScanFilter, openff.qcsubmit.workflow_components.state_enumeration.ScanEnumerator]]
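The lookup can be pictured as a filter over the workflow list that returns every match, since the same component type may appear more than once. This is a stand-alone sketch with a hypothetical `ToyComponent`; matching on a `type` attribute follows the remove_workflow_component docstring, and the exception class only mimics the documented name.

```python
from dataclasses import dataclass
from typing import List


class MissingWorkflowComponentError(Exception):
    """Hypothetical stand-in for the library exception of the same name."""


@dataclass
class ToyComponent:
    # `type` plays the role of the component name.
    type: str
    setting: int = 0


def get_workflow_components(
    workflow: List[ToyComponent], component_name: str
) -> List[ToyComponent]:
    # The same component type may appear several times, so return every match.
    found = [c for c in workflow if c.type == component_name]
    if not found:
        raise MissingWorkflowComponentError(
            f"no component named {component_name!r} in the workflow"
        )
    return found


workflow = [
    ToyComponent("RotorFilter", 3),
    ToyComponent("ElementFilter"),
    ToyComponent("RotorFilter", 5),
]
rotor_filters = get_workflow_components(workflow, "RotorFilter")
```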

remove_workflow_component(component_name)

Find and remove any components via their type attribute.

Parameters

component_name (str) – The name of the component to be removed from the workflow.

Raises

MissingWorkflowComponentError – If the component could not be found by its component name in the workflow.

Return type

None
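A stand-alone sketch of removal by type attribute, assuming the workflow is a plain list (as the workflow=[] default in the class signature suggests). `SimpleNamespace` objects stand in for real components, and the exception class only mimics the documented name.

```python
from types import SimpleNamespace
from typing import List


class MissingWorkflowComponentError(Exception):
    """Hypothetical stand-in for the library exception of the same name."""


def remove_workflow_component(workflow: List[SimpleNamespace], component_name: str) -> None:
    # Keep every component whose type attribute does not match.
    remaining = [c for c in workflow if c.type != component_name]
    if len(remaining) == len(workflow):
        raise MissingWorkflowComponentError(
            f"no component named {component_name!r} in the workflow"
        )
    # Update the list in place so callers holding a reference see the change.
    workflow[:] = remaining


workflow = [
    SimpleNamespace(type="RotorFilter"),
    SimpleNamespace(type="ElementFilter"),
    SimpleNamespace(type="RotorFilter"),
]
remove_workflow_component(workflow, "RotorFilter")
```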

import_workflow(workflow, clear_existing=True)

Instantiate the workflow from a workflow object or from an input file.

Parameters
  • workflow (Union[str, Dict]) – The name of the file the workflow should be created from or a workflow dictionary.

  • clear_existing (bool) – If True, the current workflow is deleted and replaced; if False, it is extended.

Return type

None
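The two clear_existing modes can be sketched with plain dictionaries. This is a hypothetical stand-alone function, not the library method: it returns a new list instead of mutating a factory, accepts a JSON file path or a dict, and reads components from a `components` key invented for the sketch.

```python
import json
from typing import Dict, List, Union


def import_workflow(
    current: List[Dict], workflow: Union[str, Dict], clear_existing: bool = True
) -> List[Dict]:
    # A str is treated as a path to a JSON file; a dict is used directly.
    if isinstance(workflow, str):
        with open(workflow) as handle:
            workflow = json.load(handle)
    # "components" is a hypothetical key chosen for this sketch.
    incoming = list(workflow["components"])
    # clear_existing=True replaces the workflow; False extends it.
    return incoming if clear_existing else current + incoming


existing = [{"type": "ElementFilter"}]
new = {"components": [{"type": "RotorFilter"}]}
replaced = import_workflow(existing, new)
extended = import_workflow(existing, new, clear_existing=False)
```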

export_workflow(file_name)

Export the workflow components and their settings to file so that they can be loaded later.

Parameters

file_name (str) – The name of the file the workflow should be exported to.

Raises

UnsupportedFiletypeError – If the file type is not supported.

Return type

None
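A minimal sketch of extension-based dispatch that raises when the file type is unknown. Only JSON is wired up here to keep it standard-library only; the `SERIALISERS` table and the exception class are hypothetical stand-ins, and the full set of formats the library supports is not listed in this section.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


class UnsupportedFiletypeError(Exception):
    """Hypothetical stand-in for the library exception of the same name."""


# Map file extensions to serialisers; only JSON is wired up in this sketch.
SERIALISERS: Dict[str, Callable[[Any], str]] = {
    ".json": lambda data: json.dumps(data, indent=2),
}


def export_workflow(workflow: Dict[str, Any], file_name: str) -> None:
    # Pick the serialiser from the file extension; fail before writing anything.
    suffix = Path(file_name).suffix.lower()
    serialiser = SERIALISERS.get(suffix)
    if serialiser is None:
        raise UnsupportedFiletypeError(f"unsupported file type: {suffix!r}")
    Path(file_name).write_text(serialiser(workflow))


settings = {"components": [{"type": "RotorFilter"}]}
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "workflow.json"
    export_workflow(settings, str(json_path))
    round_tripped = json.loads(json_path.read_text())
```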

export(file_name)

Export the whole factory to file including settings and workflow.

Parameters

file_name (str) – The name of the file the factory should be exported to.

Return type

None

export_settings(file_name)

Export the current model to file, including the workflow along with each component's settings.

Parameters

file_name (str) – The name of the file the settings and workflow should be exported to.

Raises

UnsupportedFiletypeError – When the file type requested is not supported.

Return type

None

import_settings(settings, clear_workflow=True)

Import settings and workflow from a file.

Parameters
  • settings (Union[str, Dict]) – The name of the file the settings should be extracted from or the reference to a settings dictionary.

  • clear_workflow (bool) – If True, the current workflow is replaced; if False, it is extended.

Return type

None

create_dataset(dataset_name, molecules, description, tagline, metadata=None, processors=None, toolkit_registry=None, verbose=True)

Process the input molecules through the given workflow, then create and populate the corresponding dataset class, which acts as a local representation of the collection and tasks to be performed in QCArchive.

Parameters
  • dataset_name (str) – The name that will be given to the collection on submission to an archive instance.

  • molecules (Union[str, List[openff.toolkit.topology.molecule.Molecule], openff.toolkit.topology.molecule.Molecule]) – The molecule or list of molecules to be processed by the workflow and added to the dataset; this can also be a file name, which will be unpacked by the openforcefield toolkit.

  • description (str) – A string describing the dataset; it should detail the purpose of the dataset and outline how the molecules were selected.

  • tagline (str) – A short tagline which will be displayed with the collection name in QCArchive.

  • metadata (Optional[openff.qcsubmit.common_structures.Metadata]) – Any metadata that should be associated with this dataset; this can be changed from the default after the dataset is made.

  • processors (Optional[int]) – The number of processors available to the workflow; note that None will use all available processors.

  • toolkit_registry (Optional[openff.toolkit.utils.toolkit_registry.ToolkitRegistry]) – The openff.toolkit.utils.ToolkitRegistry which declares the available toolkits and the order in which they should be queried for functionality. If None is passed, the default global registry will be used with all installed toolkits.

  • verbose (bool) – If True a progress bar for each workflow component will be shown.

Returns

A dataset instance populated with the molecules that have passed through the workflow.

Return type

openff.qcsubmit.factories.T
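The processing order can be pictured as a pipeline in which each workflow component consumes the previous component's output. In this hypothetical sketch, molecules are toy dictionaries with precomputed `n_rotors` values, components are plain functions, and survivors are indexed by SMILES; the real method works on toolkit Molecule objects and returns a dataset instance.

```python
from typing import Callable, Dict, List

Molecule = Dict[str, object]  # toy molecule record, not a toolkit Molecule
Component = Callable[[List[Molecule]], List[Molecule]]


def rotor_filter(max_rotors: int) -> Component:
    # Keep only molecules with at most `max_rotors` rotatable bonds.
    return lambda mols: [m for m in mols if m["n_rotors"] <= max_rotors]


def deduplicate(mols: List[Molecule]) -> List[Molecule]:
    # Drop repeated SMILES, keeping the first occurrence.
    seen = set()
    unique = []
    for m in mols:
        if m["smiles"] not in seen:
            seen.add(m["smiles"])
            unique.append(m)
    return unique


def create_dataset(molecules: List[Molecule], workflow: List[Component]) -> Dict[str, Molecule]:
    # Each component consumes the previous component's output, in order.
    for component in workflow:
        molecules = component(molecules)
    # Index the survivors (here simply by SMILES).
    return {str(m["smiles"]): m for m in molecules}


mols = [
    {"smiles": "CCO", "n_rotors": 0},
    {"smiles": "CCCCCCCC", "n_rotors": 5},
    {"smiles": "CCO", "n_rotors": 0},
]
dataset = create_dataset(mols, [deduplicate, rotor_filter(max_rotors=3)])
```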

create_index(molecule)

Create an index for the current molecule.

Parameters

molecule (openff.toolkit.topology.molecule.Molecule) – The molecule for which the dataset index will be generated.

Returns

The molecule name or the canonical isomeric smiles for the molecule if the name is not assigned or is blank.

Return type

str

Important

Each dataset can have a different indexing system depending on the data. In this basic dataset, each conformer of a molecule is expanded into its own, separately indexed entry. This is handled by the dataset, however, so here we just generate a general index for the molecule before it is added to the dataset.
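The index rule in the Returns entry can be sketched as a name-first fallback. `ToyMolecule` is a hypothetical stand-in for the toolkit Molecule with the canonical isomeric SMILES precomputed, and "blank" is interpreted here as an empty string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyMolecule:
    # Hypothetical stand-in exposing only what the index rule needs.
    name: Optional[str]
    canonical_isomeric_smiles: str


def create_index(molecule: ToyMolecule) -> str:
    # Use the name when it is set and non-empty; otherwise fall back to SMILES.
    if molecule.name:
        return molecule.name
    return molecule.canonical_isomeric_smiles
```

For example, a molecule named "ethanol" is indexed as "ethanol", while an unnamed one is indexed by its SMILES.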

add_qc_spec(method, basis, program, spec_name, spec_description, store_wavefunction='none', overwrite=False, implicit_solvent=None, maxiter=200, scf_properties=None, keywords=None)

Add a new QC specification to the factory, which will be applied to the dataset.

Parameters
  • method (str) – The name of the method to use, e.g. B3LYP-D3BJ.

  • basis (Optional[str]) – The name of the basis to use; can also be None.

  • program (str) – The name of the program to execute the computation.

  • spec_name (str) – The name the spec should be stored under.

  • spec_description (str) – The description of the spec.

  • store_wavefunction (str) – Which parts of the wavefunction should be saved.

  • overwrite (bool) – If a spec is already stored under this name, overwrite it.

  • implicit_solvent (Optional[openff.qcsubmit.common_structures.PCMSettings]) – The implicit solvent settings if it is to be used.

  • maxiter (pydantic.v1.types.PositiveInt) – The maximum number of SCF iterations that should be done.

  • scf_properties (Optional[List[openff.qcsubmit.common_structures.SCFProperties]]) – The list of SCF properties that should be extracted from the calculation.

  • keywords (Optional[Dict[str, Union[pydantic.v1.types.StrictStr, pydantic.v1.types.StrictInt, pydantic.v1.types.StrictFloat, pydantic.v1.types.StrictBool, List[pydantic.v1.types.StrictFloat]]]]) – Program-specific computational keywords that should be passed to the program.

Return type

None
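A minimal sketch of the overwrite rule and the maxiter positivity constraint, using a hypothetical frozen dataclass in place of QCSpec. The sketch raises ValueError for a duplicate name; the exception the library raises in that case is not documented in this section.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToySpec:
    method: str
    basis: Optional[str]  # may be None, e.g. for semi-empirical methods
    program: str
    maxiter: int = 200

    def __post_init__(self) -> None:
        # Mirror the PositiveInt constraint on maxiter.
        if self.maxiter <= 0:
            raise ValueError("maxiter must be a positive integer")


class ToySpecHandler:
    def __init__(self) -> None:
        self.qc_specifications: Dict[str, ToySpec] = {}

    def add_qc_spec(self, spec_name: str, spec: ToySpec, overwrite: bool = False) -> None:
        # Refuse to replace an existing spec unless overwrite=True.
        if spec_name in self.qc_specifications and not overwrite:
            raise ValueError(f"a spec named {spec_name!r} already exists")
        self.qc_specifications[spec_name] = spec

    @property
    def n_qc_specs(self) -> int:
        return len(self.qc_specifications)


handler = ToySpecHandler()
handler.add_qc_spec("default", ToySpec("B3LYP-D3BJ", "DZVP", "psi4"))
handler.add_qc_spec("xtb", ToySpec("gfn2xtb", None, "xtb"))
```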

clear_qcspecs()

Clear out any current QCSpecs.

Return type

None

dict(*args, **kwargs)

Override the dict method to handle any enums when saving to YAML/JSON via a dict call.

property n_qc_specs: int

Return the number of QCSpecs on this dataset.

remove_qcspec(spec_name)

Remove a QCSpec from the dataset.

Parameters

spec_name (str) – The name of the spec that should be removed.

Return type

None

Note

The QCSpec settings are not mutable, so to change a spec it must be removed and a new one added; this ensures the new settings are fully validated.
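The note can be illustrated with a frozen dataclass standing in for QCSpec (a hypothetical toy): validation runs only when an instance is constructed, so changing a setting means removing the old spec and adding a freshly built one.

```python
import dataclasses
from typing import Dict


@dataclasses.dataclass(frozen=True)
class ToySpec:
    method: str
    basis: str
    maxiter: int = 200

    def __post_init__(self) -> None:
        # Validation happens only here, at construction time.
        if self.maxiter <= 0:
            raise ValueError("maxiter must be a positive integer")


specs: Dict[str, ToySpec] = {"default": ToySpec("B3LYP-D3BJ", "DZVP")}

# Assigning to a field of a frozen instance raises, so an edit cannot skip validation by accident.
mutation_blocked = False
try:
    specs["default"].maxiter = 500  # type: ignore[misc]
except dataclasses.FrozenInstanceError:
    mutation_blocked = True

# To change a setting: remove the old spec, then add a newly validated one.
old = specs.pop("default")
specs["default"] = ToySpec(old.method, old.basis, maxiter=500)
```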