BasicDataset

pydantic model openff.qcsubmit.datasets.BasicDataset

The general QCFractal dataset class which contains all of the molecules and information about them prior to submission.

The class is a simple holder for the dataset and information about it. It can run simple checks on the data before submission, such as ensuring that every molecule has cmiles information and a unique index by which it can be identified.
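
The two checks named above (a unique index and cmiles information for every molecule) can be illustrated with a small sketch. This is not the BasicDataset implementation: entries are modelled as plain dictionaries, and the key names `index` and `cmiles` as well as the helper `find_entry_problems` are hypothetical names chosen for illustration.

```python
from collections import Counter


def find_entry_problems(entries):
    """Return a list of problems: duplicated indices or missing cmiles data.

    `entries` is a list of plain dicts standing in for dataset entries; the
    "index" and "cmiles" keys are illustrative, not the real entry model.
    """
    problems = []
    # Count how often each index occurs so duplicates can be reported once.
    counts = Counter(entry.get("index") for entry in entries)
    for index, count in counts.items():
        if index is None:
            problems.append("entry without an index")
        elif count > 1:
            problems.append(f"index {index!r} is used by {count} entries")
    # An empty or absent cmiles mapping counts as missing information.
    for entry in entries:
        if not entry.get("cmiles"):
            problems.append(f"entry {entry.get('index')!r} has no cmiles information")
    return problems


entries = [
    {"index": "ethane", "cmiles": {"canonical_smiles": "CC"}},
    {"index": "ethane", "cmiles": {"canonical_smiles": "CC"}},
    {"index": "methanol", "cmiles": {}},
]
for problem in find_entry_problems(entries):
    print(problem)
```

Collecting every problem into a list, rather than raising on the first one, lets all issues in a dataset be fixed in a single pass before submission.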

Note

The molecules in this dataset are all expanded so that different conformers are unique submissions.
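
As a rough illustration of what "expanded" means here, the sketch below turns each conformer of a molecule into its own entry. It is an assumption-laden example rather than the library's code: the `<index>-<n>` key format, the `expand_conformers` helper, and the placeholder coordinates are all hypothetical.

```python
def expand_conformers(molecules):
    """Give every conformer its own entry, keyed by '<index>-<n>'.

    `molecules` maps a molecule index to a list of conformer geometries.
    The key format is an assumption made for this sketch.
    """
    expanded = {}
    for index, conformers in molecules.items():
        for n, geometry in enumerate(conformers):
            expanded[f"{index}-{n}"] = geometry
    return expanded


# Placeholder coordinates only; these are not real molecular geometries.
molecules = {
    "ethane": [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]],
    "methane": [[0.0, 0.0, 0.0]],
}
expanded = expand_conformers(molecules)
print(sorted(expanded))
```

With this layout, a molecule with two conformers contributes two independent submissions, while a single-conformer molecule contributes one.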

JSON schema
{
   "title": "BasicDataset",
   "description": "The general QCFractal dataset class which contains all of the molecules and information about them prior to\nsubmission.\n\nThe class is a simple holder of the dataset and information about it and can do simple checks on the data before\nsubmitting it such as ensuring that the molecules have cmiles information\nand a unique index to be identified by.\n\nNote:\n    The molecules in this dataset are all expanded so that different conformers are unique submissions.",
   "type": "object",
   "properties": {
      "qc_specifications": {
         "title": "Qc Specifications",
         "description": "The QCSpecifications which will be computed for this dataset.",
         "default": {
            "default": {
               "method": "B3LYP-D3BJ",
               "basis": "DZVP",
               "program": "psi4",
               "spec_name": "default",
               "spec_description": "Standard OpenFF optimization quantum chemistry specification.",
               "store_wavefunction": "none",
               "implicit_solvent": null,
               "maxiter": 200,
               "scf_properties": [
                  "dipole",
                  "quadrupole",
                  "wiberg_lowdin_indices",
                  "mayer_indices"
               ],
               "keywords": null
            }
         },
         "type": "object",
         "additionalProperties": {
            "$ref": "#/definitions/QCSpec"
         }
      },
      "driver": {
         "description": "The type of single point calculations which will be computed. Note some services require certain calculations for example optimizations require graident calculations.",
         "default": "energy",
         "allOf": [
            {
               "$ref": "#/definitions/DriverEnum"
            }
         ]
      },
      "priority": {
         "title": "Priority",
         "description": "The priority the dataset should be computed at compared to other datasets currently running.",
         "default": "normal",
         "type": "string"
      },
      "dataset_tags": {
         "title": "Dataset Tags",
         "description": "The dataset tags which help identify the dataset.",
         "default": [
            "openff"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "compute_tag": {
         "title": "Compute Tag",
         "description": "The tag the computes tasks will be assigned to, managers wishing to execute these tasks should use this compute tag.",
         "default": "openff",
         "type": "string"
      },
      "dataset_name": {
         "title": "Dataset Name",
         "description": "The name of the dataset, this will be the name given to the collection in QCArchive.",
         "type": "string"
      },
      "dataset_tagline": {
         "title": "Dataset Tagline",
         "description": "The tagline should be a short description of the dataset which will be displayed by the QCArchive client when the collections are listed.",
         "minLength": 8,
         "pattern": "[a-zA-Z]",
         "type": "string"
      },
      "type": {
         "title": "Type",
         "default": "DataSet",
         "enum": [
            "DataSet"
         ],
         "type": "string"
      },
      "description": {
         "title": "Description",
         "description": "A long description of the datasets purpose and details about the molecules within.",
         "minLength": 8,
         "pattern": "[a-zA-Z]",
         "type": "string"
      },
      "metadata": {
         "title": "Metadata",
         "description": "The metadata describing the dataset.",
         "default": {
            "submitter": "docs",
            "creation_date": "2022-04-29",
            "collection_type": null,
            "dataset_name": null,
            "short_description": null,
            "long_description_url": null,
            "long_description": null,
            "elements": "set()"
         },
         "allOf": [
            {
               "$ref": "#/definitions/Metadata"
            }
         ]
      },
      "provenance": {
         "title": "Provenance",
         "description": "A dictionary of the software and versions used to generate the dataset.",
         "default": {},
         "type": "object",
         "additionalProperties": {
            "type": "string"
         }
      },
      "dataset": {
         "title": "Dataset",
         "description": "The actual dataset to be stored in QCArchive.",
         "default": {},
         "type": "object",
         "additionalProperties": {
            "$ref": "#/definitions/DatasetEntry"
         }
      },
      "filtered_molecules": {
         "title": "Filtered Molecules",
         "description": "The set of workflow components used to generate the dataset with any filtered molecules.",
         "default": {},
         "type": "object",
         "additionalProperties": {
            "$ref": "#/definitions/FilterEntry"
         }
      }
   },
   "required": [
      "dataset_name",
      "dataset_tagline",
      "description"
   ],
   "definitions": {
      "WavefunctionProtocolEnum": {
         "title": "WavefunctionProtocolEnum",
         "description": "Wavefunction to keep from a computation.",
         "enum": [
            "all",
            "orbitals_and_eigenvalues",
            "return_results",
            "none"
         ],
         "type": "string"
      },
      "PCMSettings": {
         "title": "PCMSettings",
         "description": "A class to handle PCM settings which can be used with PSi4.",
         "type": "object",
         "properties": {
            "units": {
               "title": "Units",
               "description": "The units used in the input options atomic units are used by default.",
               "type": "string"
            },
            "codata": {
               "title": "Codata",
               "description": "The set of fundamental physical constants to be used in the module.",
               "default": 2010,
               "type": "integer"
            },
            "cavity_Type": {
               "title": "Cavity Type",
               "description": "Completely specifies type of molecular surface and its discretization.",
               "default": "GePol",
               "type": "string"
            },
            "cavity_Area": {
               "title": "Cavity Area",
               "description": "Average area (weight) of the surface partition for the GePol cavity in the specified units. By default this is in AU.",
               "default": 0.3,
               "type": "number"
            },
            "cavity_Scaling": {
               "title": "Cavity Scaling",
               "description": "If true, the radii for the spheres will be scaled by 1.2. For finer control on the scaling factor for each sphere, select explicit creation mode.",
               "default": true,
               "type": "boolean"
            },
            "cavity_RadiiSet": {
               "title": "Cavity Radiiset",
               "description": "Select set of atomic radii to be used. Currently Bondi-Mantina Bondi, UFF  and Allinger\u2019s MM3 sets available. Radii in Allinger\u2019s MM3 set are obtained by dividing the value in the original paper by 1.2, as done in the ADF COSMO implementation We advise to turn off scaling of the radii by 1.2 when using this set.",
               "default": "Bondi",
               "type": "string"
            },
            "cavity_MinRadius": {
               "title": "Cavity Minradius",
               "description": "Minimal radius for additional spheres not centered on atoms. An arbitrarily big value is equivalent to switching off the use of added spheres, which is the default in AU.",
               "default": 100,
               "type": "number"
            },
            "cavity_Mode": {
               "title": "Cavity Mode",
               "description": "How to create the list of spheres for the generation of the molecular surface.",
               "default": "Implicit",
               "type": "string"
            },
            "medium_SolverType": {
               "title": "Medium Solvertype",
               "description": "Type of solver to be used. All solvers are based on the Integral Equation Formulation of the Polarizable Continuum Model.",
               "default": "IEFPCM",
               "type": "string"
            },
            "medium_Nonequilibrium": {
               "title": "Medium Nonequilibrium",
               "description": "Initializes an additional solver using the dynamic permittivity. To be used in response calculations.",
               "default": false,
               "type": "boolean"
            },
            "medium_Solvent": {
               "title": "Medium Solvent",
               "description": "Specification of the dielectric medium outside the cavity. Note this will always be converted to the molecular formula to aid parsing via PCM.",
               "type": "string"
            },
            "medium_MatrixSymm": {
               "title": "Medium Matrixsymm",
               "description": "If True, the PCM matrix obtained by the IEFPCM collocation solver is symmetrized.",
               "default": true,
               "type": "boolean"
            },
            "medium_Correction": {
               "title": "Medium Correction",
               "description": "Correction, k for the apparent surface charge scaling factor in the CPCM solver.",
               "default": 0.0,
               "minimum": 0,
               "type": "number"
            },
            "medium_DiagonalScaling": {
               "title": "Medium Diagonalscaling",
               "description": "Scaling factor for diagonal of collocation matrices, values commonly used in the literature are 1.07 and 1.0694.",
               "default": 1.07,
               "minimum": 0,
               "type": "number"
            },
            "medium_ProbeRadius": {
               "title": "Medium Proberadius",
               "description": "Radius of the spherical probe approximating a solvent molecule. Used for generating the solvent-excluded surface (SES) or an approximation of it. Overridden by the built-in value for the chosen solvent. Default in AU.",
               "default": 1.0,
               "type": "number"
            }
         },
         "required": [
            "units",
            "medium_Solvent"
         ]
      },
      "SCFProperties": {
         "title": "SCFProperties",
         "description": "The type of SCF property that should be extracted from a single point calculation.",
         "enum": [
            "dipole",
            "quadrupole",
            "mulliken_charges",
            "lowdin_charges",
            "wiberg_lowdin_indices",
            "mayer_indices",
            "mbis_charges"
         ],
         "type": "string"
      },
      "QCSpec": {
         "title": "QCSpec",
         "description": "A basic config class for results structures.",
         "type": "object",
         "properties": {
            "method": {
               "title": "Method",
               "description": "The name of the computational model used to execute the calculation. This could be the QC method or the forcefield name.",
               "default": "B3LYP-D3BJ",
               "type": "string"
            },
            "basis": {
               "title": "Basis",
               "description": "The name of the basis that should be used with the given method, outside of QC this can be the parameterization ie antechamber or None.",
               "default": "DZVP",
               "type": "string"
            },
            "program": {
               "title": "Program",
               "description": "The name of the program that will be used to perform the calculation.",
               "default": "psi4",
               "type": "string"
            },
            "spec_name": {
               "title": "Spec Name",
               "description": "The name the specification will be stored under in QCArchive.",
               "default": "default",
               "type": "string"
            },
            "spec_description": {
               "title": "Spec Description",
               "description": "The description of the specification which will be stored in QCArchive.",
               "default": "Standard OpenFF optimization quantum chemistry specification.",
               "type": "string"
            },
            "store_wavefunction": {
               "description": "The level of wavefunction detail that should be saved in QCArchive. Note that this is done for every calculation and should not be used with optimizations.",
               "default": "none",
               "allOf": [
                  {
                     "$ref": "#/definitions/WavefunctionProtocolEnum"
                  }
               ]
            },
            "implicit_solvent": {
               "title": "Implicit Solvent",
               "description": "If PCM is to be used with psi4 this is the full description of the settings that should be used.",
               "allOf": [
                  {
                     "$ref": "#/definitions/PCMSettings"
                  }
               ]
            },
            "maxiter": {
               "title": "Maxiter",
               "description": "The maximum number of SCF iterations in QM calculations this will be ignored by programs where this does not make sense.",
               "default": 200,
               "exclusiveMinimum": 0,
               "type": "integer"
            },
            "scf_properties": {
               "description": "The SCF properties which should be extracted after every single point calculation.",
               "default": [
                  "dipole",
                  "quadrupole",
                  "wiberg_lowdin_indices",
                  "mayer_indices"
               ],
               "type": "array",
               "items": {
                  "$ref": "#/definitions/SCFProperties"
               }
            },
            "keywords": {
               "title": "Keywords",
               "description": "An optional set of program specific computational keywords that should be passed to the program. These may include, for example, DFT grid settings.",
               "type": "object",
               "additionalProperties": {
                  "anyOf": [
                     {
                        "type": "string"
                     },
                     {
                        "type": "integer"
                     },
                     {
                        "type": "number"
                     },
                     {
                        "type": "boolean"
                     },
                     {
                        "type": "array",
                        "items": {
                           "type": "number"
                        }
                     }
                  ]
               }
            }
         }
      },
      "DriverEnum": {
         "title": "DriverEnum",
         "description": "The type of calculation that is being performed (e.g., energy, gradient, Hessian, ...).",
         "enum": [
            "energy",
            "gradient",
            "hessian",
            "properties"
         ],
         "type": "string"
      },
      "Metadata": {
         "title": "Metadata",
         "description": "A general metadata class which is required to be filled in before submitting a dataset to the qcarchive.",
         "type": "object",
         "properties": {
            "submitter": {
               "title": "Submitter",
               "description": "The name of the submitter/creator of the dataset, this is automatically generated but can be changed.",
               "default": "docs",
               "type": "string"
            },
            "creation_date": {
               "title": "Creation Date",
               "description": "The date the dataset was created on, this is automatically generated.",
               "default": "2022-04-29",
               "type": "string",
               "format": "date"
            },
            "collection_type": {
               "title": "Collection Type",
               "description": "The type of collection that will be created in QCArchive this is automatically updated when attached to a dataset.",
               "type": "string"
            },
            "dataset_name": {
               "title": "Dataset Name",
               "description": "The name that will be given to the collection once it is put into QCArchive, this is updated when attached to a dataset.",
               "type": "string"
            },
            "short_description": {
               "title": "Short Description",
               "description": "A short informative description of the dataset.",
               "minLength": 8,
               "pattern": "[a-zA-Z]",
               "type": "string"
            },
            "long_description_url": {
               "title": "Long Description Url",
               "description": "The url which links to more information about the submission normally a github repo with scripts showing how the dataset was created.",
               "minLength": 1,
               "maxLength": 2083,
               "format": "uri",
               "type": "string"
            },
            "long_description": {
               "title": "Long Description",
               "description": "A long description of the purpose of the dataset and the molecules within.",
               "minLength": 8,
               "pattern": "[a-zA-Z]",
               "type": "string"
            },
            "elements": {
               "title": "Elements",
               "description": "The unique set of elements present in the dataset",
               "default": "set()",
               "type": "array",
               "items": {
                  "type": "string"
               },
               "uniqueItems": true
            }
         }
      },
      "Identifiers": {
         "title": "Identifiers",
         "description": "Canonical chemical identifiers\n\nParameters\n----------\nmolecule_hash : str, Optional\nmolecular_formula : str, Optional\nsmiles : str, Optional\ninchi : str, Optional\ninchikey : str, Optional\ncanonical_explicit_hydrogen_smiles : str, Optional\ncanonical_isomeric_explicit_hydrogen_mapped_smiles : str, Optional\ncanonical_isomeric_explicit_hydrogen_smiles : str, Optional\ncanonical_isomeric_smiles : str, Optional\ncanonical_smiles : str, Optional\npubchem_cid : str, Optional\n    PubChem Compound ID\npubchem_sid : str, Optional\n    PubChem Substance ID\npubchem_conformerid : str, Optional\n    PubChem Conformer ID",
         "type": "object",
         "properties": {
            "molecule_hash": {
               "title": "Molecule Hash",
               "type": "string"
            },
            "molecular_formula": {
               "title": "Molecular Formula",
               "type": "string"
            },
            "smiles": {
               "title": "Smiles",
               "type": "string"
            },
            "inchi": {
               "title": "Inchi",
               "type": "string"
            },
            "inchikey": {
               "title": "Inchikey",
               "type": "string"
            },
            "canonical_explicit_hydrogen_smiles": {
               "title": "Canonical Explicit Hydrogen Smiles",
               "type": "string"
            },
            "canonical_isomeric_explicit_hydrogen_mapped_smiles": {
               "title": "Canonical Isomeric Explicit Hydrogen Mapped Smiles",
               "type": "string"
            },
            "canonical_isomeric_explicit_hydrogen_smiles": {
               "title": "Canonical Isomeric Explicit Hydrogen Smiles",
               "type": "string"
            },
            "canonical_isomeric_smiles": {
               "title": "Canonical Isomeric Smiles",
               "type": "string"
            },
            "canonical_smiles": {
               "title": "Canonical Smiles",
               "type": "string"
            },
            "pubchem_cid": {
               "title": "Pubchem Cid",
               "description": "PubChem Compound ID",
               "type": "string"
            },
            "pubchem_sid": {
               "title": "Pubchem Sid",
               "description": "PubChem Substance ID",
               "type": "string"
            },
            "pubchem_conformerid": {
               "title": "Pubchem Conformerid",
               "description": "PubChem Conformer ID",
               "type": "string"
            }
         },
         "additionalProperties": false
      },
      "Provenance": {
         "title": "Provenance",
         "description": "Provenance information.\n\nParameters\n----------\ncreator : str\n    The name of the program, library, or person who created the object.\nversion : str, Default: \n    The version of the creator, blank otherwise. This should be sortable by the very broad [PEP 440](https://www.python.org/dev/peps/pep-0440/).\nroutine : str, Default: \n    The name of the routine or function within the creator, blank otherwise.",
         "type": "object",
         "properties": {
            "creator": {
               "title": "Creator",
               "description": "The name of the program, library, or person who created the object.",
               "type": "string"
            },
            "version": {
               "title": "Version",
               "description": "The version of the creator, blank otherwise. This should be sortable by the very broad [PEP 440](https://www.python.org/dev/peps/pep-0440/).",
               "default": "",
               "type": "string"
            },
            "routine": {
               "title": "Routine",
               "description": "The name of the routine or function within the creator, blank otherwise.",
               "default": "",
               "type": "string"
            }
         },
         "required": [
            "creator"
         ],
         "$schema": "http://json-schema.org/draft-04/schema#"
      },
      "Molecule": {
         "title": "Molecule",
         "description": "The physical Cartesian representation of the molecular system.\n\nA QCSchema representation of a Molecule. This model contains\ndata for symbols, geometry, connectivity, charges, fragmentation, etc while also supporting a wide array of I/O and manipulation capabilities.\n\nMolecule objects geometry, masses, and charges are truncated to 8, 6, and 4 decimal places respectively to assist with duplicate detection.\n\nNotes\n-----\nAll arrays are stored flat but must be reshapable into the dimensions in attribute ``shape``, with abbreviations as follows:\n\n  * nat: number of atomic = calcinfo_natom\n  * nfr: number of fragments\n  * <varies>: irregular dimension not systematically reshapable",
         "type": "object",
         "properties": {
            "schema_name": {
               "title": "Schema Name",
               "description": "The QCSchema specification to which this model conforms. Explicitly fixed as qcschema_molecule.",
               "default": "qcschema_molecule",
               "pattern": "^(qcschema_molecule)$",
               "type": "string"
            },
            "schema_version": {
               "title": "Schema Version",
               "description": "The version number of ``schema_name`` to which this model conforms.",
               "default": 2,
               "type": "integer"
            },
            "validated": {
               "title": "Validated",
               "description": "A boolean indicator (for speed purposes) that the input Molecule data has been previously checked for schema (data layout and type) and physics (e.g., non-overlapping atoms, feasible multiplicity) compliance. This should be False in most cases. A ``True`` setting should only ever be set by the constructor for this class itself or other trusted sources such as a Fractal Server or previously serialized Molecules.",
               "default": false,
               "type": "boolean"
            },
            "symbols": {
               "title": "Symbols",
               "description": "The ordered array of atomic elemental symbols in title case. This field's index sets atomic order for all other per-atom fields like ``real`` and the first dimension of ``geometry``. Ghost/virtual atoms must have an entry here in ``symbols``; ghostedness is indicated through the ``real`` field.",
               "shape": [
                  "nat"
               ],
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "geometry": {
               "title": "Geometry",
               "description": "The ordered array for Cartesian XYZ atomic coordinates [a0]. Atom ordering is fixed; that is, a consumer who shuffles atoms must not reattach the input (pre-shuffling) molecule schema instance to any output (post-shuffling) per-atom results (e.g., gradient). Index of the first dimension matches the 0-indexed indices of all other per-atom settings like ``symbols`` and ``real``.\nSerialized storage is always flat, (3*nat,), but QCSchema implementations will want to reshape it. QCElemental can also accept array-likes which can be mapped to (nat,3) such as a 1-D list of length 3*nat, or the serialized version of the array in (3*nat,) shape; all forms will be reshaped to (nat,3) for this attribute.",
               "shape": [
                  "nat",
                  3
               ],
               "units": "a0",
               "type": "array",
               "items": {
                  "type": "number"
               }
            },
            "name": {
               "title": "Name",
               "description": "Common or human-readable name to assign to this molecule. This field can be arbitrary; see ``identifiers`` for well-defined labels.",
               "type": "string"
            },
            "identifiers": {
               "title": "Identifiers",
               "description": "An optional dictionary of additional identifiers by which this molecule can be referenced, such as INCHI, canonical SMILES, etc. See the :class:``Identifiers`` model for more details.",
               "allOf": [
                  {
                     "$ref": "#/definitions/Identifiers"
                  }
               ]
            },
            "comment": {
               "title": "Comment",
               "description": "Additional comments for this molecule. Intended for pure human/user consumption and clarity.",
               "type": "string"
            },
            "molecular_charge": {
               "title": "Molecular Charge",
               "description": "The net electrostatic charge of the molecule.",
               "default": 0.0,
               "type": "number"
            },
            "molecular_multiplicity": {
               "title": "Molecular Multiplicity",
               "description": "The total multiplicity of the molecule.",
               "default": 1,
               "type": "integer"
            },
            "masses": {
               "title": "Masses",
               "description": "The ordered array of atomic masses. Index order matches the 0-indexed indices of all other per-atom fields like ``symbols`` and ``real``. If this is not provided, the mass of each atom is inferred from its most common isotope. If this is provided, it must be the same length as ``symbols`` but can accept ``None`` entries for standard masses to infer from the same index in the ``symbols`` field.",
               "shape": [
                  "nat"
               ],
               "units": "u",
               "type": "array",
               "items": {
                  "type": "number"
               }
            },
            "real": {
               "title": "Real",
               "description": "The ordered array indicating if each atom is real (``True``) or ghost/virtual (``False``). Index matches the 0-indexed indices of all other per-atom settings like ``symbols`` and the first dimension of ``geometry``. If this is not provided, all atoms are assumed to be real (``True``).If this is provided, the reality or ghostedness of every atom must be specified.",
               "shape": [
                  "nat"
               ],
               "type": "array",
               "items": {
                  "type": "boolean"
               }
            },
            "atom_labels": {
               "title": "Atom Labels",
               "description": "Additional per-atom labels as an array of strings. Typical use is in model conversions, such as Elemental <-> Molpro and not typically something which should be user assigned. See the ``comments`` field for general human-consumable text to affix to the molecule.",
               "shape": [
                  "nat"
               ],
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "atomic_numbers": {
               "title": "Atomic Numbers",
               "description": "An optional ordered 1-D array-like object of atomic numbers of shape (nat,). Index matches the 0-indexed indices of all other per-atom settings like ``symbols`` and ``real``. Values are inferred from the ``symbols`` list if not explicitly set. Ghostedness should be indicated through ``real`` field, not zeros here.",
               "shape": [
                  "nat"
               ],
               "type": "array",
               "items": {
                  "type": "number",
                  "multipleOf": 1.0
               }
            },
            "mass_numbers": {
               "title": "Mass Numbers",
               "description": "An optional ordered 1-D array-like object of atomic *mass* numbers of shape (nat). Index matches the 0-indexed indices of all other per-atom settings like ``symbols`` and ``real``. Values are inferred from the most common isotopes of the ``symbols`` list if not explicitly set. If single isotope not (yet) known for an atom, -1 is placeholder.",
               "shape": [
                  "nat"
               ],
               "type": "array",
               "items": {
                  "type": "number",
                  "multipleOf": 1.0
               }
            },
            "connectivity": {
               "title": "Connectivity",
               "description": "A list of bonds within the molecule. Each entry is a tuple of ``(atom_index_A, atom_index_B, bond_order)`` where the ``atom_index`` matches the 0-indexed indices of all other per-atom settings like ``symbols`` and ``real``. Bonds may be freely reordered and inverted.",
               "minItems": 1,
               "type": "array",
               "items": {
                  "type": "array",
                  "minItems": 3,
                  "maxItems": 3,
                  "items": [
                     {
                        "type": "integer",
                        "minimum": 0
                     },
                     {
                        "type": "integer",
                        "minimum": 0
                     },
                     {
                        "type": "number",
                        "minimum": 0,
                        "maximum": 5
                     }
                  ]
               }
            },
            "fragments": {
               "title": "Fragments",
               "description": "List of indices grouping atoms (0-indexed) into molecular fragments within the molecule. Each entry in the outer list is a new fragment; index matches the ordering in ``fragment_charges`` and ``fragment_multiplicities``. Inner lists are 0-indexed atoms which compose the fragment; every atom must be in exactly one inner list. Noncontiguous fragments are allowed, though no QM program is known to support them. Fragment ordering is fixed; that is, a consumer who shuffles fragments must not reattach the input (pre-shuffling) molecule schema instance to any output (post-shuffling) per-fragment results (e.g., n-body energy arrays).",
               "shape": [
                  "nfr",
                  "<varies>"
               ],
               "type": "array",
               "items": {
                  "type": "array",
                  "items": {
                     "type": "number",
                     "multipleOf": 1.0
                  }
               }
            },
            "fragment_charges": {
               "title": "Fragment Charges",
               "description": "The total charge of each fragment in the ``fragments`` list. The index of this list matches the 0-index indices of ``fragments`` list. Will be filled in based on a set of rules if not provided (and ``fragments`` are specified).",
               "shape": [
                  "nfr"
               ],
               "type": "array",
               "items": {
                  "type": "number"
               }
            },
            "fragment_multiplicities": {
               "title": "Fragment Multiplicities",
               "description": "The multiplicity of each fragment in the ``fragments`` list. The index of this list matches the 0-index indices of ``fragments`` list. Will be filled in based on a set of rules if not provided (and ``fragments`` are specified).",
               "shape": [
                  "nfr"
               ],
               "type": "array",
               "items": {
                  "type": "integer"
               }
            },
            "fix_com": {
               "title": "Fix Com",
               "description": "Whether translation of geometry is allowed (fix F) or disallowed (fix T).When False, QCElemental will pre-process the Molecule object to translate the center of mass to (0,0,0) in Euclidean coordinate space, resulting in a different ``geometry`` than the one provided. 'Fix' is used in the sense of 'specify': that is, `fix_com=True` signals that the origin in `geometry` is a deliberate part of the Molecule spec, whereas `fix_com=False` (default) allows that the origin is happenstance and may be adjusted. guidance: A consumer who translates the geometry must not reattach the input (pre-translation) molecule schema instance to any output (post-translation) origin-sensitive results (e.g., an ordinary energy when EFP present).",
               "default": false,
               "type": "boolean"
            },
            "fix_orientation": {
               "title": "Fix Orientation",
               "description": "Whether rotation of geometry is allowed (fix F) or disallowed (fix T). When False, QCElemental will pre-process the Molecule object to orient via the intertial tensor, resulting in a different ``geometry`` than the one provided. 'Fix' is used in the sense of 'specify': that is, `fix_orientation=True` signals that the frame orientation in `geometry` is a deliberate part of the Molecule spec, whereas `fix_orientation=False` (default) allows that the frame is happenstance and may be adjusted. guidance: A consumer who rotates the geometry must not reattach the input (pre-rotation) molecule schema instance to any output (post-rotation) frame-sensitive results (e.g., molecular vibrations).",
               "default": false,
               "type": "boolean"
            },
            "fix_symmetry": {
               "title": "Fix Symmetry",
               "description": "Maximal point group symmetry which ``geometry`` should be treated. Lowercase.",
               "type": "string"
            },
            "provenance": {
               "title": "Provenance",
               "description": "The provenance information about how this Molecule (and its attributes) were generated, provided, and manipulated.",
               "allOf": [
                  {
                     "$ref": "#/definitions/Provenance"
                  }
               ]
            },
            "id": {
               "title": "Id",
               "description": "A unique identifier for this Molecule object. This field exists primarily for Databases (e.g. Fractal's Server) to track and lookup this specific object and should virtually never need to be manually set."
            },
            "extras": {
               "title": "Extras",
               "description": "Additional information to bundle with the molecule. Use for schema development and scratch space.",
               "type": "object"
            }
         },
         "required": [
            "symbols",
            "geometry"
         ],
         "additionalProperties": false,
         "$schema": "http://json-schema.org/draft-04/schema#"
      },
      "MoleculeAttributes": {
         "title": "MoleculeAttributes",
         "description": "A class to hold and validate the molecule attributes associated with a QCArchive entry, All attributes are required\nto be entered into a dataset.\n\nNote:\n    The attributes here are not exhaustive but are based on those given by cmiles and can all be obtain through the openforcefield toolkit Molecule class.",
         "type": "object",
         "properties": {
            "canonical_smiles": {
               "title": "Canonical Smiles",
               "type": "string"
            },
            "canonical_isomeric_smiles": {
               "title": "Canonical Isomeric Smiles",
               "type": "string"
            },
            "canonical_explicit_hydrogen_smiles": {
               "title": "Canonical Explicit Hydrogen Smiles",
               "type": "string"
            },
            "canonical_isomeric_explicit_hydrogen_smiles": {
               "title": "Canonical Isomeric Explicit Hydrogen Smiles",
               "type": "string"
            },
            "canonical_isomeric_explicit_hydrogen_mapped_smiles": {
               "title": "Canonical Isomeric Explicit Hydrogen Mapped Smiles",
               "description": "The fully mapped smiles where every atom should have a numerical tag so that the molecule can be rebuilt to match the order of the coordinates.",
               "type": "string"
            },
            "molecular_formula": {
               "title": "Molecular Formula",
               "description": "The hill formula of the molecule as given by the openfftoolkit.",
               "type": "string"
            },
            "standard_inchi": {
               "title": "Standard Inchi",
               "description": "The standard inchi given by the inchi program ie not fixed hydrogen layer.",
               "type": "string"
            },
            "inchi_key": {
               "title": "Inchi Key",
               "description": "The standard inchi key given by the inchi program.",
               "type": "string"
            },
            "fixed_hydrogen_inchi": {
               "title": "Fixed Hydrogen Inchi",
               "description": "The non-standard inchi with a fixed hydrogen layer to distinguish tautomers.",
               "type": "string"
            },
            "fixed_hydrogen_inchi_key": {
               "title": "Fixed Hydrogen Inchi Key",
               "description": "The non-standard inchikey with a fixed hydrogen layer.",
               "type": "string"
            },
            "unique_fixed_hydrogen_inchi_keys": {
               "title": "Unique Fixed Hydrogen Inchi Keys",
               "description": "The list of unique non-standard inchikey with a fixed hydrogen layer.",
               "type": "array",
               "items": {
                  "type": "string"
               },
               "uniqueItems": true
            }
         },
         "required": [
            "canonical_smiles",
            "canonical_isomeric_smiles",
            "canonical_explicit_hydrogen_smiles",
            "canonical_isomeric_explicit_hydrogen_smiles",
            "canonical_isomeric_explicit_hydrogen_mapped_smiles",
            "molecular_formula",
            "standard_inchi",
            "inchi_key"
         ]
      },
      "DatasetEntry": {
         "title": "DatasetEntry",
         "description": "A basic data class to construct the datasets which holds any information about the molecule and options used in\nthe qcarchive calculation.\n\nNote:\n    * ``extras`` are passed into the qcelemental.models.Molecule on creation.\n    * any extras that should passed to the calculation like extra constrains should be passed to ``keywords``.",
         "type": "object",
         "properties": {
            "index": {
               "title": "Index",
               "description": "The index name the molecule will be stored under in QCArchive. Note that if multipule geometries are provided the index will be augmented with a value indecating the conformer number so -0, -1.",
               "type": "string"
            },
            "initial_molecules": {
               "title": "Initial Molecules",
               "description": "A list of QCElemental Molecule objects which contain the geometries to be used as inputs for the calculation.",
               "type": "array",
               "items": {
                  "$ref": "#/definitions/Molecule"
               }
            },
            "attributes": {
               "title": "Attributes",
               "description": "The complete set of required cmiles attributes for the molecule.",
               "allOf": [
                  {
                     "$ref": "#/definitions/MoleculeAttributes"
                  }
               ]
            },
            "extras": {
               "title": "Extras",
               "description": "Any extra information that should be injected into the QCElemental models before being submited like the cmiles information.",
               "default": {},
               "type": "object"
            },
            "keywords": {
               "title": "Keywords",
               "description": "Any extra keywords that should be used in the QCArchive calculation should be passed here.",
               "default": {},
               "type": "object"
            }
         },
         "required": [
            "index",
            "initial_molecules",
            "attributes"
         ]
      },
      "FilterEntry": {
         "title": "FilterEntry",
         "description": "A basic data class that contains information on components run in a workflow and the associated molecules which were\nremoved by it.",
         "type": "object",
         "properties": {
            "component": {
               "title": "Component",
               "description": "The name of the component ran, this should be one of the components registered with qcsubmit.",
               "type": "string"
            },
            "component_settings": {
               "title": "Component Settings",
               "description": "The run time settings of the component used to filter the molecules.",
               "type": "object"
            },
            "component_provenance": {
               "title": "Component Provenance",
               "description": "A dictionary of the version information of all dependencies of the component.",
               "type": "object",
               "additionalProperties": {
                  "type": "string"
               }
            },
            "molecules": {
               "title": "Molecules",
               "type": "array",
               "items": {
                  "type": "string"
               }
            }
         },
         "required": [
            "component",
            "component_settings",
            "component_provenance",
            "molecules"
         ]
      }
   }
}
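
The `connectivity` and `fragments` constraints in the Molecule definition above can be checked by hand. The following is a minimal sketch using only the Python standard library. The helper names `check_connectivity` and `check_fragments` are hypothetical and are not part of QCElemental or QCSubmit, which perform their own validation. The upper bound on atom indices is an assumption drawn from the statement that indices match the per-atom arrays.

```python
from typing import List, Sequence, Tuple

Bond = Tuple[int, int, float]


def check_connectivity(bonds: Sequence[Bond], n_atoms: int) -> None:
    """Raise ValueError if a bond breaks the schema's connectivity rules."""
    if len(bonds) < 1:
        raise ValueError("connectivity needs at least one bond (minItems: 1)")
    for bond in bonds:
        if len(bond) != 3:
            raise ValueError(f"bond {bond!r} must have exactly 3 items")
        atom_a, atom_b, order = bond
        for atom in (atom_a, atom_b):
            # Schema: integer with minimum 0. Assumption: index < n_atoms.
            if not isinstance(atom, int) or not 0 <= atom < n_atoms:
                raise ValueError(f"atom index {atom!r} is out of range")
        # Schema: bond order is a number between 0 and 5 inclusive.
        if not 0 <= order <= 5:
            raise ValueError(f"bond order {order!r} is outside [0, 5]")


def check_fragments(fragments: Sequence[Sequence[int]], n_atoms: int) -> None:
    """Raise ValueError unless every atom is in exactly one fragment."""
    seen: List[int] = sorted(atom for fragment in fragments for atom in fragment)
    if seen != list(range(n_atoms)):
        raise ValueError("every atom must appear in exactly one fragment")


# Illustrative data: a water dimer with atoms ordered O, H, H, O, H, H.
water_dimer_bonds: List[Bond] = [(0, 1, 1.0), (0, 2, 1.0), (3, 4, 1.0), (3, 5, 1.0)]
water_dimer_fragments = [[0, 1, 2], [3, 4, 5]]

check_connectivity(water_dimer_bonds, n_atoms=6)
check_fragments(water_dimer_fragments, n_atoms=6)
```

Both calls return silently for valid input. A bond order above 5 or an atom listed in two fragments would raise `ValueError`.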

Config
  • allow_mutation: bool = True

  • arbitrary_types_allowed: bool = True

  • json_encoders: dict = {<class 'numpy.ndarray'>: <function DatasetConfig.Config.<lambda> at 0x7efe99957be0>, <enum 'Enum'>: <function DatasetConfig.Config.<lambda> at 0x7efe99957c70>}

  • validate_assignment: bool = True

Fields
  • type (Literal['DataSet'])

field type: Literal['DataSet'] = 'DataSet'

to_tasks()

Build a dictionary of single QCEngine tasks that correspond to this dataset organised by program name. The tasks can be passed directly to qcengine.compute.

Return type

Dict[str, List[qcelemental.models.results.AtomicInput]]
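
The shape of the returned dictionary can be illustrated without QCSubmit installed. The sketch below is a stand-in, not the library's implementation: `sketch_to_tasks` and the plain-dict task layout are hypothetical, with each dict playing the role of a `qcelemental.models.results.AtomicInput`. It assumes one task per conformer per QC specification, grouped by the specification's `program`, an `"energy"` driver, and the `-0`, `-1` index suffix described for `DatasetEntry.index`.

```python
from collections import defaultdict
from typing import Any, Dict, List


def sketch_to_tasks(
    entries: Dict[str, List[Any]],
    qc_specifications: Dict[str, Dict[str, str]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group one task per conformer and specification by program name."""
    tasks: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for spec in qc_specifications.values():
        for index, geometries in entries.items():
            for conformer, geometry in enumerate(geometries):
                # Assumption: the conformer suffix is only added when an
                # entry carries more than one geometry.
                name = f"{index}-{conformer}" if len(geometries) > 1 else index
                tasks[spec["program"]].append(
                    {
                        "id": name,
                        "molecule": geometry,
                        "driver": "energy",  # assumed driver for illustration
                        "model": {"method": spec["method"], "basis": spec["basis"]},
                    }
                )
    return dict(tasks)


# Placeholder strings stand in for QCElemental Molecule objects.
example_entries = {"ethane": ["conformer_a", "conformer_b"], "water": ["conformer_a"]}
example_specs = {
    "default": {"method": "B3LYP-D3BJ", "basis": "DZVP", "program": "psi4"}
}

example_tasks = sketch_to_tasks(example_entries, example_specs)
task_ids = [task["id"] for task in example_tasks["psi4"]]
```

With a real dataset, each task stored under a program key would be passed to `qcengine.compute` together with that program name.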