
Using QCArchive with the OpenFF Toolkit

Here we show how to safely create OpenFF molecules from data in the QCArchive using its CMILES entries. The conversion relies on the "canonical_isomeric_explicit_hydrogen_mapped_smiles" identifier, whose atom-map indices record the atom ordering used by the stored geometry.
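To make the role of the mapped SMILES more concrete, the sketch below pulls the atom-map indices out of a small mapped SMILES string and lists the elements in map order. It uses only the standard library. The helper name `symbols_in_map_order`, the simplified regular expression, and the methanol example are illustrative assumptions, not part of the toolkit, which handles this (and much more) inside `Molecule.from_qcschema`.

```python
import re

# Matches bracket atoms that carry a map index, e.g. [c:1], [Cl:40], [H:41].
# Simplified pattern for illustration; a real SMILES parser handles far more.
MAPPED_ATOM = re.compile(r"\[([A-Za-z][a-z]?)[^\]:]*:(\d+)\]")


def symbols_in_map_order(mapped_smiles: str) -> list[str]:
    """Return element symbols sorted by atom-map index (hypothetical helper)."""
    atoms = [
        (int(map_index), symbol.capitalize())
        for symbol, map_index in MAPPED_ATOM.findall(mapped_smiles)
    ]
    return [symbol for _, symbol in sorted(atoms)]


# Methanol, written so that the SMILES order differs from the map order.
methanol = "[H:3][C:1]([H:4])([H:5])[O:2][H:6]"
print(symbols_in_map_order(methanol))
```

Sorting by map index rather than by position in the string is the key point: the SMILES can list atoms in any order, while the map indices pin each atom to its row in the stored geometry.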

First, create a client for the server you wish to connect to. In this case, we use the public instance.

import qcportal

from openff.toolkit import Molecule

client = qcportal.PortalClient("https://api.qcarchive.molssi.org:443")

print(client.list_datasets_table())

type                record_count  name
----------------  --------------  ----------------------------------------------------------------------------------
torsiondrive                 454  OpenFF Fragmenter Phenyl Benchmark
torsiondrive                 820  OpenFF Group1 Torsions
optimization                 937  OpenFF Optimization Set 1
torsiondrive                  86  Fragment Stability Benchmark
optimization                1132  SMIRNOFF Coverage Set 1
optimization               25500  OpenFF VEHICLe Set 1
torsiondrive                 585  SMIRNOFF Coverage Torsion Set 1
optimization                 189  OpenFF NCI250K Boron 1
optimization               19714  OpenFF Discrepancy Benchmark 1
torsiondrive                 795  OpenFF Substituted Phenyl Set 1
optimization                 352  Pfizer Discrepancy Optimization Dataset 1
optimization                6670  FDA Optimization Dataset 1
torsiondrive                 227  Pfizer Discrepancy Torsion Dataset 1
gridoptimization             311  OpenFF Trivalent Nitrogen Set 1 (deprecated)
reaction                     994  AlkIsod14
reaction                    1846  BHPERI26
singlepoint                  937  OpenFF Optimization Set 1
singlepoint                48280  OpenFF VEHICLe Set 1
singlepoint                  189  OpenFF NCI250K Boron 1
reaction                     710  CYCONF
manybody                    1946  DS14
manybody                    1390  FmH2O10
singlepoint                18864  OpenFF Discrepancy Benchmark 1
manybody                    2919  NC15
reaction                    4544  Butanediol65
manybody                    2085  HB15
reaction                    1065  ACONF
reaction                    1278  HNBrBDE18
reaction                   12936  MPCONF196
reaction                     781  AlkIsomer11
manybody                    2919  A21
manybody                    3336  A24
reaction                    2556  BSR36
reaction                    2130  BH76RC
manybody                    1668  AlkBind12
reaction                     426  C20C24
singlepoint                 1109  SMIRNOFF Coverage Set 1
torsiondrive                  31  OpenFF Primary TorsionDrive Benchmark 1
torsiondrive                 227  OpenFF Full TorsionDrive Benchmark 1
optimization                1885  OpenFF Primary Optimization Benchmark 1
optimization               26736  OpenFF Full Optimization Benchmark 1
singlepoint                    0  Solvated Protein Fragments
torsiondrive                   4  TorsionDrive Paper
gridoptimization             311  OpenFF Trivalent Nitrogen Set 1
optimization                6567  Kinase Inhibitors: WBO Distributions
torsiondrive                 259  OpenFF Primary Benchmark 1 Torsion Set
gridoptimization             311  OpenFF Trivalent Nitrogen Set 2
gridoptimization             126  OpenFF Trivalent Nitrogen Set 3
torsiondrive                 595  OpenFF Primary Benchmark 2 Torsion Set
torsiondrive                  19  OpenFF Group1 Torsions 2
torsiondrive                   6  OpenFF Group1 Torsions 3
torsiondrive                  65  OpenFF Gen 2 Torsion Set 1 Roche
torsiondrive                  93  OpenFF Gen 2 Torsion Set 2 Coverage
torsiondrive                  38  OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy
torsiondrive                 106  OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy
torsiondrive                 100  OpenFF Gen 2 Torsion Set 5 Bayer
torsiondrive                   7  OpenFF Gen 2 Torsion Set 6 Supplemental
optimization                 298  OpenFF Gen 2 Opt Set 1 Roche
optimization                 373  OpenFF Gen 2 Opt Set 2 Coverage
optimization                2201  OpenFF Gen 2 Opt Set 4 eMolecules Discrepancy
optimization                 197  OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy
torsiondrive                 142  OpenFF Gen 2 Torsion Set 1 Roche 2
torsiondrive                 157  OpenFF Gen 2 Torsion Set 2 Coverage 2
torsiondrive                  82  OpenFF Gen 2 Torsion Set 3 Pfizer Discrepancy 2
torsiondrive                 272  OpenFF Gen 2 Torsion Set 4 eMolecules Discrepancy 2
singlepoint                  298  OpenFF Gen 2 Opt Set 1 Roche
singlepoint                  352  OpenFF Gen 2 Opt Set 2 Coverage
singlepoint                  197  OpenFF Gen 2 Opt Set 3 Pfizer Discrepancy
singlepoint                 2181  OpenFF Gen 2 Opt Set 4 eMolecules Discrepancy
torsiondrive                 219  OpenFF Gen 2 Torsion Set 5 Bayer 2
torsiondrive                  22  OpenFF Gen 2 Torsion Set 6 Supplemental 2
torsiondrive                1459  SiliconTX Torsion Benchmark Set 1
optimization                1850  OpenFF Gen 2 Opt Set 5 Bayer
singlepoint                 1772  OpenFF Gen 2 Opt Set 5 Bayer
torsiondrive                 462  Fragmenter paper
torsiondrive                 440  OpenFF Fragmenter Validation 1.0
optimization                3485  OpenFF Ehrman Informative Optimization v0.1
optimization                3485  OpenFF Ehrman Informative Optimization v0.2
torsiondrive                  20  OpenFF DANCE 1 eMolecules t142 v1.0
torsiondrive                  87  OpenFF Rowley Biaryl v1.0
optimization                 576  OpenFF Protein Fragments v1.0
torsiondrive                  36  OpenFF Theory Benchmarking Set B3LYP-D3BJ DZVP v1.0
torsiondrive                  36  OpenFF Theory Benchmarking Set B3LYP-D3BJ def2-TZVP v1.0
torsiondrive                  36  OpenFF Theory Benchmarking Set B3LYP-D3BJ def2-TZVPD v1.0
torsiondrive                  36  OpenFF Theory Benchmarking Set B3LYP-D3BJ def2-TZVPP v1.0
torsiondrive                3848  OpenFF-benchmark-ligand-fragments-v1.0
optimization                6716  OpenFF Protein Fragments v2.0
torsiondrive                 845  OpenFF Protein Fragments TorsionDrives v1.0
optimization               85897  OpenFF Sandbox CHO PhAlkEthOH v1.0
optimization                6716  OpenFF Protein Peptide Fragments constrained v1.0
optimization                6709  OpenFF Protein Peptide Fragments unconstrained v1.0
torsiondrive                  36  OpenFF Theory Benchmarking Set B3LYP-D3BJ 6-31+Gss
torsiondrive                  36  OpenFF Theory Benchmarking Set B3LYP-D3BJ def2-TZVPPD
optimization               22680  OpenFF PEPCONF OptimizationDataset v1.0
torsiondrive                 780  OpenFF Substituted Phenyl Set 1 v2.0
singlepoint                  382  OpenFF BCC Refit Study COH v1.0
optimization                1376  OpenFF Roche Opt Set With Protomers and Tautomers v1.0
optimization                 445  Genentech PDB Ligand Expo whole optimization neutral v1.0
optimization                1376  OpenFF Roche Opt Set With Protomers and Tautomers v1.1
torsiondrive                 787  OpenFF WBO Conjugated Series v1.0
torsiondrive                 767  OpenFF Theory Benchmarking Set v1.0
optimization                 864  OpenFF Theory Benchmarking Constrained Optimization Set B3LYP-NL def2-TZVPD v1.0
optimization                 864  OpenFF Theory Benchmarking Constrained Optimization Set MP2 heavy-aug-cc-pVTZ v1.0
optimization                 864  OpenFF Theory Benchmarking Constrained Optimization Set WB97X-V def2-TZVPD v1.0
optimization                2363  Genentech PDB Ligand Expo fragment optimization neutral v1.0
torsiondrive                 260  OpenFF Amide Torsion Set v1.0
optimization                 223  OpenFF Aniline Para Opt v1.0
optimization               69672  OpenFF Industry Benchmark Season 1 v1.0
torsiondrive                 888  OpenFF Gen3 Torsion Set v1.0
torsiondrive                  24  OpenFF Aniline 2D Impropers v1.0
optimization              539385  OpenFF Industry Benchmark Season 1 v1.1
optimization                4650  OpenFF BCC Refit Study COH v2.0
optimization                1416  OpenFF Theory Benchmarking Constrained Optimization Set MP2 heavy-aug-cc-pVTZ v1.1
optimization              501585  OpenFF Industry Benchmark Season 1 - MM v1.1
singlepoint                66552  OpenFF Theory Benchmarking Single Point Energies v1.0
torsiondrive                8052  OpenFF-benchmark-ligand-fragments-v2.0
optimization                 735  OpenFF RESP Polarizability Optimizations v1.0
optimization                 735  OpenFF RESP Polarizability Optimizations v1.1
singlepoint                24000  TorsionNet500 Single Points Dataset v1.0
torsiondrive                   5  OpenFF Protein Dipeptide 2-D TorsionDrive v1.0
torsiondrive                   5  OpenFF Protein Dipeptide 2-D TorsionDrive v1.1
optimization                 600  OpenFF Gen2 Optimization Dataset Protomers v1.0
torsiondrive                  26  OpenFF Protein Dipeptide 2-D TorsionDrive v2.0
singlepoint                65116  OpenFF ESP Fragment Conformers v1.0
optimization               56054  OpenFF ESP Industry Benchmark Set v1.0
torsiondrive                  46  OpenFF Protein Capped 1-mer Sidechains v1.0
torsiondrive                  26  OpenFF Protein Dipeptide 2-D TorsionDrive v2.1
singlepoint                39983  OpenFF ESP Industry Benchmark Set v1.1
torsiondrive                  46  OpenFF Protein Capped 1-mer Sidechains v1.1
torsiondrive                  46  OpenFF Protein Capped 1-mer Sidechains v1.2
torsiondrive                  61  OpenFF Protein Capped 1-mer Sidechains v1.3
optimization                 400  OpenFF multiplicity correction optimization set v1.0
torsiondrive                  99  OpenFF multiplicity correction torsion drive data v1.0
optimization                 759  OpenFF Protein Capped 1-mers 3-mers Optimization Dataset v1.0
torsiondrive                  54  OpenFF Protein Capped 3-mer Backbones v1.0
torsiondrive                 131  OpenFF multiplicity correction torsion drive data v1.1
singlepoint                13467  RNA Single Point Dataset v1.0
optimization                 327  OpenFF Iodine Chemistry Optimization Dataset v1.0
singlepoint                81670  RNA Trinucleotide Single Point Dataset v1.0
torsiondrive                  26  OpenFF Protein Capped 3-mer Omega v1.0
singlepoint                19110  RNA Nucleoside Single Point Dataset v1.0
optimization                 677  OpenFF multi-Br ESP Fragment Conformers v1.0
singlepoint                  650  OpenFF multi-Br ESP Fragment Conformers v1.1
torsiondrive                 169  XtalPi Shared Fragments TorsiondriveDataset v1.0
optimization                2384  XtalPi Shared Fragments OptimizationDataset v1.0
torsiondrive                  43  OpenFF Torsion Coverage Supplement v1.0
torsiondrive                 192  OpenFF RNA Dinucleoside Monophosphate TorsionDrives v1.0
torsiondrive                8737  XtalPi 20-percent Fragments TorsiondriveDataset v1.0
optimization              128180  XtalPi 20-percent Fragments OptimizationDataset v1.0
torsiondrive                  18  OpenFF Torsion Drive Supplement v1.0
optimization                  51  OpenFF Torsion Benchmark Supplement v1.0

Data in the QCArchive is organized into datasets. Let’s grab a molecule from an optimization dataset.

dataset = client.get_dataset(
    dataset_type="optimization",
    dataset_name="Kinase Inhibitors: WBO Distributions",
)

Take an arbitrary entry from the collection.

entry = dataset.get_entry(entry_name=dataset.entry_names[-1])

We can view the entry in detail by looking at the dictionary representation.

entry.dict()

{'name': 'cs(=o)(=o)ccncc1ccc(o1)c2ccc3c(c2)c(ncn3)nc4ccc(c(c4)cl)occ5cccc(c5)f-99',
 'initial_molecule': {'schema_name': 'qcschema_molecule',
  'schema_version': 2,
  'validated': True,
  'symbols': array(['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',
         'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C',
         'C', 'C', 'C', 'N', 'N', 'N', 'N', 'O', 'O', 'O', 'O', 'F', 'S',
         'Cl', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
         'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H', 'H',
         'H'], dtype='<U2'),
  'geometry': array([[ 2.89795594e+01, -5.61363284e+00, -1.25287815e+00],
         [ 1.93871961e+00, -2.32111000e-01,  6.33108960e-01],
         [ 2.68259179e+01, -4.17201102e+00, -7.72020570e-01],
         [ 3.00076085e+00,  1.52774996e+00,  2.29260082e+00],
         [ 1.49570525e+01, -1.11862575e+00,  1.34408755e+00],
         [ 1.75606324e+01, -1.34772255e+00,  1.00269997e+00],
         [ 2.88520513e+01, -7.66237506e+00, -2.90660376e+00],
         [-1.45658410e-01, -4.97426507e+00, -2.31821525e+00],
         [ 3.61899100e-02, -6.76496484e+00, -4.29656036e+00],
         [ 5.84179420e+00, -2.55574391e+00,  6.16566210e-01],
         [ 2.44170261e+01, -6.82796240e+00, -3.59833171e+00],
         [ 1.47074969e+01, -5.54975040e+00,  2.41523984e+00],
         [ 8.88178633e+00,  2.63147751e+00,  5.47147212e+00],
         [ 6.89608538e+00, -7.77978130e-01,  2.28537091e+00],
         [ 3.36012318e+00, -2.27643411e+00, -2.05974190e-01],
         [ 2.45446332e+01, -4.77917198e+00, -1.94467087e+00],
         [ 5.47938551e+00,  1.29451410e+00,  3.14641803e+00],
         [ 1.35304415e+01, -3.21962903e+00,  2.05047930e+00],
         [ 1.87377014e+01, -3.67784426e+00,  1.36746581e+00],
         [ 2.65707774e+01, -8.26954593e+00, -4.07925220e+00],
         [ 1.73111436e+01, -5.77891513e+00,  2.07375878e+00],
         [ 2.26685911e+00, -4.09354036e+00, -1.91777351e+00],
         [ 2.54682332e+00, -6.85047933e+00, -4.96434856e+00],
         [ 9.36421367e+00, -9.47290490e-01,  3.17955455e+00],
         [ 4.24499629e+00, -9.67072630e+00, -1.91013351e+01],
         [ 2.22291900e+01, -3.22681771e+00, -1.42596726e+00],
         [ 3.96002561e+00, -8.30889479e+00, -6.85900096e+00],
         [ 4.53186197e+00, -9.37210543e+00, -1.12315297e+01],
         [ 3.74896337e+00, -8.84306732e+00, -1.39373610e+01],
         [ 6.46872465e+00,  3.00571708e+00,  4.74281203e+00],
         [ 1.04136881e+01,  7.31465300e-01,  4.77749054e+00],
         [ 1.09100724e+01, -2.98799013e+00,  2.39376886e+00],
         [ 3.09195245e+00, -7.87277816e+00, -9.41970199e+00],
         [ 8.13355678e+00, -9.67381434e+00, -1.60552111e+01],
         [ 4.99185979e+00, -1.32529008e+01, -1.57833896e+01],
         [ 3.91927510e+00, -5.22754616e+00, -3.52328384e+00],
         [ 2.12846153e+01, -3.84659395e+00,  1.02017540e+00],
         [ 2.64477766e+01, -1.02372453e+01, -5.66668056e+00],
         [ 5.54591265e+00, -1.05877799e+01, -1.61420006e+01],
         [ 1.87184396e+01, -8.68283521e+00,  2.53736164e+00],
         [ 3.07562580e+01, -5.14049891e+00, -3.40496340e-01],
         [-4.65003000e-03,  2.67554700e-02,  1.66408900e-02],
         [ 2.69431896e+01, -2.57930069e+00,  5.19680380e-01],
         [ 1.84816744e+00,  3.11097314e+00,  2.92148844e+00],
         [ 1.40590732e+01,  7.06064040e-01,  1.05468507e+00],
         [ 1.86602796e+01,  2.96905580e-01,  4.52302520e-01],
         [ 3.05285053e+01, -8.78494061e+00, -3.27991363e+00],
         [-1.83066261e+00, -4.40414485e+00, -1.31444107e+00],
         [-1.47796825e+00, -7.85402013e+00, -5.12826389e+00],
         [ 6.94015282e+00, -4.16145113e+00, -5.01537200e-02],
         [ 2.26487725e+01, -7.31482942e+00, -4.52177921e+00],
         [ 1.35866831e+01, -7.18299521e+00,  2.96635289e+00],
         [ 9.66759771e+00,  4.01679529e+00,  6.75738125e+00],
         [ 5.25555885e+00, -1.06711597e+01, -2.05945160e+01],
         [ 2.24609368e+00, -1.01756755e+01, -1.91179422e+01],
         [ 4.47789379e+00, -7.63297605e+00, -1.93109470e+01],
         [ 2.26644983e+01, -1.20419194e+00, -1.48882911e+00],
         [ 2.07621173e+01, -3.60554472e+00, -2.83618480e+00],
         [ 6.00251831e+00, -7.85696220e+00, -6.84065293e+00],
         [ 3.83567181e+00, -1.03722487e+01, -6.53978249e+00],
         [ 4.27006061e+00, -1.13641289e+01, -1.07410105e+01],
         [ 6.52448646e+00, -8.89050259e+00, -1.09607687e+01],
         [ 4.04834476e+00, -6.84946966e+00, -1.43964509e+01],
         [ 1.76271456e+00, -9.31303167e+00, -1.42681283e+01],
         [ 9.92112705e+00, -4.58168017e+00,  2.00994347e+00],
         [ 3.28745938e+00, -6.00243631e+00, -9.86032950e+00]]),
  'name': 'C29ClFH26N4O4S',
  'identifiers': {'molecule_hash': '58aaa33a5e7d9445e4276a015de257c3d6bde558',
   'molecular_formula': 'C29ClFH26N4O4S'},
  'molecular_charge': 0.0,
  'molecular_multiplicity': 1,
  'masses': array([12.        , 12.        , 12.        , 12.        , 12.        ,
         12.        , 12.        , 12.        , 12.        , 12.        ,
         12.        , 12.        , 12.        , 12.        , 12.        ,
         12.        , 12.        , 12.        , 12.        , 12.        ,
         12.        , 12.        , 12.        , 12.        , 12.        ,
         12.        , 12.        , 12.        , 12.        , 14.003074  ,
         14.003074  , 14.003074  , 14.003074  , 15.99491462, 15.99491462,
         15.99491462, 15.99491462, 18.99840316, 31.97207117, 34.96885268,
          1.00782503,  1.00782503,  1.00782503,  1.00782503,  1.00782503,
          1.00782503,  1.00782503,  1.00782503,  1.00782503,  1.00782503,
          1.00782503,  1.00782503,  1.00782503,  1.00782503,  1.00782503,
          1.00782503,  1.00782503,  1.00782503,  1.00782503,  1.00782503,
          1.00782503,  1.00782503,  1.00782503,  1.00782503,  1.00782503,
          1.00782503]),
  'real': array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True]),
  'atom_labels': array(['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
         '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
         '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',
         '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
        dtype='<U1'),
  'atomic_numbers': array([ 6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,
          6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  7,  7,  7,  7,  8,
          8,  8,  8,  9, 16, 17,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
          1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
        dtype=int16),
  'mass_numbers': array([12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,
         12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 14, 14, 14, 14, 16,
         16, 16, 16, 19, 32, 35,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
          1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
        dtype=int16),
  'connectivity': [(0, 2, 2.0),
   (0, 6, 1.0),
   (0, 40, 1.0),
   (1, 3, 2.0),
   (1, 14, 1.0),
   (1, 41, 1.0),
   (2, 15, 1.0),
   (2, 42, 1.0),
   (3, 16, 1.0),
   (3, 43, 1.0),
   (4, 5, 2.0),
   (4, 17, 1.0),
   (4, 44, 1.0),
   (5, 18, 1.0),
   (5, 45, 1.0),
   (6, 19, 2.0),
   (6, 46, 1.0),
   (7, 8, 1.0),
   (7, 21, 2.0),
   (7, 47, 1.0),
   (8, 22, 2.0),
   (8, 48, 1.0),
   (9, 13, 1.0),
   (9, 14, 2.0),
   (9, 49, 1.0),
   (10, 15, 2.0),
   (10, 19, 1.0),
   (10, 50, 1.0),
   (11, 17, 2.0),
   (11, 20, 1.0),
   (11, 51, 1.0),
   (12, 29, 1.0),
   (12, 30, 2.0),
   (12, 52, 1.0),
   (13, 16, 1.0),
   (13, 23, 2.0),
   (14, 21, 1.0),
   (15, 25, 1.0),
   (16, 29, 2.0),
   (17, 31, 1.0),
   (18, 20, 2.0),
   (18, 36, 1.0),
   (19, 37, 1.0),
   (20, 39, 1.0),
   (21, 35, 1.0),
   (22, 26, 1.0),
   (22, 35, 1.0),
   (23, 30, 1.0),
   (23, 31, 1.0),
   (24, 38, 1.0),
   (24, 53, 1.0),
   (24, 54, 1.0),
   (24, 55, 1.0),
   (25, 36, 1.0),
   (25, 56, 1.0),
   (25, 57, 1.0),
   (26, 32, 1.0),
   (26, 58, 1.0),
   (26, 59, 1.0),
   (27, 28, 1.0),
   (27, 32, 1.0),
   (27, 60, 1.0),
   (27, 61, 1.0),
   (28, 38, 1.0),
   (28, 62, 1.0),
   (28, 63, 1.0),
   (31, 64, 1.0),
   (32, 65, 1.0),
   (33, 38, 2.0),
   (34, 38, 2.0)],
  'fragments': [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
          17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
          34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
          51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
         dtype=int32)],
  'fragment_charges': [0.0],
  'fragment_multiplicities': [1],
  'fix_com': True,
  'fix_orientation': True,
  'provenance': {'creator': 'QCElemental',
   'version': 'v0.11.1',
   'routine': 'qcelemental.molparse.from_schema'},
  'id': 9590661,
  'extras': {}},
 'additional_keywords': {},
 'attributes': {'inchi_key': 'BCFGMOOMADDAQU-UHFFFAOYSA-N',
  'provenance': 'cmiles_v0.1.5+1.gdbd63e8_openeye_2019.Apr.2',
  'standard_inchi': 'InChI=1S/C29H26ClFN4O4S/c1-40(36,37)12-11-32-16-23-7-10-27(39-23)20-5-8-26-24(14-20)29(34-18-33-26)35-22-6-9-28(25(30)15-22)38-17-19-3-2-4-21(31)13-19/h2-10,13-15,18,32H,11-12,16-17H2,1H3,(H,33,34,35)',
  'canonical_smiles': 'CS(=O)(=O)CCNCc1ccc(o1)c2ccc3c(c2)c(ncn3)Nc4ccc(c(c4)Cl)OCc5cccc(c5)F',
  'molecular_formula': 'C29H26ClFN4O4S',
  'canonical_isomeric_smiles': 'CS(=O)(=O)CCNCc1ccc(o1)c2ccc3c(c2)c(ncn3)Nc4ccc(c(c4)Cl)OCc5cccc(c5)F',
  'unique_protomer_representation': 'CS(=O)(=O)CCNCc1ccc(o1)c2ccc3c(c2)c(ncn3)Nc4ccc(c(c4)Cl)OCc5cccc(c5)F',
  'unique_tautomer_representation': 'CS(=O)(=O)CCNCc1ccc(-c2ccc3[nH]cnc(=Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2)o1',
  'canonical_explicit_hydrogen_smiles': '[H]c1c(c(c(c(c1[H])F)[H])C([H])([H])Oc2c(c(c(c(c2Cl)[H])N([H])c3c4c(c(c(c(c4nc(n3)[H])[H])[H])c5c(c(c(o5)C([H])([H])N([H])C([H])([H])C([H])([H])S(=O)(=O)C([H])([H])[H])[H])[H])[H])[H])[H])[H]',
  'canonical_isomeric_explicit_hydrogen_smiles': '[H]c1c(c(c(c(c1[H])F)[H])C([H])([H])Oc2c(c(c(c(c2Cl)[H])N([H])c3c4c(c(c(c(c4nc(n3)[H])[H])[H])c5c(c(c(o5)C([H])([H])N([H])C([H])([H])C([H])([H])S(=O)(=O)C([H])([H])[H])[H])[H])[H])[H])[H])[H]',
  'canonical_isomeric_explicit_hydrogen_mapped_smiles': '[H:41][c:1]1[c:3]([c:16]([c:11]([c:20]([c:7]1[H:47])[F:38])[H:51])[C:26]([H:57])([H:58])[O:37][c:19]2[c:6]([c:5]([c:18]([c:12]([c:21]2[Cl:40])[H:52])[N:32]([H:65])[c:24]3[c:14]4[c:10]([c:15]([c:2]([c:4]([c:17]4[n:30][c:13]([n:31]3)[H:53])[H:44])[H:42])[c:22]5[c:8]([c:9]([c:23]([o:36]5)[C:27]([H:59])([H:60])[N:33]([H:66])[C:28]([H:61])([H:62])[C:29]([H:63])([H:64])[S:39](=[O:34])(=[O:35])[C:25]([H:54])([H:55])[H:56])[H:49])[H:48])[H:50])[H:45])[H:46])[H:43]'},
 'comment': None}
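In the dictionary above, `connectivity` is a list of `(atom_i, atom_j, bond_order)` triples and `fragments` groups atom indices into connected pieces. As a rough illustration of how the two relate, the standard-library sketch below recovers fragments from a connectivity list with a breadth-first search. The function name `find_fragments` and the five-atom toy system are assumptions made for this example and are not part of QCPortal or the toolkit.

```python
from collections import deque


def find_fragments(n_atoms, connectivity):
    """Group atom indices into connected fragments (illustrative helper)."""
    neighbours = {index: set() for index in range(n_atoms)}
    for atom_i, atom_j, _bond_order in connectivity:
        neighbours[atom_i].add(atom_j)
        neighbours[atom_j].add(atom_i)

    seen = set()
    fragments = []
    for start in range(n_atoms):
        if start in seen:
            continue
        # Breadth-first search over everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        fragment = []
        while queue:
            atom = queue.popleft()
            fragment.append(atom)
            for other in sorted(neighbours[atom] - seen):
                seen.add(other)
                queue.append(other)
        fragments.append(sorted(fragment))
    return fragments


# Toy system: a three-atom molecule (0, 1, 2) plus a two-atom molecule (3, 4).
toy_connectivity = [(0, 1, 1.0), (0, 2, 1.0), (3, 4, 1.0)]
fragments = find_fragments(5, toy_connectivity)
print(fragments)
```

For the entry shown above, the same procedure would be expected to return a single fragment containing all 66 atoms, matching the one array listed under `fragments`.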

Now we can make a molecule using a few different input options.

# first make a molecule using this entry object
molecule_from_entry = Molecule.from_qcschema(entry)

# we could have also used the dictionary representation of the object
molecule_from_dict = Molecule.from_qcschema(entry.dict())

assert molecule_from_entry == molecule_from_dict

molecule = molecule_from_entry

# first let's get the initial molecule from the database
initial_molecule = client.get_molecules(entry.initial_molecule.id)

# note that this molecule uses an object model from QCArchive, _not_ the toolkit
print(type(initial_molecule))

# check that the toolkit molecule's atoms follow the ordering used in the database
# by printing out the atomic numbers of both objects in order

for atoms in zip(molecule.atoms, initial_molecule.atomic_numbers):
    print(atoms[0].atomic_number, atoms[1])
    assert atoms[0].atomic_number == atoms[1]

# can compare other things, too
print(molecule.to_hill_formula(), initial_molecule.get_molecular_formula())

# QCArchive molecules don't store all information the
# toolkit needs, like bond orders and formal charges;
# that's why there is a Molecule.from_qcschema() method at all

<class 'qcelemental.models.molecule.Molecule'>
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
7
7
7
7
8
8
8
8
9
16
17
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
C29H26ClFN4O4S C29ClFH26N4O4S
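The two formulas on the last output line describe the same composition but order the elements differently: the toolkit string follows the Hill convention (carbon, then hydrogen, then the rest alphabetically), while the QCArchive string looks like plain alphabetical ordering. The sketch below reproduces both orderings from a list of element symbols using only the standard library. The function names are illustrative, and the alphabetical rule is inferred from the output shown here rather than taken from the QCElemental documentation.

```python
from collections import Counter


def _format(counts, order):
    # Omit the count when it is 1, e.g. "Cl" rather than "Cl1".
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order)


def hill_formula(symbols):
    """Hill order: C first, then H, then everything else alphabetically."""
    counts = Counter(symbols)
    if "C" in counts:
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # Without carbon, the Hill convention is fully alphabetical.
        order = sorted(counts)
    return _format(counts, order)


def alphabetical_formula(symbols):
    """All elements sorted alphabetically by symbol."""
    counts = Counter(symbols)
    return _format(counts, sorted(counts))


# Same composition as the entry above: 29 C, 4 N, 4 O, F, S, Cl, 26 H.
symbols = ["C"] * 29 + ["N"] * 4 + ["O"] * 4 + ["F", "S", "Cl"] + ["H"] * 26
print(hill_formula(symbols), alphabetical_formula(symbols))
```

When comparing formulas across the two libraries, comparing element counts is therefore more robust than comparing the strings directly.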

# we can also compare the graph representations of the molecules to make sure they are in the same order
import networkx as nx

# make a graph of the initial molecule using networkx and the data in the record
initial_network = nx.Graph()
for index, atomic_number in enumerate(initial_molecule.atomic_numbers):
    initial_network.add_node(index, atomic_number=atomic_number)

for bond in initial_molecule.connectivity:
    initial_network.add_edge(*bond[:2])

# now we can use the toolkit's isomorphism check to get the atom mapping
isomorphic, atom_map = Molecule.are_isomorphic(
    molecule,
    initial_network,
    return_atom_map=True,
    aromatic_matching=False,
    formal_charge_matching=False,
    bond_order_matching=False,
    bond_stereochemistry_matching=False,
    atom_stereochemistry_matching=False,
)

# we can check if the graph was found to be isomorphic and whether or not the
# atom mappings are in the same order
assert isomorphic
print(atom_map)
for index1, index2 in atom_map.items():
    assert index1 == index2

{0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20, 21: 21, 22: 22, 23: 23, 24: 24, 25: 25, 26: 26, 27: 27, 28: 28, 29: 29, 30: 30, 31: 31, 32: 32, 33: 33, 34: 34, 35: 35, 36: 36, 37: 37, 38: 38, 39: 39, 40: 40, 41: 41, 42: 42, 43: 43, 44: 44, 45: 45, 46: 46, 47: 47, 48: 48, 49: 49, 50: 50, 51: 51, 52: 52, 53: 53, 54: 54, 55: 55, 56: 56, 57: 57, 58: 58, 59: 59, 60: 60, 61: 61, 62: 62, 63: 63, 64: 64, 65: 65}
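An identity mapping like the one printed above means the two graphs agree atom for atom. To show what such a mapping has to satisfy in general, the sketch below checks that a proposed atom map preserves both elements and bonds between two small graphs. It is a standard-library illustration only: the helper `map_preserves_graph` and the toy water graphs are assumptions for this example, and it verifies a given map rather than searching for one as `Molecule.are_isomorphic` does.

```python
def map_preserves_graph(elements_a, bonds_a, elements_b, bonds_b, atom_map):
    """Check that atom_map sends graph A onto graph B (illustrative helper)."""
    # The map must be a one-to-one pairing covering every atom on both sides.
    if sorted(atom_map) != list(range(len(elements_a))):
        return False
    if sorted(atom_map.values()) != list(range(len(elements_b))):
        return False
    # Mapped atoms must be the same element.
    if any(elements_a[a] != elements_b[b] for a, b in atom_map.items()):
        return False
    # Every bond in A must land on a bond in B, and no bond in B may be missed.
    mapped = {frozenset((atom_map[i], atom_map[j])) for i, j in bonds_a}
    target = {frozenset(bond) for bond in bonds_b}
    return mapped == target


# Water stored in two different atom orders (atomic numbers, then bonds).
elements_a, bonds_a = [8, 1, 1], [(0, 1), (0, 2)]
elements_b, bonds_b = [1, 8, 1], [(1, 0), (1, 2)]

reordering = {0: 1, 1: 0, 2: 2}
identity = {0: 0, 1: 1, 2: 2}

print(map_preserves_graph(elements_a, bonds_a, elements_b, bonds_b, reordering))
print(map_preserves_graph(elements_a, bonds_a, elements_b, bonds_b, identity))
```

In the toy case the graphs are isomorphic but not identically ordered, so only the reordering map passes. The notebook's stricter requirement, that the map be the identity, is what confirms the toolkit molecule and the QCArchive molecule share the same atom order.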

mol = Molecule.from_qcschema(entry)
mol

# OpenFF Toolkit `Molecule` objects can be converted back into QCArchive molecules,
# as long as there are conformer(s)

qc_molecule = molecule.to_qcschema()

qc_molecule
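One detail to keep in mind when inspecting the converted object: QCSchema molecules store their `geometry` in bohr, which is why the coordinates in the entry dictionary earlier look larger than typical angstrom values. The short sketch below shows the unit conversion by hand, assuming input coordinates in angstrom. The helper name is hypothetical; `to_qcschema()` takes care of units itself, so this is only meant to help read the numbers.

```python
BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in angstrom


def angstrom_to_bohr(coordinates):
    """Convert a list of [x, y, z] rows from angstrom to bohr (illustrative)."""
    return [[value / BOHR_IN_ANGSTROM for value in xyz] for xyz in coordinates]


# Two points 1.5 angstrom apart along x.
coordinates_angstrom = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
coordinates_bohr = angstrom_to_bohr(coordinates_angstrom)
print(coordinates_bohr)
```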

This transformation unlocks functionality of QCEngine (computing energies, gradients, hessians, etc. with a variety of different methods).

Here we compute the energy using RDKit. Only run this cell if QCEngine is installed.

import qcengine

# set up the RDKit task
rdkit_task = {
    "schema_name": "qcschema_input",
    "schema_version": 2,
    "molecule": qc_molecule,
    "driver": "energy",
    "model": {"method": "uff", "basis": None},
    "keywords": {"scf_type": "df"},
}

# now let's compute the energy using QCEngine and RDKit and print the result
result = qcengine.compute(rdkit_task, "rdkit")

# note the result is in atomic units (hartree)
print(result.return_result)

0.05930479138457709
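Because force-field energies are more often quoted in kcal/mol or kJ/mol, it can help to convert the value above. The sketch below does this with CODATA 2018 conversion factors; the constant and function names are choices made for this example, and in practice a units library could be used instead.

```python
# Conversion factors from hartree (CODATA 2018 values).
HARTREE_TO_KCAL_PER_MOL = 627.5094740631
HARTREE_TO_KJ_PER_MOL = 2625.4996394799


def hartree_to_kcal_per_mol(energy):
    return energy * HARTREE_TO_KCAL_PER_MOL


def hartree_to_kj_per_mol(energy):
    return energy * HARTREE_TO_KJ_PER_MOL


# The UFF energy printed above, in hartree.
energy_hartree = 0.05930479138457709

energy_kcal = hartree_to_kcal_per_mol(energy_hartree)
energy_kj = hartree_to_kj_per_mol(energy_hartree)
print(f"{energy_kcal:.2f} kcal/mol, {energy_kj:.2f} kJ/mol")
```

The result is roughly 37.2 kcal/mol.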