openff-nagl

A framework for learning classical force field parameters using graph convolutional neural networks.

openff-nagl [OPTIONS] COMMAND [ARGS]...

Options

--n-workers <n_workers>

The number of workers to distribute the labelling across. Use -1 to request one worker per batch.

Default

1

--worker-type <worker_type>

The type of worker to distribute the labelling across.

Default

local

Options

lsf | local

--batch-size <batch_size>

The number of molecules to process at once on a particular worker.

Default

500

--memory <memory>

The amount of memory (GB) to request per LSF queue worker.

Default

3

--walltime <walltime>

The maximum wall-clock hours to request per LSF queue worker.

Default

2

--queue <queue>

The LSF queue to submit workers to.

Default

cpuqueue

--conda-environment <conda_environment>

The conda environment that LSF workers should run using.
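
As a worked illustration of how `--n-workers -1` and `--batch-size` interact, the sketch below counts the batches for a given number of molecules. It assumes molecules are split into consecutive chunks of at most `--batch-size` entries, which the option text implies but does not state. The helper names `n_batches` and `n_workers_requested` are hypothetical and not part of openff-nagl.

```python
import math


def n_batches(n_molecules: int, batch_size: int = 500) -> int:
    """Number of chunks of at most `batch_size` molecules (assumed chunking)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return math.ceil(n_molecules / batch_size)


def n_workers_requested(n_molecules: int, n_workers: int = 1, batch_size: int = 500) -> int:
    """Workers requested: -1 means one per batch, otherwise the given count."""
    if n_workers == -1:
        return n_batches(n_molecules, batch_size)
    return n_workers


print(n_batches(2000))                           # 4
print(n_workers_requested(2000, n_workers=-1))   # 4
print(n_workers_requested(1200, n_workers=-1))   # 3
```

Under this assumption, 1,200 molecules with the default batch size of 500 give three batches (two full, one partial), so `--n-workers -1` would request three workers.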

database

CLIs for interacting with databases, such as storing and retrieving molecules.

openff-nagl database [OPTIONS] COMMAND [ARGS]...

retrieve-molecules

Retrieve molecules from database

openff-nagl database retrieve-molecules [OPTIONS]

Options

--input-file <input_file>

Required The path to the SQLite database (.sqlite) to retrieve the labelled molecules from.

--output-file <output_file>

Required The path to the file to save the molecules in. This should be an SDF file.

--partial-charge-method <partial_charge_method>

The partial charge method used

--bond-order-method <bond_order_method>

The bond order method used

store-molecules

Store pre-computed molecules in a database

openff-nagl database store-molecules [OPTIONS]

Options

--input-file <input_file>

Required The path to the input molecules. This should either be an SDF or a GZipped SDF file.

--output-file <output_file>

Required The path to the SQLite database (.sqlite) to save the labelled molecules in.

--partial-charge-method <partial_charge_method>

The partial charge method used

--bond-order-method <bond_order_method>

The bond order method used

--allow-empty-molecules

Whether to allow molecules with no conformers to be stored with zero coordinates
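
The two database commands can be read as a round trip: `store-molecules` writes an SDF file into a SQLite store, and `retrieve-molecules` writes the store back out as SDF. The sketch below only assembles and prints the two command lines, using the required options documented here. The file names are placeholders, and the helper functions are hypothetical conveniences, not part of openff-nagl.

```python
import shlex
from typing import List


def store_command(sdf_path: str, db_path: str) -> List[str]:
    """Command line for `database store-molecules` with its required options."""
    return [
        "openff-nagl", "database", "store-molecules",
        "--input-file", sdf_path,
        "--output-file", db_path,
    ]


def retrieve_command(db_path: str, sdf_path: str) -> List[str]:
    """Command line for `database retrieve-molecules` with its required options."""
    return [
        "openff-nagl", "database", "retrieve-molecules",
        "--input-file", db_path,
        "--output-file", sdf_path,
    ]


# Placeholder file names; substitute your own paths.
commands = [
    store_command("molecules.sdf", "molecules.sqlite"),
    retrieve_command("molecules.sqlite", "retrieved.sdf"),
]

for cmd in commands:
    # Printed rather than executed, so the sketch runs without openff-nagl installed.
    print(shlex.join(cmd))
```

To actually run a command, the list can be passed to `subprocess.run(cmd, check=True)` in an environment where openff-nagl is installed.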

label-molecules

Label molecules from SMILES

openff-nagl label-molecules [OPTIONS]

Options

--input-file <input_file>

Required The path to the input molecules: SDF or SMILES. SDF input will be converted to SMILES.

--output-file <output_file>

Required The path to the SQLite database (.sqlite) to save the labelled molecules in.

--partial-charge-method <partial_charge_method>

The partial charge methods to compute

Default

--bond-order-method <bond_order_method>

The bond order methods to compute

Default

--openeye-only

Only use OpenEye

Default

False
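
Because the worker options belong to the top-level `openff-nagl` command, they go before the subcommand name, as the usage line `openff-nagl [OPTIONS] COMMAND [ARGS]...` indicates. The sketch below assembles such a command line for `label-molecules` on LSF workers. The conda environment name `nagl-env`, the file names, and the `build_command` helper are all hypothetical; only the option names come from this reference.

```python
import shlex
from typing import Dict, List


def build_command(
    global_options: Dict[str, object],
    subcommand: List[str],
    options: Dict[str, object],
) -> List[str]:
    """Place top-level options before the subcommand and its own options after it."""
    argv = ["openff-nagl"]
    for flag, value in global_options.items():
        argv += [f"--{flag}", str(value)]
    argv += subcommand
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


cmd = build_command(
    {
        "n-workers": -1,                  # one worker per batch
        "worker-type": "lsf",
        "batch-size": 500,
        "memory": 3,                      # GB per LSF worker
        "walltime": 2,                    # hours per LSF worker
        "queue": "cpuqueue",
        "conda-environment": "nagl-env",  # hypothetical environment name
    },
    ["label-molecules"],
    {"input-file": "molecules.sdf", "output-file": "labelled.sqlite"},
)

# Printed rather than executed, so the sketch runs without openff-nagl or LSF.
print(shlex.join(cmd))
```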

plot

CLIs for plotting.

openff-nagl plot [OPTIONS] COMMAND [ARGS]...

similarity

openff-nagl plot similarity [OPTIONS]

Options

--input-file <input_file>

Required The path to the input SQLite store file

--output-file <output_file>

Required The path to the SDF file (.sdf) to save the generated conformers in.

prepare

CLIs for preparing molecule sets, such as filtering out molecules which are too large or contain unwanted chemistries, removing counter-ions, or enumerating possible tautomers / protomers.

openff-nagl prepare [OPTIONS] COMMAND [ARGS]...

calculate-similarity

Calculate similarity between datasets

openff-nagl prepare calculate-similarity [OPTIONS]

Options

--input-file <input_file>

Required The path to the input SQLite store file

--output-file <output_file>

Required The path to the SDF file (.sdf) to save the generated conformers in.

--clean-filenames

If on, only save the base filename instead of the whole path

Default

True

--fingerprint-radius <fingerprint_radius>

Fingerprint radius

Default

3

--skip <skip>

Include every `skip`th molecule from each file

Default

10
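
The effect of `--skip` can be pictured as strided subsampling. The sketch below assumes the selection starts at the first molecule of each file; this reference does not state the starting offset, so treat that as an assumption. The `every_nth` helper and the `mol-*` labels are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def every_nth(items: Sequence[T], skip: int = 10) -> List[T]:
    """Keep every `skip`-th item, starting from the first (assumed offset)."""
    if skip < 1:
        raise ValueError("skip must be a positive integer")
    return list(items[::skip])


molecules = [f"mol-{i}" for i in range(25)]

print(every_nth(molecules))               # ['mol-0', 'mol-10', 'mol-20']
print(len(every_nth(molecules, skip=3)))  # 9
```

With the default of 10, roughly one tenth of each file enters the similarity calculation, which trades coverage for speed.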

generate-conformers

Generate and store conformers

openff-nagl prepare generate-conformers [OPTIONS]

Options

--input-file <input_file>

Required The path to the input molecules. This should either be an SDF or a GZipped SDF file.

--output-file <output_file>

Required The path to the SDF file (.sdf) to save the generated conformers in.

--n-conformer-pool <n_conformer_pool>

The number of conformers to select ELF conformers from

Default

500

--n-conformers <n_conformers>

The maximum number of conformers to select

Default

10

--conformer-rms-cutoff <conformer_rms_cutoff>

The RMS cutoff [Å] to use when generating the conformers used for charge generation.

Default

0.5

partition

Partition molecules into training, validation, test datasets

openff-nagl prepare partition [OPTIONS]

Options

--input-file <input_file>

Required The path to the input molecules (.sqlite)

--input-source-file <input_source_file>

The path to the input source information (JSON) for data

--training-fraction <training_fraction>

Approximate fraction of the molecules to place in the training set.

Default

0.7

--validation-fraction <validation_fraction>

Approximate fraction of the molecules to place in the validation set.

Default

0.2

--test-fraction <test_fraction>

Approximate fraction of the molecules to place in the test set.

Default

0.1

--output-training-file <output_training_file>

Required The path (.sqlite) to save the training set to.

--output-test-file <output_test_file>

Required The path (.sqlite) to save the test set to.

--output-validation-file <output_validation_file>

Required The path (.sqlite) to save the validation set to.

--output-source-file <output_source_file>

The path (.csv) to save the source information for data.

Default

partitioned-data-sources.csv

--clean-filenames

If on, only save the base filename instead of the whole path

Default

True

--seed <seed>

Seed for diverse selection

Default

-1
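
The three fraction options default to 0.7, 0.2 and 0.1, which sum to 1. The sketch below checks that a chosen set of fractions sums to 1 and estimates the resulting set sizes. Whether the CLI enforces this sum is not stated here, and the fractions are approximate, so actual set sizes may differ from these estimates. The `approximate_split_sizes` helper is hypothetical.

```python
import math
from typing import Tuple


def approximate_split_sizes(
    n_molecules: int,
    training: float = 0.7,
    validation: float = 0.2,
    test: float = 0.1,
) -> Tuple[int, int, int]:
    """Estimate (training, validation, test) set sizes from the fractions."""
    total = training + validation + test
    # Compare with a tolerance: 0.7 + 0.2 + 0.1 is not exactly 1.0 in floating point.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"fractions sum to {total}, expected 1.0")
    n_training = round(n_molecules * training)
    n_validation = round(n_molecules * validation)
    # Give the remainder to the test set so the three sizes add up exactly.
    n_test = n_molecules - n_training - n_validation
    return n_training, n_validation, n_test


print(approximate_split_sizes(1000))  # (700, 200, 100)
```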

select

Selects a set of molecules based on the criteria specified by:

[1] Bleiziffer, Patrick, Kay Schaller, and Sereina Riniker. ‘Machine learning of partial charges derived from high-quality quantum-mechanical calculations.’ JCIM 58.3 (2018): 579-590.

openff-nagl prepare select [OPTIONS]

Options

--input-file <input_file>

Required The path to the input molecules (.sqlite)

--output-file <output_file>

Required The path (.sqlite) to save the filtered molecules to.

--n-min-molecules <n_min_molecules>

Minimum number of molecules to select from each atom environment

Default

4

--element-order <element_order>

Element order

Default

S, F, Cl, Br, I, P, O, N, C

--output-source-file <output_source_file>

The path (.json) to save the source information for data.

Default

selected-data-sources.json

--clean-filenames

If on, only save the base filename instead of the whole path

Default

True