Bespoke executor

The bespoke executor is the main workhorse within BespokeFit. Its role is to ingest and run bespoke fitting workflows: it coordinates and launches the individual steps of each workflow (e.g. generating QC reference data) without any user intervention, and can spread the work across multiple CPUs or even multiple nodes in a cluster.

The executor operates by splitting the full bespoke workflow into simplified stages:

  1. Fragmentation: the input molecule is fragmented into smaller pieces in a way that preserves key features of the input molecule.

  2. QC generation: any bespoke QC data, for example 1D torsion scans of each rotatable bond in the input molecule, is generated using the smaller fragments for computational efficiency.

  3. Optimization: the reference data, including any bespoke QC data, is fed into the optimizer (e.g. ForceBalance) specified by the workflow schema in order to train the bespoke parameters.

Each stage has its own set of ‘workers’ available to it, making it easy to devote more compute where it is needed. Each worker is a process that is assigned a set of local resources and can be given a specific task; for example, a worker may perform a 1D torsion scan on two CPU cores.

Note

Workers and task scheduling within BespokeFit are handled behind the scenes by the Celery framework in combination with Redis, which handles any necessary storage of the inputs and outputs of each stage.

These workers are created and managed by the executor itself, so most users will not need to worry about their details unless they want to parallelize fits across multiple nodes in a cluster. The only choices a user needs to make are how many workers to spawn for each stage and how many compute resources each type of worker should be allowed to use.

There are two main ways to launch a bespoke executor: using the executor command-line interface or using the Python API.

Using the CLI

A dedicated bespoke executor can be launched using the launch command

openff-bespoke executor launch --directory            "bespoke-executor" \
                               --n-fragmenter-workers 1                  \
                               --n-optimizer-workers  1                  \
                               --n-qc-compute-workers 1

By default, the executor will create a single worker for each stage, and will allow each worker to access all of the resources on the machine it is running on. When spawning multiple workers for a stage it is recommended to specify resource limits to avoid over-subscription. For example, Psi4 may provide better performance running two QC calculations in parallel with 8 cores each than running one with 16:

openff-bespoke executor launch --directory            "bespoke-executor" \
                               --n-fragmenter-workers 1                  \
                               --n-optimizer-workers  1                  \
                               --n-qc-compute-workers 2                  \
                               --qc-compute-n-cores   8                  \
                               --qc-compute-max-mem   2.5

Here we request two workers, each with access to eight CPUs and 2.5 GB of memory per CPU (i.e. 16 CPUs in total and 40 GB of memory). The memory limit is not strictly enforced by the executor, and is instead passed to the underlying QC engine via the QCEngine interface. Note that if multiple molecules have been submitted to the executor, molecules at different stages may run in parallel.

See the quick start guide for details on submitting jobs to a running bespoke executor.
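
As a rough sketch, submitting a molecule from the command line might look something like the following; the submit subcommand and the --file and --workflow options shown here are assumptions, so consult the quick start guide for the exact invocation:

openff-bespoke executor submit --file     "input.sdf" \
                               --workflow "default"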

Distributed Workers

BespokeFit is able to make use of distributed resources across HPC clusters, or multiple machines on the same network, via the Celery framework which underpins the workers. In this example we assume that the workers and the bespoke executor are on different machines. First, gather the IP address of the machine that will run the bespoke executor:

ifconfig -a

A bespoke executor with no local workers can then be launched using the launch command

openff-bespoke executor launch --directory            "bespoke-executor" \
                               --n-fragmenter-workers 0                  \
                               --n-optimizer-workers  0                  \
                               --n-qc-compute-workers 0

We now need to provide the address of the executor in order to connect the remote workers. BespokeFit has a number of runtime settings which can be configured via environment variables. The executor's address should be assigned to the BEFLOW_REDIS_ADDRESS variable in the environment the workers will be launched from:

export BEFLOW_REDIS_ADDRESS="address"
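
If the executor's Redis instance was configured with a non-default port or password, the matching variables (listed under ‘Configuring from the environment’ below) would presumably also need to be set to the same values on the worker machines, for example:

export BEFLOW_REDIS_PORT=6363
export BEFLOW_REDIS_PASSWORD="bespokefit-server-1"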

Bespoke workers of a given type can then be launched using the launch-worker command; for example, the following would start a fragmentation worker:

openff-bespoke launch-worker --worker-type fragmenter

Provided the worker starts successfully, a log file called celery-fragmenter.log will be generated; check it to make sure the worker has connected to the executor.

Note

The launch-worker command does not allow the worker resources to be configured directly; it is recommended to use the corresponding environment variable settings instead, as sketched below.
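
For example, a QC compute worker limited to eight cores and 2.5 GB of memory per core might be launched along the following lines; the variable names are taken from the settings listed at the end of this page, and the qc-compute worker type name is an assumption:

export BEFLOW_REDIS_ADDRESS="address"
export BEFLOW_QC_COMPUTE_WORKER_N_CORES=8
export BEFLOW_QC_COMPUTE_WORKER_MAX_MEM=2.5

openff-bespoke launch-worker --worker-type qc-compute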

The QC Cache

BespokeFit makes extensive use of caching to speed up the parameterization process. Generating the training data is currently the slowest part of the workflow when running DFT calculations at a high level of theory. To further speed up the process we provide an interface to seed the cache with results from QCArchive, which contains hundreds of torsion drives. The cache command allows you to select a dataset and translate it into local copies of the records, meaning the lookup is performed locally and your molecule data is never shared with QCArchive.

First, start a bespoke executor and specify the location of the working directory which will store the cache:

openff-bespoke executor launch --directory bespoke

While this is running, run the cache update from another terminal using any of the available datasets:

openff-bespoke cache update --no-launch-redis \
                            --qcf-dataset "OpenFF-benchmark-ligand-fragments-v2.0" \
                            --qcf-address "https://api.qcarchive.molssi.org:443/"

Using the API

A bespoke executor can be created via the Python API through the BespokeExecutor class:

from openff.bespokefit.executor import BespokeExecutor, BespokeWorkerConfig

executor = BespokeExecutor(
    # Configure the workers that will fragment larger molecules
    n_fragmenter_workers=1,
    fragmenter_worker_config=BespokeWorkerConfig(n_cores=1),
    # Configure the workers that will generate any needed QC
    # reference data such as 1D torsion scans
    n_qc_compute_workers=1,
    qc_compute_worker_config=BespokeWorkerConfig(n_cores="auto"),
    # Configure the workers that will perform the final optimization
    # using the specified engine such as ForceBalance
    n_optimizer_workers=1,
    optimizer_worker_config=BespokeWorkerConfig(n_cores=1),
)

The BespokeWorkerConfig controls how many compute resources are assigned to each worker. In the above example, the fragmenter and optimizer workers are only allowed to use a single core, while the QC compute worker is allowed to use the full set of CPUs available on the machine (n_cores="auto").

The executor itself is a context manager and will not ‘start’ until the context is entered:

from openff.bespokefit.executor import wait_until_complete

# `workflow` is the bespoke optimization schema to be fit (see the sketch below)
with executor:
    task_id = executor.submit(workflow)
    output = wait_until_complete(task_id)

When an executor ‘starts’ it will spin up all the required child processes, including each worker and a Redis instance (unless Redis is disabled).

Within the executor context, bespoke fits can be submitted using the submit() method. As soon as the context manager exits, the executor instance is closed, terminating any running jobs. To ensure the submission is allowed to finish, use the wait_until_complete() helper function, which blocks progress in the script until it can return a result.
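
For completeness, a minimal sketch of preparing the workflow schema submitted above might look like the following, assuming the default BespokeWorkflowFactory settings are appropriate for the molecule being fit:

from openff.toolkit.topology import Molecule

from openff.bespokefit.workflows import BespokeWorkflowFactory

# Build a bespoke optimization schema for a single molecule using the
# default workflow settings; this is the object passed to executor.submit().
factory = BespokeWorkflowFactory()
workflow = factory.optimization_schema_from_molecule(
    molecule=Molecule.from_smiles("CC(=O)NC1=CC=C(C=C1)O"),
)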

Configuring from the environment

Both the CLI and the Python API can be configured via environment variables.

Environment Variables (openff.bespokefit.executor.services.Settings)

The following environment variables may be used to configure the Bespoke Executor. Environment variables are typically set in the shell:

$ BEFLOW_KEEP_TMP_FILES=True openff-bespoke executor ...
env BEFLOW_API_V1_STR: str = '/api/v1'
env BEFLOW_GATEWAY_PORT: int = 8000
env BEFLOW_GATEWAY_LOG_LEVEL: str = 'error'
env BEFLOW_REDIS_ADDRESS: str = 'localhost'
env BEFLOW_REDIS_PORT: int = 6363
env BEFLOW_REDIS_DB: int = 0
env BEFLOW_REDIS_PASSWORD: str = 'bespokefit-server-1'
env BEFLOW_COORDINATOR_MAX_UPDATE_INTERVAL: float = 5.0
env BEFLOW_COORDINATOR_MAX_RUNNING_TASKS: int = 1000
env BEFLOW_FRAGMENTER_WORKER_N_CORES: Union[int, Literal['auto']] = 'auto'
env BEFLOW_FRAGMENTER_WORKER_MAX_MEM: Union[float, Literal['auto']] = 'auto'
env BEFLOW_QC_COMPUTE_WORKER_N_CORES: Union[int, Literal['auto']] = 'auto'
env BEFLOW_QC_COMPUTE_WORKER_MAX_MEM: Union[float, Literal['auto']] = 'auto'
env BEFLOW_QC_COMPUTE_WORKER_N_TASKS: Union[int, Literal['auto']] = 'auto'
env BEFLOW_OPTIMIZER_WORKER_N_CORES: Union[int, Literal['auto']] = 'auto'
env BEFLOW_OPTIMIZER_WORKER_MAX_MEM: Union[float, Literal['auto']] = 'auto'
env BEFLOW_OPTIMIZER_KEEP_FILES: bool = False

Deprecated since version 0.2.1: use BEFLOW_KEEP_TMP_FILES instead

Keep the optimizer’s temporary files.

env BEFLOW_KEEP_TMP_FILES: bool = False

Keep all temporary files.

Temporary files are written to the scratch directory, which can be configured with the --directory CLI argument. By default, a temporary directory is chosen for scratch, so both this environment variable and that CLI argument must be set to preserve temporary files.
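
For example, to keep the temporary files from a fit in a named scratch directory, the two settings might be combined along these lines:

BEFLOW_KEEP_TMP_FILES=True openff-bespoke executor launch --directory "bespoke-executor"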