class openff.evaluator.datasets.curation.components.filtering.FilterBySubstances

A component which filters the data set so that it only contains properties measured for particular substances.

This component is similar to filter_by_smiles; however, here the full compositions of the substances are defined explicitly, rather than individual SMILES strings that should be either included or excluded.
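To make the distinction concrete, here is a minimal, self-contained sketch in plain Python. It assumes a hypothetical toy representation in which each measurement is described only by a tuple of component SMILES (this is not the openff-evaluator data frame layout). The helper names `filter_by_each_smiles` and `filter_by_full_substance` are hypothetical, and the per-SMILES helper is a simplified stand-in rather than the library's filter_by_smiles logic.

```python
from typing import Iterable, List, Tuple

Substance = Tuple[str, ...]

# Hypothetical toy data: each measurement is described only by its component SMILES.
measurements: List[Substance] = [
    ("CCO",),       # pure ethanol
    ("CCO", "O"),   # aqueous ethanol
    ("CCO", "CO"),  # ethanol + methanol
    ("O",),         # pure water
]


def filter_by_each_smiles(rows: List[Substance], smiles_to_include: Iterable[str]) -> List[Substance]:
    """Simplified per-SMILES filter: keep a row if every component is allowed."""
    allowed = set(smiles_to_include)
    return [row for row in rows if all(smiles in allowed for smiles in row)]


def filter_by_full_substance(rows: List[Substance], substances_to_include: Iterable[Substance]) -> List[Substance]:
    """Keep a row only if its complete composition matches a listed substance.

    Compositions are compared as sorted tuples so that component order is
    ignored; this is an assumption made for the sketch.
    """
    wanted = {tuple(sorted(substance)) for substance in substances_to_include}
    return [row for row in rows if tuple(sorted(row)) in wanted]
```

Allowing the SMILES `"CCO"` and `"O"` individually keeps pure ethanol, aqueous ethanol and pure water alike, whereas listing the substance `("CCO", "O")` keeps only the aqueous ethanol mixture.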


To filter the data set to only include measurements for pure methanol, pure benzene or an aqueous ethanol mix:

>>> from openff.evaluator.datasets.curation.components.filtering import FilterBySubstancesSchema
>>> schema = FilterBySubstancesSchema(
...     substances_to_include=[
...         ('CO',),
...         ('C1=CC=CC=C1',),
...         ('CCO', 'O')
...     ]
... )
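As a rough illustration of what the inclusion schema above selects, the sketch below applies the same list of substances to a hypothetical list of (substance, property) records rather than a real data set. The record layout and the `normalize` helper are hypothetical, and comparing sorted tuples (so that component order does not matter) is an assumption of this sketch, not documented behaviour.

```python
from typing import List, Tuple

Substance = Tuple[str, ...]
Record = Tuple[Substance, str]

# Hypothetical toy records: (component SMILES, measured property).
records: List[Record] = [
    (("CO",), "density"),
    (("C1=CC=CC=C1",), "density"),
    (("O", "CCO"), "excess molar volume"),
    (("CCO",), "density"),
    (("CO", "O"), "enthalpy of mixing"),
]

substances_to_include = [("CO",), ("C1=CC=CC=C1",), ("CCO", "O")]


def normalize(substance: Substance) -> Substance:
    """Order-independent key for a substance (assumption of this sketch)."""
    return tuple(sorted(substance))


allowed = {normalize(substance) for substance in substances_to_include}

# Keep only records whose full composition matches one of the included substances.
kept = [record for record in records if normalize(record[0]) in allowed]
```

Pure ethanol and aqueous methanol are dropped: although they share components with the listed substances, their full compositions are not listed.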

To filter out measurements made for an aqueous mix of benzene:

>>> schema = FilterBySubstancesSchema(
...     substances_to_exclude=[('O', 'C1=CC=CC=C1')]
... )
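A minimal sketch of the exclusion logic on hypothetical toy data, assuming (for illustration only, not as documented behaviour) that compositions are compared regardless of component order. Only the exact benzene + water mixture is removed; measurements on the pure components are kept.

```python
from typing import List, Tuple

Substance = Tuple[str, ...]

# Hypothetical toy data: the component SMILES of each measurement.
measured_substances: List[Substance] = [
    ("C1=CC=CC=C1", "O"),  # aqueous benzene
    ("O", "C1=CC=CC=C1"),  # the same mixture, components listed in the other order
    ("C1=CC=CC=C1",),      # pure benzene
    ("O",),                # pure water
]

substances_to_exclude = [("O", "C1=CC=CC=C1")]


def normalize(substance: Substance) -> Substance:
    """Order-independent key for a substance (assumption of this sketch)."""
    return tuple(sorted(substance))


excluded = {normalize(substance) for substance in substances_to_exclude}

# Drop every measurement whose full composition matches an excluded substance.
kept = [substance for substance in measured_substances if normalize(substance) not in excluded]
```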




classmethod apply(data_set, schema, n_processes=1)

Apply this curation component to a data set.

Parameters

  • data_set – The data frame to apply the component to.

  • schema – The schema which defines how this component should be applied.

  • n_processes – The number of processes that this component is allowed to parallelize across.


Returns

The data set which has had the component applied to it.
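Because n_processes only controls how the work is distributed, a row-wise filter like this one should return the same rows whatever its value. The sketch below illustrates that idea with hypothetical helpers (`split_into_chunks`, `filter_chunk`, `filter_in_chunks`) operating on tuples of SMILES; it is not how openff-evaluator implements its parallelism. It uses the built-in `map` by default, and `map_fn` is a hypothetical hook where, for example, the `map` method of a `multiprocessing.Pool` could be passed instead.

```python
from functools import partial
from typing import Callable, FrozenSet, Iterable, List, Sequence, Tuple

Substance = Tuple[str, ...]


def split_into_chunks(rows: Sequence[Substance], n_chunks: int) -> List[List[Substance]]:
    """Split rows into contiguous, order-preserving chunks of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(rows)))
    size, remainder = divmod(len(rows), n_chunks)
    chunks: List[List[Substance]] = []
    start = 0
    for i in range(n_chunks):
        # The first `remainder` chunks each take one extra row.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(list(rows[start:end]))
        start = end
    return chunks


def filter_chunk(chunk: List[Substance], allowed: FrozenSet[Substance]) -> List[Substance]:
    """Keep rows whose order-independent composition is in `allowed`."""
    return [row for row in chunk if tuple(sorted(row)) in allowed]


def filter_in_chunks(
    rows: Sequence[Substance],
    substances_to_include: Iterable[Substance],
    n_processes: int = 1,
    map_fn: Callable = map,
) -> List[Substance]:
    """Filter rows chunk by chunk; the result does not depend on n_processes."""
    allowed = frozenset(tuple(sorted(s)) for s in substances_to_include)
    chunks = split_into_chunks(rows, n_processes)
    results = map_fn(partial(filter_chunk, allowed=allowed), chunks)
    # Concatenate the per-chunk results in their original order.
    return [row for chunk in results for row in chunk]
```

Calling `filter_in_chunks(rows, include, n_processes=4)` returns the same list as `n_processes=1`; only the way the rows are split up changes.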