ThermoMLDataSet

class openff.evaluator.datasets.thermoml.ThermoMLDataSet[source]

A dataset of physical property measurements created from a ThermoML dataset.

Examples

For example, we can use the DOI 10.1016/j.jct.2005.03.012 as a key for retrieving the dataset from the ThermoML Archive:

>>> dataset = ThermoMLDataSet.from_doi('10.1016/j.jct.2005.03.012')

You can also specify multiple ThermoML Archive keys to create a dataset from multiple ThermoML files:

>>> thermoml_keys = ['10.1021/acs.jced.5b00365', '10.1021/acs.jced.5b00474']
>>> dataset = ThermoMLDataSet.from_doi(*thermoml_keys)

__init__()[source]

Constructs a new ThermoMLDataSet object.

Methods

__init__()

Constructs a new ThermoMLDataSet object.

add_properties(*physical_properties[, validate])

Adds one or more physical properties to the data set.

filter_by_components(number_of_components)

Filter the data set based on the number of components present in the substance the data points were collected for.

filter_by_elements(*allowed_elements)

Filters out those properties which were estimated for compounds which contain elements outside of those defined in allowed_elements.

filter_by_function(filter_function)

Filter the data set using a given filter function.

filter_by_phases(phases)

Filter the data set based on the phase of the property (e.g. liquid).

filter_by_pressure(min_pressure, max_pressure)

Filter the data set based on a minimum and maximum pressure.

filter_by_property_types(*property_types)

Filter the data set based on the type of property (e.g. Density).

filter_by_smiles(*allowed_smiles)

Filters out those properties which were estimated for compounds which do not appear in the allowed SMILES list.

filter_by_temperature(min_temperature, …)

Filter the data set based on a minimum and maximum temperature.

filter_by_uncertainties()

Filters out those properties which don’t have their uncertainties reported.

from_doi(*doi_list)

Load a ThermoML data set from a list of DOIs

from_file(*file_list)

Load a ThermoML data set from a list of files

from_json(file_path)

Create this object from a JSON file.

from_url(*url_list)

Load a ThermoML data set from a list of URLs

from_xml(xml, default_source)

Load a ThermoML data set from an xml object.

json([file_path, format])

Creates a JSON representation of this class.

merge(data_set[, validate])

Merge another data set into the current one.

parse_json(string_contents[, encoding])

Parses a typed json string into the corresponding class structure.

properties_by_substance(substance)

A generator which may be used to loop over all of the properties which were measured for a particular substance.

properties_by_type(property_type)

A generator which may be used to loop over all of the properties of a particular type, e.g. all “Density” properties.

to_pandas()

Converts a PhysicalPropertyDataSet to a pandas.DataFrame object.

validate()

Checks to ensure that all properties within the set are valid physical property objects.

Attributes

properties

A list of all of the properties within this set.

property_types

The types of property within this data set.

registered_properties

sources

The sources from which the properties in this data set were gathered.

substances

The substances for which the properties in this data set were collected.

classmethod from_doi(*doi_list)[source]

Load a ThermoML data set from a list of DOIs

Parameters

doi_list (str) – The list of DOIs to pull data from

Returns

The loaded data set.

Return type

ThermoMLDataSet

classmethod from_url(*url_list)[source]

Load a ThermoML data set from a list of URLs

Parameters

url_list (str) – The list of URLs to pull data from

Returns

The loaded data set.

Return type

ThermoMLDataSet

classmethod from_file(*file_list)[source]

Load a ThermoML data set from a list of files

Parameters

file_list (str) – The list of files to pull data from

Returns

The loaded data set.

Return type

ThermoMLDataSet

add_properties(*physical_properties, validate=True)

Adds one or more physical properties to the data set.

Parameters
  • physical_properties (PhysicalProperty) – The physical properties to add.

  • validate (bool) – Whether to validate the properties before adding them to the set.

filter_by_components(number_of_components)

Filter the data set based on the number of components present in the substance the data points were collected for.

Parameters

number_of_components (int) – The allowed number of components in the mixture.

Examples

Filter the dataset to only include pure substance properties.

>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> data_set.filter_by_components(number_of_components=1)

filter_by_elements(*allowed_elements)

Filters out those properties which were estimated for compounds which contain elements outside of those defined in allowed_elements.

Parameters

allowed_elements (str) – The symbols (e.g. C, H, Cl) of the elements to retain.
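The retention rule described above amounts to a subset test: a compound passes only when none of its elements falls outside the allowed set. The sketch below illustrates that rule with plain Python sets. It assumes each compound has already been reduced to its element symbols; `keep_compound` and the sample data are hypothetical and are not part of openff-evaluator.

```python
# Hypothetical stand-in data: compound name -> element symbols it contains.
compounds = {
    "ethanol": {"C", "H", "O"},
    "chloroform": {"C", "H", "Cl"},
    "water": {"H", "O"},
}


def keep_compound(elements, allowed_elements):
    """Return True when no element falls outside allowed_elements."""
    return set(elements) <= set(allowed_elements)


allowed = ("C", "H", "O")

# Retain only the compounds built entirely from allowed elements.
retained = sorted(
    name for name, elements in compounds.items()
    if keep_compound(elements, allowed)
)
print(retained)
```

Chloroform is dropped here because Cl is not in the allowed set, even though its other elements are.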

filter_by_function(filter_function)

Filter the data set using a given filter function.

Parameters

filter_function (lambda) – The filter function.
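As a rough illustration of predicate-based filtering, the sketch below assumes the filter function is called once per property and that a truthy return value means "keep". That calling convention is an assumption, not something stated above, and `StandInProperty` and `apply_filter` are hypothetical stand-ins rather than library classes or functions.

```python
from dataclasses import dataclass


@dataclass
class StandInProperty:
    """Hypothetical stand-in for a physical property record."""
    kind: str
    temperature_k: float


def apply_filter(properties, filter_function):
    """Return the items for which filter_function(item) is truthy."""
    return [item for item in properties if filter_function(item)]


properties = [
    StandInProperty("Density", 298.15),
    StandInProperty("Density", 350.0),
    StandInProperty("DielectricConstant", 298.15),
]

# Keep only the measurements taken within 1 K of 298.15 K.
near_ambient = apply_filter(
    properties, lambda item: abs(item.temperature_k - 298.15) < 1.0
)
print([item.kind for item in near_ambient])
```

A lambda is convenient for one-off conditions, but any callable taking one argument would serve in this sketch.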

filter_by_phases(phases)

Filter the data set based on the phase of the property (e.g. liquid).

Parameters

phases (PropertyPhase) – The phase of the properties which should be retained.

Examples

Filter the dataset to only include liquid properties.

>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> from openff.evaluator.datasets import PropertyPhase
>>> data_set.filter_by_phases(PropertyPhase.Liquid)

filter_by_pressure(min_pressure, max_pressure)

Filter the data set based on a minimum and maximum pressure.

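Parameters
  • min_pressure – The minimum pressure, given with units (e.g. 70*unit.kilopascal).

  • max_pressure – The maximum pressure, given with units (e.g. 150*unit.kilopascal).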

Examples

Filter the dataset to only include properties measured between 70-150 kPa.

>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> from openff.evaluator import unit
>>> data_set.filter_by_pressure(min_pressure=70*unit.kilopascal, max_pressure=150*unit.kilopascal)

filter_by_pressure(min_pressure, max_pressure)

Filter the data set based on the type of property (e.g. Density).

Parameters

property_types (PropertyType or str) – The type of property which should be retained.

Examples

Filter the dataset to only contain densities and static dielectric constants

>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> # Filter the dataset to only include densities and dielectric constants.
>>> from openff.evaluator.properties import Density, DielectricConstant
>>> data_set.filter_by_property_types(Density, DielectricConstant)

or

>>> data_set.filter_by_property_types('Density', 'DielectricConstant')

filter_by_smiles(*allowed_smiles)

Filters out those properties which were estimated for compounds which do not appear in the allowed SMILES list.

Parameters

allowed_smiles (str) – The SMILES identifiers of the compounds to keep after filtering.

filter_by_temperature(min_temperature, max_temperature)

Filter the data set based on a minimum and maximum temperature.

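Parameters
  • min_temperature – The minimum temperature, given with units (e.g. 130*unit.kelvin).

  • max_temperature – The maximum temperature, given with units (e.g. 260*unit.kelvin).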

Examples

Filter the dataset to only include properties measured between 130-260 K.

>>> # Load in the data set of properties which will be used for comparisons
>>> from openff.evaluator.datasets.thermoml import ThermoMLDataSet
>>> data_set = ThermoMLDataSet.from_doi('10.1016/j.jct.2016.10.001')
>>>
>>> from openff.evaluator import unit
>>> data_set.filter_by_temperature(min_temperature=130*unit.kelvin, max_temperature=260*unit.kelvin)

filter_by_uncertainties()

Filters out those properties which don’t have their uncertainties reported.
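The effect of this filter can be sketched with plain Python, assuming an unreported uncertainty is represented by `None`. The library may use a different sentinel value; `StandInMeasurement` is a hypothetical record type used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandInMeasurement:
    """Hypothetical record; None marks an unreported uncertainty."""
    value: float
    uncertainty: Optional[float] = None


measurements = [
    StandInMeasurement(0.997, 0.001),
    StandInMeasurement(0.789),  # no uncertainty reported
    StandInMeasurement(1.261, 0.002),
]

# Drop every measurement whose uncertainty was not reported.
with_uncertainty = [m for m in measurements if m.uncertainty is not None]
print([m.value for m in with_uncertainty])
```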

classmethod from_json(file_path)

Create this object from a JSON file.

Parameters

file_path (str) – The path to load the JSON from.

Returns

The parsed class.

Return type

cls

classmethod from_xml(xml, default_source)[source]

Load a ThermoML data set from an xml object.

Parameters
  • xml (str) – The xml string to parse.

  • default_source (Source) – The source to use if one cannot be parsed from the archive itself.

Returns

The loaded ThermoML data set.

Return type

ThermoMLDataSet

json(file_path=None, format=False)

Creates a JSON representation of this class.

Parameters
  • file_path (str, optional) – The (optional) file path to save the JSON file to.

  • format (bool) – Whether to format the JSON or not.

Returns

The JSON representation of this class.

Return type

str
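One plausible reading of the `format` flag is that it switches between compact and human-readable output. The sketch below shows that distinction using only the standard `json` module. It is an assumption about the flag's meaning rather than the library's implementation, and `to_json` with its `pretty` argument is a hypothetical helper.

```python
import json


def to_json(payload, pretty=False):
    """Serialise payload; pretty=True adds indentation and sorted keys."""
    if pretty:
        return json.dumps(payload, sort_keys=True, indent=2)
    return json.dumps(payload)


# Hypothetical payload standing in for a serialised data set.
payload = {"properties": [{"type": "Density", "value": 0.997}]}

compact = to_json(payload)
pretty = to_json(payload, pretty=True)

print(compact)
print(pretty)
```

Both strings decode to the same data; only whitespace and key ordering differ.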

merge(data_set, validate=True)

Merge another data set into the current one.

Parameters
  • data_set (PhysicalPropertyDataSet) – The secondary data set to merge into this one.

  • validate (bool) – Whether to validate the other data set before merging.

classmethod parse_json(string_contents, encoding='utf8')

Parses a typed json string into the corresponding class structure.

Parameters
  • string_contents (str or bytes) – The typed json string.

  • encoding (str) – The encoding of the string_contents.

Returns

The parsed class.

Return type

Any

property properties

A list of all of the properties within this set.

Type

tuple of PhysicalProperty

properties_by_substance(substance)

A generator which may be used to loop over all of the properties which were measured for a particular substance.

Parameters

substance (Substance) – The substance of interest.

Returns

Return type

generator of PhysicalProperty

properties_by_type(property_type)

A generator which may be used to loop over all of the properties of a particular type, e.g. all “Density” properties.

Parameters

property_type (str or type of PhysicalProperty) – The type of property of interest. This may either be the string class name of the property or the class type.

Returns

Return type

generator of PhysicalProperty
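The parameter description above says the type may be given either as a class or as the string name of a class. A minimal sketch of a generator that accepts both forms is shown below. The classes and `iter_by_type` are hypothetical stand-ins, and matching on the class name is an assumption about how the two forms are reconciled.

```python
class Density:
    """Hypothetical stand-in class; not the openff-evaluator Density."""


class DielectricConstant:
    """Hypothetical stand-in class."""


def iter_by_type(properties, property_type):
    """Yield the items whose class name matches property_type.

    property_type may be a class or the string name of a class.
    """
    if isinstance(property_type, str):
        type_name = property_type
    else:
        type_name = property_type.__name__

    for item in properties:
        if type(item).__name__ == type_name:
            yield item


properties = [Density(), DielectricConstant(), Density()]

by_name = list(iter_by_type(properties, "Density"))
by_class = list(iter_by_type(properties, Density))
print(len(by_name), len(by_class))
```

Because the function is a generator, nothing is evaluated until it is iterated, which is why the results are wrapped in `list` here.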

property property_types

The types of property within this data set.

Type

set of str

property sources

The sources from which the properties in this data set were gathered.

Type

set of Source

property substances

The substances for which the properties in this data set were collected.

Type

set of Substance

to_pandas()

Converts a PhysicalPropertyDataSet to a pandas.DataFrame object with columns of

  • ‘Id’

  • ‘Temperature (K)’

  • ‘Pressure (kPa)’

  • ‘Phase’

  • ‘N Components’

  • ‘Component 1’

  • ‘Role 1’

  • ‘Mole Fraction 1’

  • ‘Exact Amount 1’

  • ‘Component N’

  • ‘Role N’

  • ‘Mole Fraction N’

  • ‘Exact Amount N’

  • ‘<Property 1> Value (<default unit>)’

  • ‘<Property 1> Uncertainty (<default unit>)’

  • ‘<Property N> Value (<default unit>)’

  • ‘<Property N> Uncertainty (<default unit>)’

  • ‘Source’

where ‘Component X’ is a column containing the SMILES representation of component X.

Returns

The created data frame.

Return type

pandas.DataFrame
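The per-component columns listed above repeat once for each component, from 1 to N. The sketch below builds that part of the header for a given N. `component_columns` is a hypothetical helper, and the column order is assumed to follow the order of the list above.

```python
def component_columns(n_components):
    """Return the per-component column names for components 1..n_components."""
    columns = []
    for index in range(1, n_components + 1):
        columns += [
            f"Component {index}",
            f"Role {index}",
            f"Mole Fraction {index}",
            f"Exact Amount {index}",
        ]
    return columns


# Leading columns as listed in the description of to_pandas().
leading = ["Id", "Temperature (K)", "Pressure (kPa)", "Phase", "N Components"]

# Header up to, but not including, the property and source columns.
header = leading + component_columns(2)
print(header)
```

For a binary mixture this gives eight component columns after the five leading ones.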

validate()

Checks to ensure that all properties within the set are valid physical property objects.