Tutorial 03 - Analysing Data Sets

In this tutorial we will analyse the results of the calculations performed in the second tutorial. The tutorial will cover:

  • comparing the estimated data set with the experimental data set.

  • plotting the two data sets.

Note: If you are running this tutorial in Google Colab, you will need to run a setup script instead of following the installation instructions:

[ ]:
# !wget https://raw.githubusercontent.com/openforcefield/openff-evaluator/main/docs/tutorials/colab_setup.ipynb
# %run colab_setup.ipynb

For the sake of clarity all warnings will be disabled in this tutorial:

[ ]:
import warnings

warnings.filterwarnings("ignore")
import logging

logging.getLogger("openff.toolkit").setLevel(logging.ERROR)

Loading the Data Sets

We will begin by loading both the experimental data set and the estimated data set:

[ ]:
import pathlib

from openff.evaluator.datasets import PhysicalPropertyDataSet

experimental_data_set_path = "filtered_data_set.json"
estimated_data_set_path = "estimated_data_set.json"

# If you have not yet completed the previous tutorials or do not have the data set files
# available, this tutorial will use copies provided by the framework

if not (
    pathlib.Path(experimental_data_set_path).exists()
    and pathlib.Path(estimated_data_set_path).exists()
):
    from openff.evaluator.utils import get_data_filename

    experimental_data_set_path = get_data_filename(
        "tutorials/tutorial01/filtered_data_set.json"
    )
    estimated_data_set_path = get_data_filename(
        "tutorials/tutorial02/estimated_data_set.json"
    )

experimental_data_set = PhysicalPropertyDataSet.from_json(experimental_data_set_path)
estimated_data_set = PhysicalPropertyDataSet.from_json(estimated_data_set_path)

If the previous tutorials ran successfully, these data sets will contain the density and enthalpy of vaporization (\(H_{vap}\)) of ethanol and isopropanol:

[ ]:
experimental_data_set.to_pandas().head()

[ ]:
estimated_data_set.to_pandas().head()

Extracting the Results

We will now examine how each property value estimated by simulation deviates from the corresponding experimental measurement.

To do this we will build a dictionary which stores, for each type of property, a list of pairs of experimental and estimated properties. The properties can be matched using the unique ids which were automatically assigned to them on their creation:

[ ]:
properties_by_type = {"Density": [], "EnthalpyOfVaporization": []}

for experimental_property in experimental_data_set:
    # Find the estimated property which has the same id as the
    # experimental property.
    estimated_property = next(
        x for x in estimated_data_set if x.id == experimental_property.id
    )

    # Add this pair of properties to the list of pairs
    property_type = experimental_property.__class__.__name__
    properties_by_type[property_type].append(
        (experimental_property, estimated_property)
    )
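
The matching step above scans the estimated data set once per experimental property and raises `StopIteration` if an id is missing. The following is a minimal sketch of an alternative which builds an id lookup first and records unmatched ids instead of failing. It uses a hypothetical `FakeProperty` stand-in rather than real evaluator property objects, and assumes only that each property exposes an `id` attribute; the `pair_by_id` helper is illustrative and not part of the framework:

```python
from collections import namedtuple

# Hypothetical stand-in for a physical property. Only the `id`
# attribute is relied upon; `value` is included for illustration.
FakeProperty = namedtuple("FakeProperty", ["id", "value"])


def pair_by_id(experimental, estimated):
    """Pair each experimental entry with the estimated entry sharing its id.

    Returns the list of (experimental, estimated) pairs and the list of
    experimental ids for which no estimated entry was found.
    """
    # Build the lookup once so each match is a dictionary access.
    estimated_by_id = {prop.id: prop for prop in estimated}

    pairs = []
    missing_ids = []

    for prop in experimental:
        match = estimated_by_id.get(prop.id)

        if match is None:
            missing_ids.append(prop.id)
        else:
            pairs.append((prop, match))

    return pairs, missing_ids


# Made-up entries, purely to exercise the helper.
experimental = [FakeProperty("a", 1.0), FakeProperty("b", 2.0), FakeProperty("c", 3.0)]
estimated = [FakeProperty("b", 2.1), FakeProperty("a", 0.9)]

pairs, missing_ids = pair_by_id(experimental, estimated)
```

Any ids collected in `missing_ids` would correspond to properties which could not be estimated, and these could be reported separately rather than interrupting the analysis.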

Plotting the Results

We will now compare the experimental results to the estimated ones by plotting them using matplotlib:

[ ]:
from matplotlib import pyplot

# Create the figure we will plot to.
figure, axes = pyplot.subplots(nrows=1, ncols=2, figsize=(8.0, 4.0))

# Set the axis labels and titles
axes[0].set_xlabel("OpenFF 1.0.0")
axes[0].set_ylabel("Experimental")
axes[0].set_title("Density (kg m$^{-3}$)")

axes[1].set_xlabel("OpenFF 1.0.0")
axes[1].set_ylabel("Experimental")
axes[1].set_title("$H_{vap}$ (kJ mol$^{-1}$)")

# Define the preferred units of the properties
from openff.units import unit

preferred_units = {
    "Density": unit.kilogram / unit.meter**3,
    "EnthalpyOfVaporization": unit.kilojoule / unit.mole,
}

for index, property_type in enumerate(properties_by_type):
    experimental_values = []
    estimated_values = []

    preferred_unit = preferred_units[property_type]

    # Convert the values of our properties to the preferred units.
    for experimental_property, estimated_property in properties_by_type[property_type]:
        experimental_values.append(
            experimental_property.value.to(preferred_unit).magnitude
        )
        estimated_values.append(estimated_property.value.to(preferred_unit).magnitude)

    axes[index].plot(
        estimated_values, experimental_values, marker="x", linestyle="None"
    )
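
A plot gives a qualitative picture of the agreement. As a complement, the deviations can also be summarised numerically. The following is a minimal sketch which assumes two equal-length lists of plain floats, such as the `estimated_values` and `experimental_values` lists built in the loop above. The `summarize_errors` helper and the numbers passed to it are hypothetical and are not results from the tutorial:

```python
import math


def summarize_errors(estimated_values, experimental_values):
    """Return simple error statistics for paired estimated / experimental values."""
    if not estimated_values or len(estimated_values) != len(experimental_values):
        raise ValueError("Both lists must be non-empty and of the same length.")

    # A positive difference means the estimate is larger than experiment.
    differences = [
        estimated - experimental
        for estimated, experimental in zip(estimated_values, experimental_values)
    ]
    count = len(differences)

    return {
        "mean_signed_error": sum(differences) / count,
        "mean_absolute_error": sum(abs(d) for d in differences) / count,
        "rmse": math.sqrt(sum(d * d for d in differences) / count),
    }


# Made-up numbers, purely to exercise the helper.
summary = summarize_errors([1.0, 2.0, 4.0], [2.0, 2.0, 2.0])
```

The statistics carry the same units as the input values, so the helper would be called once per property type after converting to the preferred units.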

Conclusion

And that concludes the third tutorial!

If you have any questions or feedback, please open an issue on the GitHub issue tracker.