Reaction Dataset

ReactionDatasets are useful for computing many methods for a set of reactions. There are currently two types of ReactionDatasets:

  • rxn for datasets based on canonical chemical reactions \(A + B \rightarrow C\)

  • ie for interaction energy datasets \(M_{complex} \rightarrow M_{monomer_1} + M_{monomer_2}\)

Querying

Available result specifications (method, basis set, program, keyword, driver combinations) in a ReactionDataset may be listed with list_values. Beyond those specifications in Datasets, ReactionDatasets provide a stoich field which may be used to select different strategies for computation of interaction and reaction energies. By default, the counterpoise-corrected ("cp") and uncorrected ("default") values are available.

Reaction values, such as interaction or reaction energies, are queried with get_values. For results computed using QCFractal, the underlying Records are retrieved with get_records, and are broken down by Molecule within the reaction.

For examples of querying ReactionDatasets, see the QCArchive examples.

Visualizing

Statistics on ReactionDatasets may be computed using the statistics method and plotted using the visualize method.

For examples of visualizing ReactionDatasets, see the QCArchive examples.
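The error metrics named in the visualize reference (UE, unsigned error; URE, unsigned relative error) can be illustrated without a server. The following is a minimal sketch in plain Python, not the library's implementation: the function names, reaction names and energies are hypothetical, and URE is assumed here to be the unsigned error divided by the magnitude of the benchmark value.

```python
def unsigned_error(value, bench):
    """Absolute deviation of a value from its benchmark."""
    return abs(value - bench)


def unsigned_relative_error(value, bench):
    """Unsigned error scaled by the benchmark magnitude (assumed definition)."""
    return abs(value - bench) / abs(bench)


# Hypothetical reaction energies (kcal / mol) for two reactions.
benchmark = {"rxn_a": -4.0, "rxn_b": -2.0}
method = {"rxn_a": -3.5, "rxn_b": -2.5}

# Per-reaction errors, then the mean unsigned error over the set.
ue = {name: unsigned_error(method[name], benchmark[name]) for name in benchmark}
ure = {name: unsigned_relative_error(method[name], benchmark[name]) for name in benchmark}
mue = sum(ue.values()) / len(ue)

print(ue, ure, mue)
```

Per-reaction values such as ue correspond to what a bar chart would show for each entry, while a single number such as mue summarizes a method over the whole dataset.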

Creating

An empty dataset can be constructed by choosing a dataset name and a dataset type (ds_type).

ds = ptl.collections.ReactionDataset("my_dataset", ds_type="rxn")

New reactions can be added by providing the linear combination of Molecules required to compute the desired quantity. When the ReactionDataset is queried these linear combinations are automatically combined for the caller.

ds = ptl.collections.ReactionDataset("Atomization Energies", ds_type="ie")

N2 = ptl.Molecule.from_data("""
N 0.0 0.0 1.0975
N 0.0 0.0 0.0
unit angstrom
""")

N_atom = ptl.Molecule.from_data("""
0 2
N 0.0 0.0 0.0
""")


ds.add_rxn("Nitrogen Molecule", [(N2, 1.0), (N_atom, -2.0)])

A given reaction can be examined by using the get_rxn function. We store the molecule_hash followed by the reaction coefficient.

json.dumps(ds.get_rxn("Nitrogen Molecule"), indent=2)
{
  "name": "Nitrogen Molecule",
  "stoichiometry": {
    "default": {
      "1": 1.0,
      "2": -2.0
    }
  },
  "attributes": {},
  "reaction_results": {
    "default": {}
  }
}
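How a stored stoichiometry turns into a single reaction value can be sketched in plain Python. This is only an illustration of the linear combination, not the library's code: the function name and the per-molecule energies are hypothetical, and the molecule ids "1" and "2" are taken from the output above.

```python
def reaction_value(stoichiometry, energies):
    """Sum coefficient * energy over every molecule in one stoichiometry."""
    return sum(coef * energies[mol_id] for mol_id, coef in stoichiometry.items())


# The "default" stoichiometry of the "Nitrogen Molecule" reaction.
stoich = {"1": 1.0, "2": -2.0}

# Hypothetical total energies (hartree) keyed by molecule id.
energies = {"1": -109.5, "2": -54.5}

value = reaction_value(stoich, energies)
print(value)
```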

Datasets with ds_type ie can automatically construct counterpoise-corrected (cp) and non-counterpoise-corrected (default) n-body expansions. The number after the stoichiometry name corresponds to the number of bodies involved in the computation.

ie_ds = ptl.collections.ReactionDataset("my_dataset", ds_type="ie")

water_dimer_stretch = ptl.data.get_molecule("water_dimer_minima.psimol")
ie_ds.add_ie_rxn("water dimer minima", water_dimer_stretch)

json.dumps(ie_ds.get_rxn("water dimer minima"), indent=2)

{
  "name": "water dimer minima",
  "stoichiometry": {
    "default1": {  # Monomers
      "3": 1.0,
      "4": 1.0
    },
    "cp1": {  # Monomers
      "5": 1.0,
      "6": 1.0
    },
    "default": {  # Complex
      "7": 1.0
    },
    "cp": {  # Complex
      "7": 1.0
    }
  },
  "attributes": {},
  "reaction_results": {
    "default": {}
  }
}
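The relationship between the complex and monomer stoichiometries can be sketched as follows. This is a hedged illustration, assuming the interaction energy for a stoichiometry is the complex combination minus the matching monomer combination (the same name with a trailing "1"). The function names and energies are hypothetical; the molecule ids are those shown in the output above.

```python
def combine(stoichiometry, energies):
    """Linear combination of molecule energies for one stoichiometry."""
    return sum(coef * energies[mol_id] for mol_id, coef in stoichiometry.items())


def interaction_energy(rxn_stoich, energies, stoich="default"):
    """Complex energy minus monomer energies (assumed convention)."""
    return combine(rxn_stoich[stoich], energies) - combine(rxn_stoich[stoich + "1"], energies)


rxn_stoich = {
    "default1": {"3": 1.0, "4": 1.0},  # monomers in their own basis
    "cp1": {"5": 1.0, "6": 1.0},       # monomers in the full dimer basis
    "default": {"7": 1.0},             # complex
    "cp": {"7": 1.0},                  # complex
}

# Hypothetical total energies (hartree) keyed by molecule id.
energies = {"3": -76.25, "4": -76.25, "5": -76.3125, "6": -76.3125, "7": -152.75}

ie_default = interaction_energy(rxn_stoich, energies, "default")
ie_cp = interaction_energy(rxn_stoich, energies, "cp")
print(ie_default, ie_cp)
```

Both stoichiometries share the same complex (id "7"); only the monomer molecules differ, which is why the default and cp values differ.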

Computing

Computations are performed in the same manner as for a Dataset. See the Dataset Documentation for more information.

API

class qcportal.collections.ReactionDataset(name: str, client: Optional[FractalClient] = None, ds_type: str = 'rxn', **kwargs)[source]

The ReactionDataset class for homogeneous computations on many reactions.

Variables
  • client (client.FractalClient) – A FractalClient connected to a server

  • data (ReactionDataset.DataModel) – A Model representation of the database backbone

  • df (pd.DataFrame) – The underlying dataframe for the Dataset object

  • rxn_index (pd.Index) – The unrolled reaction index for all reactions in the Dataset

class DataModel[source]
Parameters
  • id (str, Default: local)

  • name (str)

  • collection (str)

  • provenance (Mapping[str, str], Default: {})

  • tags (List[str], Default: [])

  • tagline (str, Optional)

  • description (str, Optional)

  • group (str, Default: default)

  • visibility (bool, Default: True)

  • view_url_hdf5 (str, Optional)

  • view_url_plaintext (str, Optional)

  • view_metadata (Mapping[str, str], Optional)

  • view_available (bool, Default: False)

  • metadata (Dict[str, Any], Default: {})

  • default_program (str, Optional)

  • default_keywords (Mapping[str, str], Default: {})

  • default_driver (str, Default: energy)

  • default_units (str, Default: kcal / mol)

  • default_benchmark (str, Optional)

  • alias_keywords (Dict[str, Dict[str, str]], Default: {})

  • records (ReactionEntry, Optional)

  • contributed_values (ContributedValues, Optional)

  • history (Set[Tuple[str, str, str, str, str, str]], Default: set())

  • history_keys (Tuple[str, str, str, str, str, str], Default: (‘driver’, ‘program’, ‘method’, ‘basis’, ‘keywords’, ‘stoichiometry’))

  • ds_type ({rxn,ie}, Default: rxn)

compare(other: Union[ProtoModel, pydantic.main.BaseModel], **kwargs) → bool

Compares the current object to the provided object recursively.

Parameters
  • other (Model) – The model to compare to.

  • **kwargs – Additional kwargs to pass to qcelemental.compare_recursive.

Returns

True if the objects match.

Return type

bool

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

copy

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

dict(**kwargs) → Dict[str, Any]

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

json

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

classmethod parse_file(path: Union[str, pathlib.Path], *, encoding: str = None) → qcelemental.models.basemodels.ProtoModel

Parses a file into a Model object.

Parameters
  • path (Union[str, Path]) – The path to the file.

  • encoding (str, optional) – The type of the files, available types are: {‘json’, ‘msgpack’, ‘pickle’}. Attempts to automatically infer the file type from the file extension if None.

Returns

The requested model from a serialized format.

Return type

Model

classmethod parse_raw(data: Union[bytes, str], *, encoding: str = None) → qcelemental.models.basemodels.ProtoModel

Parses raw string or bytes into a Model object.

Parameters
  • data (Union[bytes, str]) – A serialized data blob to be deserialized into a Model.

  • encoding (str, optional) – The type of the serialized array, available types are: {‘json’, ‘json-ext’, ‘msgpack-ext’, ‘pickle’}

Returns

The requested model from a serialized format.

Return type

Model

serialize(encoding: str, *, include: Optional[Set[str]] = None, exclude: Optional[Set[str]] = None, exclude_unset: bool = False) → Union[bytes, str]

Generates a serialized representation of the model

Parameters
  • encoding (str) – The serialization type, available types are: {‘json’, ‘json-ext’, ‘msgpack-ext’}

  • include (Optional[Set[str]], optional) – Fields to be included in the serialization.

  • exclude (Optional[Set[str]], optional) – Fields to be excluded in the serialization.

  • exclude_unset (bool, optional) – If True, skips fields that have default values provided.

Returns

The serialized model.

Return type

Union[bytes, str]

classmethod update_forward_refs(**localns: Any) → None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

add_contributed_values(contrib: qcportal.collections.dataset.ContributedValues, overwrite: bool = False) → None

Adds a ContributedValues to the database. Be sure to call save() to commit changes to the server.

Parameters
  • contrib (ContributedValues) – The ContributedValues to add.

  • overwrite (bool, optional) – Overwrites pre-existing values

add_entry(name: str, molecule: Molecule, **kwargs: Dict[str, Any]) → None

Adds a new entry to the Dataset

Parameters
  • name (str) – The name of the record

  • molecule (Molecule) – The Molecule associated with this record

  • **kwargs (Dict[str, Any]) – Additional arguments to pass to the record

add_ie_rxn(name: str, mol: qcelemental.models.molecule.Molecule, **kwargs) → qcportal.collections.reaction_dataset.ReactionEntry[source]

Add an interaction energy reaction entry to the database. Automatically builds CP and no-CP reactions for the fragmented molecule.

Parameters
  • name (str) – The name of the reaction

  • mol (Molecule) – A molecule with multiple fragments

  • **kwargs – Additional kwargs to pass into build_ie_fragments.

Returns

A representation of the new reaction.

Return type

ReactionEntry

add_keywords(alias: str, program: str, keyword: KeywordSet, default: bool = False) → bool

Adds an option alias to the dataset. Note that keywords are not present until a save call has been completed.

Parameters
  • alias (str) – The alias of the option

  • program (str) – The compute program the alias is for

  • keyword (KeywordSet) – The Keywords object to use.

  • default (bool, optional) – Sets this option as the default for the program

add_rxn(name: str, stoichiometry: Dict[str, List[Tuple[qcelemental.models.molecule.Molecule, float]]], reaction_results: Optional[Dict[str, str]] = None, attributes: Optional[Dict[str, Union[int, float, str]]] = None, other_fields: Optional[Dict[str, Any]] = None) → qcportal.collections.reaction_dataset.ReactionEntry[source]

Adds a reaction to a database object.

Parameters
  • name (str) – Name of the reaction.

  • stoichiometry (list or dict) – Either a list or dictionary of lists

  • reaction_results (dict or None, Optional, Default: None) – A dictionary of the computed total interaction energy results

  • attributes (dict or None, Optional, Default: None) – A dictionary of attributes to assign to the reaction

  • other_fields (dict or None, Optional, Default: None) – A dictionary of additional user defined fields to add to the reaction entry

Returns

A complete specification of the reaction

Return type

ReactionEntry

static build_ie_fragments(mol: qcelemental.models.molecule.Molecule, **kwargs) → Dict[str, List[Tuple[qcelemental.models.molecule.Molecule, float]]][source]

Build the stoichiometry for an Interaction Energy.

Parameters
  • mol (Molecule class or str) – Molecule to fragment.

  • do_default (bool) – Create the default (noCP) stoichiometry.

  • do_cp (bool) – Create the counterpoise (CP) corrected stoichiometry.

  • do_vmfc (bool) – Create the Valiron-Mayer Function Counterpoise (VMFC) corrected stoichiometry.

  • max_nbody (int) – The maximum fragment level built, if zero defaults to the maximum number of fragments.

Returns

ret – A JSON representation of the fragmented molecule.

Return type

dict
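The fragment enumeration behind an n-body expansion can be sketched with itertools. This is a simplified illustration and not the library's algorithm: the function name and return layout are hypothetical, it does not reproduce the coefficients assigned to each entry, and it assumes that a counterpoise entry keeps the remaining fragments as ghost (basis-only) fragments. The max_nbody=0 behavior follows the parameter description above.

```python
from itertools import combinations


def enumerate_fragments(n_fragments, max_nbody=0, cp=False):
    """List (real, ghost) fragment index tuples for each n-body level."""
    if max_nbody == 0:
        # Zero means "use every fragment level available".
        max_nbody = n_fragments
    all_frags = tuple(range(n_fragments))
    levels = {}
    for n in range(1, max_nbody + 1):
        entries = []
        for real in combinations(all_frags, n):
            # With CP, fragments not treated as real stay as ghosts.
            ghost = tuple(f for f in all_frags if f not in real) if cp else ()
            entries.append((real, ghost))
        levels[n] = entries
    return levels


nocp = enumerate_fragments(2)
cp = enumerate_fragments(2, cp=True)
print(nocp)
print(cp)
```

For a dimer, the one-body entries differ between the two schemes while the two-body entry (the full complex) is the same, which is consistent with the water dimer output shown earlier, where default1 and cp1 reference different molecules but default and cp share one.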

compute(method: str, basis: Optional[str] = None, *, keywords: Optional[str] = None, program: Optional[str] = None, stoich: str = 'default', ignore_ds_type: bool = False, tag: Optional[str] = None, priority: Optional[str] = None) → ComputeResponse[source]

Executes a computational method for all reactions in the Dataset. Previously completed computations are not repeated.

Parameters
  • method (str) – The computational method to compute (B3LYP)

  • basis (Optional[str], optional) – The computational basis to compute (6-31G)

  • keywords (Optional[str], optional) – The keyword alias for the requested compute

  • program (Optional[str], optional) – The underlying QC program

  • stoich (str, optional) – The stoichiometry of the requested compute (for example "default" or "cp")

  • ignore_ds_type (bool, optional) – Optionally only compute the “default” geometry

  • tag (Optional[str], optional) – The queue tag to use when submitting compute requests.

  • priority (Optional[str], optional) – The priority of the jobs: low, medium, or high.

Returns

An object that contains the submitted ObjectIds of the new compute. This object has the following fields:
  • ids: The ObjectId’s of the task in the order of input molecules

  • submitted: A list of ObjectId’s that were submitted to the compute queue

  • existing: A list of ObjectId’s of tasks already in the database

Return type

ComputeResponse

download(local_path: Union[str, pathlib.Path, None] = None, verify: bool = True, progress_bar: bool = True) → None

Download a remote view if available. The dataset will use this view to avoid server queries for calls to:
  • get_entries

  • get_molecules

  • get_values

  • list_values

Parameters
  • local_path (Optional[Union[str, Path]], optional) – Local path to store the downloaded view. If None, the view will be stored in a temporary file and deleted on exit.

  • verify (bool, optional) – Verify download checksum. Default: True.

  • progress_bar (bool, optional) – Display a download progress bar. Default: True

classmethod from_json(data: Dict[str, Any], client: FractalClient = None) → Collection

Creates a new class from a JSON blob

Parameters
  • data (Dict[str, Any]) – The JSON blob to create a new class from.

  • client (FractalClient, optional) – A FractalClient connected to a server

Returns

A constructed collection.

Return type

Collection

classmethod from_server(client: FractalClient, name: str) → Collection

Creates a new class from a server

Parameters
  • client (FractalClient) – A FractalClient connected to a server

  • name (str) – The name of the collection to pull from.

Returns

A constructed collection.

Return type

Collection

get_entries(subset: Optional[List[str]] = None, force: bool = False) → pandas.core.frame.DataFrame

Provides a list of entries for the dataset

Parameters
  • subset (Optional[List[str]], optional) – The indices of the desired subset. Return all indices if subset is None.

  • force (bool, optional) – skip cache

Returns

A dataframe containing entry names and specifications. For Dataset, specifications are molecule ids. For ReactionDataset, specifications describe reaction stoichiometry.

Return type

pd.DataFrame

get_index(subset: Optional[List[str]] = None, force: bool = False) → List[str]

Returns the current index of the database.

Returns

ret – The names of all reactions in the database

Return type

List[str]

get_keywords(alias: str, program: str, return_id: bool = False) → Union[KeywordSet, str]

Pulls the keywords alias from the server for inspection.

Parameters
  • alias (str) – The keywords alias.

  • program (str) – The program the keywords correspond to.

  • return_id (bool, optional) – If True, returns the id rather than the KeywordSet object.

Returns

The requested KeywordSet or KeywordSet id.

Return type

Union[‘KeywordSet’, str]

get_molecules(subset: Union[str, Set[str], None] = None, stoich: Union[str, List[str]] = 'default', force: bool = False) → pandas.core.frame.DataFrame[source]

Queries full Molecules from the database.

Parameters
  • subset (Optional[Union[str, Set[str]]], optional) – The index subset to query on

  • stoich (Union[str, List[str]], optional) – The stoichiometries to pull from, either a single or multiple stoichiometries

  • force (bool, optional) – Force pull of molecules from server

Returns

Indexed Molecules which match the stoich and subset string.

Return type

pd.DataFrame

get_records(method: str, basis: Optional[str] = None, *, keywords: Optional[str] = None, program: Optional[str] = None, stoich: Union[str, List[str]] = 'default', include: Optional[List[str]] = None, subset: Union[str, Set[str], None] = None) → Union[pandas.core.frame.DataFrame, ResultRecord][source]

Queries the local Portal for the requested keys and stoichiometry.

Parameters
  • method (str) – The computational method to query on (B3LYP)

  • basis (Optional[str], optional) – The computational basis to query on (6-31G)

  • keywords (Optional[str], optional) – The option token desired

  • program (Optional[str], optional) – The program to query on

  • stoich (Union[str, List[str]], optional) – The given stoichiometry to compute.

  • include (Optional[List[str]], optional) – The attribute projection to perform on the query; if omitted, ResultRecord objects are returned.

  • subset (Optional[Union[str, Set[str]]], optional) – The index subset to query on

Returns

The name of the queried column

Return type

Union[pd.DataFrame, ‘ResultRecord’]

get_rxn(name: str) → qcportal.collections.reaction_dataset.ReactionEntry[source]

Returns the JSON object of a specific reaction.

Parameters

name (str) – The name of the reaction to query

Returns

ret – The JSON representation of the reaction

Return type

dict

get_values(method: Union[List[str], str, None] = None, basis: Union[List[str], str, None] = None, keywords: Optional[str] = None, program: Optional[str] = None, driver: Optional[str] = None, stoich: str = 'default', name: Union[List[str], str, None] = None, native: Optional[bool] = None, subset: Union[List[str], str, None] = None, force: bool = False) → pandas.core.frame.DataFrame[source]

Obtains values from the known history using the search parameters provided for the expected return_result values. Defaults to the standard programs and keywords if not provided.

Note that unlike get_records, get_values will automatically expand searches and return multiple method and basis combinations simultaneously.

None is a wildcard selector. To search for None, use “None”.

Parameters
  • method (Optional[Union[str, List[str]]], optional) – The computational method (B3LYP)

  • basis (Optional[Union[str, List[str]]], optional) – The computational basis (6-31G)

  • keywords (Optional[str], optional) – The keyword alias

  • program (Optional[str], optional) – The underlying QC program

  • driver (Optional[str], optional) – The type of calculation (e.g. energy, gradient, hessian, dipole…)

  • stoich (str, optional) – Stoichiometry of the reaction.

  • name (Optional[Union[str, List[str]]], optional) – Canonical name of the record. Overrides the above selectors.

  • native (Optional[bool], optional) – True: only include data computed with QCFractal; False: only include data contributed from outside sources; None: include both.

  • subset (Optional[List[str]], optional) – The indices of the desired subset. Return all indices if subset is None.

  • force (bool, optional) – Data is typically cached, forces a new query if True

Returns

A DataFrame of values with columns corresponding to methods and rows corresponding to reaction entries. Contributed (native=False) columns are marked with “(contributed)” and may include units in square brackets if their units differ in dimensionality from the ReactionDataset’s default units.

Return type

DataFrame
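The selector semantics described for get_values (None as a wildcard, the string "None" to match an unset value, and lists to match several values at once) can be sketched in plain Python. This is an illustration of the matching rule only; the function name and the specification dictionaries are hypothetical and do not reflect how the library stores its history.

```python
def matches(spec, **search):
    """Return True if a specification satisfies every search selector."""
    for key, wanted in search.items():
        if wanted is None:
            continue  # None is a wildcard: any value is accepted
        options = wanted if isinstance(wanted, list) else [wanted]
        # An unset field is searched for with the string "None".
        actual = "None" if spec.get(key) is None else spec[key]
        if actual not in options:
            return False
    return True


# Hypothetical specifications; hf-3c carries its own basis, so basis is unset.
specs = [
    {"method": "b3lyp", "basis": "6-31g"},
    {"method": "b3lyp", "basis": "def2-svp"},
    {"method": "hf-3c", "basis": None},
]

b3lyp_rows = [s for s in specs if matches(s, method="b3lyp")]
no_basis = [s["method"] for s in specs if matches(s, basis="None")]
everything = [s for s in specs if matches(s, method=None, basis=None)]
several = [s for s in specs if matches(s, method=["b3lyp", "hf-3c"], basis="6-31g")]
print(len(b3lyp_rows), no_basis, len(everything), len(several))
```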

list_records(dftd3: bool = False, pretty: bool = True, **search: Union[List[str], str, None]) → pandas.core.frame.DataFrame

Lists specifications of available records, i.e. method, program, basis set, keyword set, driver combinations. None is a wildcard selector. To search for None, use “None”.

Parameters
  • pretty (bool) – Replace NaN with “None” in returned DataFrame

  • **search (Dict[str, Optional[str]]) – Allows searching to narrow down return.

Returns

Record specifications matching **search.

Return type

DataFrame

list_values(method: Union[List[str], str, None] = None, basis: Union[List[str], str, None] = None, keywords: Optional[str] = None, program: Optional[str] = None, driver: Optional[str] = None, name: Union[List[str], str, None] = None, native: Optional[bool] = None, force: bool = False) → pandas.core.frame.DataFrame

Lists available data that may be queried with get_values. Results may be narrowed by providing search keys. None is a wildcard selector. To search for None, use “None”.

Parameters
  • method (Optional[Union[str, List[str]]], optional) – The computational method (B3LYP)

  • basis (Optional[Union[str, List[str]]], optional) – The computational basis (6-31G)

  • keywords (Optional[str], optional) – The keyword alias

  • program (Optional[str], optional) – The underlying QC program

  • driver (Optional[str], optional) – The type of calculation (e.g. energy, gradient, hessian, dipole…)

  • name (Optional[Union[str, List[str]]], optional) – The canonical name of the data column

  • native (Optional[bool], optional) – True: only include data computed with QCFractal; False: only include data contributed from outside sources; None: include both.

  • force (bool, optional) – Data is typically cached, forces a new query if True

Returns

A DataFrame of the matching data specifications

Return type

DataFrame

parse_stoichiometry(stoichiometry: List[Tuple[Union[qcelemental.models.molecule.Molecule, str], float]]) → Dict[str, float][source]

Parses a stoichiometry list.

Parameters

stoichiometry (list) – A list of tuples describing the stoichiometry.

Returns

A dictionary describing the stoichiometry for use in the database. Keys are molecule hashes. Values are stoichiometric coefficients.

Return type

Dict[str, float]

Notes

This function attempts to convert the molecule into its corresponding hash. The following will happen depending on the form of the Molecule.
  • Molecule hash - Used directly in the stoichiometry.

  • Molecule class - Hash is obtained and the molecule will be added to the database upon saving.

  • Molecule string - Molecule will be converted to a Molecule class and the same process as the above will occur.

save(client: Optional[FractalClient] = None) → ObjectId

Uploads the overall structure of the Collection (indices, options, new molecules, etc) to the server.

Parameters

client (FractalClient, optional) – A FractalClient connected to a server to upload to

Returns

The ObjectId of the saved collection.

Return type

ObjectId

set_default_benchmark(benchmark: str) → bool

Sets the default benchmark value.

Parameters

benchmark (str) – The benchmark to default to.

set_default_program(program: str) → bool

Sets the default program.

Parameters

program (str) – The program to default to.

set_view(path: Union[str, pathlib.Path]) → None

Set a dataset to use a local view.

Parameters

path (Union[str, Path]) – path to an hdf5 file representing a view for this dataset

statistics(stype: str, value: str, bench: Optional[str] = None, **kwargs: Dict[str, Any]) → Union[numpy.ndarray, pandas.core.series.Series, numpy.float64]

Provides statistics for various columns in the underlying dataframe.

Parameters
  • stype (str) – The type of statistic in question

  • value (str) – The method string to compare

  • bench (str, optional) – The benchmark method for the comparison, defaults to default_benchmark.

  • kwargs (Dict[str, Any]) – Additional kwargs to pass to the statistics functions

Returns

Returns an ndarray, Series, or float with the requested statistics depending on input.

Return type

np.ndarray, pd.Series, float

ternary(cvals=None)[source]

Plots a ternary diagram of the DataBase if available

Parameters

cvals (None, optional) – Description

to_file(path: Union[str, pathlib.Path], encoding: str) → None

Writes a view of the dataset to a file

Parameters
  • path (Union[str, Path]) – Where to write the file

  • encoding (str) – Options: plaintext, hdf5

to_json(filename: Optional[str] = None)

If a filename is provided, dumps the file to disk. Otherwise returns a copy of the current data.

Parameters

filename (str, Optional, Default: None) – The filename to drop the data to.

Returns

ret – A JSON representation of the Collection

Return type

dict

visualize(method: Optional[str] = None, basis: Optional[str] = None, keywords: Optional[str] = None, program: Optional[str] = None, stoich: str = 'default', groupby: Optional[str] = None, metric: str = 'UE', bench: Optional[str] = None, kind: str = 'bar', return_figure: Optional[bool] = None) → plotly.Figure[source]
Parameters
  • method (Optional[str], optional) – Methods to query

  • basis (Optional[str], optional) – Bases to query

  • keywords (Optional[str], optional) – Keyword aliases to query

  • program (Optional[str], optional) – Programs to query

  • stoich (str, optional) – Stoichiometry to query

  • groupby (Optional[str], optional) – Groups the plot by this index.

  • metric (str, optional) – The metric to use either UE (unsigned error) or URE (unsigned relative error)

  • bench (Optional[str], optional) – The benchmark level of theory to use

  • kind (str, optional) – The kind of chart to produce, either ‘bar’ or ‘violin’

  • return_figure (Optional[bool], optional) – If True, return the raw plotly figure. If False, returns a hosted iPlot. If None, return an iPlot display in a Jupyter notebook and a raw plotly figure in all other circumstances.

Returns

The requested figure.

Return type

plotly.Figure