Dataset

Datasets are useful for computing many methods on a single set of reactions, where a reaction is a linear combination of molecules. There are currently two types of datasets:

  • rxn for datasets based on canonical chemical reactions \(A + B \rightarrow C\)

  • ie for interaction energy datasets \(M_{complex} \rightarrow M_{monomer_1} + M_{monomer_2}\)

Querying

Visualizing

Creating

A blank dataset can be constructed by choosing a dataset name and a dataset type (dtype).

ds = ptl.collections.Dataset("my_dataset", dtype="rxn")

New reactions can be added by providing the linear combination of molecules required to compute the desired quantity. When the Dataset is queried, these linear combinations are automatically combined for the caller.

ds = ptl.collections.Dataset("Atomization Energies", dtype="rxn")

N2 = ptl.Molecule("""
N 0.0 0.0 1.0975
N 0.0 0.0 0.0
unit angstrom
""")

N_atom = ptl.Molecule("""
0 2
N 0.0 0.0 0.0
""")


ds.add_rxn("Nitrogen Molecule", [(N2, 1.0), (N_atom, -2.0)])

A given reaction can be examined by using the get_rxn function. We store the molecule_hash followed by the coefficient.

json.dumps(ds.get_rxn("Nitrogen Molecule"), indent=2)
{
  "name": "Nitrogen Molecule",
  "stoichiometry": {
    "default": {
      "4d7518cc2c741f2b5f48d7c16e2ad4c660e11890": 1.0,
      "636aa99f49b32dd81d7c8cb3741e16c632835cdf": -2.0
    }
  },
  "attributes": {},
  "reaction_results": {
    "default": {}
  }
}
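
The way the stored coefficients are combined can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the reaction_energy helper and the energies are hypothetical placeholders, and only the hashes and coefficients are taken from the output above.

```python
from typing import Dict


def reaction_energy(stoichiometry: Dict[str, float], energies: Dict[str, float]) -> float:
    """Sum coefficient * energy over every molecule hash in one stoichiometry."""
    return sum(coef * energies[mol_hash] for mol_hash, coef in stoichiometry.items())


# Coefficients keyed by molecule hash, as in the "default" stoichiometry above.
stoich = {
    "4d7518cc2c741f2b5f48d7c16e2ad4c660e11890": 1.0,   # N2
    "636aa99f49b32dd81d7c8cb3741e16c632835cdf": -2.0,  # N atom
}

# Placeholder energies in Hartree, for illustration only (not computed results).
energies = {
    "4d7518cc2c741f2b5f48d7c16e2ad4c660e11890": -109.5,
    "636aa99f49b32dd81d7c8cb3741e16c632835cdf": -54.5,
}

value = reaction_energy(stoich, energies)
print(value)
```

Because the coefficients are stored per molecule hash, the same molecule can appear in many reactions while its energy only needs to be computed once.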

Datasets of dtype ie can automatically construct counterpoise-corrected (cp) and non-counterpoise-corrected (default) n-body expansions. The number after the name corresponds to the number of bodies involved in the computation.

ie_ds = ptl.collections.Dataset("my_dataset", dtype="ie")

water_dimer_stretch = ptl.data.get_molecule("water_dimer_minima.psimol")
ie_ds.add_ie_rxn("water dimer minima", water_dimer_stretch)

json.dumps(ie_ds.get_rxn("water dimer minima"), indent=2)

{
  "name": "water dimer minima",
  "stoichiometry": {
    "default1": {
      "4cd68e5dde15c19fc2f5101d5fc5f19ac8afbc9c": 1.0,
      "da635a2e012a9ea876ea54422256bd93124e4271": 1.0
    },
    "cp1": {
      "9299ecc50e018f60128845e9f14b803da641f816": 1.0,
      "0f6382da1b658b634a05bc7c7f65ad115328f06f": 1.0
    },
    "default": {
      "358ad4bb4620e35cec79b17ec0f40acae1a548cb": 1.0
    },
    "cp": {
      "358ad4bb4620e35cec79b17ec0f40acae1a548cb": 1.0
    }
  },
  "attributes": {},
  "reaction_results": {
    "default": {}
  }
}
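
One plausible way to turn these stoichiometries into an interaction energy is sketched below. It assumes the interaction energy is the weighted sum at the full level ("default" or "cp") minus the weighted sum at the one-body level ("default1" or "cp1"). The helper names, the short keys standing in for molecule hashes, and the energies are all hypothetical.

```python
from typing import Dict

Stoich = Dict[str, float]


def weighted_sum(stoich: Stoich, energies: Dict[str, float]) -> float:
    """Sum coefficient * energy for one stoichiometry entry."""
    return sum(coef * energies[key] for key, coef in stoich.items())


def interaction_energy(stoichiometry: Dict[str, Stoich],
                       energies: Dict[str, float],
                       scheme: str = "default") -> float:
    """Assumed convention: E(scheme) - E(scheme + "1")."""
    complex_part = weighted_sum(stoichiometry[scheme], energies)
    monomer_part = weighted_sum(stoichiometry[scheme + "1"], energies)
    return complex_part - monomer_part


# Short keys stand in for the molecule hashes shown above.
stoichiometry = {
    "default1": {"monomer_a": 1.0, "monomer_b": 1.0},
    "cp1": {"monomer_a_ghost": 1.0, "monomer_b_ghost": 1.0},
    "default": {"dimer": 1.0},
    "cp": {"dimer": 1.0},
}

# Placeholder energies in Hartree, for illustration only.
energies = {
    "dimer": -152.125,
    "monomer_a": -76.0,
    "monomer_b": -76.0,
    "monomer_a_ghost": -76.03125,  # monomer in the full dimer basis
    "monomer_b_ghost": -76.03125,
}

ie_default = interaction_energy(stoichiometry, energies, "default")
ie_cp = interaction_energy(stoichiometry, energies, "cp")
print(ie_default, ie_cp)
```

In this sketch the cp and default schemes share the same complex and differ only in the monomer entries, which matches the hashes in the output above.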

Computing

API

class qcfractal.interface.collections.Dataset(name: str, client: Optional[FractalClient] = None, **kwargs)[source]

The Dataset class for homogeneous computations on many molecules.

Variables
  • client (client.FractalClient) – A FractalClient connected to a server

  • data (dict) – JSON representation of the database backbone

  • df (pd.DataFrame) – The underlying dataframe for the Dataset object

class DataModel[source]

add_contributed_values(contrib: qcfractal.interface.collections.dataset.ContributedValues, overwrite=False) → None[source]

Adds a ContributedValues to the database.

Parameters
  • contrib (ContributedValues) – The ContributedValues to add.

  • overwrite (bool, optional) – Overwrites pre-existing values

add_entry(name: str, molecule: qcelemental.models.molecule.Molecule, **kwargs)[source]

Adds a new entry to the Dataset.

Parameters
  • name (str) – The name of the record

  • molecule (Molecule) – The Molecule associated with this record

  • **kwargs (Dict[str, Any]) – Additional arguments to pass to the record

add_keywords(alias: str, program: str, keyword: KeywordSet, default: bool = False) → bool[source]

Adds an option alias to the dataset. Note that keywords are not present until a save call has been completed.

Parameters
  • alias (str) – The alias of the option

  • program (str) – The compute program the alias is for

  • keyword (KeywordSet) – The Keywords object to use.

  • default (bool, optional) – Sets this option as the default for the program

compute(method: str, basis: Optional[str] = None, *, keywords: Optional[str] = None, program: Optional[str] = None, tag: Optional[str] = None, priority: Optional[str] = None) → qcfractal.interface.models.rest_models.ComputeResponse[source]

Executes a computational method for all reactions in the Dataset. Previously completed computations are not repeated.

Parameters
  • method (str) – The computational method to compute (B3LYP)

  • basis (Optional[str], optional) – The computational basis to compute (6-31G)

  • keywords (Optional[str], optional) – The keyword alias for the requested compute

  • program (Optional[str], optional) – The underlying QC program

  • tag (Optional[str], optional) – The queue tag to use when submitting compute requests.

  • priority (Optional[str], optional) – The priority of the jobs: low, medium, or high.

Returns

An object that contains the submitted ObjectIds of the new compute. This object has the following fields:
  • ids: The ObjectIds of the tasks, in the order of the input molecules

  • submitted: A list of ObjectIds that were submitted to the compute queue

  • existing: A list of ObjectIds of tasks already in the database

Return type

ComputeResponse
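
The relationship between the three fields can be illustrated with a small stand-alone sketch. It does not reproduce the server logic: split_requests, the molecule names, and the task identifiers are hypothetical, and the code only mimics the idea that completed work is reported under existing rather than resubmitted.

```python
from typing import Dict, List


def split_requests(molecule_ids: List[str], completed: Dict[str, str]) -> Dict[str, List[str]]:
    """Partition requested molecules into already-known and newly submitted tasks."""
    ids, submitted, existing = [], [], []
    for mol in molecule_ids:
        if mol in completed:
            # A task for this molecule is already stored; reuse its id.
            task_id = completed[mol]
            existing.append(task_id)
        else:
            # Stand-in for an id that a server would assign to a new task.
            task_id = f"new-{mol}"
            submitted.append(task_id)
        ids.append(task_id)  # ids keeps the order of the input molecules
    return {"ids": ids, "submitted": submitted, "existing": existing}


response = split_requests(["mol_a", "mol_b", "mol_c"], {"mol_b": "task-7"})
print(response)
```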

get_contributed_values(key: str) → qcfractal.interface.collections.dataset.ContributedValues[source]

Returns a copy of the requested ContributedValues object.

Parameters

key (str) – The ContributedValues object key.

Returns

The requested ContributedValues object.

Return type

ContributedValues

get_contributed_values_column(key: str) → Series[source]

Returns a Pandas column with the requested contributed values

Parameters
  • key (str) – The ContributedValues object key.

  • scale (None, optional) – All units are based in Hartree; the default scaling is to kcal/mol.

Returns

A pandas Series containing the requested values.

Return type

Series
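
The Hartree to kcal/mol scaling mentioned for scale can be sketched as follows. This is an illustration under assumptions: the scale_values helper and the sample values are hypothetical, and the conversion factor is the commonly quoted approximate value rather than one read from the library.

```python
from typing import Dict

# Approximate conversion factor: 1 Hartree is about 627.5095 kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def scale_values(values_hartree: Dict[str, float],
                 factor: float = HARTREE_TO_KCAL_PER_MOL) -> Dict[str, float]:
    """Multiply every value (in Hartree) by the given scaling factor."""
    return {name: value * factor for name, value in values_hartree.items()}


# Placeholder contributed values in Hartree, for illustration only.
contributed = {"rxn_a": 0.001, "rxn_b": -0.5}
scaled = scale_values(contributed)
print(scaled)
```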

get_history(method: Optional[str] = None, basis: Optional[str] = None, keywords: Optional[str] = None, program: Optional[str] = None, force: bool = False) → DataFrame[source]

Queries known history from the search parameters provided. Defaults to the standard programs and keywords if not provided.

Parameters
  • method (Optional[str]) – The computational method to compute (B3LYP)

  • basis (Optional[str], optional) – The computational basis to compute (6-31G)

  • keywords (Optional[str], optional) – The keyword alias for the requested compute

  • program (Optional[str], optional) – The underlying QC program

Returns

A DataFrame of the queried parameters

Return type

DataFrame

get_index() → List[str][source]

Returns the current index of the database.

Returns

ret – The names of all reactions in the database

Return type

List[str]

get_keywords(alias: str, program: str) → KeywordSet[source]

Pulls the keywords alias from the server for inspection.

Parameters
  • alias (str) – The keywords alias.

  • program (str) – The program the keywords correspond to.

Returns

The requested KeywordSet

Return type

KeywordSet

list_contributed_values() → List[str][source]

Lists the known keys for all contributed values.

Returns

A list of all known contributed values.

Return type

List[str]

list_history(dftd3: bool = False, get_base: bool = False, pretty: bool = True, **search) → DataFrame[source]

Lists the history of computations completed.

Parameters

**search (Dict[str, Optional[str]]) – Allows searching to narrow down return.

Returns

The computed keys.

Return type

DataFrame

query(method: str, basis: Optional[str] = None, *, keywords: Optional[str] = None, program: Optional[str] = None, field: str = None, as_array: bool = False, force: bool = False) → str[source]

Queries the local Portal for the requested keys.

Parameters
  • method (str) – The computational method to query on (B3LYP)

  • basis (Optional[str], optional) – The computational basis to query on (6-31G)

  • keywords (Optional[str], optional) – The option token desired

  • program (Optional[str], optional) – The program to query on

  • field (str, optional) – The result field to query on

  • as_array (bool, optional) – Converts the returned values to NumPy arrays

  • force (bool, optional) – Forces a requery if data is already present

Returns

success – The name of the queried column

Return type

str

Examples

>>> ds.query("B3LYP", "aug-cc-pVDZ", stoich="cp", prefix="cp-")

set_default_benchmark(benchmark: str) → bool[source]

Sets the default benchmark value.

Parameters

benchmark (str) – The benchmark to default to.

set_default_program(program: str) → bool[source]

Sets the default program.

Parameters

program (str) – The program to default to.

statistics(stype: str, value: str, bench: Optional[str] = None, **kwargs)[source]

Provides statistics for various columns in the underlying dataframe.

Parameters
  • stype (str) – The type of statistic in question

  • value (str) – The method string to compare

  • bench (str, optional) – The benchmark method for the comparison, defaults to default_benchmark.

  • kwargs (Dict[str, Any]) – Additional kwargs to pass to the statistics functions

Returns

ret – Returns a DataFrame, Series, or float with the requested statistics depending on input.

Return type

pd.DataFrame, pd.Series, float

visualize(method: Optional[str] = None, basis: Optional[str] = None, keywords: Optional[str] = None, program: Optional[str] = None, groupby: Optional[str] = None, metric: str = 'UE', bench: Optional[str] = None, kind: str = 'bar', return_figure: Optional[bool] = None) → plotly.Figure[source]

Parameters
  • method (Optional[str], optional) – Methods to query

  • basis (Optional[str], optional) – Bases to query

  • keywords (Optional[str], optional) – Keyword aliases to query

  • program (Optional[str], optional) – Programs to query

  • groupby (Optional[str], optional) – Groups the plot by this index.

  • metric (str, optional) – The metric to use either UE (unsigned error) or URE (unsigned relative error)

  • bench (Optional[str], optional) – The benchmark level of theory to use

  • kind (str, optional) – The kind of chart to produce, either ‘bar’ or ‘violin’

  • return_figure (Optional[bool], optional) – If True, returns the raw plotly figure. If False, returns a hosted iPlot. If None, returns an iPlot display in a Jupyter notebook and a raw plotly figure in all other circumstances.

Returns

The requested figure.

Return type

plotly.Figure
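
The two metrics named above can be sketched in plain Python. The definitions here are assumptions for illustration: UE is taken as the absolute difference from the benchmark, and URE as that difference divided by the magnitude of the benchmark. The helper names and the values are hypothetical, and the library may scale or report these quantities differently.

```python
from typing import Dict


def unsigned_error(value: float, bench: float) -> float:
    """UE: absolute deviation from the benchmark."""
    return abs(value - bench)


def unsigned_relative_error(value: float, bench: float) -> float:
    """URE (assumed definition): UE divided by the benchmark magnitude."""
    return abs(value - bench) / abs(bench)


# Placeholder reaction values in kcal/mol, for illustration only.
benchmark: Dict[str, float] = {"rxn_a": -4.0, "rxn_b": 2.0}
method: Dict[str, float] = {"rxn_a": -5.0, "rxn_b": 2.5}

ue = {name: unsigned_error(method[name], benchmark[name]) for name in benchmark}
ure = {name: unsigned_relative_error(method[name], benchmark[name]) for name in benchmark}
mean_ue = sum(ue.values()) / len(ue)
print(ue, ure, mean_ue)
```

A relative metric such as URE keeps small-magnitude reactions from being hidden by large ones, at the cost of becoming unstable when the benchmark value is close to zero.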