Optimization Dataset

The OptimizationDataset collection represents the results of geometry optimization calculations performed on a series of Molecules. It uses metadata specifications, via the OptimizationSpecification and QCSpecification classes, to manage the parameters of the geometry optimizer and the underlying gradient calculation, respectively.

Existing OptimizationDataset collections can be listed with FractalClient.list_collections("OptimizationDataset"), and an individual collection can be retrieved with FractalClient.get_collection("OptimizationDataset", name).

Querying the Data

All available optimization specifications can be listed with:

>>> ds.list_specifications()

To show the status of the optimization calculations for a given set of specifications, use:

>>> ds.status(["default"])

The number of steps in the geometry optimization of each Molecule can be queried by calling:

>>> ds.counts()

Individual OptimizationRecords can be obtained using:

>>> ds.get_record(name="CCO-0", specification="default")
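When records are needed for several entries at once, the single-record call above can be wrapped in a small loop. The sketch below is only an illustration: collect_records and StubQueryDataset are hypothetical names that are not part of QCPortal, and the stub exists solely so the example runs without a server. With a real dataset, ds would be passed in place of the stub.

```python
from typing import Any, Dict, Iterable, Tuple


class StubQueryDataset:
    """Hypothetical offline stand-in exposing only get_record(name, specification)."""

    def __init__(self, records: Dict[Tuple[str, str], Any]) -> None:
        self._records = records

    def get_record(self, name: str, specification: str) -> Any:
        return self._records[(name, specification)]


def collect_records(ds: Any, names: Iterable[str], specification: str) -> Dict[str, Any]:
    """Fetch one record per entry name for a single specification."""
    return {name: ds.get_record(name=name, specification=specification) for name in names}


# Plain strings stand in for OptimizationRecord objects in this offline demo.
stub_query = StubQueryDataset(
    {("CCO-0", "default"): "record-0", ("CCO-1", "default"): "record-1"}
)
records = collect_records(stub_query, ["CCO-0", "CCO-1"], "default")
```

The helper relies only on the get_record(name=..., specification=...) call shown above, so any object offering that method can be used.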

Statistics and Visualization

The energy trajectory over the course of a geometry optimization can be plotted with the qcportal.models.OptimizationRecord.show_history() function.
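For a quick numerical view of the same trajectory without plotting, the per-step energies can be expressed relative to the final step. This is a minimal sketch under stated assumptions: relative_energies is a hypothetical helper, the input energies are assumed to be in hartree (for example, taken from the energies attribute of an OptimizationRecord, if available), and the values used here are made up.

```python
from typing import List, Sequence

HARTREE_TO_KCAL_PER_MOL = 627.509474  # approximate conversion factor


def relative_energies(energies: Sequence[float]) -> List[float]:
    """Energy of each step relative to the final step, in kcal/mol."""
    if not energies:
        return []
    final = energies[-1]
    return [(e - final) * HARTREE_TO_KCAL_PER_MOL for e in energies]


# Made-up energies in hartree, for illustration only.
trajectory = relative_energies([-154.10, -154.12, -154.13])
```

For an optimization that lowers the energy at every step, the resulting values decrease towards zero.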

Creating the Datasets

A new OptimizationDataset collection object can be created using:

>>> ds = ptl.collections.OptimizationDataset(name="QM8-T", client=client)

A specific set of parameters for geometry optimization can be defined and added to the dataset as follows:

>>> spec = {'name': 'default',
...         'description': 'Geometric + Psi4/B3LYP-D3/Def2-SVP.',
...         'optimization_spec': {'program': 'geometric', 'keywords': None},
...         'qc_spec': {'driver': 'gradient',
...                     'method': 'b3lyp-d3',
...                     'basis': 'def2-svp',
...                     'keywords': None,
...                     'program': 'psi4'}}

>>> ds.add_specification(**spec)

>>> ds.save()
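When several specifications differ only in method or basis set, the dictionary above can be generated rather than typed out each time. The following is a sketch, not part of the QCPortal API: build_optimization_spec is a hypothetical helper, and its defaults simply mirror the values used in the example above.

```python
from typing import Any, Dict, Optional


def build_optimization_spec(
    name: str,
    method: str,
    basis: str,
    qc_program: str = "psi4",
    optimizer: str = "geometric",
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the keyword arguments expected by ds.add_specification."""
    return {
        "name": name,
        "description": description,
        "optimization_spec": {"program": optimizer, "keywords": None},
        "qc_spec": {
            "driver": "gradient",  # the optimizer is driven by gradient evaluations
            "method": method,
            "basis": basis,
            "keywords": None,
            "program": qc_program,
        },
    }


spec = build_optimization_spec(
    "default",
    "b3lyp-d3",
    "def2-svp",
    description="Geometric + Psi4/B3LYP-D3/Def2-SVP.",
)
# With a dataset in hand, this would be passed on as: ds.add_specification(**spec)
```

The returned dictionary has the same keys as the hand-written one, so it can be unpacked into add_specification in the same way.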

Molecules can be added to the OptimizationDataset as new entries for optimization via:

>>> ds.add_entry(name, molecule)

When adding multiple molecules, postpone saving the dataset to the server until all of them have been added:

>>> for name, molecule in new_entries:
...     ds.add_entry(name, molecule, save=False)

>>> ds.save()
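The add-then-save-once pattern can be packaged as a helper so that the final save is not forgotten. The sketch below assumes only the add_entry(name, molecule, save=False) and save() calls shown above. RecordingDataset is a hypothetical stand-in that counts calls so the example runs offline, and add_entries_batched is likewise an illustrative name.

```python
from typing import Any, Iterable, List, Tuple


class RecordingDataset:
    """Hypothetical stand-in that mimics only add_entry/save and records the calls."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.save_calls = 0

    def add_entry(self, name: str, initial_molecule: Any, save: bool = True) -> None:
        self.entries.append(name)
        if save:
            self.save()

    def save(self) -> None:
        self.save_calls += 1


def add_entries_batched(ds: Any, new_entries: Iterable[Tuple[str, Any]]) -> int:
    """Add every (name, molecule) pair with save=False, then save once."""
    count = 0
    for name, molecule in new_entries:
        ds.add_entry(name, molecule, save=False)
        count += 1
    ds.save()  # single upload after all entries have been queued
    return count


fake_ds = RecordingDataset()
# Plain strings stand in for Molecule objects in this offline demo.
n_added = add_entries_batched(
    fake_ds, [("CCO-0", "mol-a"), ("CCO-1", "mol-b"), ("CCO-2", "mol-c")]
)
```

In this demo three entries are added and save() is called exactly once, which is the behaviour the loop above is meant to achieve.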

Computational Tasks

To run geometry optimizations for a particular specification (the default one in this case), use the compute() method of the OptimizationDataset class:

>>> ds.compute(specification="default", tag="optional_tag")
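For large datasets it can be convenient to submit work in pieces, using the subset argument of compute() listed in the API reference. The sketch below is an assumption-laden illustration: compute_in_chunks is a hypothetical helper, and StubComputeDataset is an offline stand-in whose return value (the size of the subset) only approximates the real method, which returns the number of computations actually submitted.

```python
from typing import Any, List, Optional, Sequence, Set


class StubComputeDataset:
    """Hypothetical offline stand-in exposing only compute(), for demonstration."""

    def __init__(self) -> None:
        self.calls: List[Set[str]] = []

    def compute(
        self,
        specification: str,
        subset: Optional[Set[str]] = None,
        tag: Optional[str] = None,
        priority: Optional[str] = None,
    ) -> int:
        self.calls.append(set(subset or ()))
        return len(subset or ())


def compute_in_chunks(
    ds: Any,
    specification: str,
    names: Sequence[str],
    chunk_size: int,
    tag: Optional[str] = None,
) -> int:
    """Submit entries chunk by chunk via subset; return the total reported as submitted."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    total = 0
    for start in range(0, len(names), chunk_size):
        chunk = set(names[start:start + chunk_size])
        total += ds.compute(specification, subset=chunk, tag=tag)
    return total


stub_compute = StubComputeDataset()
submitted = compute_in_chunks(
    stub_compute, "default", ["CCO-0", "CCO-1", "CCO-2"], chunk_size=2
)
```

Summing the integers returned by each call gives the total reported by the dataset object across all chunks.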

API

class qcportal.collections.OptimizationDataset(name: str, client: FractalClient = None, **kwargs)
class DataModel(*, id: str = 'local', name: str, collection: str, provenance: Dict[str, str] = {}, tags: List[str] = [], tagline: str = None, description: str = None, group: str = 'default', visibility: bool = True, view_url_hdf5: str = None, view_url_plaintext: str = None, view_metadata: Dict[str, str] = None, view_available: bool = False, metadata: Dict[str, Any] = {}, records: Dict[str, qcportal.collections.optimization_dataset.OptEntry] = {}, history: Set[str] = {}, specs: Dict[str, qcportal.collections.optimization_dataset.OptEntrySpecification] = {})
Parameters
  • id (str, Default: local)

  • name (str)

  • collection (str)

  • provenance (Dict[str, str], Default: {})

  • tags (List[str], Default: [])

  • tagline (str, Optional)

  • description (str, Optional)

  • group (str, Default: default)

  • visibility (bool, Default: True)

  • view_url_hdf5 (str, Optional)

  • view_url_plaintext (str, Optional)

  • view_metadata (Dict[str, str], Optional)

  • view_available (bool, Default: False)

  • metadata (Dict[str, Any], Default: {})

  • records (Dict[str, OptEntry], Default: {})

  • history (Set[str], Default: set())

  • specs (Dict[str, OptEntrySpecification], Default: {})

class Config
compare(other: Union[qcelemental.models.basemodels.ProtoModel, pydantic.main.BaseModel], **kwargs) -> bool

Compares the current object to the provided object recursively.

Parameters
  • other (Model) – The model to compare to.

  • **kwargs – Additional kwargs to pass to qcelemental.compare_recursive.

Returns

True if the objects match.

Return type

bool

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) -> Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

copy(*, include: Union[AbstractSetIntStr, MappingIntStrAny] = None, exclude: Union[AbstractSetIntStr, MappingIntStrAny] = None, update: DictStrAny = None, deep: bool = False) -> Model

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from the new model; as with values, this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

dict(**kwargs) -> Dict[str, Any]

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

property fields
classmethod from_orm(obj: Any) -> Model
json(**kwargs)

Generate a JSON representation of the model; the include and exclude arguments behave as in dict().

encoder is an optional function to supply as default to json.dumps(); other arguments are as per json.dumps().

classmethod parse_file(path: Union[str, pathlib.Path], *, encoding: Optional[str] = None) -> qcelemental.models.basemodels.ProtoModel

Parses a file into a Model object.

Parameters
  • path (Union[str, Path]) – The path to the file.

  • encoding (str, optional) – The type of the file; available types are: {'json', 'msgpack', 'pickle'}. Attempts to automatically infer the file type from the file extension if None.

Returns

The requested model from a serialized format.

Return type

Model

classmethod parse_obj(obj: Any) -> Model
classmethod parse_raw(data: Union[bytes, str], *, encoding: Optional[str] = None) -> qcelemental.models.basemodels.ProtoModel

Parses raw string or bytes into a Model object.

Parameters
  • data (Union[bytes, str]) – A serialized data blob to be deserialized into a Model.

  • encoding (str, optional) – The type of the serialized array; available types are: {'json', 'json-ext', 'msgpack-ext', 'pickle'}

Returns

The requested model from a serialized format.

Return type

Model

classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') -> DictStrAny
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) -> unicode
serialize(encoding: str, *, include: Optional[Set[str]] = None, exclude: Optional[Set[str]] = None, exclude_unset: Optional[bool] = None, exclude_defaults: Optional[bool] = None, exclude_none: Optional[bool] = None) -> Union[bytes, str]

Generates a serialized representation of the model.

Parameters
  • encoding (str) – The serialization type; available types are: {'json', 'json-ext', 'msgpack-ext'}

  • include (Optional[Set[str]], optional) – Fields to be included in the serialization.

  • exclude (Optional[Set[str]], optional) – Fields to be excluded in the serialization.

  • exclude_unset (Optional[bool], optional) – If True, skips fields that have default values provided.

  • exclude_defaults (Optional[bool], optional) – If True, skips fields that have set or defaulted values equal to the default.

  • exclude_none (Optional[bool], optional) – If True, skips fields that have value None.

Returns

The serialized model.

Return type

Union[bytes, str]

to_string(pretty: bool = False) -> unicode
classmethod update_forward_refs(**localns: Any) -> None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

classmethod validate(value: Any) -> Model
add_entry(name: str, initial_molecule: Molecule, additional_keywords: Optional[Dict[str, Any]] = None, attributes: Optional[Dict[str, Any]] = None, save: bool = True) -> None
Parameters
  • name (str) – The name of the entry, will be used for the index

  • initial_molecule (Molecule) – The starting Molecule for the Optimization

  • additional_keywords (Dict[str, Any], optional) – Additional keywords to add to the optimization run

  • attributes (Dict[str, Any], optional) – Additional attributes and descriptions for the entry

  • save (bool, optional) – If True, saves the collection after adding the entry. If this is False, be careful to call save after all entries are added; otherwise data pointers may be lost.

add_specification(name: str, optimization_spec: qcportal.models.common_models.OptimizationSpecification, qc_spec: qcportal.models.common_models.QCSpecification, description: Optional[str] = None, protocols: Optional[Dict[str, Any]] = None, overwrite=False) -> None
Parameters
  • name (str) – The name of the specification

  • optimization_spec (OptimizationSpecification) – A full optimization specification for Optimization

  • qc_spec (QCSpecification) – A full quantum chemistry specification for Optimization

  • description (str, optional) – A short text description of the specification

  • protocols (Optional[Dict[str, Any]], optional) – Protocols for this specification.

  • overwrite (bool, optional) – Overwrite existing specification names

compute(specification: str, subset: Optional[Set[str]] = None, tag: Optional[str] = None, priority: Optional[str] = None) -> int

Computes a specification for all entries in the dataset.

Parameters
  • specification (str) – The specification name.

  • subset (Set[str], optional) – Computes only a subset of the dataset.

  • tag (Optional[str], optional) – The queue tag to use when submitting compute requests.

  • priority (Optional[str], optional) – The priority of the jobs: low, medium, or high.

Returns

The number of submitted computations

Return type

int

counts(entries: Optional[Union[List[str], str]] = None, specs: Optional[Union[List[str], str]] = None) -> pandas.core.frame.DataFrame

Counts the number of optimization or gradient evaluations associated with the Optimizations.

Parameters
  • entries (Union[str, List[str]]) – The entries to query for

  • specs (Optional[Union[str, List[str]]], optional) – The specifications to query for

  • count_gradients (bool, optional) – If True, counts the total number of gradient calls. Warning! This can be slow for large datasets.

Returns

The queried counts.

Return type

DataFrame

classmethod from_json(data: Dict[str, Any], client: FractalClient = None) -> Collection

Creates a new class from a JSON blob

Parameters
  • data (Dict[str, Any]) – The JSON blob to create a new class from.

  • client (FractalClient, optional) – A FractalClient connected to a server

Returns

A constructed collection.

Return type

Collection

classmethod from_server(client: FractalClient, name: str) -> Collection

Creates a new class from a server

Parameters
  • client (FractalClient) – A FractalClient connected to a server

  • name (str) – The name of the collection to pull from.

Returns

A constructed collection.

Return type

Collection

get_entry(name: str) -> Any

Obtains a record from the Dataset

Parameters

name (str) – The record name to pull from.

Returns

The requested record

Return type

Record

get_record(name: str, specification: str) -> Any

Pulls an individual computational record of the requested name and column.

Parameters
  • name (str) – The index name to pull the record of.

  • specification (str) – The name of specification to pull the record of.

Returns

The requested Record

Return type

Any

get_specification(name: str) -> Any
Parameters

name (str) – The name of the specification

Returns

The requested specification.

Return type

Specification

list_specifications(description=True) -> Union[List[str], pandas.core.frame.DataFrame]

Lists all available specifications

Parameters

description (bool, optional) – If True, returns a DataFrame that includes the descriptions

Returns

A list of known specification names.

Return type

Union[List[str], DataFrame]

query(specification: str, force: bool = False) -> pandas.core.series.Series

Queries a given specification from the server

Parameters
  • specification (str) – The specification name to query

  • force (bool, optional) – Force a fresh query if the specification already exists.

Returns

Records collected from the server

Return type

pd.Series

save(client: Optional[FractalClient] = None) -> ObjectId

Uploads the overall structure of the Collection (indices, options, new molecules, etc.) to the server.

Parameters

client (FractalClient, optional) – A FractalClient connected to a server to upload to

Returns

The ObjectId of the saved collection.

Return type

ObjectId

status(specs: Optional[Union[List[str], str]] = None, collapse: bool = True, status: Optional[str] = None, detail: bool = False) -> pandas.core.frame.DataFrame

Returns the status of all current specifications.

Parameters
  • collapse (bool, optional) – Collapse the status into summaries per specification or not.

  • status (Optional[str], optional) – If not None, only returns results that match the provided status.

  • detail (bool, optional) – Shows a detailed description of the current status of incomplete jobs.

Returns

A DataFrame of all known statuses

Return type

DataFrame

to_json(filename: Optional[str] = None)

If a filename is provided, dumps the file to disk. Otherwise returns a copy of the current data.

Parameters

filename (str, Optional, Default: None) – The filename to dump the data to.

Returns

ret – A JSON representation of the Collection

Return type

dict