Optimization Dataset
The OptimizationDataset collection represents the results of geometry optimization calculations performed on a series of Molecules. The OptimizationDataset uses metadata specifications, via the OptimizationSpecification and QCSpecification classes, to manage the parameters of the geometry optimizer and the underlying gradient calculation, respectively.
Existing OptimizationDataset collections can be listed with FractalClient.list_collections("OptimizationDataset") and retrieved individually with FractalClient.get_collection("OptimizationDataset", name).
Querying the Data
All available optimization specifications can be listed with:
>>> ds.list_specifications()
To show the status of the optimization calculations for a given set of specifications, use:
>>> ds.status(["default"])
For each Molecule, the number of steps in the geometry optimization procedure can be queried with:
>>> ds.counts()
Individual OptimizationRecords can be obtained using:
>>> ds.get_record(name="CCO-0", specification="default")
Statistics and Visualization
The energy trajectory over the course of a geometry optimization can be plotted with the qcportal.models.OptimizationRecord.show_history() function.
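As a rough illustration of the kind of data behind such a plot, the sketch below converts a list of per-step energies into energies relative to the final step. It assumes the energies are available as a plain list in hartree (how they are read from a record is not shown and would be an assumption), and the numbers are placeholders invented for the example, not computed results. The helper name relative_energies is hypothetical and is not part of QCPortal.

```python
from typing import List

# Conversion factor from hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474


def relative_energies(energies: List[float]) -> List[float]:
    """Return each step's energy relative to the final step, in kcal/mol."""
    if not energies:
        raise ValueError("energies must contain at least one value")
    final = energies[-1]
    return [(energy - final) * HARTREE_TO_KCAL_PER_MOL for energy in energies]


# Placeholder energies in hartree, invented purely for illustration.
trajectory = [-154.90, -154.95, -154.96]
for step, rel in enumerate(relative_energies(trajectory)):
    print(f"step {step}: {rel:8.3f} kcal/mol above the final energy")
```

Plotting these values against the step index gives a curve that should fall toward zero as the optimization converges.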
Creating the Datasets
A new OptimizationDataset collection object can be created as follows, where client is a FractalClient connected to a server:
>>> import qcportal as ptl
>>> ds = ptl.collections.OptimizationDataset(name="QM8-T", client=client)
A specific set of parameters for geometry optimization can be defined and added to the dataset as follows:
>>> spec = {'name': 'default',
...         'description': 'Geometric + Psi4/B3LYP-D3/Def2-SVP.',
...         'optimization_spec': {'program': 'geometric', 'keywords': None},
...         'qc_spec': {'driver': 'gradient',
...                     'method': 'b3lyp-d3',
...                     'basis': 'def2-svp',
...                     'keywords': None,
...                     'program': 'psi4'}}
>>> ds.add_specification(**spec)
>>> ds.save()
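When several specifications differ only in method, basis, or program, it can help to build the dictionary with a small helper. The sketch below is one way to do that. The function make_spec and its argument names are hypothetical, not part of QCPortal, and it only produces a dictionary with the same shape as the one above.

```python
from typing import Any, Dict, Optional


def make_spec(name: str, method: str, basis: Optional[str], qc_program: str,
              optimizer: str = "geometric",
              description: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments shaped like the specification dictionary above."""
    return {
        "name": name,
        "description": description,
        # Settings for the geometry optimizer itself.
        "optimization_spec": {"program": optimizer, "keywords": None},
        # Settings for the underlying gradient calculation.
        "qc_spec": {
            "driver": "gradient",
            "method": method,
            "basis": basis,
            "keywords": None,
            "program": qc_program,
        },
    }


spec = make_spec("default", "b3lyp-d3", "def2-svp", "psi4",
                 description="Geometric + Psi4/B3LYP-D3/Def2-SVP.")
# With a dataset in hand, this would be passed on as ds.add_specification(**spec).
```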
Molecules can be added to the OptimizationDataset as new entries for optimization via:
>>> ds.add_entry(name, molecule)
When adding multiple molecules, saving the dataset to the server should be postponed until all of them have been added:
>>> for name, molecule in new_entries:
...     ds.add_entry(name, molecule, save=False)
...
>>> ds.save()
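The benefit of deferring the save can be seen by counting save calls. The sketch below uses RecordingDataset, a hypothetical stand-in that mimics only the add_entry/save calling pattern so the example runs without a server. It is not the QCPortal class and does nothing with the molecules beyond storing them.

```python
class RecordingDataset:
    """Hypothetical stand-in that only records entries and counts saves."""

    def __init__(self):
        self.entries = {}
        self.save_calls = 0

    def add_entry(self, name, molecule, save=True):
        self.entries[name] = molecule
        if save:
            self.save()

    def save(self):
        self.save_calls += 1


def add_entries(ds, new_entries):
    """Add every (name, molecule) pair with save=False, then save once."""
    count = 0
    for name, molecule in new_entries:
        ds.add_entry(name, molecule, save=False)
        count += 1
    ds.save()
    return count


# Strings stand in for Molecule objects in this illustration.
ds = RecordingDataset()
added = add_entries(ds, [("CCO-0", "molecule-a"), ("CCO-1", "molecule-b")])
print(f"added {added} entries with {ds.save_calls} save call(s)")
```

Calling add_entry with its default save=True would instead trigger one save per entry.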
Computational Tasks
To run geometry optimizations with a particular specification (here, the default one), call the compute method of the OptimizationDataset class:
>>> ds.compute(specification="default", tag="optional_tag")
API
-
class qcportal.collections.OptimizationDataset(name: str, client: FractalClient = None, **kwargs)
-
class DataModel(*, id: str = 'local', name: str, collection: str, provenance: Dict[str, str] = {}, tags: List[str] = [], tagline: str = None, description: str = None, group: str = 'default', visibility: bool = True, view_url_hdf5: str = None, view_url_plaintext: str = None, view_metadata: Dict[str, str] = None, view_available: bool = False, metadata: Dict[str, Any] = {}, records: Dict[str, qcportal.collections.optimization_dataset.OptEntry] = {}, history: Set[str] = {}, specs: Dict[str, qcportal.collections.optimization_dataset.OptEntrySpecification] = {})
- Parameters
id (str, Default: local)
name (str)
collection (str)
provenance (Dict[str, str], Default: {})
tags (List[str], Default: [])
tagline (str, Optional)
description (str, Optional)
group (str, Default: default)
visibility (bool, Default: True)
view_url_hdf5 (str, Optional)
view_url_plaintext (str, Optional)
view_metadata (Dict[str, str], Optional)
view_available (bool, Default: False)
metadata (Dict[str, Any], Default: {})
records (Dict[str, OptEntry], Default: {})
history (Set[str], Default: set())
specs (Dict[str, OptEntrySpecification], Default: {})
-
compare(other: Union[qcelemental.models.basemodels.ProtoModel, pydantic.main.BaseModel], **kwargs) → bool
Compares the current object to the provided object recursively.
- Parameters
other (Model) – The model to compare to.
**kwargs – Additional kwargs to pass to qcelemental.compare_recursive.
- Returns
True if the objects match.
- Return type
bool
-
classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.
-
copy(*, include: Union[AbstractSetIntStr, MappingIntStrAny] = None, exclude: Union[AbstractSetIntStr, MappingIntStrAny] = None, update: DictStrAny = None, deep: bool = False) → Model
Duplicate a model, optionally choose which fields to include, exclude and change.
- Parameters
include – fields to include in new model
exclude – fields to exclude from new model, as with values this takes precedence over include
update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data
deep – set to True to make a deep copy of the model
- Returns
new model instance
-
dict(**kwargs) → Dict[str, Any]
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
-
property fields
-
classmethod from_orm(obj: Any) → Model
-
json(**kwargs)
Generate a JSON representation of the model, include and exclude arguments as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
-
classmethod parse_file(path: Union[str, pathlib.Path], *, encoding: Optional[str] = None) → qcelemental.models.basemodels.ProtoModel
Parses a file into a Model object.
- Parameters
path (Union[str, Path]) – The path to the file.
encoding (str, optional) – The type of the files, available types are: {‘json’, ‘msgpack’, ‘pickle’}. Attempts to automatically infer the file type from the file extension if None.
- Returns
The requested model from a serialized format.
- Return type
Model
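The extension-based inference described for the encoding parameter can be pictured as a lookup on the file suffix. The following is a minimal sketch of that idea, not the library's implementation. The function infer_encoding is hypothetical, and the suffix table is an assumption built from the three encodings listed above, so the suffixes actually recognized may differ.

```python
from pathlib import Path
from typing import Optional, Union

# Assumed suffix-to-encoding table; the real set of recognized suffixes may differ.
_SUFFIX_TO_ENCODING = {
    ".json": "json",
    ".msgpack": "msgpack",
    ".pickle": "pickle",
}


def infer_encoding(path: Union[str, Path], encoding: Optional[str] = None) -> str:
    """Return the explicit encoding if given, else guess it from the suffix."""
    if encoding is not None:
        return encoding
    suffix = Path(path).suffix.lower()
    try:
        return _SUFFIX_TO_ENCODING[suffix]
    except KeyError:
        raise ValueError(
            f"Could not infer an encoding from suffix {suffix!r}"
        ) from None


print(infer_encoding("molecule.json"))
print(infer_encoding("blob.bin", encoding="msgpack"))
```

Passing encoding explicitly bypasses the lookup, which is the safer choice for files with unusual extensions.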
-
classmethod parse_obj(obj: Any) → Model
-
classmethod parse_raw(data: Union[bytes, str], *, encoding: Optional[str] = None) → qcelemental.models.basemodels.ProtoModel
Parses raw string or bytes into a Model object.
- Parameters
data (Union[bytes, str]) – A serialized data blob to be deserialized into a Model.
encoding (str, optional) – The type of the serialized array, available types are: {‘json’, ‘json-ext’, ‘msgpack-ext’, ‘pickle’}
- Returns
The requested model from a serialized format.
- Return type
Model
-
classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') → DictStrAny
-
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) → unicode
-
serialize(encoding: str, *, include: Optional[Set[str]] = None, exclude: Optional[Set[str]] = None, exclude_unset: Optional[bool] = None, exclude_defaults: Optional[bool] = None, exclude_none: Optional[bool] = None) → Union[bytes, str]
Generates a serialized representation of the model.
- Parameters
encoding (str) – The serialization type, available types are: {‘json’, ‘json-ext’, ‘msgpack-ext’}
include (Optional[Set[str]], optional) – Fields to be included in the serialization.
exclude (Optional[Set[str]], optional) – Fields to be excluded in the serialization.
exclude_unset (Optional[bool], optional) – If True, skips fields that have default values provided.
exclude_defaults (Optional[bool], optional) – If True, skips fields that have set or defaulted values equal to the default.
exclude_none (Optional[bool], optional) – If True, skips fields that have value None.
- Returns
The serialized model.
- Return type
Union[bytes, str]
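The effect of the exclude_defaults and exclude_none flags can be sketched on plain dictionaries. This is a conceptual illustration only: filter_fields is a hypothetical helper, not how the model performs filtering internally, and the field values are taken from the DataModel defaults listed earlier (tagline defaults to None, group to 'default').

```python
from typing import Any, Dict


def filter_fields(values: Dict[str, Any], defaults: Dict[str, Any],
                  exclude_defaults: bool = False,
                  exclude_none: bool = False) -> Dict[str, Any]:
    """Drop fields according to the two flags before serialization."""
    result = {}
    for key, value in values.items():
        if exclude_none and value is None:
            continue  # Skip fields whose value is None.
        if exclude_defaults and key in defaults and value == defaults[key]:
            continue  # Skip fields still equal to their default.
        result[key] = value
    return result


values = {"name": "QM8-T", "tagline": None, "group": "default"}
defaults = {"tagline": None, "group": "default"}

without_none = filter_fields(values, defaults, exclude_none=True)
without_defaults = filter_fields(values, defaults, exclude_defaults=True)
print(without_none)
print(without_defaults)
```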
-
to_string(pretty: bool = False) → unicode
-
classmethod update_forward_refs(**localns: Any) → None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
-
classmethod validate(value: Any) → Model
-
add_entry(name: str, initial_molecule: Molecule, additional_keywords: Optional[Dict[str, Any]] = None, attributes: Optional[Dict[str, Any]] = None, save: bool = True) → None
- Parameters
name (str) – The name of the entry, will be used for the index
initial_molecule (Molecule) – The starting Molecule for the Optimization
additional_keywords (Dict[str, Any], optional) – Additional keywords to add to the optimization run
attributes (Dict[str, Any], optional) – Additional attributes and descriptions for the entry
save (bool, optional) – If true, saves the collection after adding the entry. If this is False be careful to call save after all entries are added, otherwise data pointers may be lost.
-
add_specification(name: str, optimization_spec: qcportal.models.common_models.OptimizationSpecification, qc_spec: qcportal.models.common_models.QCSpecification, description: Optional[str] = None, protocols: Optional[Dict[str, Any]] = None, overwrite=False) → None
- Parameters
name (str) – The name of the specification
optimization_spec (OptimizationSpecification) – A full optimization specification for Optimization
qc_spec (QCSpecification) – A full quantum chemistry specification for Optimization
description (str, optional) – A short text description of the specification
protocols (Optional[Dict[str, Any]], optional) – Protocols for this specification.
overwrite (bool, optional) – Overwrite existing specification names
-
compute(specification: str, subset: Optional[Set[str]] = None, tag: Optional[str] = None, priority: Optional[str] = None) → int
Computes a specification for all entries in the dataset.
- Parameters
specification (str) – The specification name.
subset (Set[str], optional) – Computes only a subset of the dataset.
tag (Optional[str], optional) – The queue tag to use when submitting compute requests.
priority (Optional[str], optional) – The priority of the jobs: low, medium, or high.
- Returns
The number of submitted computations
- Return type
int
-
counts(entries: Optional[Union[List[str], str]] = None, specs: Optional[Union[List[str], str]] = None) → pandas.core.frame.DataFrame
Counts the number of optimization or gradient evaluations associated with the Optimizations.
- Parameters
entries (Union[str, List[str]]) – The entries to query for
specs (Optional[Union[str, List[str]]], optional) – The specifications to query for
count_gradients (bool, optional) – If True, counts the total number of gradient calls. Warning! This can be slow for large datasets.
- Returns
The queried counts.
- Return type
DataFrame
-
classmethod from_json(data: Dict[str, Any], client: FractalClient = None) → Collection
Creates a new class from a JSON blob.
- Parameters
data (Dict[str, Any]) – The JSON blob to create a new class from.
client (FractalClient, optional) – A FractalClient connected to a server
- Returns
A constructed collection.
- Return type
Collection
-
classmethod from_server(client: FractalClient, name: str) → Collection
Creates a new class from a server.
- Parameters
client (FractalClient) – A FractalClient connected to a server
name (str) – The name of the collection to pull from.
- Returns
A constructed collection.
- Return type
Collection
-
get_entry(name: str) → Any
Obtains a record from the Dataset.
- Parameters
name (str) – The record name to pull from.
- Returns
The requested record
- Return type
Record
-
get_record(name: str, specification: str) → Any
Pulls an individual computational record of the requested name and column.
- Parameters
name (str) – The index name to pull the record of.
specification (str) – The name of specification to pull the record of.
- Returns
The requested Record
- Return type
Any
-
get_specification(name: str) → Any
- Parameters
name (str) – The name of the specification
- Returns
The requested specification.
- Return type
Specification
-
list_specifications(description=True) → Union[List[str], pandas.core.frame.DataFrame]
Lists all available specifications.
- Parameters
description (bool, optional) – If True returns a DataFrame with Description
- Returns
A list of known specification names.
- Return type
Union[List[str], ‘DataFrame’]
-
query(specification: str, force: bool = False) → pandas.core.series.Series
Queries a given specification from the server.
- Parameters
specification (str) – The specification name to query
force (bool, optional) – Force a fresh query if the specification already exists.
- Returns
Records collected from the server
- Return type
pd.Series
-
save(client: Optional[FractalClient] = None) → ObjectId
Uploads the overall structure of the Collection (indices, options, new molecules, etc) to the server.
- Parameters
client (FractalClient, optional) – A FractalClient connected to a server to upload to
- Returns
The ObjectId of the saved collection.
- Return type
ObjectId
-
status(specs: Optional[Union[List[str], str]] = None, collapse: bool = True, status: Optional[str] = None, detail: bool = False) → pandas.core.frame.DataFrame
Returns the status of all current specifications.
- Parameters
collapse (bool, optional) – Collapse the status into summaries per specification or not.
status (Optional[str], optional) – If not None, only returns results that match the provided status.
detail (bool, optional) – Shows a detailed description of the current status of incomplete jobs.
- Returns
A DataFrame of all known statuses
- Return type
DataFrame
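What collapsing into per-specification summaries means can be pictured with plain dictionaries. The sketch below is conceptual only: the real method returns a pandas DataFrame, collapse_status is a hypothetical helper, and the status strings and entry names are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict


def collapse_status(statuses_by_spec: Dict[str, Dict[str, str]]) -> Dict[str, Dict[str, int]]:
    """Reduce per-entry statuses to a count of each status per specification."""
    return {
        spec: dict(Counter(entry_statuses.values()))
        for spec, entry_statuses in statuses_by_spec.items()
    }


# Assumed per-entry statuses for a single specification named "default".
detailed = {
    "default": {
        "CCO-0": "COMPLETE",
        "CCO-1": "COMPLETE",
        "CCO-2": "INCOMPLETE",
    }
}

summary = collapse_status(detailed)
print(summary)
```

The uncollapsed view corresponds to the detailed mapping, with one status per entry and specification.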
-
to_json(filename: Optional[str] = None)
If a filename is provided, dumps the file to disk. Otherwise returns a copy of the current data.
- Parameters
filename (str, Optional, Default: None) – The filename to dump the data to.
- Returns
ret – A JSON representation of the Collection
- Return type
dict
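The two behaviors described for to_json can be sketched with the standard library. This is an illustration under stated assumptions rather than the collection's actual code: to_json_sketch is a hypothetical function, the dictionary is placeholder content, and returning None after writing to disk is a choice made here because the return value in that case is not documented above.

```python
import copy
import json
import os
import tempfile
from typing import Any, Dict, Optional


def to_json_sketch(data: Dict[str, Any],
                   filename: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Write data to filename if given, otherwise return an independent copy."""
    if filename is not None:
        with open(filename, "w") as handle:
            json.dump(data, handle)
        return None
    return copy.deepcopy(data)


# Placeholder content standing in for a collection's data.
collection_data = {"name": "QM8-T", "group": "default", "tags": []}

# Without a filename a copy comes back, so editing it leaves the source untouched.
snapshot = to_json_sketch(collection_data)
snapshot["tags"].append("scratch")

# With a filename the data is written to disk and can be read back with json.load.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dataset.json")
    to_json_sketch(collection_data, filename=path)
    with open(path) as handle:
        restored = json.load(handle)
```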
-
class