Optimization Dataset

The OptimizationDataset collection represents geometry optimizations performed on a series of Molecules. OptimizationDatasets use specifications to manage the parameters of the geometry optimizer and of the underlying gradient calculation.

Existing OptimizationDatasets can be listed with FractalClient.list_collections("OptimizationDataset") and obtained with FractalClient.get_collection("OptimizationDataset", name).

Querying

List specifications:

ds.list_specifications()

Show status of calculations for a given specification:

ds.status(["default"])

The number of geometry steps for each molecule can be shown:

ds.counts()

Individual OptimizationRecords can be extracted:

ds.get_record(name="CCO-0", specification="default")

Creating

Create a new collection:

import qcportal as ptl

# client is a FractalClient connected to a server
ds = ptl.collections.OptimizationDataset(name="QM8-T", client=client)
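Re-running a setup script would try to create a collection that may already exist on the server. A minimal get-or-create sketch follows; the helper name get_or_create_opt_dataset is hypothetical, and it assumes that FractalClient.get_collection raises KeyError when no collection with that name exists (check this against your qcportal version).

```python
def get_or_create_opt_dataset(client, name, dataset_cls):
    """Return the named OptimizationDataset, creating a local one if absent.

    dataset_cls is expected to be ptl.collections.OptimizationDataset; it is
    passed in so this helper does not import qcportal itself.
    """
    try:
        # Fetch the existing collection from the server.
        return client.get_collection("OptimizationDataset", name)
    except KeyError:
        # Assumption: a missing collection surfaces as KeyError.
        # The new collection is not on the server until it is saved.
        return dataset_cls(name=name, client=client)
```

Usage: ds = get_or_create_opt_dataset(client, "QM8-T", ptl.collections.OptimizationDataset).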

Provide a specification:

spec = {'name': 'default',
        'description': 'Geometric + Psi4/B3LYP-D3/Def2-SVP.',
        'optimization_spec': {'program': 'geometric', 'keywords': None},
        'qc_spec': {'driver': 'gradient',
                    'method': 'b3lyp-d3',
                    'basis': 'def2-svp',
                    'keywords': None,
                    'program': 'psi4'}}
ds.add_specification(**spec)
ds.save()
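When several specifications share this nested layout (for example, a scan over methods), a small builder avoids copying the dictionary by hand. This is a sketch under assumptions: build_opt_spec and its defaults are hypothetical, and it only reproduces the keys shown in the specification above, with keywords left as None.

```python
from typing import Any, Dict, Optional


def build_opt_spec(
    name: str,
    method: str,
    basis: str,
    program: str = "psi4",
    optimizer: str = "geometric",
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for ds.add_specification(**spec).

    Mirrors the example specification: an optimizer block plus a
    QC block that uses the gradient driver.
    """
    if description is None:
        # Same pattern as the example description, in lower case.
        description = f"{optimizer} + {program}/{method}/{basis}."
    return {
        "name": name,
        "description": description,
        "optimization_spec": {"program": optimizer, "keywords": None},
        "qc_spec": {
            "driver": "gradient",  # the geometry optimizer consumes gradients
            "method": method,
            "basis": basis,
            "keywords": None,
            "program": program,
        },
    }
```

For a method scan, one might call ds.add_specification(**build_opt_spec(method, method, "def2-svp")) for each method name, followed by a single ds.save().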

Add molecules to optimize:

ds.add_entry(name, molecule)

If adding molecules in batches, you may wish to defer saving the dataset to the server until all molecules are added:

for name, molecule in new_entries:
    ds.add_entry(name, molecule, save=False)
ds.save()
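The loop above issues one save() at the end. For very large batches one might also checkpoint periodically, so that an interrupted run loses at most one unsaved batch of entry pointers. A minimal sketch: add_entries_in_batches and batch_size are hypothetical names, and it assumes that calling save() repeatedly on the same collection is safe.

```python
from typing import Any, Iterable, Tuple


def add_entries_in_batches(ds, entries: Iterable[Tuple[str, Any]], batch_size: int = 100) -> int:
    """Add (name, molecule) pairs with deferred saves; return the number of save() calls."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    saves = 0
    pending = 0
    for name, molecule in entries:
        ds.add_entry(name, molecule, save=False)
        pending += 1
        if pending == batch_size:
            ds.save()  # checkpoint a full batch
            saves += 1
            pending = 0
    if pending:
        # Final save for a partial batch; skipping it could lose these entries.
        ds.save()
        saves += 1
    return saves
```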

Computing

ds.compute(specification="default", tag="optional_tag")

API

class qcportal.collections.OptimizationDataset(name: str, client: FractalClient = None, **kwargs)
class DataModel
Parameters
  • id (str, Default: local)

  • name (str)

  • collection (str)

  • provenance (Mapping[str, str], Default: {})

  • tags (List[str], Default: [])

  • tagline (str, Optional)

  • description (str, Optional)

  • group (str, Default: default)

  • visibility (bool, Default: True)

  • view_url_hdf5 (str, Optional)

  • view_url_plaintext (str, Optional)

  • view_metadata (Mapping[str, str], Optional)

  • view_available (bool, Default: False)

  • metadata (Dict[str, Any], Default: {})

  • records (Dict[str, OptEntry], Default: {})

  • history (Set[str], Default: set())

  • specs (Dict[str, OptEntrySpecification], Default: {})

compare(other: Union[ProtoModel, pydantic.main.BaseModel], **kwargs) → bool

Compares the current object to the provided object recursively.

Parameters
  • other (Model) – The model to compare to.

  • **kwargs – Additional kwargs to pass to qcelemental.compare_recursive.

Returns

True if the objects match.

Return type

bool

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) → Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

copy

Duplicate a model, optionally choosing which fields to include, exclude, and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

dict(**kwargs) → Dict[str, Any]

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

json

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function passed as the default argument of json.dumps(); any other arguments are handled as in json.dumps().

classmethod parse_file(path: Union[str, pathlib.Path], *, encoding: str = None) → qcelemental.models.basemodels.ProtoModel

Parses a file into a Model object.

Parameters
  • path (Union[str, Path]) – The path to the file.

  • encoding (str, optional) – The file type; available types are: {'json', 'msgpack', 'pickle'}. If None, attempts to infer the file type from the file extension.

Returns

The requested model from a serialized format.

Return type

Model

classmethod parse_raw(data: Union[bytes, str], *, encoding: str = None) → qcelemental.models.basemodels.ProtoModel

Parses raw string or bytes into a Model object.

Parameters
  • data (Union[bytes, str]) – A serialized data blob to be deserialized into a Model.

  • encoding (str, optional) – The serialization type; available types are: {'json', 'json-ext', 'msgpack-ext', 'pickle'}

Returns

The requested model from a serialized format.

Return type

Model

serialize(encoding: str, *, include: Optional[Set[str]] = None, exclude: Optional[Set[str]] = None, exclude_unset: bool = False) → Union[bytes, str]

Generates a serialized representation of the model

Parameters
  • encoding (str) – The serialization type; available types are: {'json', 'json-ext', 'msgpack-ext'}

  • include (Optional[Set[str]], optional) – Fields to be included in the serialization.

  • exclude (Optional[Set[str]], optional) – Fields to be excluded in the serialization.

  • exclude_unset (bool, optional) – If True, skips fields that were not explicitly set, such as those left at their default values.

Returns

The serialized model.

Return type

Union[bytes, str]

classmethod update_forward_refs(**localns: Any) → None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

add_entry(name: str, initial_molecule: Molecule, additional_keywords: Optional[Dict[str, Any]] = None, attributes: Optional[Dict[str, Any]] = None, save: bool = True) → None
Parameters
  • name (str) – The name of the entry; used as its index.

  • initial_molecule (Molecule) – The starting Molecule for the optimization.

  • additional_keywords (Dict[str, Any], optional) – Additional keywords to add to the optimization run

  • attributes (Dict[str, Any], optional) – Additional attributes and descriptions for the entry

  • save (bool, optional) – If True, saves the collection after adding the entry. If False, be sure to call save() after all entries are added; otherwise data pointers may be lost.

add_specification(name: str, optimization_spec: qcportal.models.common_models.OptimizationSpecification, qc_spec: qcportal.models.common_models.QCSpecification, description: Optional[str] = None, protocols: Optional[Dict[str, Any]] = None, overwrite=False) → None
Parameters
  • name (str) – The name of the specification

  • optimization_spec (OptimizationSpecification) – A full optimization specification for Optimization

  • qc_spec (QCSpecification) – A full quantum chemistry specification for Optimization

  • description (str, optional) – A short text description of the specification

  • protocols (Optional[Dict[str, Any]], optional) – Protocols for this specification.

  • overwrite (bool, optional) – If True, overwrites an existing specification with the same name.

compute(specification: str, subset: Set[str] = None, tag: Optional[str] = None, priority: Optional[str] = None) → int

Computes a specification for all entries in the dataset.

Parameters
  • specification (str) – The specification name.

  • subset (Set[str], optional) – Computes only a subset of the dataset.

  • tag (Optional[str], optional) – The queue tag to use when submitting compute requests.

  • priority (Optional[str], optional) – The priority of the jobs: low, medium, or high.

Returns

The number of submitted computations

Return type

int
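The subset parameter restricts a submission to named entries. One possible use, sketched here under assumptions, is splitting a large submission into several smaller compute() requests. The helper compute_in_chunks and its chunk_size are hypothetical; it relies only on the documented specification, subset, tag, and priority parameters and on compute() returning the number of submitted computations.

```python
from typing import Iterable, List, Optional


def compute_in_chunks(
    ds,
    specification: str,
    names: Iterable[str],
    chunk_size: int = 500,
    tag: Optional[str] = None,
    priority: Optional[str] = None,
) -> int:
    """Submit `specification` for `names` in subsets; return the summed submit counts."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    ordered: List[str] = list(names)
    total = 0
    for start in range(0, len(ordered), chunk_size):
        subset = set(ordered[start:start + chunk_size])
        # compute() reports how many computations it submitted.
        total += ds.compute(specification, subset=subset, tag=tag, priority=priority)
    return total
```

The returned total is whatever the individual compute() calls report, so it may be smaller than the number of names if the server does not resubmit some entries.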

counts(entries: Union[List[str], str, None] = None, specs: Union[List[str], str, None] = None, count_gradients: bool = False) → pandas.core.frame.DataFrame

Counts the number of optimization or gradient evaluations associated with the Optimizations.

Parameters
  • entries (Optional[Union[str, List[str]]], optional) – The entries to query for

  • specs (Optional[Union[str, List[str]]], optional) – The specifications to query for

  • count_gradients (bool, optional) – If True, counts the total number of gradient calls. Warning! This can be slow for large datasets.

Returns

The queried counts.

Return type

DataFrame

classmethod from_json(data: Dict[str, Any], client: FractalClient = None) → Collection

Creates a new collection from a JSON blob.

Parameters
  • data (Dict[str, Any]) – The JSON blob to create a new class from.

  • client (FractalClient, optional) – A FractalClient connected to a server

Returns

A constructed collection.

Return type

Collection

classmethod from_server(client: FractalClient, name: str) → Collection

Creates a new collection from a server.

Parameters
  • client (FractalClient) – A FractalClient connected to a server

  • name (str) – The name of the collection to pull from.

Returns

A constructed collection.

Return type

Collection

get_entry(name: str) → Any

Obtains an entry from the Dataset.

Parameters

name (str) – The name of the entry to pull.

Returns

The requested entry

Return type

OptEntry

get_record(name: str, specification: str) → Any

Pulls the individual computational record for the requested entry and specification.

Parameters
  • name (str) – The entry (index) name whose record to pull.

  • specification (str) – The name of the specification whose record to pull.

Returns

The requested Record

Return type

Any

get_specification(name: str) → Any
Parameters

name (str) – The name of the specification

Returns

The requested specification.

Return type

Specification

list_specifications(description=True) → Union[List[str], pandas.core.frame.DataFrame]

Lists all available specifications

Parameters

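description (bool, optional) – If True, returns a DataFrame that includes each specification's description; otherwise returns a list of names.

Returns

The known specifications, as a list of names or as a DataFrame with descriptions.

Return type

Union[List[str], DataFrame]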

query(specification: str, force: bool = False) → str

Queries a given specification from the server

Parameters
  • specification (str) – The specification name to query

  • force (bool, optional) – Force a fresh query if the specification already exists.

save(client: Optional[FractalClient] = None) → ObjectId

Uploads the overall structure of the Collection (indices, options, new molecules, etc.) to the server.

Parameters

client (FractalClient, optional) – A FractalClient connected to a server to upload to

Returns

The ObjectId of the saved collection.

Return type

ObjectId

status(specs: Union[str, List[str]] = None, collapse: bool = True, status: Optional[str] = None, detail: bool = False) → pandas.core.frame.DataFrame

Returns the status of all current specifications.

Parameters
  • collapse (bool, optional) – Collapse the status into summaries per specification or not.

  • status (Optional[str], optional) – If not None, only returns results that match the provided status.

  • detail (bool, optional) – Shows a detailed description of the current status of incomplete jobs.

Returns

A DataFrame of all known statuses

Return type

DataFrame

to_json(filename: Optional[str] = None)

If a filename is provided, writes the data to that file. Otherwise, returns a copy of the current data.

Parameters

filename (str, Optional, Default: None) – The file to write the data to.

Returns

ret – A JSON representation of the Collection (returned when no filename is given).

Return type

dict
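to_json and from_json can be combined to keep a local snapshot of a collection and rebuild it later. A minimal sketch, assuming the file written by to_json(filename) is plain JSON whose contents from_json accepts; the helper name snapshot_and_reload is hypothetical, and dataset_cls would be ptl.collections.OptimizationDataset.

```python
import json
from typing import Any, Dict


def snapshot_and_reload(ds, path: str, dataset_cls, client=None):
    """Dump `ds` to `path` with to_json() and rebuild it with from_json()."""
    ds.to_json(path)  # with a filename, to_json writes the data to disk
    with open(path) as fh:
        data: Dict[str, Any] = json.load(fh)
    # Passing a client lets the rebuilt collection talk to a server again.
    return dataset_cls.from_json(data, client=client)
```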