Molecule

A Python implementation of the MolSSI QCSchema Molecule object. The term “Molecule” has many definitions depending on the domain; this particular Molecule is an immutable 3D Cartesian representation with support for quantum chemistry constructs.

Creation

A Molecule can be created by passing keyword arguments to the constructor, as shown below:

>>> mol = qcel.models.Molecule(**{"symbols": ["He"], "geometry": [0, 0, 0]})

In addition, the from_data classmethod creates a molecule from standard strings:

>>> mol = qcel.models.Molecule.from_data("He 0 0 0")
>>> mol
<    Geometry (in Angstrom), charge = 0.0, multiplicity = 1:

       Center              X                  Y                   Z
    ------------   -----------------  -----------------  -----------------
    He                0.000000000000     0.000000000000     0.000000000000
>

Identifiers

A number of unique identifiers are automatically created for each molecule. Support for additional identifiers such as InChI and SMILES is actively being investigated.

Molecular Hash

A molecule hash is automatically created to allow each molecule to be uniquely identified. The following keys are used to generate the hash:

  • symbols

  • masses (1.e-6 tolerance)

  • molecular_charge (1.e-4 tolerance)

  • molecular_multiplicity

  • real

  • geometry (1.e-8 tolerance)

  • fragments

  • fragment_charges (1.e-4 tolerance)

  • fragment_multiplicities

  • connectivity

Hashes can be obtained from any molecule object, and a FractalServer automatically generates canonical hashes when a molecule is added to the database.

>>> mol = qcel.models.Molecule(**{"symbols": ["He", "He"], "geometry": [0, 0, -3, 0, 0, 3]})
>>> mol.get_hash()
'84872f975d19aafa62b188b40fbadaf26a3b1f84'
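The general idea of hashing with tolerances can be sketched in plain Python: round each float field to its tolerance, serialize the keys listed above deterministically, and digest the result. This is an illustration under stated assumptions, not QCElemental's algorithm: the helper name `sketch_hash`, the JSON layout, and the use of SHA-1 (chosen only because it yields 40 hex characters like the hash shown) are all hypothetical, so it will not reproduce the value above.

```python
import hashlib
import json

# Assumed tolerances, expressed as decimal places (1.e-8 -> 8, 1.e-6 -> 6, 1.e-4 -> 4).
DECIMALS = {"geometry": 8, "masses": 6, "molecular_charge": 4, "fragment_charges": 4}
KEYS = ["symbols", "masses", "molecular_charge", "molecular_multiplicity", "real",
        "geometry", "fragments", "fragment_charges", "fragment_multiplicities",
        "connectivity"]


def _round(value, ndigits):
    """Round a float or a (nested) list of floats to ndigits decimal places."""
    if isinstance(value, (list, tuple)):
        return [_round(v, ndigits) for v in value]
    return round(float(value), ndigits)


def sketch_hash(mol: dict) -> str:
    """Hypothetical hash: SHA-1 over a canonical JSON dump of the rounded fields."""
    payload = {}
    for key in KEYS:
        value = mol.get(key)
        if key in DECIMALS and value is not None:
            value = _round(value, DECIMALS[key])
        payload[key] = value
    text = json.dumps(payload, sort_keys=True)  # sorted keys give a stable serialization
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


he2 = {"symbols": ["He", "He"], "geometry": [0, 0, -3, 0, 0, 3], "masses": [4.00260325413] * 2}
nudged = dict(he2, geometry=[0, 0, -3 + 1e-10, 0, 0, 3])
print(sketch_hash(he2) == sketch_hash(nudged))  # True: the 1e-10 shift is below the 1.e-8 tolerance
```

Rounding is a simplification of "tolerance": two values that straddle a rounding boundary can still hash differently even when they are closer than the tolerance.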

Molecular Formula

The molecular formula is also available, sorted in alphabetical order with title-case symbol names. Symbols with a count of one are written without a number.

>>> mol.get_molecular_formula()
'He2'
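The ordering rule just described can be written as a short plain-Python sketch. The function name `molecular_formula` is hypothetical and this is not the library's implementation; for instance, it makes no attempt to treat ghost atoms specially.

```python
from collections import Counter


def molecular_formula(symbols):
    """Alphabetically sorted, title-case formula; counts of one are omitted."""
    counts = Counter(s.title() for s in symbols)  # normalize e.g. "CL" -> "Cl"
    return "".join(sym if n == 1 else f"{sym}{n}" for sym, n in sorted(counts.items()))


print(molecular_formula(["He", "He"]))               # He2
print(molecular_formula(["H", "C", "H", "H", "H"]))  # CH4
print(molecular_formula(["H", "CL"]))                # ClH
```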

Fragments

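A Molecule with fragments can be created either by using -- separators in a from_data string or by passing explicit fragments to the Molecule constructor. Note that from_data strings are read in Angstrom by default, while the constructor's geometry field is in bohr, so the two calls below place the second Ne atom at different distances (3.1 Angstrom versus 3.1 bohr):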

>>> mol = qcel.models.Molecule.from_data(
...     """
...     Ne 0.000000 0.000000 0.000000
...     --
...     Ne 3.100000 0.000000 0.000000
...     """)

>>> mol = qcel.models.Molecule(
...     geometry=[0, 0, 0, 3.1, 0, 0],
...     symbols=["Ne", "Ne"],
...     fragments=[[0], [1]]
...     )
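How the -- separators translate into the fragments field can be illustrated with a toy parser. This is a hypothetical helper, not QCElemental's string parser: it only understands bare `symbol x y z` lines and `--` separators, and it leaves coordinates in the units they were written in (no conversion to bohr).

```python
def split_fragments(text: str):
    """Hypothetical parser: 'Sym x y z' lines, with '--' lines separating fragments."""
    symbols, geometry, fragments = [], [], [[]]
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "--":
            fragments.append([])  # start a new fragment
            continue
        sym, x, y, z = line.split()
        fragments[-1].append(len(symbols))  # 0-indexed atom index within the whole molecule
        symbols.append(sym)
        geometry.extend(float(v) for v in (x, y, z))  # flat (3*nat,) layout
    return symbols, geometry, fragments


syms, geom, frags = split_fragments("""
Ne 0.000000 0.000000 0.000000
--
Ne 3.100000 0.000000 0.000000
""")
print(syms, geom, frags)
# ['Ne', 'Ne'] [0.0, 0.0, 0.0, 3.1, 0.0, 0.0] [[0], [1]]
```

The resulting symbols, flat geometry, and fragments lists have the same shapes as the keyword arguments in the constructor example.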

Fragments of a molecule containing fragment information can be acquired with get_fragment:

>>> mol.get_fragment(0)
<    Geometry (in Angstrom), charge = 0.0, multiplicity = 1:

       Center              X                  Y                   Z
    ------------   -----------------  -----------------  -----------------
    Ne                0.000000000000     0.000000000000     0.000000000000
>

Obtaining fragments with ghost atoms is also supported:

>>> mol.get_fragment(0, 1)
<    Geometry (in Angstrom), charge = 0.0, multiplicity = 1:

       Center              X                  Y                   Z
    ------------   -----------------  -----------------  -----------------
    Ne                0.000000000000     0.000000000000     0.000000000000
    Ne      (Gh)      3.100000000572     0.000000000000     0.000000000000
>

Fields

class qcelemental.models.Molecule(orient: bool = False, validate: Optional[bool] = None, **kwargs: Any)

A QCSchema representation of a Molecule. This model contains data for symbols, geometry, connectivity, charges, fragmentation, etc., while also supporting a wide array of I/O and manipulation capabilities.

A Molecule object's geometry, masses, and charges are truncated to 8, 6, and 4 decimal places, respectively, to assist with duplicate detection.

Parameters
  • schema_name (ConstrainedStrValue, Default: qcschema_molecule) – The QCSchema specification this model conforms to. Explicitly fixed as qcschema_molecule.

  • schema_version (int, Default: 2) – The version number of schema_name that this Molecule model conforms to.

  • validated (bool, Default: False) – A boolean indicator (for speed purposes) that the input Molecule data has been previously checked for schema (data layout and type) and physics (e.g., non-overlapping atoms, feasible multiplicity) compliance. This should be False in most cases. A True setting should only ever be set by the constructor for this class itself or other trusted sources such as a Fractal Server or previously serialized Molecules.

  • symbols (Array) – An ordered array-like object of atomic elemental symbols of shape (nat,). The index of this attribute sets the atomic order for all other per-atom settings like real and the first dimension of geometry. Ghost/virtual atoms must have an entry in this array-like and are indicated by the matching 0-indexed entries in the real field.

  • geometry (Array) – An ordered (nat,3) array-like for XYZ atomic coordinates [a0]. Atom ordering is fixed; that is, a consumer who shuffles atoms must not reattach the input (pre-shuffling) molecule schema instance to any output (post-shuffling) per-atom results (e.g., gradient). The index of the first dimension matches the 0-indexed indices of all other per-atom settings like symbols and real. Can also accept array-likes which can be mapped to (nat,3), such as a 1-D list of length 3*nat or the serialized version of the array in (3*nat,) shape; all forms will be reshaped to (nat,3) for this attribute.

  • name (str, Optional) – A common or human-readable name to assign to this molecule. Can be arbitrary.

  • identifiers (Identifiers, Optional) – An optional dictionary of additional identifiers by which this Molecule can be referenced, such as InChI, canonical SMILES, etc. See the Identifiers model for more details.

  • comment (str, Optional) – Additional comments for this Molecule. Intended for pure human/user consumption and clarity.

  • molecular_charge (float, Default: 0.0) – The net electrostatic charge of this Molecule.

  • molecular_multiplicity (int, Default: 1) – The total multiplicity of this Molecule.

  • masses (Array, Optional) – An ordered 1-D array-like object of atomic masses [u] of shape (nat,). Index order matches the 0-indexed indices of all other per-atom settings like symbols and real. If this is not provided, the mass of each atom is inferred from its most common isotope. If this is provided, it must be the same length as symbols but can accept None entries for standard masses to infer from the same index in the symbols field.

  • real (Array, Optional) – An ordered 1-D array-like object of shape (nat,) indicating if each atom is real (True) or ghost/virtual (False). Index matches the 0-indexed indices of all other per-atom settings like symbols and the first dimension of geometry. If this is not provided, all atoms are assumed to be real (True). If this is provided, the reality or ghostality of every atom must be specified.

  • atom_labels (Array, Optional) – Additional per-atom labels as a 1-D array-like of strings of shape (nat,). Typical use is in model conversions, such as Elemental <-> Molpro, and not typically something which should be user assigned. See the comment field for general human-consumable text to affix to the Molecule.

  • atomic_numbers (Array, Optional) – An optional ordered 1-D array-like object of atomic numbers of shape (nat,). Index matches the 0-indexed indices of all other per-atom settings like symbols and real. Values are inferred from the symbols list if not explicitly set.

  • mass_numbers (Array, Optional) – An optional ordered 1-D array-like object of atomic mass numbers of shape (nat,). Index matches the 0-indexed indices of all other per-atom settings like symbols and real. Values are inferred from the most common isotopes of the symbols list if not explicitly set.

  • connectivity (List[Tuple[int, int, float]], Optional) – The connectivity information between each atom in the symbols array. Each entry in this list is a Tuple of (atom_index_A, atom_index_B, bond_order) where the atom_index matches the 0-indexed indices of all other per-atom settings like symbols and real.

  • fragments (List[Array], Optional) – An indication of which sets of atoms are fragments within the Molecule. This is a list of shape (nfr,) of 1-D array-like objects of arbitrary length. Each entry in the list indicates a new fragment. The index of the list matches the 0-indexed indices of fragment_charges and fragment_multiplicities. The 1-D array-like objects are sets of atom indices indicating the atoms which compose the fragment. The atom indices match the 0-indexed indices of all other per-atom settings like symbols and real.

  • fragment_charges (List[float], Optional) – The total charge of each fragment in the fragments list, of shape (nfr,). The index of this list matches the 0-indexed indices of the fragments list. Will be filled in based on a set of rules if not provided (and fragments are specified).

  • fragment_multiplicities (List[int], Optional) – The multiplicity of each fragment in the fragments list, of shape (nfr,). The index of this list matches the 0-indexed indices of the fragments list. Will be filled in based on a set of rules if not provided (and fragments are specified).

  • fix_com (bool, Default: False) – An indicator which prevents pre-processing the Molecule object to translate the Center-of-Mass to (0,0,0) in Euclidean coordinate space. If False, the stored geometry may differ from the one provided.

  • fix_orientation (bool, Default: False) – An indicator which prevents pre-processing the Molecule object to orient via the inertia tensor. If False, the stored geometry may differ from the one provided.

  • fix_symmetry (str, Optional) – Maximal point group symmetry with which the geometry should be treated. Lowercase.

  • provenance (Provenance, Default: {‘creator’: ‘QCElemental’, ‘version’: ‘v0.11.0+0.gbe1b2e7.dirty’, ‘routine’: ‘qcelemental.models.molecule’}) – The provenance information about how this Molecule (and its attributes) were generated, provided, and manipulated.

  • id (Any, Optional) – A unique identifier for this Molecule object. This field exists primarily for databases (e.g. Fractal’s Server) to track and look up this specific object and should virtually never need to be set manually.

  • extras (Dict[str, Any], Optional) – Extra information to associate with this Molecule.
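The shape rules above can be summarized with a small validation sketch: flatten and reshape geometry to (nat, 3), default real to all True, and check that the fragments cover every atom. The helper `normalize_inputs` is hypothetical; the single-fragment default and the requirement that each atom belong to exactly one fragment are assumptions made for this illustration, not statements about QCElemental's validators.

```python
def normalize_inputs(symbols, geometry, real=None, fragments=None):
    """Hypothetical helper mirroring the documented shape rules for these fields."""
    nat = len(symbols)
    # Accept either a flat (3*nat,) list or nested (nat, 3) rows.
    flat = [float(v) for row in geometry
            for v in (row if isinstance(row, (list, tuple)) else [row])]
    if len(flat) != 3 * nat:
        raise ValueError(f"geometry has {len(flat)} values, expected 3*nat = {3 * nat}")
    coords = [flat[3 * i: 3 * i + 3] for i in range(nat)]  # reshape to (nat, 3)
    if real is None:
        real = [True] * nat  # all atoms real unless specified
    elif len(real) != nat:
        raise ValueError("real must give the reality of every atom")
    if fragments is None:
        fragments = [list(range(nat))]  # assumption: one fragment holding every atom
    covered = sorted(i for frag in fragments for i in frag)
    if covered != list(range(nat)):
        raise ValueError("every atom must belong to exactly one fragment")
    return coords, real, fragments


coords, real, frags = normalize_inputs(["He", "He"], [0, 0, -3, 0, 0, 3])
print(coords)       # [[0.0, 0.0, -3.0], [0.0, 0.0, 3.0]]
print(real, frags)  # [True, True] [[0, 1]]
```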

API

class qcelemental.models.Molecule(orient: bool = False, validate: Optional[bool] = None, **kwargs: Any)

align(ref_mol, *, do_plot=False, verbose=0, atoms_map=False, run_resorting=False, mols_align=False, run_to_completion=False, uno_cutoff=0.001, run_mirror=False)

Finds shift, rotation, and atom reordering of concern_mol (self) that best aligns with ref_mol.

Wraps qcel.molutil.B787() for qcel.models.Molecule. Employs the Kabsch, Hungarian, and Uno algorithms to exhaustively locate the best alignment for non-oriented, non-ordered structures.

Parameters
  • concern_mol (qcel.models.Molecule) – Molecule of concern, to be shifted, rotated, and reordered into best coincidence with ref_mol.

  • ref_mol (qcel.models.Molecule) – Molecule to match.

  • atoms_map (bool, optional) – Whether atom1 of ref_mol corresponds to atom1 of concern_mol, etc. If this correspondence is known to hold, specifying True can save much time.

  • mols_align (bool, optional) – Whether ref_mol and concern_mol have identical geometries by eye (barring orientation or atom mapping) and expected final RMSD = 0. If True, procedure is truncated when RMSD condition met, saving time.

  • do_plot (bool, optional) – Pops up a matplotlib plot showing the before, after, and reference geometries.

  • run_to_completion (bool, optional) – Run reorderings to completion (past RMSD = 0) even if unnecessary because mols_align=True. Used to test worst-case timings.

  • run_resorting (bool, optional) – Run the resorting machinery even if unnecessary because atoms_map=True.

  • uno_cutoff (float, optional) – TODO

  • run_mirror (bool, optional) – Run alternate geometries potentially allowing best match to ref_mol from mirror image of concern_mol. Only run if system confirmed to be nonsuperimposable upon mirror reflection.

  • verbose (int, optional) – Print level.

Returns

Molecule is the internal geometry of self, optimally aligned and atom-ordered to ref_mol. Presently all fragment information is discarded.

data[‘rmsd’] is the RMSD [A] between ref_mol and the optimally aligned geometry computed. data[‘mill’] is an AlignmentMill with fields (shift, rotation, atommap, mirror) that prescribe the transformation from concern_mol to the optimally aligned geometry.

Return type

Molecule, data
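The RMSD reported in data[‘rmsd’] is the usual root-mean-square deviation over paired atoms. The sketch below computes it for a fixed atom pairing only; it does not reproduce the Kabsch rotation, Hungarian reordering, or Uno steps that align performs, and the `atommap` convention used here (ref atom i pairs with atom atommap[i] of the other structure) is an assumption that may not match AlignmentMill.atommap.

```python
import math


def rmsd(ref, other, atommap=None):
    """RMSD between two (nat, 3) coordinate lists.

    atommap[i] gives the atom of `other` paired with atom i of `ref`
    (identity pairing if None). Hypothetical helper, not part of qcel.
    """
    if atommap is None:
        atommap = list(range(len(ref)))
    total = 0.0
    for i, j in enumerate(atommap):
        total += sum((a - b) ** 2 for a, b in zip(ref[i], other[j]))
    return math.sqrt(total / len(ref))


ref = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.1]]
swapped = [[0.0, 0.0, 1.1], [0.0, 0.0, 0.0]]  # same structure, atoms listed in reverse
print(round(rmsd(ref, swapped), 6))  # 1.1 with the naive 1st-to-1st pairing
print(rmsd(ref, swapped, [1, 0]))    # 0.0 once the correct atom map is used
```

The second call shows why atom reordering matters: the same structure looks far from the reference until the atoms are paired correctly.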

compare(other)

Checks if two molecules are identical. This is molecular identity defined in scientific terms rather than programming terms, so it is less rigorous than programmatic equality or memory identity.

classmethod from_data(data: Union[str, Dict[str, Any], numpy.array, bytes], dtype: Optional[str] = None, *, orient: bool = False, validate: bool = None, **kwargs: Dict[str, Any]) → qcelemental.models.molecule.Molecule

Constructs a molecule object from a data structure.

Parameters
  • data (Union[str, Dict[str, Any], np.array]) – Data to construct Molecule from

  • dtype (Optional[str], optional) – How to interpret the data; if not passed, attempts to discover this based on the input type.

  • orient (bool, optional) – Whether to orient the molecule to a standard frame.

  • validate (bool, optional) – Whether to validate the molecule.

  • **kwargs (Dict[str, Any]) – Additional kwargs to pass to the constructors.

Returns

A constructed Molecule.

Return type

Molecule

classmethod from_file(filename: str, dtype: Optional[str] = None, *, orient: bool = False, **kwargs)

Constructs a molecule object from a file.

Parameters
  • filename (str) – The name of the file to read.

  • dtype (Optional[str], optional) – The type of file to interpret.

  • orient (bool, optional) – Whether to orient the molecule to a standard frame.

  • **kwargs – Any additional keywords to pass to the constructor

Returns

A constructed Molecule.

Return type

Molecule

get_fragment(real: Union[int, List], ghost: Union[int, List, None] = None, orient: bool = False, group_fragments: bool = True) → qcelemental.models.molecule.Molecule

Get new Molecule with fragments preserved, dropped, or ghosted.

Parameters
  • real – Fragment index or list of indices (0-indexed) to be real atoms in new Molecule.

  • ghost – Fragment index or list of indices (0-indexed) to be ghost atoms (basis fns only) in new Molecule.

  • orient – Whether or not to align (inertial frame) and phase the geometry upon new Molecule instantiation (according to _orient_molecule_internal).

  • group_fragments – Whether or not to group real fragments at the start of the atom list and ghost fragments toward the back. Prior to v0.5, this was always effectively True. True is handy for finding duplicate (atom-order-independent) molecules by hash. False preserves fragment order (though collapsing gaps for absent fragments), like Psi4’s extract_subsets, and is handy for gradients, where the atom order of returned values matters.

Returns

New qcelemental.models.Molecule with self’s fragments present, ghosted, or absent.

Return type

Molecule
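The difference between group_fragments=True and False can be illustrated by computing only the resulting atom order and real/ghost flags. This is a hypothetical sketch (the function `fragment_layout` is not part of QCElemental); keeping real and ghost fragments in the order they are passed, rather than sorting them, is an assumption.

```python
def fragment_layout(fragments, real, ghost=None, group_fragments=True):
    """Hypothetical sketch: which original atoms are kept, in what order,
    and whether each is real (True) or ghost (False)."""
    real = [real] if isinstance(real, int) else list(real)
    if ghost is None:
        ghost = []
    elif isinstance(ghost, int):
        ghost = [ghost]
    if group_fragments:
        # Real fragments first (in the order given), then ghost fragments.
        order = [(ifr, True) for ifr in real] + [(ifr, False) for ifr in ghost]
    else:
        # Original fragment order; absent fragments are skipped, closing the gaps.
        order = [(ifr, ifr in real) for ifr in range(len(fragments))
                 if ifr in real or ifr in ghost]
    atoms, is_real = [], []
    for ifr, flag in order:
        atoms.extend(fragments[ifr])
        is_real.extend([flag] * len(fragments[ifr]))
    return atoms, is_real


frags = [[0], [1, 2], [3]]
print(fragment_layout(frags, real=2, ghost=0))                         # ([3, 0], [True, False])
print(fragment_layout(frags, real=2, ghost=0, group_fragments=False))  # ([0, 3], [False, True])
```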

get_hash()

Returns the hash of the molecule.

get_molecular_formula()

Returns the molecular formula for a molecule. Atom symbols are sorted from A-Z.

Examples

>>> methane = qcelemental.models.Molecule.from_data('''
... H      0.5288      0.1610      0.9359
... C      0.0000      0.0000      0.0000
... H      0.2051      0.8240     -0.6786
... H      0.3345     -0.9314     -0.4496
... H     -1.0685     -0.0537      0.1921
... ''')
>>> methane.get_molecular_formula()
'CH4'
>>> hcl = qcelemental.models.Molecule.from_data('''
... H      0.0000      0.0000      0.0000
... Cl     0.0000      0.0000      1.2000
... ''')
>>> hcl.get_molecular_formula()
'ClH'

measure(measurements: Union[List[int], List[List[int]]], *, degrees: bool = True) → Union[float, List[float]]

Takes a measurement of the molecule from the atom indices provided.

Parameters
  • measurements (Union[List[int], List[List[int]]]) – Either a single list of indices or multiple lists. Returns a distance, angle, or dihedral depending on whether 2, 3, or 4 indices are provided, respectively. Distances are returned in bohr and angles in degrees.

  • degrees (bool, optional) – Returns angles in degrees by default, radians otherwise.

Returns

Either a value or list of the measured values.

Return type

Union[float, List[float]]
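The geometry behind the three measurement types can be sketched with standard vector formulas. This is a minimal plain-Python illustration with hypothetical names, not the library's code; in particular, the dihedral sign convention (the atan2 formula on the three bond vectors b1, b2, b3) is an assumption that may differ from QCElemental's.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def measure_coords(coords, indices, degrees=True):
    """Distance (2 indices), angle (3) or dihedral (4) from (nat, 3) coordinates.
    A list of index lists returns a list of values."""
    if indices and isinstance(indices[0], (list, tuple)):
        return [measure_coords(coords, sub, degrees=degrees) for sub in indices]
    p = [coords[i] for i in indices]
    if len(p) == 2:
        return math.dist(p[0], p[1])  # same length unit as the input coordinates
    if len(p) == 3:
        u, v = _sub(p[0], p[1]), _sub(p[2], p[1])  # both arms point away from the vertex
        cos = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
        value = math.acos(max(-1.0, min(1.0, cos)))  # clamp against round-off
    elif len(p) == 4:
        b1, b2, b3 = _sub(p[1], p[0]), _sub(p[2], p[1]), _sub(p[3], p[2])
        y = math.sqrt(_dot(b2, b2)) * _dot(b1, _cross(b2, b3))
        x = _dot(_cross(b1, b2), _cross(b2, b3))
        value = math.atan2(y, x)
    else:
        raise ValueError("provide 2, 3, or 4 indices")
    return math.degrees(value) if degrees else value


pts = [[1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]]
print(measure_coords(pts, [[0, 1], [0, 1, 2], [0, 1, 2, 3]]))  # [1.0, 90.0, 90.0]
```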

nelectrons(ifr: int = None) → int

Number of electrons.

Parameters

ifr (int, optional) – If not None, only compute for the ifr-th (0-indexed) fragment.

Returns

Number of electrons in the entire molecule or in the fragment.

Return type

int
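The electron count follows from the atomic numbers of the real atoms and the net charge. A minimal sketch, assuming ghost atoms contribute no electrons (consistent with their role as basis-function-only centers) and using a small hypothetical lookup table in place of a full periodic table. For the ifr argument, the same arithmetic would be applied to one fragment's atoms together with that fragment's charge.

```python
# Atomic numbers for a handful of elements; a full implementation would use a periodic table.
ATOMIC_NUMBER = {"H": 1, "He": 2, "C": 6, "N": 7, "O": 8, "Ne": 10, "Cl": 17}


def count_electrons(symbols, charge=0.0, real=None):
    """Electrons = sum of the atomic numbers of real atoms minus the net charge.
    Ghost atoms (real=False) are assumed to add no electrons."""
    if real is None:
        real = [True] * len(symbols)
    nel = sum(ATOMIC_NUMBER[s] for s, is_real in zip(symbols, real) if is_real) - charge
    return int(nel)  # assumes an integral net charge


print(count_electrons(["O", "H", "H"]))                    # 10
print(count_electrons(["O", "H", "H"], charge=1))          # 9
print(count_electrons(["Ne", "Ne"], real=[True, False]))   # 10
```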

nuclear_repulsion_energy(ifr: int = None) → float

Nuclear repulsion energy.

Parameters

ifr (int, optional) – If not None, only compute for the ifr-th (0-indexed) fragment.

Returns

Nuclear repulsion energy in the entire molecule or in the fragment.

Return type

float
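Nuclear repulsion energy is the Coulomb energy of the point nuclei, E_nuc = Σ_{i<j} Z_i Z_j / r_ij, which comes out in hartree when distances are in bohr. A minimal sketch with a hypothetical function name; passing a charge of zero for ghost atoms is an assumption about how ghosts would be handled.

```python
import math


def nuclear_repulsion(charges, coords_bohr):
    """Sum over atom pairs i < j of Z_i * Z_j / r_ij (hartree for coordinates in bohr)."""
    energy = 0.0
    for i in range(len(charges)):
        for j in range(i):
            energy += charges[i] * charges[j] / math.dist(coords_bohr[i], coords_bohr[j])
    return energy


# The He2 geometry from the Molecular Hash example: constructor geometry is in bohr,
# so the two nuclei are 6 a0 apart and E_nuc = 2 * 2 / 6.
print(round(nuclear_repulsion([2, 2], [[0, 0, -3], [0, 0, 3]]), 6))  # 0.666667
```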

orient_molecule()

Centers the molecule and orients it via the inertia tensor before returning a new Molecule.
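The centering step (the same center-of-mass translation that fix_com controls at construction) amounts to subtracting the mass-weighted average position. A minimal sketch, assuming plain lists of masses and (nat, 3) coordinates; the subsequent rotation into the inertia-tensor frame and any phasing are deliberately left out, and the helper names are hypothetical.

```python
def center_of_mass(masses, coords):
    """Mass-weighted average position of (nat, 3) coordinates."""
    total = sum(masses)
    return [sum(m * xyz[k] for m, xyz in zip(masses, coords)) / total for k in range(3)]


def recenter(masses, coords):
    """Translate coordinates so the center of mass sits at the origin."""
    com = center_of_mass(masses, coords)
    return [[xyz[k] - com[k] for k in range(3)] for xyz in coords]


print(recenter([1.0, 3.0], [[0.0, 0.0, 0.0], [0.0, 0.0, 4.0]]))
# [[0.0, 0.0, -3.0], [0.0, 0.0, 1.0]]
```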

pretty_print()

Print the molecule in Angstroms. Same as print_out(), but always in Angstroms (the corresponding libmints method is print_in_angstrom).

scramble(*, do_shift: bool = True, do_rotate=True, do_resort=True, deflection=1.0, do_mirror=False, do_plot=False, do_test=False, run_to_completion=False, run_resorting=False, verbose=0)

Generate a Molecule with random or directed translation, rotation, and atom shuffling. Optionally, check that the aligner returns the opposite transformation.

Parameters
  • ref_mol (qcel.models.Molecule) – Molecule to perturb.

  • do_shift (bool or array-like, optional) – Whether to generate a random atom shift on interval [-3, 3) in each dimension (True) or leave at current origin. To shift by a specified vector, supply a 3-element list.

  • do_rotate (bool or array-like, optional) – Whether to generate a random 3D rotation according to the algorithm of Arvo. To rotate by a specified matrix, supply a 9-element list of lists.

  • do_resort (bool or array-like, optional) – Whether to shuffle atoms (True) or leave 1st atom 1st, etc. (False). To specify shuffle, supply a nat-element list of indices.

  • deflection (float, optional) – If do_rotate, how random a rotation: 0.0 is no change, 0.1 is small perturbation, 1.0 is completely random.

  • do_mirror (bool, optional) – Whether to construct the mirror image structure by inverting y-axis.

  • do_plot (bool, optional) – Pops up a matplotlib plot showing the before, after, and reference geometries.

  • do_test (bool, optional) – Additionally, run the aligner on the returned Molecule and check that opposite transformations obtained.

  • run_to_completion (bool, optional) – By construction, scrambled systems are fully alignable (final RMSD=0). Even so, True turns off the mechanism to stop when RMSD reaches zero and instead proceed to worst possible time.

  • run_resorting (bool, optional) – Even if atoms not shuffled, test the resorting machinery.

  • verbose (int, optional) – Print level.

Returns

Molecule is a scrambled copy of ref_mol (self). data[‘rmsd’] is the RMSD [A] between ref_mol and the scrambled geometry. data[‘mill’] is an AlignmentMill with fields (shift, rotation, atommap, mirror) that prescribe the transformation from ref_mol to the returned geometry.

Return type

Molecule, data

Raises

AssertionError – If do_test=True and aligner sanity check fails for any of the reverse transformations.
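The shift-and-shuffle part of scrambling, together with the inverse transformation that do_test expects the aligner to recover, can be sketched in plain Python. The function names are hypothetical, the random rotation (Arvo's algorithm) and mirroring are omitted, and the seeded random.Random generator is used only to make the example reproducible.

```python
import random


def shift_and_shuffle(coords, seed=None):
    """Apply a random shift on [-3, 3) per axis and a random atom shuffle.
    Returns the new coordinates plus the shift and permutation used."""
    rng = random.Random(seed)
    shift = [-3.0 + 6.0 * rng.random() for _ in range(3)]
    perm = list(range(len(coords)))
    rng.shuffle(perm)  # new atom i is original atom perm[i]
    new = [[c + s for c, s in zip(coords[p], shift)] for p in perm]
    return new, shift, perm


def undo(new, shift, perm):
    """Reverse transformation: unshuffle the atoms and remove the shift."""
    original = [None] * len(new)
    for i, p in enumerate(perm):
        original[p] = [c - s for c, s in zip(new[i], shift)]
    return original


coords = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.8], [1.7, 0.0, -0.5]]
new, shift, perm = shift_and_shuffle(coords, seed=7)
restored = undo(new, shift, perm)
print(all(abs(a - b) < 1e-12 for r0, r1 in zip(coords, restored) for a, b in zip(r0, r1)))  # True
```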

show(ngl_kwargs: Optional[Dict[str, Any]] = None) → nglview.NGLWidget

Creates a 3D representation of a molecule that can be manipulated in Jupyter Notebooks and exported as images (.png).

Parameters

ngl_kwargs (Optional[Dict[str, Any]], optional) – Additional nglview NGLWidget kwargs.

Returns

An nglview view of the molecule.

Return type

nglview.NGLWidget

to_file(filename: str, dtype: Optional[str] = None) → None

Writes the Molecule to a file.

Parameters
  • filename (str) – The filename to write to

  • dtype (Optional[str], optional) – The type of file to write, attempts to infer dtype from the filename if not provided.

to_string(dtype: str, units: str = None, *, atom_format: str = None, ghost_format: str = None, width: int = 17, prec: int = 12, return_data: bool = False)

Returns a string that can be used by a variety of programs.

It is unclear whether this method will be removed or renamed to “to_psi4_string” in the future.

Suggest psi4 -> psi4frag and psi4 route to to_string.
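As an illustration of the kind of text such a method emits, the sketch below writes the common XYZ layout (atom count, comment line, then one `symbol x y z` line per atom in Angstrom). It is a hypothetical stand-in, not to_string's implementation: the function name, the CODATA 2014 bohr-to-Angstrom constant, and reusing width and prec as the coordinate field width and precision are all assumptions.

```python
BOHR_TO_ANGSTROM = 0.52917721067  # CODATA 2014 bohr radius in Angstrom (assumed constant)


def xyz_string(symbols, coords_bohr, comment="", width=17, prec=12):
    """Hypothetical XYZ writer: atom count, comment line, then one
    'symbol x y z' line per atom with coordinates converted to Angstrom."""
    lines = [str(len(symbols)), comment]
    for sym, xyz in zip(symbols, coords_bohr):
        fields = "".join(f"{c * BOHR_TO_ANGSTROM:{width}.{prec}f}" for c in xyz)
        lines.append(f"{sym:<3}{fields}")
    return "\n".join(lines) + "\n"


print(xyz_string(["He", "He"], [[0, 0, -3], [0, 0, 3]], comment="He dimer"))
```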