from_string

qcelemental.molparse.from_string(molstr, dtype=None, *, name=None, fix_com=None, fix_orientation=None, fix_symmetry=None, return_processed=False, enable_qm=True, enable_efp=True, missing_enabled_return_qm='none', missing_enabled_return_efp='none', verbose=1) → Union[Dict, Tuple[Dict, Dict]]

Construct a molecule dictionary from any recognized string format.

Parameters
  • molstr (str) – Multiline string specification of molecule in a recognized format.

  • dtype ({'xyz', 'xyz+', 'psi4', 'psi4+'}, optional) – Molecule format name; see below for details.

  • return_processed (bool, optional) – Additionally return the intermediate dictionary (molinit; see Returns).

  • enable_qm (bool, optional) – Consider quantum mechanical domain in processing the string contents into the returned molrec.

  • enable_efp (bool, optional) – Consider effective fragment potential domain in processing the string contents into the returned molrec. Only relevant if dtype supports EFP.

  • missing_enabled_return_qm ({'minimal', 'none', 'error'}) – If enable_qm=True, what to do if the QM domain has no atoms/fragments: respectively, return a fully valid but empty molrec, return an empty dictionary, or raise an error.

  • missing_enabled_return_efp ({'minimal', 'none', 'error'}) – If enable_efp=True, what to do if the EFP domain has no atoms/fragments: respectively, return a fully valid but empty molrec, return an empty dictionary, or raise an error.

  • name (str, optional) – Override molstr information for the molecule label; should be a valid Python identifier. One of a very limited number of fields (the three that follow are the others) that can trump molstr. Provided for convenience, since the alternative would be to collect the resulting molrec (discarding the Mol if called from a class), edit it, and then remake the Mol.

  • fix_com (bool, optional) – Override molstr information for whether translation of geom is allowed or disallowed.

  • fix_orientation (bool, optional) – Override molstr information for whether rotation of geom is allowed or disallowed.

  • fix_symmetry (str, optional) – Override molstr information for the maximal point-group symmetry in which the geometry should be treated.
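The four override keywords (name, fix_com, fix_orientation, fix_symmetry) share one rule: None, the default, leaves the molstr information alone, while any other value replaces it. A minimal sketch of that merge on a plain dict, using a hypothetical `apply_overrides` helper (an illustration of the rule, not qcelemental's internal code):

```python
from typing import Any, Dict, Optional

# Fields that may trump molstr, per the parameter list above.
OVERRIDABLE = ("name", "fix_com", "fix_orientation", "fix_symmetry")


def apply_overrides(parsed: Dict[str, Any], **overrides: Optional[Any]) -> Dict[str, Any]:
    """Return a copy of `parsed` in which every non-None override replaces the parsed value."""
    unknown = set(overrides) - set(OVERRIDABLE)
    if unknown:
        raise TypeError(f"not overridable: {sorted(unknown)}")
    merged = dict(parsed)
    for key, value in overrides.items():
        # Test against None, not truthiness, so that fix_com=False still overrides.
        if value is not None:
            merged[key] = value
    return merged


parsed = {"name": "h2o", "fix_com": False, "fix_symmetry": "c2v"}
print(apply_overrides(parsed, fix_com=True, fix_orientation=None))
# {'name': 'h2o', 'fix_com': True, 'fix_symmetry': 'c2v'}
```

Comparing against None, rather than truthiness, is what lets an explicit False override a value set in the string.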

Returns

  • molrec (dict) – Molecule dictionary spec. See from_arrays().

  • molinit (dict, optional) – Intermediate molrec-like dictionary holding the molstr information after parsing by this function but before the validation and defaulting by from_arrays() that produce the proper molrec. Only provided if return_processed is True.
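The three choices for missing_enabled_return_qm and missing_enabled_return_efp amount to a small dispatch applied to each enabled domain that turns up empty. A sketch with a hypothetical `resolve_missing` helper. The `{"atoms": ...}` records are placeholders, not the molrec schema described in from_arrays(), and keying the result by domain is for illustration only:

```python
from typing import Any, Dict, List


def resolve_missing(domain: str, policy: str, atoms: List[str]) -> Dict[str, Any]:
    """Apply a missing_enabled_return_* policy to one enabled domain ('qm' or 'efp')."""
    if atoms:
        return {"atoms": atoms}   # placeholder for a fully populated record
    if policy == "minimal":
        return {"atoms": []}      # valid but empty record
    if policy == "none":
        return {}                 # empty dictionary
    if policy == "error":
        raise ValueError(f"{domain} domain has no atoms/fragments")
    raise ValueError(f"unknown policy {policy!r}; expected 'minimal', 'none', or 'error'")


# Mimic an input with QM atoms but no EFP content, under the default 'none' policies.
result = {"qm": resolve_missing("qm", "none", ["O", "H", "H"]),
          "efp": resolve_missing("efp", "none", [])}
print(result)  # {'qm': {'atoms': ['O', 'H', 'H']}, 'efp': {}}
```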

Raises

qcelemental.MoleculeFormatError – After processing of molstr, only an empty string should remain; anything left over is treated as a syntax error.
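The "only an empty string should remain" rule describes a consume-then-check strategy: each recognized construct is deleted from the string, and any residue is an error. A stdlib sketch of the strategy. The `consume` function, its two toy patterns, and `LeftoverSyntaxError` (standing in for qcelemental.MoleculeFormatError) are all hypothetical:

```python
import re
from typing import List, Pattern


class LeftoverSyntaxError(ValueError):
    """Hypothetical stand-in for qcelemental.MoleculeFormatError."""


NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# Toy patterns: '<symbol> <x> <y> <z>' atom lines and 'units <bohr|au|ang|angstrom>' lines.
ATOM = re.compile(rf"^\s*[A-Za-z]{{1,2}}(?:\s+{NUM}){{3}}\s*$", re.MULTILINE)
UNITS = re.compile(r"^\s*units\s+(?:bohr|au|ang|angstrom)\s*$", re.MULTILINE | re.IGNORECASE)


def consume(text: str, patterns: List[Pattern[str]]) -> None:
    """Delete every line matched by each pattern, then reject any residue."""
    for pattern in patterns:
        text = pattern.sub("", text)
    leftover = text.strip()
    if leftover:
        raise LeftoverSyntaxError(f"unrecognized input remains:\n{leftover}")


consume("He 0.0 0.0 0.0\nunits bohr\nHe 0.0 0.0 2.5\n", [ATOM, UNITS])  # nothing left over
```

Appending a line such as `nonsense here` to that string makes `consume` raise, because neither pattern removes it.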

Notes

Several formats can be interpreted:

xyz - Strict XYZ format
-----------------------

    String Layout
    -------------
    <number of atoms>
    comment line
    <element_symbol or atomic_number> <x> <y> <z>
    ...
    <element_symbol or atomic_number> <x> <y> <z>

    QM Domain
    ---------
    Specifiable: geom, elem/elez (element identity)
    Inaccessible: mass, real (vs. ghost), elbl (user label), name, units (assumed [A]),
                  input_units_to_au, fix_com/orientation/symmetry, fragmentation,
                  molecular_charge, molecular_multiplicity

    Notes
    -----
    <number of atoms> is pattern-matched but ignored.
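A stdlib-only sketch of the strict layout, written as a hypothetical `parse_strict_xyz` (not qcelemental's parser). It accepts element symbols or atomic numbers and treats the count and comment lines as required but uninformative. It returns an elem list and a flat geom list of coordinates as written (strict XYZ assumes Angstrom), loosely echoing the specifiable fields above. The atomic-number table is deliberately tiny:

```python
import re
from typing import List, Tuple

# Tiny lookup table for this sketch only; a real parser needs the full periodic table.
Z_TO_SYMBOL = {1: "H", 2: "He", 6: "C", 7: "N", 8: "O"}


def parse_strict_xyz(text: str) -> Tuple[List[str], List[float]]:
    """Return (elem, flat geom) from a strict XYZ string."""
    lines = text.strip("\n").splitlines()
    if len(lines) < 2 or not re.fullmatch(r"\s*\d+\s*", lines[0]):
        raise ValueError("first line must be an integer atom count")
    # lines[0] is only pattern-matched; the count itself is never used.
    # lines[1] is a free-form comment and carries no molecular information.
    elem: List[str] = []
    geom: List[float] = []
    for line in lines[2:]:
        if not line.strip():
            continue
        tokens = line.split()
        if len(tokens) != 4:
            raise ValueError(f"expected '<elem> <x> <y> <z>', got: {line!r}")
        ident, *xyz = tokens
        elem.append(Z_TO_SYMBOL[int(ident)] if ident.isdigit() else ident.capitalize())
        geom.extend(float(v) for v in xyz)
    return elem, geom


water = """3
water (comment line)
O   0.000   0.000   0.117
1   0.000   0.757  -0.467
H   0.000  -0.757  -0.467
"""
elem, geom = parse_strict_xyz(water)
print(elem)       # ['O', 'H', 'H']
print(len(geom))  # 9
```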

xyz+ - Enhanced XYZ format
--------------------------

    String Layout
    -------------
    <number of atoms> [<bohr|au|ang>]
    [<molecular_charge> <molecular_multiplicity>] comment line
    <psi4_nucleus_spec> <x> <y> <z>
    ...
    <psi4_nucleus_spec> <x> <y> <z>

    QM Domain
    ---------
    Specifiable: geom, elem/elez (element identity), mass, real (vs. ghost), elbl (user label),
                 units (defaults [A]), molecular_charge, molecular_multiplicity
    Inaccessible: name, input_units_to_au, fix_com/orientation/symmetry, fragmentation

    Notes
    -----
    <number of atoms> is pattern-matched but ignored.
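What xyz+ adds over strict XYZ sits mostly in the first two lines. A sketch covering only those header lines (atom lines with psi4_nucleus_spec are out of scope), as a hypothetical `parse_xyzplus_header`. The mapping of `au` to Bohr follows from atomic units of length being bohr:

```python
import re
from typing import Any, Dict

# Units keywords allowed on the count line; Angstrom when the keyword is absent.
UNITS = {"bohr": "Bohr", "au": "Bohr", "ang": "Angstrom"}
COUNT_LINE = re.compile(r"\s*\d+(?:\s+(bohr|au|ang))?\s*", re.IGNORECASE)
# Optional '<charge> <multiplicity>' at the start of the comment line, then comment text.
CHGMULT = re.compile(r"\s*(-?\d+)\s+(\d+)(?:\s+(.*?))?\s*")


def parse_xyzplus_header(line1: str, line2: str) -> Dict[str, Any]:
    """Read units from line 1 and an optional charge/multiplicity from line 2."""
    m = COUNT_LINE.fullmatch(line1)
    if m is None:
        raise ValueError(f"bad count line: {line1!r}")
    # The atom count itself is matched but ignored, as in strict XYZ.
    units = UNITS[m.group(1).lower()] if m.group(1) else "Angstrom"
    cm = CHGMULT.fullmatch(line2)
    if cm is None:
        # No leading charge/multiplicity: the whole line is just a comment.
        return {"units": units, "molecular_charge": None,
                "molecular_multiplicity": None, "comment": line2.strip()}
    return {"units": units, "molecular_charge": float(cm.group(1)),
            "molecular_multiplicity": int(cm.group(2)), "comment": cm.group(3) or ""}


print(parse_xyzplus_header("3 au", "0 1 water in bohr"))
# {'units': 'Bohr', 'molecular_charge': 0.0, 'molecular_multiplicity': 1, 'comment': 'water in bohr'}
```

A comment line that happens to begin with two integers is read as charge and multiplicity. That ambiguity is inherent in the layout, not in this sketch.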

psi4 - Psi4 molecule {...} format
---------------------------------

    QM Domain
    ---------
    Specifiable: geom, elem/elez (element identity), mass, real (vs. ghost), elbl (user label),
                 units (defaults [A]), fix_com/orientation/symmetry, fragment_separators,
                 fragment_charges, fragment_multiplicities, molecular_charge, molecular_multiplicity
    Inaccessible: name, input_units_to_au

        PubChem
        -------
        pubchem : <cid|name|formula> [*]

        A line like the above searches the PubChem database and substitutes the fields below. Adding the wildcard
        searches for multiple matches and raises ChoicesError with the matches attached for further consideration.

        Specifiable: geom, elem/elez (element identity), units (fixed [A]), molecular_charge,
                     molecular_multiplicity (fixed singlet), name
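Recognizing a PubChem request is plain pattern matching; the substitution itself needs a query to the PubChem service. A hypothetical sketch of the recognition step only: it splits out the query and the wildcard flag and makes no network call. Telling a name from a formula is not attempted:

```python
import re
from typing import Optional, Tuple

PUBCHEM = re.compile(r"\s*pubchem\s*:\s*(?P<query>.+?)\s*(?P<wild>\*)?\s*", re.IGNORECASE)


def parse_pubchem_line(line: str) -> Optional[Tuple[str, str, bool]]:
    """Split a 'pubchem : <cid|name|formula> [*]' line into (query, kind, wildcard).

    Returns None for lines that are not PubChem requests. No network access.
    """
    m = PUBCHEM.fullmatch(line)
    if m is None:
        return None
    query = m.group("query")
    # An all-digit query is a CID; names and formulas are not told apart here.
    kind = "cid" if query.isdigit() else "name_or_formula"
    return query, kind, m.group("wild") is not None


print(parse_pubchem_line("pubchem: benzene*"))  # ('benzene', 'name_or_formula', True)
```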

    EFP Domain
    ----------
    Specifiable: units, fix_com/orientation/symmetry, fragment_files, hint_types, geom_hints
    Inaccessible: all atomic and fragment details -- geom, elem/elez (element identity),
                  mass, real (vs. ghost), elbl (user label), fragment_separators, fragment_charges,
                  fragment_multiplicities, molecular_charge, molecular_multiplicity
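The fragment fields in the psi4 QM Domain (fragment_separators, fragment_charges, fragment_multiplicities) come from Psi4 input conventions: fragments are separated by `--` lines, and each fragment may begin with a `<charge> <multiplicity>` line. A stdlib sketch of that split, using a hypothetical `split_fragments` that handles only Cartesian atom lines (no `units`, `no_com`, `symmetry`, or EFP lines). It reports separators as the atom index at which each later fragment begins, which is how we understand fragment_separators, and uses None for an unspecified charge or multiplicity:

```python
import re
from typing import Dict, List, Optional

# A fragment may open with '<charge> <multiplicity>'.
CHGMULT = re.compile(r"\s*(-?\d+)\s+(\d+)\s*")
# A line holding only '--' separates fragments.
SEPARATOR = re.compile(r"^\s*--\s*$", re.MULTILINE)


def split_fragments(body: str) -> Dict[str, list]:
    """Split a Cartesian psi4-style body into fragments on '--' lines.

    Atom lines are counted but not parsed.
    """
    charges: List[Optional[float]] = []
    mults: List[Optional[int]] = []
    natoms: List[int] = []
    for chunk in SEPARATOR.split(body):
        lines = [ln for ln in chunk.splitlines() if ln.strip()]
        m = CHGMULT.fullmatch(lines[0]) if lines else None
        if m:
            charges.append(float(m.group(1)))
            mults.append(int(m.group(2)))
            lines = lines[1:]
        else:
            charges.append(None)
            mults.append(None)
        natoms.append(len(lines))
    # Atom index at which each fragment after the first begins.
    separators: List[int] = []
    running = 0
    for n in natoms[:-1]:
        running += n
        separators.append(running)
    return {"fragment_separators": separators,
            "fragment_charges": charges,
            "fragment_multiplicities": mults}


dimer = """
0 1
O  -1.551  -0.115   0.000
H  -1.934   0.763   0.000
H  -0.600   0.041   0.000
--
0 1
O   1.351   0.111   0.000
H   1.680  -0.374  -0.758
H   1.680  -0.374   0.758
"""
print(split_fragments(dimer))
# {'fragment_separators': [3], 'fragment_charges': [0.0, 0.0], 'fragment_multiplicities': [1, 1]}
```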

psi4+ - Psi4 non-Cartesian molecule {...} format
------------------------------------------------
    Like `dtype=psi4` (although combination with EFP is not tested), except
    that instead of a pure-Cartesian geometry, it allows variables, zmatrix,
    and incompletely specified geometries. *Not* MolSSI standard, but we're
    not dropping zmatrix yet. Note that in Psi4, internal coordinates
    defined through a zmatrix have no bearing on geometry
    optimization internals or constraints.
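A toy illustration of two non-Cartesian features that psi4+ accepts, variables and a zmatrix. The hypothetical `zmatrix_to_cartesian` below handles at most three atoms (no dihedral rows) and treats `name = number` lines as variable definitions. This is not Psi4's or qcelemental's algorithm; placing atom 1 at the origin, atom 2 on the z-axis, and atom 3 in the xz-plane is one common convention:

```python
import math
import re
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
VARIABLE = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\S+)\s*")


def value(token: str, variables: Dict[str, float]) -> float:
    """Read a literal number, or look the token up as a variable name."""
    try:
        return float(token)
    except ValueError:
        return variables[token]


def zmatrix_to_cartesian(text: str) -> List[Tuple[str, Vec]]:
    """Toy converter for a zmatrix of at most three atoms (no dihedrals)."""
    variables: Dict[str, float] = {}
    rows: List[List[str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        m = VARIABLE.fullmatch(line)
        if m:
            variables[m.group(1)] = float(m.group(2))
        else:
            rows.append(line.split())

    atoms: List[Tuple[str, Vec]] = []
    for i, row in enumerate(rows):
        if i == 0:
            pos = (0.0, 0.0, 0.0)
        elif i == 1:
            pos = (0.0, 0.0, value(row[2], variables))
        elif i == 2:
            a = atoms[int(row[1]) - 1][1]  # bonded-to atom (1-based reference)
            b = atoms[int(row[3]) - 1][1]  # angle-reference atom
            r = value(row[2], variables)
            theta = math.radians(value(row[4], variables))
            u = 1.0 if b[2] > a[2] else -1.0  # direction from a toward b along z
            pos = (r * math.sin(theta), 0.0, a[2] + u * r * math.cos(theta))
        else:
            raise NotImplementedError("rows with dihedrals are beyond this sketch")
        atoms.append((row[0], pos))
    return atoms


water = """
O
H 1 roh
H 1 roh 2 ahoh

roh  = 0.96
ahoh = 104.5
"""
for sym, xyz in zmatrix_to_cartesian(water):
    print(sym, ["%.4f" % c for c in xyz])
```

Because all variable lines are collected before any row is converted, variables may be defined after the zmatrix rows that use them.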