How do I search for specific types of computations?#
This notebook introduces you to the basics of connecting to a QCArchive server and retrieving computation results using information like molecule, basis set, method, or other computation details.
You can retrieve results from QCArchive using the get_records method if you know the ID of the computation you’d like to retrieve.
However, you can also query the database for computations having specific details using query methods.
import qcportal as ptl
Create a client object and connect to the demo server#
The PortalClient is how you interact with the server, including querying records and submitting computations.
The demo server allows for unauthenticated guest access, so no username/password is necessary to read from the server. However, you will need to log in to submit or modify computations.
# Guest access
client = ptl.PortalClient("https://qcademo.molssi.org")
WARNING: This client version is newer than the server version. This may work if the versions are close, but expect exceptions and errors if attempting things the server does not support. client version: 0.60.post54+g7fe949180, server version: 0.59
Connecting with username/password
If you have a username/password, you would include those in the client connection.
client = ptl.PortalClient("https://qcademo.molssi.org", username="YOUR_USERNAME", password="YOUR_PASSWORD")
⚠️Caution⚠️: Always handle credentials with care. Never commit sensitive information like usernames or passwords to public repositories.
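One way to keep credentials out of a notebook is to read them from environment variables. The sketch below is only an illustration and is not part of qcportal: the helper name `credentials_from_env` and the variable names `QCA_USERNAME` and `QCA_PASSWORD` are assumptions chosen for this example.

```python
import os


def credentials_from_env(env=None):
    """Return keyword arguments for PortalClient, or {} for guest access.

    QCA_USERNAME and QCA_PASSWORD are hypothetical variable names used
    only for this illustration.
    """
    env = os.environ if env is None else env
    username = env.get("QCA_USERNAME")
    password = env.get("QCA_PASSWORD")
    if username and password:
        return {"username": username, "password": password}
    # Fall back to guest access when either value is missing.
    return {}


# Usage with the import from this notebook (requires network access):
# client = ptl.PortalClient("https://qcademo.molssi.org", **credentials_from_env())
```

With neither variable set, the helper returns an empty dictionary, so the same line connects as a guest.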
Querying Records#
Use the `query_records` method for general queries. This method searches across all records in the database, regardless of the computation type. Because query_records searches all record types, you can only filter on fields that are common to all records.
help(client.query_records)
Help on method query_records in module qcportal.client:
query_records(*, record_id: 'Optional[Union[int, Iterable[int]]]' = None, record_type: 'Optional[Union[str, Iterable[str]]]' = None, manager_name: 'Optional[Union[str, Iterable[str]]]' = None, status: 'Optional[Union[RecordStatusEnum, Iterable[RecordStatusEnum]]]' = None, dataset_id: 'Optional[Union[int, Iterable[int]]]' = None, parent_id: 'Optional[Union[int, Iterable[int]]]' = None, child_id: 'Optional[Union[int, Iterable[int]]]' = None, created_before: 'Optional[Union[datetime, str]]' = None, created_after: 'Optional[Union[datetime, str]]' = None, modified_before: 'Optional[Union[datetime, str]]' = None, modified_after: 'Optional[Union[datetime, str]]' = None, creator_user: 'Optional[Union[int, str, Iterable[Union[int, str]]]]' = None, limit: 'int' = None, include: 'Optional[Iterable[str]]' = None) -> 'RecordQueryIterator[BaseRecord]' method of qcportal.client.PortalClient instance
Query records of all types based on common fields
This is a general query of all record types, so it can only filter by fields
that are common among all records.
Do not rely on the returned records being in any particular order.
Parameters
----------
record_id
Query records whose ID is in the given list
record_type
Query records whose type is in the given list
manager_name
Query records that were completed (or are currently runnning) on a manager is in the given list
status
Query records whose status is in the given list
dataset_id
Query records that are part of a dataset is in the given list
parent_id
Query records that have a parent is in the given list
child_id
Query records that have a child is in the given list
created_before
Query records that were created before the given date/time
created_after
Query records that were created after the given date/time
modified_before
Query records that were modified before the given date/time
modified_after
Query records that were modified after the given date/time
creator_user
Query records created by a user in the given list (usernames or IDs)
limit
The maximum number of records to return. Note that the server limit is always obeyed.
include
Additional fields to include in the returned record
Returns
-------
:
An iterator that can be used to retrieve the results of the query
For example, to query for computations created between January 10, 2023 and January 14, 2023, we could do the following.
results = client.query_records(created_after="2023/01/10", created_before="2023/01/14")
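The signature printed by help shows that the created_* and modified_* arguments accept either strings or datetime objects. As a sketch, the same window can be built from datetime values, which avoids ambiguity in string date formats. The helper `created_window` is a hypothetical name and not part of qcportal, and the example uses naive datetimes, so time zone handling is not addressed here.

```python
from datetime import datetime


def created_window(start, end):
    """Build created_after/created_before keyword arguments for a query."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {"created_after": start, "created_before": end}


window = created_window(datetime(2023, 1, 10), datetime(2023, 1, 14))

# Usage with the client from this notebook (requires network access):
# results = client.query_records(**window)
```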
The query returns its results as an iterator.
You can convert an iterator to a list with list(), or loop over it directly in a for loop.
results_list = list(results)
print(f"Found {len(results_list)} results.")
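Converting to a list retrieves every matching record, which can be slow for broad queries. When only a few records are needed, itertools.islice from the standard library takes the first few items of any iterator. The sketch below uses a plain Python iterator as a stand-in for the query results, on the assumption that the query iterator follows the standard iterator protocol; `first_n` is a hypothetical helper name.

```python
from itertools import islice


def first_n(iterator, n):
    """Return at most n items from an iterator as a list."""
    return list(islice(iterator, n))


# Stand-in for query results; a real query iterator would yield records.
stand_in = iter(range(100))
preview = first_n(stand_in, 3)
print(preview)

# The items already taken are consumed, so iteration resumes after them.
following = next(stand_in)
print(following)
```

Where possible, the limit argument listed in the help output is the better choice, since it restricts the query itself rather than trimming results on the client side.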
After the results are retrieved, you can work with the records as shown in the “How do I work with computation records?” tutorial.
Querying by computation details#
If you want to query by computation specifications such as basis set, method, or molecule, you will need to use a more specific query method.
For example, if you want to query single point computations, you should use the query_singlepoints method.
Documentation for the query_singlepoints method is shown below.
help(client.query_singlepoints)
As shown in the help message above, you can query single points on many different parameters.
For example, you might choose to query the database for mp2 calculations that use the aug-cc-pvtz basis set and the psi4 program.
For the sake of demonstration in this notebook, we are limiting the number of results to 5 records.
results = client.query_singlepoints(method="mp2", basis="aug-cc-pvtz", program="psi4", limit=5)
After retrieving the results, we can loop through them and view information about the records.
for record in results:
    print(record.id, record.molecule)
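To compare several levels of theory, the same query can be repeated over combinations of method and basis. The following is a sketch only: `query_grid` is a hypothetical helper, and it assumes a client whose query_singlepoints method accepts the method, basis, program, and limit arguments used above.

```python
from itertools import product


def query_grid(client, methods, bases, program="psi4", limit=5):
    """Run one single point query per (method, basis) pair.

    Returns a dict mapping (method, basis) to a list of records.
    """
    found = {}
    for method, basis in product(methods, bases):
        results = client.query_singlepoints(
            method=method, basis=basis, program=program, limit=limit
        )
        # Materialize the iterator so the records can be reused later.
        found[(method, basis)] = list(results)
    return found


# Usage with the client from this notebook (requires network access):
# grid = query_grid(client, ["mp2", "b3lyp"], ["aug-cc-pvtz"])
# for (method, basis), records in grid.items():
#     print(method, basis, [record.id for record in records])
```

Keeping limit small for each pair keeps the total number of retrieved records manageable as the grid grows.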