Queue Manager API

This page documents all valid options for the YAML configuration file read by the Queue Manager. This first section outlines each of the headers (top-level objects) and gives a description of each one. The final file will look like the following:

common:
    option_1: value_for1
    another_opt: 42
server:
    option_for_server: "some string"

This is the complete set of options, auto-generated from the parser itself, so it should be accurate for the given release. If you are using a development build or want to see the schema yourself, you can run the qcfractal-manager --schema command and it will display the whole schema for the YAML input.

Each section below is summarized the same way, showing all the options for that YAML header in the form of the pydantic API into which the YAML is fed, with a one-to-one mapping of options.

class qcfractal.cli.qcfractal_manager.ManagerSettings(*, common: qcfractal.cli.qcfractal_manager.CommonManagerSettings = CommonManagerSettings(adapter=<AdapterEnum.pool: 'pool'>, tasks_per_worker=1, cores_per_worker=2, memory_per_worker=6.626, max_workers=1, retries=2, scratch_directory=None, verbose=False, nodes_per_job=1, nodes_per_task=1, cores_per_rank=1), server: qcfractal.cli.qcfractal_manager.FractalServerSettings = FractalServerSettings(fractal_uri='localhost:7777', username=None, password=None, verify=None), manager: qcfractal.cli.qcfractal_manager.QueueManagerSettings = QueueManagerSettings(manager_name='unlabeled', queue_tag=None, log_file_prefix=None, update_frequency=30.0, test=False, ntests=5, max_queued_tasks=None), cluster: qcfractal.cli.qcfractal_manager.ClusterSettings = ClusterSettings(node_exclusivity=False, scheduler=None, scheduler_options=[], task_startup_commands=[], walltime='06:00:00', adaptive=<AdaptiveCluster.adaptive: 'adaptive'>), dask: qcfractal.cli.qcfractal_manager.DaskQueueSettings = DaskQueueSettings(interface=None, extra=None, lsf_units=None), parsl: qcfractal.cli.qcfractal_manager.ParslQueueSettings = ParslQueueSettings(executor=ParslExecutorSettings(address=None), provider=ParslProviderSettings(partition=None, launcher=None)))[source]

The config file for setting up a QCFractal Manager. All sub-fields of this model sit at an equal top level of the YAML file. No additional top-level fields are permitted, but sub-fields may have their own additions.

Not all fields are required, and many will depend on the cluster you are running on and the adapter you choose to run with.

Parameters

common

class qcfractal.cli.qcfractal_manager.CommonManagerSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, adapter: qcfractal.cli.qcfractal_manager.AdapterEnum = AdapterEnum.pool, tasks_per_worker: int = 1, cores_per_worker: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 2, memory_per_worker: qcfractal.cli.qcfractal_manager.ConstrainedFloatValue = 6.626, max_workers: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 1, retries: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 2, scratch_directory: str = None, verbose: bool = False, nodes_per_job: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 1, nodes_per_task: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 1, cores_per_rank: int = 1)[source]

The Common settings are the settings most users will need to adjust regularly to control the nature of task execution and the hardware on which tasks are executed. This block is often unique to each deployment, user, and manager, and these will be the most commonly updated options, even as config files are copied and reused, and even on the same platform/cluster.

Parameters
  • adapter ({dask,pool,parsl}, Default: pool) – Which type of Distributed adapter to run tasks through.

  • tasks_per_worker (int, Default: 1) – Number of concurrent tasks to run per spawned Worker. The total number of concurrent tasks is this value times max_workers, assuming the hardware is available. With the pool adapter, and/or if max_workers=1, tasks_per_worker is the number of concurrent tasks.

  • cores_per_worker (ConstrainedInt, Default: 2) – Number of cores to be consumed by the Worker and distributed over the tasks_per_worker. These cores are divided evenly, so it is recommended that the quotient of cores_per_worker/tasks_per_worker be a whole number; otherwise the core distribution is left up to the logic of the adapter. The default value is read from the number of detected cores on the system you are executing on.

    In the case of node-parallel tasks, this number means the number of cores per node.

  • memory_per_worker (ConstrainedFloat, Default: 6.626) – Amount of memory (in GB) to be consumed and distributed over the tasks_per_worker. This memory is divided evenly, but is ultimately at the control of the adapter. Engine will only allow each of its calls to consume memory_per_worker/tasks_per_worker of memory. Total memory consumed by this manager at any one time is this value times max_workers. The default value is read from the amount of memory detected on the system you are executing on.

  • max_workers (ConstrainedInt, Default: 1) – The maximum number of Workers which are allowed to be run at the same time. The total number of concurrent tasks will maximize at this quantity times tasks_per_worker. The total number of Jobs on a cluster which will be started is equal to this parameter in most cases, and should be assumed 1 Worker per Job. Any exceptions to this will be documented. In node exclusive mode this is equivalent to the maximum number of nodes which you will consume. This must be a positive, non-zero integer.

  • retries (ConstrainedInt, Default: 2) – Number of retries that QCEngine will attempt for RandomErrors detected when running its computations. After this many attempts (or on any other type of error), the error will be raised.

  • scratch_directory (str, Optional) – Scratch directory for Engine execution jobs.

  • verbose (bool, Default: False) – Turn on verbose mode. In verbose mode, all messages from the DEBUG level and up are shown; otherwise, defaults are used for every logger.

  • nodes_per_job (ConstrainedInt, Default: 1) – The number of nodes to request per job. Only used by the Parsl adapter at present.

  • nodes_per_task (ConstrainedInt, Default: 1) – The number of nodes to use for each task. Only relevant for node-parallel executables.

  • cores_per_rank (int, Default: 1) – The number of cores per MPI rank for MPI-parallel applications. Only relevant for node-parallel codes, and most relevant to codes with hybrid MPI+OpenMP parallelism (e.g., NWChem).
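
As an illustrative sketch only (the numbers below are placeholders, not recommendations), a common block using these options might look like:

common:
    adapter: dask                # one of pool, dask, or parsl
    tasks_per_worker: 2          # concurrent tasks per Worker
    cores_per_worker: 16         # divided evenly over tasks_per_worker
    memory_per_worker: 64        # GB, divided evenly over tasks_per_worker
    max_workers: 4               # maximum simultaneous Workers (and usually cluster jobs)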

server

class qcfractal.cli.qcfractal_manager.FractalServerSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, fractal_uri: str = 'localhost:7777', username: str = None, password: str = None, verify: bool = None)[source]

Settings pertaining to the Fractal Server you wish to pull tasks from and push completed tasks to. Each manager supports exactly 1 Fractal Server to be in communication with, and exactly 1 user on that Fractal Server. These can be changed, but only after the Manager is shut down and the settings are updated. Multiple Managers can, however, be started in parallel with each other, but each must be started as a separate call to the CLI.

Caution: The password here is written in plain text, so it is up to the owner/writer of the configuration file to ensure its security.

Parameters
  • fractal_uri (str, Default: localhost:7777) – Full URI to the Fractal Server you want to connect to.

  • username (str, Optional) – Username to connect to the Fractal Server with. When not provided, a connection is attempted as a guest user, which in most default Servers will be unable to return results.

  • password (str, Optional) – Password to authenticate to the Fractal Server with (alongside the username).

  • verify (bool, Optional) – Use Server-side generated SSL certification or not.
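
A hypothetical server block (the URI and credentials below are placeholders):

server:
    fractal_uri: "localhost:7777"   # full URI to your Fractal Server
    username: "manager_user"        # placeholder account name
    password: "not_a_real_password" # stored in plain text; protect this file
    verify: False                   # e.g. for a Server with a self-signed certificate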

manager

class qcfractal.cli.qcfractal_manager.QueueManagerSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, manager_name: str = 'unlabeled', queue_tag: Optional[Union[str, List[str]]] = None, log_file_prefix: str = None, update_frequency: qcfractal.cli.qcfractal_manager.ConstrainedFloatValue = 30, test: bool = False, ntests: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = 5, max_queued_tasks: qcfractal.cli.qcfractal_manager.ConstrainedIntValue = None)[source]

Fractal Queue Manager settings. These are options which control the setup and execution of the Fractal Manager itself.

Parameters
  • manager_name (str, Default: unlabeled) – Name of this manager to present to the Fractal Server. Descriptive names help the server identify the manager resource and assist with debugging.

  • queue_tag (Union[str, List[str]], Optional) – Only pull tasks from the Fractal Server with this tag. If not set (None/null), then pull untagged tasks, which should be the majority of tasks. This option should only be used when you want to pull very specific tasks which you know have been tagged as such on the server. If the server has no tasks with this tag, no tasks will be pulled (and no error is raised because this is intended behavior). If multiple tags are provided, tasks will be pulled (but not necessarily executed) in order of the tags.

  • log_file_prefix (str, Optional) – Full path to save a log file to, including the filename. If not provided, information will still be reported to terminal, but not saved. When set, logger information is sent both to this file and the terminal.

  • update_frequency (ConstrainedFloat, Default: 30) – Time between heartbeats/update checks between this Manager and the Fractal Server. The lower this value, the shorter the intervals. If you have an unreliable network connection, consider increasing this time, as repeated, consecutive network failures will cause the Manager to shut itself down to maintain integrity between it and the Fractal Server. Units of seconds.

  • test (bool, Default: False) – Turn on testing mode for this Manager. The Manager will not connect to any Fractal Server, and instead submits ntests worth of trial tasks per quantum chemistry program it finds. These tasks are generated locally and do not need a running Fractal Server to work. Helpful for ensuring the Manager is configured correctly and the quantum chemistry codes are operating as expected.

  • ntests (ConstrainedInt, Default: 5) – Number of tests to run if the test flag is set to True. Total number of tests will be this number times the number of found quantum chemistry programs. Does nothing if test is False. If set to 0, then this submits no tests, but it will run through the setup and client initialization.

  • max_queued_tasks (ConstrainedInt, Optional) – Generally should not be set. Number of tasks to pull from the Fractal Server to keep locally at all times. If None, this is automatically computed as ceil(common.tasks_per_worker*common.max_workers*2.0) + 1. As tasks are completed, the local pool is filled back up to this value. These tasks will all attempt to be run concurrently, but concurrent tasks are limited by the number of cluster jobs and tasks per job. Pulling too many tasks can leave managers at other sites under-utilized and make task returns less FIFO-like. As such, it is recommended not to touch this setting in general, as you will be given enough tasks to fill your maximum throughput with a buffer (assuming the queue has them).
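
For example, a manager block might look like the following sketch (the name is a placeholder):

manager:
    manager_name: "hpc_cluster_manager"   # descriptive placeholder name
    queue_tag: null                       # pull untagged tasks (the default)
    update_frequency: 60                  # seconds between heartbeats
    test: False
    # max_queued_tasks is left unset so it is computed as
    # ceil(common.tasks_per_worker * common.max_workers * 2.0) + 1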

cluster

class qcfractal.cli.qcfractal_manager.ClusterSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, node_exclusivity: bool = False, scheduler: qcfractal.cli.qcfractal_manager.SchedulerEnum = None, scheduler_options: List[str] = [], task_startup_commands: List[str] = [], walltime: str = '06:00:00', adaptive: qcfractal.cli.qcfractal_manager.AdaptiveCluster = AdaptiveCluster.adaptive)[source]

Settings tied to the cluster you are running on. These settings are mostly tied to the nature of the cluster jobs you are submitting, separate from the nature of the compute tasks you will be running within them. As such, the options here are things like wall time (per job), which Scheduler your cluster has (like PBS or SLURM), etc. No additional options are allowed here.

Parameters
  • node_exclusivity (bool, Default: False) – Run your cluster jobs in node-exclusivity mode. This option may not be available to all scheduler types and thus may not do anything. Related to this, the flags we have found for this option may not be correct for your scheduler and thus might throw an error. You can always add the correct flag/parameters to the scheduler_options parameter and leave this as False if you find it gives you problems.

  • scheduler ({slurm,pbs,sge,moab,lsf,cobalt}, Optional) – Option of which Scheduler/Queuing system your cluster uses. Note: not all scheduler options are available with every adapter.

  • scheduler_options (List[str], Default: []) – Additional options which are fed into the header files for your submitted jobs to your cluster’s Scheduler/Queuing system. The directives are automatically filled in, so if you want to set something like ‘#PBS -n something’, you would instead just do ‘-n something’. Each directive should be a separate string entry in the list. No validation is done on this with respect to valid directives so it is on the user to know what they need to set.

  • task_startup_commands (List[str], Default: []) – Additional commands to be run before starting the Workers and the task distribution. This can include commands needed to start things like conda environments or setting environment variables before executing the Workers. These commands are executed first before any of the distributed commands run and are added to the batch scripts as individual commands per entry, verbatim.

  • walltime (str, Default: 06:00:00) – Wall clock time of each cluster job started. Presented as a string in HH:MM:SS form, but your cluster may have a different structural syntax. This number should be set high, as there should be a number of Fractal tasks which are run for each submitted cluster job. Ideally, the job will start, the Worker will land, and the Worker will crunch through as many tasks as it can; meaning the job which has a Worker in it must continue existing to minimize time spent redeploying Workers.

  • adaptive ({static,adaptive}, Default: adaptive) – Whether or not to use adaptive scaling of Workers. If set to ‘static’, a fixed number of Workers will be started (and likely NOT restarted when the wall clock is reached). When set to ‘adaptive’ (the default), the distributed engine will try to adaptively scale the number of Workers based on tasks in the queue. This is a str instead of a bool-type variable in case more complex adaptivity options are added in the future.
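
A sketch of a cluster block for a SLURM machine; the scheduler directive and environment commands are assumptions for illustration and must be adapted to your site:

cluster:
    node_exclusivity: True
    scheduler: slurm
    scheduler_options:
        - "--account=my_allocation"       # hypothetical SLURM directive (no #SBATCH prefix needed)
    task_startup_commands:
        - "module load anaconda3"         # hypothetical environment setup
        - "conda activate qcfractal"
    walltime: "12:00:00"
    adaptive: adaptive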

dask

class qcfractal.cli.qcfractal_manager.DaskQueueSettings(*, interface: str = None, extra: List[str] = None, lsf_units: str = None, **kwargs)[source]

Settings for the Dask Cluster class. Values set here are passed directly into the Cluster objects based on the cluster.scheduler settings. Although many values are set automatically from other settings, there are some additional values such as interface and extra which are passed through to the constructor.

Valid values for this field are functions of your cluster.scheduler and no linting is done ahead of trying to pass these to Dask.

NOTE: The parameters listed here are a special exception for additional features Fractal has engineered or options which should be considered for some of the edge cases we have discovered. If you try to set a value which is derived from other options in the YAML file, an error is raised and you are told exactly which one is forbidden.

Please see the docs for the provider for more information.

Parameters
  • interface (str, Optional) – Name of the network adapter to use for communication between the head node and the compute nodes. There are oddities here when the head node and compute nodes use different ethernet adapter names, and we have not figured out exactly which combination is needed between this and the poorly documented ip keyword, which appears to apply to Workers but not the Client.

  • extra (List[str], Optional) – Additional flags which are fed into the Dask Worker CLI startup, can be used to overwrite pre-configured options. Do not use unless you know exactly which flags to use.

  • lsf_units (str, Optional) – Unit system for an LSF cluster's limits (e.g. MB, GB, TB). If not set, an attempt is made to read the units from the lsf.conf file in its default locations. This does nothing if the cluster is not LSF.
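
A minimal dask block, assuming an Infiniband interface named ib0 (the interface name is site-specific and purely illustrative):

dask:
    interface: "ib0"      # network adapter shared by the head and compute nodes
    lsf_units: "GB"       # only meaningful on LSF clusters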

parsl

class qcfractal.cli.qcfractal_manager.ParslQueueSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, executor: qcfractal.cli.qcfractal_manager.ParslExecutorSettings = ParslExecutorSettings(address=None), provider: qcfractal.cli.qcfractal_manager.ParslProviderSettings = ParslProviderSettings(partition=None, launcher=None), **values: Any)[source]

The Parsl-specific configurations used with the common.adapter = parsl setting. The parsl config is broken up into a top-level Config class, an Executor sub-class, and a Provider sub-class of the Executor: Config -> Executor -> Provider. Each of these has its own options, and extra values fed into the ParslQueueSettings are passed to the Config level.

Both executor and provider settings are required, but they are filled with defaults and often need no further configuration, since most of it is handled by other settings in the config file.
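
As a sketch of that nesting (the address and partition values are placeholders), the YAML mirrors the Config -> Executor -> Provider structure:

parsl:
    executor:
        address: "10.0.0.1"        # placeholder head-node address reachable by Workers
    provider:
        partition: "normal"        # placeholder partition name for a SLURM cluster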

Parameters

executor

class qcfractal.cli.qcfractal_manager.ParslExecutorSettings(*, address: str = None, **kwargs)[source]

Settings for the Parsl Executor class. This serves as the primary mechanism for distributing Workers to jobs. In most cases, you will not need to set any of these options, as several options are automatically inferred from other settings. Any option set here is passed through to the HighThroughputExecutor class of Parsl.

https://parsl.readthedocs.io/en/latest/stubs/parsl.executors.HighThroughputExecutor.html

NOTE: The parameters listed here are a special exception for additional features Fractal has engineered or options which should be considered for some of the edge cases we have discovered. If you try to set a value which is derived from other options in the YAML file, an error is raised and you are told exactly which one is forbidden.

Parameters

  • address (str, Optional) – This only needs to be set in conditional cases when the head node and compute nodes use a differently named ethernet adapter.

    An address to connect to the main Parsl process which is reachable from the network in which Workers will be running. This can be either a hostname as returned by hostname or an IP address. Most login nodes on clusters have several network interfaces available, only some of which can be reached from the compute nodes. Some trial and error might be necessary to identify what addresses are reachable from compute nodes.

provider

class qcfractal.cli.qcfractal_manager.ParslProviderSettings(*, partition: str = None, launcher: qcfractal.cli.qcfractal_manager.ParslLauncherSettings = None, **kwargs)[source]

Settings for the Parsl Provider class. Valid values for this field depend on your choice of cluster.scheduler and are defined in the Parsl docs for the providers, with some minor exceptions. The initializer function for the Parsl settings will indicate which Provider is selected for your chosen scheduler.

NOTE: The parameters listed here are a special exception for additional features Fractal has engineered or options which should be considered for some of the edge cases we have discovered. If you try to set a value which is derived from other options in the YAML file, an error is raised and you are told exactly which one is forbidden.

  • SLURM: https://parsl.readthedocs.io/en/latest/stubs/parsl.providers.SlurmProvider.html

  • PBS/Torque/Moab: https://parsl.readthedocs.io/en/latest/stubs/parsl.providers.TorqueProvider.html

  • SGE (Sun GridEngine): https://parsl.readthedocs.io/en/latest/stubs/parsl.providers.GridEngineProvider.html

Parameters
  • partition (str, Optional) – The name of the cluster.scheduler partition being submitted to. Behavior, valid values, and even its validity as a set variable are a function of what type of queue scheduler your specific cluster has (e.g. this variable should NOT be present for PBS clusters). Check with your Sys. Admins and/or your cluster documentation.

  • launcher (ParslLauncherSettings, Optional) – The Parsl Launcher to use with your Provider. If left to None, defaults are assumed (check the Provider’s defaults), otherwise this should be a dictionary requiring the option launcher_class as a str to specify which Launcher class to load, and the remaining settings will be passed on to the Launcher’s constructor.
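
For example, a hypothetical provider block that loads Parsl's SrunLauncher on a SLURM cluster (the launcher_class value is an assumption; check the Parsl docs for the Launcher classes valid on your system):

parsl:
    provider:
        partition: "normal"                # placeholder partition
        launcher:
            launcher_class: SrunLauncher   # name of the Parsl Launcher class to load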