.. Copyright 2022-2025 The Ramble Authors

   Licensed under the Apache License, Version 2.0 or the MIT license, at your
   option. This file may not be copied, modified, or distributed
   except according to those terms.

.. _workspace-config:

============================
Workspace Configuration File
============================

Ramble workspaces are controlled through their configuration files. Each
workspace has a configuration file stored at ``$workspace/configs/ramble.yaml``.

This document describes the syntax for writing a workspace configuration file.

Within the ``ramble.yaml`` file, all content lives under the top level ``ramble``
dictionary:

.. code-block:: console

    ramble:
      ...

This dictionary controls all aspects of the Ramble workspace.

-----------------
Ramble Dictionary
-----------------

The ramble dictionary is used to control the experiments a workspace is
responsible for configuring, executing, analyzing, and archiving.

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
      applications:
        hostname:
          workloads:
            serial:
              experiments:
                test_exp:
                  variables:
                    n_ranks: '1'
                    n_nodes: '1'

Within a ramble configuration file, configuration scopes for an experiment
include ``application``, ``workload``, and ``experiment``. They are denoted by
these words in the configuration file.

The name ``hostname`` is the name of the ramble application (as seen by
``ramble list``), while the name ``serial`` is the name of the workload (as seen
by ``ramble info hostname``).

The name ``test_exp`` is user defined, and will be explained in
:ref:`experiment-names`.

The name ``variables`` defines arbitrary variables, and will be explained in
:ref:`variable-dictionaries`.

.. _experiment-names:

^^^^^^^^^^^^^^^^
Experiment Names
^^^^^^^^^^^^^^^^

While the names of applications and workloads are defined by the application
definition file, experiment names are more arbitrary.
Experiment names are strings, and can take variables for expansion.

.. code-block:: yaml

    ramble:
      applications:
        hostname:
          workloads:
            serial:
              experiments:
                test_{n_ranks}_{n_nodes}:
                  variables:
                    mpi_command: 'mpirun -n {n_ranks}'
                    batch_submit: '{execute_experiment}'
                    n_ranks: '1'
                    n_nodes: '1'

In the above example, the experiment name would be ``test_1_1`` when it is
created.

**NOTE:** Each experiment has a namespace that follows this pattern:
``application.workload.experiment``. Every experiment needs a unique namespace,
or ramble will throw an error.

.. _variable-dictionaries:

^^^^^^^^^^^^^^^^^^^^^
Variable Dictionaries
^^^^^^^^^^^^^^^^^^^^^

Within a variable dictionary, arbitrary variables can be defined. Defined
variables apply to all experiments within their scope.

These variables can be referred to within the YAML file, or in template files,
using python keyword ( ``{var_name}`` ) syntax to perform variable expansion.

If a variable is defined within multiple dictionaries, values defined closer to
individual experiments take precedence.

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                n_nodes: '1'
              experiments:
                test_exp:
                  variables:
                    n_ranks: '1'

In this example, ``n_ranks`` will take a value of ``1`` within the ``test_exp``
experiment. This experiment will also include definitions for
``processes_per_node``, ``n_nodes``, and ``n_threads``.

.. _ramble-supported-functions:

~~~~~~~~~~~~~~~~~~~
Supported Functions
~~~~~~~~~~~~~~~~~~~

Ramble's variable expansion logic supports several mathematical operators and
functions to help construct useful variable definitions.

Supported math operators are:

* ``+`` (addition)
* ``-`` (subtraction)
* ``*`` (multiplication)
* ``/`` (division)
* ``//`` (floor division)
* ``**`` (exponent)
* ``^`` (bitwise exclusive or)
* ``-`` (unary subtraction)
* ``==`` (equal)
* ``!=`` (not equal)
* ``>`` (greater than)
* ``>=`` (greater than or equal to)
* ``<`` (less than)
* ``<=`` (less than or equal to)
* ``and`` (logical and)
* ``or`` (logical or)
* ``%`` (modulo)

Supported functions are:

* ``str()`` (explicit string cast)
* ``int()`` (explicit integer cast)
* ``float()`` (explicit float cast)
* ``min()`` (minimum)
* ``max()`` (maximum)
* ``ceil()`` (ceiling of input)
* ``floor()`` (floor of input)
* ``range()`` (construct range, see :ref:`ramble vector logic <ramble-vector-logic>` for more information)
* ``simplify_str()`` (convert input string to only alphanumerical characters and dashes)
* ``randrange`` (from `random.randrange`)
* ``randint`` (from `random.randint`)
* ``re_search(regex, str)`` (determine if ``str`` contains pattern ``regex``, based on ``re.search``)

String slicing is supported:

* ``str[start:end:step]`` (string slicing)

Dictionary references are supported:

* ``dict_name["key"]`` (dictionary subscript)

.. _ramble-escaped-variables:

~~~~~~~~~~~~~~~~~
Escaped Variables
~~~~~~~~~~~~~~~~~

When referring to variables in Ramble, it is sometimes useful to escape curly
braces to prevent the expander from fully expanding the variable reference.

Curly braces that are prefixed with a backslash (i.e. ``\{`` or ``\}``) will be
replaced with an unexpanded curly brace by Ramble's expander.

Each time the variable is expanded, the escaped curly braces will be replaced
with unescaped curly braces (i.e. ``\{`` will expand to ``{``). Additional
backslashes can be added to prevent multiple expansions (i.e. ``\\{`` will
expand to ``\{``).

.. _ramble-vector-logic:

^^^^^^^^^^^^^^^^^^^^^^^^^^
List (or Vector) Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^

Variables can be defined as a list of values as well (again, following the same
math and variable expansion syntax as defined above).

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                n_nodes: ['1', '2', '3', '4']
              experiments:
                test_exp_{n_nodes}:
                  variables:
                    n_ranks: '1'

There are two notable aspects of this config file:

1. ``n_nodes`` is a list of values
2. The experiment name references variable values.

All lists defined within any experiment namespace are required to be the same
length. They are zipped together, and iterated over to generate unique
experiments.

In addition to accepting explicit lists, Ramble supports using
`Python's range() function `_ to create a list.
With this functionality, the example above could be re-written as:

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                n_nodes: 'range(1, 5)'
              experiments:
                test_exp_{n_nodes}:
                  variables:
                    n_ranks: '1'

.. _ramble-matrix-logic:

^^^^^^^^^^^^^^^^^
Variable Matrices
^^^^^^^^^^^^^^^^^

In addition to allowing variables, Ramble's config file has a special syntax
for defining variable matrices.

Matrices consume list variables and generate a matrix of variables from them.
Each independent matrix performs the cross product of any list variables it
consumes.

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                processes_per_node: ['16', '32']
                n_nodes: ['1', '2', '3', '4']
              experiments:
                test_exp_{n_nodes}_{processes_per_node}:
                  variables:
                    n_ranks: '1'
                  matrix:
                  - processes_per_node

In the above example, the ``processes_per_node`` variable is consumed as part
of a matrix. The result is a matrix of shape 1x2. After this matrix is
consumed, it will be crossed with the zipped vectors (creating 8 unique
experiments).

Multiple matrices are allowed to be defined:

.. code-block:: yaml
    :linenos:

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                processes_per_node: ['16', '32']
                partition: ['part1', 'part2']
                n_nodes: ['1', '2', '3', '4']
              experiments:
                test_exp_{n_nodes}_{processes_per_node}:
                  variables:
                    n_ranks: '1'
                  matrices:
                  - - processes_per_node
                    - partition
                  - - n_nodes

The result of this is that two matrices are created. The first is a 2x2 matrix,
while the second is a 1x4 matrix. All matrices are required to have the same
number of elements, as they are flattened and zipped together. In this case,
there would be 4 experiments, each defined by a unique
``(processes_per_node, partition, n_nodes)`` tuple.

.. _ramble-explicit-zips:

^^^^^^^^^^^^^^^^^^^^^^
Explicit Variable Zips
^^^^^^^^^^^^^^^^^^^^^^

A common pattern in python for iterating over multiple lists in lock-step is to
use something called a zip. For more information on how this behaves in
practice, see `Python's zip documentation `_.

Ramble's workspace config contains syntax for defining explicit variable zips.
These zips are named groupings of variables that are related and should be
iterated over together when generating experiments.
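
As a rough illustration of the difference between crossing and zipping list
variables, the experiment counts from the matrix examples above can be
reproduced in plain Python. This is only a sketch of the arithmetic, not
Ramble's implementation: the variable names ``crossed``, ``matrix_a``,
``matrix_b``, and ``zipped`` are hypothetical, and the row-major flattening
order produced by ``itertools.product`` is an assumption.

.. code-block:: python

    from itertools import product

    processes_per_node = ["16", "32"]
    partition = ["part1", "part2"]
    n_nodes = ["1", "2", "3", "4"]

    # One matrix consuming processes_per_node, crossed with the remaining
    # list variable (n_nodes): 2 * 4 combinations.
    crossed = list(product(processes_per_node, n_nodes))

    # Two matrices: each one is flattened, then the flattened matrices are
    # zipped together, so they must contain the same number of elements.
    matrix_a = list(product(processes_per_node, partition))  # 2x2 -> 4 elements
    matrix_b = [(n,) for n in n_nodes]                        # 1x4 -> 4 elements
    if len(matrix_a) != len(matrix_b):
        raise ValueError("matrices must have the same number of elements")
    zipped = [a + b for a, b in zip(matrix_a, matrix_b)]

    print(len(crossed))  # 8
    print(len(zipped))   # 4

Each entry of ``zipped`` is a ``(processes_per_node, partition, n_nodes)``
tuple, matching the description of the two-matrix example.
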
Zips consume list variables and generate a named grouping, which can be
consumed by matrices just as list variables would be.

Below is an example showing how to define explicit zips:

.. code-block:: yaml
    :linenos:

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                processes_per_node: ['16', '32']
                partition: ['part1', 'part2']
                n_nodes: ['1', '2', '3', '4']
              experiments:
                test_exp_{n_nodes}_{processes_per_node}:
                  variables:
                    n_ranks: '1'
                  zips:
                    partition_defs:
                    - partition
                    - processes_per_node
                  matrix:
                  - partition_defs
                  - n_nodes

This would result in eight experiments, crossing the ``n_nodes`` variable with
the zip of ``partition`` and ``processes_per_node``.

.. _ramble-experiment-variants:

^^^^^^^^^^^^^^^
Variant Control
^^^^^^^^^^^^^^^

Within a workspace configuration file, experiments are able to define variants.
Variants are able to manipulate specific aspects of experiments and
applications. More information on these configuration options can be seen in
the :ref:`Variants Configuration Section` documentation.

To begin with, the only variant that can be specified is the
``package_manager``.

The ``package_manager`` variant is used to define which package manager is used
to configure and execute the experiments. To select ``spack`` as the package
manager, the following block can be added to any scope that variables can be
defined in.

.. code-block:: yaml

    variants:
      package_manager: spack

For more information about controlling package managers see the
:ref:`package manager documentation `.

.. _ramble-experiment-exclusion:

^^^^^^^^^^^^^^^^^^^^
Experiment Exclusion
^^^^^^^^^^^^^^^^^^^^

When writing a workspace configuration file, experiments can be explicitly
excluded from the generated set using an ``exclude`` block inside the
experiment definition.
This block contains definitions of ``variables``, ``matrices``, ``zips``, and
optional mathematical ``where`` statements to define which experiments should
be excluded from the generation process.

.. code-block:: yaml
    :linenos:

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                processes_per_node: ['16', '32']
                partition: ['part1', 'part2']
                n_nodes: ['1', '2', '3', '4']
              experiments:
                test_exp_{n_nodes}_{processes_per_node}:
                  variables:
                    n_ranks: '1'
                  zips:
                    partition_defs:
                    - partition
                    - processes_per_node
                  matrices:
                  - - partition_defs
                    - n_nodes
                  exclude:
                    variables:
                      n_nodes: ['2', '3']
                    matrix:
                    - partition_defs
                    - n_nodes

In the example above, of the eight experiments that would be generated from the
experiment definition, four will be excluded. In the defined ``exclude`` block,
experiments with ``n_nodes = 2`` or ``n_nodes = 3`` will be excluded from the
generation process.

This logic can be replicated in a ``where`` statement as well:

.. code-block:: yaml
    :linenos:

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                processes_per_node: ['16', '32']
                partition: ['part1', 'part2']
                n_nodes: ['1', '2', '3', '4']
              experiments:
                test_exp_{n_nodes}_{processes_per_node}:
                  variables:
                    n_ranks: '1'
                  zips:
                    partition_defs:
                    - partition
                    - processes_per_node
                  matrices:
                  - - partition_defs
                    - n_nodes
                  exclude:
                    where:
                    - '{n_nodes} == 2'
                    - '{n_nodes} == 3'

``where`` statements can contain mathematical operations, but must result in a
boolean value. If any of the ``where`` statements evaluate to ``True`` within
an experiment, that experiment will be excluded from generation. To be more
explicit, all ``where`` statements are joined together with ``or`` operators.
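
The "joined with ``or``" rule can be sketched in Python. This is a minimal
illustration under stated assumptions, not Ramble's evaluator: the ``where``
strings are replaced by hypothetical Python callables, and ``is_excluded`` is
a made-up helper name.

.. code-block:: python

    from itertools import product

    # The zip of partition and processes_per_node, crossed with n_nodes.
    partition_defs = list(zip(["part1", "part2"], ["16", "32"]))
    n_nodes = [1, 2, 3, 4]

    experiments = [
        {"partition": part, "processes_per_node": ppn, "n_nodes": nodes}
        for (part, ppn), nodes in product(partition_defs, n_nodes)
    ]

    # Stand-ins for the where statements '{n_nodes} == 2' and '{n_nodes} == 3'.
    where_statements = [
        lambda v: v["n_nodes"] == 2,
        lambda v: v["n_nodes"] == 3,
    ]


    def is_excluded(variables):
        """Exclude an experiment if any where statement is true (logical or)."""
        return any(statement(variables) for statement in where_statements)


    kept = [exp for exp in experiments if not is_excluded(exp)]
    print(len(experiments), len(kept))  # 8 4

Of the eight generated combinations, the four with ``n_nodes`` equal to 2 or 3
are dropped, which matches the counts described for the ``exclude`` examples.
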
Within any single ``where`` statement, operators can be joined together with
``and`` and ``or`` operators as well.

.. _ramble-experiment-repeats:

^^^^^^^^^^^^^^^^^^
Experiment Repeats
^^^^^^^^^^^^^^^^^^

Ramble provides a simple mechanism to repeat the same experiment a specified
number of times, and calculates summary statistics for the set of repeated
experiments. To enable repeats, an ``n_repeats`` block can be added at the
application, workload, or experiment level.

.. code-block:: yaml

    ramble:
      config:
        n_repeats: int
        repeats_success_strict: [True/False]
      applications:
        hostname:
          n_repeats: int
          workloads:
            serial:
              n_repeats: int
              experiments:
                test_experiment:
                  n_repeats: int

More information on setting repeats at the config level can be found in the
:ref:`configuration files` documentation.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Environment Variable Control
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Environment variables can be controlled using an
:ref:`env_var config section`, defined at the appropriate level of the
workspace config.

As a concrete example:

.. code-block:: yaml

    env_vars:
      set:
        SET_VAR: set_val
      append:
      - var-separator: ','
        vars:
          APPEND_VAR: app_val
        paths:
          PATH: app_path
      prepend:
      - paths:
          PATH: prepend_path
      unset:
      - LD_LIBRARY_PATH

This would result in roughly the following bash commands:

.. code-block:: console

    export SET_VAR=set_val
    export APPEND_VAR=$APPEND_VAR,app_val
    export PATH=prepend_path:$PATH:app_path
    unset LD_LIBRARY_PATH

^^^^^^^^^^^^^^^^^^^^^
Templatized Workloads
^^^^^^^^^^^^^^^^^^^^^

As previously shown, variables can be defined using lists or matrices. In
addition to controlling several aspects of experiments, list and matrix
variables can be used to replicate an experiment across workloads.

.. code-block:: yaml

    ramble:
      applications:
        hostname:
          variables:
            application_workloads: ['parallel', 'serial', 'local']
          workloads:
            '{application_workloads}':
              experiments:
                test_exp:
                  variables:
                    n_ranks: '1'

In the above example, we use the ``application_workloads`` variable to define
the names of the workloads we'd like to generate experiments for. Any variable
can be used to define the name of the workloads, except those reserved by
Ramble. These can be seen in the :ref:`ramble-reserved-variables` section.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Cross Experiment Variable References
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Variables can be defined to pull the value of a variable out of a different
experiment. This is particularly useful when an experiment needs the path to
something ramble automatically generates in a different experiment.

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                n_nodes: '1'
              experiments:
                test_exp1:
                  variables:
                    n_ranks: '1'
                    real_value: 'exp1_value'
                test_exp2:
                  variables:
                    n_ranks: '1'
                    test_value: real_value in hostname.serial.test_exp1

In the above example, ``test_value`` extracts the value of ``real_value`` as
defined in the experiment ``hostname.serial.test_exp1``. When evaluated, this
will set ``test_value`` to ``'exp1_value'``.

^^^^^^^^^^^^^^^^^^^^
Experiment Modifiers
^^^^^^^^^^^^^^^^^^^^

In addition to containing application definitions, Ramble also provides
experiment modifiers. Experiment modifiers encapsulate several aspects of a
standard modification to an experiment, such as prepending a binary with a tool
or profiler, and can be applied to experiments to modify their behavior.

Available experiment modifiers can be seen using ``ramble mods list``, and more
information about a particular modifier can be seen with ``ramble mods info ``.

Modifiers can be applied to experiments using the following YAML syntax:

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
      applications:
        gromacs:
          workloads:
            water_bare:
              experiments:
                test_exp1:
                  modifiers:
                  - name: intel-aps
                    mode: mpi
                    on_executable:
                    - '*'
                  variables:
                    n_ranks: '1'

Modifiers can be defined at any level variables can be defined at (and are even
their own config section).

When defining a modifier, the ``name`` attribute is the name of the modifier
that will be applied. The ``mode`` attribute is a modifier specific setting
allowing the user to select the modifier behavior. Modes can be seen by looking
at the modifier information, and represent modes of use for the modifier. Modes
group several general aspects of a modifier into one usage mode, and can allow
a general modifier to present many operational entry points.

The ``on_executable`` attribute is a list of experiment executables that the
modifier should be applied to. These executable names are matched using
python's ``fnmatch.fnmatch`` functionality.

If the ``mode`` attribute is not set, modifiers will attempt to determine their
own mode. This will succeed if the modifier has a single mode of operation. If
there are multiple modes, this will raise an exception.

Every modifier has a ``disabled`` mode that is defined by default. This mode
will never be automatically enabled, but it will allow experiments to turn off
the modifier without having to remove the modifier from the experiment
definitions.

If the ``on_executable`` attribute is not set, it will default to ``'*'`` which
will match all executables. Modifier classes can (and should) be implemented to
only act on the correct executable types (i.e. executables with
``use_mpi=true``).

.. _experiment_tags:

^^^^^^^^^^^^^^^
Experiment Tags
^^^^^^^^^^^^^^^

While applications and workloads can be tagged within an application definition
file (using the ``tags()`` or ``workload()`` directives), workloads and
experiments can also be tagged within a workspace configuration file. This
allows users to define their own tags to communicate what an experiment and
workload might be used for beyond the information captured in the application
definition file.

The below example shows how tags can be defined within a workspace:

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
      applications:
        gromacs:
          workloads:
            water_bare:
              tags:
              - wltag
              experiments:
                test_exp1:
                  tags:
                  - tag1
                  variables:
                    n_ranks: '1'
                test_exp2:
                  tags:
                  - tag2
                  variables:
                    n_ranks: '1'

In the above example, all experiments are tagged with the ``wltag`` tag. Only
the ``test_exp1`` experiment is tagged with the ``tag1`` tag, while the
``test_exp2`` experiment is tagged with the ``tag2`` tag.

These tags are propagated into a workspace's results file, and can be used to
filter pipeline commands, as shown in the
:ref:`filtering experiments documentation `.

.. _workspace_including_external_files:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Including External Configuration Files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ramble workspace configuration files support referring to external
configuration files. This allows a workspace to be composed of external files
and directories.

.. code-block:: yaml

    ramble:
      include:
      - /absolute/path/to/applications.yaml
      - $workspace_root/directory/in/workspace/

Supported path variables include:

* ``$workspace_root`` - Root directory of workspace
* ``$workspace`` - Root directory of workspace
* ``$workspace_configs`` - Configs directory in workspace
* ``$workspace_software`` - Software directory in workspace
* ``$workspace_logs`` - Logs directory in workspace
* ``$workspace_inputs`` - Experiments directory in workspace
* ``$workspace_shared`` - Shared directory in workspace
* ``$workspace_archives`` - Archives directory in workspace
* ``$workspace_deployments`` - Deployments directory in workspace

For more information, see the relevant portion of Spack's documentation on
`including configurations `_.

.. _workspace_internals:

^^^^^^^^^^^^^^^^^^^^^
Controlling Internals
^^^^^^^^^^^^^^^^^^^^^

Within a workspace config, an internals dictionary can be used to control
several internal aspects of the application, workload, and experiment.

This config section is defined in the :ref:`internals config section`.

Below are examples of using this within a workspace config file.

""""""""""""""""""
Custom Executables
""""""""""""""""""

Custom executables can be created within the internals dictionary. Below is an
example showing how to create a ``lscpu`` executable at the application level.

.. code-block:: yaml

    ramble:
      applications:
        hostname:
          internals:
            custom_executables:
              lscpu:
                template:
                - 'lscpu'
                use_mpi: false
                redirect: '{log_file}'
          ...

The above example creates a custom executable named ``lscpu`` that will inject
the command ``lscpu`` into the command for an experiment when it is used. It is
important to note that this only creates the executable, and does not use it.

""""""""""""""""""""""""""""
Controlling Executable Order
""""""""""""""""""""""""""""

The internals dictionary can control the order in which pre-defined
executables (or custom executables) are pieced together to build an
experiment.

.. code-block:: yaml

    ramble:
      applications:
        hostname:
          internals:
            custom_executables:
              lscpu:
                template:
                - 'lscpu'
                use_mpi: false
                redirect: '{log_file}'
            executables:
            - serial
            - builtin::env_vars
            - lscpu

The above example builds off of the custom executable example, and shows how
one can control the order of the executables in the formatted executable
expansions. The default for the hostname application is
``[builtin::env_vars, serial/parallel]`` but this changes the order and injects
``lscpu`` into the expansion.

""""""""""""""""""""""""""
Using Executable Injection
""""""""""""""""""""""""""

Executable order can also be controlled via the ``executable_injection`` block
within the ``internals`` block. Injecting the ``lscpu`` executable at the end
of the list of executables can be performed with the following:

.. code-block:: yaml

    ramble:
      applications:
        hostname:
          internals:
            custom_executables:
              lscpu:
                template:
                - 'lscpu'
                use_mpi: false
                redirect: '{log_file}'
            executable_injection:
            - name: lscpu

This is a generic way to add the ``lscpu`` custom executable to the end of the
list of executables for the experiment. For more information on this see the
:ref:`internals config section` documentation.

"""""""""""""""""""""""""""""""
Overriding Variable Definitions
"""""""""""""""""""""""""""""""

When defining custom executables, sometimes it's useful to be able to override
specific variable definitions for only this executable definition. As an
example, consider running a command to get information from every node in a job
allocation. While the actual experiment might be utilizing many processes on
each compute node, the custom executable only wants to run a single process on
each compute node.

Ramble provides the ability for users to define variables that are scoped to
only the custom executable instead of the entire experiment. Consider the
following example:

.. code-block:: yaml

    ramble:
      applications:
        gromacs:
          internals:
            custom_executables:
              all_hosts:
                template:
                - 'hostname'
                use_mpi: true
                variables:
                  n_ranks: '{n_nodes}'
                  processes_per_node: '1'
                redirect: '{log_file}'

In this example, a custom executable named ``all_hosts`` is defined. Within
this executable, the value of ``n_ranks`` is defined to be the value of
``n_nodes``, and ``processes_per_node`` is defined to be ``1``, causing only
one rank per compute node. This would print the hostname of each node in the
experiment once.

.. _ramble-reserved-variables:

^^^^^^^^^^^^^^^^^^
Reserved Variables
^^^^^^^^^^^^^^^^^^

There are several reserved, auto-generated, and required variables for Ramble
to function properly. This section will describe them.

""""""""""""""""""
Required Variables
""""""""""""""""""

Ramble requires the following variables to be defined:

* ``n_ranks`` - Defines the number of MPI ranks to use. If not explicitly set,
  is defined as: ``{processes_per_node}*{n_nodes}``
* ``n_nodes`` - Defines the number of machines needed for the experiment. If
  not explicitly set, is defined as: ``ceiling({n_ranks}/{processes_per_node})``
* ``processes_per_node`` - Defines how many ranks should be on each node. If
  not explicitly set, is defined as: ``ceiling({n_ranks}/{n_nodes})``
* ``mpi_command`` - Template for generating an MPI command
* ``batch_submit`` - Template for generating a batch system submit command

"""""""""""""""""""
Generated Variables
"""""""""""""""""""

Ramble automatically generates definitions for the following variables:

* ``application_name`` - Set to the name of the application
* ``workload_name`` - Set to the name of the workload within the application
* ``experiment_name`` - Set to the name of the experiment
* ``env_name`` - By default defined as ``{application_name}``. Can be
  overridden to control the software environment to use.
* ``application_run_dir`` - Absolute path to
  ``$workspace_root/experiments/{application_name}``
* ``workload_run_dir`` - Absolute path to
  ``$workspace_root/experiments/{application_name}/{workload_name}``
* ``experiment_run_dir`` - Absolute path to
  ``$workspace_root/experiments/{application_name}/{workload_name}/{experiment_name}``
* ``application_input_dir`` - Absolute path to
  ``$workspace_root/inputs/{application_name}``
* ``workload_input_dir`` - Absolute path to
  ``$workspace_root/inputs/{application_name}/{workload_name}``
* ``experiment_index`` - Index, in set, of experiment. If part of a chain,
  shares a value with its root.
* ``env_path`` - Absolute path to
  ``$workspace_root/software/{package_manager_name}/{env_name}.{workload_name}``.
  If no package manager is used, ``{package_manager_name}`` is replaced with
  ``no-package-manager``.
* ``log_dir`` - Absolute path to ``$workspace_root/logs``
* ``log_file`` - Absolute path to ``{experiment_run_dir}/{experiment_name}.out``
* ```` - Applications that have input files have variables defined that contain
  the absolute path to:
  ``$workspace_root/inputs/{application_name}/{workload_name}/`` where ```` is
  the name as defined in the ``input_file`` directive.
* ```` - Any files with the ``.tpl`` extension in ``$workspace_root/configs``
  have a variable generated that resolves to the absolute path to:
  ``{experiment_run_dir}/`` where ```` is the filename of the template, without
  the extension.

""""""""""""""""""""""""""""""""""""""""""""
Package Manager Specific Generated Variables
""""""""""""""""""""""""""""""""""""""""""""

Ramble also generates or requires the following variables, depending on the
package manager used:

* ``_path`` - Set to the installation location for the package for all packages
  defined in an experiment's environment definition. ```` is the name of the
  package as defined in the ``software:packages`` dictionary.
  When the package manager is ``spack`` this is equivalent to the output of
  ``spack location -i`` for each install spec. Any applications that have
  required packages require path variables to be defined when a package manager
  is not used.

As an example:

.. code-block:: yaml

    ramble:
      variants:
        package_manager: spack
      software:
        packages:
          grm:
            pkg_spec: gromacs@2023.1
        environments:
          grm_env:
            packages:
            - grm

This defines a software environment named ``grm_env``. The default environment
used has the same name as the application the experiment is generated from. In
experiments which use this ``grm_env`` environment, a variable is defined
named: ``gromacs``, as that is the package name defined by the ``pkg_spec``
attribute of the ``grm`` package definition. This variable contains the path to
the installation location for the ``gromacs`` package.

**NOTE**: Package installation location variables are only generated when
actually performing the setup of a workspace. When a ``--dry-run`` is
performed, these paths are not populated.

-------------------
Software Dictionary
-------------------

Within a ramble.yaml file, the ``software:`` dictionary controls the software
stack installation that ramble performs. This configuration section is defined
in the :ref:`Software section` documentation. It contains a packages
dictionary and an environments dictionary.

The ``ramble workspace concretize`` command can help construct a functional
software dictionary based on the experiments listed.

It is important to note that packages and environments that are not used by an
experiment are not installed.

Application definition files can define one or more ``software_spec``
directives, which are packages the application might need to run properly.
Additionally, packages can be marked as required through the
``required_package`` directive.

-------------------------------------------
Controlling MPI Libraries and Batch Systems
-------------------------------------------

Some workspaces might be configured with the goal of exploring the performance
of different MPI libraries (e.g. MPICH vs. Open MPI), or of performing the same
experiment in multiple batch schedulers (e.g. SLURM, PBS Pro, and Flux).

This section will show how to perform these experiments within a workspace
configuration file.

^^^^^^^^^^^^^^^^^^^
MPI Command Control
^^^^^^^^^^^^^^^^^^^

When writing a ramble configuration file to perform the same experiment with
different MPI libraries, the MPI section within the Ramble dictionary is
insufficient for changing the flags used based on the MPI library used.

However, Ramble's variable definitions can be used to control this on a
per-experiment basis.

Below is an example of running a Gromacs experiment in both MPICH and OpenMPI:

.. code-block:: yaml

    ramble:
      variants:
        package_manager: spack
      variables:
        batch_submit: '{execute_experiment}'
        mpi_command:
        - 'mpirun -n {n_ranks} -ppn {processes_per_node} ' # MPICH
        - 'mpirun -n {n_ranks} -nperhost {processes_per_node} ' # OpenMPI
      applications:
        gromacs:
          workloads:
            water_bare:
              experiments:
                '{env_name}':
                  variables:
                    n_ranks: '1'
                    n_nodes: '1'
                    env_name: ['gromacs-mpich', 'gromacs-ompi']
      software:
        packages:
          gcc9:
            pkg_spec: gcc@9.3.0 target=x86_64
          mpich:
            pkg_spec: mpich@4.0.2 target=x86_64
            compiler: gcc9
          ompi:
            pkg_spec: openmpi@4.1.4 target=x86_64
            compiler: gcc9
          gromacs:
            pkg_spec: gromacs@2022.4
            compiler: gcc9
        environments:
          gromacs-{mpi}:
            variables:
              mpi: ['mpich', 'ompi']
            packages:
            - gromacs
            - '{mpi}'

In the above example, you can see how ``env_name`` is used to test both an
OpenMPI and MPICH version of Gromacs. Additionally, the ``mpi_command``
variable is used to define how ``mpirun`` should look for each of the MPI
libraries.

Using the previously described Ramble vector syntax, this configuration file
will generate 2 experiments.
Both ``env_name`` and ``mpi_command`` will be zipped together, giving each
experiment a tuple of: ``(mpi_command, env_name)`` which allows us to pair a
specific MPI command to the corresponding Gromacs spec.

^^^^^^^^^^^^^^^^^^^^
Batch System Control
^^^^^^^^^^^^^^^^^^^^

Similar to the previously described MPI command control, experiments can use
different batch systems by overriding the ``batch_submit`` variable.

Below is an example configuration file showing how the ``batch_submit``
variable can be used to submit the same experiment to multiple batch systems.

.. code-block:: yaml

    ramble:
      variants:
        package_manager: spack
      variables:
        mpi_command: 'mpirun -n {n_ranks} -ppn {processes_per_node}'
        batch_system:
        - slurm
        - pbs
        batch_submit:
        - 'sbatch {execute_slurm}'
        - 'qsub {execute_pbs}'
      applications:
        gromacs:
          workloads:
            water_bare:
              experiments:
                '{batch_system}':
                  variables:
                    n_ranks: '1'
                    n_nodes: '1'
      software:
        packages:
          gcc9:
            pkg_spec: gcc@9.3.0 target=x86_64
          impi2021:
            pkg_spec: intel-oneapi-mpi@2021.11.0 target=x86_64
            compiler: gcc9
          gromacs:
            pkg_spec: gromacs@2022.4
            compiler: gcc9
        environments:
          gromacs:
            packages:
            - impi2021
            - gromacs

The above example overrides the generated ``batch_submit`` variable to change
how different experiments are submitted. In this example, we submit the same
experiment to both SLURM and PBS.

Note that each of the two ``batch_submit`` commands submits a different
template. This means the workspace's configs directory should have two files:
``execute_slurm.tpl`` and ``execute_pbs.tpl``, which will be template
submission scripts for each of the batch systems.
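
The pairing of ``batch_system`` and ``batch_submit`` in the example above can
be sketched in Python. This is an illustration of the zip behavior under
stated assumptions, not Ramble's own code: the ``variables`` dictionary,
``experiments`` list, and ``names`` list are hypothetical, and ``str.format``
stands in for Ramble's expander.

.. code-block:: python

    # List variables from the workspace config; all lists must share a length.
    variables = {
        "batch_system": ["slurm", "pbs"],
        "batch_submit": ["sbatch {execute_slurm}", "qsub {execute_pbs}"],
    }

    lengths = {len(values) for values in variables.values()}
    if len(lengths) != 1:
        raise ValueError("all list variables must be the same length")

    # Zip the lists in lock-step: one dictionary of values per experiment.
    experiments = [
        dict(zip(variables, values)) for values in zip(*variables.values())
    ]

    # Expand the experiment name template '{batch_system}' for each experiment.
    names = ["{batch_system}".format(**exp) for exp in experiments]

    for name, exp in zip(names, experiments):
        print(name, "->", exp["batch_submit"])

Each generated experiment keeps its own submit command, so the ``slurm``
experiment refers to ``{execute_slurm}`` while the ``pbs`` experiment refers to
``{execute_pbs}``.
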

-----------------
Experiment Chains
-----------------

Multiple experiments can be executed within the same context by a process
known as chaining. This allows multiple experiments (potentially from multiple
applications) to be executed in the same context, and is useful for use cases
such as running multiple experiments on the same physical hardware.

There are two important parts to defining an experiment chain. The first is
defining the experiment chain itself, and the second is defining experiments
which are only intended to be used when chained into another experiment, known
as template experiments.

^^^^^^^^^^^^^^^^^^^^^^^^^^
Defining Experiment Chains
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following example shows how to specify a chain of experiments:

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                n_nodes: '1'
              experiments:
                test_exp1:
                  variables:
                    n_ranks: '1'
                test_exp2:
                  variables:
                    n_ranks: '1'
                  chained_experiments:
                  - name: hostname.serial.test_exp1
                    command: '{execute_experiment}'
                    order: 'after_chain'
                    variables:
                      n_ranks: '2'

In the above example, the ``hostname.serial.test_exp2`` experiment defines an
experiment chain. The chain is defined by merging the ``chained_experiments``
dictionaries and inserting itself at the appropriate location.

Experiments can be defined within the ``chained_experiments`` dictionary using
the following format:

.. code-block:: yaml

    chained_experiments: # List of experiments to chain
    - name: Fully qualified experiment namespace
      command: Command that executes the sub experiment
      order: Order to chain this experiment. Defaults to 'after_root'
      variables: Variables dictionary to override the variables from the original experiment

Each chained experiment receives its own unique namespace.
These take the form of:
``<root_experiment_namespace>.chain.<chain_index>.<chained_experiment_namespace>``

In the above example, the chained experiment would have a namespace of:
``hostname.serial.test_exp2.chain.0.hostname.serial.test_exp1``

The ``name`` attribute can use `globbing syntax`_ to chain multiple experiments
at once.

The ``order`` keyword is optional. Valid options include:

* ``before_chain`` Chained experiment is injected at the beginning of the chain
* ``before_root`` Chained experiment is injected right before the root
  experiment in the chain
* ``after_root`` Chained experiment is injected right after the root experiment
  in the chain
* ``after_chain`` Chained experiment is injected at the end of the chain

The ``root`` experiment is defined as the initial experiment that started the
chain. When examining the entire chain, the root experiment is the only one
that does not have ``chain.{idx}`` in its name.

The ``variables`` keyword is optional. It can be used to override the
definition of variables from the chained experiment if needed.

Once the experiments are defined, the final order of the chain can be viewed
using ``ramble workspace info -vvv``.

**NOTE:** When using the ``experiment_index`` variable, all experiments in a
chain share the same value. This ensures the resulting experiment will be
complete when executed.

^^^^^^^^^^^^^^^^^^^^^^^
Suppressing Experiments
^^^^^^^^^^^^^^^^^^^^^^^

The example below shows how to suppress generation of an experiment by marking
it as a template.

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                n_nodes: '1'
              experiments:
                test_exp1:
                  template: true
                  variables:
                    n_ranks: '1'
                test_exp2:
                  variables:
                    n_ranks: '1'
                  chained_experiments:
                  - name: hostname.serial.test_exp1
                    command: '{execute_experiment}'
                    order: 'after_chain'
                    variables:
                      n_ranks: '2'

In the above example, the ``template`` keyword is used to mark
``hostname.serial.test_exp1`` as a template experiment. This prevents it from
being used as a stand-alone experiment, but it will still be generated and used
when it's chained into other experiments.

^^^^^^^^^^^^^^^^^^^^
Variable Inheritance
^^^^^^^^^^^^^^^^^^^^

In some cases, it's useful for an experiment to take values for its variables
from the root of the chain. For example, an allreduce benchmark might need to
run on all of the nodes within a job before the actual experiment begins, while
the number of nodes changes based on the root experiment. In this case, a
workspace might be more simply defined if the root experiment can inject its
own definition for the number of nodes into the chained experiments.

To accomplish this, the ``inherit_variables`` attribute within a chained
experiment definition can be used to define which variables should be
inherited from the root experiment.

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                n_nodes: '1'
              experiments:
                test_exp1:
                  template: true
                  variables:
                    n_nodes: '1'
                test_exp2:
                  variables:
                    n_nodes: '4'
                  chained_experiments:
                  - name: hostname.serial.test_exp1
                    command: '{execute_experiment}'
                    order: 'after_chain'
                    inherit_variables:
                    - n_nodes

In the example above, the ``hostname.serial.test_exp2`` experiment represents
the root of the experiment chain. The ``inherit_variables`` list will cause
this root experiment to inject its own value for ``n_nodes`` into the chained
experiment, overriding the value explicitly defined in the chained experiment's
definition.

^^^^^^^^^^^^^^^^^^^^^^^^^
Defining Chains of Chains
^^^^^^^^^^^^^^^^^^^^^^^^^

Ramble supports the ability to define chains of experiment chains. This allows
an experiment to implicitly include every experiment that is chained into an
experiment it explicitly chains.

Below is an example showing how chains of chains can be defined:

.. code-block:: yaml

    ramble:
      variables:
        mpi_command: 'mpirun -n {n_ranks}'
        batch_submit: '{execute_experiment}'
        processes_per_node: '16'
        n_ranks: '{n_nodes}*{processes_per_node}'
      applications:
        hostname:
          variables:
            n_threads: '1'
          workloads:
            serial:
              variables:
                n_nodes: '1'
              experiments:
                child_level2_experiment:
                  template: true
                  variables:
                    n_ranks: '1'
                child_level1_experiment:
                  template: true
                  variables:
                    n_ranks: '1'
                  chained_experiments:
                  - name: hostname.serial.child_level2_experiment
                    order: 'before_root'
                    command: '{execute_experiment}'
                parent_experiment:
                  variables:
                    n_ranks: '1'
                  chained_experiments:
                  - name: hostname.serial.child_level1_experiment
                    command: '{execute_experiment}'

In the above example, the resulting experiment chain would be:

.. code-block:: yaml

    - hostname.serial.parent_experiment.chain.0.hostname.serial.child_level2_experiment
    - hostname.serial.parent_experiment
    - hostname.serial.parent_experiment.chain.1.hostname.serial.child_level1_experiment
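
As a point of comparison, the same ordering could presumably be written
without nesting, by having ``parent_experiment`` chain both templates directly.
The fragment below is a sketch of the ``experiments`` dictionary only, and it
relies solely on the ``order`` semantics described earlier (``before_root`` and
the default ``after_root``). It is not a statement about how Ramble resolves
nested chains internally, and the chain indices it produces may differ, so the
result is worth checking with ``ramble workspace info -vvv``.

.. code-block:: yaml

    experiments:
      child_level2_experiment:
        template: true
        variables:
          n_ranks: '1'
      child_level1_experiment:
        template: true
        variables:
          n_ranks: '1'
      parent_experiment:
        variables:
          n_ranks: '1'
        chained_experiments:
        # Placed right before the root (parent_experiment)
        - name: hostname.serial.child_level2_experiment
          order: 'before_root'
          command: '{execute_experiment}'
        # No order given, so the default 'after_root' applies
        - name: hostname.serial.child_level1_experiment
          command: '{execute_experiment}'

The nested form is mainly useful when ``child_level1_experiment`` is chained
into several different root experiments, since its own chain then follows it
without being repeated in each root.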