8) Variable Expansion, Indirection, and Software Stack Parameterization
In this tutorial, you will learn how to use variable expansion, indirection, and software stack parameterization when generating experiments. For this tutorial, we will use WRF, a free and open-source application for atmospheric research and operational forecasting applications.
This tutorial builds off of concepts introduced in previous tutorials. Please make sure you review those before starting with this tutorial’s content.
NOTE: In this tutorial, you will encounter expected errors when copying and pasting the commands. This is to help show situations you might run into when trying to use Ramble on your own, and illustrate how you might fix them.
Create a Workspace
To begin with, you need a workspace to configure the experiments. This can be created with the following command:
$ ramble workspace create var_expansion_and_indirection
Activate the Workspace
Several of Ramble’s commands require an activated workspace to function properly. Activate the newly created workspace using the following command: (NOTE: you only need to run this if you do not currently have the workspace active).
$ ramble workspace activate var_expansion_and_indirection
Configure Experiment Definitions
To being with, you need to configure the workspace. The workspace’s root
location can be seen under the Location output of:
$ ramble workspace info
Additionally, the files can be edited directly with:
$ ramble workspace edit
Within the ramble.yaml file, write the following contents, which are the
final configuration from the previous tutorial.
# Copyright 2022-2025 The Ramble Authors
# Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
# https://www.apache.org/licenses/LICENSE-2.0> or the MIT license
# <LICENSE-MIT or https://opensource.org/licenses/MIT>, at your
# option. This file may not be copied, modified, or distributed
# except according to those terms.
ramble:
variants:
package_manager: spack
env_vars:
set:
OMP_NUM_THREADS: '{n_threads}'
variables:
n_ranks: '{processes_per_node}*{n_nodes}'
batch_submit: '{execute_experiment}'
mpi_command: mpirun -n {n_ranks}
platform: [platform1, platform2]
processes_per_node: [16, 18]
zips:
platform_config:
- platform
- processes_per_node
applications:
wrfv4:
workloads:
CONUS_12km:
experiments:
scaling_{n_nodes}_{platform}:
variables:
n_nodes: [1, 2]
matrix:
- platform_config
- n_nodes
software:
packages:
gcc9:
pkg_spec: gcc@9.4.0
intel-mpi:
pkg_spec: intel-oneapi-mpi@2021.11.0
compiler: gcc9
wrfv4:
pkg_spec: wrf@4.2 build_type=dm+sm compile_type=em_real nesting=basic ~chem
~pnetcdf
compiler: gcc9
environments:
wrfv4:
packages:
- intel-mpi
- wrfv4
The above configuration will execute 4 experiments, comprising a basic scaling study on three different sets of nodes across two different platforms.
You will expand this definition to perform the same sweep over multiple MPI implementations. Over the course of this tutorial, you will learn how to use variable expansion and indirection to construct more complex experiments.
Define Additional MPI and Parameterize Software Environments
To begin with, you will parameterize the software stack definitions to generate
experiments using both IntelMPI and OpenMPI. For this section, you can focus on
the software portion of the ramble.yaml configuration file. For more
information on how this section is constructed, see the Software config
section documentation.
To start with, you will create an OpenMPI package definition. This might look like the following:
packages:
openmpi:
pkg_spec: openmpi@3.1.6 +orterunprefix
In the definition of the Intel MPI package above, you’ll see we originally
specified a compiler attribute (with the value of gcc9). This can be
explicitly selected if you like, however when using Spack, Ramble generates
Spack environments with unify: true
(See Spack’s environment documentation
for more details). As a result, OpenMPI should be compiled with the same
compiler used for WRF.
We also need to generate additional software environments, however we will parameterize the generation of these using a new variable definition.
environments:
wrfv4-{mpi_name}:
packages:
- {mpi_name}
- wrfv4
variables:
mpi_name: ['intel-mpi', 'openmpi']
Will create two software environments. One named wrfv4-intel-mpi and
another named wrfv4-openmpi. However, the definition of mpi_name can be
hoisted to the workspace level because we need to include it in the experiment
generation as well. The result might look like the following:
# Copyright 2022-2025 The Ramble Authors
# Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
# https://www.apache.org/licenses/LICENSE-2.0> or the MIT license
# <LICENSE-MIT or https://opensource.org/licenses/MIT>, at your
# option. This file may not be copied, modified, or distributed
# except according to those terms.
ramble:
variants:
package_manager: spack
env_vars:
set:
OMP_NUM_THREADS: '{n_threads}'
variables:
n_ranks: '{processes_per_node}*{n_nodes}'
batch_submit: '{execute_experiment}'
mpi_command: mpirun -n {n_ranks}
platform: [platform1, platform2]
processes_per_node: [16, 18]
mpi_name: [intel-mpi, openmpi]
zips:
platform_config:
- platform
- processes_per_node
applications:
wrfv4:
workloads:
CONUS_12km:
experiments:
scaling_{n_nodes}_{platform}:
variables:
n_nodes: [1, 2]
matrix:
- platform_config
- n_nodes
software:
packages:
gcc9:
pkg_spec: gcc@9.4.0
intel-mpi:
pkg_spec: intel-oneapi-mpi@2021.11.0
compiler: gcc9
openmpi:
pkg_spec: openmpi@3.1.6 +orterunprefix
wrfv4:
pkg_spec: wrf@4.2 build_type=dm+sm compile_type=em_real nesting=basic ~chem
~pnetcdf
compiler: gcc9
environments:
wrfv4-{mpi_name}:
packages:
- '{mpi_name}'
- wrfv4
NOTE The reference to {mpi_name} within the environment package list is
escaped using single quotes. This is to prevent YAML from parsing this as a
dictionary.
At this point, executing:
$ ramble workspace info
Should result in the following error:
==> Error: Experiment wrfv4.CONUS_12km.scaling_1_platform1 is not unique.
As you have implicitly defined 8 experiments (2 from n_nodes, times 2 from
platform_config, times another 2 from mpi_name), but you haven’t
updated the experiment name template. To resolve this, add {mpi_name} into
the experiment name template. Additionally, you may explicitly add mpi_name
into the matrix. The result might look like the following:
# Copyright 2022-2025 The Ramble Authors
# Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
# https://www.apache.org/licenses/LICENSE-2.0> or the MIT license
# <LICENSE-MIT or https://opensource.org/licenses/MIT>, at your
# option. This file may not be copied, modified, or distributed
# except according to those terms.
ramble:
variants:
package_manager: spack
env_vars:
set:
OMP_NUM_THREADS: '{n_threads}'
variables:
n_ranks: '{processes_per_node}*{n_nodes}'
batch_submit: '{execute_experiment}'
mpi_command: mpirun -n {n_ranks}
platform: [platform1, platform2]
processes_per_node: [16, 18]
mpi_name: [intel-mpi, openmpi]
zips:
platform_config:
- platform
- processes_per_node
applications:
wrfv4:
workloads:
CONUS_12km:
experiments:
scaling_{n_nodes}_{platform}_{mpi_name}:
variables:
n_nodes: [1, 2]
matrix:
- platform_config
- n_nodes
- mpi_name
software:
packages:
gcc9:
pkg_spec: gcc@9.4.0
intel-mpi:
pkg_spec: intel-oneapi-mpi@2021.11.0
compiler: gcc9
openmpi:
pkg_spec: openmpi@3.1.6 +orterunprefix
wrfv4:
pkg_spec: wrf@4.2 build_type=dm+sm compile_type=em_real nesting=basic ~chem
~pnetcdf
compiler: gcc9
environments:
wrfv4-{mpi_name}:
packages:
- '{mpi_name}'
- wrfv4
Variable Expansion and Indirection
At this stage, you have defined a workspace that will execute 8 experiments.
It is important to point out that different MPI implementations have different
command line flags for controlling their behavior. The existing mpi_command
should work fine with both Intel MPI, and OpenMPI but to illustrate how
variable expansion and indirection can be used you will now add a flag to
control the number of MPI ranks per compute node.
For Intel MPI this is:
-ppn {processes_per_node}
While in OpenMPI this is:
--map-by ppr:{processes_per_node}:node
One way to define this is to define mpi_command as a list variable, with
the appropriate MPI command line arguments. Then you can define an explicit zip
that combines mpi_command and mpi_name. However, for the purposes of
this tutorial you will instead use variable expansion and indirection to lookup
variable definitions.
In Ramble, every variable can be defines as a combination of other variables. For example:
variables:
processes_per_node: 4
n_nodes: 2
n_ranks: '{processes_per_node}*{n_nodes}'
Would result in n_ranks having a value of 8, as each of the variable
references are expanded and then the math is evaluated.
Additionally, variable references are allowed to be nested to parameterize which variables you want to use. For example:
variables:
openmpi_args: '--np {n_ranks} --map-by ppr:{processes_per_node}:node -x OMP_NUM_THREADS'
intel-mpi_args: '-n {n_ranks} -ppn {processes_per_node}'
mpi_command: 'mpirun {{mpi_name}_args}'
Allows the mpi_command definition to change based on the definition of
mpi_name. This is called variable indirection. If we employ variable
indirection to help parameterize the MPI arguments as shown above, the
resulting configuration might look like the following:
# Copyright 2022-2025 The Ramble Authors
# Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
# https://www.apache.org/licenses/LICENSE-2.0> or the MIT license
# <LICENSE-MIT or https://opensource.org/licenses/MIT>, at your
# option. This file may not be copied, modified, or distributed
# except according to those terms.
ramble:
variants:
package_manager: spack
env_vars:
set:
OMP_NUM_THREADS: '{n_threads}'
variables:
n_ranks: '{processes_per_node}*{n_nodes}'
platform: [platform1, platform2]
processes_per_node: [16, 18]
# Execution Template
batch_submit: '{execute_experiment}'
mpi_command: mpirun {{mpi_name}_args}
# Experiment Expansions
mpi_name: [intel-mpi, openmpi]
intel-mpi_args: -n {n_ranks} -ppn {processes_per_node}
openmpi_args: --np {n_ranks} --map-by ppr:{processes_per_node}:node -x OMP_NUM_THREADS
zips:
platform_config:
- platform
- processes_per_node
applications:
wrfv4:
workloads:
CONUS_12km:
experiments:
scaling_{n_nodes}_{platform}_{mpi_name}:
variables:
n_nodes: [1, 2]
matrix:
- platform_config
- n_nodes
- mpi_name
software:
packages:
gcc9:
pkg_spec: gcc@9.4.0
intel-mpi:
pkg_spec: intel-oneapi-mpi@2021.11.0
compiler: gcc9
openmpi:
pkg_spec: openmpi@3.1.6 +orterunprefix
wrfv4:
pkg_spec: wrf@4.2 build_type=dm+sm compile_type=em_real nesting=basic ~chem
~pnetcdf
compiler: gcc9
environments:
wrfv4-{mpi_name}:
packages:
- '{mpi_name}'
- wrfv4
NOTE The arguments for the various MPI implementations may not run on your system if you require additional arguments. To be able to execute these on your system, make sure you modify these appropriately.
At this point, you have described the 8 experiments you want to run, however they are still not completely defined. Running:
$ ramble workspace setup --dry-run
Should result in the following error:
==> Error: Environment wrfv4 is not defined.
This is because the default software environment every application uses is
named the same as the application (in this case, both would be named
wrfv4). You changed the name of the software environment, but didn’t
connect each experiment to the proper environment.
Controlling Experiment Software Environments
To control the software environment used within an experiment, Ramble allows
you to use the env_name variable definition. Because mpi_name is a list
variable, you might want env_name to be a list that is zipped with
mpi_name to make sure they are iterated over together. However, you may
also utilize variable indirection / expansion to fix this issue. For the
purposes of this tutorial, we will use indirection instead of explicit zips.
The resulting configuration file might look like the following:
# Copyright 2022-2025 The Ramble Authors
# Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
# https://www.apache.org/licenses/LICENSE-2.0> or the MIT license
# <LICENSE-MIT or https://opensource.org/licenses/MIT>, at your
# option. This file may not be copied, modified, or distributed
# except according to those terms.
ramble:
variants:
package_manager: spack
env_vars:
set:
OMP_NUM_THREADS: '{n_threads}'
variables:
n_ranks: '{processes_per_node}*{n_nodes}'
platform: [platform1, platform2]
processes_per_node: [16, 18]
# Execution Template
batch_submit: '{execute_experiment}'
mpi_command: mpirun {{mpi_name}_args}
# Experiment Expansions
mpi_name: [intel-mpi, openmpi]
intel-mpi_args: -n {n_ranks} -ppn {processes_per_node}
openmpi_args: --np {n_ranks} --map-by ppr:{processes_per_node}:node -x OMP_NUM_THREADS
zips:
platform_config:
- platform
- processes_per_node
applications:
wrfv4:
workloads:
CONUS_12km:
experiments:
scaling_{n_nodes}_{platform}_{mpi_name}:
variables:
n_nodes: [1, 2]
env_name: wrfv4-{mpi_name}
matrix:
- platform_config
- n_nodes
- mpi_name
software:
packages:
gcc9:
pkg_spec: gcc@9.4.0
intel-mpi:
pkg_spec: intel-oneapi-mpi@2021.11.0
compiler: gcc9
openmpi:
pkg_spec: openmpi@3.1.6 +orterunprefix
wrfv4:
pkg_spec: wrf@4.2 build_type=dm+sm compile_type=em_real nesting=basic ~chem
~pnetcdf
compiler: gcc9
environments:
wrfv4-{mpi_name}:
packages:
- '{mpi_name}'
- wrfv4
In this case, we defined env_name to be wrfv4-{mpi_name} which matches
the definition of the software environments.
Dry Run Setup
Before executing the experiments, you can perform:
$ ramble workspace setup --dry-run
And examine the contents of the rendered execute_experiment scripts in some
experiment directories. Looking at these, you should see the correct MPI
arguments within the relevant experiments.
Execute Experiments
Now that you have made the appropriate modifications, set up, execute, and analyze the new experiments using:
$ ramble workspace setup
$ ramble on
$ ramble workspace analyze
This creates a results file in the root of the workspace that contains
extracted figures of merit. If the experiments were successful, this file will
show the following results:
Average Timestep Time: Time (in seconds) on average each timestep takes
Cumulative Timestep Time: Time (in seconds) spent executing all timesteps
Minimum Timestep Time: Minimum time (in seconds) spent on any one timestep
Maximum Timestep Time: Maximum time (in seconds) spent on any one timestep
Number of timesteps: Count of total timesteps performed
Avg. Max Ratio Time: Ratio of Average Timestep Time and Maximum Timestep Time
Clean the Workspace
Once you are finished with the tutorial content, make sure you deactivate your workspace:
$ ramble workspace deactivate
Additionally, you can remove the workspace and all of its content with:
$ ramble workspace remove var_expansion_and_indirection