Ramble Workspace

In Ramble, a workspace is a self contained directory representing a set of experiments that should be executed. This document describes overall aspects of workspaces, and how to use them.

Creating Workspaces

The ramble workspace create command can be used to create workspaces.

Workspaces are created with a standard structure, and some basic configuration files that the user can modify to control the exact behavior of the experiments within the workspace.

Ramble can create two types of workspaces:

anonymous workspaces
named workspaces

Named Workspace

By default, Ramble creates named workspaces, which are workspaces which Ramble will manage. To create a named workspace, use:

$ ramble workspace create <name_of_workspace>

These workspaces are created by default in $ramble/var/ramble/workspaces, but that location can be changed. For example, the following command will change the default location for creating workspaces to ~/.ramble/workspaces:

$ ramble config add 'config:workspace_dirs:~/.ramble/workspaces'

Anonymous Workspace

Anonymous workspaces are workspaces that Ramble will not manage and will live in a specified directory. To create an anonymous workspace, use:

$ ramble workspace create -d <path_to_workspace>

Workspace Links

In order to save disk space, sometimes it can be useful to share internal workspace directories across workspaces when they reuse aspects of each others workflows. Ramble provides a way to create a new workspace where the inputs and software directories are symbolic links to external directories (whether in a workspace or not), to help minimize duplication of files across workspaces.

To use this, when creating a workspace you can use the --inputs-dir and --software-dir argument to provide paths for the source of these symbolic links.

As an example:

$ ramble workspace create -d foo
$ ramble workspace create -d bar --software-dir foo/software --inputs-dir foo/inputs

In the above example, two workspaces are created (foo and bar). The workspace named bar has symbolic links for its software and inputs directories that link to the same named directories in the foo workspace.

Additionally, these directories do not need to be part of any workspace, and could instead be external directories used to have a common storage location for software environments and input files.

Workspace Structure

Ramble creates workspaces using the following structure by default:

$workspace
| - configs/
|   | - ramble.yaml
|   | - execute_experiment.tpl
|   | - auxiliary_software_files/
| - experiments/
| - inputs/
| - logs/
| - software/

This various parts of this directory structure are defined as:

configs/: Contain configuration for the workspace
configs/auxiliary_software_files: Contain files used by the package managers
experiments/: Contain experiments define by the workspace configuration
inputs: Contain the inputs experiments in this workspace require
logs: Contain some logging output from ramble
software: Contain software environments an application’s package manager creates

In the configs directory, the ramble.yaml file is the primary workspace configuration file. The definition for this file is documented in the workspace config documentation

Workspace Template Files

Every file with the .tpl extension is considered a template file in the workspace. Every one of these are rendered into each experiment (with the extension omitted).

Workflows can be constructed by chaining multiple of these template files together. Ramble will define a variable within each experiment that will be the name of the file (without the extenison) and the value will be the absolute path to the rendered template.

As an example, if the file configs/execute_experiment.tpl exists, each experiment will have a variable execute_experiment who’s value is set to something like: {workspace_root}/experiments/{application_name}/{workload_name}/{experiment_name}/execute_experiment

Ramble supports arbitrary format template files. Variables can be referenced within these files using the standard { and } syntax. Nested variable expansion is possible by using repeated curly braces (i.e. {{foo}} will evaluate {foo}, and if this expands to bar then the result will be the expansion of {bar}).

NOTE: Some file formats require escaping curly braces to ensure their format is correct. This happens frequently with JSON and YAML formatted template files. For more information on escaping expansion characters, see Escaped Variables in the workspace config documentation

Activating a Workspace

Several Ramble commands require an activated workspace to function properly. A workspace can be activated in a few different ways:

$ ramble workspace activate <name_or_path>

will activate a workspace until it is deactivated, while

$ ramble -D <path_to_workspace workspace ...

or

$ ramble -w <workspace_name> workspace ...

will activate a workspace for the specific command.

Printing Workspace Information

In order to see an overview of what experiments a workspace contains, one can use:

$ ramble workspace info

To get basic information, and:

$ ramble workspace info -vvv

To get more detailed information, including which variables are defined and where they come from.

Concretizing a Workspace

The software definitions in a workspace need to be concretized before the workspace can be set up. To have Ramble pull software definitions from the application definition files, one can use:

$ ramble workspace concretize

To remove any unused software definitions from the workspace configuration, as well as unused experiment templates, one can use:

$ ramble workspace concretize --simplify

Note: This command will also remove comments within the edited section of the workspace config file.

Workspace Deployments

A deployment is one mechanism of transferring a configured workspace from one location to another. Ramble provides commands to handle creating (and pushing) a deployment from a local workspace to a remote location, or pulling a deployment from a remote location into a local workspace.

A deployment is a directory that contains the necessary artifacts required to recreate the experiments in the workspace on a separate machine. Deployments copy the workspace configuration file, along with creating an object repository, containing the application, modifier, and any package manager files needed for the experiments (that might not be upstreamed). This section describes the commands that can be used to use deployments.

Preparing a Workspace Deployment

Once a workspace is configured, it can be used to create a deployment. To prepare a deployment, one can use:

$ ramble deployment push

This will populate a directory named deployments, where the default is the name of the workspace.

The name of the created deployment can be controlled using:

$ ramble deployment push -d <deployment_name>

Additionally, Ramble can create a tar of the deployment using:

$ ramble deployment push -t

And upload the deployment to a remote URL using:

$ ramble deployment push -u <remote_url>

The arguments -d and -u can refer to variables defined within any configuration scope that is workspace level or lower (i.e. site, user, etc..). This does not include variables defined within the applications configuration section.

For example:

ramble:
  variables:
    test_name: test
    test_url: gs://test-bucket/test-dir
  ...

When paired with

$ ramble deployment push -d '{test_name}' -u '{test_url}'

Would attempt to create a deployment in gs://test-bucket/test-dir/test.

Pulling a Workspace Deployment

To apply a deployment to an existing workspace, the pull sub-command can be used. For example:

ramble workspace pull -p file://path/to/deployment

Will overwrite the contents of the currently active workspace with the contents from the deployment contained in file://path/to/deployment.

It is important to note that this command is destructive, and there is no way to revert a workspace back to its state prior to the pull action.

Setting up a Workspace

To make Ramble fully configure a workspace, one can use:

$ ramble workspace setup

This can be an expensive process, and Ramble will:

Install software
Download input files
Create all experiment directives and content

To perform a light-weight test version of this, one can use:

$ ramble workspace setup --dry-run

Which will create experiments, but it won’t download anything, or execute any package manager commands.

Phase Selection

Some workflows would benefit from more fine-grained control of the phases that are executed by Ramble. A good example is that sometimes one only wants to run the make_experiments phase of a workspace instead of all of the phases.

The ramble workspace setup command has a --phases argument, which can take phase filters which will be used to down-select the phases which should be executed.

As an example:

$ ramble workspace setup --phases make_experiments

Would execute only the make_experiments phase of all experiments that have this phase.

The --phases argument supports wildcard matching, i.e.:

$ ramble workspace setup --phases *_experiments

Would execute all phases that have then _experiments suffix.

Filtering Experiments

Several of the workspace commands support filtering the experiments they should act on. This can be performed using the --where argument for inclusive filtering, the --exclude-where argument for exclusive filtering, or the --filter-tags argument to filter based on experiment tags.. These arguments take a string representing a logical expression, which can use variables the experiment would define. If the logical expression evaluates to true, the experiment will be included or excluded for action (respectively).

As an example:

$ ramble workspace setup --where '"{n_ranks}" < 500'

Will only setup experiments that have less than 500 ranks, and:

$ ramble workspace setup --exclude-where '"{application_name}" == "hostname"'

Will exclude all experiments from the hostname application.

To filter by tags, see the following example:

$ ramble workspace setup --filter-tags my-tag

Will only setup experiments that have the my-tag on them.

The commands that accept these filters are:

$ ramble workspace analyze
$ ramble workspace archive
$ ramble workspace mirror
$ ramble workspace setup
$ ramble on

NOTE: The exclusive filter takes precedence over the inclusive filter.

Software Environments

When setting up a workspace, Ramble will install software defined by the workspace configuration file. Ramble uses external package mangers to perform the installation and generate software environments for each experiment.

As an example, if the applications and workspace configuration file provide a configuration for Spack, Ramble will generate `Spack environments<https://spack.readthedocs.io/en/latest/environments.html>`_.

By default, Ramble uses the following format for creating a spack environment file:

spack:
  concretizer:
    unify: true
  specs:
  - packages
  - for
  - environment
  include:
  - files
  - from
  - auxiliary_software_files

In addition to generating a spack.yaml file for each software environment, Ramble will expand unique copies of each file contained in the configs/auxiliary_software_files directory into every software environment it generates.

These can be used to modify the behavior of Spack environments generated by Ramble.

Workspace Inventory and Hash

Setting up a workspace will create inventory files that can be used to identify which aspects of experiments or workspaces change between different invocations.

Most of an experiment’s inventory is defined regardless of if --dry-run is used or not. The notable exception to this is the software hashes. The file that is hashed depends on if the underlying software environment is fully defined or not.

As an example, if Spack applications are used, --dry-run only creates (and hashes) spack.yaml files, which are not concrete. When --dry-run is not used, Ramble will cause Spack to generate spack.lock files, which will then be hashed, giving better information about if the file changes or not.

The hash for a workspace is written to $workspace/workspace_hash.sha256, and the inventories are written to $workspace/experiments/<application>/<workload>/<experiment>/ramble_inventory.json and $workspace/ramble_inventory.json.

Below is an example of a workspace inventory:

{
  "experiments": [
    {
      "name": "gromacs.water_bare.test",
      "digest": "3f4a333db9f76a06826e4c3775bb4384af8904f474a74a4b1eb61f4d6d02939c",
      "contents": {
        "attributes": [
          {
            "name": "variables",
            "digest": "0fc2c3b848885404201f5435389e9028460ea68affd6c78149b7a8c7e925d004"
          },
          {
            "name": "modifiers",
            "digest": "4f53cda18c2baa0c0354bb5f9a3ecbe5ed12ab4d8e11ba873c2f11161202b945"
          },
          {
            "name": "chained_experiments",
            "digest": "74234e98afe7498fb5daf1f36ac2d78acc339464f950703b8c019892f982b90b"
          },
          {
            "name": "internals",
            "digest": "44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a"
          },
          {
            "name": "env_vars",
            "digest": "035f0c03572706ee6da6f0f74614717b201aabe0f7671fc094478d1a97e5dcc4"
          },
          {
            "name": "template",
            "digest": "fcbcf165908dd18a9e49f7ff27810176db8e9f63b4352213741664245224f8aa"
          }
        ],
        "inputs": [
          {
            "name": "water_bare",
            "digest": "2fb58b2b856117515c75be9141450cca14642be2a1afe53baae3c85d06935caf"
          }
        ],
        "software": [
          {
            "name": "software/gromacs.water_bare",
            "digest": "12f222f06ca05cb6fca37368452b3adedf316bc224ea447e894c87d672333cca"
          }
        ],
        "templates": [
          {
            "name": "execute_experiment",
            "digest": "ea07af55040670edaf23e2bfd0b537c8ed70280a3616021a5203bdf65e08a4c6"
          }
        ]
      }
    }
  ],
  "versions": [
    {
      "name": "ramble",
      "version": "0.3.0 (9947210de68fb42dfd843ed1ab982aba0145e9d3)",
      "digest": "02f5fbbfe0a9fe38b99186619e7fb1d11e6398c637a24bb972fffa66e82bf3fe"
    },
    {
      "name": "spack",
      "version": "0.20.0.dev0 (3c3a4c75776ece43c95df46908dea026ac2a9276)",
      "digest": "21fb90b4cffd46b2257469da346cdf0bcf7070227290262b000bb6c467acfc44"
    }
  ]
}

As mentioned above, the only part that varies when switching --dry-run on and off are the digest values for each software attribute. The hash of the workspace is the hash of its inventory file. All hashes are sha256.

Executing a Workspace

Once a workspace is set up, the experiments inside it can be executed using:

$ ramble on

Custom Executors

When executing the experiments within a workspace, an executor is used. Executors are arbitrary strings which are expanded for each experiment, and then executed directly.

The default executor is '{batch_submit}' as this is the variable that is used to generate the execution command in the all_experiments script.

Custom executors can be defined using the --executor argument to ramble on as in:

$ ramble on --executor 'echo "{experiment_namespace}"'

This executor will echo each experiment’s fully qualified namespace instead of executing the experiment.

The value of the executor will be expanded for each experiment, and executed independently. Custom executors can be used to have more control over what actions to perform with an experiment.

Analyzing a Workspace

After the experiments inside a workspace are complete, they can be analyzed using:

$ ramble workspace analyze

By default this creates text output describing the figures of merit from the workspace’s experiments. The format can be controlled using:

$ ramble workspace analyze --format text json yaml

With supported formats being text, json, or yaml.

Ramble also include an experimental capability to uplodate figures of merit into a back-end data base. Currently BigQuery is the only supported back-end, however more back-ends can be implemented. To upload data, one can use:

$ ramble workspace analyze --upload

This will automatically read the upload configuration from the upload block of Ramble’s config file.

Archiving a Workspace

A workspace can be archived to either:

Share with other people
Keep for future reproduction

In order to archive a workspace, one can use:

$ ramble workspace archive

An archive can be automatically uploaded to a mirror using:

$ ramble workspace archive -t --upload-url <mirror_url>

When Ramble creates an archive, it will collect the following files:

All files in $workspace/configs
Generated files for each software environment. (i.e. Each spack.yaml for spack environments)

For each experiment, the following are collected:

Every rendered template (created from a $workspace/configs/*.tpl file)
Every file a success criteria or figure of merit would be extract from
Every file that matches an archive_pattern from the application.py