Documentation

Welcome to the developer documentation for SigOpt. If you have a question you can’t answer, feel free to contact us!

Parallelism

The SigOpt API lets you run a single Experiment on several machines at once. Before you start running Experiments in parallel, make sure you know how to set up Runs and how to create an Experiment.

Create an Experiment

Create your Experiment on a controller machine; you only need to perform this step once. Make a note of the Experiment's ID because you will need it in subsequent steps.

Then set the parallel_bandwidth field to the number of parallel workers you are using in your Experiment. This field lets SigOpt take your level of parallelism into account and provide better Run configurations.

Initialize the Workers

Initialize each of your workers with the EXPERIMENT_ID from the experiment that you just created. All workers, whether individual threads or machines, will receive the same experiment ID.

Run SigOpt Optimization Experiments in Parallel

Now start the optimization loop on each worker machine. Each worker communicates with SigOpt's API on its own, creating Runs and evaluating your metric.

Why This Works

A major benefit of SigOpt's parallelization is that each worker communicates asynchronously with the SigOpt API, so you do not need to manage tasks yourself.

SigOpt acts as a distributed scheduler for your Runs, so each worker machine receives parameter assignments at the moment it asks for a new configuration. SigOpt tracks which Runs are currently active, so machines running jobs independently will not receive duplicates.
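The no-duplicates behavior described above can be illustrated with a toy, in-process model. This is only a sketch under stated assumptions, not SigOpt's implementation: ToyScheduler, worker, and run_workers are hypothetical names, threads stand in for worker machines, and a fixed grid stands in for the configurations SigOpt would suggest.

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyScheduler:
    """Hypothetical stand-in for the service: issues each configuration at most once."""

    def __init__(self, configurations, budget):
        self._pending = iter(configurations)
        self._budget = budget
        self._issued = 0
        self._lock = threading.Lock()

    def next_configuration(self):
        # The lock makes "check the budget, take the next configuration" a single
        # atomic step, so two workers asking at the same time never get the same one.
        with self._lock:
            if self._issued >= self._budget:
                return None
            config = next(self._pending, None)
            if config is None:
                return None
            self._issued += 1
            return config


def worker(scheduler):
    """Ask for configurations until the scheduler has none left to give."""
    completed = []
    while (config := scheduler.next_configuration()) is not None:
        # A real worker would train a model here and report its metric.
        completed.append(config)
    return completed


def run_workers(parallel_bandwidth=2, budget=30):
    """Run `parallel_bandwidth` workers against one shared scheduler."""
    # 16 layer sizes x 2 activations = 32 candidate configurations.
    grid = itertools.product(range(32, 513, 32), ["relu", "tanh"])
    scheduler = ToyScheduler(grid, budget)
    with ThreadPoolExecutor(max_workers=parallel_bandwidth) as pool:
        futures = [pool.submit(worker, scheduler) for _ in range(parallel_bandwidth)]
        return [future.result() for future in futures]
```

Each worker only ever asks "what should I run next?"; it never coordinates with the other workers directly, which is the property that removes the need for separate task management.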

Example Setups

In these examples, each machine must be configured with a SigOpt API token.

Scenario 1:

The user has a code repository, a local computer (e.g., a MacBook), and a group of remote machines with copies of the code repository.

On your local computer, create an experiment.yml file with the following contents:

name: sigopt parallel example
parameters:
 - name: hidden_layer_size
   type: int
   bounds:
     min: 32
     max: 512
 - name: activation_function
   type: categorical
   categorical_values: ['relu', 'tanh']
metrics:
 - name: holdout_accuracy
   strategy: optimize
   objective: maximize
   threshold: 0.1
parallel_bandwidth: 2
budget: 30

Create an Experiment using the CLI command:

$ sigopt create experiment

Connect to each of the remote machines (e.g., via ssh) and start a parallel worker with the CLI command below, replacing 1234 with your Experiment's ID:

$ sigopt start-worker 1234 python run-model.py
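The command above runs run-model.py, which this page does not show. A minimal sketch of what such a script might look like follows. It assumes the script reads the suggested values from sigopt.params and reports the result with sigopt.log_metric; evaluate_model is a hypothetical placeholder that returns a made-up score instead of training anything, and passing the client into main is an illustrative choice so the logic can be exercised without a SigOpt connection.

```python
import random
import sys

try:
    import sigopt  # the SigOpt Python client, assumed to be installed on each worker
except ImportError:
    sigopt = None


def evaluate_model(hidden_layer_size, activation_function):
    """Hypothetical placeholder for training: returns a made-up accuracy in [0, 1)."""
    rng = random.Random(f"{hidden_layer_size}-{activation_function}")
    return rng.random()


def main(client):
    """Read the suggested parameters, evaluate them, and report the metric."""
    accuracy = evaluate_model(
        client.params.hidden_layer_size,
        client.params.activation_function,
    )
    # The metric name must match the one declared in experiment.yml.
    client.log_metric("holdout_accuracy", accuracy)
    return accuracy


if __name__ == "__main__":
    if sigopt is None:
        print("sigopt client not installed; nothing was logged", file=sys.stderr)
    else:
        main(sigopt)
```

The parameter names read from the client are the ones declared in experiment.yml, so the script and the Experiment definition have to be kept in sync.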

Scenario 2:

The user has a code repository, a coordination host (e.g., a local or remote machine), and a group of remote machines with copies of the code repository.

On the coordination host, create an Experiment:

import sigopt

experiment = sigopt.create_experiment(
  name="sigopt parallel example",
  parameters=[
    dict(name="hidden_layer_size", type="int", bounds=dict(min=32, max=512)),
    dict(name="activation_fn", type="categorical", categorical_values=["relu", "tanh"]),
  ],
  metrics=[
    dict(name="holdout_accuracy", strategy="optimize", objective="maximize"),
    dict(name="inference_time", strategy="constraint", objective="minimize", threshold=0.1),
  ],
  parallel_bandwidth=2,  # number of workers that will run at the same time
  budget=30,
)

Start parallel workers:

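# run_command_on_machine is a placeholder for your own remote-execution helper
# (for example, a wrapper around ssh); it is not part of the sigopt package.
for machine_number in range(experiment.parallel_bandwidth):
  run_command_on_machine(
    machine_number,
    f"sigopt start-worker {experiment.id} python run-model.py",
  )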
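The run_command_on_machine helper used in the loop above is left to the reader. One possible way to write it is a thin wrapper around ssh, sketched below. It assumes that passwordless ssh access is already set up, that the sigopt CLI and run-model.py are available in the directory the remote shell starts in, and that the host names in WORKER_HOSTS (which are hypothetical) are replaced with your own machines.

```python
import subprocess

# Hypothetical host names; replace with your own worker machines.
WORKER_HOSTS = ["worker-0.example.com", "worker-1.example.com"]


def build_ssh_command(machine_number, command, hosts=WORKER_HOSTS):
    """Return the argument list that runs `command` on the chosen host over ssh."""
    return ["ssh", hosts[machine_number], command]


def run_command_on_machine(machine_number, command, hosts=WORKER_HOSTS):
    """Start `command` on a remote machine and return without waiting for it."""
    # Popen returns immediately, so the workers started in a loop run side by side.
    return subprocess.Popen(build_ssh_command(machine_number, command, hosts))
```

Keeping the returned Popen objects lets the coordination host call wait() on each one later to find out when every worker has finished.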