Run in Parallel
Running an experiment on several machines at once is easy and natural with the SigOpt API.
Before you start running experiments in parallel, make sure you know how to Create an Experiment, and that you feel comfortable with the basics of the Optimization Loop.
Create an Experiment
Create your experiment on a master machine. You only need to perform this step once. Make a note of your experiment's id because you'll need it in the next step.
Be sure to set the parallel_bandwidth field to the number of parallel workers you are using in your experiment. This field ensures SigOpt can take your level of parallelism into account and provide better suggestions.
If you're logged in, you can also track your experiment's progress on your Experiment Dashboard.
Initialize the Workers
Initialize each of your workers with the EXPERIMENT_ID from the experiment that you just created.
All workers, whether individual threads or machines, will receive the same experiment id.
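As a sketch of that hand-off, a hypothetical helper can build the launch arguments for each worker (the function name and the worker_tag field are illustrative, not part of the SigOpt API); note that every entry carries the same experiment id:

```python
def worker_launch_args(experiment_id, api_token, num_workers):
    # Every worker receives the same experiment id; worker_tag
    # is an illustrative per-machine label, not a SigOpt field.
    return [
        {
            "api_token": api_token,
            "experiment_id": experiment_id,
            "worker_tag": "worker-%d" % i,
        }
        for i in range(num_workers)
    ]

args_list = worker_launch_args("12345", "YOUR_API_TOKEN", 3)
```

Each dict in args_list would then be handed to whatever mechanism you use to start a worker thread or machine.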
Run the Optimization Loop in Parallel
Now, start the optimization loop on each worker machine. Workers will individually communicate with SigOpt's API, creating Suggestions, evaluating your metric, and then creating Observations.
Why This Works
A large benefit of SigOpt's parallelization is that each worker communicates individually with the SigOpt API, so you do not need to worry about task management.
SigOpt acts as a distributed scheduler for your Suggestions, ensuring that each worker machine receives the best possible Suggestion at the moment it requests one. SigOpt tracks which Suggestions are currently open, so machines independently creating Suggestions will not receive duplicates.
Using Metadata
Metadata is a set of user-provided key/value pairs that SigOpt stores on your behalf under the metadata field. Metadata on Observations can be inspected using both the API and the web interface, making it ideally suited for tracking information about your distributed system.
As a starting point, we recommend tracking a unique tag for each machine in the metadata of Observations. As your distributed job is running, you can view which machines have most recently reported Observations on the experiment's web dashboard.
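As a sketch, the metadata dict reported with each Observation might be assembled like this (build_observation_metadata and the worker_tag key are illustrative names of our own, not part of the SigOpt API):

```python
import socket

def build_observation_metadata(worker_tag=None):
    # hostname identifies the reporting machine on the dashboard;
    # worker_tag is an optional extra label of your choosing.
    metadata = {"hostname": socket.gethostname()}
    if worker_tag is not None:
        metadata["worker_tag"] = worker_tag
    return metadata

meta = build_observation_metadata(worker_tag="gpu-worker-1")
```

Passing a dict like this as the metadata argument of observations().create() makes each machine's hostname visible on the experiment's web dashboard.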
Show Me the Code
These code snippets provide an example that combines the suggested master/worker division of labor with metadata that tracks which machines have reported Observations.
Master: Create Experiment, Spin up Workers
from sigopt import Connection

def master(api_token, num_workers=1):
    # Create the SigOpt connection
    conn = Connection(client_token=api_token)

    # Create the experiment on master
    experiment = conn.experiments().create(
        name="Classifier Accuracy",
        project="sigopt-examples",
        parameters=[
            {
                'bounds': {
                    'max': 1.0,
                    'min': 0.001
                },
                'name': 'gamma',
                'type': 'double'
            }
        ],
        metrics=[dict(name='Accuracy', objective='maximize')],
        observation_budget=20,
        parallel_bandwidth=num_workers,
    )

    for _ in range(num_workers):
        # Launch a worker and run the run_worker
        # function (below) on the worker machine
        # You implement this function
        spin_up_worker(
            api_token=api_token,
            experiment_id=experiment.id,
        )
Worker: Run Optimization Loop with Metadata
import socket
from sigopt import Connection

# Each worker runs the same optimization loop
# for the experiment created on master
def run_worker(api_token, experiment_id):
    # Create the SigOpt connection
    conn = Connection(client_token=api_token)

    # Keep track of the hostname for logging purposes
    hostname = socket.gethostname()

    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        # Receive a Suggestion
        suggestion = conn.experiments(experiment.id).suggestions().create()

        # Evaluate your metric
        # You implement this function
        value = evaluate_metric(suggestion.assignments)

        # Report an Observation
        # Include the hostname so that you can track
        # progress on the web interface
        conn.experiments(experiment.id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )

        # Update the experiment object
        experiment = conn.experiments(experiment.id).fetch()
Recovering From Machine Failure
Recovering Open Suggestions
In the event that one or more of your machines fail, you may have a Suggestion or two in an open state. You can list open Suggestions and continue to work on them:
suggestions = conn.experiments(experiment_id).suggestions().fetch(state="open")
for suggestion in suggestions.iterate_pages():
    # You implement evaluate_metric
    value = evaluate_metric(suggestion.assignments)
    conn.experiments(experiment_id).observations().create(
        suggestion=suggestion.id,
        value=value,
    )
Or you can simply delete open Suggestions:
conn.experiments(experiment_id).suggestions().delete(state="open")