Archived Documentation

You are currently viewing archived SigOpt documentation.

All-Constraint Experiments

Early in the experimentation process, users often want to understand the relationship between parameters and metrics. In particular, users may want to study which parameter regions consistently yield high-performing models. By conducting an experiment in which every metric is a Constraint Metric, SigOpt users can efficiently search for many high-performing models, as defined through constraints on each of the metrics under analysis. All-Constraint experiments prioritize diverse parameter configurations, increasing the chances of finding models that meet business goals.

Diversity Accelerates Model Development

Let us go through an example. Suppose we want to classify chess end-games for White King and Rook against Black King. We use the UCI dataset known as Chess created by Michael Bain and Arthur van Hoff at the Turing Institute, Glasgow, UK. We are interested in performing hyperparameter tuning of XGBoost models. We will use the following parameter space in our experiments:

list_of_parameters = [
  dict(name="num_boost_round", bounds=dict(min=1, max=200), type="int"),
  dict(name="eta", bounds=dict(min=-5, max=0), type="double"),
  dict(name="gamma", bounds=dict(min=0, max=5), type="double"),
  dict(name="max_depth", bounds=dict(min=1, max=32), type="int"),
  dict(name="min_child_weight", bounds=dict(min=1, max=5), type="double"),
]
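Note that eta is searched on a log scale here (bounds -5 to 0), so a suggested value must be exponentiated before it is handed to XGBoost. A sketch of that conversion, with a helper name of our own choosing:

```python
def assignments_to_xgb_params(assignments):
    """Convert a SigOpt suggestion's assignments into XGBoost training arguments.

    eta is searched in log10 space, so 10**eta recovers the learning rate.
    """
    params = dict(
        eta=10 ** assignments["eta"],
        gamma=assignments["gamma"],
        max_depth=assignments["max_depth"],
        min_child_weight=assignments["min_child_weight"],
    )
    # num_boost_round is passed to xgb.train separately, not inside params
    num_boost_round = assignments["num_boost_round"]
    return params, num_boost_round
```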

Defining our metrics

Now, let us say that we want to search for models with a high F1 score and low model complexity. We then define two metrics: the model's f1_score and its average_depth. We are interested in models that achieve an f1_score higher than 0.8 and an average_depth lower than 10. It is also a good idea to store other metrics so we can inspect the models further. For example, we can keep track of each model's inference_time on the test set.

If we are confident that f1_score and average_depth capture everything about our problem, we can run a Multimetric Experiment to search for the Pareto Efficient Frontier points. The minimum-performance thresholds can (optionally) be incorporated as Metric Thresholds.

xgb_multimetric_threshold = [
  {"name": "f1_score", "strategy": "optimize", "objective": "maximize", "threshold": 0.8},
  {"name": "average_depth", "strategy": "optimize", "objective": "minimize", "threshold": 10},
]

Our new All-Constraint experiment looks very similar, but replaces the optimize strategy with the constraint strategy.

xgb_all_constraints = [
  {"name": "f1_score", "strategy": "constraint", "objective": "maximize", "threshold": 0.8},
  {"name": "average_depth", "strategy": "constraint", "objective": "minimize", "threshold": 10},
]

SigOpt allows our users to store additional metrics for consideration during analysis of the experiment. These should be defined at experiment creation as well.

xgb_stored_metrics = [
  {"name": "inference_time", "strategy": "store"},
  {"name": "precision", "strategy": "store"},
  {"name": "recall", "strategy": "store"}
]

Running our experiment

With the above lists of parameters and metrics, we can easily create SigOpt experiments:

active_metrics = xgb_all_constraints  # or xgb_multimetric_threshold, depending on the experiment

experiment_meta = dict(
  name='chess xgboost_experiment',
  parameters=list_of_parameters,
  metrics=active_metrics + xgb_stored_metrics,
  observation_budget=150,
  parallel_bandwidth=1,
)

experiment = conn.experiments().create(**experiment_meta)
print(f"Created experiment: https://app.sigopt.com/experiment/{experiment.id}")
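With the experiment created, tuning proceeds through SigOpt's suggest/observe cycle. A minimal sketch of that loop, assuming an evaluate_model function (ours, not part of SigOpt) that trains and scores one XGBoost configuration:

```python
def run_tuning_loop(conn, experiment, evaluate_model):
    """Request suggestions and report observations until the budget is spent.

    evaluate_model(assignments) is assumed to return a dict of
    metric name -> observed value, e.g. {"f1_score": 0.82, ...}.
    """
    for _ in range(experiment.observation_budget):
        suggestion = conn.experiments(experiment.id).suggestions().create()
        metrics = evaluate_model(suggestion.assignments)
        conn.experiments(experiment.id).observations().create(
            suggestion=suggestion.id,
            values=[dict(name=k, value=v) for k, v in metrics.items()],
        )
```

The same loop works for either metric strategy; only the metrics list passed at creation changes.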

After running these experiments, we observed the results below. In blue, we show the metric values resulting from SigOpt suggestions. In orange, we display the final results for each experiment. For the Multimetric experiment, the best_assignments are the points on the Pareto Efficient Frontier. For an All-Constraint experiment, all points that meet the user's constraints are returned by the best_assignments endpoint. Notice that the Multimetric experiment finds many non-dominated points, while the All-Constraint experiment finds more configurations that satisfy the user's constraints.
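Conceptually, the feasible set returned by an All-Constraint experiment contains every observation that satisfies all of its thresholds. A minimal sketch of that feasibility check, using constraint definitions shaped like xgb_all_constraints above:

```python
def is_feasible(values, constraints):
    """Check whether one observation satisfies every constraint metric.

    values      -- dict of metric name -> observed value
    constraints -- list of dicts like xgb_all_constraints above
    """
    for c in constraints:
        v = values[c["name"]]
        if c["objective"] == "maximize" and v < c["threshold"]:
            return False
        if c["objective"] == "minimize" and v > c["threshold"]:
            return False
    return True
```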

Dealing with unforeseen requirements

An All-Constraint experiment finds more points that satisfy the user's constraints, at the cost of a less well-defined Pareto frontier. Why is this valuable? Suppose that, after this experiment, we talk to other stakeholders of our project; now they explicitly state that low inference time is critical for our application. Instead of rerunning this experiment (which could take a while), we decide to revisit our current results. Below we display the results after filtering the points by inference time (less than 0.1s).
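One way to perform this kind of post-hoc filtering is to re-read the experiment's observations through the API and keep only those whose stored inference_time is under the limit. A sketch, assuming conn and experiment from earlier and that inference_time was stored as shown above:

```python
def observations_under_inference_limit(conn, experiment_id, limit=0.1):
    """Return (metric values, assignments) for observations with inference_time < limit."""
    viable = []
    for obs in conn.experiments(experiment_id).observations().fetch().iterate_pages():
        values = {v.name: v.value for v in obs.values}
        # Observations missing the stored metric are treated as non-viable
        if values.get("inference_time", float("inf")) < limit:
            viable.append((values, dict(obs.assignments)))
    return viable
```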

Since our Multimetric experiment had a limited goal (highest f1_score and lowest average_depth), all models failed to achieve low inference time. All-Constraint experiments recognize that other goals may exist, and they search for a diverse range of outcomes to service future demands. Specifically, note that:

  • The All-Constraint experiment found nine viable models, whereas the Multimetric experiment did not find models with low inference time.
  • None of the points from our earlier Pareto Efficient frontier met this prediction time requirement.

Analyzing parameters

The value of an All-Constraint experiment is most striking when we use our Parallel Coordinate plot. See the comparison to the Multimetric experiment below when we filter the models by inference_time. Notice that only models with low num_boost_round remain active.

There are some useful insights to be gained here about the parameters and the resulting metric values.

  • High num_boost_round yields high F1 score -- this is not surprising, but our Multimetric experiment learns this and then spends its energy exploiting that information to make a better Pareto frontier.
  • In contrast, All-Constraint finds models with low num_boost_round. That is critical for producing models with good performance and faster inference time.
  • For the full range of satisfactory models, eta (the learning rate, searched on a log scale) falls between -1.5 and 0.
  • Most viable models have gamma values less than 3.
  • All-Constraint finds more models with lower max_depth than Multimetric, especially between values 5 and 15.
  • The entire range of min_child_weight values seems to produce acceptable results -- the metrics seem unaffected by this parameter alone. However, for satisfactory models, max_depth and min_child_weight appear to be inversely correlated.

Conceptualizing the Value of an All-Constraint Experiment

What is happening here? Let us consider a conceptual (and oversimplified) graphical representation of this situation. If we think about the base problem (without the 0.1s inference limit), we see the world only as the figure above on the left. A Multimetric Experiment is hyper-focused on maximizing our stated goals, so very little energy is spent on anything that is not mathematically optimal (the left bump). In contrast, an All-Constraint experiment searches for all high-performing configurations, yielding values in both regions of the parameter space.

The graph on the right shows the results when a new requirement, such as inference time (in green), is considered. Both high-value regions are explored, but neither is fully exploited as a standard SigOpt experiment would do. By learning more about the full range of satisfactory parameters, we are more likely to satisfy unforeseen business requirements.

Creating an All-Constraint Experiment

Below we create a new SigOpt All-Constraint experiment using the above XGBoost hyperparameter tuning example. The goal of such an experiment is to explore high-performing regions of the parameter space effectively. Recall that the main distinction is a list of Constraint Metrics with no optimized metrics. The SigOpt engine will automatically focus on diverse parameter configurations rather than on the optimal achievable values for each metric. As discussed earlier, for an exploration strategy that focuses on the Pareto Efficient Frontier of two metrics, we recommend users run a Multimetric Experiment instead.

from sigopt import Connection

conn = Connection(client_token=SIGOPT_API_TOKEN)
experiment = conn.experiments().create(
  name="All-constraint experiment",
  project="sigopt-examples",
  parameters=[
    dict(
      name="num_boost_round",
      bounds=dict(
        min=1,
        max=200,
        ),
      type="int"
      ),
    dict(
      name="eta",
      bounds=dict(
        min=-5,
        max=0,
        ),
      type="double"
      ),
    dict(
      name="gamma",
      bounds=dict(
        min=0,
        max=5,
        ),
      type="double"
      ),      
    ],
  metrics=[
    dict(
      name="f1_score",
      objective="maximize",
      strategy="constraint",      
      threshold=0.8,
      ),
    dict(
      name="average_depth",
      objective="minimize",
      strategy="constraint",      
      threshold=10,
      )
    ],
  observation_budget=65,
  parallel_bandwidth=2,
  )

print("Created experiment: /experiment/" + experiment.id)

To report the metric values, we follow the same convention for reporting Observations with multiple values.

# Report the observed values for a SigOpt Suggestion

observation = conn.experiments(experiment.id).observations().create(
  suggestion=SUGGESTION_ID,
  values=[
    dict(
      name="f1_score",
      value=0.803,
      ),
    dict(
      name="average_depth",
      value=2.78,
      )
    ]
  )

Selecting and Updating the Metric Thresholds

In many applications, it is straightforward to specify minimum performance criteria for each metric. For example, inference time and model size are limited by the production setting's desired response time and memory constraints. For classification, a simple lower bound on accuracy is the fraction of examples in the majority class; for regression, a constant predictor that always reports the mean training value gives the minimum level of performance expected of an intelligent system.

To conduct an effective exploration, we recommend users set conservative threshold values. SigOpt understands that configurations that do not meet the constraints are undesirable; therefore, setting a high threshold at the beginning of your experimentation can prematurely discourage SigOpt from sampling promising regions of the parameter space. As the experiment progresses, the metric thresholds can be updated on the experiment's Properties page or directly through our API. An example of this is given below.

experiment = conn.experiments(experiment.id).update(
  metrics=[
    dict(
      name="f1_score",
      threshold=0.85,
      ),
    dict(
      name="average_depth",
      threshold=8.0,
      )
    ]
  )
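The baseline thresholds suggested above can be computed directly from the training data. A sketch, with helper names of our own:

```python
from statistics import mean

def accuracy_floor(labels):
    """Fraction of the majority class: the accuracy any useful classifier must beat."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / len(labels)

def regression_floor(targets):
    """Mean absolute error of a constant mean predictor: a baseline for regression."""
    m = mean(targets)
    return mean(abs(y - m) for y in targets)
```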

Limitations

  • observation_budget must be set when an All-Constraint experiment is created.
  • The maximum number of constraint metrics is 4.
  • The maximum number of dimensions for All-Constraint is 50.
  • Experiments with Parameter Constraints are not permitted.
  • Experiments with Parameter Conditions are not permitted.
  • Multitask experiments are not permitted.