Documentation

Welcome to the developer documentation for SigOpt. If you have a question you can't answer, feel free to contact us! If you're looking for the classic SigOpt documentation, you can find it here. Otherwise, happy optimizing!

Experiment and Optimization Tutorial

We'll walk through an example of instrumenting a model and running a parameter optimization with SigOpt. In this tutorial, you will learn how to:

  • Install the SigOpt Python client
  • Set your SigOpt API token
  • Set the project
  • Instrument your model
  • Configure your Experiment
  • Create your first Experiment and optimize your model metric with SigOpt
  • Visualize your Experiment results

For notebook instructions and tutorials, check out our GitHub notebook tutorials repo, or open the Run notebook tutorial directly in Google Colab.

Step 1 - Install SigOpt Python Client

Install the SigOpt Python package and the libraries required to run the model used for this tutorial. This example has been tested with xgboost 1.4.2 and scikit-learn 0.24.2.

$ pip install sigopt xgboost==1.4.2 scikit-learn==0.24.2

# to confirm that sigopt is installed
$ sigopt --help

Step 2 - Set Your API Token

Once you've installed SigOpt, you need to get your API token in order to use the SigOpt API and later explore your Runs and Experiments in the SigOpt app. To find your API token, go directly to the API Token page.

$ sigopt config
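
If you prefer not to use the interactive prompt (for example, in a script or CI job), the SigOpt client can also read the token from an environment variable. A minimal sketch, assuming YOUR_API_TOKEN stands in for the token copied from the API Token page:

$ export SIGOPT_API_TOKEN=YOUR_API_TOKEN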

Step 3 - Set Project

Runs are created within projects. The project allows you to sort and filter through your Runs and Experiments and view useful charts to gain insights into everything you've tried.

$ export SIGOPT_PROJECT=sigopt_run_xgb_classifier
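
If you're working in a notebook rather than a shell (for example, in Colab), you can set the same environment variables from Python before calling any SigOpt methods. A minimal sketch, assuming the token from Step 2 and the project name above:

import os

# Both values are assumptions: substitute your own token and project name
os.environ["SIGOPT_API_TOKEN"] = "YOUR_API_TOKEN"
os.environ["SIGOPT_PROJECT"] = "sigopt_run_xgb_classifier"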

Step 4 - Instrument Your Model

Use SigOpt methods to log and track key model information.

from xgboost import XGBClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn import datasets
import numpy
import sigopt
import time

DATASET_NAME = "Sklearn Wine"
FEATURE_ENG_PIPELINE_NAME = "Sklearn Standard Scaler"
PREDICTION_TYPE = "Multiclass"
DATASET_SRC = "sklearn.datasets"


def get_data():
  """
    Load sklearn wine dataset, and scale features to be zero mean, unit variance.
    One hot encode labels (3 classes), to be used by sklearn OneVsRestClassifier.
    """
  data = datasets.load_wine()
  X = data["data"]
  y = data["target"]
  scaler = StandardScaler()
  X_scaled = scaler.fit_transform(X)
  enc = OneHotEncoder()
  Y = enc.fit_transform(y[:, numpy.newaxis]).toarray()
  return (X_scaled, Y)


MODEL_NAME = "OneVsRestClassifier(XGBoostClassifier)"


def evaluate_xgboost_model(X, y, number_of_cross_val_folds=5, max_depth=6, learning_rate=0.3, min_split_loss=0):
  t0 = time.time()
  classifier = OneVsRestClassifier(
    XGBClassifier(
      objective="binary:logistic",
      max_depth=max_depth,
      learning_rate=learning_rate,
      min_split_loss=min_split_loss,
      use_label_encoder=False,
      verbosity=0,
    )
  )
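  # cross_val_score uses the estimator's default scorer: mean accuracy for classifiers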
  cv_accuracies = cross_val_score(classifier, X, y, cv=number_of_cross_val_folds)
  tf = time.time()
  training_and_validation_time = tf - t0
  return numpy.mean(cv_accuracies), training_and_validation_time


def run_and_track_in_sigopt():

  (features, labels) = get_data()

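  # Log the dataset, metadata, and model name so they appear on the Run page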
  sigopt.log_dataset(DATASET_NAME)
  sigopt.log_metadata(key="Dataset Source", value=DATASET_SRC)
  sigopt.log_metadata(key="Feature Eng Pipeline Name", value=FEATURE_ENG_PIPELINE_NAME)
  sigopt.log_metadata(
    key="Dataset Rows", value=features.shape[0]
  )  # assumes features is a numpy array with shape (n_rows, n_columns)
  sigopt.log_metadata(key="Dataset Columns", value=features.shape[1])
  sigopt.log_metadata(key="Execution Environment", value="Colab Notebook")
  sigopt.log_model(MODEL_NAME)
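  # setdefaults only supplies fallback values for a standalone Run; under
  # sigopt optimize (Step 6), the Experiment's suggested values take precedence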
  sigopt.params.setdefaults(
    max_depth=numpy.random.randint(low=3, high=15),
    learning_rate=numpy.random.random(size=1)[0],
    min_split_loss=numpy.random.random(size=1)[0] * 10,
  )

  args = dict(
    X=features,
    y=labels,
    max_depth=sigopt.params.max_depth,
    learning_rate=sigopt.params.learning_rate,
    min_split_loss=sigopt.params.min_split_loss,
  )

  mean_accuracy, training_and_validation_time = evaluate_xgboost_model(**args)

  sigopt.log_metric(name="accuracy", value=mean_accuracy)
  sigopt.log_metric(name="training and validation time (s)", value=training_and_validation_time)


run_and_track_in_sigopt()
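
Save the script above as run_and_track_in_sigopt.py; Step 6 launches it through the SigOpt CLI. If you'd like to verify the instrumentation first, you can execute the script as a single tracked Run (assuming the file name above):

$ sigopt run python run_and_track_in_sigopt.py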

Step 5 - Define Your Experiment Configuration

The experiment definition includes the name, parameters, metrics, and other options that you would like to run your Experiment with. Save the definition below as experiment.yml; the command in Step 6 reads the configuration from that file.

The names of the parameters are expected to match the names of the properties/attributes on sigopt.params. Similarly, the metric names should match the names passed to the sigopt.log_metric calls.

name: XGBoost Optimization
metrics:
- name: accuracy
  strategy: optimize
  objective: maximize
parameters:
- name: max_depth
  bounds:
    min: 3
    max: 12
  type: int
- name: learning_rate
  bounds:
    min: 0
    max: 1
  type: double
- name: min_split_loss
  bounds:
    min: 0
    max: 10
  type: double
budget: 20
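
The budget tells SigOpt roughly how many Runs you plan to execute for this Experiment; the optimizer uses it to plan how it explores the parameter space.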

Step 6 - Run the Code

Run your instrumented model code through the SigOpt CLI, pointing it at the experiment configuration:

$ sigopt optimize -e experiment.yml python run_and_track_in_sigopt.py
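
The optimize command executes your script repeatedly, once per set of parameter values suggested by the optimizer, creating a new Run each time until the Experiment budget is reached.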

Once you've run the code above, SigOpt will conveniently output links to the Runs and Experiments pages on our web application.

Step 7 - Visualize Your Experiment Results

Click on an individual Run link to view your completed Run in our web application.

From the Run page, click the Experiment name to open the corresponding Experiment page, or follow one of the Experiment links printed in the program output.

Conclusion

In this tutorial, we covered the recommended way to instrument and optimize your model, and visualize your results with SigOpt. You learned that experiments are collections of runs that search through a defined parameter space to satisfy the experiment search criteria.

Check out our Runs Tutorial for a closer look at a single Run and to see how to track one-off runs without creating an Experiment.