Archived Documentation

Welcome to the developer documentation for SigOpt. If you have a question you can’t answer, feel free to contact us!
You are currently viewing archived SigOpt documentation. The newest documentation can be found here.

Training Runs Tutorial

We'll walk through an example of instrumenting and executing training run code with SigOpt. In this tutorial, you will learn how to:

  • Install the SigOpt Python client
  • Set your SigOpt API token
  • Set the project
  • Load the extension (Notebook environment only)
  • Instrument your model
  • Create your first run and log your model's metric and parameters to SigOpt
  • View your run in the SigOpt web application

Step 1 - Install SigOpt Python Client

Install the SigOpt Python package and the libraries required to run the model used for this tutorial. This example has been tested with xgboost 1.4.2 and scikit-learn 0.24.2.

!pip install sigopt
!pip install xgboost==1.4.2
!pip install scikit-learn==0.24.2

Step 2 - Set Your API Token

Once you've installed SigOpt, you need to get your API token in order to use the SigOpt API and later explore your runs in the SigOpt app. To find your API token, go directly to the API Token page.

import os
# Replace MY_API_TOKEN with the API token string copied from the API Token page.
os.environ['SIGOPT_API_TOKEN'] = MY_API_TOKEN

Step 3 - Set Project

Training runs are created within projects. A project lets you sort and filter your training runs and view charts that give insight into everything you've tried.

os.environ['SIGOPT_PROJECT'] = "SigOpt_Run_XGB_Classifier"

Step 4 - Load the SigOpt Extension

If you're not in a notebook environment, skip to the next step.

If you're in a notebook environment, load the SigOpt extension to enable magic commands.

import sigopt
%load_ext sigopt

Step 5 - Instrument Your Model

Use SigOpt methods to log and track key model information.

from xgboost import XGBClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn import datasets
import numpy
import sigopt
import time

DATASET_NAME = "Sklearn Wine"
FEATURE_ENG_PIPELINE_NAME = "Sklearn Standard Scaler"
PREDICTION_TYPE = "Multiclass"
DATASET_SRC = "sklearn.datasets"


def get_data():
    """
    Load sklearn wine dataset, and scale features to be zero mean, unit variance.
    One hot encode labels (3 classes), to be used by sklearn OneVsRestClassifier.
    """
    data = datasets.load_wine()
    X = data["data"]
    y = data["target"]
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    enc = OneHotEncoder()
    Y = enc.fit_transform(y[:, numpy.newaxis]).toarray()
    return (X_scaled, Y)


MODEL_NAME = "OneVsRestClassifier(XGBoostClassifier)"


def evaluate_xgboost_model(
    X, y, number_of_cross_val_folds=5, max_depth=6, learning_rate=0.3, min_split_loss=0
):
    t0 = time.time()
    classifier = OneVsRestClassifier(
        XGBClassifier(
            objective="binary:logistic",
            max_depth=max_depth,
            learning_rate=learning_rate,
            min_split_loss=min_split_loss,
            use_label_encoder=False,
            verbosity=0,
        )
    )
    cv_accuracies = cross_val_score(classifier, X, y, cv=number_of_cross_val_folds)
    tf = time.time()
    training_and_validation_time = tf - t0
    return numpy.mean(cv_accuracies), training_and_validation_time


def run_and_track_in_sigopt():

    (features, labels) = get_data()

    sigopt.log_dataset(DATASET_NAME)
    sigopt.log_metadata(key="Dataset Source", value=DATASET_SRC)
    sigopt.log_metadata(
        key="Feature Eng Pipeline Name", value=FEATURE_ENG_PIPELINE_NAME
    )
    sigopt.log_metadata(
        key="Dataset Rows", value=features.shape[0]
    )  # assumes features is a numpy array of shape (rows, columns)
    sigopt.log_metadata(key="Dataset Columns", value=features.shape[1])
    sigopt.log_metadata(key="Execution Environment", value="Colab Notebook")
    sigopt.log_model(MODEL_NAME)

    args = dict(
        X=features,
        y=labels,
        max_depth=sigopt.get_parameter(
            "max_depth", default=numpy.random.randint(low=3, high=15, dtype=int)
        ),
        learning_rate=sigopt.get_parameter(
            "learning_rate", default=numpy.random.random(size=1)[0]
        ),
        min_split_loss=sigopt.get_parameter(
            "min_split_loss", default=numpy.random.random(size=1)[0] * 10
        ),
    )

    mean_accuracy, training_and_validation_time = evaluate_xgboost_model(**args)

    sigopt.log_metric(name="accuracy", value=mean_accuracy)
    sigopt.log_metric(
        name="training and validation time (s)", value=training_and_validation_time
    )
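
The get_data helper one-hot encodes the three wine classes so that OneVsRestClassifier can fit one binary classifier per class. As a quick illustration of what that encoding produces, here is a minimal pure-Python sketch (for explanation only; the tutorial code uses sklearn's OneHotEncoder):

```python
# Minimal sketch of one-hot encoding, mirroring what sklearn's
# OneHotEncoder produces for the three wine classes in get_data().
def one_hot(labels, n_classes=3):
    # Each label becomes a row with a 1.0 in its own class column.
    return [[1.0 if col == label else 0.0 for col in range(n_classes)]
            for label in labels]

print(one_hot([0, 2, 1]))
# → [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Each row of the resulting label matrix is the binary target for one of the three per-class classifiers.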


Step 6 - Run the Code

Run your instrumented model training code. In a notebook, the %%run cell magic executes the cell as a SigOpt run with the given name.

%%run My_First_Run
run_and_track_in_sigopt()

Step 7 - View Your Run

Click on the run link to view your completed run in the SigOpt web application, where the run page shows the logged metrics, parameters, and metadata.

Conclusion

In this tutorial, we've covered the recommended way to instrument your training run with SigOpt. Once your model has been instrumented, it is easy to take advantage of SigOpt's optimization features. Optimization helps you find the model parameters that yield the best metric value (e.g. maximizing accuracy).
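
The run instrumentation above carries over directly to experiments: the same sigopt.get_parameter calls receive suggested values when the code runs inside an experiment. As a rough sketch, an experiment configuration for this model might look like the following (the parameter bounds and budget here are illustrative assumptions, not values prescribed by this tutorial):

```yaml
name: XGBClassifier Optimization
metrics:
  - name: accuracy
    objective: maximize
parameters:
  - name: max_depth
    type: int
    bounds:
      min: 3
      max: 15
  - name: learning_rate
    type: double
    bounds:
      min: 0.01
      max: 1.0
  - name: min_split_loss
    type: double
    bounds:
      min: 0.0
      max: 10.0
budget: 30
```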

Check out the Experiment and Optimization Tutorial to see how to create an experiment.