Archived Documentation

Welcome to the developer documentation for SigOpt. If you have a question you can’t answer, feel free to contact us!
You are currently viewing archived SigOpt documentation. The newest documentation can be found here.

SigOpt and scikit-learn

SigOpt's Python API Client works naturally with any machine learning library in Python, but to make things even easier we offer an additional SigOpt + scikit-learn package that can train and tune a model in just one line of code.

The SigOpt + scikit-learn package supports:

- hyperparameter search over scikit-learn estimators with cross validation (SigOptSearchCV)
- XGBoost's XGBClassifier scikit-learn wrapper
- concurrent model selection with SigOptEnsembleClassifier
- timeouts on individual CV fold evaluations

SigOpt's sklearn package is available via pip, with source code on GitHub:

pip install sigopt_sklearn

Find your SigOpt API token on the API tokens page.
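Both SigOptSearchCV and SigOptEnsembleClassifier receive the token through a client_token argument. One way to keep it out of source control (an assumed convention here, not something the package requires) is to read it from an environment variable:

```python
import os

# Assumed convention: the token lives in an environment variable named
# SIGOPT_API_TOKEN; the fallback string just makes a missing token obvious.
client_token = os.environ.get("SIGOPT_API_TOKEN", "SIGOPT_API_TOKEN_NOT_SET")
```

The resulting string can then be passed as the client_token argument in the examples that follow.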


The simplest use case for SigOpt in conjunction with scikit-learn is optimizing estimator hyperparameters using cross validation. A short example that tunes the parameters of an SVM on a small dataset is provided below:

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here :
client_token = SIGOPT_API_TOKEN
iris = datasets.load_iris()

# define parameter domains
svc_parameters = {'kernel': ['linear', 'rbf'], 'C': [0.5, 100]}

# define sklearn estimator
svr = svm.SVC()

# define SigOptCV search strategy
clf = SigOptSearchCV(svr, svc_parameters, cv=5,
                     client_token=client_token, n_jobs=5, n_iter=20)

# perform CV search for best parameters and fit estimator
# on all data using the best found configuration, iris.target)

# clf.predict() now uses best found estimator
# clf.best_score_ contains CV score for best found estimator
# clf.best_params_ contains best found param configuration
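In the domain dict above, a list of strings enumerates a categorical parameter, while a two-element [low, high] list bounds a numeric one. As a rough sketch of how such a domain can be sampled (plain random search for illustration only; SigOptSearchCV itself uses SigOpt's Bayesian optimization, and sample_configuration is a hypothetical helper, not part of the package):

```python
import random

# Same domain format as the SVM example above.
svc_parameters = {'kernel': ['linear', 'rbf'], 'C': [0.5, 100]}

def sample_configuration(domains, rng=random):
    # Categorical parameters list their values; numeric ones give [low, high].
    config = {}
    for name, domain in domains.items():
        if all(isinstance(v, str) for v in domain):
            config[name] = rng.choice(domain)
        else:
            low, high = domain
            config[name] = rng.uniform(low, high)
    return config

config = sample_configuration(svc_parameters)
```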


SigOptSearchCV also works with XGBoost's XGBClassifier wrapper. A hyperparameter search over XGBClassifier models can be done using the same interface:

import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here :
client_token = SIGOPT_API_TOKEN
iris = datasets.load_iris()

xgb_params = {
    'learning_rate': [0.01, 0.5],
    'n_estimators': [10, 50],
    'max_depth': [3, 10],
    'min_child_weight': [6, 12],
    'gamma': [0, 0.5],
    'subsample': [0.6, 1.0],
    'colsample_bytree': [0.6, 1.0]
}

xgbc = XGBClassifier()

clf = SigOptSearchCV(xgbc, xgb_params, cv=5,
                     client_token=client_token, n_jobs=5, n_iter=70, verbose=1), iris.target)


SigOptEnsembleClassifier

The SigOptEnsembleClassifier class concurrently trains and tunes several scikit-learn classification models to facilitate model selection when investigating new datasets. In the following tutorial video, SigOpt Research Engineer Ian Dewancker walks through using the ensemble classifier on an example activity recognition dataset using Amazon Web Services (AWS):

To run the SigOpt ensemble classifier on your own, first download the Human Activity Recognition Using Smartphones dataset from the UCI Machine Learning Repository, then unpack it on the command line:

# Human Activity Recognition Using Smartphone
unzip UCI\ HAR\ Dataset.zip
cd UCI\ HAR\ Dataset

Next, run the following code in Python:

import numpy
import pandas as pd
from sigopt_sklearn.ensemble import SigOptEnsembleClassifier

def load_datafile(filename):
  X = []
  with open(filename, "r") as f:
    for l in f:
      X.append(numpy.array([float(v) for v in l.split()]))
  X = numpy.vstack(X)
  return X
X_train = load_datafile("train/X_train.txt")
y_train = load_datafile("train/y_train.txt").ravel()
X_test = load_datafile("test/X_test.txt")
y_test = load_datafile("test/y_test.txt").ravel()

# fit and tune several classification models concurrently
# find your SigOpt client token here :
sigopt_clf = SigOptEnsembleClassifier()
sigopt_clf.parallel_fit(X_train, y_train, est_timeout=(40 * 60),
                        client_token=client_token)

# compare model performance on hold out set
ensemble_train_scores = [
  est.score(X_train, y_train)
  for est
  in sigopt_clf.estimator_ensemble
]
ensemble_test_scores = [
  est.score(X_test, y_test)
  for est
  in sigopt_clf.estimator_ensemble
]
data = sorted(
    zip(
        [est.__class__.__name__ for est in sigopt_clf.estimator_ensemble],
        ensemble_train_scores,
        ensemble_test_scores,
    ),
  key=lambda x: (x[2], x[1])
)
pd.DataFrame(data, columns=['Classifier ALGO.', 'Train ACC.', 'Test ACC.'])
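The table-building step pairs each classifier name with its train and test accuracies and sorts by hold-out performance, so the best model on the test set ends up in the last row. A toy illustration of that pattern with invented names and scores:

```python
# Invented scores, in the same (name, train accuracy, test accuracy) shape
# the ensemble code produces.
names = ['SVC', 'RandomForestClassifier', 'KNeighborsClassifier']
train_scores = [0.99, 0.97, 0.95]
test_scores = [0.91, 0.94, 0.90]

# Sort by test accuracy first, then train accuracy, as in the listing above.
data = sorted(zip(names, train_scores, test_scores),
              key=lambda x: (x[2], x[1]))

best_name = data[-1][0]  # best hold-out performer
```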

CV Fold Timeouts

SigOptSearchCV performs evaluations on CV folds in parallel using joblib. Timeouts are now supported in the master branch of joblib, and SigOpt can use this timeout information to learn to avoid hyperparameter configurations that are too slow to evaluate.
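The effect is that each candidate evaluation runs under a wall-clock budget and is abandoned if it exceeds it. A minimal sketch of that idea using the standard library's concurrent.futures instead of joblib (this does not reproduce joblib's actual timeout API):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def evaluate_fold(delay):
    # Stand-in for one cross-validation fold evaluation.
    time.sleep(delay)
    return 0.9  # pretend CV score

with ThreadPoolExecutor(max_workers=2) as pool:
    fast = pool.submit(evaluate_fold, 0.01)
    slow = pool.submit(evaluate_fold, 0.5)
    score = fast.result(timeout=1.0)  # finishes within its budget
    try:
        slow.result(timeout=0.05)     # exceeds its budget
        timed_out = False
    except TimeoutError:
        timed_out = True              # this configuration would be penalized
```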

You'll need to install joblib from source for this example to work:

pip uninstall joblib
git clone
cd joblib; python install

Next, run the following code in Python:

from sklearn import svm, datasets
from sigopt_sklearn.search import SigOptSearchCV

# find your SigOpt client token here :
client_token = SIGOPT_API_TOKEN
dataset = datasets.fetch_20newsgroups_vectorized()
X =
y =

# define parameter domains
svc_parameters = {'kernel': ['linear', 'rbf'], 'C': [0.5, 100],
                  'max_iter': [10, 200], 'tol': [1e-6, 1e-2]}
svr = svm.SVC()

# SVM fitting can be quite slow, so we set timeout = 180 seconds
# for each fit.  SigOpt will then avoid configurations that are too slow
clf = SigOptSearchCV(svr, svc_parameters, cv=5, timeout=180,
                     client_token=client_token, n_jobs=5, n_iter=40), y)