Documentation

Welcome to the developer documentation for SigOpt. If you have a question you can’t answer, feel free to contact us!
This feature is currently in alpha. Please contact us if you would like more information.

AWS Cluster Create and Manage

Before you begin, you'll need a Kubernetes cluster. Once it's created, you can reuse that cluster as many times as needed. Furthermore, if you're sharing a cluster that's already been created by another user, you can skip this section and go to Sharing your K8s Cluster.

SigOpt Orchestrate currently only supports launching a Kubernetes (K8s) cluster on AWS. If you have a Kubernetes cluster you'd like to use for your orchestration, please see the Bring your own K8s Cluster section for instructions on how to do so.

AWS Configuration

If your local development environment is not already configured to use AWS, the easiest way to get started is to configure your AWS Access Key and AWS Secret Key via the aws command line interface:

aws configure

See the AWS docs for more about configuring your AWS credentials.
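
As a quick sanity check before creating a cluster, the sketch below looks for credentials in two places the AWS CLI commonly reads: the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and the [default] profile of the ~/.aws/credentials file that aws configure writes. The helper name has_aws_credentials is hypothetical and not part of SigOpt Orchestrate. It only checks that both values are present, not that they are valid, and it ignores other credential sources such as named profiles.

```python
import configparser
import os
from pathlib import Path

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def has_aws_credentials(credentials_path=None, environ=None):
    """Return True if an access key and a secret key are both present.

    Hypothetical helper: checks environment variables first, then the
    [default] profile of the shared credentials file.
    """
    environ = os.environ if environ is None else environ
    if all(environ.get(key.upper()) for key in REQUIRED_KEYS):
        return True

    if credentials_path is None:
        credentials_path = Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(credentials_path)  # a missing file is silently skipped
    if not parser.has_section("default"):
        return False
    return all(parser.get("default", key, fallback="") for key in REQUIRED_KEYS)
```

If the function returns False, running aws configure as described above is one way to set the missing values.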

Enable Full Access

SigOpt Orchestrate requires that AWS accounts creating clusters have access to the following services:

  • AutoScaling full access
  • CloudFormation full access
  • IAM full access
  • EC2 full access
  • ECR full access
  • EKS full access
  • SSM full access

If you are an account admin, you may already have the correct permissions. Otherwise, for your convenience we have created a JSON policy document that you can use to create an IAM Policy for yourself or other users:

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"VisualEditor0",
         "Effect":"Allow",
         "Action":[
            "iam:*",
            "ecr:*",
            "ec2:*",
            "cloudformation:*",
            "autoscaling:*",
            "eks:*",
            "ssm:*"
         ],
         "Resource":"*"
      }
   ]
}
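
If you would rather generate the policy document than copy it, the following is a minimal sketch that builds the same structure from the list of required services. The function name build_full_access_policy is an illustrative assumption, not something SigOpt Orchestrate provides.

```python
import json

# Service prefixes taken from the list of required services above.
REQUIRED_SERVICES = ("iam", "ecr", "ec2", "cloudformation", "autoscaling", "eks", "ssm")


def build_full_access_policy(services=REQUIRED_SERVICES):
    """Build an IAM policy dict that allows every action on each service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [f"{service}:*" for service in services],
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document so it can be redirected to a local JSON file.
    print(json.dumps(build_full_access_policy(), indent=3))
```

The AWS CLI accepts a local policy file through the file:// prefix, so the printed document can be saved and reused if you need to adjust the service list.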

Next, you can use the awscli (installed with SigOpt Orchestrate) to create this policy for your team:

aws iam create-policy \
  --policy-name sigopt-orchestrate-full-access \
  --policy-document https://raw.githubusercontent.com/sigopt/sigopt-examples/master/orchestrate/aws_iam/orchestrate_full_access_policy_document.json

Check out our reference on AWS Permissions to learn more about full access, run only users, and cluster sharing.

SigOpt Orchestrate clusters with GPU machines use an AWS-managed EKS-optimized AMI with GPU support. To use this AMI, AWS requires that you accept an end user license agreement (EULA). You accept it by subscribing to the AMI.

Cluster Configuration File

The cluster configuration file is commonly referred to as cluster.yml, but you can name yours anything you like. SigOpt Orchestrate reads this file when you create a cluster with orchestrate cluster create -f cluster.yml.

The available fields are:

Field | Required? | Description
cpu or gpu | Y | You must provide at least one of cpu or gpu. Define the compute that your cluster will need in terms of instance_type, max_nodes, and min_nodes. It's OK if max_nodes and min_nodes are the same value.
cluster_name | Y | You must provide a name for your cluster. You will share this with anyone else who wants to connect to your cluster.
aws | N | Override environment-provided values for aws_access_key_id or aws_secret_access_key.
kubernetes_version | N | The version of Kubernetes to use for your cluster. Supported versions are 1.16, 1.17, 1.18, and 1.19. Defaults to the latest stable version supported by SigOpt Orchestrate, which is currently 1.18.
provider | N | AWS is currently the only supported provider for creating clusters. You can, however, use a custom provider to connect to your own Kubernetes cluster with orchestrate cluster connect. See the section on Bringing your own K8s Cluster.

Example

The example YAML file below defines a CPU cluster named tiny-cluster with two t2.small AWS instances.

# cluster.yml

# AWS is currently our only supported provider for cluster create
# You can connect to custom clusters via orchestrate cluster connect
provider: aws

# We have provided a name that is short and descriptive
cluster_name: tiny-cluster

# Your cluster config can have CPU nodes, GPU nodes, or both.
# The configuration of your nodes is defined in the sections below.

# (Optional) Define CPU compute here
cpu:
  # AWS instance type
  instance_type: t2.small
  # max_nodes and min_nodes can be the same value
  max_nodes: 2
  min_nodes: 2

# # (Optional) Define GPU compute here
# gpu:
#   # AWS GPU-enabled instance type
#   # This can be any p* instance type
#   instance_type: p2.xlarge
#   max_nodes: 2
#   min_nodes: 2

kubernetes_version: '1.18'
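
To catch mistakes before starting a long cluster creation, the sketch below checks a configuration against the field rules listed in the table. It is an illustration only: validate_cluster_config is a hypothetical helper, the configuration is written as a Python dict rather than parsed from YAML, and the rule that min_nodes must not exceed max_nodes is an assumption rather than documented behavior.

```python
SUPPORTED_KUBERNETES_VERSIONS = {"1.16", "1.17", "1.18", "1.19"}
NODE_FIELDS = ("instance_type", "max_nodes", "min_nodes")


def validate_cluster_config(config):
    """Return a list of problems found in a cluster config dict (empty if none)."""
    problems = []
    if not config.get("cluster_name"):
        problems.append("cluster_name is required")

    node_groups = [name for name in ("cpu", "gpu") if name in config]
    if not node_groups:
        problems.append("at least one of cpu or gpu is required")

    for name in node_groups:
        group = config[name] or {}
        missing = [field for field in NODE_FIELDS if field not in group]
        if missing:
            problems.append(f"{name} is missing: {', '.join(missing)}")
            continue
        # Assumption: min_nodes should not exceed max_nodes.
        if group["min_nodes"] > group["max_nodes"]:
            problems.append(f"{name}.min_nodes is greater than {name}.max_nodes")

    version = config.get("kubernetes_version")
    if version is not None and str(version) not in SUPPORTED_KUBERNETES_VERSIONS:
        problems.append(f"unsupported kubernetes_version: {version}")
    return problems


# The tiny-cluster example from above, expressed as a dict.
tiny_cluster = {
    "provider": "aws",
    "cluster_name": "tiny-cluster",
    "cpu": {"instance_type": "t2.small", "max_nodes": 2, "min_nodes": 2},
    "kubernetes_version": "1.18",
}
print(validate_cluster_config(tiny_cluster))  # prints []
```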

Notes on Choosing Instance Types

It is tempting to choose an instance type that exactly matches the needs of a single training run. Because SigOpt Orchestrate is focused on experimentation, you will likely be running more than one training run at a time, maybe even hundreds at a time. For this reason it is a good idea to make your cluster as efficient as possible.

Each node will reserve some amount of resources for the system and for Kubernetes system pods. Because of this, your runs will not be able to use 100% of the resources on each node. If you choose larger instances for your cluster, then your training runs will be able to use closer to 100% of the resources on each node.

Another reason to choose larger instance types is to support varying workloads. If you switch to a different project, invite another user to share your cluster, or even just change the resources used by each training run, then you will benefit from instance types that have more resources.
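
To make the point about reserved resources concrete, here is a small worked calculation. The numbers are hypothetical: it assumes each node reserves a fixed 0.5 vCPU regardless of size, which is a simplification and not a measured value for any AWS instance type.

```python
def usable_fraction(node_capacity, reserved):
    """Fraction of a node's capacity that is left for training runs."""
    if reserved >= node_capacity:
        raise ValueError("reserved resources must be smaller than node capacity")
    return (node_capacity - reserved) / node_capacity


# Hypothetical numbers: 0.5 vCPU reserved per node, whatever the node size.
for vcpus in (2, 8, 32):
    print(f"{vcpus} vCPUs per node: {usable_fraction(vcpus, 0.5):.1%} usable")
```

Under these assumptions, a 2 vCPU node leaves 75% of its CPU for training runs, while a 32 vCPU node leaves roughly 98%.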

Create Cluster

To create the cluster on AWS, run:

orchestrate cluster create -f cluster.yml

Cluster creation can take between 15 and 30 minutes. If you notice an error, please try re-running the same command. SigOpt Orchestrate will reuse the same EKS cluster, so the second run will be much faster.
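
If you script cluster creation, the re-run advice above can be automated. The sketch below is one possible wrapper, not part of SigOpt Orchestrate: run_with_retries is a hypothetical helper, and it assumes the orchestrate command exits with a non-zero code when creation fails.

```python
import subprocess

# The create command from this section, split into arguments.
CREATE_COMMAND = ["orchestrate", "cluster", "create", "-f", "cluster.yml"]


def run_with_retries(command, attempts=2, runner=subprocess.run):
    """Run command until it exits with code 0, at most `attempts` times.

    Returns the number of attempts used. Raises RuntimeError if every
    attempt fails. Assumes a non-zero exit code signals an error.
    """
    for attempt in range(1, attempts + 1):
        result = runner(command)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with code {result.returncode}")
    raise RuntimeError(f"command failed after {attempts} attempts: {' '.join(command)}")
```

Calling run_with_retries(CREATE_COMMAND) would run the create command and repeat it once on failure. The runner parameter exists only so the retry logic can be exercised without touching AWS.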

Check Cluster Status

Test that your cluster was created correctly:

orchestrate cluster test

SigOpt Orchestrate will respond with:

Successfully connected to kubernetes cluster: tiny-cluster

Destroy your Cluster

Destroying your cluster can take between 15 and 30 minutes. To destroy your cluster, please run:

orchestrate cluster destroy --cluster-name <cluster-name> --provider aws

Share your Kubernetes cluster

You can grant other users permission to run on your Kubernetes cluster by modifying the relevant IAM Role.

SigOpt Orchestrate creates a role for every cluster, named <cluster-name>-k8s-access-role, which we call the cluster access role. SigOpt Orchestrate uses the cluster access role under the hood to access your cluster. To allow a second user to access the cluster, add that user to the cluster access role's trust relationship. See instructions for Modifying a Role on AWS for how to change the trust relationship.

Below is an example trust relationship from a newly created cluster:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789:user/alexandra"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Below is an example trust relationship from a cluster which two people can access:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789:user/alexandra",
          "arn:aws:iam::123456789:user/ben"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
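
The change between the two trust relationships above can be sketched as a small transformation. This is an illustration, not a SigOpt Orchestrate API: add_trusted_user is a hypothetical helper, and it assumes the policy has a single statement whose Principal contains an "AWS" entry, as in the examples. It only edits the document in memory; applying it to the role is still done as described in the AWS instructions.

```python
import copy


def add_trusted_user(trust_policy, user_arn):
    """Return a copy of trust_policy in which user_arn may assume the role."""
    updated = copy.deepcopy(trust_policy)
    # Assumption: one statement with an "AWS" principal, as in the examples.
    principal = updated["Statement"][0]["Principal"]
    existing = principal["AWS"]
    # A single principal is stored as a string; several are stored as a list.
    arns = [existing] if isinstance(existing, str) else list(existing)
    if user_arn not in arns:
        arns.append(user_arn)
    principal["AWS"] = arns
    return updated


original = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789:user/alexandra"},
            "Action": "sts:AssumeRole",
        }
    ],
}
shared = add_trusted_user(original, "arn:aws:iam::123456789:user/ben")
```

Here shared has the same content as the two-user example, and original is left unchanged.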

After the IAM Role has been modified, new users can run:

orchestrate cluster connect --cluster-name <cluster-name> --provider aws

Now, the second user should be able to run commands on the cluster. Try running something simple, such as:

orchestrate test