When initially setting up your environment, it can be helpful to have a few debugging commands handy.
First, check whether your Kubernetes Jobs started correctly:
$ sigopt cluster kubectl get jobs
NAME                           COMPLETIONS   DURATION   AGE
experiment-controller-999999   0/1                      2m22s
If you see Jobs that have 0/1 Completions and no Duration, there's likely an issue with the Job.
You can then get more detailed information about the specific job:
$ sigopt cluster kubectl describe jobs/experiment-controller-999999
The output contains detailed information about the Job and lists any errors at the bottom.
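When many Jobs are listed, it can help to filter for the ones with no completions before describing each one. The following is a minimal sketch, assuming the column layout shown above (name first, completions second); `stalled_jobs` is a hypothetical helper name, not part of the SigOpt CLI.

```shell
# Hypothetical helper: read `kubectl get jobs` style output on stdin and
# print the name of every Job whose COMPLETIONS column starts with "0/".
# NR > 1 skips the header row.
stalled_jobs() {
  awk 'NR > 1 && $2 ~ /^0\// { print $1 }'
}

# Usage (assumes the sigopt CLI is already configured for your cluster):
#   sigopt cluster kubectl get jobs | stalled_jobs
```

Each name it prints can then be passed to the describe command above.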
If the Job has started successfully but you're still not getting results, check the Pods next using a similar process:
$ sigopt cluster kubectl get pods
NAME                                 READY   STATUS             RESTARTS   AGE
experiment-controller-999999-xxxxx   0/1     ImagePullBackOff   0          5m
A Pod Status such as ImagePullBackOff might indicate an issue with credentials not being properly loaded into Kubernetes, or trying to access the wrong registry.
More detailed information about the Pod and any errors can be queried:
$ sigopt cluster kubectl describe pods/experiment-controller-999999-xxxxx
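To find which Pods are worth describing, the Status column can be filtered in the same spirit. This is a rough sketch, assuming the five-column layout shown above; `unhealthy_pods` is a hypothetical helper name, and it treats every status other than Running or Completed as worth a look. That includes Pending, which may just mean the Pod is still starting.

```shell
# Hypothetical helper: read `kubectl get pods` style output on stdin and
# print "<name> <status>" for every Pod whose STATUS column (third field)
# is neither Running nor Completed. NR > 1 skips the header row.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (assumes the sigopt CLI is already configured for your cluster):
#   sigopt cluster kubectl get pods | unhealthy_pods
```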