ColonyOS
ColonyOS is a meta operating system that lets users run HPC workloads on Kubernetes and other platforms through a unified interface for job submission and monitoring. It manages resources and jobs using a grid-computing-style abstraction called a "colony".
In this guide, we deploy a Colonies server and an Executor on ICE Connect EKC and run a job.
Install the colonies CLI tool
Download and install the latest `colonies` release from the releases page.
wget https://github.com/colonyos/colonies/releases/download/v1.7.12/colonies_1.7.12_linux_amd64.tar.gz
tar -xvf colonies_1.7.12_linux_amd64.tar.gz
sudo mv colonies /usr/local/bin
You should now be able to run `colonies` from the command line. If not, make sure `/usr/local/bin` is in your `PATH` environment variable.
Deploy the Colonies server
Read the EKC Usage guide to learn how to create a project and namespace.
- In Rancher, open Apps and select colonyos.
- Click Install, then choose an app name and namespace.
- Click Next.
Generate a key pair
Use the `colonies` CLI tool to generate a new key pair:
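colonies key generate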
INFO[0000] Generated new private key
Id=a92bea7c9bccd587dbbd2fff02c1aeed3e37772b9f42b066926d18710f4aecff
PrvKey=5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad
Install the Rancher app
On the Rancher app installation page, choose a hostname and fill in the generated key's `Id` and `PrvKey`, e.g.
| Setting | Value |
|---|---|
| Colonies Server Hostname | johannes-colonyos.icedc.se |
| Colonies Server Id | a92bea7c9bccd587dbbd2fff02c1aeed3e37772b9f42b066926d18710f4aecff |
| Colonies Server PrvKey | 5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad |
| PostgreSQL Password | choose a random password |
You can leave the other settings at their default values.
Click Install to deploy the Colonies server.
Access the Colonies server
Create a local file `colonies.env` with the server `PrvKey` you generated and the server hostname:
export COLONIES_TLS="true"
export COLONIES_SERVER_TLS="true"
export COLONIES_SERVER_HOST="johannes-colonyos.icedc.se"
export COLONIES_SERVER_PORT="443"
export COLONIES_SERVER_PRVKEY="5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad"
export COLONIES_COLONY_NAME=null
export COLONIES_PRVKEY=null
Source the file to set the environment variables in your current shell:
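source colonies.env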
Check that the server is running and reachable:
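colonies cluster info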
╭───────────────────┬─────────────────────────────────────┬─────────┬────────────────┬──────────────┬───────────┬────────╮
│ NAME │ HOST │ APIPORT │ ETCDCLIENTPORT │ ETCDPEERPORT │ RELAYPORT │ LEADER │
├───────────────────┼─────────────────────────────────────┼─────────┼────────────────┼──────────────┼───────────┼────────┤
│ colonies-server-0 │ colonies-server-0.colonies-internal │ 80 │ 2379 │ 2380 │ 2381 │ True │
╰───────────────────┴─────────────────────────────────────┴─────────┴────────────────┴──────────────┴───────────┴────────╯
Add a colony
Generate a new key pair for the colony:
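colonies key generate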
INFO[0000] Generated new private key
Id=759f26fba53a52278b95654ec0a7694d5572921f39b49fb5394587d22b3f6bea
PrvKey=25e99b4e91ecfca9321d0d02b629de60f57db74fbbf63eccafe9e957645d86ca
Add the colony to the server:
colonies colony add \
--name mycolony \
--colonyid 759f26fba53a52278b95654ec0a7694d5572921f39b49fb5394587d22b3f6bea
INFO[0000] Colony added
ColonyID=759f26fba53a52278b95654ec0a7694d5572921f39b49fb5394587d22b3f6bea
ColonyName=mycolony
Add the colony name to `COLONIES_COLONY_NAME` and the private key to `COLONIES_COLONY_PRVKEY` in the environment file:
export COLONIES_TLS="true"
export COLONIES_SERVER_TLS="true"
export COLONIES_SERVER_HOST="johannes-colonyos.icedc.se"
export COLONIES_SERVER_PORT="443"
export COLONIES_SERVER_PRVKEY="5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad"
export COLONIES_COLONY_NAME="mycolony"
export COLONIES_COLONY_PRVKEY="25e99b4e91ecfca9321d0d02b629de60f57db74fbbf63eccafe9e957645d86ca"
export COLONIES_PRVKEY=null
Source the file:
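source colonies.env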
Check that the colony is added:
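colonies colony ls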
╭──────────┬──────────────────────────────────────────────────────────────────╮
│ NAME │ COLONYID │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ mycolony │ 759f26fba53a52278b95654ec0a7694d5572921f39b49fb5394587d22b3f6bea │
╰──────────┴──────────────────────────────────────────────────────────────────╯
Add a user
Generate a new key pair for the user:
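colonies key generate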
INFO[0000] Generated new private key
Id=f97622fc11388135fea596dc395342fc1da649677a0886fd7aa8c3480a42a0ea
PrvKey=362d7637c397105945f278b3856f8312a098da0841d18ca19ce40ee50fc1139c
Add the user to the server:
colonies user add \
--name="johannes" \
--email="johannes.sjolund@ri.se" \
--phone="+46102284984" \
--userid="f97622fc11388135fea596dc395342fc1da649677a0886fd7aa8c3480a42a0ea"
INFO[0000] User added
ColonyName=mycolony
Email=johannes.sjolund@ri.se
Phone=+46102284984
UserId=f97622fc11388135fea596dc395342fc1da649677a0886fd7aa8c3480a42a0ea
Username=johannes
Add the private key to `COLONIES_PRVKEY` in the environment file:
export COLONIES_TLS="true"
export COLONIES_SERVER_TLS="true"
export COLONIES_SERVER_HOST="johannes-colonyos.icedc.se"
export COLONIES_SERVER_PORT="443"
export COLONIES_SERVER_PRVKEY="5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad"
export COLONIES_COLONY_NAME="mycolony"
export COLONIES_COLONY_PRVKEY="25e99b4e91ecfca9321d0d02b629de60f57db74fbbf63eccafe9e957645d86ca"
export COLONIES_PRVKEY="362d7637c397105945f278b3856f8312a098da0841d18ca19ce40ee50fc1139c"
Source the file:
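source colonies.env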
Check that the user is added:
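colonies user ls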
╭──────────┬────────────────────────┬──────────────╮
│ USERNAME │ EMAIL │ PHONE │
├──────────┼────────────────────────┼──────────────┤
│ johannes │ johannes.sjolund@ri.se │ +46102284984 │
╰──────────┴────────────────────────┴──────────────╯
Deploy the Kubernetes executor
- In Rancher, open Apps and select colonyos-executor.
- Click Install, then choose an app name and namespace.
- Click Next.
Install the Rancher app
On the Rancher app installation page, fill in the fields matching the server, colony, and user settings:
| Setting | Value |
|---|---|
| Colonies Server Hostname | johannes-colonyos.icedc.se |
| Colonies Server Port | 443 |
| Colony Name | mycolony |
| Colony PrvKey | 25e99b4e91ecfca9321d0d02b629de60f57db74fbbf63eccafe9e957645d86ca |
| User PrvKey | 362d7637c397105945f278b3856f8312a098da0841d18ca19ce40ee50fc1139c |
If you want to use the S3 storage backend, fill in your S3 bucket credentials as well. Otherwise, leave them at their default values. The same goes for the Executor and Metadata settings.
Click Install to deploy the executor.
Access the executor
The executor is now running in the Kubernetes cluster. You can use the `colonies` CLI tool to check its status:
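colonies executor ls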
╭─────────┬──────────────────┬────────────────────────┬─────────────────────╮
│ NAME │ TYPE │ LOCATION │ LAST HEARD FROM │
├─────────┼──────────────────┼────────────────────────┼─────────────────────┤
│ icekube │ ice-kubeexecutor │ ICE Datacenter, Sweden │ 2024-03-04 11:44:09 │
╰─────────┴──────────────────┴────────────────────────┴─────────────────────╯
Run a GPU job
To run the `nvidia-smi` command in a TensorFlow container with GPU support, create a local file `nvidia-smi.json` with the following content:
{
"conditions": {
"executortype": "ice-kubeexecutor",
"nodes": 1,
"processes-per-node": 1,
"mem": "2000Mi",
"cpu": "500m",
"gpu": {
"name": "nvidia-gtx-2080ti",
"count": 2
},
"walltime": 600
},
"funcname": "execute",
"kwargs": {
"cmd": "nvidia-smi",
"args": [],
"docker-image": "tensorflow/tensorflow:2.14.0rc1-gpu",
"rebuild-image": false
},
"maxexectime": 600,
"maxretries": 3
}
Submit the job to the executor:
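colonies function submit --spec nvidia-smi.json --follow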
INFO[0000] Process submitted ProcessId=0d562ce41f7d9e41926cc01207f7b5052b75e98299ae6be9b962f7a644bf27b5
INFO[0000] Printing logs from process ProcessId=0d562ce41f7d9e41926cc01207f7b5052b75e98299ae6be9b962f7a644bf27b5
Mon Mar 4 11:03:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02 Driver Version: 545.29.02 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti Off | 00000000:1B:00.0 Off | N/A |
| 32% 31C P0 59W / 250W | 0MiB / 11264MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 2080 Ti Off | 00000000:23:00.0 Off | N/A |
| 33% 32C P0 21W / 250W | 0MiB / 11264MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+