ColonyOS

ColonyOS is a meta operating system that lets users run HPC workloads on Kubernetes and other platforms through a unified interface for job submission and monitoring. It organizes resources and jobs using a grid-computing model called a "colony".

In this guide, we deploy a Colonies server and an Executor on ICE Connect EKC and run a job.

Official Helm Charts

Install the colonies CLI tool

Download and install colonies from the releases page (v1.7.12 at the time of writing):

wget https://github.com/colonyos/colonies/releases/download/v1.7.12/colonies_1.7.12_linux_amd64.tar.gz
tar -xvf colonies_1.7.12_linux_amd64.tar.gz
sudo mv colonies /usr/local/bin

You should now be able to run colonies from the command line. If not, make sure /usr/local/bin is in your PATH environment variable.
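The PATH lookup the shell performs can be sketched in Python as a quick sanity check (the sample value below is illustrative; read os.environ["PATH"] to inspect your actual environment):

```python
# Emulate the shell's PATH lookup: PATH is a colon-separated list of
# directories searched in order for an executable named "colonies".
# sample_path is a stand-in; use os.environ["PATH"] for the real check.
sample_path = "/usr/bin:/usr/local/bin:/bin"
dirs = sample_path.split(":")
print("/usr/local/bin" in dirs)  # → True
```

If the directory is missing, append `export PATH="$PATH:/usr/local/bin"` to your shell profile.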

Deploy the Colonies server

Read the EKC Usage guide to learn how to create a project and namespace.

  1. In Rancher, open Apps and select colonyos.
  2. Click Install, then choose an app name and namespace.
  3. Click Next.

Generate a key pair

Use the colonies CLI tool to generate a new key pair:

colonies key generate
INFO[0000] Generated new private key
Id=a92bea7c9bccd587dbbd2fff02c1aeed3e37772b9f42b066926d18710f4aecff
PrvKey=5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad

Install the Rancher app

On the Rancher app installation page, choose a hostname and fill in the generated key's Id and PrvKey, for example:

Setting                    Value
Colonies Server hostname   johannes-colonyos.icedc.se
Colonies Server Id         a92bea7c9bccd587dbbd2fff02c1aeed3e37772b9f42b066926d18710f4aecff
Colonies Server PrvKey     5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad
PostgreSQL Password        (choose a random password)

You can leave the other settings at their default values.

Click Install to deploy the Colonies server.

Access the Colonies server

Create a local file colonies.env containing the server PrvKey you generated and the server hostname:

colonies.env
export COLONIES_TLS="true"
export COLONIES_SERVER_TLS="true"
export COLONIES_SERVER_HOST="johannes-colonyos.icedc.se"
export COLONIES_SERVER_PORT="443"
export COLONIES_SERVER_PRVKEY="5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad"
export COLONIES_COLONY_NAME=null
export COLONIES_PRVKEY=null

Source the file to set the environment variables in your current shell:

source colonies.env
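The environment file is just shell-style export lines, with null marking values not set yet. If you later script against the server instead of using the CLI, the same file can be parsed directly; here is a minimal illustrative sketch (the helper load_colonies_env is ours, not part of ColonyOS):

```python
# Load a colonies.env-style file into a dict. Lines look like:
#   export KEY="value"   or   export KEY=null   (meaning "not set yet")
def load_colonies_env(text):
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue
        key, _, value = line[len("export "):].partition("=")
        value = value.strip().strip('"')
        settings[key] = None if value == "null" else value
    return settings

example = '''
export COLONIES_SERVER_HOST="johannes-colonyos.icedc.se"
export COLONIES_SERVER_PORT="443"
export COLONIES_COLONY_NAME=null
'''
print(load_colonies_env(example))
```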

Check that the server is running and reachable:

colonies cluster info
╭───────────────────┬─────────────────────────────────────┬─────────┬────────────────┬──────────────┬───────────┬────────╮
│ NAME              │ HOST                                │ APIPORT │ ETCDCLIENTPORT │ ETCDPEERPORT │ RELAYPORT │ LEADER │
├───────────────────┼─────────────────────────────────────┼─────────┼────────────────┼──────────────┼───────────┼────────┤
│ colonies-server-0 │ colonies-server-0.colonies-internal │ 80      │ 2379           │ 2380         │ 2381      │ True   │
╰───────────────────┴─────────────────────────────────────┴─────────┴────────────────┴──────────────┴───────────┴────────╯

Add a colony

Generate a new key pair for the colony:

colonies key generate
INFO[0000] Generated new private key
Id=759f26fba53a52278b95654ec0a7694d5572921f39b49fb5394587d22b3f6bea
PrvKey=25e99b4e91ecfca9321d0d02b629de60f57db74fbbf63eccafe9e957645d86ca

Add the colony to the server:

colonies colony add \
--name mycolony \
--colonyid 759f26fba53a52278b95654ec0a7694d5572921f39b49fb5394587d22b3f6bea
INFO[0000] Colony added
ColonyID=759f26fba53a52278b95654ec0a7694d5572921f39b49fb5394587d22b3f6bea
ColonyName=mycolony

Set COLONIES_COLONY_NAME to the colony name and add the colony private key as COLONIES_COLONY_PRVKEY in the environment file:

colonies.env
export COLONIES_TLS="true"
export COLONIES_SERVER_TLS="true"
export COLONIES_SERVER_HOST="johannes-colonyos.icedc.se"
export COLONIES_SERVER_PORT="443"
export COLONIES_SERVER_PRVKEY="5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad"
export COLONIES_COLONY_NAME="mycolony"
export COLONIES_COLONY_PRVKEY="25e99b4e91ecfca9321d0d02b629de60f57db74fbbf63eccafe9e957645d86ca"
export COLONIES_PRVKEY=null

Source the file:

source colonies.env

Check that the colony is added:

colonies colony ls
╭──────────┬──────────────────────────────────────────────────────────────────╮
│ NAME     │ COLONYID                                                         │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ mycolony │ 759f26fba53a52278b95654ec0a7694d5572921f39b49fb5394587d22b3f6bea │
╰──────────┴──────────────────────────────────────────────────────────────────╯

Add a user

Generate a new key pair for the user:

colonies key generate
INFO[0000] Generated new private key
Id=f97622fc11388135fea596dc395342fc1da649677a0886fd7aa8c3480a42a0ea
PrvKey=362d7637c397105945f278b3856f8312a098da0841d18ca19ce40ee50fc1139c

Add the user to the server:

colonies user add \
--name="johannes" \
--email="johannes.sjolund@ri.se" \
--phone="+46102284984" \
--userid="f97622fc11388135fea596dc395342fc1da649677a0886fd7aa8c3480a42a0ea"
INFO[0000] User added
ColonyName=mycolony
Email=johannes.sjolund@ri.se
Phone=+46102284984
UserId=f97622fc11388135fea596dc395342fc1da649677a0886fd7aa8c3480a42a0ea
Username=johannes

Add the user's private key as COLONIES_PRVKEY in the environment file:

colonies.env
export COLONIES_TLS="true"
export COLONIES_SERVER_TLS="true"
export COLONIES_SERVER_HOST="johannes-colonyos.icedc.se"
export COLONIES_SERVER_PORT="443"
export COLONIES_SERVER_PRVKEY="5437af8dec12c6654a3e08425bc0ef9c8c4063a846c73f96bb7675a7f4ddd6ad"
export COLONIES_COLONY_NAME="mycolony"
export COLONIES_COLONY_PRVKEY="25e99b4e91ecfca9321d0d02b629de60f57db74fbbf63eccafe9e957645d86ca"
export COLONIES_PRVKEY="362d7637c397105945f278b3856f8312a098da0841d18ca19ce40ee50fc1139c"

Source the file:

source colonies.env

Check that the user is added:

colonies user ls
╭──────────┬────────────────────────┬──────────────╮
│ USERNAME │ EMAIL                  │ PHONE        │
├──────────┼────────────────────────┼──────────────┤
│ johannes │ johannes.sjolund@ri.se │ +46102284984 │
╰──────────┴────────────────────────┴──────────────╯

Deploy the Kubernetes executor

  1. In Rancher, open Apps and select colonyos-executor.
  2. Click Install, then choose an app name and namespace.
  3. Click Next.

Install the Rancher app

On the Rancher app installation page, fill in the fields matching the server, colony, and user settings:

Setting                    Value
Colonies Server Hostname   johannes-colonyos.icedc.se
Colonies Server Port       443
Colony Name                mycolony
Colony PrvKey              25e99b4e91ecfca9321d0d02b629de60f57db74fbbf63eccafe9e957645d86ca
User PrvKey                362d7637c397105945f278b3856f8312a098da0841d18ca19ce40ee50fc1139c

If you want to use the S3 storage backend, also fill in your S3 bucket credentials; otherwise leave them at their default values, as with the Executor and Metadata settings.

Click Install to deploy the executor.

Access the executor

The executor is now running in the Kubernetes cluster. You can use the colonies CLI tool to check its status:

colonies executor ls
╭─────────┬──────────────────┬────────────────────────┬─────────────────────╮
│ NAME    │ TYPE             │ LOCATION               │ LAST HEARD FROM     │
├─────────┼──────────────────┼────────────────────────┼─────────────────────┤
│ icekube │ ice-kubeexecutor │ ICE Datacenter, Sweden │ 2024-03-04 11:44:09 │
╰─────────┴──────────────────┴────────────────────────┴─────────────────────╯

Run a GPU job

To run the nvidia-smi command in a TensorFlow container with GPU support, create a local file nvidia-smi.json with the following content:

nvidia-smi.json
{
    "conditions": {
        "executortype": "ice-kubeexecutor",
        "nodes": 1,
        "processes-per-node": 1,
        "mem": "2000Mi",
        "cpu": "500m",
        "gpu": {
            "name": "nvidia-gtx-2080ti",
            "count": 2
        },
        "walltime": 600
    },
    "funcname": "execute",
    "kwargs": {
        "cmd": "nvidia-smi",
        "args": [],
        "docker-image": "tensorflow/tensorflow:2.14.0rc1-gpu",
        "rebuild-image": false
    },
    "maxexectime": 600,
    "maxretries": 3
}
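Since the spec is plain JSON, it can also be generated programmatically when you submit many similar jobs. The sketch below is illustrative (gpu_job_spec is our own helper, not part of ColonyOS) and writes the same file as above:

```python
import json

# Illustrative helper that assembles a job spec like nvidia-smi.json,
# keeping the field names consistent across jobs. Defaults mirror the
# example above; maxexectime is set equal to the requested walltime.
def gpu_job_spec(cmd, image, gpu_name, gpu_count, walltime=600):
    return {
        "conditions": {
            "executortype": "ice-kubeexecutor",
            "nodes": 1,
            "processes-per-node": 1,
            "mem": "2000Mi",
            "cpu": "500m",
            "gpu": {"name": gpu_name, "count": gpu_count},
            "walltime": walltime,
        },
        "funcname": "execute",
        "kwargs": {
            "cmd": cmd,
            "args": [],
            "docker-image": image,
            "rebuild-image": False,
        },
        "maxexectime": walltime,
        "maxretries": 3,
    }

spec = gpu_job_spec("nvidia-smi", "tensorflow/tensorflow:2.14.0rc1-gpu",
                    "nvidia-gtx-2080ti", gpu_count=2)
with open("nvidia-smi.json", "w") as f:
    json.dump(spec, f, indent=4)
```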

Submit the job and follow the process logs:

colonies function submit --spec nvidia-smi.json --follow
INFO[0000] Process submitted                             ProcessId=0d562ce41f7d9e41926cc01207f7b5052b75e98299ae6be9b962f7a644bf27b5
INFO[0000] Printing logs from process                    ProcessId=0d562ce41f7d9e41926cc01207f7b5052b75e98299ae6be9b962f7a644bf27b5
Mon Mar  4 11:03:12 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.02              Driver Version: 545.29.02    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:1B:00.0 Off |                  N/A |
| 32%   31C    P0              59W / 250W |      0MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:23:00.0 Off |                  N/A |
| 33%   32C    P0              21W / 250W |      0MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+