
Ray

Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads.

This app chart is an extension of the Jupyter Node app that adds support for Ray cluster monitoring and debugging using a graphical web-based dashboard.

You can use this app to either deploy a static Ray cluster or create a virtual Ray cluster with autoscaling.

Before using this app, you should be familiar with Jupyter Node.

This document was last updated for Ray version 2.11.0.

Helm Chart on GitLab

Installation

Install the app through Rancher.

Configure

Read the Jupyter Node documentation for configuration options.

The default app Docker image is nvidia/cuda:12.4.1-devel-ubuntu22.04.

You can also build a custom image with the necessary Python packages and Ray libraries. It must be based on Ubuntu since apt is used to install packages.
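A minimal sketch of such an image, assuming you only need Python, Ray, and JupyterLab (the package selection is an example, not a requirement):

Dockerfile
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04

# Install Python with apt (the base image is Ubuntu 22.04)
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 python3-venv python3-dev python3-pip python-is-python3 \
    && apt-get clean

# Create a virtual environment and install Ray and JupyterLab into it
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv "$VIRTUAL_ENV"
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip install --upgrade pip && \
    pip install "ray[default,serve]" jupyterlab jupyter-server-proxy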

Ray monitoring

Enable monitoring: Enable Ray monitoring and debugging services. If disabled, Grafana and other monitoring services will not be installed automatically.

Grafana image, Prometheus image, Loki image, Promtail image (required): The Docker images to use for the monitoring services.

Cluster monitoring volume size (required): The size of the persistent volume for storing monitoring data and log files. The default is 10Gi. It is mounted at /tmp/ray in the Ray deployment.

Static Ray cluster

In this example, we will deploy a single-node Ray cluster using the Rancher app interface.

Configuration

The default app Docker image is nvidia/cuda:12.4.1-devel-ubuntu22.04.

You can use the Autostart script field to install software when the app starts. The default script will install Python, create a virtual environment, install Ray, and start its head node.

Autostart script
apt update && apt install -y python3 python3-venv python3-dev python3-pip python-is-python3
python -m venv /opt/venv
source /opt/venv/bin/activate
pip install "ray[default,serve]"
ray start --head --port=6379

Usage

The app installs a jupyter-server-proxy to provide access to the following services:

Service         Internal port   Path                   Access
Ray Serve       8000            /                      Public
JupyterLab      8888            /jupyter               Private
Ray Dashboard   8265            /jupyter/ray           Private
TensorBoard     6006            /jupyter/tensorboard   Private

You can use the app through the JupyterLab interface at the URL you set in the Rancher app field Subdomain for icedc.se, followed by /jupyter, e.g.

https://myname.icedc.se/jupyter

The proxy will provide public internet access to Ray Serve at port 8000. The JupyterLab login page will restrict access to private services.
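For example, once a Ray Serve application is running and serving at the root path, you can query it from anywhere on the internet (replace myname with your subdomain):

curl https://myname.icedc.se/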

JupyterLab with proxy

Screenshot of the JupyterLab web interface with proxy services.

Ray

Log in to JupyterLab and open a terminal, or use SSH over NodePort or Visual Studio Code.

You can use the ray command to interact with the Ray cluster.

ray status
======== Autoscaler status: 2024-04-18 14:09:08.648983 ========
Node status
---------------------------------------------------------------
Active:
 1 node_d4037e0f63ca4b76260f071cad1c75e05be16b91f2b9a427cafa1597
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 3.0/64.0 CPU
 8.0/8.0 GPU
 0B/37.98GiB memory
 44B/18.99GiB object_store_memory

Demands:
 (no resource demands)

Read the Ray documentation for more information on how to use Ray.
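As a minimal sketch, assuming the virtual environment created by the autostart script is activated, you can connect to the running head node from a terminal or notebook and run a remote task:

import ray

# Connect to the head node started by the autostart script
ray.init(address="auto")

@ray.remote
def square(x):
    return x * x

# Run the tasks on the cluster and collect the results
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]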

Ray Serve

Ray Serve is a scalable and programmable model serving library built on Ray. You can use it to deploy machine learning models as HTTP endpoints.
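A minimal sketch of a Serve deployment, using a hypothetical hello-world application rather than the chatbot example referenced below:

from ray import serve

@serve.deployment
class Hello:
    def __call__(self, request) -> str:
        return "Hello from Ray Serve"

# Deploys the application on the default route prefix / and HTTP port 8000,
# which the proxy exposes publicly (see the table above)
serve.run(Hello.bind())

With the proxy configured as above, the endpoint is then reachable at https://myname.icedc.se/.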

See the Examples section for how to deploy a chatbot model using Ray Serve.

Build and push the image to ICE Harbor. Replace myname with your project name.

docker build -t torch-ray:latest .
docker tag torch-ray:latest registry.ice.ri.se/myname/torch-ray:latest
docker push registry.ice.ri.se/myname/torch-ray:latest

Ray Dashboard

The Ray Dashboard is accessed through the JupyterLab launcher. The dashboard is a web-based user interface for monitoring and debugging Ray clusters. It provides a real-time view of the cluster state, including the resource usage of deployments and tasks, and the logs of the Ray processes.

Ray Dashboard

Screenshot of the Ray Dashboard.

For this purpose, the Ray Dashboard uses Grafana, Prometheus, Loki, and Promtail to collect, store and visualize the data. These services are automatically started when the Ray app is launched.

Grafana log parsing

After you have started a Ray Serve deployment, you can view the logs in Grafana.

Open Ray Dashboard ➡ Metrics ➡ View in Grafana.

The default credentials for Grafana are admin/admin.

From the main menu, click Connections ➡ Data Sources. On the row with Loki, click Explore.

Grafana Loki data source

Screenshot of the Grafana Loki data source.

You can now query the logs with Grafana using the Loki data source.

Grafana Loki query

Screenshot of the Grafana Loki query interface.
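As an example of the query syntax, a LogQL expression like the one below returns log lines containing ERROR. The label selector job="ray" is an assumption; use the label browser in the Explore view to see which labels your Promtail configuration actually attaches.

{job="ray"} |= "ERROR"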

Virtual cluster with autoscaling

Cluster autoscaling and distributed training are implemented in Ray using the KubeRay operator and its RayCluster and RayService resources. These deployments require cluster-admin permissions in the Kubernetes cluster.

ICE Connect EKC does not allow users to create their own Kubernetes clusters. However, you can use vCluster to create a virtual Kubernetes cluster inside the icekube cluster. This virtual cluster can be used to deploy the required Ray services.

Virtual Ray cluster

Schematic of the virtual Ray cluster.

Namespace

When you create workloads in the virtual cluster, they will appear in the icekube namespace you used to create the virtual cluster. This means the rules for quotas, resource limits etc. will still apply.

Some workloads installed by vCluster do not specify default resource limits. When you create your namespace in Rancher, you must set Container Resource Limits.

Rancher Namespace Container Resource Limit

Screenshot of the Rancher settings for Container Resource Limit.

Reasonable defaults are a CPU reservation of 500 mCPU and a memory reservation and limit of 500 MiB. Note that you should not set a CPU limit.

Dockerfile

Since this requires additional software, you should build a custom Docker image.

The following example is based on the nvidia/cuda image and installs the necessary software packages.

FROM nvidia/cuda:12.4.1-devel-ubuntu22.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive \
    SHELL=/bin/bash

# Upgrade apt packages
RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
    openssh-client openssh-server rsyslog \
    pkg-config apt-utils bash-completion build-essential \
    python3 python3-venv python3-dev python3-pip python-is-python3 \
    curl git git-lfs htop iputils-ping jq less vim tree unzip wget zip \
    && apt-get clean

# kubectl latest
RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
RUN install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
RUN rm -f kubectl

# helm latest
RUN curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
RUN chmod 700 get_helm.sh
RUN ./get_helm.sh
RUN rm -f get_helm.sh

# vCluster
RUN curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/download/v0.20.0-beta.1/vcluster-linux-amd64"
RUN install -o root -g root -m 0755 vcluster /usr/local/bin/vcluster
RUN rm -f vcluster

# Set up Python virtual environment
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv "$VIRTUAL_ENV"
ENV PATH="$VIRTUAL_ENV/bin:$PATH"

# Install Python packages
RUN pip install --upgrade pip
RUN pip install setuptools pipreqs pip-upgrader jupyterlab jupyter-server-proxy

# Set working directory
WORKDIR /root

CMD ["sleep", "infinity"]

Build and push the image to ICE Harbor. Replace myname with your project name.

docker build -t vcluster:latest .
docker tag vcluster:latest registry.ice.ri.se/myname/vcluster:latest
docker push registry.ice.ri.se/myname/vcluster:latest

Starting

Use the following Autostart script to start the vCluster and create a virtual cluster.

Autostart script
rm -f ~/.kube/config
vcluster create mycluster -f /config/vcluster-config/config.yaml

You can use the default Entrypoint override if you want to use the JupyterLab interface.

Entrypoint override
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser

Log in with JupyterLab or SSH and check the status.

cat autostart.log
10:19:51 done Switched active kube context to vcluster_mycluster_johannes-dev_
10:19:51 warn Since you are using port-forwarding to connect, you will need to leave this terminal open
- Use CTRL+C to return to your previous kube context
- Use `kubectl get namespaces` in another terminal to access the vcluster
Forwarding from 127.0.0.1:12769 -> 8443
Handling connection for 12769

Using the virtual cluster

You can now access the virtual cluster using kubectl from your Jupyter container, including resources that are restricted on icekube, such as the kube-system namespace.

kubectl get pods -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
coredns-6997864d8b-2tff9   1/1     Running   0          4h14m

Install the KubeRay operator

Set the default namespace for kubectl. Otherwise, you have to append -n default to most commands.

kubectl config set-context --current --namespace=default

To install the KubeRay operator, use the official Helm chart.

You must also disable the seccomp profile for the operator to work in the host cluster icekube.

helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator \
    --version 1.1.0 --set securityContext.seccompProfile=null

This will install the KubeRay operator in the default namespace.

kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
kuberay-operator-68bcc864f5-qpwgg   1/1     Running   0          97s

You can check the installation progress:

kubectl describe pods

If you need to uninstall the operator:

helm uninstall kuberay-operator

Install RayService

Follow the official instructions.

curl -LO https://raw.githubusercontent.com/ray-project/kuberay/release-1.1.0/ray-operator/config/samples/ray-service.sample.yaml
kubectl apply -f ray-service.sample.yaml

Check that it is running.

kubectl get services
NAME                                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                         AGE
kubernetes                                    ClusterIP   10.43.46.49     <none>        443/TCP                                         7h9m
kuberay-operator                              ClusterIP   10.43.59.87     <none>        8080/TCP                                        14m
rayservice-sample-raycluster-x46ww-head-svc   ClusterIP   10.43.195.129   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   6m22s
rayservice-sample-head-svc                    ClusterIP   10.43.147.145   <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   4m26s
rayservice-sample-serve-svc                   ClusterIP   10.43.4.44      <none>        8000/TCP                                        4m26s

Using Ray

You now have a working Ray cluster with autoscaling.

Follow the official Ray documentation to use RayService.
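For example, to reach the sample Serve application from the Jupyter container, you can port-forward the Serve service and query it with curl. The route and request body below assume the fruit stand example shipped with the KubeRay 1.1.0 sample; adjust them to whatever serveConfigV2 defines.

kubectl port-forward service/rayservice-sample-serve-svc 8000:8000 &
curl -X POST -H 'Content-Type: application/json' \
    http://localhost:8000/fruit/ -d '["MANGO", 2]'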

RayService example

This is a RayService configuration file for the example chatbot model.

chatbot.yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serveConfigV2: |
    applications:
      - name: ChatBotGroup
        import_path: demo:bot
        route_prefix: /generate
        runtime_env:
          working_dir: "https://gitlab.ice.ri.se/ice/demo/chat/-/archive/0.0.1/chat-0.0.1.zip"
        deployments:
          - name: ChatBot
            user_config:
              repo_id: "TheBloke/Mistral-7B-OpenOrca-GGUF"
              model_file: "mistral-7b-openorca.Q2_K.gguf"
            ray_actor_options:
              num_gpus: 1.0
            num_replicas: 2
            max_ongoing_requests: 5
            autoscaling_config:
                target_ongoing_requests: 2
                min_replicas: 1
                max_replicas: 10
      - name: ChatClient
        import_path: demo:app
        route_prefix: /
        runtime_env: {
          working_dir: "https://gitlab.ice.ri.se/ice/demo/chat/-/archive/0.0.1/chat-0.0.1.zip"
        }
        deployments:
        - name: ChatIngress
          num_replicas: 1
  rayClusterConfig:
    rayVersion: 2.11.0
    headGroupSpec:
      rayStartParams:
        dashboard-host: 0.0.0.0
      template:
        spec:
          containers:
            - name: ray-head
              image: registry.ice.ri.se/ice/demo-chat:latest
              resources:
                limits:
                  cpu: 1
                  memory: 8Gi
                requests:
                  cpu: 1
                  memory: 8Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 100
        groupName: small-group
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: registry.ice.ri.se/ice/demo-chat:latest
                lifecycle:
                  preStop:
                    exec:
                      command:
                        - /bin/sh
                        - -c
                        - ray stop
                resources:
                  limits:
                    cpu: "1"
                    memory: 16Gi
                    nvidia.com/gpu: "1"
                  requests:
                    cpu: 1000m
                    memory: 16Gi
                    nvidia.com/gpu: "1"
                env:
                - name: HF_HUB_ENABLE_HF_TRANSFER
                  value: "1"
                - name: CONCURRENCY_LIMIT
                  value: "2"
                - name: TITLE
                  value: "Chat - Mistral 7B OpenOrca"
                - name: CHAT_LABEL
                  value: "mistral-7b-openorca.Q2_K.gguf"
                - name: SYSTEM_PROMPT
                  value: "You are MistralOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!"
                - name: ABOUT_TEXT
                  value: "This is a chat client for the text generation model [TheBloke/Mistral-7B-OpenOrca-GGUF](https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF). It is trained on a diverse range of internet text, and is capable of generating human-like responses to text prompts.\n\n⭐ [Source code](https://gitlab.ice.ri.se/ice/demo/chat)"
            nodeSelector:
              accelerator: nvidia-gtx-2080ti

Create the RayService.

kubectl apply -f chatbot.yaml

Check the status.

kubectl get pods -l=ray.io/is-ray-node=yes
NAME                                                          READY   STATUS              RESTARTS   AGE
rayservice-sample-raycluster-pjcsp-head-22pdx                 0/1     ContainerCreating   0          4m11s
rayservice-sample-raycluster-pjcsp-worker-small-group-65lrp   0/1     Init:0/1            0          4m11s
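Once both pods are Running, the chat client (route_prefix / in chatbot.yaml) is served on port 8000. A minimal sketch for checking it from the Jupyter container:

kubectl port-forward service/rayservice-sample-serve-svc 8000:8000 &
curl http://localhost:8000/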