Ray
Ray is an open-source unified compute framework that makes it easy to scale AI and Python workloads.
This app chart is an extension of the Jupyter Node app that adds support for Ray cluster monitoring and debugging using a graphical web-based dashboard.
You can use this app to either:
- Deploy a static Ray cluster, for example a single multi-GPU server acting as the head node, or
- Deploy a virtual cluster hosted on ICE Connect EKC, with the KubeRay operator and its RayCluster and RayService resources.
Before using this app, you should be familiar with Jupyter Node.
This document was last updated for Ray version 2.11.0.
Installation
Install the app through Rancher.
Configure
Read the Jupyter Node documentation for configuration options.
The default app Docker image is nvidia/cuda:12.4.1-devel-ubuntu22.04.
You can also build a custom image with the necessary Python packages and Ray libraries. It must be based on Ubuntu since apt is used to install packages.
Ray monitoring
Enable monitoring: Enable Ray monitoring and debugging services. If disabled, Grafana and other monitoring services will not be installed automatically.
Grafana image, Prometheus image, Loki image, Promtail image (required): The Docker images to use for the monitoring services.
Cluster monitoring volume size (required): The size of the persistent volume for storing monitoring data and log files. The default is 10Gi. It is mounted at /tmp/ray in the Ray deployment.
Static Ray cluster
In this example, we deploy a single-node Ray cluster using the Rancher app interface.
Configuration
The default app Docker image is nvidia/cuda:12.4.1-devel-ubuntu22.04.
You can use the Autostart script field to install software when the app starts. The default script installs Python, creates a virtual environment, installs Ray, and starts a Ray head node:
apt update && apt install -y python3 python3-venv python3-dev python3-pip python-is-python3
python -m venv /opt/venv
source /opt/venv/bin/activate
pip install "ray[default,serve]"
ray start --head --port=6379
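If you re-run the autostart commands by hand, a guard avoids a second ray start on a node that already runs Ray. This is a minimal bash sketch; start_ray_head is a hypothetical helper name, and it assumes that ray status exits with a non-zero code when it cannot find a running Ray instance.

```shell
# Start a Ray head node unless a Ray instance is already reachable.
# start_ray_head is a hypothetical helper; it assumes `ray status` exits
# non-zero when no running cluster is found.
start_ray_head() {
  local port="${1:-6379}"  # GCS port, same as in the default autostart script
  if ray status >/dev/null 2>&1; then
    echo "Ray is already running"
  else
    ray start --head --port="$port"
  fi
}
```

To use it, define the function in the autostart script and call start_ray_head in place of the plain ray start line.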
Usage
The app installs jupyter-server-proxy to provide access to the following services:
| Service | Internal port | Path | Access |
|---|---|---|---|
| Ray Serve | 8000 | / | Public |
| JupyterLab | 8888 | /jupyter | Private |
| Ray Dashboard | 8265 | /jupyter/ray | Private |
| TensorBoard | 6006 | /jupyter/tensorboard | Private |
You can access the app through JupyterLab at the URL you set in the Rancher app field Subdomain for icedc.se, followed by /jupyter, for example:
https://myname.icedc.se/jupyter
The proxy provides public internet access to Ray Serve (internal port 8000) at the root path /. The JupyterLab login page restricts access to the private services.
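To check from a terminal which paths the proxy answers, you can request each path from the service table and print the HTTP status code. This bash sketch uses a hypothetical helper name, check_app, and myname is a placeholder subdomain. The exact codes for private paths depend on the login setup, so the point is only that they should not serve the service itself before you log in.

```shell
# Print the HTTP status code for each path served by the app proxy.
# check_app is a hypothetical helper; pass your subdomain, e.g. myname.
check_app() {
  local base="https://${1:?usage: check_app SUBDOMAIN}.icedc.se"
  local path code
  for path in / /jupyter /jupyter/ray /jupyter/tensorboard; do
    # -o /dev/null discards the body; -w prints only the status code
    code="$(curl -s -o /dev/null -w '%{http_code}' "${base}${path}")"
    echo "${path} ${code}"
  done
}
```

Usage: check_app myname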
Ray
Log in to JupyterLab and open a terminal, or use SSH over NodePort or Visual Studio Code.
You can use the ray command to interact with the Ray cluster. For example, ray status shows the nodes and their resource usage:
ray status
======== Autoscaler status: 2024-04-18 14:09:08.648983 ========
Node status
---------------------------------------------------------------
Active:
1 node_d4037e0f63ca4b76260f071cad1c75e05be16b91f2b9a427cafa1597
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Usage:
3.0/64.0 CPU
8.0/8.0 GPU
0B/37.98GiB memory
44B/18.99GiB object_store_memory
Demands:
(no resource demands)
Read the Ray documentation for more information on how to use Ray.
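To use these numbers in a script, for example to see how many GPUs are in use before starting a job, you can extract one resource line from the ray status output. This bash sketch assumes the "used/total RESOURCE" line format shown above; ray_usage is a hypothetical helper name.

```shell
# Print "USED TOTAL" for one resource from `ray status` output on stdin.
# ray_usage is a hypothetical helper; it assumes lines such as "3.0/64.0 CPU".
ray_usage() {
  local resource="${1:?usage: ray_usage RESOURCE}"
  # Field 2 is the resource name, field 1 is "used/total"
  awk -v res="$resource" '$2 == res { split($1, a, "/"); print a[1], a[2] }'
}
```

For the output above, ray status | ray_usage GPU prints 8.0 8.0.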
Ray Serve
Ray Serve is a scalable and programmable model serving library built on Ray. You can use it to deploy machine learning models as HTTP endpoints.
See the Examples section for how to deploy a chatbot model using Ray Serve.
Build and push the image to ICE Harbor. Replace myname with your project name.
docker build -t torch-ray:latest .
docker tag torch-ray:latest registry.ice.ri.se/myname/torch-ray:latest
docker push registry.ice.ri.se/myname/torch-ray:latest
Ray Dashboard
The Ray Dashboard is accessed through the JupyterLab launcher. The dashboard is a web-based user interface for monitoring and debugging Ray clusters. It provides a real-time view of the cluster state, including the resource usage of deployments and tasks, and the logs of the Ray processes.
To collect, store, and visualize metrics and logs for the dashboard, the app runs Grafana, Prometheus, Loki, and Promtail. These services start automatically when the app is launched, provided that Enable monitoring is selected.
Grafana log parsing
After you have started a Ray Serve deployment, you can view the logs in Grafana.
Open Ray Dashboard ➡ Metrics ➡ View in Grafana.
The default credentials for Grafana are admin/admin.
From the main menu, click Connections ➡ Data Sources. On the row with Loki, click Explore.
You can now query the logs with Grafana using the Loki data source.
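You can also send LogQL queries to Loki's HTTP API from a terminal, which is handy in scripts. This bash sketch rests on assumptions the app documentation does not state: that Loki is reachable at http://localhost:3100 (Loki's default port, override with LOKI_URL) and that Promtail attaches the labels used in your selector. loki_query is a hypothetical helper name, and job="ray" in the usage line is a placeholder label; copy a real one from the label browser in Grafana Explore.

```shell
# Query Loki's query_range endpoint with a LogQL expression.
# loki_query is a hypothetical helper; LOKI_URL defaults to Loki's standard port.
loki_query() {
  local logql="${1:?usage: loki_query LOGQL [LIMIT]}"
  local limit="${2:-20}"
  # -G sends the --data-urlencode values as URL query parameters
  curl -sSG "${LOKI_URL:-http://localhost:3100}/loki/api/v1/query_range" \
    --data-urlencode "query=${logql}" \
    --data-urlencode "limit=${limit}"
}
```

Usage: loki_query '{job="ray"} |= "ERROR"' 50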
Virtual cluster with autoscaling
On Kubernetes, Ray implements cluster autoscaling and distributed training with the KubeRay operator and its RayCluster custom resource. Installing them requires cluster-admin permissions in the Kubernetes cluster.
ICE Connect EKC does not allow users to create their own Kubernetes clusters. However, you can use vCluster to create a virtual Kubernetes cluster inside the icekube cluster. This virtual cluster can be used to deploy the required Ray services.
Namespace
Workloads you create in the virtual cluster appear in the icekube namespace that you used to create the virtual cluster. This means that quotas, resource limits, and other namespace rules still apply.
Some workloads installed by vCluster do not specify resource requests or limits, so when you create your namespace in Rancher, you must set Container Resource Limits.
Reasonable defaults are a CPU reservation of 500 mCPUs and a memory reservation and limit of 500 MiB. Do not set a CPU limit.
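Kubernetes writes CPU amounts either in cores or in millicores with an m suffix, so 500m is half a core, and cpu: 1 and cpu: 1000m request the same amount. The following bash sketch normalizes such values for comparison; cpu_millicores is a hypothetical helper that handles only plain core counts and the m suffix, not every Kubernetes quantity form.

```shell
# Convert a Kubernetes CPU quantity to millicores, e.g. 500m -> 500, 1 -> 1000.
# cpu_millicores is a hypothetical helper; it supports only plain cores
# (such as 1 or 0.5) and the m suffix.
cpu_millicores() {
  local q="${1:?usage: cpu_millicores QUANTITY}"
  case "$q" in
    *m) echo "${q%m}" ;;                                       # already millicores
    *) awk -v c="$q" 'BEGIN { printf "%d\n", c * 1000 + 0.5 }' ;;  # cores, rounded
  esac
}
```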
Dockerfile
The virtual cluster setup needs extra tools such as kubectl, Helm, and vCluster, so you should build a custom Docker image.
The following example is based on the nvidia/cuda image and installs the necessary software packages.
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive \
    SHELL=/bin/bash
# Upgrade apt packages
RUN apt-get update && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
        openssh-client openssh-server rsyslog \
        pkg-config apt-utils bash-completion build-essential \
        python3 python3-venv python3-dev python3-pip python-is-python3 \
        curl git git-lfs htop iputils-ping jq less vim tree unzip wget zip \
    && apt-get clean
# kubectl latest
RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" \
    && install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl \
    && rm -f kubectl
# helm latest
RUN curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
    && chmod 700 get_helm.sh \
    && ./get_helm.sh \
    && rm -f get_helm.sh
# vCluster
RUN curl -L -o vcluster "https://github.com/loft-sh/vcluster/releases/download/v0.20.0-beta.1/vcluster-linux-amd64" \
    && install -o root -g root -m 0755 vcluster /usr/local/bin/vcluster \
    && rm -f vcluster
# Set up Python virtual environment
ENV VIRTUAL_ENV=/opt/venv
RUN python3 -m venv "$VIRTUAL_ENV"
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
# Install Python packages
RUN pip install --upgrade pip
RUN pip install setuptools pipreqs pip-upgrader jupyterlab jupyter-server-proxy
# Set working directory
WORKDIR /root
CMD ["sleep", "infinity"]
Build and push the image to ICE Harbor. Replace myname with your project name.
docker build -t vcluster:latest .
docker tag vcluster:latest registry.ice.ri.se/myname/vcluster:latest
docker push registry.ice.ri.se/myname/vcluster:latest
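If you rebuild images often, the three commands can be wrapped in one function. This is a bash sketch; push_image is a hypothetical helper, PROJECT stands for your Harbor project (myname above), and it assumes you have already authenticated with docker login registry.ice.ri.se.

```shell
# Build, tag and push an image to ICE Harbor, stopping at the first failure.
# push_image is a hypothetical helper: push_image PROJECT NAME [TAG]
push_image() {
  local project="${1:?usage: push_image PROJECT NAME [TAG]}"
  local name="${2:?usage: push_image PROJECT NAME [TAG]}"
  local tag="${3:-latest}"
  local ref="registry.ice.ri.se/${project}/${name}:${tag}"
  docker build -t "${name}:${tag}" . &&
    docker tag "${name}:${tag}" "$ref" &&
    docker push "$ref"
}
```

Run it from the directory that contains the Dockerfile, for example push_image myname vcluster.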
Starting
Use the following Autostart script to create a virtual cluster named mycluster with vCluster.
rm -f ~/.kube/config
vcluster create mycluster -f /config/vcluster-config/config.yaml
You can use the default Entrypoint override if you want to use the JupyterLab interface.
Log in with JupyterLab or SSH and check the status. When the virtual cluster is up, the vcluster output looks similar to this:
10:19:51 done Switched active kube context to vcluster_mycluster_johannes-dev_
10:19:51 warn Since you are using port-forwarding to connect, you will need to leave this terminal open
- Use CTRL+C to return to your previous kube context
- Use `kubectl get namespaces` in another terminal to access the vcluster
Forwarding from 127.0.0.1:12769 -> 8443
Handling connection for 12769
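To wait from a script until the virtual cluster answers, you can retry kubectl get namespaces, the check suggested in the output above. This is a bash sketch of a generic retry loop; wait_for and WAIT_DELAY are hypothetical names, and the attempt count and delay are arbitrary choices.

```shell
# Retry a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper; WAIT_DELAY sets the pause in seconds.
wait_for() {
  local attempts="${1:?usage: wait_for ATTEMPTS COMMAND...}"
  local delay="${WAIT_DELAY:-5}" i
  shift
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # Do not sleep after the final failed attempt
    if ((i < attempts)); then sleep "$delay"; fi
  done
  return 1
}
```

For example, wait_for 24 kubectl get namespaces retries every 5 seconds for roughly two minutes and returns non-zero if the virtual cluster never answers.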
Using the virtual cluster
You can now use kubectl from your Jupyter container to access the virtual cluster, including resources that are restricted on icekube, such as the kube-system namespace.
Install the KubeRay operator
Set the default namespace for kubectl. Otherwise, you have to append -n default to most commands.
kubectl config set-context --current --namespace=default
To install the KubeRay operator, use the official Helm chart.
You must also disable the operator's seccomp profile, otherwise the operator does not run in the host cluster icekube.
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update
helm install kuberay-operator kuberay/kuberay-operator \
--version 1.1.0 --set securityContext.seccompProfile=null
This will install the KubeRay operator in the default namespace.
You can check the installation progress:
kubectl get pods
If you need to uninstall the operator:
helm uninstall kuberay-operator
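Before you create Ray resources, you can wait for the operator Deployment to finish rolling out and confirm that its custom resource definitions exist. This is a bash sketch; check_kuberay is a hypothetical helper, and it assumes the Deployment is named kuberay-operator, which is what the chart's default naming should produce for the release name used in the install command.

```shell
# Wait for the KubeRay operator rollout, then check that its CRDs are registered.
# check_kuberay is a hypothetical helper; the optional argument is the timeout.
check_kuberay() {
  local timeout="${1:-120s}"
  kubectl rollout status deployment/kuberay-operator --timeout="$timeout" &&
    kubectl get crd rayclusters.ray.io rayjobs.ray.io rayservices.ray.io
}
```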
Install RayService
Follow the official KubeRay instructions to deploy the sample RayService:
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/release-1.1.0/ray-operator/config/samples/ray-service.sample.yaml
kubectl apply -f ray-service.sample.yaml
Check that it is running.
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.43.46.49 <none> 443/TCP 7h9m
kuberay-operator ClusterIP 10.43.59.87 <none> 8080/TCP 14m
rayservice-sample-raycluster-x46ww-head-svc ClusterIP 10.43.195.129 <none> 10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP 6m22s
rayservice-sample-head-svc ClusterIP 10.43.147.145 <none> 10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP 4m26s
rayservice-sample-serve-svc ClusterIP 10.43.4.44 <none> 8000/TCP 4m26s
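To send a test request, forward the Serve service port to your Jupyter container and call one of the sample applications. This bash sketch assumes the sample file defines a fruit application on the /fruit route that accepts a JSON list such as ["MANGO", 2], as in the KubeRay 1.1.0 samples; check ray-service.sample.yaml if your copy differs. query_fruit and SERVE_PORT are hypothetical names.

```shell
# Send a JSON request to the sample fruit application through a port-forward.
# Start the port-forward first, in another terminal:
#   kubectl port-forward svc/rayservice-sample-serve-svc 8000:8000
# query_fruit is a hypothetical helper; SERVE_PORT defaults to 8000.
query_fruit() {
  local payload="${1:-}"
  if [ -z "$payload" ]; then
    payload='["MANGO", 2]'  # example payload assumed from the KubeRay sample
  fi
  curl -sS -X POST -H 'Content-Type: application/json' \
    -d "$payload" "http://localhost:${SERVE_PORT:-8000}/fruit/"
}
```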
Using Ray
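You now have a working Ray cluster. Note that KubeRay only adds and removes worker Pods automatically when the RayService's rayClusterConfig sets enableInTreeAutoscaling: true.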
Follow the official Ray documentation to use RayService.
RayService example
This is a RayService configuration file for the example chatbot model.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  serveConfigV2: |
    applications:
      - name: ChatBotGroup
        import_path: demo:bot
        route_prefix: /generate
        runtime_env:
          working_dir: "https://gitlab.ice.ri.se/ice/demo/chat/-/archive/0.0.1/chat-0.0.1.zip"
        deployments:
          - name: ChatBot
            user_config:
              repo_id: "TheBloke/Mistral-7B-OpenOrca-GGUF"
              model_file: "mistral-7b-openorca.Q2_K.gguf"
            ray_actor_options:
              num_gpus: 1.0
            max_ongoing_requests: 5
            # Ray Serve rejects num_replicas together with autoscaling_config;
            # initial_replicas sets the starting replica count instead
            autoscaling_config:
              target_ongoing_requests: 2
              min_replicas: 1
              initial_replicas: 2
              max_replicas: 10
      - name: ChatClient
        import_path: demo:app
        route_prefix: /
        runtime_env:
          working_dir: "https://gitlab.ice.ri.se/ice/demo/chat/-/archive/0.0.1/chat-0.0.1.zip"
        deployments:
          - name: ChatIngress
            num_replicas: 1
  rayClusterConfig:
    rayVersion: 2.11.0
    headGroupSpec:
      rayStartParams:
        dashboard-host: 0.0.0.0
      template:
        spec:
          containers:
            - name: ray-head
              image: registry.ice.ri.se/ice/demo-chat:latest
              resources:
                limits:
                  cpu: 1
                  memory: 8Gi
                requests:
                  cpu: 1
                  memory: 8Gi
              ports:
                - containerPort: 6379
                  name: gcs-server
                - containerPort: 8265
                  name: dashboard
                - containerPort: 10001
                  name: client
                - containerPort: 8000
                  name: serve
    workerGroupSpecs:
      - replicas: 1
        minReplicas: 1
        maxReplicas: 100
        groupName: small-group
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: registry.ice.ri.se/ice/demo-chat:latest
                lifecycle:
                  preStop:
                    exec:
                      command:
                        - /bin/sh
                        - -c
                        - ray stop
                resources:
                  limits:
                    cpu: "1"
                    memory: 16Gi
                    nvidia.com/gpu: "1"
                  requests:
                    cpu: 1000m
                    memory: 16Gi
                    nvidia.com/gpu: "1"
                env:
                  - name: HF_HUB_ENABLE_HF_TRANSFER
                    value: "1"
                  - name: CONCURRENCY_LIMIT
                    value: "2"
                  - name: TITLE
                    value: "Chat - Mistral 7B OpenOrca"
                  - name: CHAT_LABEL
                    value: "mistral-7b-openorca.Q2_K.gguf"
                  - name: SYSTEM_PROMPT
                    value: "You are MistralOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!"
                  - name: ABOUT_TEXT
                    value: "This is a chat client for the text generation model [TheBloke/Mistral-7B-OpenOrca-GGUF](https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF). It is trained on a diverse range of internet text, and is capable of generating human-like responses to text prompts.\n\n⭐ [Source code](https://gitlab.ice.ri.se/ice/demo/chat)"
            nodeSelector:
              accelerator: nvidia-gtx-2080ti
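Save the configuration to a file, for example rayservice-chat.yaml, and create the RayService. Because it uses the same name as the sample, applying it updates the existing rayservice-sample.
kubectl apply -f rayservice-chat.yaml
Check the status.
kubectl describe rayservice rayservice-sample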
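The autoscaling_config of the ChatBot deployment determines how many replicas, and therefore GPUs, the service asks for. As a rough model, Ray Serve aims for about target_ongoing_requests in-flight requests per replica, clamped to the min_replicas and max_replicas range. This bash sketch implements only that simplified rule; the real autoscaler also applies upscale and downscale delays and smoothing, so treat the result as an estimate. desired_replicas is a hypothetical helper.

```shell
# Simplified estimate of Ray Serve replicas for the ChatBot deployment:
# ceil(ongoing / target_ongoing_requests), clamped to [min_replicas, max_replicas].
# Defaults mirror the autoscaling_config above (target 2, min 1, max 10).
desired_replicas() {
  local ongoing="${1:?usage: desired_replicas ONGOING [TARGET MIN MAX]}"
  local target="${2:-2}" min="${3:-1}" max="${4:-10}"
  local n=$(( (ongoing + target - 1) / target ))  # integer ceiling division
  if (( n < min )); then n=$min; fi
  if (( n > max )); then n=$max; fi
  echo "$n"
}
```

For example, desired_replicas 7 prints 4: seven in-flight requests with a target of two per replica round up to four replicas. Each ChatBot replica sets num_gpus: 1.0, so that corresponds to four GPUs, which in this configuration means four worker Pods with one GPU each.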




