Experimental Kubernetes Cluster (EKC)
ICE Connect EKC is a shared, multi-tenant, GPU-enabled Kubernetes (K8s) cluster in which users can deploy applications packaged as containers. EKC is managed using the Rancher platform, available here:
Project
Each ICE Connect project that has EKC enabled has a Rancher project with the same name. The total resources available to the project are limited by the quota assigned to it. Each project is further divided into one or more namespaces that you create in Rancher. The sum of the namespace quotas must not exceed the project total. Each namespace must be named with the project name followed by a hyphen (-) and then the specific namespace name, e.g. myproject-default.
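The two rules above (the naming convention and the quota sum) can be checked before creating namespaces. The following is a minimal sketch in Python, not part of Rancher or ICE Connect: the function names and the resource keys (cpu, memory_gib, gpu) are hypothetical and chosen only for illustration.

```python
def namespace_name_ok(project: str, namespace: str) -> bool:
    """True if `namespace` is `<project>-<suffix>` with a non-empty suffix."""
    prefix = project + "-"
    return namespace.startswith(prefix) and len(namespace) > len(prefix)


def quotas_fit(project_quota: dict[str, float],
               namespace_quotas: dict[str, dict[str, float]]) -> bool:
    """True if, for every resource, the namespace quotas sum to at most the project quota."""
    for resource, limit in project_quota.items():
        total = sum(q.get(resource, 0) for q in namespace_quotas.values())
        if total > limit:
            return False
    return True


# Hypothetical project with two namespaces that together use the full quota.
project_quota = {"cpu": 8, "memory_gib": 32, "gpu": 2}
namespace_quotas = {
    "myproject-default": {"cpu": 4, "memory_gib": 16, "gpu": 1},
    "myproject-training": {"cpu": 4, "memory_gib": 16, "gpu": 1},
}
```

With these values, every namespace name passes namespace_name_ok("myproject", ...) and quotas_fit(project_quota, namespace_quotas) holds. Raising any single namespace quota would push the sum over the project total.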
You pay for the resources you reserve, not for the resources you actually use. The billing data is updated hourly. For instance, if you run an app for one hour and reserve 2 CPU cores, 8 GiB RAM, 16 GiB persistent storage, and 3 Nvidia GeForce RTX 2080 Ti GPUs, you pay for exactly that, even if the app does not use 100% of the CPU or all of the allocated RAM. You stop paying when the app is deleted or suspended.
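As a rough illustration of the example above, the sketch below computes reserved resource-hours. It assumes that the billed amount scales linearly with the reservation and the running time, which the text implies but does not state. The function name and the resource keys are hypothetical, and no prices are included because none are given here.

```python
def reserved_resource_hours(reservation: dict[str, float], hours: float) -> dict[str, float]:
    """Multiply each reserved amount by the running time; actual utilization is ignored."""
    return {name: amount * hours for name, amount in reservation.items()}


# The reservation from the example: one hour with 2 cores, 8 GiB RAM, 16 GiB storage, 3 GPUs.
reservation = {"cpu_cores": 2, "ram_gib": 8, "storage_gib": 16, "gpus": 3}
billed = reserved_resource_hours(reservation, hours=1)
```

The result depends only on what was reserved and for how long, so an app idling at 5% CPU is counted the same as one running at full load.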
A summary of historically reserved Kubernetes resources is found on ICE Connect, under Project ➡ Billing. Instant usage is available in Rancher.
Containers
Containers are similar to virtual machines, but only virtualize software above the operating system level, making them smaller and more portable. They allow software applications to run in isolated user spaces, independent of other environments running in parallel.
This isolation, also called sandboxing, enables multiple independent users to share the same physical hardware. It helps users to scale their computational resource usage efficiently, reducing the cost associated with under-utilized hardware, compared to virtual machines or bare-metal servers.
Kubernetes
Kubernetes is a system for automating the deployment, scaling, and management of containers. It supports the development of automated configuration, coordination, and management of networked software applications.
Here are a few important concepts to understand:
- Containers are lightweight, executable packages of software that include everything needed to run an application: code, runtime, libraries, and settings.
- Kubernetes helps you manage these containers. If a container crashes, Kubernetes can automatically restart it. It also helps manage service discovery and load balancing, storage orchestration, automated rollouts and rollbacks, and more.
Developing applications
When developing and deploying applications on EKC, keep the following in mind:
- Uptime: The cluster is experimental and does not guarantee 100% uptime. Servers may be rebooted at any time for maintenance, although we try to schedule reboots on Sunday nights.
- Resilience: Design your applications to handle interruptions. For instance, machine learning training should implement checkpoints to persistent storage. This way, processes can be resumed from the last checkpoint without data loss after a reboot.
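The checkpointing advice above can be sketched as follows. This is a minimal example using only the Python standard library, not a recommended framework: the function names, the JSON state layout, and the checkpoint interval are all hypothetical, and the path passed in is assumed to point at a persistent volume mounted into the container.

```python
import json
import os
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write state to a temp file, then rename, so a reboot mid-write keeps the old checkpoint."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on the same filesystem


def load_checkpoint(path: Path) -> dict:
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if path.exists():
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def train(path: Path, num_steps: int, checkpoint_every: int = 10) -> dict:
    """Run from the last checkpointed step up to num_steps, saving periodically."""
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        state["total"] += step  # stand-in for one real training step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0 or state["step"] == num_steps:
            save_checkpoint(path, state)
    return state
```

If the pod is restarted, calling train again with the same path picks up at the last saved step instead of step 0. At most checkpoint_every steps of work are repeated, so the interval is a trade-off between write overhead and lost progress.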