# Custom Images for AKIS GPU Cluster
This is a collection of customized Jupyter images to be used in the JupyterHub running on the Kubernetes AKIS GPU Cluster.
## tensorflow-cpu-with-vscode

- Based on: `jupyter/tensorflow-notebook:tensorflow-2.15.0.post1` (quay.io, src, doc)
- Available as: `christofkaufmann/tensorflow-cpu-notebook:latest`
- Features:
  - Jupyter enabled
  - Contains tensorflow-2.15.0.post1 without GPU support
  - Contains R
  - Added VS Code via code-server as web UI and via the VS Code CLI for tunnels
  - Added Notebook git Puller (nbgitpuller)
  - Provides default home directory contents, including VS Code extensions and settings, in `/etc/skel/`. These can be copied on first startup using a post start hook.
## tensorflow-gpu-with-vscode

- Based on: `jupyter/scipy-notebook` (quay.io, src, doc)
- Available as: `christofkaufmann/tensorflow-gpu-notebook:latest`
- Features:
  - Jupyter enabled
  - Contains tensorflow-2.15.1 with GPU support (requires nvidia-driver)
  - Added VS Code via code-server as web UI and via the VS Code CLI for tunnels
  - Added Notebook git Puller (nbgitpuller)
  - Provides default home directory contents, including VS Code extensions and settings, in `/etc/skel/`. These can be copied on first startup using a post start hook.
## keras3-cpu

- Based on: `jupyter/tensorflow-notebook:latest` (quay.io, src, doc)
- Available as: `christofkaufmann/keras3-cpu-notebook:latest`
- Features:
  - Jupyter enabled
  - Contains Keras 3 with TensorFlow as backend
  - Added VS Code via code-server as web UI and via the VS Code CLI for tunnels
  - Added Notebook git Puller (nbgitpuller)
  - Provides default home directory contents, including VS Code extensions and settings, in `/etc/skel/`. These can be copied on first startup using a post start hook.
## keras3-gpu

- Based on: `jupyter/tensorflow-notebook:cuda-latest` (quay.io, src, doc)
- Available as: `christofkaufmann/keras3-gpu-notebook:latest`
- Features:
  - Jupyter enabled
  - Contains Keras 3 with TensorFlow as backend, with GPU support (requires nvidia-driver)
  - Added VS Code via code-server as web UI and via the VS Code CLI for tunnels
  - Added Notebook git Puller (nbgitpuller)
  - Provides default home directory contents, including VS Code extensions and settings, in `/etc/skel/`. These can be copied on first startup using a post start hook.
## pytorch-gpu

- Based on: `jupyter/scipy-notebook:latest` (quay.io, src, doc)
- Available as: `christofkaufmann/pytorch-gpu-notebook:latest`
- Features:
  - Jupyter enabled
  - Contains PyTorch with GPU support (requires nvidia-driver)
  - Contains Hugging Face packages
  - Added VS Code via code-server as web UI and via the VS Code CLI for tunnels
  - Added Notebook git Puller (nbgitpuller)
  - Provides default home directory contents, including VS Code extensions and settings, in `/etc/skel/`. These can be copied on first startup using a post start hook.
## Usage

You can use the images from Docker Hub or build them locally.
### JupyterHub

In JupyterHub, just specify the image, e.g. for the CPU image:

```yaml
singleuser:
  image:
    name: christofkaufmann/tensorflow-cpu-notebook
    tag: latest
```
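If several of the images listed above should be offered to users, the `profileList` option of the Zero to JupyterHub Helm chart can be used instead of a single image. The following is only a sketch; display names, descriptions, and the GPU resource limit are illustrative and need to be adapted to the cluster:

```yaml
singleuser:
  profileList:
    - display_name: "TensorFlow (CPU)"          # illustrative name
      description: "tensorflow-cpu-with-vscode image"
      default: true
      kubespawner_override:
        image: christofkaufmann/tensorflow-cpu-notebook:latest
    - display_name: "PyTorch (GPU)"             # illustrative name
      description: "pytorch-gpu image, requests one GPU"
      kubespawner_override:
        image: christofkaufmann/pytorch-gpu-notebook:latest
        extra_resource_limits:
          nvidia.com/gpu: "1"
```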
Local
When you want to try it out locally you can use docker
or podman
. The following instructions are for docker
, but for podman
you should be able to replace the word docker
by podman
.
```bash
docker run -d -it -p 8888:8888 -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --name tf-cpu christofkaufmann/tensorflow-cpu-notebook
docker logs tf-cpu
# use the link that starts with http://127.0.0.1:8888/... to try it out
# if it is not running on your local machine, forward the port before clicking the link using: ssh build-machine -L 8888:localhost:8888
```
Note: To test GPU capabilities, add `--cap-add SYS_ADMIN --gpus all` (docker) or `--cap-add SYS_ADMIN --device 'nvidia.com/gpu=all'` (podman) to the run command.
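For example, a complete GPU-enabled run of the TensorFlow GPU image from Docker Hub could look like this (assuming the NVIDIA driver and container toolkit are set up on the host):

```bash
docker run -d -it -p 8888:8888 -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root \
    --cap-add SYS_ADMIN --gpus all --name tf-gpu christofkaufmann/tensorflow-gpu-notebook
docker logs tf-gpu
```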
Note that there are some differences between running the image in docker and running it in Kubernetes. In Kubernetes the home directory is a mounted volume that persists between sessions. On first start this volume is empty (e.g. the preinstalled VS Code extensions are gone). To work around this, we save the important defaults in `/etc/skel/` and copy them back into the home directory on first startup using a post start hook.
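With the Zero to JupyterHub Helm chart, such a post start hook can be configured roughly as follows. This is only a sketch; the exact copy command used on the cluster may differ:

```yaml
singleuser:
  lifecycleHooks:
    postStart:
      exec:
        # copy the defaults from /etc/skel/ into the (possibly empty) home volume,
        # without overwriting files the user already has
        command: ["sh", "-c", "cp -rn /etc/skel/. /home/jovyan/"]
```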
To get a shell into the running container, use:
```bash
docker exec -it tf-cpu bash
```
In Kubernetes the user will be `jovyan`. You can switch to it in docker as usual with `su jovyan -`. Note that `jovyan` owns not only its home directory but also `/opt/conda/`, which allows installing Python packages with `mamba` or `pip`; even in Kubernetes, however, these installations will not persist between sessions. `/etc/jupyter/` is also writable by `jovyan`.
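For example, to install an additional Python package as `jovyan` inside the running container (the package name is only an example, and in Kubernetes the installation is gone once the session ends):

```bash
docker exec -it tf-cpu bash
su jovyan -
mamba install -y plotly   # or: pip install plotly
```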
When you are done, you can exit the container as usual. Clean up afterwards with:
```bash
docker stop tf-cpu   # container is only stopped; continue with: docker start tf-cpu
docker rm tf-cpu     # removes the container (and your changes), but not the image
```
## Build

Here is the procedure for building the CPU image from the Dockerfile before running it:
```bash
cd tensorflow-cpu-with-vscode
docker build --rm --tag tensorflow-cpu-with-vscode . # for podman add: --format docker
docker run --rm -d -it -p 8888:8888 -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --name tf-cpu tensorflow-cpu-with-vscode
docker logs tf-cpu
docker exec -it tf-cpu bash
# do your thing
docker stop tf-cpu # also removes the container, because of the --rm parameter on docker run
```
And the same for the GPU images (`tensorflow-gpu-with-vscode`, `keras3-gpu`, `pytorch-gpu`):
```bash
cd tensorflow-gpu-with-vscode
docker build --rm --tag tensorflow-gpu-with-vscode . # for podman add: --format docker
docker run --rm -d -it -p 8888:8888 -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --cap-add SYS_ADMIN --gpus all --name tf-gpu tensorflow-gpu-with-vscode
docker logs tf-gpu
docker exec -it tf-gpu bash
# do your thing, e.g. try nvidia-smi (it should be mounted from the host by the docker run options we supplied)
docker stop tf-gpu # also removes the container, because of the --rm parameter on docker run
```
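For the TensorFlow-based GPU images, you can additionally check from the container shell that TensorFlow actually sees the GPU:

```bash
# should print a non-empty list, e.g. [PhysicalDevice(name='/physical_device:GPU:0', ...)]
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```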
## Build with Kaniko

Pushing to Docker Hub is done automatically from CI/CD when a Dockerfile changes on the master branch. There, `--target build` is added to the build command to skip the test layer. Also, `kaniko` is used instead of `docker`, since we cannot easily run `docker` inside Kubernetes. `kaniko` sometimes behaves a bit differently from `docker build`. To test the build process with `kaniko` for the CPU image:
```bash
cd tensorflow-cpu-with-vscode
docker run \
    -v .:/workspace \
    gcr.io/kaniko-project/executor:v1.23.2-debug \
    --context dir:///workspace/ \
    --dockerfile /workspace/Dockerfile \
    --no-push \
    --use-new-run \
    --snapshot-mode=redo \
    --cache=false \
    --compressed-caching=false
```
or for the GPU images:
```bash
cd tensorflow-gpu-with-vscode
# or:
cd keras3-gpu
# or:
cd pytorch-gpu

docker run \
    -v ".:/workspace" \
    gcr.io/kaniko-project/executor:v1.23.2-debug \
    --context dir:///workspace/ \
    --dockerfile /workspace/Dockerfile \
    --no-push \
    --use-new-run \
    --snapshot-mode=redo \
    --cache=false \
    --compressed-caching=false
```
Afterwards, look for the container ID with `docker ps -a` and remove it with `docker rm <CONTAINER ID>`. However, you cannot run the image built this way; use this rather to test the CI/CD pipeline.
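For reference, the CI/CD job conceptually runs the kaniko executor with a push destination instead of `--no-push` and with the build target mentioned above; the exact invocation and credential handling in the pipeline may differ:

```bash
# sketch of the CI build step (destination tag and credential setup are omitted/illustrative)
/kaniko/executor \
    --context dir:///workspace/ \
    --dockerfile /workspace/Dockerfile \
    --target build \
    --destination docker.io/christofkaufmann/tensorflow-cpu-notebook:latest \
    --use-new-run \
    --snapshot-mode=redo \
    --cache=false \
    --compressed-caching=false
```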