    Custom Images for AKIS GPU Cluster

    This is a collection of customized Jupyter images to be used in the JupyterHub running on the Kubernetes AKIS GPU Cluster.

    tensorflow-cpu-with-vscode

    • Based on: jupyter/tensorflow-notebook:tensorflow-2.15.0.post1 (quay.io, src, doc)
    • Available as: christofkaufmann/tensorflow-cpu-notebook:latest
    • Features:
      • Jupyter enabled
      • Contains tensorflow-2.15.0.post1 without GPU support
      • Contains R
      • Added VS Code via code-server as web UI and via VS Code CLI for tunnels
      • Added Notebook git Puller (nbgitpuller)
  • Provides default home directory contents (including VS Code extensions and settings) in /etc/skel/; these can be copied into the home directory on first startup using a post-start hook.

    tensorflow-gpu-with-vscode

    • Based on: jupyter/scipy-notebook (quay.io, src, doc)
    • Available as: christofkaufmann/tensorflow-gpu-notebook:latest
    • Features:
      • Jupyter enabled
      • Contains tensorflow-2.15.1 with GPU support (requires nvidia-driver)
      • Added VS Code via code-server as web UI and via VS Code CLI for tunnels
      • Added Notebook git Puller (nbgitpuller)
  • Provides default home directory contents (including VS Code extensions and settings) in /etc/skel/; these can be copied into the home directory on first startup using a post-start hook.

    keras3-cpu

    • Based on: jupyter/tensorflow-notebook:latest (quay.io, src, doc)
    • Available as: christofkaufmann/keras3-cpu-notebook:latest
    • Features:
      • Jupyter enabled
  • Contains Keras 3 with TensorFlow as the backend
      • Added VS Code via code-server as web UI and via VS Code CLI for tunnels
      • Added Notebook git Puller (nbgitpuller)
  • Provides default home directory contents (including VS Code extensions and settings) in /etc/skel/; these can be copied into the home directory on first startup using a post-start hook.

    keras3-gpu

    • Based on: jupyter/tensorflow-notebook:cuda-latest (quay.io, src, doc)
    • Available as: christofkaufmann/keras3-gpu-notebook:latest
    • Features:
      • Jupyter enabled
  • Contains Keras 3 with TensorFlow as the backend, with GPU support (requires nvidia-driver)
      • Added VS Code via code-server as web UI and via VS Code CLI for tunnels
      • Added Notebook git Puller (nbgitpuller)
  • Provides default home directory contents (including VS Code extensions and settings) in /etc/skel/; these can be copied into the home directory on first startup using a post-start hook.

    pytorch-gpu

    • Based on: jupyter/scipy-notebook:latest (quay.io, src, doc)
    • Available as: christofkaufmann/pytorch-gpu-notebook:latest
    • Features:
      • Jupyter enabled
      • Contains PyTorch with GPU support (requires nvidia-driver)
  • Contains Hugging Face packages
      • Added VS Code via code-server as web UI and via VS Code CLI for tunnels
      • Added Notebook git Puller (nbgitpuller)
  • Provides default home directory contents (including VS Code extensions and settings) in /etc/skel/; these can be copied into the home directory on first startup using a post-start hook.

    Usage

You can use the images from Docker Hub or build them locally.

    JupyterHub

In JupyterHub, simply specify the image, e.g. for the CPU image:

    singleuser:
      image:
        name: christofkaufmann/tensorflow-cpu-notebook
        tag: latest
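
If you want to offer users a choice between the CPU and GPU images, a profile list can be used instead of a single image. A minimal sketch, assuming the Zero to JupyterHub Helm chart (the display names are made up, and the GPU resource limit is only illustrative):

```yaml
singleuser:
  profileList:
    - display_name: "TensorFlow (CPU)"
      default: true
      kubespawner_override:
        image: christofkaufmann/tensorflow-cpu-notebook:latest
    - display_name: "TensorFlow (GPU)"
      kubespawner_override:
        image: christofkaufmann/tensorflow-gpu-notebook:latest
        extra_resource_limits:
          nvidia.com/gpu: "1"
```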

    Local

When you want to try it out locally, you can use docker or podman. The following instructions use docker, but for podman you should be able to simply replace the word docker with podman.

    docker run -d -it -p 8888:8888 -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --name tf-cpu christofkaufmann/tensorflow-cpu-notebook
    docker logs tf-cpu
# use the link that starts with http://127.0.0.1:8888/... to try it out
# if it is not running on your local machine, forward the port before clicking the link using: ssh build-machine -L 8888:localhost:8888

    Note: To test GPU capabilities, add --cap-add SYS_ADMIN --gpus all (docker) or --cap-add SYS_ADMIN --device 'nvidia.com/gpu=all' (podman) to the run command.

Note that there are some differences between running the image in docker and running it in Kubernetes. In Kubernetes the home directory is a mounted volume that persists between sessions. On first start this volume is empty (so e.g. VS Code extensions would be gone). To work around this, we save important configuration in /etc/skel/ and copy it back into the home directory on first startup.
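
The post-start hook itself is configured on the JupyterHub side. A minimal sketch of such a hook, assuming the Zero to JupyterHub Helm chart (the copy command and marker file shown here are illustrative, not necessarily what the cluster uses):

```yaml
singleuser:
  lifecycleHooks:
    postStart:
      exec:
        command:
          - "sh"
          - "-c"
          - >
            if [ ! -e /home/jovyan/.skel-copied ]; then
              cp -rT /etc/skel /home/jovyan &&
              touch /home/jovyan/.skel-copied;
            fi
```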

    To get a shell into the running container, use:

    docker exec -it tf-cpu bash

In Kubernetes the user will be jovyan. You can switch to it in docker as usual with su - jovyan. Note that jovyan owns not only its home directory but also /opt/conda/, which allows installing Python packages with mamba or pip; however, even in Kubernetes these installations will not persist between sessions. /etc/jupyter/ is also writable by jovyan. When you are done, exit the container as usual. Clean up afterwards with:

docker stop tf-cpu  # stops the container; resume with: docker start tf-cpu
docker rm tf-cpu    # removes the container (and your changes), but not the image
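
Since /opt/conda does not persist between sessions in Kubernetes while the home volume does, installing packages into the user site directory is one possible workaround. A sketch (the package name is only an example, and this assumes the image's Python honors the standard user site directory under $HOME):

```shell
# Install into ~/.local inside the persistent home volume
# instead of the non-persistent /opt/conda:
pip install --user some-package

# Show where user-site installs go (typically $HOME/.local):
python -m site --user-base
```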

    Build

Here is the procedure when you want to build the CPU image from the Dockerfile before running it:

    cd tensorflow-cpu-with-vscode
    docker build --rm --tag tensorflow-cpu-with-vscode .   # for podman add: --format docker
    docker run --rm -d -it -p 8888:8888 -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --name tf-cpu tensorflow-cpu-with-vscode
    docker logs tf-cpu
    docker exec -it tf-cpu bash
    # do your thing
    docker stop tf-cpu  # also removes container, because of the --rm parameter on docker run

The same works for the GPU images (tensorflow-gpu-with-vscode, keras3-gpu, pytorch-gpu):

    cd tensorflow-gpu-with-vscode
    docker build --rm --tag tensorflow-gpu-with-vscode .   # for podman add: --format docker
    docker run --rm -d -it -p 8888:8888 -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes --user root --cap-add SYS_ADMIN --gpus all --name tf-gpu tensorflow-gpu-with-vscode
    docker logs tf-gpu
    docker exec -it tf-gpu bash
    # do your thing, e. g. try nvidia-smi (it should be mounted from the host by the docker run command options we supplied)
    docker stop tf-gpu  # also removes container, because of the --rm parameter on docker run

    Build with Kaniko

Pushing to Docker Hub is done automatically from CI/CD when the Dockerfile changes on the master branch. There, --target build is added to the build command to skip the test layer. Also, kaniko is used instead of docker, since we cannot easily run docker inside Kubernetes. Sometimes kaniko behaves a bit differently from docker build. To test the build process with kaniko for the CPU image:

    cd tensorflow-cpu-with-vscode
    docker run \
        -v .:/workspace \
        gcr.io/kaniko-project/executor:v1.23.2-debug \
        --context dir:///workspace/ \
        --dockerfile /workspace/Dockerfile \
        --no-push \
        --use-new-run \
        --snapshot-mode=redo \
        --cache=false \
        --compressed-caching=false

    or for the GPU images:

    cd tensorflow-gpu-with-vscode
    # or:
    cd keras3-gpu
    docker run \
        -v ".:/workspace" \
        gcr.io/kaniko-project/executor:v1.23.2-debug \
        --context dir:///workspace/ \
        --dockerfile /workspace/Dockerfile \
        --no-push \
        --use-new-run \
        --snapshot-mode=redo \
        --cache=false \
        --compressed-caching=false

Afterwards, look for the container ID with docker ps -a and remove the container with docker rm <CONTAINER ID>. Note that you cannot run the image built this way; use this procedure to test the CI/CD pipeline instead.