Working with Containerised Applications

All material (C) 2022-2024 by CSC - IT Center for Science Ltd. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, http://creativecommons.org/licenses/by-sa/4.0/

Outline

  • Primer on Docker images for Apptainer (Recap)
  • HPC disk systems and efficient image conversion
  • Good practice tips in converting images




Primer on Docker images for Apptainer (Recap)

Docker images: Overview

  • Docker is a leading container platform, widely adopted in industry and cloud applications
  • Plenty of images are available in Docker registries
  • GPU-accelerated images are available as well
  • Handy when building an Apptainer image directly is more difficult
  • Good news: Singularity/Apptainer integrates well with Docker images
  • Challenge: Some Docker images may not work smoothly with Apptainer

Docker vs. Apptainer images

  • Docker image
    • Image has a layered structure
    • Stored in the local image cache when using the Docker client
    • Saves space by sharing layers between local images
    • Caching the build layers can speed up the build process
  • Apptainer image
    • Single flat image file (.sif)
    • Stored as a normal file
    • Easy portability: transferring and sharing images across a cluster is straightforward
    • Image can be run like an executable (./image.sif)

Running Apptainer containers from pre-built images in a Docker registry

  • Public images:
    • apptainer pull/build imagename.sif docker://reponame/imagename:tag
  • Private images:
    • apptainer pull --docker-login docker://privaterepo/imagename:tag
  • Local docker archive:
    • apptainer build local_apptainer_image.sif docker-archive://local_docker.tar
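
The three cases above can be sketched as small shell helpers. This is only an illustration: the function names (pull_public, pull_private, build_from_archive) and the example image ubuntu:22.04 are assumptions made for this sketch, not part of the course material.

```shell
#!/bin/sh
# Sketch only: helper names and example images are illustrative assumptions.

# Pull a public image from a Docker registry and convert it to a SIF file.
# $1 = output file, $2 = image reference such as reponame/imagename:tag
pull_public() {
    apptainer pull "$1" "docker://$2"
}

# Pull from a private repository; --docker-login prompts for credentials.
pull_private() {
    apptainer pull --docker-login "$1" "docker://$2"
}

# Convert a local archive, e.g. one written by `docker save -o local_docker.tar <image>`.
# $1 = output file, $2 = path to the tar archive
build_from_archive() {
    apptainer build "$1" "docker-archive://$2"
}

# Example calls (commented out so that sourcing this file has no side effects):
# pull_public ubuntu_22.04.sif ubuntu:22.04
# pull_private imagename.sif privaterepo/imagename:tag
# build_from_archive local_apptainer_image.sif local_docker.tar
```

The helpers only wrap the commands listed above; in day-to-day use the plain commands are just as good.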

Sharing images via container registries

  • Sharing images via registries is usually preferred
  • Share your images via registries and keep the corresponding source code on GitHub/GitLab
  • Sharing images via registries requires pushing your image to a registry
  • Production images need to be properly named and tagged

Biocontainers: Bioinformatics containers

  • A community-driven effort
  • Focus is to create and manage bioinformatics software containers
  • Focuses on popular omics methods (genomics, proteomics, metagenomics, metabolomics)
  • Can be integrated into bioinformatics pipelines and different architectures
  • Provides ready-made containers for bioinformatics community

Container image registries: Docker Hub

  • A registry from Docker
  • Centralized management of user accounts, image checksums and public/private repositories
  • Referencing an image in the Docker Hub registry: docker://username/image[:tag]
  • Not all images work smoothly with Apptainer
    • Applications that require root access
    • Applications that rely on entrypoints

Container image registries: Red Hat Quay

  • Quay (Red Hat) is a container image registry
  • A scalable open-source platform for hosting container images for organizations of any size
  • Create your own public repositories
  • Provides CI support for automated builds from the Bioconda GitHub repository
  • All Biocontainers are Docker-based and publicly available for free
  • Referencing an image: docker://quay.io/username/image[:tag]
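
To make the two referencing schemes (Docker Hub and Quay) concrete, here is a minimal sketch of a helper that composes a docker:// reference from its parts. The function name docker_uri is hypothetical and the example names are the placeholders used above.

```shell
#!/bin/sh
# Compose a docker:// reference from its parts (hypothetical helper).
# Usage: docker_uri <user/image> [tag] [registry]
docker_uri() {
    uri="docker://"
    # A registry host such as quay.io goes first; Docker Hub needs none.
    if [ -n "${3:-}" ]; then uri="${uri}${3}/"; fi
    uri="${uri}${1}"
    # The tag is optional and is appended after a colon.
    if [ -n "${2:-}" ]; then uri="${uri}:${2}"; fi
    printf '%s\n' "$uri"
}

docker_uri username/image              # docker://username/image
docker_uri username/image 1.0          # docker://username/image:1.0
docker_uri username/image 1.0 quay.io  # docker://quay.io/username/image:1.0

# Typical use (commented out; <tag> is a placeholder to fill in):
# apptainer pull "$(docker_uri username/image <tag> quay.io)"
```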

Reminder: Take care of the Apptainer cache on Puhti

  • Default location: $HOME/.apptainer
    • Per-user disk space in $HOME is limited on HPC systems
    • Your $HOME directory can fill up quickly, causing disk space issues
  • One can configure Apptainer directories using two environment variables:
    • APPTAINER_CACHEDIR: Cache folder for images from a container registry.
    • APPTAINER_TMPDIR: Temporary directory to build container file-systems.
  • Useful tips:
    • apptainer cache list # show storage capacity used by the cache
    • apptainer cache clean # clean up everything
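
A minimal sketch of redirecting both directories away from $HOME follows. The variable MY_APPTAINER_DIR is a hypothetical name introduced here, and its default under the system temporary directory is only an assumption so that the snippet runs anywhere; on an HPC system a folder under your project's scratch area would be the more natural choice.

```shell
#!/bin/sh
# Sketch: keep Apptainer's cache and temporary build files out of $HOME.
# MY_APPTAINER_DIR is a hypothetical variable; point it at your scratch area.
MY_APPTAINER_DIR="${MY_APPTAINER_DIR:-${TMPDIR:-/tmp}/$(id -un)/apptainer}"

export APPTAINER_CACHEDIR="$MY_APPTAINER_DIR/cache"
export APPTAINER_TMPDIR="$MY_APPTAINER_DIR/tmp"

# Both directories must exist before Apptainer uses them.
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"

echo "cache: $APPTAINER_CACHEDIR"
echo "tmp:   $APPTAINER_TMPDIR"

# Afterwards, inspect or empty the cache with the commands listed above:
# apptainer cache list
# apptainer cache clean
```

The exports only affect the current shell session, so they need to be repeated (or placed in a job script) wherever images are pulled or built.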




HPC Disk Systems and Efficient Image Conversion

CSC disk and storage areas: Know your disk spaces

How to use host folders on HPC systems

  • No data persistence in container file systems
  • Containers are stateless
  • Decoupling container from storage
  • Bind mount your own host folders (HOME, PROJAPPL, SCRATCH) into the container
    • apptainer run -B /host/path:/guest/path apptainer_image.sif
  • A friendly tool at CSC: apptainer_wrapper command
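
When several host folders are needed, -B also accepts a comma-separated list of paths. The sketch below builds such a list from the folders that actually exist. The helper name bind_spec is hypothetical, and PROJAPPL and SCRATCH are assumed to be shell variables that you have set to your project folders.

```shell
#!/bin/sh
# Build a comma-separated bind specification from existing host folders.
# bind_spec is a hypothetical helper, not an Apptainer command.
bind_spec() {
    spec=""
    for dir in "$@"; do
        [ -d "$dir" ] || continue        # skip unset or missing folders
        spec="${spec:+$spec,}$dir"       # add a comma only between entries
    done
    printf '%s\n' "$spec"
}

# Example (commented out; PROJAPPL and SCRATCH are assumed to be set by you):
# apptainer exec -B "$(bind_spec "$PROJAPPL" "$SCRATCH")" apptainer_image.sif ls "$SCRATCH"
```

Each folder is mounted at the same path inside the container; use the /host/path:/guest/path form when a different location is needed.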

NVMe disks for faster image conversion

  • Use only when working with bigger images
  • Not all compute nodes have these NVMe disks
  • You can request these resources in a batch script or an interactive session
  • Use the environment variable $LOCAL_SCRATCH to access local storage on the compute node
  • In batch jobs, remember to copy the files back to your scratch folders

Tiny batch script that uses NVMe disks

  • Image conversion example
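
A minimal sketch of such a batch script is given below, written to a file so it can be submitted with sbatch. The Slurm options are assumptions: the account project_xxx, the partition name, the resource sizes and the --gres=nvme:100 request follow Puhti-style conventions and should be checked against your system's documentation. The image reference is the placeholder used earlier.

```shell
#!/bin/sh
# Write a small Slurm batch script; all option values are assumptions to adapt.
cat > convert_image.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=image_conversion
#SBATCH --account=project_xxx
#SBATCH --partition=small
#SBATCH --time=00:30:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=8G
#SBATCH --gres=nvme:100

# Stop at the first failing command.
set -e

# Keep the cache and temporary build files on the fast local disk.
export APPTAINER_CACHEDIR="$LOCAL_SCRATCH"
export APPTAINER_TMPDIR="$LOCAL_SCRATCH"

# Convert the Docker image on the local disk.
cd "$LOCAL_SCRATCH"
apptainer pull image.sif docker://reponame/imagename:tag

# The local disk is cleaned when the job ends, so copy the result back.
cp image.sif /scratch/project_xxx/
EOF

echo "Wrote convert_image.sh; submit it with: sbatch convert_image.sh"
```

The conversion happens entirely on $LOCAL_SCRATCH, and only the finished .sif file is copied to the shared scratch folder.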




Good Practice Tips in Image Conversion

Good practice tips in using Docker images (1/2)

  • Keep track of image tags (reproducibility!)
    • Generic URI for Docker: docker://<user>/<repo-name>[:<tag>]
    • Developers release several versions of the same container under different tags
  • Hashes identify an image even more precisely than tags
    • Always pulls the same image: library://debian:sha256.b92c7fdfcc615
  • Pay attention to the special tag: latest
    • Pulling with latest may return a different image over time: apptainer pull library://debian:latest

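
As a small illustration of the tagging advice above, the sketch below classifies an image reference by how reproducible it is. The helper name ref_kind is hypothetical, and the rule that a missing tag is treated as latest reflects the usual registry default.

```shell
#!/bin/sh
# Classify an image reference as "digest", "tag" or "latest" (hypothetical helper).
ref_kind() {
    name="${1##*/}"     # keep only the last path component, e.g. debian:latest
    case "$name" in
        *@sha256:*|*:sha256.*) echo "digest" ;;   # pinned by hash
        *:latest)              echo "latest" ;;   # moving target
        *:*)                   echo "tag" ;;      # explicit version tag
        *)                     echo "latest" ;;   # no tag given
    esac
}

ref_kind library://debian:sha256.b92c7fdfcc615   # digest
ref_kind library://debian:latest                 # latest
ref_kind docker://quay.io/username/image:1.0     # tag
ref_kind docker://username/image                 # latest
```

A check like this could be run before a pull in a workflow to warn when a reference is not pinned.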
Good practice tips in using Docker images (2/2)

  • Avoid images that expect to run as root
    • These work in Docker but tend to fail in Apptainer, which runs containers as the invoking user
  • If you modify an image, don't install to $HOME or $TMP
  • Docker images with ENTRYPOINT scripts often do not work as expected in Apptainer
    • ENTRYPOINT [ "entrypoint_script.sh" ]
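
One possible way to deal with a problematic entrypoint is to look at the runscript that was generated during conversion and, if needed, call the program directly with apptainer exec, which does not go through the runscript. The helper names below (show_runscript, run_direct) are hypothetical and the sketch assumes the program you need is on the image's PATH.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical wrappers around apptainer.

# Print the runscript that the conversion derived from ENTRYPOINT/CMD.
# $1 = image file
show_runscript() {
    apptainer inspect --runscript "$1"
}

# Run a command inside the image directly, bypassing the runscript.
# $1 = image file, remaining arguments = command and its options
run_direct() {
    image="$1"
    shift
    apptainer exec "$image" "$@"
}

# Example calls (commented out so that sourcing this file has no side effects):
# show_runscript image.sif
# run_direct image.sif python3 --version
```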