Snakemake Hackathon on CSC Supercomputers
Outline
- Primer on CSC HPC environment
- Running Snakemake workflows at CSC
- Good practices for running high-throughput workflows
CSC Computing Environment at a Glance
- Puhti: A general-purpose supercomputer
- Mahti: A massively parallel supercomputer
- LUMI: A European pre-exascale supercomputer operated by CSC
- Pouta: Cloud resources offered via OpenStack (IaaS)
- Rahti: Container platform as a service via OpenShift/Kubernetes (PaaS)
- Allas: Cloud-based object storage for all services
Connecting to CSC Supercomputers
- A simple way is to log in via the web interface
- Login with SSH
- Command-line access to the supercomputers
- Mac and Linux have SSH built in. On Windows, PowerShell can be used, but we recommend the web interfaces or clients like MobaXterm or PuTTY
- Plain SSH will not allow displaying remote graphics
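- For example, a terminal login to Puhti (hostname puhti.csc.fi; yourcscusername is a placeholder for your own CSC username):
ssh yourcscusername@puhti.csc.fi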
Main Disk Areas in Puhti/Mahti
- Home directory ($HOME)
- Other users cannot access your home directory
- ProjAppl directory (/projappl/project_name)
- Shared with project members
- Possible to limit access (chmod g-rw) to subfolders
- Scratch directory (/scratch/project_name)
- Shared with other project members
- Files older than 180 days will be automatically removed on Puhti
- These directories reside on the Lustre parallel file system
- Default quotas and more info in the disk areas section of Docs CSC
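- For a quick overview of your projects' disk areas and quotas, you can for example run the csc-workspaces helper on a login node (assuming it is available in your environment; project_name is a placeholder):
csc-workspaces                    # list your disk areas, quotas and current usage
du -sh /scratch/project_name      # check how much space a directory uses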
Additional Fast Local Disk Areas
- $TMPDIR on login nodes
- Each login node has 2900 GiB of fast local storage in $TMPDIR
- The local disk is meant for temporary storage (e.g. compiling software) and is cleaned frequently
- NVMe disks on some compute nodes on Puhti
- Interactive, I/O and GPU nodes have fast local disks (NVMe) in $LOCAL_SCRATCH
- Also, the GPU nodes on Mahti have fast local storage available
- You must copy data to and from the fast disk during your batch job since the NVMe is accessible only during your job allocation
- If your job reads and/or writes a lot of small files, using this can give a huge performance boost!
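- A minimal batch job sketch for using the local NVMe disk on Puhti, following the --gres=nvme:<GB> convention from Docs CSC (account, partition and paths are placeholders):
#!/bin/bash
#SBATCH --account=project_xxxxx
#SBATCH --partition=small
#SBATCH --time=00:30:00
#SBATCH --cpus-per-task=4
#SBATCH --gres=nvme:100                               # request 100 GB of local NVMe disk
cp -r /scratch/project_xxxxx/input $LOCAL_SCRATCH/    # stage input onto the fast local disk
cd $LOCAL_SCRATCH
# ... run the I/O-heavy steps here ...
cp -r results /scratch/project_xxxxx/                 # copy results back before the job ends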
The Allas Object Storage
- Allas is a CSC cloud storage service
- Possible to upload data from personal laptops or organizational storage systems into Allas
- Meant for data storage during the lifetime of your CSC projects
- Tools for accessing Allas (e.g., a-tools, rclone) are available on Puhti and Mahti (see the example after this list)
- Allas is NOT a backup service
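- For example, with the allas module on Puhti/Mahti (a-put, a-get and a-list are part of a-tools; the file name is a placeholder):
module load allas
allas-conf                       # authenticate against your Allas project
a-put mydata.tar.gz              # upload a file to Allas
a-list                           # list your buckets and objects
a-get mydata.tar.gz              # download the file back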
Submitting Jobs to CSC Supercomputers
- Login nodes are used to set up jobs (and to launch them)
- Jobs are run on the compute nodes
- A batch job system (scheduler) is used to run and manage the jobs
- On CSC machines, we use Slurm
- Different batch job systems use different syntax, but the basic operation is similar
Module Systems in Supercomputers
- CSC uses a module system to manage software stacks with different (possibly conflicting) requirements
- The general syntax: module load modulename
- Other useful module commands:
- module avail: Modules currently available for loading (hides modules that can’t be loaded at the moment due to dependencies)
- module spider modulename: Search for an application in the list of all existing modules
- module spider modulename/version: Gives information on how to load a specific version of a module (prerequisites etc.)
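- For example, to find and load Snakemake on Puhti (the version number is just an illustration; check what is actually installed):
module spider snakemake          # list all available Snakemake versions
module spider snakemake/8.4.6    # show how to load this specific version
module load snakemake/8.4.6
module list                      # verify what is currently loaded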
Submitting, Cancelling and Status of Batch Jobs
- A batch job script is submitted to the queue with the command: sbatch batch_job.sh
- List all your jobs that are queuing/running: squeue -u $USER
- Detailed info of a queuing/running job: scontrol show job <jobid>
- A job can be deleted using the command: scancel <jobid>
- Display the resource usage and efficiency of a completed job: seff <jobid>
Getting Started with Snakemake at CSC
- Use pre-installed Snakemake as a module:
- Puhti: module load snakemake/version
- LUMI:
- module use /appl/local/csc/modulefiles/
- module load snakemake/8.4.6
- Do your own installations
- Install your application stack:
- Local installations (as modules or custom installations)
- Docker engine (Not possible)
- Singularity/Apptainer
- Conda (Not supported at CSC)
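- For example, a Docker image can be converted into an Apptainer image and then referenced from a Snakefile (image name and rule are illustrative only):
apptainer pull ubuntu.sif docker://ubuntu:22.04    # build a .sif image from a public Docker image
# In the Snakefile, point a rule to the image and run Snakemake with --use-singularity:
#   rule example:
#       container: "ubuntu.sif"
#       ...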
Methods of Running Snakemake at CSC
- Deploy Snakemake with the native Slurm executor
- Jobs can spread across full cluster
- Pay attention to overheads (Slurm accounting DB/Batch Queueing)
- Submit Snakemake workflow as a normal batch job
- Deploy Snakemake using HyperQueue as a sub-job scheduler
Deploying Snakemake with the Native Slurm Executor
- Syntax for the Slurm executor depends on the Snakemake version/plugin
- Submits a job to the cluster for each rule
- Not suitable for a large number of small sub-job steps (< 30 min)
module load snakemake/8.4.6
snakemake -s Snakefile --jobs 1 \
    --latency-wait 60 \
    --executor cluster-generic \
    --cluster-generic-submit-cmd "sbatch --time 10 \
    --account=project_xxxx --job-name=hello-world \
    --tasks-per-node=1 --cpus-per-task=1 --mem-per-cpu=4000 --partition=test"
or
snakemake --jobs 1 -s Snakefile \
    --executor slurm --default-resources \
    slurm_account=project_xxxx slurm_partition=test
Submit Snakemake as a Batch Job
- One can request more resources if needed
- All rules are run in the same job allocation
#!/bin/bash
#SBATCH --job-name=myTest
#SBATCH --account=project_xxxxx
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=2G
#SBATCH --partition=test
#SBATCH --cpus-per-task=4
module load snakemake/8.4.6
snakemake -s Snakefile --use-singularity --jobs 4
Running Snakemake Using HyperQueue Executor
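- A minimal sketch of one possible setup: a single Slurm allocation runs a HyperQueue server and worker, and Snakemake submits each rule as a HyperQueue sub-job via the cluster-generic executor (module versions, account and partition are placeholders; check Docs CSC for the exact recipe)
#!/bin/bash
#SBATCH --job-name=snakemake-hq
#SBATCH --account=project_xxxxx
#SBATCH --partition=small
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
module load hyperqueue
module load snakemake/8.4.6
# Start the HyperQueue server and one worker inside this allocation
export HQ_SERVER_DIR=$PWD/hq-server-$SLURM_JOB_ID
mkdir -p "$HQ_SERVER_DIR"
hq server start &
until hq job list >/dev/null 2>&1; do sleep 1; done   # wait until the server is up
hq worker start --cpus=$SLURM_CPUS_PER_TASK &
# Let Snakemake submit each rule as a HyperQueue sub-job
snakemake -s Snakefile --jobs 8 --latency-wait 60 \
    --executor cluster-generic \
    --cluster-generic-submit-cmd "hq submit --cpus=1"
# Clean up: wait for remaining sub-jobs, then stop the worker and the server
hq job wait all
hq worker stop all
hq server stop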
Good Practices for Running HT Workflows
- Avoid unnecessary reads and writes of data on Lustre file system to improve I/O performance
- If unavoidable, use fast local NVMe disk, not Lustre (i.e. /scratch)
- Don’t run too many/short job steps – they will bloat Slurm accounting DB
- Don’t run very long jobs without a restart option
Good Practices for Running HT Workflows
- Don’t use Conda installations on Lustre (/projappl, /scratch, $HOME)
- Containerize Conda environments instead to improve performance (see the Tykky sketch after this list)
- Don’t create a lot of files, especially within a single folder
- If you’re creating 10 000+ files, you should probably rethink your workflow
- Consider removing temporary files after job is finished
- Whenever possible, separate serial jobs from parallel ones for efficient usage of resources
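- A sketch of containerizing a Conda environment with CSC's Tykky wrapper (environment file, project number and installation path are placeholders; see Docs CSC for details):
module load tykky
mkdir -p /projappl/project_xxxxx/snakemake-env
conda-containerize new --prefix /projappl/project_xxxxx/snakemake-env env.yml
# Put the wrapper bin directory on PATH to use the containerized tools
export PATH=/projappl/project_xxxxx/snakemake-env/bin:$PATH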
Good Practices for Running HT Workflows
- Use version control of tools for reproducibility
- Use containers for easy portability
- Set the Singularity/Apptainer cache directory to your scratch folder to avoid filling up your home directory (see the example after this list)
- Avoid big databases on Lustre or any databases inside a container
- If you are downloading a lot of data on the fly in your workflow, try to stage it locally first
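- For example, the Apptainer cache and temporary directories can be pointed to scratch before pulling or building images (the project path is a placeholder):
export APPTAINER_CACHEDIR=/scratch/project_xxxxx/apptainer_cache
export APPTAINER_TMPDIR=/scratch/project_xxxxx/apptainer_tmp
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"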