Running Workflows at CSC

All material (C) 2021-2024 by CSC – IT Center for Science Ltd. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, http://creativecommons.org/licenses/by-sa/4.0/

Outline

  • Methods of running (bio)workflows at CSC
  • Good practices for running high-throughput workflows

Reminder: Submitting workflow jobs at CSC supercomputers

  • Login nodes are used to set up jobs (and to submit them as batch jobs)
    • Don’t launch workflows on login nodes
  • Jobs are run on the compute nodes
    • Interactive nodes can be used as well
    • Singularity/Apptainer is installed on all (compute and login) nodes
  • Slurm batch scheduler is used to run and manage jobs

Reminder: Managing batch jobs

  • A batch job script is submitted to the queue with the command:
    • sbatch example_job.sh
  • List all your jobs that are queuing/running:
    • squeue -u $USER
  • Detailed info of a queuing/running job:
    • scontrol show job <jobid>
  • A job can be deleted using the command:
    • scancel <jobid>
  • Display the resource usage and efficiency of a completed job:
    • seff <jobid>
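For example, a typical job lifecycle looks like this (the job ID 1234567 is illustrative):

sbatch example_job.sh      # Prints e.g. "Submitted batch job 1234567"
squeue -u $USER            # Is the job still queuing or running?
scontrol show job 1234567  # Detailed information while queued/running
seff 1234567               # Resource usage and efficiency after completion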

Methods of running workflows at CSC

  • Deploy workflows with the native Slurm executor
    • Jobs can be spread across the full cluster
    • Pay attention to overheads (Slurm accounting DB / batch queueing)
  • Submit workflows as normal batch jobs
    • Can request a full node
  • Deploy workflows using HyperQueue as a sub-job scheduler
    • Can use multiple nodes

Running Nextflow using native slurm executor

In the nextflow.config file, one can have the following excerpt:

profiles {

    standard {
        process.executor = 'local'
    }

    puhti {
        process.clusterOptions = '--account=project_xxxx --ntasks-per-node=1 --cpus-per-task=4 --ntasks=1 --time=00:00:05'
        process.executor = 'slurm'
        process.queue = 'small'
        process.memory = '10GB'
    }

}

Usage:
> nextflow run workflow.nf -profile puhti
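Note that with the native Slurm executor the Nextflow head process itself still needs somewhere to run: launch it from an interactive session, or wrap the whole pipeline in a batch job as shown on the next slide, rather than running it on a login node.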

Wrapping Nextflow pipeline as a (normal) batch job

  • One can request more resources if needed
  • All processes are run in the same job allocation
#!/bin/bash
#SBATCH --time=00:15:00            # Change your runtime settings
#SBATCH --partition=test           # Change partition as needed
#SBATCH --account=<project>        # Add your project name here
#SBATCH --cpus-per-task=<value>    # Change as needed
#SBATCH --mem-per-cpu=1G           # Increase as needed

# Load Nextflow module
module load nextflow/22.10.1

# Actual Nextflow command here
nextflow run workflow.nf <options>
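Usage (assuming the script above is saved as nextflow_batch.sh):
> sbatch nextflow_batch.sh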

Running Nextflow using HyperQueue executor

  • Multiple nodes can be deployed under the same job allocation
  • No need to queue each sub-job separately (see the CSC documentation for more on HyperQueue)
# Specify a location for the HyperQueue server
export HQ_SERVER_DIR=${PWD}/hq-server-${SLURM_JOB_ID}
mkdir -p "${HQ_SERVER_DIR}"

# Start the server in the background (&) and wait until it has started
hq server start &
until hq job list &>/dev/null ; do sleep 1 ; done

# Start the workers in the background and wait for them to start
srun --overlap --cpu-bind=none --mpi=none hq worker start --cpus=${SLURM_CPUS_PER_TASK} &
hq worker wait "${SLURM_NTASKS}"

# Ensure Nextflow uses the right executor and knows how much it can submit
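# Note: the factor 40 below assumes Puhti compute nodes with 40 CPU cores each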
echo "executor {
  queueSize = $(( 40*SLURM_NNODES ))
  name = 'hq'
  cpus = $(( 40*SLURM_NNODES ))
}" >> nextflow.config

nextflow run <workflow.nf> <options>

# Wait for all jobs to finish, then shut down the workers and server
hq job wait all
hq worker stop all
hq server stop
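For context, a minimal sketch of the batch script the excerpt above would live in (module names are as on Puhti; versions and resource values are illustrative assumptions):

#!/bin/bash
#SBATCH --account=project_xxxx
#SBATCH --partition=small
#SBATCH --nodes=2              # HyperQueue can spread work over several nodes
#SBATCH --ntasks-per-node=1    # One HyperQueue worker per node
#SBATCH --cpus-per-task=40     # All cores of a Puhti compute node
#SBATCH --time=02:00:00

module load nextflow hyperqueue

# ... followed by the HyperQueue server/worker setup and the
# nextflow run command shown above ...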

Getting started with Snakemake at CSC

  • Use pre-installed Snakemake as a module:
    • Puhti: module load snakemake/version
    • LUMI:
      • module use /appl/local/csc/modulefiles/
      • module load snakemake/8.4.6
  • Do your own installations
  • Install your application stack:
    • Local installations (as modules or custom installations)
    • Docker engine (Not possible)
    • Singularity/Apptainer (see the example after this list)
    • Conda (Not supported at CSC)
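For example, a tool packaged in a container image can be run with Apptainer (the image and tool names below are placeholders):

# Run a command from a container image with Apptainer
apptainer exec my_tool.sif my_tool --help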

Deploying Snakemake with native slurm executor

  • Syntax for the Slurm executor depends on the Snakemake version/plugin
  • Submits a separate cluster job for each rule
  • Not suitable for large numbers of short sub-job steps (< 30 min)
module load snakemake/8.4.6

snakemake -s Snakefile --jobs 1 \
    --latency-wait 60 \
    --executor cluster-generic \
    --cluster-generic-submit-cmd "sbatch --time=10 \
        --account=project_xxxx --job-name=hello-world \
        --ntasks-per-node=1 --cpus-per-task=1 \
        --mem-per-cpu=4000 --partition=test"

or, using the dedicated Slurm executor plugin:

snakemake -s Snakefile --jobs 1 \
    --executor slurm --default-resources \
    slurm_account=project_xxxx slurm_partition=test

Submit Snakemake as a batch job

  • One can request more resources if needed
  • All rules are run in the same job allocation
#!/bin/bash
#SBATCH --job-name=myTest
#SBATCH --account=project_xxxxx
#SBATCH --time=00:10:00
#SBATCH --mem-per-cpu=2G
#SBATCH --partition=test
#SBATCH --cpus-per-task=4

module load snakemake/8.4.6
snakemake -s Snakefile --use-singularity --jobs 4

Running Snakemake using HyperQueue executor
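The same pattern as in the Nextflow/HyperQueue example applies: start the HyperQueue server and workers inside a single Slurm allocation, then let Snakemake submit each rule as a HyperQueue sub-job. A minimal sketch using the cluster-generic executor (the hq submit flags are illustrative; see the CSC and HyperQueue documentation for a complete recipe):

# Inside a batch allocation where the HyperQueue server and workers
# are already running (see the Nextflow/HyperQueue slide above)
module load snakemake/8.4.6 hyperqueue

snakemake -s Snakefile --jobs "${SLURM_CPUS_PER_TASK}" \
    --latency-wait 60 \
    --executor cluster-generic \
    --cluster-generic-submit-cmd "hq submit --cpus=1"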

Good practices for running HT workflows (1/3)

  • Avoid unnecessary reads and writes on the Lustre file system (i.e. /scratch) to improve I/O performance
    • If heavy I/O is unavoidable, use the fast local NVMe disk instead of Lustre (see the sketch after this list)
  • Don’t run too many or too short job steps – they will bloat the Slurm accounting DB
    • For many short tasks, prefer a sub-job scheduler such as HyperQueue over the Slurm scheduler
  • Don’t run very long jobs without a checkpoint/restart option.
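A minimal sketch for Puhti: local NVMe disk is requested with --gres and exposed through the $LOCAL_SCRATCH environment variable (the 100 GB size and file names are illustrative):

#SBATCH --gres=nvme:100        # Request 100 GB of fast local NVMe disk

# The allocated disk is available via $LOCAL_SCRATCH inside the job
export TMPDIR=$LOCAL_SCRATCH
cp /scratch/project_xxxx/input.dat "$LOCAL_SCRATCH"/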

Good practices for running HT workflows (2/3)

  • Don’t use Conda installations on Lustre (/projappl, /scratch, $HOME)
    • Containerize Conda environments instead to improve performance (see the sketch after this list)
  • Don’t create a lot of files, especially within a single folder
    • If you’re creating 10 000+ files, you should probably rethink your workflow
  • Consider removing temporary files after the job has finished
  • Whenever possible, separate serial jobs from parallel ones for efficient usage of resources.
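A minimal sketch using CSC’s Tykky container wrapper (the environment file env.yml and the installation path are illustrative assumptions):

module load tykky
conda-containerize new --prefix /projappl/project_xxxx/myenv env.yml

# Use the containerized environment by putting its bin directory on PATH
export PATH="/projappl/project_xxxx/myenv/bin:$PATH"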

Good practices for running HT workflows (3/3)

  • Pin the versions of your tools for reproducibility
  • Use containers for easy portability
  • Set the Singularity/Apptainer cache directory to a scratch folder to avoid filling up your home directory (see the sketch after this list)
  • Avoid placing big databases on Lustre, and avoid putting databases inside a container
  • If you are downloading a lot of data on the fly in your workflow, try to stage it locally first.
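For example, the standard Apptainer environment variables can point the cache and temporary directories to scratch (the paths are illustrative):

export APPTAINER_CACHEDIR=/scratch/project_xxxx/$USER/apptainer_cache
export APPTAINER_TMPDIR=/scratch/project_xxxx/$USER/apptainer_tmp
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"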