Nextflow: A Workflow Manager for Bioinformatics Applications

All material (C) 2021-2022 by CSC - IT Center for Science Ltd. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, http://creativecommons.org/licenses/by-sa/4.0/

Outline

  • Introduction to Nextflow
  • Getting started with Nextflow at CSC
  • Good practices for running Nextflow
  • Tutorials

Introduction to Nextflow

  • A tool for managing scientific workflows
  • Written in Groovy, a language that builds on Java and runs on the Java Virtual Machine
  • Follows the dataflow programming model
    • Communication by dataflow variables
  • Documentation: Nextflow homepage

Core Features of Nextflow

  • Reproducibility
  • Portability
  • Parallelisation (implicit)
  • Continuous checkpoints
  • Easy prototyping

Nextflow: Essential Building Blocks

Getting started with Nextflow at CSC

  • Requirements
    • Runs on any Linux platform and on macOS
    • Java 8 or later
  • Nextflow installation:
    • As a module on Puhti/Mahti: module load nextflow
    • Own installation: curl get.nextflow.io | bash; mv nextflow ~/bin
  • Supported software stack:
    • Local installations as modules
    • Docker engine (not possible at CSC)
    • Singularity/Apptainer
    • Conda (Not recommended)

Run Nextflow as a (Normal) Batch Job
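
A minimal sketch of what such a batch job might look like. The pipeline file main.nf, the script name nextflow_job.sh, the job name, and all resource values are illustrative assumptions; project_xxxx is a placeholder for your own project. The snippet writes the job script to a file so it can be inspected before it is submitted.

```shell
# Write a Slurm batch script that runs the whole Nextflow pipeline inside one job.
# Assumptions: pipeline file main.nf, partition "small", and the resource values
# below are illustrative placeholders, not recommendations.
cat > nextflow_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=nextflow_demo
#SBATCH --account=project_xxxx
#SBATCH --partition=small
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=10G

# Load Nextflow from the module system (Puhti/Mahti)
module load nextflow

# Run the pipeline inside this allocation; -resume reuses cached results
nextflow run main.nf -resume
EOF

# Submit with: sbatch nextflow_job.sh
```

With this approach all processes run inside the single allocation, so the requested CPUs and memory should cover the largest step of the pipeline.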

Running Nextflow using the built-in Slurm executor

In the nextflow.config file, one can have the following excerpt:

profiles {

    standard {
        process.executor = 'local'
    }

    puhti {
        process.clusterOptions = '--account=project_xxxx --ntasks-per-node=1 --cpus-per-task=4 --ntasks=1 --time=00:00:05'
        process.executor = 'slurm'
        process.queue = 'small'
        process.memory = '10GB'
    }

}

Usage:
> nextflow run <pipeline> -profile puhti

Running Nextflow using the HyperQueue executor

  • Good documentation is available on the CSC Docs page

Good Practices for Running Nextflow Pipelines at CSC (1/2)

  • Use fixed (versioned) releases of tools for reproducibility
  • Use containers for easy portability
  • Set the Singularity/Apptainer cache directory to a scratch folder (to avoid filling up the home directory)
  • Avoid keeping big databases on Lustre, and avoid putting any databases inside a container
  • Clean up temporary files created by the workflow whenever possible (a big problem with Nextflow)
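
The cache-directory advice above can be sketched in a few shell lines. This is an illustrative setup, not an official CSC recipe: CACHE_ROOT is a hypothetical helper variable introduced here, and its default points to a temporary directory so the snippet runs anywhere. On Puhti or Mahti it would point to a project scratch folder instead.

```shell
# CACHE_ROOT is a hypothetical helper variable; on CSC systems set it to a
# scratch location such as /scratch/project_xxxx/$USER before running this.
CACHE_ROOT="${CACHE_ROOT:-${TMPDIR:-/tmp}/nextflow_demo}"

# Apptainer (and older Singularity) read these variables for cache and temp data
export APPTAINER_CACHEDIR="$CACHE_ROOT/apptainer_cache"
export APPTAINER_TMPDIR="$CACHE_ROOT/apptainer_tmp"
export SINGULARITY_CACHEDIR="$APPTAINER_CACHEDIR"
export SINGULARITY_TMPDIR="$APPTAINER_TMPDIR"

# Nextflow stores the container images it pulls in this directory
export NXF_SINGULARITY_CACHEDIR="$CACHE_ROOT/nxf_images"

# Create the directories so the tools can write to them
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR" "$NXF_SINGULARITY_CACHEDIR"

# After a successful run, work directories can be removed with:
#   nextflow clean -n   (dry run, lists what would be deleted)
#   nextflow clean -f   (actually deletes)
```

Placing these exports in the batch job script, before the nextflow run command, keeps the settings together with the job they apply to.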

Good Practices for Running Nextflow Pipelines at CSC (2/2)

  • Avoid Nextflow's built-in Slurm executor
  • Mount a writable file system to avoid permission errors when working with containers

(Short) Tutorials