Introduction to Bio-workflows

All material (C) 2021-2024 by CSC – IT Center for Science Ltd. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, http://creativecommons.org/licenses/by-sa/4.0/

Outline

  • Workflows in bioinformatics
  • Nextflow as a workflow manager
  • The essential building blocks of Nextflow
  • Running and inspecting Nextflow pipelines

What is a Workflow?

Nextflow

  • A tool for managing scientific workflows
  • Written in Groovy, a programming language for the Java platform (JVM)
  • Follows the dataflow programming model
    • Processes communicate through dataflow variables (channels)
  • Workflow system for creating scalable, portable, and reproducible workflows
  • Documentation: Nextflow homepage (https://www.nextflow.io)
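To make the dataflow model concrete, here is a minimal sketch (assuming DSL2 syntax, the default in current Nextflow releases; the file name dataflow.nf is hypothetical). Values flow through a channel, and each operator handles items as they arrive instead of running as a fixed sequence of statements:

```nextflow
// dataflow.nf: items flow through a channel and each operator
// processes them as soon as they become available
workflow {
    Channel.of(1, 2, 3, 4)          // queue channel emitting four numbers
        .map { n -> n * n }         // square each number
        .filter { n -> n > 4 }      // keep only squares greater than 4
        .view()                     // print each remaining item
}
```

Running nextflow run dataflow.nf should print 9 and 16, each on its own line, after Nextflow's log header.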

Nextflow: Core Features

Nextflow: Essential Building Blocks
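Nextflow pipelines are assembled from processes (units of work that wrap a script), channels (which connect processes), operators (which transform channel contents), and a workflow block that wires them together. A minimal sketch, assuming DSL2 syntax; the process names MAKE_GREETING and COUNT_CHARS are hypothetical:

```nextflow
// building_blocks.nf: two processes connected by channels

process MAKE_GREETING {
    input:
    val name                        // one value per task

    output:
    path "${name}.txt"              // file emitted into the output channel

    script:
    """
    echo "Hello ${name}" > ${name}.txt
    """
}

process COUNT_CHARS {
    input:
    path greeting_file              // staged into the task directory

    output:
    stdout                          // captured standard output of the task

    script:
    """
    wc -c < ${greeting_file}
    """
}

workflow {
    names_ch = Channel.of('salmon', 'kallisto')   // channel
    files_ch = MAKE_GREETING(names_ch)            // process: one task per name
    COUNT_CHARS(files_ch)                         // output channel feeds the next process
        .map { s -> s.trim() }                    // operator: drop whitespace around the count
        .view()                                   // operator: print each result
}
```

The script prints one character count per name; the order may vary between runs because the tasks execute in parallel.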

More on Nextflow Channels

  • Two kinds of channels: queue channels and value channels
  • Create a channel with the factory syntax Channel.<method>(...)
  • Value channel: bound to a single value that can be read any number of times in a workflow
    • Channel.value( <single value, list object, or map object> )
  • Queue channel: each item is consumed once when it is read by a process or an operator
    • Channel.fromList( ['salmon', 'kallisto'] )
    • Channel.fromPath( 'data/*.fq.gz' )
    • Channel.fromFilePairs( 'data/FA33*_{1,2}.fq.gz' )
    • Channel.fromSRA( 'SRP043510' )
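The difference between the two channel kinds matters when a process has several inputs. In this sketch (DSL2 syntax; the process name ALIGN, the reference name, and the sample IDs are hypothetical placeholders), the value channel is reused by every task, while the queue channel supplies one item per task:

```nextflow
// channels.nf: a value channel is reused by every task,
// while a queue channel is consumed item by item

process ALIGN {
    input:
    val reference                   // read from the value channel
    val sample                      // read from the queue channel

    output:
    stdout

    script:
    """
    echo "aligning ${sample} to ${reference}"
    """
}

workflow {
    ref_ch     = Channel.value('GRCh38')                              // hypothetical reference name
    samples_ch = Channel.fromList(['sampleA', 'sampleB', 'sampleC'])  // hypothetical sample IDs
    ALIGN(ref_ch, samples_ch)       // three tasks, all sharing the same reference
        .view()
}
```

If ref_ch were created with Channel.of('GRCh38') instead, it would be a queue channel holding a single item, and ALIGN would run only once: the first task consumes the reference, leaving nothing to pair with the remaining samples.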

Getting started with Nextflow

  • Required
    • Runs on Linux and macOS
    • A Java runtime, e.g., module load biojava/16 (Java 8 suffices only for old Nextflow releases; recent releases require a newer Java version)
  • Nextflow installation
    • As a module on Puhti/Mahti: module load nextflow/<version>
    • Own installation: curl -s https://get.nextflow.io | bash; mv nextflow ~/bin/ (~/bin must be on your PATH)
  • Runtime environment for applications
    • Local installations/modules
    • Docker Engine (not available on HPC systems such as Puhti and Mahti)
    • Singularity/Apptainer
    • Conda (not recommended at CSC)
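Nextflow lets you choose the runtime environment per process through directives such as module and container. A sketch of a nextflow.config fragment; the process names FASTQC and SALMON_QUANT, the module name, and the image path are hypothetical placeholders:

```nextflow
// nextflow.config: choosing a runtime environment per process (sketch)
process {
    withName: 'FASTQC' {
        module = 'fastqc'                   // hypothetical environment module name
    }
    withName: 'SALMON_QUANT' {
        container = '/path/to/salmon.sif'   // hypothetical Singularity/Apptainer image path
    }
}

// container settings only take effect when a container engine is enabled
singularity.enabled = true
```

Processes without a container setting run directly on the host, so modules and container images can be mixed within one pipeline.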

Configuring Singularity/Apptainer with Nextflow

  • Options for using Singularity/Apptainer with Nextflow:
    • Command-line option: -with-singularity /path/to/image.img

    • Declared in a configuration file: as a profile in nextflow.config

       profiles {
           singularity {
               singularity.enabled    = true    // or: apptainer.enabled = true
               singularity.autoMounts = true
               process.executor       = "local"
           }
       }

  • Run a pipeline with this profile: nextflow run main.nf -profile singularity

Nextflow configuration file(s)
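Nextflow reads settings from several configuration files, including $HOME/.nextflow/config, nextflow.config in the pipeline directory and in the launch directory, and any file passed with -c. A sketch of a configuration for submitting tasks to a SLURM cluster such as Puhti; the partition, project ID, and resource values are hypothetical placeholders, so check CSC's documentation for the right ones:

```nextflow
// nextflow.config: sketch for running tasks through SLURM
params {
    outdir = 'results'                              // hypothetical default output folder
}

process {
    executor       = 'slurm'                        // submit each task as a SLURM job
    queue          = 'small'                        // hypothetical partition name
    clusterOptions = '--account=project_XXXXXXX'    // placeholder project ID
    cpus           = 2                              // hypothetical resource requests
    memory         = '4 GB'
    time           = '1h'
}
```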

Nextflow: Hello-World Example

  • Executing a pipeline: nextflow run <pipeline-name.nf> <options>
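A minimal hello-world pipeline, assuming DSL2 syntax; the file name hello.nf, the process name SAY_HELLO, and the greeting parameter are hypothetical:

```nextflow
#!/usr/bin/env nextflow

// hello.nf: default value for a pipeline parameter
params.greeting = 'Hello world!'

process SAY_HELLO {
    input:
    val greeting                    // one greeting per task

    output:
    stdout                          // whatever the script prints

    script:
    """
    echo '${greeting}'
    """
}

workflow {
    SAY_HELLO(Channel.of(params.greeting)).view()   // print the task output
}
```

Parameters can be overridden on the command line with a double dash, e.g., nextflow run hello.nf --greeting 'Hello CSC!'.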

Inspecting Nextflow Results

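  • Nextflow creates a separate folder for each task (one execution of a process) inside the work directory (work/ by default)
  • Each folder contains
    • Symbolic links to the input files
    • Output files
    • Several hidden files (e.g., .command.run, .command.out, .command.err, .command.log, .exitcode)
    • The script executed by the task (.command.sh)
  • You can publish results to a different folder with the publishDir directive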
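A sketch of publishing results with the publishDir directive, assuming DSL2 syntax; the process name WRITE_REPORT, the outdir parameter, and the sample IDs are hypothetical:

```nextflow
// publish.nf: copy task outputs from work/ to a results folder
params.outdir = 'results'                       // hypothetical default

process WRITE_REPORT {
    publishDir params.outdir, mode: 'copy'      // copy instead of the default symlink

    input:
    val sample

    output:
    path "${sample}_report.txt"

    script:
    """
    echo "report for ${sample}" > ${sample}_report.txt
    """
}

workflow {
    WRITE_REPORT(Channel.of('sampleA', 'sampleB'))   // hypothetical sample IDs
}
```

After nextflow run publish.nf, the reports appear under results/, while the original files remain in their task folders under work/.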

nf-core Pipelines: Community-driven curated workflows

  • nf-core: A community effort to collect a curated set of analysis pipelines built using Nextflow
  • You don’t need to install the nf-core tools package to run nf-core pipelines at CSC
  • Provides development guidelines and ready-to-use pipelines
    • Pipelines: 68 released, 36 under development (at the time of writing)
  • Each pipeline has its own documentation
  • Join the community on Slack: https://nf-co.re/join