Disk areas in CSC’s HPC environment

In this section, you will learn how to work with the different disk areas in CSC’s HPC environment.

All materials (c) 2020-2023 by CSC – IT Center for Science Ltd. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, http://creativecommons.org/licenses/by-sa/4.0/

Overview of disk areas

Disk and storage overview

Main disk areas in Puhti/Mahti

  • Home directory ($HOME)
    • Other users cannot access your home directory
  • ProjAppl directory (/projappl/project_name)
    • Shared with project members
    • Possible to limit access to subfolders (chmod g-rw, see the sketch after this list)
  • Scratch directory (/scratch/project_name)
    • Shared with project members
    • Files older than 180 days will be automatically removed
  • These directories reside on the Lustre parallel file system
  • Default quotas and more info can be found in the disk areas section of Docs CSC
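
As a quick sketch of working in these areas (the project name project_2001234 is a hypothetical placeholder; use your own project's name):

    cd /scratch/project_2001234        # shared working area for the project
    mkdir private_run                  # subfolder for your own runs
    chmod g-rw private_run             # remove group read/write so other project members cannot use it
    ls -ld /projappl/project_2001234   # shared area for project software and binaries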

Moving data to/from and between supercomputers

Displaying current status of disk areas

  • Use the csc-workspaces command to show available projects and quotas
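
For example, on a Puhti or Mahti login node (the exact output layout may differ between systems and over time):

    csc-workspaces        # lists your projects and the usage/quota of their disk areas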

Disk and storage overview (revisited)

Additional fast local disk areas

  • $TMPDIR on login nodes
    • Each login node has 2900 GiB of fast local storage in $TMPDIR
    • The local disk is meant for temporary storage (e.g. compiling software) and is cleaned frequently
  • NVMe disks on some compute nodes on Puhti
    • Interactive, I/O and GPU nodes have fast local disks (NVMe) in $LOCAL_SCRATCH
    • Also, the GPU nodes on Mahti have fast local storage available
    • You must copy data to and from the fast disk during your batch job, since the NVMe disk is accessible only during the job allocation (see the batch-script sketch after this list)
    • If your job reads and/or writes a lot of small files, using this can give a huge performance boost!
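
A minimal batch-script sketch of this copy-in/copy-out pattern on Puhti (the project name, partition, file names and the my_analysis program are hypothetical placeholders; check Docs CSC for the exact sbatch options on your system):

    #!/bin/bash
    #SBATCH --account=project_2001234   # placeholder project
    #SBATCH --partition=small
    #SBATCH --time=01:00:00
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --gres=nvme:100             # request 100 GB of local NVMe disk ($LOCAL_SCRATCH)

    # Copy input data to the fast local disk, work there, then copy the results back
    cp /scratch/project_2001234/input.tar "$LOCAL_SCRATCH"
    cd "$LOCAL_SCRATCH"
    tar xf input.tar
    ./my_analysis input/                # placeholder program that reads/writes many small files
    cp -r results /scratch/project_2001234/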

What are the different disk areas for?

  • Allas – for data which is not actively used
  • $HOME – small quota, only for the most important files, personal access only
  • /scratch – main working area, shared with project members, only for data in active use
  • /projappl – not cleaned up, e.g. for shared binaries
  • Login node $TMPDIR – compiling, temporary storage, fast I/O
  • Compute node NVMe $LOCAL_SCRATCH – fast I/O in batch jobs

Best practices

  • None of the disk areas are automatically backed up by CSC, so make sure to perform regular backups to, e.g., Allas (see the sketch at the end of this section)
  • Don’t run databases or Conda on Lustre (/projappl, /scratch, $HOME)
    • Containerize Conda environments with Tykky and use other CSC services such as Kaivos or cPouta for databases (Rahti is also an option, but connecting to it from Puhti/Mahti is complicated)
  • Don’t create a lot of files, especially within a single folder
    • If you’re creating 10 000+ files, you should probably rethink your workflow
  • Consider using fast local disks when working with many small files
  • See Docs CSC for Lustre best practices and for efficient I/O in high-throughput workflows
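
As a sketch of the backup recommendation above (the bucket and project names are hypothetical placeholders; the allas module and a-put come from CSC's allas-cli-utils):

    module load allas
    allas-conf                          # authenticate against your Allas project
    a-put /scratch/project_2001234/results -b my-backup-bucket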