All material (C) 2022-2023 by CSC - IT Center for Science Ltd. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, http://creativecommons.org/licenses/by-sa/4.0/
High-throughput Computing in HPC Environment
Our scientific computational problems need:
Faster computation
Scalability
Parallel processing
Improved accuracy
High-throughput computing (HTC) is more about running many independent tasks that together require a large amount of computing power (i.e., spread across many different computers)
Good Practices for High-throughput Computing
Are you using a conda environment for your application?
A conda environment can easily contain several thousand files
When you load your application, all these files have to be read, which adds overhead on the Lustre file system
Loading the environment takes a lot of time in high-throughput workflows
Solution:
Containerise your application; from the application's point of view, the whole environment is a single file
e.g., use the Tykky wrapper
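To judge whether an environment is large enough for this to matter, it can help to count the files it contains. The following is a minimal sketch, not part of any CSC tool; the helper name `count_files` is hypothetical, and it assumes the active environment is exposed through the `CONDA_PREFIX` variable that conda sets on activation.

```python
import os


def count_files(root):
    """Count the non-directory entries (files and file symlinks) below root."""
    total = 0
    # os.walk does not follow directory symlinks by default
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


if __name__ == "__main__":
    # CONDA_PREFIX is set by conda when an environment is active;
    # fall back to the current directory if it is not set.
    env_root = os.environ.get("CONDA_PREFIX", ".")
    print(f"{env_root}: {count_files(env_root)} files")
```

Run it on a login node once, not inside every job, since the scan itself touches every file in the tree.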
Good Practices for High-throughput Computing
Do you have a lot of I/O operations in your application?
Pay special attention to where these operations are performed
Note that the Lustre file system is designed for efficient parallel I/O of large files
Intensive I/O operations risk degrading file system performance for all users
Solution:
Fast local NVMe disk on Puhti and Mahti GPU nodes
Ramdisk (/dev/shm) on Mahti CPU nodes
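One way to apply this inside a job is to copy the input to node-local storage, do the I/O-heavy work there, and clean up afterwards. The sketch below is illustrative only: the function names are hypothetical, and it assumes the NVMe path is exposed through an environment variable called `LOCAL_SCRATCH` (check your system's documentation for the actual name). It falls back to `/dev/shm` and then to the default temporary directory.

```python
import os
import shutil
import tempfile


def pick_scratch_dir(env=None):
    """Return the first writable candidate for fast node-local storage."""
    env = os.environ if env is None else env
    # Assumption: LOCAL_SCRATCH points to the local NVMe disk when one is reserved.
    candidates = [env.get("LOCAL_SCRATCH"), "/dev/shm", tempfile.gettempdir()]
    for path in candidates:
        if path and os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    raise RuntimeError("no writable scratch directory found")


def run_with_local_io(input_path, work, scratch=None):
    """Copy input_path to local storage, call work(local_copy), then clean up."""
    workdir = tempfile.mkdtemp(dir=scratch or pick_scratch_dir())
    try:
        local_copy = shutil.copy(input_path, workdir)  # returns the new file path
        return work(local_copy)
    finally:
        # Always remove the working directory, even if work() raised
        shutil.rmtree(workdir, ignore_errors=True)
```

Keep in mind that files written to `/dev/shm` consume the job's memory, so the ramdisk suits small, frequently accessed files rather than large datasets.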
Good Practices for High-throughput Computing
Does your workflow involve running a large number of (short) batch jobs?
This poses problems for batch job schedulers such as Slurm, which are used in HPC systems
Short jobs also carry a scheduling overhead
Solution:
If you have fewer jobs, an easy solution is to use array jobs
Execute with minimal invocations of sbatch and srun
Use built-in options for farming-type workloads
Use external tools such as HyperQueue or GNU Parallel
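To illustrate how many short tasks can be packed into a few array jobs, here is a minimal sketch in which each array task processes a slice of the inputs instead of a single one. It assumes the job was submitted with a zero-based array range (for example `--array=0-9`) and reads the `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` variables that Slurm sets inside array jobs. The function name and the input file names are hypothetical.

```python
import os


def chunk_for_task(items, task_id, n_tasks):
    """Return the items handled by array task task_id (0-based) out of n_tasks."""
    # Strided slicing spreads the items evenly and covers each item exactly once
    return items[task_id::n_tasks]


if __name__ == "__main__":
    # Outside Slurm these variables are unset, so default to a single task.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    n_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))

    # Hypothetical list of inputs; in practice this might come from a file listing.
    inputs = [f"input_{i:04d}.dat" for i in range(1000)]

    my_inputs = chunk_for_task(inputs, task_id, n_tasks)
    print(f"task {task_id}/{n_tasks} handles {len(my_inputs)} inputs")
```

With 1000 inputs and 10 array tasks, the scheduler sees 10 jobs rather than 1000, and each job runs long enough to amortise its scheduling overhead.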
How About Complex Workflows in Scientific Computing?
How to Run and Manage Complex Workflows?
If running your jobs becomes more complex, e.g. requiring dependencies between subtasks, use workflow tools
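The core idea behind dependency handling can be sketched with the Python standard library: given which subtasks depend on which, compute an order in which every subtask runs after its prerequisites. This is only an illustration of the concept, using `graphlib.TopologicalSorter` (Python 3.9+); real workflow tools add scheduling, retries, and bookkeeping on top. The subtask names below are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return subtasks in an order that respects their dependencies.

    dependencies maps each subtask to the set of subtasks it depends on.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical workflow: each key must wait for the subtasks in its set.
    workflow = {
        "preprocess": set(),
        "simulate": {"preprocess"},
        "analyse": {"simulate"},
        "report": {"analyse"},
    }
    for step in execution_order(workflow):
        print("run", step)
```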