Mini-intro to the possibilities of using R on CSC’s supercomputer Puhti

Samantha Wittke, CSC (Geoinformatics specialist) Creative Commons License

CSC - IT Center for Science

  • Non-profit company producing IT services for research and higher education
  • Owned by the Ministry of Education and Culture (70%) and higher education institutions (30%)
  • Headquarters in Keilaniemi, Espoo
  • Additional offices and supercomputers in Kajaani

CSC services

research.csc.fi/en/service-catalog


Compute & Analyze

  • Web services and virtual machines in the cloud: cPouta / ePouta / Rahti

  • Heavy computations on the supercomputer: Puhti / Mahti / LUMI

  • Teaching and collaborating: CSC Notebooks

Store, Share & Publish Data

  • Project lifetime data storage: Allas

  • Share and publish data: Fairdata

  • Share and publish geospatial data: Paituli


Working with privacy related data: Sensitive Data (SD) services

Why use CSC services?

  • CSC specialist support
  • “Outsource” heavy/specialized computations
  • Free of charge for open science at Finnish universities and research institutes

Supercomputer

Main differences from your own computer:

  • Not faster, but bigger
  • Speedup comes from parallelism
  • More memory and CPUs (and GPUs) available (your application needs to make use of them!)
  • Heavy computations run non-interactively as batch jobs
  • You need to know which resources (cores, memory, time) your job needs
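The first two points can be made concrete with Amdahl's law, which is not part of the original slides: if a fraction p of a program's run time can be parallelized, n cores give a speedup of at most 1 / ((1 - p) + p / n). The small R function below is a sketch of that formula; the values of p and n are illustrative only.

```r
# Amdahl's law: upper bound on the speedup when a fraction p of the
# run time is parallelizable and that part is spread over n cores.
amdahl_speedup <- function(p, n) {
  stopifnot(p >= 0, p <= 1, n >= 1)
  1 / ((1 - p) + p / n)
}

# Illustrative values: 90 % of the run time is parallelizable
cores <- c(1, 2, 4, 10, 40)
round(amdahl_speedup(0.9, cores), 2)
# [1] 1.00 1.82 3.08 5.26 8.16
```

Even with 90 % of the work parallelized, the speedup can never exceed 1 / (1 - 0.9) = 10, however many cores are used.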

Possibilities - supercomputer

  • Use more memory/CPU/GPU than your own computer has available

    → analyse large files, train machine learning models


  • Speed up so-called embarrassingly parallel analyses (many identical but independent tasks)

    → doing the same thing to multiple map tiles/data chunks
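A minimal sketch of an embarrassingly parallel analysis in R, using the parallel package that ships with R. The per-chunk function and the random example data are hypothetical stand-ins for, e.g., processing one map tile. mclapply() relies on forking, which works on Linux (such as Puhti) but not with more than one core on Windows.

```r
library(parallel)

# Hypothetical per-chunk task standing in for processing one map tile
process_chunk <- function(values) {
  mean(values)
}

# Split example data into 10 independent chunks of 100 values each
set.seed(42)
chunks <- split(runif(1000), rep(1:10, each = 100))

# Sequential version: one chunk after another
res_seq <- lapply(chunks, process_chunk)

# Parallel version with forked worker processes
# (mc.cores > 1 is not supported on Windows, so fall back to 1 there)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- mclapply(chunks, process_chunk, mc.cores = n_cores)

# The chunks do not depend on each other, so both versions agree
stopifnot(identical(res_seq, res_par))
```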

Puhti supercomputer - Basics

Puhti supercomputer - Applications

  • CloudCompare
  • FORCE
  • GDAL/OGR
  • GRASS GIS
  • LAStools
  • MATLAB
  • OpenDroneMap
  • Orfeo Toolbox
  • PCL
  • PDAL
  • Python geospatial packages: geoconda
  • QGIS
  • R geospatial packages: r-env
  • SAGA GIS
  • SNAP, Sen2cor, sen2mosaic
  • WhiteboxTools
  • Zonation
  • Deep learning: PyTorch, TensorFlow

Something missing? Ask us :) servicedesk@csc.fi

Puhti supercomputer - r-env

Instructions for installing additional R packages can be found in CSC docs
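As a rough illustration only (the CSC docs describe the recommended procedure and location), packages missing from r-env can be installed into a project-specific library that is placed first on R's library search path. The directory /projappl/project_2001234/project_rpackages and the package name somePackage are hypothetical placeholders.

```r
# Prepend a user-writable library directory to R's package search path
add_user_library <- function(lib_dir) {
  if (!dir.exists(lib_dir)) {
    dir.create(lib_dir, recursive = TRUE)
  }
  .libPaths(c(lib_dir, .libPaths()))
  invisible(.libPaths())
}

# Hypothetical project directory: replace project_2001234 with your own project
lib_dir <- file.path("/projappl/project_2001234", "project_rpackages")
if (dir.exists("/projappl/project_2001234")) {
  add_user_library(lib_dir)
  # Packages installed like this go into lib_dir (hypothetical package name):
  # install.packages("somePackage", lib = lib_dir)
}
```

The .libPaths() call has to be repeated at the start of every R session or script, including batch jobs, that loads packages from this directory.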

Puhti supercomputer - Data availability

  • Large, commonly used geospatial datasets with open licenses
  • Removes the data transfer bottleneck
  • Located at: /appl/data/geo/
  • All Puhti users have read access
  • ~13 TB of datasets available:
    • Paituli data
    • SYKE open datasets
    • LUKE Multi-source national forest inventory
    • Virtual rasters for NLS DEMs
    • Sentinel and Landsat mosaics
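A minimal sketch of using the shared data in place from R, without copying it. Only the root path /appl/data/geo/ comes from the slides; the subdirectory layout is not shown here, so check the list of spatial data for actual locations. The helper list_geo_data() is hypothetical, and opening a .vrt file assumes the terra package is available.

```r
# Root of the shared geospatial data on Puhti (path from the slides)
geo_root <- "/appl/data/geo"

# List files in the shared data directory (or one of its subdirectories);
# returns an empty vector when the directory does not exist,
# e.g. on your own computer
list_geo_data <- function(subdir = "", pattern = NULL, root = geo_root) {
  path <- if (nzchar(subdir)) file.path(root, subdir) else root
  if (!dir.exists(path)) {
    return(character(0))
  }
  list.files(path, pattern = pattern, full.names = TRUE)
}

# Top-level contents of the shared data directory
print(list_geo_data())

# Hypothetical usage: open the first virtual raster (.vrt) found at the
# top level with terra; the data stays where it is
vrt_files <- list_geo_data(pattern = "\\.vrt$")
if (length(vrt_files) > 0 && requireNamespace("terra", quietly = TRUE)) {
  dem <- terra::rast(vrt_files[1])
  print(dem)
}
```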


List of spatial data in the computing environment

Running your own R script in Puhti

  1. Get a CSC user account.
  2. Log in to the Puhti web interface (www.puhti.csc.fi).
  3. Move your data and scripts to Puhti.
  4. Open RStudio.
  5. Check that the R packages you need are available.
  6. Update the paths of your input/output files.
  7. Test your script with a small test dataset.
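Steps 5 and 6 could look roughly like this at the top of a script. The package list and the project name project_2001234 in the /scratch path are hypothetical examples; replace them with your own.

```r
# Packages the script needs (example list; adjust to your script)
required_pkgs <- c("sf", "terra", "future")

# TRUE/FALSE per package, checked without attaching anything
pkg_available <- vapply(required_pkgs, requireNamespace, logical(1),
                        quietly = TRUE)
print(pkg_available)
if (!all(pkg_available)) {
  message("Missing packages: ",
          paste(names(pkg_available)[!pkg_available], collapse = ", "))
}

# Build input/output paths from one base directory instead of keeping
# paths from your own computer (hypothetical project name)
base_dir <- "/scratch/project_2001234"
input_file <- file.path(base_dir, "data", "input.gpkg")
output_dir <- file.path(base_dir, "results")
```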

Make use of the power of Puhti

  1. Write a batch job script.
  2. Run your script with all data as a batch job (or interactively).
  3. If needed, make use of several cores with the future package in your R code.
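When the script runs as a batch job, it can read the job's settings instead of hard-coding them. A sketch, assuming the batch job script requests cores with --cpus-per-task and starts the script with Rscript; my_script.R, test_data.gpkg and both helper functions are hypothetical names.

```r
# Number of cores requested with --cpus-per-task in the batch job script.
# Slurm exposes it as SLURM_CPUS_PER_TASK; fall back to 1 outside a job.
get_allocated_cores <- function() {
  n <- suppressWarnings(
    as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
  )
  if (is.na(n) || n < 1L) 1L else n
}

# Input file given on the command line (e.g. Rscript my_script.R input.gpkg),
# with a small test file as the default for interactive runs
get_input_file <- function(args = commandArgs(trailingOnly = TRUE),
                           default = "test_data.gpkg") {
  if (length(args) >= 1) args[1] else default
}

n_cores <- get_allocated_cores()
input_file <- get_input_file()
message("Running on ", n_cores, " core(s) with input ", input_file)
```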

“My R code runs slow, what can be done?”

  • Find out which part of the code takes time and why
    • Use system.time() or the tictoc package
  • Different R packages may provide the same functions but implement them differently, so some run faster than others
    • e.g. prefer sf over sp and terra over raster
  • Always be suspicious of for-loops! Vectorized functions are often much faster
  • Consider parallelization
  • The number of cores is not a speedup multiplier: 4 cores do not automatically make code 4 times faster
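A sketch of timing two implementations with system.time(), here an explicit for-loop against a vectorized sum(). The workload is a made-up example, and timings depend on the machine, so no output is shown.

```r
# Hypothetical workload: sum of squares of one million numbers
x <- runif(1e6)

# Version 1: explicit for-loop
sum_sq_loop <- function(v) {
  total <- 0
  for (value in v) {
    total <- total + value^2
  }
  total
}

# Version 2: vectorized base R
sum_sq_vec <- function(v) sum(v^2)

# system.time() measures how long each expression takes;
# the results are assigned so they can be compared afterwards
t_loop <- system.time(res_loop <- sum_sq_loop(x))
t_vec  <- system.time(res_vec  <- sum_sq_vec(x))
print(rbind(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))

# Same result up to floating-point rounding
stopifnot(isTRUE(all.equal(res_loop, res_vec)))
```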

Parallelization locally and on the supercomputer

Within R:

  • use package future
  • or snow, foreach, Rmpi,…
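A minimal sketch with the future package, assuming it is installed. The function slow_square() is a hypothetical placeholder for a slow, independent task, and 2 workers is an arbitrary example value.

```r
library(future)

# Start 2 background R sessions as workers (on Puhti, match this to the
# number of cores reserved for the job)
plan(multisession, workers = 2)

# Hypothetical slow task standing in for, e.g., processing one map tile
slow_square <- function(x) {
  Sys.sleep(0.5)
  x^2
}

# Each future() hands one task to a free worker; when all workers are
# busy, it waits until one becomes available
futures <- lapply(1:4, function(i) future(slow_square(i)))

# value() waits for a future to finish and returns its result
results <- vapply(futures, value, numeric(1))
print(results)
# [1]  1  4  9 16

# Return to sequential processing, which shuts down the workers
plan(sequential)
```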



Outside R:
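One common approach on Slurm systems such as Puhti (an assumption, not taken from the slides) is an array job: Slurm starts many copies of the same R script and gives each copy its index in the environment variable SLURM_ARRAY_TASK_ID. A sketch of the R side, with hypothetical tile file names and a hypothetical helper get_task_id():

```r
# Hypothetical inputs: one file per map tile
tile_files <- sprintf("tile_%02d.tif", 1:10)

# Index of this copy of the script within the array job (e.g. --array=1-10);
# defaults to 1 when run outside an array job
get_task_id <- function() {
  id <- suppressWarnings(
    as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))
  )
  if (is.na(id)) 1L else id
}

task_id <- get_task_id()
stopifnot(task_id >= 1, task_id <= length(tile_files))
my_tile <- tile_files[task_id]
message("Array task ", task_id, " processes ", my_tile)
```

Each copy processes only its own tile, so no communication between the copies is needed.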

Training



→ follow our training calendar

Support

docs.csc.fi

research.csc.fi


+ servicedesk@csc.fi

+ User support session in Zoom every Wednesday at 14.00

How we can help

→ servicedesk@csc.fi

→ CSC as a project partner or subcontractor

Summary - Why use a supercomputer?

⌛ Resource needs (time, memory, storage, GPU)

👾 “Outsource” heavy computations, keep own computer free

🏘 Prebuilt environments, application availability

📊 Run many experiments at the same time

🌐 Data availability

👥 Collaboration possibility

❓ CSC specialist support

💸 Free of charge for open science at Finnish universities and research institutes.

Getting started

Visit our Geocomputing page