CSC and GeoPortti services for research using spatial data

Agenda

Q&A

  1. How can I obtain time-series land cover data for Finland?

  2. If you don’t know which cloud service or supercomputer to use, where do you get the info for comparison?

    • A general introduction is here: https://research.csc.fi/computing
    • For heavy computing, use the supercomputers; if unsure, start with Puhti.
    • For web applications, use Rahti or cPouta; web developers usually understand the difference.
    • For databases, use cPouta (a new service for this is likely coming soon).
    • For node-locked licensed software, use cPouta.
    • cPouta also suits many other use cases.
  3. Any examples of sensitive data?

    • This can be, e.g., people’s health data or other personal information.
    • Field parcel data that includes, for example, information about the farmers can also be sensitive.
  4. Where can I find today’s materials?
    Workshop home-page: https://ssl.eventilla.com/event/pEb7a
    Older materials: https://research.csc.fi/geocomputing-seminars

As the Puhti intro session got cut short, here is the material in text format as well: https://docs.csc.fi/computing/running/getting-started/

  1. Are there instructions on how to use geocubes somewhere?

  2. How often are you updating the data in geocubes? E.g. new forest inventory data?

    • New versions are added yearly.
  3. How can I avoid timeout errors caused by proxy settings, for example in QGIS? Please share your tips here.

  4. Is there documentation available for multi-resolution processing of geocubes?

    • Currently no, but it may be added to the website.
  5. Sentinel 2 data in Allas: are the image tiles spatially and/or temporally harmonized or is this work ongoing?

  6. What are “buckets” in Allas?

    • The organizational unit of data in Allas, a kind of one-level pseudo-folder.
    • Allas does not have other organizational units.
    • S3 is a protocol for working with object storage; it is also used by Amazon Web Services S3 storage. Comparable to an Azure Storage resource.
  7. Is there a reason why there is no data from Eastern Finland?

    • Only the files needed for a specific use case have been downloaded; this could be extended.
    • And feel free to extend :)
    • There is probably data from Eastern Finland, but tiles that cover only a slice of Finland and lie mostly on the Russian side of the border might not be included. The data were originally downloaded for research projects, so the extent of the data downloaded (and uploaded to Allas) may vary between years depending on the focus of the project. The data gathering has not been systematic, apart from the annual time frame of the growing season. From 2021 onwards the whole country (all 70 tiles shown in the slides) should be available.
  8. Could you add here a sample Sentinel-2 bucket request from Allas to Puhti?
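
A minimal sketch of what such a request could look like, assuming the Allas command-line tools that CSC provides on Puhti (`module load allas`, then `allas-conf` to open a connection). The bucket name, object name and project directory below are hypothetical placeholders, not real Sentinel-2 buckets; take the actual names from the slides or from `a-list`.

```shell
#!/bin/bash
# Sketch only: BUCKET, OBJECT and project_xxxx are placeholders, not real names.
BUCKET="example-sentinel2-bucket"
OBJECT="example_tile.jp2"
TARGET_DIR="/scratch/project_xxxx"
OBJECT_PATH="$BUCKET/$OBJECT"

if command -v a-get >/dev/null 2>&1; then
    # Assumes 'module load allas' and 'allas-conf' were run in this shell on Puhti.
    a-list "$BUCKET"            # list the objects stored in the bucket
    cd "$TARGET_DIR" || exit 1  # a-get downloads into the current directory
    a-get "$OBJECT_PATH"        # fetch one object from Allas to Puhti
else
    echo "Allas tools not found: run 'module load allas' and 'allas-conf' on Puhti first."
fi
```

Outside Puhti the sketch only prints the reminder, so it is safe to try before the real names are filled in.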

The Facebook group is called gis velhot (if anyone is interested in joining :) )

Paituli webpage, Paituli web interface

  1. Might anyone know a source for yearly/growing-season temperature sum data for a given geolocation?

  2. Are the FMI data interpolated for locations where there are no observations?

    • Yes, it is interpolated from observations, but the interpolation also considers elevation, vegetation, water, etc., so it is rather advanced.
    • The dataset is described in: Aalto, J. A., Pirinen, P., Jylhä, K., 2016, New gridded daily climatology of Finland – permutation-based uncertainty estimates and temporal trends in climate, Journal of Geophysical Research: Atmospheres. ISSN: 2169-897X
  3. Hi, how do I get started so that I can log in to Puhti? Where do I log in?

Materials for the hands-on exercise with R: https://github.com/csc-training/geocomputing/tree/master/R/puhti#interactive-working

  1. Puhti OOD: what does ‘multithreaded’ mean?

    • Parallel computing support in R is mostly limited to two different options: multiprocessing and multithreading. Multiprocessing means that you launch multiple R processes (that is, many copies of R that run side by side). Typically one R process is allocated to one core. This is what most “multicore” R jobs do and it is the most common parallelisation method in R.
    • Multithreading is another way to use multiple cores and can speed up certain (linear algebra) operations. Direct, documented support for it in R is less common, although some packages do support it.
    • You can find some information about multithreading in R here: https://docs.csc.fi/support/tutorials/parallel-r/ (see under “multithreading”) and https://docs.csc.fi/apps/r-env/#improving-performance-using-threading
    • But note also: “If you do not know, then you likely do not need it.”
  2. How many cores does Puhti have for R to use?
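
As a small illustration of the multithreading answer above: one common way to control the thread count is to set `OMP_NUM_THREADS` from the cores that Slurm reserved for the job. This is only a sketch. It assumes the job was requested with `--cpus-per-task` (Slurm then sets `SLURM_CPUS_PER_TASK`) and that the R packages in use respect `OMP_NUM_THREADS`; the r-env page linked above describes the approach CSC recommends.

```shell
#!/bin/bash
# SLURM_CPUS_PER_TASK is set by Slurm inside a job that requested --cpus-per-task.
# Outside a job, or without that option, fall back to a single thread.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Threads available to multithreaded code: $OMP_NUM_THREADS"
```

Falling back to one thread keeps the behaviour predictable when the same lines are run outside a batch job.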

How to install your own R packages on Puhti: https://docs.csc.fi/apps/r-env/#r-package-installations

  1. (from Teams) It writes the new files, but I get the error message:
    Error in x$.self$finalize() : attempt to apply non-function
    • As long as it writes the files, it is OK to ignore this one.

You can get the project number to put into the batch job script from my.csc.fi

  1. What did this mean again: #SBATCH --time=0:05:00 ?

    • With this line we ask for 5 minutes of computing time, because we know (or guess) that our script will finish within 5 minutes. Slurm stops the job if it runs longer than the requested time.
    • you can find more information about available options here
  2. What is a ‘module’ in an HPC environment? For example, what does module load R do?
    • In the background it loads everything needed to run the R application (setting paths, etc.); i.e., it prepares the computing environment for working with R.
    • You can read more about modules here: https://docs.csc.fi/computing/modules/

  3. Could you also run R-batch script from a terminal inside R-studio?

    • The terminal in RStudio lives inside the RStudio container installation and does not have the sbatch command available.
    • Batch jobs therefore need to be submitted from a Puhti login node shell, opened either from the Puhti web interface or via an ssh connection.

To run the sbatch script, you need to open a login node shell from the Puhti web interface dashboard, not the terminal in the RStudio application.
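
The batch job answers above can be pulled together in a small sketch. It writes a minimal job script and submits it only if `sbatch` is available. The project number `project_xxxx`, the `test` partition, the resource values, the `r-env` module name and the `my_script.R` file are assumptions for illustration; the exact launch line for R on Puhti is given in the r-env documentation, and the generic `srun Rscript` line here is only a placeholder for it.

```shell
#!/bin/bash
# Write a minimal batch job script (all values are placeholders or assumptions).
cat > my_job.sh <<'EOF'
#!/bin/bash
# Project number from my.csc.fi
#SBATCH --account=project_xxxx
# Ask for at most 5 minutes of run time
#SBATCH --time=0:05:00
# Assumed partition for short test jobs
#SBATCH --partition=test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G

# Prepare the R environment (assumed module name, see the r-env docs)
module load r-env
# Generic placeholder launch line, replace with the one from the r-env docs
srun Rscript my_script.R
EOF

# Submit only where Slurm is available, i.e. in a Puhti login node shell.
if command -v sbatch >/dev/null 2>&1; then
    sbatch my_job.sh
else
    echo "sbatch not found: submit my_job.sh from a Puhti login node shell."
fi
```

Keeping the comments on their own lines, rather than after the `#SBATCH` options, avoids relying on how Slurm treats trailing text on a directive line.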

For instructions on how you can start working on your own projects, see: https://research.csc.fi/geocomputing

For instructions on the Puhti web interface: https://docs.csc.fi/computing/webinterface/

  1. How can I move my data to Puhti?
    • scp and rsync are powerful command-line tools for copying files:
      scp filename cscusername@puhti.csc.fi:/scratch/project_xxxx
      rsync -r foldername cscusername@puhti.csc.fi:/scratch/project_xxxx
    • Graphical tools for transferring files, e.g. FileZilla or WinSCP
      - SYKE: If you are working from home via VPN, it is faster to transfer files from a remote computer that you access with a Remote Desktop Connection (WinSCP needs to be installed on that remote machine).
    • Puhti web interface (via the Files tab)

If you want to connect to Puhti via your terminal/powershell: ssh <yourusername>@puhti.csc.fi

If you want to create a direct ssh connection to Puhti from the SYKE/LUKE network, I would recommend using the PuTTY software, where you can set our proxy settings correctly. Ask Antti-Jussi Kieloaho (LUKE) or Eetu Jutila (SYKE) for help.

  1. Where will the recording of the session be available?

    • It will be sent later to the registered participants
  2. If the session time expires unexpectedly, is the work still saved in the work directory? What if you click that red Delete button?

    • If you wrote the data to disk, then yes.
    • If writing is still ongoing when the time ends, the result is likely a corrupted file.
    • Even if you click that red Delete button, files already written to your scratch directory remain available.
    • It is always good to ask for a little extra time.
  3. In the last RStudio exercise, the Paituli files were available locally on Puhti at “/appl/data/geo/mml/dem2m/dem2m_direct.vrt”. Were they uploaded there by Kylli, or is that a central DEM file that is not specific to this exercise?

    • Everything provided in /appl/data/... is read-only for all users and maintained by CSC.
    • (The 2m DEM vrt creation that KE mentioned was added by KE to the Paituli URL files. The Puhti .vrt has been available for years.)
  4. If I want to use a raster covering Finland (and run analyses on it) that is not available at CSC, should I just upload it to my own project?

    • Yes, that would be the usual way to go.
    • If the files you need are available on the internet, you can download them directly to Puhti with tools like wget, for example from the login node shell.
    • If the files are in a URL- or S3-accessible location, you can also use them directly from the original location.
  5. I guess the STAC datasets are ready to use without any further pre-processing?

    • Hmm, that depends on the provider.
    • STAC provides a browsable metadata catalogue; the data that it links to is often already available,
    • so if you need exactly what the data provider offers, you do not need to do further pre-processing.
    • If you need, e.g., a special vegetation index from Sentinel data that is not available, you can use STAC to find the raw data for calculating the index.
  6. Do you already have examples of applying datacubes created from STAC catalogs in machine learning workflows?

  7. What happens with the GIS-Project after this workshop?

    • The project continues to exist, but your usernames will be removed soon (21.12), so after that you can no longer use the resources of that project. To continue working on Puhti you need your own project; see also: https://docs.csc.fi/accounts/
  8. How about this HedgeDoc with all the shared links?

    • We will archive it, make sure there is no personal information in it, and then link it from the event page. This page might be removed at some point, but we will then also provide the contents via our research.csc.fi/geocomputing page together with the links to the slides.
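
For the earlier question about rasters that CSC does not provide, the two options mentioned (downloading a copy, or reading the file from its original location) could look roughly like this. This is a sketch under assumptions: the URL and project directory are hypothetical placeholders, and reading in place is shown with GDAL's `/vsicurl/` virtual file system, which is one possible tool rather than the method used in the workshop. The commands are printed instead of executed because the URL is not real.

```shell
#!/bin/bash
# Hypothetical URL and target directory: replace both before use.
RASTER_URL="https://example.org/data/my_raster.tif"
TARGET_DIR="/scratch/project_xxxx"

# Option 1: download a copy to Puhti from the login node shell.
DOWNLOAD_CMD="wget --directory-prefix=$TARGET_DIR $RASTER_URL"

# Option 2: read the file in place through GDAL's /vsicurl/ virtual file system.
VSI_PATH="/vsicurl/$RASTER_URL"
INSPECT_CMD="gdalinfo $VSI_PATH"

# Print the commands rather than run them, because the URL is a placeholder.
echo "$DOWNLOAD_CMD"
echo "$INSPECT_CMD"
```

Downloading suits data that is read many times; reading in place avoids using scratch space for files that are only needed once.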