CSC / BioExcel workshop - GROMACS workflows and advanced topics

The place for all workshop information that you can edit using hackmd. Ask questions at the very bottom of this page!


Code of Conduct

We strive to follow the Code of Conduct developed by The Carpentries organisation to foster a welcoming environment for everyone. In short:

Before the workshop

Let us know something about you.

Things to watch

We suggest you watch the recorded lectures before the tutorials, to make it easier to follow along and to make the most of the time with the trainers. We will recap things briefly before each session.

QM/MM

QM/MM intro lecture Part 1 (30 min)
QM/MM intro lecture Part 2 (30 min)

Also available on YouTube (Part 1 and Part 2).

Free energies with pmx

Free energy calculations with pmx (60 min)

Things to prepare on your computer

Installing a current GROMACS version and Python packages

Please follow these installation instructions

Software check

Send your flash talk slide

At the workshop

Connecting

Schedule

All times are in CET.

Day 0, Thursday 2021-01-28
15:00–17:00 Software requirements check (Christian Blau)

Day 1, Monday 2021-02-01
09:00–09:30 Introduction, organization (Christian Blau, Atte Sillanpää)
09:30–11:30 QM/MM simulations with GROMACS and CP2K (Dmitry Morozov)
11:30–12:30 Lunch break
12:30–14:00 Participant flash talks on their research and/or interests
14:00–… Hangout in gather / Q&A with Dmitry

Day 2, Tuesday 2021-02-02
09:00–11:30 HPC workflows with GROMACS supported by BioBB (BioExcel building blocks) / PyCOMPSs (Adam Hospital)
11:30–12:30 Lunch break
12:30–14:00 GROMACS: a Python down-to-the-metal interface for in-memory toolchains with GROMACS (Eric Irrgang)
14:00–… Hangout in gather / Q&A

Day 3, Wednesday 2021-02-03
09:00–11:30 pmx: free energy calculations (Yuriy Khalak, Vytautas Gapsys)
11:30–12:30 Lunch break
12:30–14:00 Performant GROMACS setup on GPUs (Christian Blau)
14:00–14:15 Wrap-up / feedback
14:15–… Hangout in gather / Q&A

About you

Name | Interests | Something about you
Christian Blau | GROMACS code development; combining simulation and experimental data (mainly cryo-EM); statistical physics | Started his career in Göttingen, now in Stockholm
Dmitry Morozov | QM/MM code development in GROMACS; reactions in condensed phases; enzymes and photochemical/photobiological systems | Received his PhD from Moscow State University, then moved to the University of Jyväskylä; now working within the BioExcel-2 CoE
M. Eric Irrgang | Python as glue between high-performance C++ libraries; improving the application of modern computing resources to scientific software | Research scientist in the Kasson Lab at the University of Virginia
Atte Sillanpää | Supporting researchers using HPC in chemistry and all disciplines; workflows, clever data management, and sharing are the next big things, in addition to getting maximum performance from the hardware | CSC development manager with a background in computational chemistry; beach volleyball in his free time
Adam Hospital | Biomolecular simulation workflows; tool interoperability; HPC and HT biomolecular simulations | Background in computational science, PhD in biotechnology; leads the BioExcel-2 workflows team; video games in his free time (free time??)
Vytautas Gapsys | Molecular dynamics simulations; alchemical free energy calculations; pmx development | Background in computational biophysics and basketball

Training account assignment

Claim one of the accounts below, and replace XXX above with that number. Use that account for the rest of the workshop. Password is given in the chat.

Change to the scratch area and make a folder for yourself:

cd /scratch/project_2000745
mkdir trainingXXX   # replace the XXX with your account number
cd trainingXXX      # same here

Using GPUs

Use Puhti

Then use ssh to connect to Puhti, like

ssh training0XX@puhti.csc.fi

The same password as yesterday is used (pasted in the Zoom chat).

All work on Puhti should be done in the scratch folder, which is visible to all compute nodes. You might need to make your own working directory if it is not already there. The inputs for today’s exercises are in an archive file that you can download and then extract:

cd /scratch/project_2000745
mkdir training0XX   # if not already existing
cd training0XX
wget https://kth.box.com/shared/static/4dp0kbeasgesusedua3mu50vn1tk4rjw.gz
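The file name does not reveal the archive type; assuming the download is a gzipped tar archive (the usual case for exercise bundles), extraction would look like this, with tar auto-detecting the compression:

tar xvf 4dp0kbeasgesusedua3mu50vn1tk4rjw.gz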

Note: add the following line to all batch scripts:

#SBATCH --nodes=1
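For orientation, a minimal sketch of what a complete Puhti batch script with that line could look like; the partition, GPU, and module names here are assumptions rather than the tutorial's exact values:

#!/bin/bash
#SBATCH --account=project_2000745
#SBATCH --partition=gpu            # assumption: name of the GPU partition
#SBATCH --nodes=1                  # the line to add to every batch script
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10         # 10 cores pair well with one GPU (see the Q&A below)
#SBATCH --gres=gpu:v100:1          # assumption: request one V100 GPU
#SBATCH --time=00:30:00

module load gromacs                # assumption: module name on Puhti
srun gmx mdrun -deffnm md -ntomp $SLURM_CPUS_PER_TASK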

Q&A

The CSC hardware has 4 GPU cards and 40 CPU cores per node. As the hardware topology shows, 10 CPU cores are optimally positioned to access each GPU; not requesting more than 10 cores also leaves the other GPU cards available for others. Still, from a more fundamental perspective it is hard to say in advance, so you could/should try it and look at the performance. Generally it is more efficient to use 1 GPU per simulation but run more simulations in parallel; using more than 1 GPU per simulation needs extra tweaking.
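As a concrete illustration of the one-GPU-per-simulation advice, a single run could be launched like this (standard mdrun options; the offload flags beyond -nb are optional, and the input name is an example):

gmx mdrun -deffnm md -ntmpi 1 -ntomp 10 -nb gpu -pme gpu -bonded gpu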

PMX

Let’s use the “develop” branch of pmx for this tutorial. The installation instructions can be found here: https://github.com/deGrootLab/pmx/blob/develop/tutorials/INSTALL
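The linked INSTALL file is authoritative; a typical pattern for getting the "develop" branch would be the following sketch, assuming a pip-based install:

git clone --branch develop https://github.com/deGrootLab/pmx.git
cd pmx
pip install .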

Q&A

That is true. Things are improving with newer versions of GROMACS, but there is still a penalty in these runs, because they use less-optimized compute routines that interpolate between the topologies and calculate the dH/dlambda values.

That works as well. You would have to create a different topology for lambda = 1 and you won’t get dH/dlambda values, but otherwise you are good to go.

pmx itself does not implement force fields - you can use any all-atom force field that GROMACS can run.

There is an issue with further CMAP corrections for specific amino acids - these are not supported by GROMACS at the moment.

If you extract only the Lys from the topology (with temporary capping of the peptide bonds) and similarly extract the acetylated form, you could feed them to pmx as if they were ligands. The resulting mixed topology can be inserted back into the protein topology.

Equilibrium approaches simulate the equilibria of all the intermediate states and, for good convergence, need structural overlap between neighboring states. The non-equilibrium approach instead captures these intermediate states by running many short transitions from end state A to B and from B to A. Alternatively, if you do not want to run more repeats, the pmx analysis script can also bootstrap the available work values (an optional parameter) to produce an error estimate that is closer to the real uncertainty.

In my experience, run multiple repeats. The analytical error the BAR estimator reports is typically an underestimation of the true uncertainty.

Yes, 2020.4 is usually sufficient for running through the tutorial. 2020.5 received some bugfixes (https://manual.gromacs.org/2020.5/release-notes/2020/2020.5.html), so in general it is a good idea to run with the latest GROMACS.

Yes, there are some hard-coded rules: pmx will not morph between ring and non-ring atoms, and it avoids bond breaking (it does not generate disconnected fragments). It is also better not to morph between light hydrogen atoms and heavy atoms. A very large perturbation is not a limitation of pmx itself, but the simulation will not converge; the transitions might have to be run much more slowly.

Try “conda install -c rdkit rdkit” from within the conda environment, then restart the notebook.

The GROMACS binaries need to be in $PATH when the notebook is launched. If using bash, for example, you can “source /path/to/gmx/installation/bin/GMXRC” to set all the appropriate environment variables.
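For example (the installation path is a placeholder for your own):

source /path/to/gmx/installation/bin/GMXRC   # puts gmx on PATH and sets the GROMACS environment variables
which gmx                                    # verify the binary is now found
jupyter notebook                             # launch the notebook from this same shell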

When you transform between ligands with different net charges, you end up with a net charged system in at least one end state. Ewald-based long range electrostatics will have issues if the system has a net charge (there will be an effective canceling charge smeared across the whole simulation box). That’s the origin of the issue. The solution I have used in the past was the double system single box approach: combine both the protein and water parts of the thermodynamic cycle in one simulation box, but separated by a large distance. Namely, ligand_A morphs into ligand_B in the active site while ligand_B simultaneously morphs into ligand_A in the solvent. This ensures the system charge is always constant. To prevent the unbound ligand interacting with the protein one should also restrain the two to be far apart. The downside is that one has to simulate a lot of extra solvent with this approach.

If you modify the ligand, you will have to find the parameters for the ligand and the modified ligand yourself; for amino acids you can use pre-calculated information available within pmx. You can also modify the amino acids instead of the ligand. You just need to make sure the thermodynamic cycle still gives you the deltaGs you are looking for. For example, if you want the free energy difference for ligand binding to WT vs. a mutant, you only need the protein part of the thermodynamic cycle.

On the to-do list; this depends on using the pmx develop branch.

You can do that by updating the job scripts.

Use the force-field parameters; if you are only changing the ion species, you can even manipulate the topology by hand.

We support anything coming up; the only limit is GROMACS support for the force field.

Barostats: the ones that reproduce physical behaviour. Thermostats can be tricky due to dummy-atom acceleration, so we use stochastic dynamics instead.

Free energies on GPU: the free energy kernel itself is implemented for CPU only, so GPUs do not directly help to speed up this part of the simulation. The GPUs of course help with the simulation speed of the rest of the system, but the CPU-based free energy kernel remains the bottleneck.

Questions

2020.2 is fine for most tutorial sessions - there have been some bugfixes since, but they should not affect the results in the tutorials. Generally we would suggest always using the latest version; for the sake of learning the functionality, 2020.2 is sufficient. Christian Blau
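If you are unsure which version your environment provides, the standard CLI can tell you:

gmx --version | grep -i 'GROMACS version'   # prints the version line of the banner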

Thanks for pointing this out - can you let me know where you found that discrepancy? Christian Blau

By following this link to the GROMACS workshop installation instructions: https://blauc.github.io/gromacs-workshop-installation/. When you open the Jupyter notebook tutorial it asks for GROMACS 2020.4, but following the link in the conda instructions it asks for 2020.5. It’s just a minor detail; all I wanted to know was whether the 2020.2 version is sufficient. Thanks

Sessions

QMMM

Preparation

Please watch this before the tutorial.

The CP2K GROMACS interface will be available on Mahti, you will not have to install it yourself.

Logging in Mahti

You will receive the password to log in to Mahti on Monday morning.

[Diagram: from your computer, open a terminal and ssh trainingXXX@mahti.csc.fi; Mahti provides a patched GROMACS with CP2K via gmx_cp2k]
ssh trainingXXX@mahti.csc.fi

Watch how to do this on YouTube

QMMM Tutorial Handbook

We will follow this pdf during the tutorial - note that we might update things, so make sure to download it again just before the tutorial starts.

Updating the run script

# replace 2003478 with 2003487 in the run script
sed -i 's/2003478/2003487/' run.sh

Additional material

Dmitry's presentation
Best Practice Guide

Q&A CP2K for QM/MM Workshop

No, it is sufficient that you use ssh. Christian Blau

You will receive the password tomorrow. Christian Blau

With CP2K you are mostly limited by the size of the QM box rather than the actual number of QM atoms. The QM part can include a large number of atoms, up to 200, sometimes even more.

The value of the force will normally depend on your reaction coordinate. For a dihedral rotation, you can choose the value based on the dihedral parameters in the force field, or start with a high value of 500 kJ mol-1 nm-1 and lower it to check whether it affects the results. If you are pulling an ion or a peptide, you might need much higher values, >1000 kJ mol-1 nm-1.

Yes, as long as you use PLUMED features that only add additional forces to the system. If the topology is modified, there is no check that the modifications don’t interfere with QM/MM, so that is a bit tricky.

The tokens are for passing the system coordinates to CP2K together with the MM charges, which are treated using the GEEP method.

No, there is only MiMic at the moment.

This instructs the queuing system and GROMACS to place the threads correctly on the hardware. Omitting this, or choosing a bad option, can kill performance completely. Note that on other (super)computers you might need different flags.

Yes, it is possible to run an independent QM calculation (no need of using the interface for such calculations).

Yes, that’s a challenging thing, though.

You can use any method that CP2K supports, GROMACS and CP2K are decoupled in this sense.

It is up to the user to decide whether the energies are good or not. The best approach is always to compare with experimental absorption spectra and benchmark different DFT functionals.

In the presentation all input sections are commented: https://github.com/bioexcel/gromacs_cp2k_tutorial/raw/master/Introduction_to_QMMM-Tutorial-CSC.pptx

Practically they are not connected; you can use any force field you want, e.g. CHARMM or Amber. As of now, there is no direct influence between the QM parameters and the MM force field.

Biobb

[Diagram: on your computer, open a terminal and run conda activate biobb_Protein-Complex_MDsetup_tutorial, …]

Please watch the Protein-Ligand Complex GROMACS Setup and follow the instructions to reproduce the Jupyter Notebook on your own machine.
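Once the environment from the linked instructions is in place, starting the notebook looks like this (the environment name is taken from the diagram above):

conda activate biobb_Protein-Complex_MDsetup_tutorial
jupyter notebook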

Main Web Page
Slides
Intro Tutorial

Q&A

Python API

For showcasing the python API we need:

As this is a complex installation procedure, we provide a Singularity image on Puhti with everything installed for you. To fire up the installed software on Puhti and set up a connection so that you can see directly in your browser what happens on Puhti, follow the instructions below.

The alternative is to install everything locally on your own machine - something we would like to encourage if you plan to use gmxapi functionality yourself later.

To mitigate download delays, please docker pull gmxapi/tutorial as soon as possible. Docker images are distributed in layers, so if there are updates before the workshop session, it will take much less time to synchronize with another docker pull later than to do the full download.

Last update: 11:17am CET, 2 February

If you pulled the Docker image before 5:37pm CET, the next image update will not be small. Please pull again soon, if you can.

Accessing workshop content

Option 1: Install locally

  1. Set up whatever style of Python virtual environment you like.
  2. Install GROMACS 2021 and the gmxapi Python package: Instructions
  3. Get the workshop content archive. Visit the following Google Drive link to get gmxapi-introduction.tgz
  4. Put the archive somewhere you want it. Then,
    tar xvf gmxapi-introduction.tgz
    cd gmxapi-introduction
    
  5. Install additional Python requirements for the workshop: pip install -r requirements.txt
  6. Run jupyter notebook and open the indicated local URL.
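Steps 3–6 condensed into one shell session, as a sketch (step 1's virtual environment and step 2's GROMACS/gmxapi installation are assumed to be done already):

tar xvf gmxapi-introduction.tgz
cd gmxapi-introduction
pip install -r requirements.txt
jupyter notebook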

Note: The archive was initially incomplete. Please update the local copy of the content before the workshop session if downloaded before the last update time given above.

Option 2: Docker container

  1. docker pull gmxapi/tutorial
  2. Assuming port 8888 is available locally, docker run --rm -ti -p 8888:8888 gmxapi/tutorial
  3. Open the indicated local URL.
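If local port 8888 is already taken, you can map a different local port onto the container's port 8888 and adjust the URL accordingly:

docker run --rm -ti -p 9999:8888 gmxapi/tutorial
# then open the indicated URL with the port changed to 9999, e.g. http://localhost:9999/...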

Note that --rm guarantees that abandoned containers don’t accidentally fill up your hard drive, but it also means that local changes are lost when you stop the container.

Consider saving notebooks through the web interface, or refer to Docker documentation for how to take snapshots, if needed.

Option 3: Tunneling to the Singularity container

[Diagram: ssh trainingXXX@puhti.csc.fi and start the Singularity image; read the ssh tunnel command and browser address from its output; in a new local terminal, run the ssh tunnel command; open the browser address, open the Jupyter notebook, and run the tutorial]

Singularity tips

Watch how to do this on YouTube

Start a singularity image

Open a connection to Puhti and set up the environment:

ssh trainingXXX@puhti.csc.fi

wget https://a3s.fi/Gromacs_utilities/gromacs_2021.tar.gz
tar -xavf gromacs_2021.tar.gz 
cd gromacs_2021

curl -L -o gmxapi-introduction.tgz "https://drive.google.com/uc?export=download&id=1zhERf8OI_Zr5AbyofNUnczqgDdvc8j_u"
tar xvf gmxapi-introduction.tgz

Set up an interactive batch job and launch the container:

sinteractive -c 1 -m 4G -d 250                                    # request 1 core, 4 GB memory, 250 GB local disk
singularity run -B /users/$USER gromacs2021-notebook-puhti.sif    # bind-mount your home directory into the container

This will print out two options for tunneling the connection to your local machine AND the command for opening the notebook in your browser. Which option applies depends on whether you have set up ssh keys for the connection. Leave this terminal open.

Start ssh tunnel

Open a new local terminal.

You need a unique port for your connection; to get one, copy the tunnel command from the output of the singularity command above.

The command to copy should look similar to:

ssh -L 5010:localhost:5010 training010@puhti.csc.fi ssh -L 5010:localhost:8888 training010@r07c51.bullx

Open notebook in your browser

Open your local browser and type the address from the output of the singularity command above into the address field.

The address is similar to:

http://localhost:5010/?token=29fe5bdd3b3116ceae444fc16ace92e875281229801356a0

Follow up and additional information

Q&A

  1. gmxapi sample software illustrates using the Python buffer protocol for passing data between components or software layers.
  2. The gmxapi data model requires additional work that we had hoped to coordinate with the GROMACS dev team. It seems clear that we need to be able to represent opaque objects generally, in addition to just array buffers and function pointers. Hopefully we’ll see a general solution emerging for gmxapi 0.3.
  3. In the case of simulation input, specifically, there is a new C++ abstraction that we will be developing this year, so expect a much more useful and performant SimulationInput moving forward.

For the example shown (we were editing a TPR file) the simulation input was fully specified by grompp. We only modified existing values with modify_input. Preexisting values were simply copied from the source to the destination.

In the Python context, compare the semantics of the concurrency module, the asyncio module, and Parsl. Also compare these with C++ std::future. See also Wikipedia.

The emerging low-level interface for developing against GROMACS is the MDModules set of interfaces, but this generally still requires patching GROMACS source. For an example of run-time (dynamic) extension of a subset of IForceProvider (a part of the MDModules framework), see the Restraint headers and the sample restraint.

Participant flash talks

How to share files with Atte

module load allas                       # load the Allas object-storage tools
allas-conf                              # authenticate and configure the connection
a-publish -b bucket-name yourfile.tgz   # publish the file in a public bucket

a-publish prints a public URL (the same https://a3s.fi/... pattern as the gromacs_2021.tar.gz download link above) that you can then share.

Feedback

Just wanted to say that I found exactly what I expected from the workshop: cool ways to tame multiple GROMACS simulations. Thank you so much. It would also be interesting to learn about GROMACS-PLUMED workflows and ways of efficiently automating trajectory analysis. Danila Iakovlev