Linux/mac
ssh training0XX@puhti.csc.fi (replace XX with your account number)
Windows/PuTTY
host: puhti.csc.fi
login as: training0XX (replace XX with your account number)
In Puhti, check your environment with the command:
csc-workspaces
Switch to the scratch directory of your project
cd /scratch/project_2002389
And create your own sub-directory, named after your training account:
mkdir training0XX (replace XX with your account number)
Set the directory permissions so that other group members can read the contents but not modify them:
chmod g-w training0XX
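For a directory, r lets group members list the file names, x lets them enter the directory and access the files in it, and w lets them add, delete or rename entries. Removing w alone therefore keeps the contents readable but unmodifiable, while removing x as well locks group members out of the directory. A small sketch comparing the two on throwaway directories (demo_w and demo_wx are hypothetical names), assuming GNU stat as on Puhti:

```shell
#!/usr/bin/env bash
# Sketch: compare "chmod g-w" and "chmod g-wx" on two throwaway directories,
# so your real training0XX directory is not touched.
set -euo pipefail
tmp=$(mktemp -d)
mkdir "$tmp/demo_w" "$tmp/demo_wx"
chmod 775 "$tmp/demo_w" "$tmp/demo_wx"   # start both from drwxrwxr-x
chmod g-w  "$tmp/demo_w"                 # group keeps r and x: can list and enter
chmod g-wx "$tmp/demo_wx"                # group also loses x: cannot enter
mode_w=$(stat -c '%A' "$tmp/demo_w")     # GNU stat syntax
mode_wx=$(stat -c '%A' "$tmp/demo_wx")
echo "g-w : $mode_w"                     # drwxr-xr-x
echo "g-wx: $mode_wx"                    # drwxr--r-x
rm -r "$tmp"
```

On the real scratch directory the group column may also show an s (setgid) bit inherited from the project directory; that does not change the read/write logic above.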
Move to the new directory:
cd training0XX
Next, download a dataset from the internet and uncompress it. The dataset contains some Pythium genomes and the related BWA indexes.
curl https://a3s.fi/course_12.11.2019/pythium.tgz > pythium.tgz
ls -ltr
tar zxvf pythium.tgz
ls -ltr
tree pythium
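curl saves whatever it receives, so an interrupted download can leave a truncated pythium.tgz that only fails partway through extraction. A cautious sketch: test the archive with gzip -t and list its members with tar tzf before extracting. To keep the example runnable anywhere, it builds a tiny hypothetical demo.tgz in a temporary directory; on Puhti you would run the last four commands on pythium.tgz instead.

```shell
#!/usr/bin/env bash
# Sketch: check and preview a .tgz archive before extracting it.
# demo.tgz is a hypothetical stand-in built here so the example is self-contained.
set -euo pipefail
work=$(mktemp -d)
cd "$work"
mkdir -p demo/genome
printf '>seq1\nACGT\n' > demo/genome/demo.fasta
tar czf demo.tgz demo && rm -r demo
gzip -t demo.tgz      # non-zero exit status if the file is truncated or corrupt
tar tzf demo.tgz      # list the archive members without extracting anything
tar zxf demo.tgz      # extract quietly (zxvf also prints each file name)
ls -R demo
```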
Then move one level up in the directory hierarchy, create a directory called cellulose_synthase and move to this new directory:
cd ..
mkdir cellulose_synthase
cd cellulose_synthase
Next, we use the NCBI EDirect tools (https://docs.csc.fi/apps/edirect/) to retrieve some data:
Check how many proteins are found in the NCBI protein databases for Pythium species (see the Count row in the results):
esearch -db protein -query "Pythium [ORGN]"
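esearch prints a short XML summary, and the number of hits is in its Count element. A minimal sketch of a helper that prints only that number (entrez_count is a hypothetical name, not part of EDirect); the demo input is a made-up summary, so it runs without network access:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the number inside <Count>...</Count>
# from the XML summary that esearch writes to standard output.
entrez_count() {
  sed -n 's:.*<Count>\([0-9]*\)</Count>.*:\1:p'
}

# Offline demo on an invented esearch summary (42 is not a real result):
printf '%s\n' '<ENTREZ_DIRECT>' '  <Db>protein</Db>' '  <QueryKey>1</QueryKey>' \
  '  <Count>42</Count>' '  <Step>1</Step>' '</ENTREZ_DIRECT>' | entrez_count   # prints 42
```

On Puhti you could then run, for example, esearch -db protein -query "Pythium [ORGN]" | entrez_count. EDirect's own xtract tool should give the same number with xtract -pattern ENTREZ_DIRECT -element Count.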
Then check the number of cellulose synthase 1, cellulose synthase 2 and cellulose synthase 3 proteins that are found for Pythium species.
For cellulose synthase 1 this can be done with:
esearch -db protein -query "Pythium [ORGN] AND cellulose synthase 1 [PROT]"
(Do the same for 2 and 3.)
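Instead of editing the query by hand for 2 and 3, the three searches can also be run in a loop. A sketch assuming bash; cesy_query is a hypothetical helper that only builds the query string, and the searches run only where esearch is found on the PATH:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the protein query for cellulose synthase number $1.
cesy_query() {
  printf 'Pythium [ORGN] AND cellulose synthase %s [PROT]' "$1"
}

# Run the three searches only where EDirect is installed (for example on Puhti).
if command -v esearch >/dev/null 2>&1; then
  for n in 1 2 3; do
    echo "== cellulose synthase $n =="
    esearch -db protein -query "$(cesy_query "$n")"
  done
fi
```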
Retrieve the cellulose synthase 3 sequences in FASTA format:
esearch -db protein -query "Pythium [ORGN] AND cellulose synthase 3 [PROT]" | efetch -format fasta > cesy3.fasta
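Each FASTA record starts with a header line beginning with >, so counting those lines gives the number of sequences you retrieved, which you can compare with the count reported by esearch. A minimal sketch (fasta_count is a hypothetical helper name) demonstrated on a small made-up file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the sequences in a FASTA file ($1)
# by counting the header lines that start with ">".
fasta_count() {
  grep -c '^>' "$1"
}

# Offline demo on an invented two-sequence file:
demo=$(mktemp)
printf '>seqA\nMKV\n>seqB\nMKL\n' > "$demo"
fasta_count "$demo"   # prints 2
rm "$demo"
```

On Puhti the same check would be fasta_count cesy3.fasta.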
Then run an esearch command that tells how many cellulose synthase 3 sequences there are in total in the NCBI protein database.
Extra exercises for the fast ones: align the cellulose synthase 3 set with mafft
mafft cesy3.fasta > cesy3_aln.fasta
And study the results:
infoalign cesy3_aln.fasta
showalign cesy3_aln.fasta
Check the options of enaDataGet with command:
enaDataGet -h
Download a file (Pythium iwayamai genome assembly)
enaDataGet AKYA02000000 -f fasta
gunzip AKYA02.fasta.gz
ls -ltr
Extra exercise for the fast ones: study the downloaded file:
head -20 AKYA02.fasta
tail AKYA02.fasta
infoseq_summary AKYA02.fasta
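To cross-check what infoseq_summary reports, a rough awk-only summary of a FASTA file can be useful. A sketch (fasta_summary is a hypothetical helper); it counts every non-whitespace character on sequence lines, so it does not distinguish bases from N characters:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough FASTA summary of file $1 with awk:
# number of sequences, total length and longest sequence.
fasta_summary() {
  awk '/^>/ { n++; if (cur > max) max = cur; cur = 0; next }
       { gsub(/[ \t\r]/, ""); cur += length($0); total += length($0) }
       END { if (cur > max) max = cur
             printf "sequences=%d total_length=%d longest=%d\n", n, total, max }' "$1"
}

# Offline demo on an invented two-contig file:
demo=$(mktemp)
printf '>c1\nACGT\nAC\n>c2\nACG\n' > "$demo"
fasta_summary "$demo"   # prints sequences=2 total_length=9 longest=6
rm "$demo"
```

On Puhti: fasta_summary AKYA02.fasta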
Then compare the cellulose synthase 3 sequences against the genome using BLAST
pb tblastn -query cesy3.fasta -dbnuc AKYA02.fasta -out blast_result.txt
Upload the data from Puhti to Allas with rclone. Use the command below (replace XX):
rclone -P copyto pythium allas:training0XX-genomes-rc/
• How long did the data upload take?
• What was the transfer rate?
• How long would it take to transfer 100 GB at the same speed?
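The last question is proportional arithmetic: time = data size / transfer rate. A small sketch of the calculation, assuming decimal units (1 GB = 1000 MB) and a rate in MB/s read from the rclone -P progress output; depending on the rclone version the rate may be shown in MiB/s, which you would convert first. transfer_minutes is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimated transfer time in minutes.
# $1 = amount of data in GB, $2 = measured rate in MB/s (decimal units assumed).
transfer_minutes() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", gb * 1000 / rate / 60 }'
}

transfer_minutes 100 25   # 100 GB at an example rate of 25 MB/s: prints 66.7
```

For example, at 25 MB/s, 100 GB is 100 000 MB / 25 MB/s = 4000 s, a little over an hour.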
Then study what you have uploaded to Allas with the following commands (replace XX):
rclone lsd allas:
rclone ls allas:training0XX-genomes-rc/
rclone lsl allas:training0XX-genomes-rc/
rclone lsf allas:training0XX-genomes-rc/
Check what this looks like in the Pouta web interface. Open a browser and go to: https://pouta.csc.fi/
In the Pouta interface, go to the “object store” section and list the buckets (here called “Containers”).
Locate your own training0XX-genomes-rc bucket and download one of the uploaded FASTA files to your local computer.
Upload the pythium directory from Puhti to Allas using the following commands
(replace XX with your account number)
Case 1: Store everything in one object
a-put pythium
a-list
a-list 2002389-puhti-SCRATCH
a-info 2002389-puhti-SCRATCH/training0XX/pythium.tar.zst
Case 2: Each subdirectory (species) as one object
a-put pythium/*
a-list 2002389-puhti-SCRATCH/training0XX
a-check pythium/*
a-info 2002389-puhti-SCRATCH/training0XX/pythium/pythium_vexans.tar.zst
Case 3: Use your own bucket name
a-put pythium/* -b training0XX-genomes-ap
a-list training0XX-genomes-ap
Case 4: Upload files without compression.
a-put --nc pythium/pythium_vexans/bwaindex/* -b training0XX-a_vexans_bwa
a-list training0XX-a_vexans_bwa
Can you see the difference between the four a-put commands above?
Study the training0XX-genomes-ap bucket with the commands:
a-list training0XX-genomes-ap
rclone ls allas:training0XX-genomes-ap
Why do the two commands above list a different number of objects?
Try command:
a-info training0XX-genomes-ap/pythium_vexans.tar.zst
which is actually the same as:
rclone cat allas:training0XX-genomes-ap/pythium_vexans.tar.zst_ameta
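The a-info / rclone cat pair above shows that a-put keeps a metadata object named after the uploaded object with an _ameta suffix, and rclone lists these as ordinary objects. A sketch of a filter (hide_ameta is a hypothetical name) that drops them from an rclone listing, which should make the count comparable with a-list; the demo lines and sizes are made up:

```shell
#!/usr/bin/env bash
# Hypothetical filter: remove the "_ameta" metadata objects that a-put
# creates from an "rclone ls" style listing read on standard input.
hide_ameta() {
  grep -v '_ameta$'
}

# Offline demo on invented listing lines:
printf '%s\n' '  1048576 pythium_vexans.tar.zst' '      512 pythium_vexans.tar.zst_ameta' | hide_ameta
```

On Puhti: rclone ls allas:training0XX-genomes-ap | hide_ameta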
Finally, try the command:
a-flip pythium/pythium_vexans/pythium_vexans.fasta
Try opening the public link that a-flip produced in your browser.
Run commands:
allas-backup --help
allas-backup pythium
allas-backup list
What did these commands do for your data?
The data in the pythium directory is now stored in Allas in several ways, so we can remove it from Puhti and log out:
rm -r pythium
exit
1. Log in to puhti.csc.fi
Linux/mac
ssh training0XX@puhti.csc.fi (replace XX with your account number)
Windows/PuTTY
host: puhti.csc.fi
login as: training0XX (replace XX with your account number)
In Puhti, check your environment with the command:
csc-workspaces
Switch to the personal scratch directory of your project
cd /scratch/project_2002389/training0XX
Set up Allas connection
module load allas
allas-conf
Then run the commands:
a-list
rclone lsd allas:
a-list training0XX-genomes-ap
rclone ls allas:training0XX-genomes-ap
a-find pythium_vexans.fasta
a-find -a pythium_vexans.fasta
Next download the data in different ways:
mkdir rclone_dir
cd rclone_dir/
Example 1: copy everything
mkdir all
rclone ls allas:training0XX-genomes-rc
rclone copyto -P allas:training0XX-genomes-rc all/
ls -l all
Example 2: copy a set of objects
mkdir vexans
rclone copyto allas:training0XX-genomes-rc/pythium_vexans vexans/
ls -l vexans
Example 3: copy just one object
rclone copyto allas:training0XX-genomes-rc/pythium_vexans/pythium_vexans.fasta ./vexans.fasta
ls -l
Return to your training0XX directory
cd ..
Check that you are in right place:
pwd
The pwd command should print /scratch/project_2002389/training0XX
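If you run these steps as a script, it can be handy to stop when you are not in the expected directory. A sketch (check_dir is a hypothetical helper; replace XX in the path before using it on Puhti):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report success if the current directory is $1,
# otherwise print a warning and return a non-zero status.
check_dir() {
  if [ "$PWD" = "$1" ]; then
    echo "OK: working in $1"
  else
    echo "Wrong directory: $PWD (expected $1)" >&2
    return 1
  fi
}

# Usage on Puhti (replace XX first):
#   check_dir /scratch/project_2002389/training0XX || exit 1
```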
Make a new directory
mkdir a_dir
cd a_dir/
Create a directory called all and go there:
mkdir all
cd all
List your default scratch bucket:
a-list 2002389-puhti-SCRATCH
a-list 2002389-puhti-SCRATCH/training0XX
Look for the file pythium_vexans.fasta in the Puhti SCRATCH bucket:
a-find pythium_vexans.fasta -b 2002389-puhti-SCRATCH
Download the full dataset with the command:
a-get 2002389-puhti-SCRATCH/training0XX/pythium.tar.zst
And check what you got:
ls -l
ls -R
Now get just one genome dataset:
cd ..
a-get 2002389-puhti-SCRATCH/training0XX/pythium/pythium_vexans.tar.zst
ls -l pythium/
ls -l pythium/pythium_vexans/
Return to your main scratch directory and make a new directory
cd ..
mkdir a_backup
cd a_backup/
Use the commands below to find the ID of the most recent backup version of your pythium directory:
allas-backup list
allas-backup list | grep training0XX
Then use allas-backup restore to download the data:
allas-backup restore ID-string
ls -l
ls -l pythium