This is the second part of the notes for my class Digital-age tools for research. The first part covers some basics of Linux and of operating systems in general. You can read it here.

Connection to a remote Server

Linux is the most used operating system in the world. Although you are more likely to find other operating systems on personal computers and mobile phones, Linux is by far the most used operating system on supercomputers (the machines used for High Performance Computing, HPC).

  • HPC systems are used remotely over the internet (or another network).
  • When we use a computer to connect to an HPC system, our computer is usually called the client and the HPC system the server.
  • Both the HPC system and the client have an address; we need to know the address of the server in order to connect to it. We also need a user account on the server in order to log in.

The general method used to connect to an HPC system is a secure connection using the SSH protocol.

Windows

  • MobaXterm
  • You should not use Windows. No one should.

Linux

  • Connection using a terminal/console/terminator: ssh -X [USER]@[HOST] (-X is optional, but it allows you to open graphical windows)

  • Example: connecting as the user student to the local cluster MOF in Prague.

    • The user is student

    • The address of the host MOF is mof.natr.cuni.cz

      ssh student@mof.natr.cuni.cz
      
  • Once this command is sent you need to type the password (while you type the password, no characters or ‘*’ are shown).

  • Other security measures can be in place, but they have to be communicated by the system admin.

  • ⚠ If you need to use graphical programs you need to specify -X after ssh:

    ssh -X student@mof.natr.cuni.cz
    

    You can perform several operations between your client and the server. For example:

  • Copying a folder from the server to a specific folder on the client: scp -r [USER]@[HOST]:~/PATHofTHEfolder/ [DESTINATIONfolder]/

  • Copying document.text to the server in the folder test which is inside the home: scp document.text [USER]@[HOST]:~/test/

  • ☣ To copy large files it is better to use rsync -avzu instead of scp.

  • the s in ssh and scp stands for secure, so ssh=secure shell, and scp=secure copy.

  • ☣ Once you have mastered the commands and the SSH protocol you might learn how to log in without typing the password every time. Do it only if you fully understand what you are doing.

    • The short list of commands to do so:

      a@A:~> ssh-keygen -t rsa
      a@A:~> ssh b@B mkdir -p .ssh
      b@B's password: 
      a@A:~> cat .ssh/id_rsa.pub | ssh b@B 'cat >> .ssh/authorized_keys'
      b@B's password: 
      

BASH

  • When you are using bash you are usually using a Linux system

  • It can be a server that you connected to from Windows or from another Linux machine, your very own Linux machine, or a Linux virtual machine.

  • The original version of bash was written by Brian Fox

  • All the Bash commands are intrinsically Linux software.

  • BASH = Bourne Again SHell (a pun on the older Bourne shell, sh)

  • ☣ Other shells are available, but mostly obsolete. No need to learn them. If you are in a different shell, just type bash

  • Alternative to the graphical interface (GUI)

  • File manipulation (write, read, copy, delete)

  • Scripts to execute commands automatically

  • CheatSheet 2 : this contains everything you will ever need. I would recommend to focus on:

    • Basic commands to manage files/folders: ls , ll , mkdir , cp , rm , mv , touch , head , tail , cat , grep , less, more
    • Commands have options, usually indicated as - followed by a letter. For example, cp alone does NOT copy folders; cp -r uses the recursive option, and it copies folders.
    • head shows the first 10 lines of a text file. head -n N shows the first N lines of a file.
    • ⚠ The output of files or commands can be combined/concatenated using some special syntax elements: . , .. , > , >> , < (angle brackets), & (ampersand), | (pipe), ; , {n..m} (interval from n to m), # , ! , !!
      • Little note: < and << are intrinsically different from > and >>. More here.
  • Most commands have a manual available in bash

    • To open the manual of a command: man [command]
    • e.g. man more shows the manual of more. The manual can be closed by pressing q.
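
To make the special syntax elements listed above less abstract, here is a minimal sketch that uses some of them. The file and folder names (notes.txt, data, data_copy) are invented for the example, and everything happens in a temporary folder created with mktemp, so nothing of yours is touched.

```shell
#!/usr/bin/env bash
# Work in a throw-away folder so that nothing in your home is touched.
workdir=$(mktemp -d)
cd "$workdir"

# {n..m} expands to the interval from n to m: this creates file_1.dat ... file_5.dat
touch file_{1..5}.dat

# > writes the output of a command into a file (the file is overwritten)
echo "first line" > notes.txt
# >> appends the output at the end of the file
echo "second line" >> notes.txt

# | (pipe) passes the output of one command to the next one
second_line=$(head -n 2 notes.txt | tail -n 1)
number_of_files=$(ls | wc -l)

# < gives the content of a file to a command as its input
number_of_lines=$(wc -l < notes.txt)

# ; separates two commands written on the same line
mkdir data ; cp file_1.dat data/

# cp needs the option -r to copy a folder
cp -r data data_copy

echo "second line of notes.txt: $second_line"
echo "files in the folder: $number_of_files"
echo "lines in notes.txt: $number_of_lines"
```

Try changing > into >> (or the other way around) and look at notes.txt with cat to see the difference.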

More information on bash can be found in the O’Reilly book.

  • ⚠ Other commands allow you to search for files or inside files, or to build logical structures: whereis , locate , find , alias , sed , awk , do , if , then , elif , else , while , sort

  • ☣ Other software that you will use: tar to archive and compress files, make to compile, sh to launch software, chmod to change the permissions of files, wget to download files, apt to install software on your own machine, curl similar to wget (same same but different), history
  • ☣ ☣ When using supercomputers with queues and parallel codes: qsub , sbatch , mpirun , aprun
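
As a rough illustration of how the search commands and the logical structures fit together, here is a small sketch. The folders run_a and run_b and their content are made up for the example; it only uses find, grep, a for loop with if/then/else, awk, sort and sed.

```shell
#!/usr/bin/env bash
# Build a small fake project in a temporary folder (all names are invented).
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p run_a run_b
printf 'energy -3.5\n' > run_a/result.dat
printf 'energy -7.1\n' > run_b/result.dat
printf 'no data here\n' > run_b/notes.txt

# find searches for files by name, starting from the current folder (.)
found=$(find . -name 'result.dat' | sort)

# for ... do ... done repeats the same commands for every file found.
# (This simple form assumes that the file names contain no spaces.)
for f in $found
do
    # grep -q only checks if the word is inside the file, without printing it
    if grep -q 'energy' "$f"
    then
        echo "$f contains an energy"
    else
        echo "$f does not contain an energy"
    fi
done

# awk prints the second column, sort -n orders the numbers, head keeps the first
lowest=$(cat run_a/result.dat run_b/result.dat | awk '{print $2}' | sort -n | head -n 1)
echo "lowest energy: $lowest"

# sed replaces text: here the word energy becomes E
renamed=$(sed 's/energy/E/' run_a/result.dat)
echo "$renamed"
```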

Bash Environment

  • By default Bash uses some variables to store some information
  • A more extensive guide is found here
  • Bash can be used to run scripts.
    • In general the script my_script.sh can be launched in the following ways:

      ./my_script.sh #if executable
      bash my_script.sh
      sh my_script.sh
      source my_script.sh
      
      • ☣ While these commands are often considered interchangeable, they are not! You can read more here.
      • While in Windows the dot . separates the filename from the extension, this is not true in Linux.
      • The dot has several functions, including executing scripts and binaries. This article explains the situation well with plenty of examples, and it is easier than the previous one.
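
One practical difference between these ways of launching a script can be seen with a small experiment. This is only a sketch: the script content and the variable GREETING are invented for the example. The idea is that bash my_script.sh runs the script in a new shell, while source my_script.sh runs it inside the shell you are already using.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir"
unset GREETING

# A tiny script that only defines one variable
printf '#!/usr/bin/env bash\nGREETING="hello from my_script"\n' > my_script.sh

# bash my_script.sh: the script runs in a new shell, so the variable
# is lost as soon as the script ends
bash my_script.sh
after_bash="${GREETING:-unset}"     # :-unset means "use the word unset if empty"

# source my_script.sh: the script runs in the current shell, so the
# variable is still there afterwards
source my_script.sh
after_source="${GREETING:-unset}"

# ./my_script.sh works only once the file has been made executable
chmod +x my_script.sh
./my_script.sh

echo "after bash:   $after_bash"
echo "after source: $after_source"
```

This is why files that set up your environment are usually loaded with source and not launched with bash.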

Conventions, tips, examples

The following section presents tips and examples to better understand bash syntax. This can be challenging, since some bash syntax must already be known to appreciate them. For this reason I suggest an iterative approach: read everything several times from beginning to end until no unclear parts are left.

  • The location of a file is called its path. The path is the address of a file, and it is given by all the folders that contain it. Usually it is reported with respect to the home folder (the home folder is often indicated as ~).

  • Variable names are generally preceded by the $ sign when they are the object of a command. For example echo $VARIABLE prints the value of VARIABLE

  • Let’s consider the command
    cp ~/Documents/important_files/file_1.dat ~/Documents/copyofdata/

  • This copies the file file_1.dat from the first path to the second.

  • The same command can be written using two variables instead of the full paths.

    • We can call the two variables PATH1 and PATH2 (in the terminal we will need to use $ since the paths are used by the command cp)
    • The same command can then be written as: cp $PATH1/file_1.dat $PATH2/
      This means file_1.dat is copied from $PATH1 to $PATH2 . In this case we used a single variable for each of the two paths.
  • Variables allow us to write a command in a more general manner. The previous command is valid for any pair of allowed paths inserted after the cp command. Once we define the values of the variables we are running ONE specific command:

    PATH1=~/Documents/important_files   # no quotes: ~ is not expanded inside quotes
    PATH2=~/Documents/copyofdata
    cp $PATH1/file_1.dat $PATH2/
    

    is equivalent to:

    cp ~/Documents/important_files/file_1.dat ~/Documents/copyofdata/
    
  • Variables can be concatenated to form new variables or more flexible commands. Let’s take this example: PATH1=~/Calendar/2017/may/fourth and PATH2=~/Calendar/2017/june/fifth . Each path is stored in one specific variable. This is basically equivalent to using the full paths instead of two variables. However, the two paths could be built from other variables:

    Year='2017'
    Month1='may'
    Month2='june'
    Day1='fourth'
    Day2='fifth'
    PATH1=~/Calendar/${Year}/${Month1}/${Day1}
    PATH2=~/Calendar/${Year}/${Month2}/${Day2}
    
  • This allows us to use simple (and more logical) variables. In some cases we can use a few variables to generate many more possible paths. (For example, 31 variables for the days + 12 variables for the months allow us to create all the possible days that appear on a calendar.)

  • Curly brackets { } are used to delimit the name of a variable, to avoid confusion and ambiguity. Here are some examples.
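
A minimal sketch of why the curly brackets matter. The variable names with_braces and without_braces are invented for the example, and $HOME is used in place of ~ because ~ is not expanded inside quotes.

```shell
#!/usr/bin/env bash
# Build a path from smaller variables, as in the Calendar example.
Year='2017'
Month1='may'
Day1='fourth'
PATH1="$HOME/Calendar/${Year}/${Month1}/${Day1}"
echo "$PATH1"

# The brackets tell bash where the name of the variable ends.
# With brackets bash reads the variable Year and then adds _backup.
with_braces="${Year}_backup"

# Without brackets bash looks for a variable called Year_backup,
# which does not exist, so the result is empty.
without_braces="$Year_backup"

echo "with brackets:    '$with_braces'"
echo "without brackets: '$without_braces'"
```

When a variable is followed by a / or by a space the brackets are not strictly needed, but using them always does no harm.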

From this point on, to differentiate between the command input and the output, the former will start with a $

$ ls
file_archive1.dat photo.jpg file3 file_archive2.dat a.zip b.zip cc.zip
  • ls lists all the files in the current folder

  • Wildcards are special characters that can have different values. Most important ones are * and ? . * stands for any group of characters, ? instead replaces a single character. More Here.

    $ ls *
    file_archive1.dat photo.jpg file3 file_archive2.dat a.zip b.zip cc.zip
    $ ls f*
    file_archive1.dat file3 file_archive2.dat
    $ ls *.zip
    a.zip b.zip cc.zip
    $ ls ?.zip
    a.zip b.zip
    
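  • ls is equivalent to ls . while ls .. prints the list of files in the folder one level up. ls $PATH1 prints the files stored in the folder $PATH1 (do not call your own variable PATH: bash already uses it to store the list of folders where it looks for programs).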
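
If you want to reproduce the wildcard examples yourself, the following sketch creates the same (empty) files in a temporary folder. It uses echo instead of ls only because echo prints the matching names on a single line; the variable names are invented for the example.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir"

# Create empty files with the same names used in the examples
touch file_archive1.dat photo.jpg file3 file_archive2.dat a.zip b.zip cc.zip

# bash replaces the wildcard with the matching file names
# BEFORE the command (here echo) is executed
one_letter_zips=$(echo ?.zip)   # ? matches exactly one character
all_zips=$(echo *.zip)          # * matches any group of characters

echo "?.zip matches: $one_letter_zips"
echo "*.zip matches: $all_zips"
echo "f* matches:    $(echo f*)"
```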

VIM

To edit files in a graphical interface we can use gedit, the default editor of Ubuntu, or the one we mentioned previously, subl. These are equivalent to Notepad or Notepad++ on Windows.

To edit files within a terminal though, we need to use other software. This is faster and simpler especially when working on remote servers.

Different programs can be used: emacs, vi, vim, nano, pico … One of the most popular is VIM. VIM is quite easy, but it is counter-intuitive at the beginning. The mouse is not needed.

  • open vim and edit a file called file.dat:

     $ vim file.dat
    
  • once the file is open it is impossible to write. To start writing press i or ‘Ins’

  • to stop the edit mode press ‘esc’

  • to open the console press ‘:’. This will make a ‘:’ line appear at the bottom. Once ‘:’ appears it is possible to write the needed commands. Step by step example to edit a file and save it:

  1. $ vim file.dat
  2. Press i
  3. Write your file as in any general text editor. Remember: NO MOUSE.
  4. press ‘esc’. Now it is not possible to write
  5. press ‘:’; a line that starts with ‘:’ appears at the bottom
  6. after the ‘:’ write w and press enter. Now the file is saved
  7. press ‘:’ again, write q and press enter. The file is now closed and you are back in the shell.
  8. $ cat file.dat will show the content of the file
  9. If other lines are written after ‘w’, ‘q’ will not close the file. This is because the last changes are not saved. It will display the following line: E37: No write since last change (add ! to override). To avoid this we need to save first.
    • this can be done by saving and closing in one step with ‘:wq’
    • if we don’t want to save the last edits: ‘:q!’

Quick summary:

  • Insert mode: i
  • leave insert mode: press ‘esc’
  • open console: press ‘:’
  • :w saves
  • :q quits
  • [letter] + ! forces the command given by [letter]
  • save and exit: :wq
  • quit without saving: :q!
  • GUI editors: Sublime 3, Gedit (⚠ default editor in Gnome-based Linux distributions )

Exercise

This exercise was designed for the students of my course. They were supposed to work on the local cluster, but it can be done on any other machine running Linux.

  1. Ideally make a script or do each command one by one by hand.

  2. You need this file: OSZICAR.

  3. Copy it into your folder.

  4. Create a backup folder, and copy OSZICAR inside that folder with the name OSZICAR.bck. Whenever you have a problem, just delete the mistake and copy OSZICAR.bck back as OSZICAR.

  5. The file is made of some lines of numbers followed by special lines similar to this one: 1 F= -.71306824E+02 E0= -.71311889E+02 d E =-.713068E+02 (Hint: remember that only lines with this EXACT syntax are correct! I introduced several ‘errors’.)

    • Just to provide some context: this file has been created by modifying the output of a DFT software called VASP.

    • It contains the energy of an electronic system during a step-wise calculation. The first number is the step number, F is the free energy, E0 the internal energy and dE the energy variation.

  • For this exercise we care about the first number that follows the = after the F. Each F corresponds to a different step of our calculation, from the initial one (1) to the final one (29). Unfortunately the file has been a bit corrupted. Nonetheless, all the useful data are still in the file and no fake data has been introduced. Only useless information has been added.
  1. Using grep, search for all the lines with the proper energy and save them in energy.dat; this may require two passes
  2. Copy the last line of energy.dat into final_energy.dat
  3. ⚠ Try to do the previous two points with a single command (you need to use | )
  4. Provide the energy of step n.15 and save it in energy_step15.dat
  5. ☣ ☣ Create a file called trash.dat which contains all the lines that do not contain the energy we want (to do this you need to use ‘the inverse of grep’ and ‘some sort of logical operators (and/or)’).
  6. Once finished, save the list of the commands you used during your attempts inside a file called history_${yourname}.dat
  7. ☣ ☣ Using the command date, create a variable called ‘time’ which has the format DDMMhhmm (day, month, hour, minute; for example, if it is the 15th of May at 4:23pm the variable value would be ‘15051623’), then rename history_${yourname}.dat as hi_${yourname}_${time}.dat
  8. Append the current date at the end of the last file
  9. Add the current date at the BEGINNING of your copy of the OSZICAR file