6  Introduction to the Command Line

Author

Thiseas C. Lamnidis, Aida Andrades Valtueña

Tip

For this chapter’s exercises, if not already performed, you will need to download the chapter’s dataset, decompress the archive, and create and activate the conda environment.

Do this, use wget or right click and save to download this Zenodo archive: 10.5281/zenodo.8412661, and unpack

tar xvf bare-bones-bash.tar.gz 
cd bare-bones-bash/

You can then create the subsequently activate environment with

conda env create -f bare-bones-bash.yml
conda activate bare-bones-bash

6.1 Preamble

This chapter is a condensed version of the both ‘Basic’ and ‘Boosted’ Bare Bones Bash courses that can be found in the Bare Bones Bash website. The material is displayed here under a CC-BY-SA 4.0 license.

Logo of BareBonesBash, a computer monitor with a command prompt dollar symbol, BBB and a bone on the screen.

The original Bare Bones Bash material was created by Aida Andrades Valtueña, James Fellows Yates, and Thiseas C. Lamnidis.

Aida James Thiseas

Cartoony pictures of the three authors of the BareBonesBash course. Designed by Zandra Fagernäs.

Below is a quick reference guide to the commands discussed in this tutorial. To understand actually what each command does, carry on reading below! For a complete run of all these commands AND MORE(!!), consider following the full Bare Bones Bash walkthroughs here.

command description example common flags or arguments
pwd print working directory pwd
ls list contents of directory ls -l (long info)
mkdir make directory mkdir pen
cd change directory cd ~/pen ~ (home dir), - (previous dir)
ssh log into a remote server ssh @.com -Y (allows graphical windows)
mv move something to a new location (& rename if needed) mv pen pineapple
rmdir remove a directory rmdir pineapple
wget download something from an URL wget www.pineapple.com/pen.txt -i (use input file)
cat print contents of a file to screen cat pen.txt
gzip a tool for dealing with gzip files gzip pen.txt -l (show info)
zcat print contents of a gzipped file to screen zcat pen.txt.gz
whatis get a short description of a program whatis zcat
man print the man(ual) page of a command man zcat
head print first X number of lines of a file to screen head -n 20 pineapple.txt -n (number of lines to show)
| pipe, a way to pass output of one command to another cat pineapple.txt | head
tail print last X number of lines of a file to screen tail -n 20 pineapple.txt -n (number of lines to show)
less print file to screen, but allow scrolling less pineapple.txt
wc tool to count words, lines or bytes of a files wc -l pineapple.txt -l (number of lines not words)
grep print to screen lines in a file matching a pattern grep pineapple.txt | grep pen
ln make a (sym)link between a file and a new location ln -s pineapple.txt pineapple_pen.txt -s (make symbolic link)
nano user-friendly terminal-based text editor nano pineapple_pen.txt
rm more general ‘remove’ command, including files rm pineapple_pen.txt -r (to remove directories)
$VAR Dollar sign + text indicates the name of a variable $PPAP
echo prints string to screen echo “$PPAP”
for begins ‘for’ loop, requires ‘in’, ‘do’ and ‘done’ for p in apple pineapple; do
echo “$p$PPAP”; done
applePen pineapplePen
find search for files or directories find -name ‘pen’ -type f (search only for files)
-name ’*JPG’ (search for file names matching the pattern)
“$var” use double quotes to use contents of variable pen=apple && echo “$pen”
<
>
2>
redirects the standard input/output/error stream respectively into a file cat <file.txt >file_copy.txt 2>cat_file.err

6.2 Introduction

The aim of this tutorial is to make you familiar with using bash everyday… for the rest of your life! More specifically, we want to do this in the context of bioinformatics. We will start with how to navigate around a filesystem in the terminal, download sequencing files, and then to manipulate these. Within these sections we will also show you simple tips and tricks to make your life generally easier.

This tutorial is designed so you follow along on any machine with a UNIX terminal (no warranty provided).

6.3 The 5 commandments of Bare Bones Bash

The Bare Bones Bash philosophy of learning to code follows five simple commandments:

1) Be lazy! Desire for shortcuts motivates you to explore more!
2) Google The Hive-Mind knows everything! 99% of the time, someone else has already had the same issue.
3) Document everything you do! Make future you happy!
4) There will ALWAYS be a typo! Don’t get disheartened, even best programmers make mistakes!
5) Don’t be afraid of you freedom! Explore! Try out things!
Pro Tip

Remember: No one writes code that works first time, or without looking at StackOverflow sooner or later.

6.4 What is a terminal?

A terminal is simply a fancy window that allows you to access the command-line interface of a computer or server (Figure 6.1).

The command-line itself is how you can work on the computer with just text.

bash (bourne again shell) is one of the most popular languages used in the terminal.

6.5 Understanding the command prompt

Figure 6.1: An example command prompt

After opening the terminal what you will normally see is a blank screen with a ‘command prompt’, like the one shown above. This typically consists of your username, the device name, a colon, a directory path and ends with a dollar symbol. Like so:

<username>@<device_name>:~$

The command prompt is never involved in any command, it is just there to ensure you know who and where you are. When copying a command you should NOT copy the command prompt.

Often times, when looking for commands online, the commands ran will pe prefaced with a $. This is a stand-in for the command prompt. When adding multi-line commands, it is also common to preface the additional lines with a >. When copying such commands it is therefore important to remove these characters from the start of each line (if present).

Finally, in this tutorial, the symbols<> are used to show things that will/should be replaced by another value. For example, in Thiseas’ command prompt <username> will be replaced by lamnidis, as that is his username.

Now back to your prompt: It tells you that you are in the directory ~. The directory ~, stands for your home directory. Note that this shorthand will point to a different place, depending on the machine and the user.

If you want to know what the shorthand means, (here comes your first command!) you can type in pwd, which stands for “print working directory”. Your “working directory” is whichever directory you are currently in.

pwd
/home/<YOUR_USERNAME>
Warning

In programming documentation, it is very common to use <ALL_CAPS> notation to indicate something that doesn’t exist, or will vary depending on the user.

Whenever you see such a notation (i.e., triangular brackets and all caps), you must always replace that whole section (including the <>) with whatever is on your system! You should never copy and paste this blindly!

This prints the entire “filepath” of the directory i.e. the route from the “root” (the deepest directory of the machine), through every subdirectory, leading to your particular working directory.

Note

If you’re not in your home directory (i.e., the output isn’t exactly the same as the output above), please run the command

cd $HOME

You’ll learn about this commands in a bit!

6.6 Absolute vs Relative paths

Filepaths (a.k.a. “paths”), come in two flavours. Let’s talk a bit about them!

  • An absolute path will start with the deepest directory in the machine shown as a /. Paths starting with ~ are also absolute paths, since ~ translates to an absolute path of your specific home directory. That is the directory path you see in the output of the pwd command you just ran.
  • Alternatively a relative path always begins from your working directory (i.e. your current directory). Often this type of path will begin with one (./) or two (../) dots followed by a forward slash, but not always. In the syntax of relative pathways . means “the current directory” and .. means “the parent directory” (or the ‘one above’).

6.6.1 A real life analogy for paths

You have just arrived to Leipzig for a summer school that is taking place at MPI-EVA. After some questionable navigation, you find yourself at the Bayerische Bahnhof. Tired and disheartened, you decide to ask for help.

You see a friendly-looking metalhead (Figure 6.2), and decide to ask them for directions!

Figure 6.2: A friendly-looking metalhead.

I’m Happy to help, but I only give directions in absolute paths!
From Leipzig Main Station, you should take Querstraße southward.
Continue straight and take Nürnberger Str. southward until you reach Str. des 18 Oktober.
Finally take Str. des 18 Oktober. moving southeast until you reach MPI-EVA!

Absolute paths

The directions above are equivalent to an absolute path, because they will ALWAYS take you to MPI-EVA, but you can only apply these directions if you start from Leipzig Main Station!

Examples of absolute paths:
/home/<PATH>/<TO>/
/Leipzig_Main_Station/Querstraße/Nürnberger_Str/Str_18_Oktober/Deutscher_Platz/MPI-EVA

Not sure how to get back to Leipzig Main Station to apply those directions, you decide to ask someone else for directions…

Lucky for you, a friendly looking local is passing by (Figure 6.3)!

Figure 6.3: A friendly-looking local.

You’re currently on Str. des 18 Oktober. Walk straight that way, past the tram tracks, and you will find Deutscher Platz. You will see MPI-EVA to your right!

Relative paths

These directions are equivalent to a relative path! They are easy to follow, but only work when you happen to start at the position you were in when you first got the directions!

Examples of relative paths:
./<PATH>/<TO>/my_file.txt
../Str_18_Oktober/Deutscher_Platz/MPI-EVA

6.7 Basic commands

We will now explore some basic commands that you will use to explore folders and interact with files:

  • list directory contents:
ls

Output should look like:

Desktop    Downloads  Pictures  Templates  bin    snap
Documents  Music      Public    Videos     cache  thinclient_drives
Note

We will use this format to show you commands and their corresponding output in the terminal (if any) for the rest of this chapter.

  • make a directory:
## ⚠ this will be your working directory for the rest of this chapter! Do not move out of it unless asked to! ⚠
mkdir barebonesbash
  • move (or rename) files and directories
mv barebonesbash BareBonesBash
  • change directories
cd BareBonesBash
  • Download (www get) a remote file to your computer
wget git.io/Boosted-BBB-meta
  • copy a file or directory to a new location
cp Boosted-BBB-meta Boosted-BBB-meta.tsv
  • remove (delete) files
rm Boosted-BBB-meta
  • Concatenate file contents to screen
cat Boosted-BBB-meta.tsv
  • See only the first/last 10 lines of a file
head -n 10 Boosted-BBB-meta.tsv
tail -n 10 Boosted-BBB-meta.tsv
Note

This is because the start of a cat is its head and the end of the cat is its tail (The great humour of computer scientists)

  • Look at the contents of a file interactively (less than the complete file, press q to quit)
less Boosted-BBB-meta.tsv
  • word count the number of lines (-l) in a file
wc -l Boosted-BBB-meta.tsv
15 Boosted-BBB-meta.tsv

6.8 Datastreams, piping, and redirects

Each of the commands you learned above is a small program with a very specialised functionality. Programs come in many forms and can be written in various programming languages, but most of them share some features. Specifically, most programs take some data in and spit some data out! Here’s how that works, conceptually:

6.8.1 Datastreams

Computer programs can take in and spit out data from different streams (Figure 6.4). By default there are 3 such data streams.

  • stdin : the standard input
  • stdout: the standard output
  • stderr: the standard error
Figure 6.4: Diagram showing stdin going into the program, and two output streams from the program: stderr and stdout.
Pro Tip

Each programme also has an ‘exit code’, which can tell you if execution completed with/without errors. You will rarely see these in the wild.

Typically, the stdin is where the input data comes in.
The stdout is the actual output of the command. In some cases this gets printed to the screen, but most often this is the information that needs to be saved in an output file.
The stderr is the datastream where errors and warnings go. This gets printed to your terminal to let you know when something is not going according to plan (Figure 6.5)!

Figure 6.5: This program (in the form of a bash script executed in the terminal) takes no input, and prints one line to the stdout and one line to the stderr.

6.8.2 Piping

A “pipe” (|) is a really useful feature of bash that lets you chain together multiple commands! When two commands are chained together with a pipe, the stdout of the first command becomes the stdin of the second (Figure 6.6)! The stderr is still printed on your screen, so you can always know when things fail.

Figure 6.6: Diagram showing stdin going into the program, and two output streams from the program: stderr and stdout, with the stdout becoming the stdin of a second program.

Example:

head -n 10 Boosted-BBB-meta.tsv | tail -n1
netsukeJapan    C       Artwork

The above command will only show the 10th line of Boosted-BBB-meta.tsv. The way it works is that head will take the first 10 lines of the file. These lines are then passed on to tail which will keep only the last of those lines.

6.8.3 Redirects

Much like streams of water in the real world, datastreams can be redirected.

This way you can save the stdout of a program (or even the stderr) into a file for later!

  • stdin can be redirected with <.
    • An arrow pointing TO your program name!
  • stdout can be redirected with >.
    • An arrow pointing AWAY your program name!
  • stderr can be redirected with 2>.
    • Because it is the secondary output stream.
Pro Tip

It is also possible to combine streams, but we won’t get into that here.

Example:

head -n 10 Boosted-BBB-meta.tsv | tail -n1 > line10.txt

This will create a new file called line10.txt within your work directory. Using cat on this file will BLOW YOUR MIND - (GONE WRONG)!

(Don’t forget to like and subscribe!)

cat line10.txt
netsukeJapan    C       Artwork
Figure 6.7: Here you can see an example of redirecting the output of the datastreams_demo.sh program from before. Redirecting the stdout with > only prints the stderr to the screen, and saves the stdout into output.txt. Additionally, we can redirect the stderr with 2> into runtime.log, and then nothing is printed onto the screen.

6.9 Help text

You don’t always have to google for documentation! Many programs come with in-built help text, or access to online manuals right from your terminal!

  • You can get a one sentence summary of what a tool does with whatis

    whatis cat
    cat(1)  - concatenate files and print on the standard 
            output
  • While man gives you access to online manuals for each tool (exit with q)

    man cat

6.10 Variables

Variables are a central concept of all programming. In short, a variable is a named container whose contents you can expand at will or change.

You can assign variables (tell the computer what do you want it to contain) with = and pull their contents with $

The easiest way to see the contents of a variable is using echo!

echo "This is my home directory: $HOME"
This is my home directory: /home/ubuntu

$HOME is a variable of the type called environment variables, which are set the moment you open your terminal or log into a server, they ensure the system works as intended and should not be change unless you are very sure of why.

Environment Variables

Environment variables in bash are typically named in all capital letters. It is a good idea to avoid using only capital letters for your variable names, so you avoid accidentally overwriting any environment variables.

But as mentioned, you can store in a variable anything you want, so let’s see a few examples:

First let’s try to store a number:

GreekFood=4            #Here, 'GreekFood' is a number.
echo "Greek food is $GreekFood people who want to know what heaven tastes like."
Greek food is 4 people who want to know what heaven tastes like.
Note

The # is used to add comments to your code. Comments are annotations that you write in your code to understand what it is doing but that the computer does not run. Very useful for when your future self or another person looks at your code

Now let’s store a word (“string”):

GreekFood=delicious   #We overwrite that number with a word (i.e. a 'string').
echo "Everyone says that Greek food is $GreekFood."
Everyone says that Greek food is delicious.

You can also store more than a single word (that is still a “string”):

GreekFood="Greek wine" #We can overwrite 'GreekFood' again, 
## but when there is a space in our string, we need quotations.
echo "The only thing better than Greek food is $GreekFood!"
The only thing better than Greek food is Greek wine!

Since variables can be reset to whatever you want, you can also store a number again:

GreekFood=7 #And, of course, we can overwrite with a number again too.
echo "I have been to Greece $GreekFood times already this year, for the food and wine!"
I have been to Greece 7 times already this year, for the food and wine!
Overwriting variables

In these examples you have seen how the same variable has been overwritten, this means that you can only access the last content that you stored in the variable. All the previous contents that a variable may have had are inaccessible as soon as the same variable is given a new value.

6.11 Quotes matter!

In bash, there is a big difference between a single quote ' and a double quote "!

  • The contents of single quotes, are passed on as they are.
  • Inside double quotes, contents are interpreted!

In some cases the difference doesn’t matter:

echo "I like Greek Food"
echo 'I like Greek Food'
I like Greek Food
I like Greek Food

In other cases it makes all the difference:

Arr="Banana"
echo 'Pirates say $Arr'
echo "Minions say $Arr"
Pirates say $Arr
Minions say Banana

Why does it make a difference in the second example?

This is because in the second example we are using a variable. We have assigned Banana to the variable $Arr. As mentioned above, when single (') quotes are used the computer just prints what it receives without caring that $Arr is a variable.

In the echo with the double (") quotes we are telling the computer to extract the value from the variable $Arr and that is why we see the store value (Banana) in the printed output in the terminal.

6.12 Find

You can also ask your computer where you have put your files, in case you forgot. To do this you can use find! The find command has the following syntax:

## ⚠ Don't run! Fake example ⚠
find /<PATH>/<TO>/ -type f -name 'your_file.txt'
  • First part of the find command: the place to look from
    • e.g. . to indicate ‘here’
    • Could also use ~/
    • Could use absolute path e.g. /home/james/
  • Second part of the find command: what type of things to look for?
    • Use -type to define the filetype:
      • file
      • directory
  • Third part of the find command: what to look in?
    • Use -name to say ‘look in names of things’
  • Finally after -name we give the the ‘strings’ to search for
    • Use wildcards (*) for maximum laziness!

Now let’s put into practise what you have learnt about find.

For that you will download a messy folder from a collaborator, remember to check you are in the BareBonesBash folder!:

wget git.io/Boosted-BBB-images -O Boosted-BBB.zip

We realise that this is a compressed file, and more precisely is a zip file (extension .zip). In order to access to its content we will need to “unzip” it first. For that we can use the command unzip:

unzip Boosted-BBB.zip

We know that our collaborator has shared with us some pictures from animals that we need to use for our research, and according to your collaborator they are marked with JPG. We first try to check the contents of the directory to find them quickly.

ls Boosted-BBB
And        Digging       Friday     Leave      Only     Where    Young
Anybody    Everything    Getting    Looking    Ooh      With     Youre
Dancing    Feel          Having     Night      Watch    You

Wow, what a mess! How would you retrieve all the files? Thanks to your wonderful teachers you have learnt how to use find and can simply run:

find Boosted-BBB -type f -name '*JPG*' 
Boosted-BBB/Having/the/time/of/your/life/bubobubo.JPG.MP3.TXT
Boosted-BBB/With/a/bit/of/rock/music/exhibitRoyal.JPG.MP3.TXT
Boosted-BBB/Friday/night/and/the/lights/are/low/fanta.JPG.MP3.TXT
Boosted-BBB/Everything/is/fine/nomnom.JPG.MP3.TXT
Boosted-BBB/Getting/in/the/swing/giacomo.JPG.MP3.TXT
Boosted-BBB/Youre/in/the/mood/for/a/dance/snore.JPG.MP3.TXT
Boosted-BBB/Digging/the/dancing/queen/excited.JPG.MP3.TXT
Boosted-BBB/Anybody/could/be/that/guy/alopochenaegyptiacaArnhem.JPG.MP3.TXT
Boosted-BBB/And/when/you/get/the/chance/stretch.JPG.MP3.TXT
Boosted-BBB/Looking/out/for/angry.JPG.MP3.TXT
Boosted-BBB/Feel/the/beat/from/the/tambourine/oh/yeah/netsukeJapan.JPG.MP3.TXT
Boosted-BBB/Watch/that/scene/licorne.JPG.MP3.TXT
Boosted-BBB/You/can/weimanarer.JPG.MP3.TXT
Boosted-BBB/Night/is/young/and/the/musics/high/bydgoszczForest.JPG.MP3.TXT
Boosted-BBB/Ooh/see/that/girl/pompeii.JPG.MP3.TXT

After -name we have written '*JPG*', this tells to find to search for any file that contains JPG in any part of its name, indicated by the *. The * are what are known as wildcards. To learn more on how to use them, please refer to the more complete material for this tutorial.

Now you have all the paths of the files that you will need!

6.13 For loops

Until now we have seen how to run single commands on a file. But, what about when you need to repeat a command multiple times on a list of things, for example a list of files?

To repeat an action (command) for a set of things (list, e.g. files) one needs to employ the concept of a loop. One of the most commonly used loops, is the for loop.

A for loop allows us to go through a list of things and perform some actions. Let’s see an example:

Variable=Yes
for i in Greece Spain Britain; do
  echo "Does $i have lovely food? $Variable"
done
Does Greece have lovely food? Yes
Does Spain have lovely food? Yes
Does Britain have lovely food? Yes

The for loop went through the list Greece Spain Britain and printed a statement with each item in the list. What happens if we change the order of the list to Britain Greece Spain?

Variable=Yes
for i in Britain Greece Spain; do
  echo "Does $i have lovely food? $Variable"
done
Does Britain have lovely food? Yes
Does Greece have lovely food? Yes
Does Spain have lovely food? Yes

We see that changing the order of the list will affect the output, this is because the for loop will go through the list in a sequential manner.

We can also add more elements to the list, and the for loop will continue until it reaches the end of the list.

6.14 How to Google like a pro

One of the most important skills you develop when coding and/or using the command line is how to phrase your questions so you can get relevant answers out of your search engine.

As Deep Thought put it in the Hitchhiker’s Guide to the Galaxy:

Only when you know the question will you know what the answer means.

Here are some quick tips to get you started:

  • ALWAYS include the name of the language in your query.
  • BROADEN your question!
    • BAD: “How to set X to 4 in bash?”
    • GOOD: “How to set a variable to an integer in bash?”
  • When you are more familiar, use fancy programmer lingo to make google think you know what you are talking about.
All the cool hackers say:
  • string and not text.
  • float and not decimal.

Note: some of these terms can be language specific.

Figure 6.8: How to cat the wrong way (i.e., don’t include the language in your Google search and get loads of cat pictures)
Figure 6.9: How to cat the BASH way (i.e., include the the language in your Google search and you get lots of terminal pictures! Much better, right? …right?)

6.15 (Optional) clean-up

It is extremely important to ALWAYS keep your directories clean from random clutter. This lowers the chances you will get lost in your directories, but also ensures you can stay lazy, since tab completion will not keep finding similarly named files. So let’s clean up your working directory by removing all the clutter we downloaded and worked with today. The command below will remove the /<PATH>/<TO>/BareBonesBash directory as well as all of its contents.

Pro Tip

Always be VERY careful when using rm -r. Check 3x that the path you are specifying is exactly what you want to delete and nothing more before pressing ENTER!

cd ~     ## We shouldn't delete a directory while we are still in it. (It is possible though).
rm -r /<PATH>/<TO>/BareBonesBash*

We can also get out of the conda environment with

conda deactivate

To delete the conda environment

conda remove --name bare-bones-bash --all -y

6.16 Summary

You should now know the basics of working on the command line, like:

  • What a command prompt is
  • How to navigate around the filesystem via the command line
  • How to view the contents of a file
  • How to remove files and directories
  • What a datastream is, and how they can be redirected
  • How to chain commands together
  • What a variable is, how to assign them and how to unpack them
  • How to construct a simple for loop
  • How to google more efficiently!

If you would like to know more about the magic of bash, you can find more commands as well as and more advanced bash concepts in the BareBonesBash website.