Before you Start

The summer school course from which this textbook is derived was designed to be as practical as possible. Most chapters therefore act as walkthroughs that guide you through generating and analysing data for each of the major steps of an ancient metagenomics project.

The summer school utilised cloud computing to provide a consistent computing platform for all participants; however, all tools and data demonstrated are open-source and publicly available. Here we describe how to approximately recreate the computing platform used during the summer schools.

Basic requirements

Warning

Bioinformatics often involves large computing resource requirements! While we aim to make example data and processing as efficient as possible, we cannot guarantee that everything will work on standard laptops or desktop computers, most likely due to memory (RAM) requirements. As a guide, the cloud nodes used during the summer school had 16 cores and 32 GB of RAM.
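
To compare your own machine with that guide, the following minimal sketch reports the number of cores and the total memory. It assumes that `getconf` is available and that memory can be read either from `/proc/meminfo` (Linux) or from `sysctl -n hw.memsize` (macOS); the function name `report_resources` is purely illustrative.

```shell
# Report CPU cores and total RAM in GB (rounded down)
report_resources() {
  cores=$(getconf _NPROCESSORS_ONLN)
  if [ -r /proc/meminfo ]; then
    # Linux: MemTotal is reported in kB
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    mem_gb=$(( mem_kb / 1024 / 1024 ))
  else
    # macOS: hw.memsize is reported in bytes
    mem_bytes=$(sysctl -n hw.memsize)
    mem_gb=$(( mem_bytes / 1024 / 1024 / 1024 ))
  fi
  echo "cores=${cores} ram_gb=${mem_gb}"
}

report_resources
```

Because the value is rounded down, a machine sold as having 32 GB will typically report slightly less.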

To follow the practical chapters of this textbook, you will require:

  • A Unix-based operating system (e.g., Linux, macOS, or possibly Windows with the Windows Subsystem for Linux; however, the latter has not been tested)
  • A corresponding Unix terminal
  • An internet connection
  • A web browser
  • A conda installation with bioconda configured.
    • Conda is a very popular package manager for installing software in bioinformatics. Bioconda is the main source of bioinformatics software for conda.
    • To speed up installation, we also highly recommend setting up the conda-libmamba-solver.

For each chapter that requires pre-prepared data, the top of the page will have a link to a .tar archive containing the raw data, which you can download.
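
Once downloaded, a .tar archive can be inspected and unpacked with `tar`. The sketch below first builds a small stand-in archive so that it can be run anywhere; the directory and file names are invented for illustration, and in practice you would use the archive linked at the top of the chapter.

```shell
# Create a stand-in archive (in practice this is the downloaded chapter archive)
workdir=$(mktemp -d)
mkdir -p "${workdir}/example-chapter"
echo "placeholder" > "${workdir}/example-chapter/reads.txt"
tar -cf "${workdir}/example-chapter.tar" -C "${workdir}" example-chapter
rm -r "${workdir}/example-chapter"

# List the contents without extracting, then unpack next to the archive
tar -tf "${workdir}/example-chapter.tar"
tar -xf "${workdir}/example-chapter.tar" -C "${workdir}"
```

Listing the contents first is a cheap way to see which directory the archive will create before you unpack it.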

A conda .yml file that specifies the software environment for that chapter will also be available for you to install.
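
The environment name that you later pass to `conda activate` is taken from the `name:` field of the .yml file. As a minimal sketch, the following writes a small, hypothetical environment file (the file name, environment name, and package list are invented for illustration and do not correspond to any chapter) and extracts its name with an illustrative helper, `env_name_from_yml`.

```shell
# Write a hypothetical environment file to a temporary directory
tmpdir=$(mktemp -d)
cat > "${tmpdir}/example-chapter.yml" <<'EOF'
name: example-chapter
channels:
  - conda-forge
  - bioconda
dependencies:
  - samtools
EOF

# Print the environment name declared in a conda .yml file
env_name_from_yml() {
  awk '$1 == "name:" {print $2; exit}' "$1"
}

env_name_from_yml "${tmpdir}/example-chapter.yml"
```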

See the rest of this page for how to install conda (if not already available to you) and how to create conda software environments.

Software Environments

Before loading the environment for the exercises, the software environment will need to be created from the .yml file, following the instructions below, and then activated. A list of the software in each chapter’s environment can be found in the Appendix.

If you’ve not yet installed conda, please follow the instructions in the box below.

These instructions have been tested on Ubuntu 22.04, but should apply to most Linux operating systems. For OSX you may need to download a different file from here.

  • Change directory to somewhere suitable for installing a few gigabytes of software, e.g. mkdir -p ~/bin/ && cd ~/bin/

  • Download miniconda

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  • Run the install script

    bash Miniconda3-latest-Linux-x86_64.sh
    • Review license

    • Agree to license

    • Make sure to install miniconda to the correct directory! e.g. /home/<YOUR_USER>/bin/miniconda3

    • Yes to running conda init

    • Copy the conda config command

    • Close the terminal (e.g. with exit or ctrl + d)

    • Open the terminal again and run the command you copied (i.e., conda config --set auto_activate_base false)

    • Exit and open the terminal again

    • Type conda --version to check conda is installed and working

    • Install conda-libmamba-solver for faster software installation

      conda install -n base conda-libmamba-solver
      conda config --set solver libmamba
    • Set up bioconda

      conda config --add channels defaults
      conda config --add channels bioconda
      conda config --add channels conda-forge
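
Because `conda config --add channels` places each new channel at the top of the list, the three commands above should leave conda-forge with the highest priority, followed by bioconda and then defaults. A minimal sketch for checking this, assuming conda is on your PATH; the helper name `list_channels` is illustrative only.

```shell
# Print configured channels, one per line, highest priority first.
# Reads the YAML-style output of `conda config --show channels` on stdin.
list_channels() {
  sed -n 's/^ *- *//p'
}

if command -v conda >/dev/null 2>&1; then
  conda config --show channels | list_channels
else
  echo "conda not found on PATH" >&2
fi
```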

Once conda is installed and bioconda configured, at the beginning of each chapter, to create the conda environment from the yml file, you will need to run the following:

  1. Download and unpack the conda env file from the top of the chapter by right-clicking on the link and pressing ‘save as’. Once uncompressed, change into the directory.

  2. Then you can run the following conda command to install the software into its dedicated environment:

    conda env create -f /<PATH>/<TO>/<DOWNLOADED_FILE>.yml
Note

You only have to run the environment creation once!

  3. Follow the instructions as prompted. Once created, you can see a list of installed environments with

    conda env list
  4. To load the relevant environment, you can run

    conda activate <NAME_OF_ENVIRONMENT>
  5. Once finished with the chapter, you can deactivate the environment with

    conda deactivate

To reuse the environment, just run steps 4 and 5 as necessary.
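
Since each environment only needs to be created once, it can be convenient to check whether it already exists before calling `conda env create`. The following is a minimal sketch, assuming the default tabular output of `conda env list`; `env_exists` is a hypothetical helper, not part of conda, and the sample names and paths are invented.

```shell
# Succeeds if the environment named $1 appears in `conda env list` output (read from stdin)
env_exists() {
  awk -v name="$1" '!/^#/ && $1 == name {found = 1} END {exit !found}'
}

# Demonstration on sample output; the names and paths are invented
sample='# conda environments:
#
base                     /home/user/bin/miniconda3
example-chapter          /home/user/bin/miniconda3/envs/example-chapter'

if printf '%s\n' "$sample" | env_exists example-chapter; then
  echo "environment already present"
fi
```

With conda installed, the same helper could be used as `conda env list | env_exists <NAME_OF_ENVIRONMENT> || conda env create -f /<PATH>/<TO>/<DOWNLOADED_FILE>.yml`.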

Tip

To delete a conda software environment, run conda remove --name <NAME_OF_ENV> --all -y

Additional Software

For some chapters you may need to manually install the following software and/or data, which are not available on bioconda:

Introduction to the command line

  • rename (if not already installed, e.g. on OSX)

    sudo apt install rename

De novo assembly

  • MetaWRAP

    conda create -n metawrap-env python=2.7
    conda activate metawrap-env
    conda install -c bioconda biopython=1.68 bwa=0.7.17 maxbin2=2.2.7 metabat2 samtools=1.9 checkm-genome=1.0.12
    cd /<path>/<to>/denovo-assembly
    git clone https://github.com/bxlab/metaWRAP.git
    ## don't forget to update path/to!
    echo 'export PATH=$PATH:/<path>/<to>/metaWRAP/bin' >> ~/.bashrc
    source ~/.bashrc
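
After opening a new terminal, you may want to confirm that the MetaWRAP directory really is on your PATH. The sketch below uses a hypothetical helper, `path_contains`, which only inspects the PATH variable and does not test whether MetaWRAP itself runs; the directory shown is the same placeholder as above and must be replaced with your own location.

```shell
# Succeeds if directory $1 is an entry in the current PATH
path_contains() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# The directory below is a placeholder; substitute your own MetaWRAP location
if path_contains "/<path>/<to>/metaWRAP/bin"; then
  echo "MetaWRAP bin directory is on PATH"
else
  echo "MetaWRAP bin directory is not on PATH"
fi
```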

Functional Profiling

  • HUMAnN3 UniRef database (where the functional profiling conda environment is already activated - see the Functional Profiling chapter for more details)

    humann3_databases --download uniref uniref90_ec_filtered_diamond /<path>/<to>/functional-profiling/humann3_db

Authentication and Decontamination

Phylogenomics

  • TempEst (v1.5.3)
    • It is also recommended to assign the following bash variable so you can access the tool without the full path:

      tar -xvf TempEst_v1.5.3.tgz
      cd TempEst_v1.5.3
      export tempest='bash /<PATH>/<TO>/TempEst_v1.5.3/bin/tempest'
    • If you get an error like Exception in thread "main" java.lang.UnsatisfiedLinkError: Can't load library: /usr/lib/jvm/java-11-openjdk-amd64/lib/libawt_xawt.so, make sure you have Java installed, e.g.:

      sudo apt install openjdk-11-jdk
  • MEGAX (v11.0.11)

Ancient metagenomic pipelines

  • Docker (installation method will vary depending on your OS)

    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
    echo \
    "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
    "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    ## A logout and login (or a reboot) may be needed for the group changes below to take effect
    sudo groupadd docker
    sudo usermod -aG docker $USER
    newgrp docker
    sudo reboot ## will kick you out, but it'll be back in a minute or two
  • aMeta (make sure you’ve already downloaded the data directory as per the chapter instructions)

    cd /<path>/<to>/ancient-metagenomic-pipelines/
    git clone https://github.com/NBISweden/aMeta
    cd aMeta
    conda env create -f workflow/envs/environment.yaml
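
After the Docker installation steps above and the subsequent reboot, it can be useful to confirm that your user was actually added to the docker group before running any pipeline. A minimal sketch, assuming the standard `id` utility; `in_group` is a hypothetical helper name.

```shell
# Succeeds if the current user belongs to the group named $1
in_group() {
  id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group docker; then
  echo "current user is in the docker group"
else
  echo "current user is not (yet) in the docker group"
fi
```

If the check succeeds, running `docker run hello-world` without sudo is a common way to confirm that the Docker daemon itself is reachable.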