AncientMetagenomeDir Metadatathon Dec. 2021


It’s time for AncientMetagenomeDir to level up! Currently, metadata about published ancient metagenomic samples is stored at a sample level. However, to maximise the utility of the resource for researchers working in the field, we need to get closer to the data we actually use in our day-to-day work. For this, we need to expand to library and sequencing metadata, and start building an ecosystem to utilise the metadata itself.

As we already use standardised sample accession codes for data uploaded to public databases, we can get a lot of this information automatically from the databases themselves. However, some of it needs cleaning up, and there are other bits of metadata that may be useful but are not included in such repositories.

This event will consist of different components:

  • Training on how to use Git(Hub), and AncientMetagenomeDir
  • Contributing cleaned metadata and ‘filling in the blanks’ of all current sample-level metadata publications, as well as reviewing each other’s contributions
  • Creating a toolkit to allow efficient filtering and downloading of sequencing data, based on keywords from the repository
  • Socialising: getting to know each other in an informal environment where everyone can meet and chat about anything and everything

If we make sufficient progress, we aim to write a small paper introducing the toolkit. If you contribute metadata and/or code, you will be included as a co-author on the publication.

Organisational Details


Anyone and everyone, from across the world! Both newcomers and veterans!

We are looking for as much help as possible, so if you have time (even just a couple of hours), your help would be gratefully received.

The more people we have, the better the workload will be distributed for everyone.


We will be meeting on! To learn how the platform works, please see the documentation here


  • Date: December 15th 2021
  • Time:
    • Due to the (aspired) global nature of SPAAM, we will be running this over multiple timezones:
      • Asia-Pacific
      • Europe, Middle East, & Africa
      • Americas

(All following times are in CET; please see this page for conversion to various cities)

Date     Time       Event
Dec. 14  09:00 CET  Group Check-in (Asia-Pacific): Welcome and training (18:00 AUS)
Dec. 14  <…> CET    Metadata processing and programming!
Dec. 14  <…>        Continue…
Dec. 15  09:00 CET  Check-in/out (EMEA/Asia-Pacific): Welcome and training
Dec. 15  10:00 CET  Continue…
Dec. 15  11:00 CET  Continue…
Dec. 15  12:00 CET  Lunch break (EMEA)
Dec. 15  13:00 CET  Metadata processing and programming!
Dec. 15  14:00 CET  Check-in (Americas): Welcome and training
Dec. 15  15:00 CET  Metadata processing and programming!
Dec. 15  16:00 CET  Continue…
Dec. 15  17:00 CET  Check-out (EMEA)
Dec. 15  18:00 CET  Metadata processing and programming!
Dec. 15  19:00 CET  Continue…
Dec. 15  20:00 CET  Continue…
Dec. 15  21:00 CET  Check-out (Americas)


The main bulk of the event consists of:

  • Training on how to use Git(Hub) & the AncientMetagenomeDir structure (see below if you want to practise beforehand!)
  • Assigning yourself a publication and making a pull request, which involves:
    • Copying and pasting columns from an already generated (partial) library metadata table
    • Filling in columns that are not represented in the automated metadata table
    • Checking for accuracy of the columns
  • Reviewing PRs to double-check accuracy
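
As an illustration of the "checking for accuracy" step above, here is a minimal sketch of how a contributor or reviewer might scan a tab-separated library metadata table for blank cells before opening or approving a pull request. The column names (`project_name`, `sample_name`, `library_name`, `instrument_model`) and the example rows are hypothetical placeholders, not the actual AncientMetagenomeDir schema, and this is not how AncientMetagenomeDirCheck itself works.

```python
import csv
import io


def find_blank_cells(tsv_text, required_columns):
    """Return (line_number, column) pairs where a required column is empty."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    # The header is line 1 of the file, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in required_columns:
            # Missing fields come back as None; treat them like empty strings.
            value = (row.get(column) or "").strip()
            if not value:
                problems.append((line_number, column))
    return problems


# Hypothetical table: the second data row has no library name.
example = (
    "project_name\tsample_name\tlibrary_name\tinstrument_model\n"
    "Example2021\tS1\tL1\tIllumina HiSeq 2500\n"
    "Example2021\tS2\t\tIllumina HiSeq 2500\n"
)

print(find_blank_cells(example, ["sample_name", "library_name"]))
```

A check like this only finds gaps; whether a filled-in value is correct still needs a human reviewer comparing it against the publication.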

In addition, and in parallel, there will be a small team of software developers:

  • Making a command-line tool to download and verify raw FASTQ files based on user-given filters
  • Combining the new functionality with the current AncientMetagenomeDirCheck tool
  • Extending the types of validation that can be performed by AncientMetagenomeDirCheck
  • (Stretch) Auto-generating input table files for downstream tools
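
To give a feel for what the download-and-verify tool might involve, here is a rough sketch of its two core ideas: selecting rows of a library metadata table with user-given filters, and verifying a downloaded file against an expected MD5 checksum. Everything named here is an assumption for illustration: the functions `filter_rows` and `md5_matches`, the column names, and the example rows are hypothetical, and the eventual tool may be designed quite differently.

```python
import csv
import hashlib
import io


def filter_rows(tsv_text, **filters):
    """Keep rows whose columns exactly match every user-given filter."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if all(row.get(column) == value for column, value in filters.items())
    ]


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Check a file's MD5 checksum, reading in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()


# Hypothetical library-level table; column names are placeholders.
example = (
    "sample_name\tlibrary_name\tinstrument_model\tdownload_md5\n"
    "S1\tL1\tIllumina HiSeq 2500\td41d8cd98f00b204e9800998ecf8427e\n"
    "S2\tL2\tIllumina NextSeq 500\t5d41402abc4b2a76b9719d911017c592\n"
)

selected = filter_rows(example, instrument_model="Illumina NextSeq 500")
print([row["library_name"] for row in selected])
```

In a real tool, each selected row would be downloaded first and then passed to something like `md5_matches` together with the checksum recorded in the table, so that corrupted or truncated FASTQ files are caught before analysis.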