Slack Export - #2022-summerschool-introtometagenomics

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-07-29 10:08:47

@James Fellows Yates has joined the channel

Anan Ibrahim (ananhamido@hotmail.com)

2022-07-29 10:11:10

@Anan Ibrahim has joined the channel

Andrea Musso (andrea.musso1995@gmail.com)

2022-07-29 10:11:10

@Andrea Musso has joined the channel

Brooklynn Scott (brooklynnscott00@gmail.com)

2022-07-29 10:11:10

@Brooklynn Scott has joined the channel

Darío Alejandro Ramirez (darioramirez092@gmail.com)

2022-07-29 10:11:10

@Darío Alejandro Ramirez has joined the channel

Davide Bozzi (davide.bozzi@unil.ch)

2022-07-29 10:11:10

@Davide Bozzi has joined the channel

Emily Gaul (emily_gaul@eva.mpg.de)

2022-07-29 10:11:10

@Emily Gaul has joined the channel

Emily Gaul (emilycsgaul@gmail.com)

2022-07-29 10:11:10

@Emily Gaul has joined the channel

Percy Ho (hei_chun_ho@eva.mpg.de)

2022-07-29 10:11:10

@Percy Ho has joined the channel

I-Ting Huang (ihuang@g.harvard.edu)

2022-07-29 10:11:10

@I-Ting Huang has joined the channel

Ina Wasmuth (ina.wasmuth@uni-jena.de)

2022-07-29 10:11:10

@Ina Wasmuth has joined the channel

Jaime Zolik (zolik006@umn.edu)

2022-07-29 10:11:10

@Jaime Zolik has joined the channel

Johnny Richards (s2052280@ed.ac.uk)

2022-07-29 10:11:10

@Johnny Richards has joined the channel

Kadri Irdt (kadri.irdt@gmail.com)

2022-07-29 10:11:10

@Kadri Irdt has joined the channel

Keri Burge (keri.burge@gmail.com)

2022-07-29 10:11:10

@Keri Burge has joined the channel

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-07-29 10:11:10

@Laura Carrillo Olivas has joined the channel

Louis Kraft (lokraf@dtu.dk)

2022-07-29 10:11:10

@Louis Kraft has joined the channel

Maria Lopopolo (maria.lopopolo1989@gmail.com)

2022-07-29 10:11:11

@Maria Lopopolo has joined the channel

Merlin Szymanski (merlin_szymanski@eva.mpg.de)

2022-07-29 10:11:11

@Merlin Szymanski has joined the channel

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-07-29 10:11:11

@Mohamed Sarhan has joined the channel

Nora Bergfeldt (nora.bergfeldt@gmail.com)

2022-07-29 10:11:11

@Nora Bergfeldt has joined the channel

Philomena Over (p.over@gmx.de)

2022-07-29 10:11:11

@Philomena Over has joined the channel

Pooja Mehta (poojamehta.microbio@gmail.com)

2022-07-29 10:11:11

@Pooja Mehta has joined the channel

Reed Morgan (reedmorgan@g.harvard.edu)

2022-07-29 10:11:11

@Reed Morgan has joined the channel

Sierra Blunt (sierrablunt97@gmail.com)

2022-07-29 10:11:11

@Sierra Blunt has joined the channel

Tre Blohm (tre.blohm@umontana.edu)

2022-07-29 10:11:11

@Tre Blohm has joined the channel

Yuti Gao (yuga3894@colorado.edu)

2022-07-29 10:11:11

@Yuti Gao has joined the channel

Alex Hübner (alexander_huebner@eva.mpg.de)

2022-07-29 10:11:11

@Alex Hübner has joined the channel

Alina Hiss (alina_naomi_hiss@eva.mpg.de)

2022-07-29 10:11:11

@Alina Hiss has joined the channel

aidanva (aida.andrades@gmail.com)

2022-07-29 10:11:12

@aidanva has joined the channel

Maxime Borry (maxime.borry@gmail.com)

2022-07-29 10:11:12

@Maxime Borry has joined the channel

Megan Michel (megan_michel@g.harvard.edu)

2022-07-29 10:11:12

@Megan Michel has joined the channel

Nikolay Oskolkov (nikolay.oskolkov@scilifelab.se)

2022-07-29 10:11:12

@Nikolay Oskolkov has joined the channel

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-07-29 10:11:18

Good idea @Maria Lopopolo!

:headbangingparrot: Maria Lopopolo

Ina Wasmuth (ina.wasmuth@uni-jena.de)

2022-07-29 13:17:43

Could you please provide us with the link to the gather.town space?

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-07-29 13:25:52

*Thread Reply:* Also good idea 🙂

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-07-29 13:24:41

All information here: https://spaam-community.github.io/wss-summer-school/#/2022/

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-07-29 13:24:53

Gather.Town link: https://app.gather.town/app/PlXjb0deog0B4JCq/spaam-community

app.gather.town

Gather is a video-calling space that lets multiple people hold separate conversations in parallel, walking in and out of those conversations just as easily as they would in real life.

👍 Ina Wasmuth

Christina Warinner (warinner@shh.mpg.de)

2022-08-01 08:53:16

@channel Hi All! We’re meeting in the GatherTown Lecture Hall

Christina Warinner (warinner@shh.mpg.de)

2022-08-01 08:53:35

It’s the red room

Christina Warinner (warinner@shh.mpg.de)

2022-08-01 10:15:00

Hi All, I misspoke earlier - we have an index set of 195 F and 195 R that we use on rotation. Most labs don’t use quite this many - it really depends on your throughput

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-01 10:21:52

@channel please can everyone move to the purple room!

Raphaela St (raphaela_stahl@eva.mpg.de)

2022-08-01 10:36:48

@Raphaela St has joined the channel

Christina Warinner (warinner@shh.mpg.de)

2022-08-01 11:26:28

Hi All, there were some great questions earlier about sequencing. Here are two additional tips:

1. Match your expected DNA length to your sequencing chemistry. Illumina sequencing kits come in a few flavors depending on the instrument model, typically 100 cycle, 150 cycle, and 300 cycle kits. You can use these to sequence single end (SE) 100 bp, 150 bp, or 300 bp; or you can use them to do paired end (PE) sequencing of 2x50 bp, 2x75 bp, or 2x150 bp. Some sequencing centers also allow you to use the 300 cycle kit to do 2x100 bp.

For example, imagine you are sequencing a really important set of libraries on a NextSeq (with 2-color chemistry). Your TapeStation says that your mode DNA length is 60 bp (with some spread on either side). I’d recommend using 2x75 bp sequencing. That will allow you to sequence everything up to 150 bp, which is probably almost all of your data. This will maximize your sequencing efficiency and get high quality, high confidence data.

You DO NOT want to sequence with 2x150 bp chemistry. Beyond being a waste of money (because you are paying for sequence data you won’t get), it will also reduce the calculated basecalling quality of the run and could cause it to fail. This is because by the time the instrument reaches cycle 120 or so, you will have probably already sequenced almost everything that is there and now most of the clusters will just be showing black. This will cause the instrument to have trouble locating the clusters, and the software is likely to interpret this as an error and stop the run or flag your run as failed.

So, a good rule of thumb is as follows: 2x75 bp or 2x100 bp for ancient microbial DNA; 2x150 bp for modern DNA (for genomic DNA sheared to 500 bp). That will give you the best possible data.

2. Use paired end sequencing for microbial DNA. Although people doing human aDNA sequencing sometimes use SE sequencing, I STRONGLY recommend that you only use PE sequencing for microbial DNA. This is because de novo assembly works best with PE data. You can force de novo assemblers to use SE data, but it doesn’t work very well and will result in lower quality results. Also, by using PE data, you will achieve higher quality basecalling and have high quality sequences. So if you always PE sequence microbial DNA, you will have high quality data that you can also use to make MAGs (metagenome-assembled genomes).

👍:skin_tone_4: Anan Ibrahim

👍 Tre Blohm, Jasmin Frangenberg, I-Ting Huang, Davide Bozzi, Nikolay Oskolkov, Percy Ho
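
The arithmetic behind the 2x75 bp recommendation above can be sketched as a small shell helper. This is only an illustration: the function name `choose_pe_read_length` is hypothetical, and the choice of "longest fragment you want to cover" (150 bp in the example from the message) is an assumption you would make from your own TapeStation profile. The list of paired-end read lengths is the one given in the message.

```shell
#!/bin/sh
# Hypothetical helper, not part of any Illumina tooling.
# choose_pe_read_length MAX_FRAGMENT_BP
# Prints the shortest paired-end option (from those listed in the message:
# 2x50, 2x75, 2x100, 2x150) whose two reads together span MAX_FRAGMENT_BP.
choose_pe_read_length() {
    max_fragment_bp=$1
    for read_len in 50 75 100 150; do
        # Two reads of read_len cycles cover fragments up to 2 * read_len bp.
        if [ $((2 * read_len)) -ge "$max_fragment_bp" ]; then
            echo "2x${read_len}"
            return 0
        fi
    done
    # Longer fragments (e.g. modern DNA sheared to 500 bp): longest option.
    echo "2x150"
}
```

For the example in the message (mode 60 bp, almost everything below 150 bp), `choose_pe_read_length 150` prints `2x75`. The helper only encodes the coverage arithmetic; it does not know about the rule of thumb that 2x75 or 2x100 is preferred for ancient microbial DNA, so treat a `2x50` result as a prompt to check with your sequencing center rather than as a recommendation.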

Christina Warinner (warinner@shh.mpg.de)

2022-08-01 15:48:52

@channel Hi All, we’re gathering now for the Roundtable! Please return and turn on your webcams

Christina Warinner (warinner@shh.mpg.de)

2022-08-01 15:49:29

And everyone please take a seat at one of the tables

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-08-02 10:19:18

Since yesterday I have tried this step a couple of times but I'm stuck:

$ mkdir images
$ while read filepath; do
> echo "${filepath}" images/$(basename ${filepath})
> # mv ${filepath} images/$(basename ${filepath})
> done < File_names.txt

It created the file but it is empty. Is filepath a variable? I didn't find it in the presentation. Or do I need to put a specific path?

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-02 10:25:02

*Thread Reply:* @aidanva 👆

aidanva (aida.andrades@gmail.com)

2022-08-02 10:28:00

*Thread Reply:* Hi!

aidanva (aida.andrades@gmail.com)

2022-08-02 10:30:33

*Thread Reply:* so ${filepath} is indeed a variable, that stores a line of the File_names.txt

aidanva (aida.andrades@gmail.com)

2022-08-02 10:31:42

*Thread Reply:* the while loop will read one line at a time from File_names.txt, which will modify ${filepath} to be one line each time

aidanva (aida.andrades@gmail.com)

2022-08-02 10:32:05

*Thread Reply:* what does your file File_names.txt contain?

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-08-02 13:28:27

*Thread Reply:* is empty

aidanva (aida.andrades@gmail.com)

2022-08-02 13:29:54

*Thread Reply:* ah, that's why the while loop is not printing anything, you will need to run this before:

suffix="jpg"
find Boosted-BBB/ -type f -name "*${suffix}" > File_names.txt

aidanva (aida.andrades@gmail.com)

2022-08-02 13:30:16

*Thread Reply:* and check if the File_names.txt contains the path to your jpg files

aidanva (aida.andrades@gmail.com)

2022-08-02 13:30:35

*Thread Reply:* let me know if not, and we can continue to debug 🙂

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-08-04 23:22:48

*Thread Reply:* Thank you very much!! I did all the steps again and it worked!! To which email do I send the new script for the image sorting with new folders?

:mask_parrot: aidanva

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-05 07:58:34

*Thread Reply:* aida_andrades@eva.mpg.de

👆 aidanva

aidanva (aida.andrades@gmail.com)

2022-08-05 08:27:35

*Thread Reply:* The one James indicated 🙂
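
The two steps from the thread above (build the list of files with `find`, then loop over it with `while read`) can be sketched together as follows. The function names `list_files` and `plan_moves` are hypothetical and only wrap the commands from the thread so they can be tried on any folder; the `-name "*.${suffix}"` pattern with an explicit dot is a small assumption on top of the original pattern.

```shell
#!/bin/sh
# list_files SRC_DIR SUFFIX
# Prints the path of every regular file under SRC_DIR ending in ".SUFFIX".
list_files() {
    find "$1" -type f -name "*.$2"
}

# plan_moves LIST_FILE DEST_DIR
# Dry run: reads one path per line from LIST_FILE (this is what ${filepath}
# holds on each pass of the loop) and prints the mv command that would move
# it into DEST_DIR. Nothing is actually moved.
plan_moves() {
    while IFS= read -r filepath; do
        # An empty list file simply produces no output.
        [ -n "$filepath" ] || continue
        printf 'mv "%s" "%s/%s"\n' "$filepath" "$2" "$(basename "$filepath")"
    done < "$1"
}
```

A possible way to use it: run `list_files Boosted-BBB jpg > File_names.txt`, check that `File_names.txt` is not empty, then run `plan_moves File_names.txt images` and read the printed commands before replacing the dry run with a real `mv`.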

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-02 13:24:27

@channel Poll time:

Polly

2022-08-02 13:24:54

@James Fellows Yates has a polly for you!

Christina Warinner (warinner@shh.mpg.de)

2022-08-03 10:25:11

@channel I realized after the talk that I described PCoA and CLR slightly incorrectly in the talk, so I updated the corresponding slide in the presentation (slide 70) with the correct information. Sorry for the confusion!

🙌 Mohamed Sarhan, Jasmin Frangenberg, Davide Bozzi, James Fellows Yates, Yuti Gao, Nikolay Oskolkov

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-04 14:33:35

I just saw that we have a few gather.town twins!

@Ina Wasmuth and @Sierra Blunt as blondies

And Markus and @Tre Blohm as beardy bros

IMG_20220804_143146255.jpg

IMG_20220804_143130675.jpg

😂 Nikolay Oskolkov, Alina Hiss, Sierra Blunt, Laura Carrillo Olivas

🧔 Tre Blohm

😎 Jasmin Frangenberg, Yuti Gao

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-05 08:45:56

@channel for those who were interested in the Git setup re-do next week: please indicate your availability here - https://www.when2meet.com/?16311979-XsKID (all times Berlin times, the session would only be 1h tops, and we could try to do it on your own personal laptops/servers )

Nikolay Oskolkov (nikolay.oskolkov@scilifelab.se)

2022-08-05 15:31:06

Thank you so much for the outstanding organization and content! I learnt so many exciting things....and I love gathertown! 🙂

❤️ Laura Carrillo Olivas, Jasmin Frangenberg, Sierra Blunt, Maria Lopopolo, Yuti Gao, James Fellows Yates, Raphaela St, Christina Warinner

💯 Anan Ibrahim, James Fellows Yates, Jasmin Frangenberg

Yuti Gao (yuga3894@colorado.edu)

2022-08-05 16:00:08

Thankssssss for organising this boot camp, I was searching for materials to learn about how to do ancient microbial analysis and then saw this summer school, too good to be true! I got inspired a lot and it’s so fun to see people from many different places sharing similar scientific interests! 🍻

❤️ Yuti Gao, James Fellows Yates, Nikolay Oskolkov, Raphaela St, Christina Warinner

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-05 20:06:11

I'm glad you enjoyed it and found it useful! Now go out and spread the knowledge!!

👍 Nikolay Oskolkov

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-08 14:19:00

@channel the time (that isn't this afternoon) that suits most people who were interested in the re-do of the git setup session is this Thursday - August 11th at 11am!

Please mark that in your calendars. We will meet in gather.town again 🙂

You're also welcome to join if you didn't fill in the when2meet or the poll!

We will be setting up the SSH keys properly for you all on your laptops/computers/servers, wherever you want 😉

👀 Maria Lopopolo

👍 Mohamed Sarhan, Laura Carrillo Olivas

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-08 14:19:30

PM your google account if you want a google calendar invite

👍 Laura Carrillo Olivas

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-08-09 14:31:38

lau.carrillo.olivas89@gmail.com

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-09 14:32:03

*Thread Reply:* Invited, you can delete this message if you want 🙂

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-11 11:00:58

@channel we are starting the git v2 thing (/personal git set up)

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-11 11:05:19

@Pooja Mehta @Andrea Musso @Kadri Irdt @Laura Carrillo Olivas if you're around

Pooja Swali (swalipooja@gmail.com)

2022-08-11 11:05:22

@Pooja Swali has joined the channel

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-11 11:05:35

Oops sorry wrong Pooja - Sorry @Pooja Swali you can leave this channel 🙂

🥲 Pooja Swali

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-11 11:42:16

We are still here for another twenty minutes!

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-11 16:35:46

Thanks a lot @James Fellows Yates and @Megan Michel for the Github session today ☺️

❤️ Megan Michel

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-08-12 00:31:09

I'm so sorry!! I didn't hear my alarm 🙁 and it didn't wake me up!! But I will check everything and if I have any doubts I will not hesitate to tell you, thank you so much!! ❤️

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-12 08:20:54

*Thread Reply:* Don't apologise, I was surprised you indicated you could make that time! If you have time your morning/my afternoon today, we could quickly meet if you want?

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-08-12 19:51:23

*Thread Reply:* I just saw the message, I can do any day after 3:00 pm Berlin time 😊

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-12 20:02:47

*Thread Reply:* Shall we book 15:00-15:45 on Monday (15th)?

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-08-12 20:04:23

*Thread Reply:* yes! Perfect!

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-12 20:07:08

Ok, if anyone else is interested in getting help setting up their GitHub SSH keys on their servers/laptops, we will do one more git session on Monday 15th August at 15:00-15:45 Berlin time

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-15 14:57:51

I'll be in gather in 2 minutes

irinavelsko (irinavelsko@gmail.com)

2022-08-22 16:35:58

Hi All, a nice summary of how to approach functional analysis was just published in PLOS Computational Biology

I would add to Tip 5 - always perform an effect size calculation. Remember that a p-value has NO biological meaning. Smaller p-values are not "more significant", they mean your observations are less likely to be due to chance. Effect size tests tell you the size of the difference between groups, so look for bigger effect sizes. A gene/pathway with a big effect size and p > 0.05 may be more interesting/informative than a gene/pathway with a small effect size and a very small p-value. (p < 0.05 was arbitrarily selected as a standard cut-off anyway, so don't throw out "non-significant" data till you've looked at effect sizes)

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010348

journals.plos.org

Nine quick tips for pathway enrichment analysis

Pathway enrichment analysis (PEA) is a computational biology method that identifies biological functions that are overrepresented in a group of genes more than would be expected by chance and ranks these functions by relevance. The relative abundance of genes pertinent to specific pathways is measured through statistical methods, and associated functional pathways are retrieved from online bioinformatics databases. In the last decade, along with the spread of the internet, higher availability of computational resources made PEA software tools easy to access and to use for bioinformatics practitioners worldwide. Although it became easier to use these tools, it also became easier to make mistakes that could generate inflated or misleading results, especially for beginners and inexperienced computational biologists. With this article, we propose nine quick tips to avoid common mistakes and to carry out a complete, sound, thorough PEA, which can produce relevant and robust results. We describe our nine guidelines in a simple way, so that they can be understood and used by anyone, including students and beginners. Some tips explain what to do before starting a PEA, others are suggestions of how to correctly generate meaningful results, and some final guidelines indicate some useful steps to properly interpret PEA results. Our nine tips can help users perform better pathway enrichment analyses and eventually contribute to a better understanding of current biology.

🙌 Nikolay Oskolkov, I-Ting Huang, James Fellows Yates, Maria Lopopolo

👍 Laura Carrillo Olivas
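
As a rough illustration of the effect size point above, here is a sketch that computes Cohen's d (difference in group means divided by the pooled standard deviation) with awk. Cohen's d is just one common effect size and is my assumption here, not something named in the message; the function name `cohens_d` and the two-column "group value" input layout are hypothetical as well. For compositional abundance data another effect size measure may be more appropriate.

```shell
#!/bin/sh
# cohens_d FILE
# FILE has two whitespace-separated columns: a group label and a numeric
# value (hypothetical layout), with exactly two distinct group labels.
# Prints (mean of first group - mean of second group) / pooled SD.
cohens_d() {
    awk '
        NF >= 2 {
            if (!($1 in n)) label[++groups] = $1   # remember label order
            n[$1]++
            sum[$1] += $2
            sumsq[$1] += $2 * $2
        }
        END {
            a = label[1]; b = label[2]
            if (groups != 2 || n[a] < 2 || n[b] < 2) {
                print "need exactly two groups with at least two values each" | "cat 1>&2"
                exit 1
            }
            mean_a = sum[a] / n[a]
            mean_b = sum[b] / n[b]
            # Sample variances (n - 1 denominator); simple one-pass formula.
            var_a = (sumsq[a] - n[a] * mean_a * mean_a) / (n[a] - 1)
            var_b = (sumsq[b] - n[b] * mean_b * mean_b) / (n[b] - 1)
            pooled_var = ((n[a] - 1) * var_a + (n[b] - 1) * var_b) / (n[a] + n[b] - 2)
            if (pooled_var <= 0) {
                print "pooled variance is zero, effect size undefined" | "cat 1>&2"
                exit 1
            }
            printf "%.3f\n", (mean_a - mean_b) / sqrt(pooled_var)
        }
    ' "$1"
}
```

With a file containing the values 4, 5, 6 for one group and 1, 2, 3 for the other, both groups have a sample variance of 1, so the difference in means (3) divided by the pooled SD (1) gives 3.000. Reporting a value like this next to each p-value makes it easier to spot genes or pathways with a large difference that miss an arbitrary significance cut-off.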

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-25 23:17:42

Hi folks! I need your help 🙂 We have received comments on one of my PhD manuscripts and one of the reviewers is insisting that we have to use MALT for taxonomic classification. I tried to use malt-build (v 0.5.0) on ~16,000 bacterial genomes, but it keeps giving a memory error. I even allocated more memory, up to 1TB 🤯, and I am still getting the same error (as shown below). Do you have any advice? Best

Number input files: 16,637
Loading FastA files:
10% 20% 30% 40% 50% 60% 70% 100% (827.0s)
java.lang.OutOfMemoryError: Java heap space
    at malt.io.FastAFileIteratorBytes.next(FastAFileIteratorBytes.java:155)
    at malt.data.ReferencesDBBuilder.loadFastAFile(ReferencesDBBuilder.java:151)
    at malt.data.ReferencesDBBuilder.loadFastAFiles(ReferencesDBBuilder.java:134)
    at malt.MaltBuild.run(MaltBuild.java:226)
    at malt.MaltBuild.main(MaltBuild.java:57)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.exe4j.runtime.LauncherEngine.launch(LauncherEngine.java:65)
    at com.install4j.runtime.launcher.UnixLauncher.main(UnixLauncher.java:57)

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-26 06:30:27

*Thread Reply:* Hi Mohamed - depending on how you initially did your classification you may be able to push back against using MALT.

MALT is infamous for needing more memory than most labs' computational resources can provide, which is a large blocker. You can easily say that in your reply to the editor and reviewer.

In this case your only solution is to reduce the number of reference genomes you're inputting into your database. If you have not done it already, you could make sure you're only picking one representative genome per species, for example.

However, you have to remember that MALT is not particularly special, the only benefit is that it's maybe slightly more specific because it does an alignment rather than slightly fuzzy kmer matching, and with that alignment you can generate damage plots etc. But you can also just generate alignments by mapping yourself afterwards. It performs LCA just as e.g. Kraken does, it's just that Kraken is more sensitive so you'll pick up more false positives - but you can just raise your support threshold a bit.

Furthermore MALT 0.5.* and MALT 0.4.* are both actually broken:

http://megan.informatik.uni-tuebingen.de/t/lca-placement-failure-with-malt-v-0-5-2-and-0-5-3/1996 http://megan.informatik.uni-tuebingen.de/t/unable-to-change-the-default-coverage-parameter-for-naive-lca-assignment-with-malt-v-0-4-1/2032

This would force you to use an ancient version (0.3.8) but this will mean you will have to use a very out of date taxonomy as the version doesn't have updated Megan files anymore.

So ultimately, depending on why your reviewer wants you to run MALT you can quite robustly just say: we can't because it's broken and is outside our computational capacity (and that's with a 1TB node!)

MEGAN Community

Unable to change the default coverage parameter for naive LCA assignment with malt v. 0.4.1

Dear developers, Following the issue I encountered with malt v. 0.5.* (which is described here: LCA placement failure with Malt v. 0.5.2 and 0.5.3), I tried to switch back to v. 0.4.1. I encountered a different issue with this version, which is the following: By default, the LCA placement appears to be made with a “naive algorithm” and 80% “coverage”, as stated by the malt-run log: Using 'Naive LCA' algorithm (80.0 %) for binning: Taxonomy If I understand correctly (also from some testing ...

👍 Mohamed Sarhan

irinavelsko (irinavelsko@gmail.com)

2022-08-26 07:49:03

*Thread Reply:* Hi @Mohamed Sarhan, I would also add that you can argue that MALT produces a similar taxonomic table to Kraken, especially after filtering out low abundance taxa, despite performing alignment rather than k-mer matching, by citing this paper. I used CLARK-S, which performs k-mer matching highly similar to Kraken. https://journals.asm.org/doi/full/10.1128/mSystems.00080-18

👍 Mohamed Sarhan

Nikolay Oskolkov (nikolay.oskolkov@scilifelab.se)

2022-08-26 08:52:28

*Thread Reply:* Hi @Mohamed Sarhan, I agree with @James Fellows Yates and @irinavelsko that the lists of detected microbes are typically rather similar for Kraken and MALT providing you use similar databases (same organisms) for both. However, what Kraken absolutely cannot do is authentication, so you need some alignments for following up detected microbes. Bowtie2 alignments might be good enough for validation / authentication but they lack LCA, that is a drawback.

Now, regarding your technical issue, you can increase MALT java heap space by manually modifying the -Xmx flag in malt-build.vmoptions file which is located in /opt/malt/class folder in your malt installation. By default, I believe it is "-Xmx512m" but you can specify "-Xmx1000G" for example. Please try it and let me know whether it has worked

👍 James Fellows Yates, Mohamed Sarhan
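
The heap-size change described above can be sketched as a small shell helper. The function name `set_java_heap` and the `MALT_DIR` variable in the usage note are hypothetical; only the `malt-build.vmoptions` file name and the `-Xmx` flag come from the thread, and the location of the file differs between installations.

```shell
#!/bin/sh
# set_java_heap VMOPTIONS_FILE SIZE
# Sets the Java maximum heap size (-Xmx) in a .vmoptions file, keeping a
# copy of the original as VMOPTIONS_FILE.bak. SIZE is e.g. 800G.
set_java_heap() {
    if grep -q '^-Xmx' "$1"; then
        # Replace the existing -Xmx line in place.
        sed -i.bak "s/^-Xmx.*/-Xmx$2/" "$1"
    else
        # No -Xmx line yet: back up the file and append one.
        cp "$1" "$1.bak"
        printf '%s\n' "-Xmx$2" >> "$1"
    fi
}
```

For example, `set_java_heap "$MALT_DIR/malt-build.vmoptions" 800G`, where `MALT_DIR` stands for wherever the file lives in your MALT installation. Keep the value below the memory actually available on the node, and run `grep Xmx` on the file afterwards to confirm the change.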

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-26 08:54:37

*Thread Reply:* Oh you can also try increasing the step size of the seeds in malt build

👍 Nikolay Oskolkov, Mohamed Sarhan

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-26 08:55:01

*Thread Reply:* I think Ron and Felix found you can reasonably go down to 8 with minimal loss of sensitivity

Nikolay Oskolkov (nikolay.oskolkov@scilifelab.se)

2022-08-26 08:57:15

*Thread Reply:* Hmm, yes, I think --step 1 is the default. We tried --step 2 and --step 3, if I remember correctly, it does decrease the database size but from what we saw, it also decreases the accuracy of taxonomic classification. I do not recognize --step 8 🙂

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-26 15:20:49

*Thread Reply:* Thank you so much for your constructive replies. Really appreciate your help 😊 Thank you @James Fellows Yates, that's so convincing - It makes no sense to go back and use an outdated taxonomy. We used DIAMOND/MEGAN, MetaPhlAn3, and Kraken2/Bracken to check the taxonomic classification, but included only the DIAMOND/MEGAN results in the manuscript. I will include these arguments and the paper @irinavelsko linked here. I hope that would be enough to convince the reviewer. Thank you @Nikolay Oskolkov - The "Xmx" was set to 64G, I have now changed it to 800G and it is running with the default step size 👍

👍 James Fellows Yates, Nikolay Oskolkov

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-26 15:21:19

*Thread Reply:* Ok - DIAMOND could be your issue there and why the reviewer is asking for MALT

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-26 15:21:32

*Thread Reply:* DIAMOND will not work well with short aDNA reads

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-26 15:21:43

*Thread Reply:* because it translates to very short amino acid sequences and will not be specific enough

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-26 15:22:04

*Thread Reply:* https://peerj.com/preprints/27166/

PeerJ Preprints

Assessing alignment-based taxonomic classification of ancient microbial DNA

👍 Mohamed Sarhan

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-26 15:22:17

*Thread Reply:* (which uses a BLASTX mode of MALT, but similar thing)

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-26 15:37:26

*Thread Reply:* Agree, this could be the reason - We will add the output of MetaPhlAn3 and Kraken2 as well. For the DIAMOND, we use it against the NCBI-nr database, that's why we like it because it gives a comprehensive picture on everything we have in our samples (Human DNA, microbiome, and dietary components). Then we keep going with further confirmation with specialized curated databases.

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-29 17:13:26

*Thread Reply:* Just to inform you, here is a comparison between the bacterial assigned reads using MALT/BLASTn against the representative bacterial genomes (~16,600 genomes) and DIAMOND/BLASTx against the NCBI-nr database. The numbers of assigned reads are different from sample to sample. Looking forward to discussing this in more detail during the upcoming SPAAM4 🙂

👍 Nikolay Oskolkov

Nikolay Oskolkov (nikolay.oskolkov@scilifelab.se)

2022-08-29 17:33:25

*Thread Reply:* @Mohamed Sarhan so did you manage to build the Malt DB?

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-29 17:38:36

*Thread Reply:* Yes, @Nikolay Oskolkov.. Thanks to your suggestion 🙏. It worked once we changed the file you referred to. It needed ~700GB to build it.

👍 Nikolay Oskolkov

:mask_parrot: James Fellows Yates

Nikolay Oskolkov (nikolay.oskolkov@scilifelab.se)

2022-08-29 17:40:29

*Thread Reply:* Good! It is interesting that MALT / BLASTN does not always assign more reads than DIAMOND / BLASTX

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-29 17:44:35

*Thread Reply:* Indeed... Do you have any read length stats?

👍 Mohamed Sarhan

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-29 18:03:38

*Thread Reply:* Here is a read-length distribution for 4 of the samples (Just plotted them now 😄)

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-08-29 18:04:42

*Thread Reply:* Hm ok that's not what I expected

Nikolay Oskolkov (nikolay.oskolkov@scilifelab.se)

2022-08-29 18:10:14

*Thread Reply:* Hmm, I would not use reads below 30 bp, but those I believe should not be assigned by DIAMOND to any organism at all because they are too short. So this does not explain why DIAMOND gives more assigned reads than MALT for some samples. Were all adapters trimmed prior to mapping with MALT?

👍 James Fellows Yates
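
For the read length stats asked about above, a minimal sketch with awk, assuming an uncompressed FASTQ file with exactly four lines per record. The function names `read_length_histogram` and `count_reads_shorter_than` are hypothetical, and the 30 bp threshold in the usage note is simply the one mentioned in the message.

```shell
#!/bin/sh
# read_length_histogram FASTQ
# Prints "length<TAB>count" for every read length in FASTQ, sorted by length.
# Assumes uncompressed input with exactly four lines per record.
read_length_histogram() {
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) printf "%d\t%d\n", len, count[len] }' "$1" |
        sort -n
}

# count_reads_shorter_than FASTQ MIN_LEN
# Prints how many reads in FASTQ are shorter than MIN_LEN bases.
count_reads_shorter_than() {
    awk -v min="$2" 'NR % 4 == 2 && length($0) < min { n++ }
                     END { print n + 0 }' "$1"
}
```

For a gzipped file you could decompress to a temporary file first, then run for example `count_reads_shorter_than sample.fastq 30` to see how many reads fall below the 30 bp cut-off before deciding whether to filter them out.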

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-29 18:12:04

*Thread Reply:* Yes, these are already quality-filtered adapter-trimmed merged deduplicated reads

Mohamed Sarhan (mohamed.sarhan@eurac.edu)

2022-08-29 18:26:19

*Thread Reply:* In my opinion it might have to do with the database itself and the sample microbial composition. I think the default word-size for the BLASTx is 6 and can be adjusted to 3 or 2 (I'm not sure about DIAMOND/BLASTx word size), but if it is so, it could mean the short-fragments of ~18 nt can be still seeded and assigned (Just in theory). What do you think?

Nikolay Oskolkov (nikolay.oskolkov@scilifelab.se)

2022-08-29 19:06:27

*Thread Reply:* @Mohamed Sarhan I do not know about word size, but it seems plausible to me that this effect has to do with the NR/NT (used for Diamond) vs. 16 000 genomes (used for Malt). Since the former is much bigger, that might indeed result in more reads assigned by Diamond (higher sensitivity)

👍 James Fellows Yates, Mohamed Sarhan

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2022-09-06 00:28:32

Hello everyone! I was looking, but I only found the pdf of the lessons in the link that you sent us, could you share the link with the recordings please? I really wish I could see them again =)

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-09-06 06:08:05

They are coming VERY soon! Don't worry!

👍 Nikolay Oskolkov, Jasmin Frangenberg

❤️ Laura Carrillo Olivas, Maria Lopopolo

James Fellows Yates (james_fellows_yates@eva.mpg.de)

2022-10-12 23:00:35

Very soon being ™️, only one and a half things missing now

❤️ Laura Carrillo Olivas

irinavelsko (irinavelsko@gmail.com)

2022-10-13 08:07:12

Hi EVA sediment people! SPAAM4 is happening this week and some of the pathogen and microbiome people are setting up a group viewing, maybe at the institute, that you're welcome to attend. We didn't see anyone from the EVA sedaDNA groups registered, but also don't know how to reach everyone in one go, so if you can let the rest of your groups know we'd appreciate it (it's a no-PI conference, and a perfect chance to hear about the latest in ancient metagenomics from the PhDs and postdocs doing the work, and to meet people in your field across the world!).

👍 Nikolay Oskolkov

irinavelsko (irinavelsko@gmail.com)

2022-10-13 13:53:35

Follow-up for EVA people, we'll be in either the Aquarium or Terrarium, depending on what's available, at 4pm when SPAAM4 starts today. Feel free to join us there!

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2023-10-09 19:06:26

Hi!! help please! could you teach me how I can get and plot the percent identity of my bam file? Is it calculated before or after the capture of the pathogen?

Maxime Borry (maxime.borry@gmail.com)

2023-10-10 11:05:45

Hey @Laura Carrillo Olivas I’d encourage you to ask further questions in the #no-stupid-questions channel, you’re more likely to get an answer there 😉 Regarding the percent identity, you can retrieve it using the NM/MD tags of your bam file, the alignment length, and the read length. You can do it either by using samtools and writing a parser yourself, or better, by using a library such as pysam https://pysam.readthedocs.io/en/stable/index.html

The doc of the sam/bam file format is also very informative when you look for this kind of information https://samtools.github.io/hts-specs/SAMtags.pdf

:mask_parrot: Laura Carrillo Olivas

Laura Carrillo Olivas (laura-carrillo-olivas@hotmail.com)

2023-10-10 13:35:39

*Thread Reply:* thank you!!! ❤️
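
The "samtools plus your own parser" route mentioned above can be sketched with awk. This is one possible definition, stated as an assumption: percent identity = 100 × (alignment columns − NM) / alignment columns, where alignment columns are the summed lengths of the M, =, X, I and D operations in the CIGAR string and NM is the edit distance tag. The function name `percent_identity` is hypothetical, and reads without an NM tag are skipped.

```shell
#!/bin/sh
# percent_identity
# Reads SAM records on stdin and prints "read_name<TAB>percent_identity".
# Header lines, unmapped reads and reads without an NM tag are skipped.
percent_identity() {
    awk -F '\t' '
        /^@/ { next }              # header line
        $6 == "*" { next }         # no CIGAR: unmapped read
        {
            nm = -1
            for (i = 12; i <= NF; i++)
                if ($i ~ /^NM:i:/) nm = substr($i, 6) + 0
            if (nm < 0) next       # NM tag missing

            # Walk the CIGAR string one operation at a time.
            cigar = $6
            columns = 0
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                oplen = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                # Only M, =, X, I and D form alignment columns.
                if (op ~ /[M=XID]/) columns += oplen
                cigar = substr(cigar, RLENGTH + 1)
            }
            if (columns > 0)
                printf "%s\t%.2f\n", $1, 100 * (columns - nm) / columns
        }
    '
}
```

A possible way to use it is `samtools view mapped.bam | percent_identity > identity.tsv`, where `mapped.bam` is a placeholder for your own file; the second column of the output can then be plotted as a histogram in the tool of your choice. Soft-clipped bases are not counted here, which is a design choice you may want to change depending on how you define identity.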