@James Fellows Yates has joined the channel
@Anneke ter Schure has joined the channel
@Pete Heintzman @Anneke ter Schure I've added you both as you've said you work on sedaDNA in the past, but feel free to leave if you're not interested. But @Becky Cribdon is looking for some help reviewing some papers for AncientMetagenomeDir to check the metadata she's added is correct. As you both joined the AncientMetagenomeDir channel I'm assuming you may have been interesting in helping out. Would either of you have time to help Becky?
Hi @James Fellows Yates and @Becky Cribdon, Sure - can devote some time to this next week. What needs doing, Becky?
Becky needs a review on: https://github.com/SPAAM-workshop/AncientMetagenomeDir/pull/162
However I will need to add to our spaam github organisation. If you send me your username I will add it, and they Becky can take it from there!
Thanks Pete! Great to hear from you again.
Actually, I've just added Smith 2015, which was much simpler than the Slon samples and would make an easier first review 🙂 https://github.com/SPAAM-workshop/AncientMetagenomeDir/pull/167
@Antonio Fernandez-Guerra has joined the channel
@Pete Heintzman while I'm looking at github, small reminder to send me your username to the org, then you can do the review for Becky
Ah wait, foudn you
> You've invited Pete Heintzman to SPAAM-workshop! They'll be receiving an email shortly. They can also visit https://github.com/SPAAM-workshop to accept the invitation.
How are you guys dealing with negative/positive control samples?
We are excluding controls at the moment
This becomes even more messy than actual samples unfortunately, so I've decided to leave that aside for the moment. It will be eaiser to retrieve them in the future though as we can use the ERS codes to backtrack to project level and grab them
First Issue to merge I didn't have to do anything 😍 thanks guys
🤝 @Pete Heintzman (can't find a hi-five)
@James Fellows Yates: what are the rules on unpublished metadata for a published sample? ie. if one has access to sample metadata that was not included in the original publication, and has not been published since, is this ok to include?
No, must be published in some open archive
i.e. if it wasn't in the original publication but is on the ENA/SRA that's ok
I get nervous about 'data property' if it's not open like that
We can always update it in the future if the authors publish it somehwere else though
Thanks, and good point re. data property!
Particularly when it comes to lat:lons, I've seen bad cases where that has been reported then the site looted 😞
I think the rule regarding radiocarbon dates will need to be relaxed for lake/marine sediment cores, as there are often far fewer direct dates than DNA samples (and generally very low overlap between directly dated and DNA layers). This is because bioturbation/reworking is far less of an issue, as compared to cave/archaeological sediments, and so interpolation using age-depth models is most often used to date samples.
Separately re. Graham2016/Wang2017 sample ages: There is no table of sample ages, but there are ages for some of the sample depths reported in Wooller2018 (different pub), although the exact data are in a separate repository (with their own DOI...). Have included both the publication and dataset DOIs for these sample ages. For the remaining sample ages, these have been inferred from Figure 5 in Wang2017 (and by cross-checking the data points against values in Wang2017 Appendix 2, and using Graham2016 Figure 2 for additional guidance).
Hmm. I see your point however it's not really good practise to start specifying 'this type of material' allows X and this type of material allows Y, as it isn't then consistent in how one should interpret the list as a whole.
I would rather make a exception for the environmental list, to allow undated layers (assuming they are from a sequence) even if they don't have dates; assuming that the sample names have a logical order and could be reconstructed.
I almost wonder if we should include depths now (as we decided against before, right @Becky Cribdon)?, but not for the purpose of comparison between datasets, but rather to help get around missing dates? What do you think (@Antonio Fernandez-Guerra too).?
I'm not sure what is useful for you guys as I'm not in the field...
For Graham, that is really ugly reporting by authors 😕. But OK, I'm not sure this will pass our regex checks atm. I think it would be better to pick the most recent or more precise reporting of the date. I believe Figshare also allows linking back to the original publication?
*Thread Reply:* Yeah - that was my bad. For context, the final ages were calculated late on in the work, but the sample depths (which we using to cross-correlate all proxies) remained static throughout. re. FigShare: and vice versa, so probably ok to just cite the publication DOI if only one DOI is desirable.
Depths: good question. Would depths in the 'Dir be enough for someone reading it to see, like, this sample isn't dated, but in the sequence it is between these two which are dated, so I'll assume it's close enough to the date I'm looking for? If so, I think it would help to say what sequence samples belong to, so something like core as well as just depth?
Perhaps the exception for the environmental list could be to allow dates from age-depth models where direct dates aren't available, rather than us including depths and having the reader interpret? Or would that be like the problem of calibrated/uncalibrated dates again, where the models (and inferred dates) could change over time?
*Thread Reply:* I found this with the Armbrecht paper: depth was given for all the samples but age was only given for the deepest sample. Since all the samples were recent I was able to put down 100 years for the shallower samples too, but not sure what I would have done otherwise.
I haven't worked with age-depth models myself (thanks coronavirus)
I don't really know what age-depth models are (predicted, I guess?). But yes I guess it would be...
IBut good point about the core. Is there a standard in the ways hese are reported?
on the otherhand, core
wouldn't necessarily work for cave sediments where they are taken from an excavation trench face...
Unless we just ask please use sample names that include core/transect in it 🤔
@Pete Heintzman @Antonio Fernandez-Guerra thoughts?
This might mean we have to go back and check already submitted papers, but better to do this nw befroe we get too many more samples
We could request sample name must include core/transect names in addition to specific sub-sample for DNA name (where possible), and if some samples in a specific core that has got dates , 'inferred' dates of layers between dated layers are allowed?
I think that would also work
I need to think about it… i don’t like add too many semantics on the sample names
Can we assume people normally do have the core in the sample name?
Is that common in the field?
I don’t know, @Pete Heintzman and @Becky Cribdon are more familiar to the sedaDNA field than I do
I always think with a DB oriented mind
Yeah, I dunno. It's difficult. Like in @Shreya’s pedersen paper the 'sample alias' listed in ENA is never used in the paper itself
*Thread Reply:* Threw me for a loop but very grateful for @Pete Heintzman helping me link IDs to ages!!
And is completely uninformative
(A bit like the BWINDI-1 in Campana)
But the submitted FASTQ files have informative sample IDs
Core names are usually a bit too long to include if you want workable sample names in the lab, so I think most people use a different indication like an abbreviated location name
Ah yeah, thanks for pointing that out @Anneke ter Schure. I didn't mean the literal core name but sample names with informative codes like the abbreivation
So Pedersen have CHLXXX and SPLXXX each lake
Yes so an abbreviation is common, but I'm not sure if it's standard
I wonder if we might just have to allow synthetic construction of combinations of sample names reported in the paper despite it being messy for databases @Antonio Fernandez-Guerra....
*Thread Reply:* I think this may be the way to go.
*Thread Reply:* I don’t know if this is possible, but can we create this field with the github actions?
*Thread Reply:* What do you mean? I mean the person submitting must work that out
*Thread Reply:* You have to take it from the publication. A lot of people don't give sample alias/library names in the ENA/SRA submission
*Thread Reply:* I mean if we have the columns defined, when doing the PR if we can have a small function that combines the different fields
Anyway, as I'm not in the field I won't make the decision here. @Becky Cribdon as biggest contributor you're in charge of that 😉
what we do, usually we have a function in the DB that constructs those artificial names so it is easy to manage and fix mistakes. If you ask the curators to construct those names you are prone to have errors
specially if they are going to be created manually
Sidenote @Pete Heintzman I'm merging current master into your branch to check the new regex to prevent multiple DOIs
Including depth data somewhere is desirable, as this is static (even if the ages are updated by future works).
If including this in sample_name, rather than a separate data entry, then how about standardizing samples as:
[Core name/code][depth][sample-name1|sample-name2|...] or [Cave name][stratum][sample-name1|sample-name2|...]
If we do this, then we need to decide on what a ‘sample’ is (ie. are sediment sub-samples from the same layer considered the same sample?). I would suggest a sample is all data from a specific sediment layer. (apologies if this has been discussed earlier)
Happy to explain how age-depth models work, if anyone is still interested...
So, to bring back Antonio's suggestion, we could add a column for sediment samples for core/cave (sequence) name/code, and a column for depth, and GitHub would then merge those into a standardised sample name?
But not "sample name" because that's already a field.
(I'd like to know how age-depth models work please Pete 😁 )
*Thread Reply:* @channel: here is a good primer on age-depth modelling used for dating sedimentary sequences:
I am quite concerned of trusting people on the merging
We don’t want to have bad data
then if we/someone spots an error, we need to go back to change the field + name
Github actions and modifying PR can be dangerous unfortunately and would take time to fix.
But we already allow retroactive changes/corrections in new releases so I don't think it's so bad. Particularly as we are 'making up' sample names here anyway....
So manual merging is fine? If so, separate columns may be overkill.
I would keep separate columns then, it helps fixing errors and also is data we should consider including
So what are the two clumns then?
it can seem a good idea of encoding names with lot of semantic data but in the long term is painful
if we have this names, I would keep the columns where the fields in the sample construction are coming from
Yeah @Pete Heintzman’s idea is nice but I tihnk we might be getting too-far away from the original publication (so look up would be difficult). What I meant was that if there is no 'nice' name in the publication in some form that matches both what is referred to in the pbulication and in the ENA/SRA, we allow a contirbutor to synthetically merge so a reader can infer both the sample and date/depth/core info
And was there a positive consensus on having a depth column?
Because that makes it easier I guess. Then it's just making sure the sample id has a core+(sub)sample ID
In case there are undated cores in a given publication (Which can't be used)
then we can validate the artificially created id, if there is any
I've got another meeting now: @Becky Cribdon your turn 😉
I'm working on a prototype table and README. Watch this space.
It's a good start but personally feel that the standardised column name is redundent
As the information is otherwise already there in the toher columns
I would rather have the sequence, depth and sample_name, but we recommend that if multple sample names are reported, go for the one that is more informative (where possible)
So if you have options of BWINDI-1 vs DNA-MUB17-2C-6L-SAMPLE43, go with the latter
Or to use the example from Pedersen 2016 use CHL13211317 vs ICF-10 (as ICF are for the whole study, and CHL is the lake name)
Thoughts @channel ? Please see Becky's suggestion and my porposed modification to drop the standardised name.
I think we can also specify that there must be at least two direct dates in a given sequence, but then 'inferred' ages of given layers are then allowed in that sequence. How does that sound?
I agree with @James Fellows Yates’ suggestions - samplenamestandardised is unnecessary, and can easily be recreated by a user, if needed. Perhaps change ‘sequence’ to ‘sedimentary_sequence’, to avoid any confusion with DNA sequence.
How about this (the addition under sample age may not be clear enough):
*Thread Reply:* Btw, you could always make a draft PR, so you once everyone agrees we can sent it straight in
*Thread Reply:* And some tweaks (which is why I suggest the draft PR ;)):
```## sedimentary_sequence
If not reported, NA```
*Thread Reply:* ```## sample_name
> ⚠️ Mandatory value```
*Thread Reply:* otherwise I'm happy with it!
*Thread Reply:* Agree - this looks good!
*Thread Reply:* @Becky Cribdon time for a PR! We will need to work out how to retroactively go back through published papers though...
*Thread Reply:* Changes noted and draft PR created 🙂
I just realised someone cited the following review in the preprint, would someone be willing to go through it and pick out papers it cites that we might be missing?
Edwards, M. E. (2020). The maturing relationship between Quaternary paleoecology and ancient sedimentary DNA. Quaternary Research, 96, 39–47. https://doi.org/10.1017/qua.2020.52
I have a feeling we are still sort of slim on the sedaDNA papers. Wasn't that even what Eske Willerslev started on (a while ago)
The vast majority of sedaDNA paper, including the majority of papers cited by Edwards, are based on metabarcoding.
Oh really?
(that includes Eske's massive 2014 paper)
there are not so many with shotgun
but slowly is changing
I'm surprised. I thought there was relatively big interest/utility/funding in that
There is, but there are a lot of technical challenges
There are also a lot of studies underway
I thought as well when started with sedaDNA
A lot will be published in the next year or two
when I switched I was surprised by the lack of computational methods (my thing)
specially for functional analyses
Going back to Edwards (2020), the only two refs of note are: • Parducci et al. (2019), but this looks to be the same data as Ahmed et al (2018) - the latter is in the database, but may want to check for updated metadata. • Lammers et al. (2020) - this is a cool one, but currently a preprint. A revision has been submitted, and we are hoping it will be accepted soon...
we are developing some new stuff to recover i.e plant traits from the shotgun data
and ways to authenticate the DNA at the same time we assemble, so no need of mapping
(Back to the 'Dir: if anyone fancies a fairly straightforward environmental review, perhaps your first, Braadbaart2020 was quite well reported as they go)
Ooh, we're working on authentication too: something like mapDamage for metagenomics. What does yours do?
this doesn’t work like mapDamage
we have three different lines, one based on Deep Learning, one optimization of mapDamage for large datasets, and the one that works at the assembly level with a hybrid approach with nucleotides and amino acids to access to the low abundant ones
@Pete Heintzman would you be OK adding me to the environemntal core team on github? Then Becky can for example just ping that team and hopefully one of the people can do a review, which would help with task distribution
*Thread Reply:* Sure… but i have no idea how to that. Pointers?
*Thread Reply:* you don't need to do anything
*Thread Reply:* @Becky Cribdon @Pete Heintzman made you a new team. So in the future people on submitting environmental PRs can go @thedir-team-dirt for reviewers or help, and you both get notificaitons.
*Thread Reply:* Heh heh, team dirt. I like it.
*Thread Reply:* Shame it's only us three.
*Thread Reply:* WE will get thre 💪
*Thread Reply:* I'm wondering if I should submit a second abstract to ISBA about this...
*Thread Reply:* And if there are any sedaDNA conferences feel free to present there about this!
*Thread Reply:* Abstract about what precisely?
*Thread Reply:* Tell people it's here, what it does, etc
*Thread Reply:* Want to recruit more contributors
on the functional side there is a lot to be done and many unknowns
@Becky Cribdon I've added to your PR for the structure change the new columns to the TSV, and also the JSON schema to check against.
However we need someone to go back and update those two columns for the already added papers. Is anyone willing to volunteer?
@Anneke ter Schure as you are only occasionally here (so I guess maybe busy) , maybe you would be willing to do a small review for Becky: https://github.com/SPAAM-workshop/AncientMetagenomeDir/pull/268/files only 7 samples form the same site!
I'll check it out this afternoon!
Thoughts https://github.com/SPAAM-workshop/AncientMetagenomeDir/pull/268#issuecomment-688640430?
We can add this to Becky's restructuring PR if you guys agree
instead of site_type
I would try to get it in one of the envO categories, i.e:
• envbiome: Descriptor of the broad ecological context of a sample.
• envfeature: Compared to biome, feature is a descriptor of a geographic aspect or a physical entity that strongly influences the more local environment of a sample
• env_material: Descriptor of the material that was displaced by the sample, or material in which the sample was embedded, prior to the sampling event.
With those you might be able to describe where the sample comes from. You can use any of the envO terms to define them and not overload the terms. If you remember our mail with Pier:
```Thus, for annotating a sample of material that a microbiome was extracted from, I would:
* Identify the broad scale environment (usually biome-level class) to characterise where the site was (tundra, desert, etc), being as specific as possible * Identify one or more local scale entities. These can include the site itself (burial site, midden, etc) as well as any tools or other objects (we can add those as needed) ** Identify the material or materials that were inputs to the biomass/DNA extraction process from the site (soil, sediment, scrapings from tools, etc).```
this would imply have extra columns to fit broad scale, local scale and medium scale terms
I don't want to do go that far though because it leads to swamping of the table. I think the second would be sufficient for here though.
in this case would be cave
and material sediment
@Katerina Guschanski has joined the channel
OK while I was waiting for some code to run I updated @Becky Cribdon’s PR with teh new fields.
We have the new columns:
I've added the corresponding features to the feature
column (as that was easy to infer from the material), but the other two are still identified as 'unknown', so I think we can decide whether we merge this now (after one more review, maybe from @Pete Heintzman to check phrasing of categories are OK), and then retroactively update the depth/core IDs; OR should we wait for those columns to be filled before we merge?
ah and @Anneke ter Schure just approved the Braadbaart paper just as Iwrote that message, awesome! You can merge now @Becky Cribdon! https://github.com/SPAAM-workshop/AncientMetagenomeDir/pull/268
re. Braadbaart2020: all looks good. Now merged with master.
re. environment README edit for new sediment fields and inferred dates: made a few minor edits - all looks good. Not sure why the checks are failing.
Checks will fail until the JSON is merged into matter
Master, as the tool checks off that branch because it reads off a github pages webpage
I would suggest that maybe we just merge that now and make an issue for someone to go back and update all the already added entries to add depth and core etc
ok, proceeding with the merge!
Done, and issue made: https://github.com/SPAAM-workshop/AncientMetagenomeDir/issues/284
So if anyone knows any enthusastic students who wanna help out 😅
Sooo @channel feedback time!
The reviewer said:
"A general thought on the environmental metagenomics category (mostclosely aligned with my expertise): a column indicating the taxonomic fo-cus of the published study might be useful, so that there is some indicationof what alignments the authors performed."
What do you think about this?
Is this necessary/useful/worth it?
*Thread Reply:* Maybe it's relevant if there's been some selection or enrichment before generating the data?
*Thread Reply:* But I'm imagining a typical user wanting to find comparable datasets, not really the analysis methods (i.e. what alignments).
*Thread Reply:* But, if the data itself is taxonomically limited, maybe it would be helpful to know that.
*Thread Reply:* Yeah I was also thinking that
*Thread Reply:* like mt data from Slon et al. (not that we specify the data, but they may only upload data along those lines 😏 )
*Thread Reply:* So we have a tentative yes from yuo?
*Thread Reply:* I agree with Becky's suggestion, although it is slightly different from what the reviewer is specifically asking.
*Thread Reply:* tl;dr: I don't think the reviewer's suggestion in necessary, otherwise we would need to update every time someone reanalyzes the data.
*Thread Reply:* In my specification I have stated 'the original purpose' of the data.
*Thread Reply:* Bceause I agree that would come complicated, but assuming it's the same DNA extraction that I think is the main influence you would have downstream
*Thread Reply:* OK - that works
*Thread Reply:* I guess it makes sense, for example people looking for just hominin DNA might do some funky stuff in the future to try and enrich for that
*Thread Reply:* Or if someone did some fine-filtering of the sediment to remove certain things (that might bias results)
*Thread Reply:* There are some seriously deep rabbit holes if one starts going into differences in wet lab methods used too deeply. As a compromise, I think enrichment v. not of libraries is the most important.
*Thread Reply:* Yeah sorry, I actually meant that by just saying 'they looked at animals or bacteria' that you can ignore that whole thing of lab-methods completely.
By using eukaryotic vs microbial, I would hope and assume people who are interested in animal DNA will do similar things (and would only download that data), and people who are only interested in microbial DNA would do similar things and look for that data...
@Anneke ter Schure I know you weren't involved in the manuscript directly, but do you have any opinions on this? Would you find it useful do you think? What about you @Antonio Fernandez-Guerra?
Maybe something like:
study_type: eukaryotic
, microbial
, or faunal
, floral
, faunal_flora
, bacterial
, virus
?
I think this would have to be a 'custom' content (rather than based on an ontology).
*Thread Reply:* Perhaps use NCBI taxonIDs as these will offer the most flexibility (from single species to phyla)?
*Thread Reply:* I looked at that actually! And while it was a good idea at first, I observed just now that people filtered in lots of weird an wonderful ways... which would result in very long lists. So far all the papers could be split into: faunal, floral, or microbial (or some combination of the three), in my opinio nat least.
Does any of you know what a 'core drive' exactly is?
*Thread Reply:* A core drive is a section of a sediment core. Some coring methods recover one drive (e.g. 1 m) of a core at a time from the same hole. Multiple holes are cored so that there are overlapping drives to reconstruct a composite core record (as the ends of drives tend to be disturbed).
The details in your example: 1B-1B_23-25 1 = coring site on the lake (in this case, centre) B = hole code 1 = core drive/section B = core section is cut in half longitudinally. Half B was sampled for DNA 23-25 = depth in cm into the drive that DNA sample was derived from
*Thread Reply:* Ahh cool thank you
@Becky Cribdon @Pete Heintzman I've just pinged you for a review on Github. I've added all the depth, sequencename and also a studytype column (as I didn't see any harm in it) to make the reviewer happy. I don't think you need to go through massively in-depth for the review, but if you could check a couple of papers that the depth/sequence_name looks good that would be great.
Also feel free to change the study type column.
(@Pete Heintzman maybe you can update Graham et al for the sequence_name, based on drive thing you just explained for me ;))
*Thread Reply:* np
@James Fellows Yates: For the reviewer response, if you want an example of how sedaDNA data has already reused (shameless plug): Wang et al. 2017 looked for woody plants by reanalyzing the Graham et al. 2016 data, which was originally generated to look for woolly mammoth.
I see, so you can say we would rather not specify that category because people will re-use for their own purposes?
*Thread Reply:* Personally, I would rather not specify "study type" if it hasn't affected the raw data the 'Dir is pointing to.
How about changing the name from "study type" to something like "enrichment", to mark whether the sample has been manipulated to change the taxonomic profile? So, if they enriched for mammal DNA it would be "faunal"?
But, for example, Gaffney 2020 was a floral study, but the raw data contains everything. We just picked out the plants for further analysis. So I wouldn't want to specify a "study type" for that: that could suggest the data is more limited than it is.
*Thread Reply:* Yeah good point
*Thread Reply:* I've not looked specifically if any filtering was done though
*Thread Reply:* So we need a new title 🤔 .
*Thread Reply:* taxonomic_category?
*Thread Reply:* originaltaxonomicpurpose?
*Thread Reply:* Could you expand on "I've not looked specifically if any filtering was done"? What do you mean by filtering here?
*Thread Reply:* As far as I understood the reviewer:
" A general thought on the environmental metagenomics category (most closely aligned with my expertise): a column indicating the taxonomic focus of the published study might be useful, so that there is some indication of what alignments the authors performed. "
he wants to find comparative datasets e.g. if he is interested in looking at e.g. floral eDNA from different sites at the same time period
*Thread Reply:* (I say he as I'm pretty sure it's Mike Bunce who reviewed 😉 )
*Thread Reply:* But he wants to comapre results of the original paper
*Thread Reply:* I am assuming that this implies that he would assume there were upstream 'decisions' that may influence extraction strategy for example.
*Thread Reply:* So he wants to make sure these are approximately like-by-like.
what I meant in my message above, is that the data I have input into the PR under the current column study_type
, I've just done it based on what the originally publication was focusing on
*Thread Reply:* I have no actually looked if there was any enrichment
*Thread Reply:* (sorry if this still isn't coming off as clear 🤦
*Thread Reply:* What about just study_taxonomic_focus
?
*Thread Reply:* Sorry, I'm finding this really complicated! But thanks for clarifying your statement.
So, for the reviewer, he would be searching the 'Dir for floral data and wants a column that would let him exclude a study that enriched for mammal DNA, because there's unlikely to be any floral DNA for him to use? I agree that would be useful.
But I think "taxonomic focus" or similar are still too broad: having a taxonomic focus doesn't necessarily mean the data is taxonomically limited. His assumption that upstream influences downstream wouldn't always hold, and could be misleading.
If I wanted mammal DNA and Gaffney 2020 had a "focus" of plants, would that suggest I exclude it? Despite it actually having raw data for everything?
Is there a title that implies actual limitation only, so studies like Gaffney 2020 could be "none" or "NA"?
*Thread Reply:* No don't worry. I'm also finding it complicated. To be clear: I am assuming that he may wonder if there is enrichment has happened, but I don't know for sure. I think I may have confused both you and @Pete Heintzman about mentioning enrichment 🤦 . We shouldn't focus on that (I was reading about TARA oceans the other day and they did particle filtering so maybe that's why I brought it up).
Looking through the comments, what he might actually be thinking about is 'meta-analysis' (which he says is something important that we are missing from the 'benefits' of the project). So he's sort of looking for some basically annotated bibligraphic info, maybe?
In which case it's actually a bit detached from the purpose of the project but I understand him from that point of view.
*Thread Reply:* So to take your example above:
> So, for the reviewer, he would be searching the 'Dir for floral data and wants a column that would let him exclude a study that enriched for mammal DNA, because there's unlikely to be any floral DNA for him to use? I agree that would be useful. Rather than enrichment, maybe what he means is that he's look for other studies that looked at flora. So he can then take their already publisehd 'OTU tables' (so to say), to compare with his own?
*Thread Reply:* Riiight, so instead of a flag for limitations, this column would be like a flag for particular detail?
*Thread Reply:* yeah sort of. What was the original purpose for the generation of that particular data
*Thread Reply:* @James Fellows Yates’s last comment is how I interpret the reviewer’s comment. However, having a separate column for shotgun/enrichment would also be very valuable to inform re-analysis of the data sets. (starting a new topic to continue this)
Instead of study_type
what about original_study_focus
? or study_primary_focus
or something like that?
primary
I guess would be good in the sense it implies you can do aother stuff
*Thread Reply:* Yes, I prefer "primary". Thanks for clarifying this above.
*Thread Reply:* Ok, will update
OK @Pete Heintzman @Becky Cribdon (a.k.a. team-dirt ;)) the PR should be ready for your review now 🙂
*Thread Reply:* On it :)
@Pete Heintzman Becky has approved so if you're happy I'll merge
Will get to this later this afternoon -- other ms fires to put out first...
OK - created a pull request to update some other details (sequence, depth, ages, etc) — an easy review for @Becky Cribdon.
*Thread Reply:* By the way, the sequence data for the new samples, that are not in Ahmed2018, do not seem to have been made available.
*Thread Reply:* Looks legit 🙂 Approved and merged.
re. point 16 of the reviewer response letter, I have added: An exception to this approach is for environmental samples, many of which are not directly dated (ages are inferred from a sedimentary sequence, such as an age-depth model for a lake sediment core) or are dated using alternative methods (OSL, tephras, U-series, etc). For these reasons, we make an exception and use calibrated ages for these samples (which we have stated in the README for environmental samples).
Ok! I might standardise the response a bit, but it'll make him happy I guess.
Are all age-depth models calibrated?
By convention, yes. As one cannot infer a radiocarbon age, as this is a direct measurement.
re. point 24: How about:
“We thank the reviewer for raising this point. We have now added a column for environmental samples called ‘studyprimaryfocus’, which gives a broad overview of which group the data were originally targeting (faunal, floral, microbial). We only include the focus of the original study and a broad grouping because: (1) as reference databases grow and are refined, future analyses targeting the same groups will yield different results (hopefully with improved precision), and (2) it is beyond the scope of AncientMetagenomeDir to keep track of all re-analyses of each data set. We note that this second point has already occurred (e.g. Wang et al. 2017 looked for woody plants by reanalyzing the Graham et al. 2016 data, which was originally generated to look for woolly mammoth).
The reviewer’s point also inspired us to create a new category of datatype to differentiate between shotgun and target enriched data, as this information would guide which samples to include in a reanalysis. The new types are ‘shotgunmetagenomic’, ‘targetenrichedhuman’, and ‘targetenrichedmammal’, but these will be expanded as new bait panels are developed and applied.”
Note that this latter point will also apply to other metagenomic categories
Wil ladd that (thanks!). Point two I'm not so sure about though as when we are talking about enrichment, this is library level information not sampl elevel
And that's already something else I plan add in a library-level extension of 'the Dir', if @Katherine Eaton is able to semi-automate the process of pulling that data for us. So I think I'll leave that out for now.
Y/N? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7643826/
*Thread Reply:* Y. For the bottom sediment samples only.
@Becky Cribdon did a review of Sarkissian. That is indeed a wierd one, but i've simplified it a bit (see the comment on the PR - should just be a copy-and-paste job)
@Pete Heintzman https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7887274/ one for 'the Dir?
OK — all added as issues. Note that there will be a deluge of sedaDNA samples coming out this year!
@James Fellows Yates All of these ‘issues’ are small, so will make good training exercises for the uninitiated.
Busy busy! Ok, I might leave it for the June release then
Need one approval by someone here @channel!
https://github.com/SPAAM-community/AncientMetagenomeDir/pull/377
@Pete Heintzman I've gone through all three. The Liang and Schulte I have a couple of comments, but otherwise thanks for the speedy work 👍
@James Fellows Yates re. Liang: went with 'Sea coast', as not technically the shore. If OK, then I have already updated this in the Enum and Liang PR.
*Thread Reply:* Go ahead and merge both!
*Thread Reply:* Still needs a 'review' 😉
*Thread Reply:* Oh pants. One sec
re. Schulte: yeah, I think at some point we will need to re-visit the target enrichment rule. This is probably going to become a more common feature of sedaDNA metagenomics studies.
*Thread Reply:* Hmm ok. We can keep an eye on it then. I guess we do have precedence with the pathogen stuff, but that stuff is often more clear cut because each paper is basically referring to a single taxon, but it might be more variable with sedaDNA...
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208683/ one to include?
I think they were looking at the living microbial component in the permafrost rather than dead 12 kyr-old aDNA, but it is unclear.
Suggest not to include.
Other than PIA (from @Becky Cribdon) what tools exist for seda/eDNA dedicatd to ancient samples?
or designed for tackling issues that are particularly pertinent to ancient DNA?
There are not many specialist tools, although Benjamin Vernot's Kallisto would be another
And PathPhynder by Bianca de Sanctis and co.
Kallisto isn't aDNA focused though right?
That's for rna-seq or transciptomics and canabilised for their purposes iirc
Haven't deep-dived into the version Vernot uses, so not sure if it was modified/optimized for sedaDNA or is just the rna-seq version
@channel y/n? https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC8257590/
Another one? This one is more borderline though
*Thread Reply:* Would say yes, although I see that all raw reads are combined into a single accession/fastq. Is there a policy on this? @James Fellows Yates
*Thread Reply:* The read IDs should allow you separate i think...?
Or is it multiple samples merged into one?
*Thread Reply:* Multiple samples merged into one FASTQ... as far as I can tell.
*Thread Reply:* That's really shitty
Hi @Barbara I just prepared your Lake Chalco data for AncientMetagenomeDir, and I noticed that in the uploaded metadata to the SRA that you sequenced on NextSeqs.
But in your publication you say you sequenced on TruSeq nano and on Nextera XT platforms? But these are library prep kits not the sequencing machines...
So I just wanted to confirm when we start generating library metadata, that it is indeed NextSeqs you sequenced on for everything
(if you wanted to check the metadata anyway, the PR is here: https://github.com/SPAAM-community/AncientMetagenomeDir/pull/431, you can see under 'files changed')
OMG... ley me check... For sure thete was not nextera. Must be a problem when I upload tge data
No worries! The data on the ENA looks correct
it's just in the methods section
(and figu2).
If the metadata on the SRA is correct its fine, that's where people who need to know would get their data from 😉
Ok let me check very well. I am teaching now, but when I finish I will check. Thank you
Yes, I read that Vernot in his paper adapted Kallisto to seda DNA and works very well. :)
@Barbara @James Fellows Yates Am reviewing Moguel2021, but need some clarifications: • Samples S1 and S2 look to be modern top sediments, and so should not be included in the 'dir? • Samples 501 and 502 are <6000 years in the pub. but 3000 yrs in the 'dir. Where did the 3 ka ages come from? I note that these samples are from different depths, so may not be the same age? • There is a discrepancy on ENA: SRS4426892 and SRS4426891 are either sample 501 or S2, depending on whether one looks at the experiment or sample alias. Which is which?
*Thread Reply:* I thought we said for sequence cores we kept also modern?
I took the dates ftom the mid point of one of the schematic diagrams
*Thread Reply:* For the last one of if mismatch is ENA then that's up to Barbara
*Thread Reply:* Thanks for the quick clarification, James!
*Thread Reply:* Please check I got the dates right though...
*Thread Reply:* Or you make the same rule-of-thumb interpretation
hello, yes you right the S1 and S2 ares superficial samples
and the 501 and 502 there is a problem because sediments are mixed because of that we decide to just put more than 5,000 years! It is so hard to obtain a good dating
ok, thanks Barbara! Then 3,000 yr as a midpoint for those two samples is reasonable.
Please tell me if I can collaborate in some point!! I am bit lost now but I would like to help 😊
Oh absolutely! If you see #ancientmetagenomedir you can see the most recent instructions :) we have some other papers to be added, and if you know if any other ancient shotgun metagenomics papers we are missing please add it to github as an issue!
And we have a metadata-thon doodle for a one day event to get all the library level metadata
@Barbara, re. pull request for Moguel2021: Did you get a chance to look at the discrepancy on ENA? SRS4426892 and SRS4426891 are either sample 501 or S2, depending on whether one looks at the experiment or sample alias. Which one is which?
Yes, I am see what do you mean, the sample name is wrong in the ENA: the corresponding sample name for SRS4426892 is 50_1 and for SRS4426891 is S2.
do you know how I can to correct the data en ENA? I am trying to find how to do that
I think you have to contact the helpdesk
In case anyone was thinking of adding the new Wang et al. 2021 dataset — the uploaded data are the aligned reads only and not raw data. So not one for the AncientMetagenomeDir. https://www.nature.com/articles/s41586-021-04016-x
@Lennart Schreiber has joined the channel
@James Fellows Yates in the readme file for the env dir you mention that library_concentration is only applicable for the host-associated metagenome list only, but the column is still in the env lib. I have a publication that reports the concentrations shall i include them or do you want to remove the column entirely from the env list?
Would you ever use that in your work?
@Pete Heintzman @Barbara? Do you guys use library concentration information for aynthing?
We use it in microbiome stuff for tools like decontam
which allow you to remove likely lab-contaminants in taxonomic profiles
i dont think so, and the concentrations are usually in any case not reported
Ok then yes. If no-one else has added the information to the table already at least.
We would need to remove it from the table itself, and also the json schema
We are going to release soon ancient microbial environmental studies where we use this info
OK then @Anan Ibrahim there is your answer ☝️
*Thread Reply:* Then we should note it down for the MIxS checklist
*Thread Reply:* Please post it in <#C01BX7EM4EL|metadata-standards> so we don't forget!
I generally would think it's useful info to have
But leave it up to each field to decide
hello!! I didn´t use the library concentration!! just the sample DNA concentration.
New addition for the ’dir, although the raw data do not seem to be available yet (may change in the coming days): https://www.ncbi.nlm.nih.gov/bioproject/799375 https://www.sciencedirect.com/science/article/pii/S0277379122000191
Lets keep an eye on it and make an issue whe nthe data is available
I'm still getting through the library backlog
Now added the above publ. as a ’dir issue. Any takers? https://github.com/SPAAM-community/AncientMetagenomeDir/issues/838
I'm hoping that we get through the remaining library metadata stuff we can add all the new sample stuff as well 😉
@Nikolay Oskolkov has joined the channel
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9132974/
Looks legit — one to include, I reckon.
@Biancamaria Bonucci has joined the channel
New addition for the ’dir (Github issue already added): Courtin et al. Pleistocene glacial and interglacial ecosystems inferred from ancient DNA analyses of permafrost sediments from Batagay megaslump, East Siberia. https://onlinelibrary.wiley.com/doi/full/10.1002/edn3.336
@Kadir Toykan Özdoğan has joined the channel
@Kadir Toykan Özdoğan thanks again for all the PRs, unfortunately I have a deadline today and am holiday next week, but I promise first thing when I get back is to review your PRs!
*Thread Reply:* No worries!
@channel can anyone explain this paper to me:
*Thread Reply:* I quickly had a look at it. I think what they are saying is 'the off-target' DNA from the captured libraries are similar to shotgun ones in terms of microbes.
*Thread Reply:* but need to read it fully, I might have understood wrong
*Thread Reply:* Where do they mention the off-target?
*Thread Reply:* Abstract: 'These data are consistent among study sites and between replicates processed with different methodologies (shotgun sequencing and targeted capture), which highlights that the “off-target” fraction of metagenomic data used to study macro-ecosystems can also be used to investigate synchronous changes in microbial communities.'
4.1 Structural and functional shifts in microbial communities '...three different sequencing targets—shotgun sequencing, PalaeoChip Arctic v1.0, and a Bovidae specific bait-set (Murchie, Monteath, et al. (2021); Murchie, Kuch, et al. (2021); Murchie et al. (2022))—all biological replicates cluster together (Figure 2; Figures S2 and S8).'
*Thread Reply:* Jesus fucking Christ the methods are so bad then?!?
*Thread Reply:* Well... @Kadir Toykan Özdoğan willing to make an issue on the Dir 😅
*Thread Reply:* haha, sure 😃
They analyse microbes but from mtDNA/euk marker enriched libraries 😅
@Barsha Tripathi has joined the channel
@Nikolaos Psonis (Nikos) has joined the channel