Publications

R.C. Edgar (2018), Taxonomy annotation and guide tree errors in 16S rRNA databases, PeerJ 6:e5030
  • Approx. one in five SILVA and Greengenes taxonomy annotations are wrong
  • SILVA and Greengenes trees have pervasive conflicts with type strain taxonomies

R.C. Edgar (2018), Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences, PeerJ 6:e4652
  • Cross-validation by identity, novel benchmark strategy enabling realistic accuracy estimates
  • Genus accuracy of best methods is 50% on V4 sequences
  • Recent algorithms do not improve on RDP Classifier or SINTAX

R.C. Edgar and H. Flyvbjerg (2018), Octave plots for visualizing diversity of microbial OTUs, https://doi.org/10.1101/389833
  • Octave plots visualize alpha diversity as a histogram
  • Plots show shape and completeness of distribution

R.C. Edgar (2018), UNCROSS2: identification of cross-talk in 16S rRNA OTU tables, https://doi.org/10.1101/400762
  • Cross-talk rate is approx. 1% in many Illumina datasets
  • Cross-talk can cause false positive core microbiome
  • UNCROSS2 algorithm for filtering cross-talk

R.C. Edgar (2017), Accuracy of microbial community diversity estimated by closed- and open-reference OTUs, PeerJ 5:e3889
  • QIIME closed- and open-reference clustering generates huge numbers of spurious OTUs
  • Closed-reference OTU assignment splits strains and species even when there are no sequence errors
  • Closed-reference fails to assign different hyper-variable regions to the same OTU
  • Closed-reference discards many well-known species that are present in Greengenes

R.C. Edgar (2017), SEARCH_16S: A new algorithm for identifying 16S ribosomal RNA genes in contigs and chromosomes, https://doi.org/10.1101/124131

R.C. Edgar (2017), SINAPS: Prediction of microbial traits from marker gene sequences, https://doi.org/10.1101/124156

R.C. Edgar (2017), UNBIAS: An attempt to correct abundance bias in 16S sequencing, with limited success, https://doi.org/10.1101/124149
  • Read abundance has very low correlation with species abundance
  • Bias caused by gene copy count variation and primer mismatches
  • Gene copy count and primer mismatches cannot be accurately predicted
  • Impossible to correct abundance bias

R.C. Edgar (2018), Updating the 97% identity threshold for 16S ribosomal RNA OTUs, Bioinformatics 34(14) 2371-2375
  • Standard 97% OTU identity threshold is too low
  • Optimal OTU threshold is 99% for full-length 16S, 100% for V4

R.C. Edgar (2016), UNCROSS: Filtering of high-frequency cross-talk in 16S amplicon reads, https://doi.org/10.1101/088666
  • Cross-talk is common: many reads are assigned to the wrong sample
  • UNCROSS algorithm for filtering cross-talk

R.C. Edgar (2016), UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing, https://doi.org/10.1101/081257
  • UNOISE2 algorithm, improved denoiser
  • Reduces false-positive chimeras compared to UNOISE and DADA2

R.C. Edgar (2016), UCHIME2: improved chimera prediction for amplicon sequencing, https://doi.org/10.1101/074252
  • UCHIME2 algorithm, improved chimera detection
  • "Fake" chimeras are common, valid biological sequences matching two-parent model
  • Perfect chimera filtering impossible even with complete and correct reference
  • Realistic chimera benchmark

R.C. Edgar (2016), SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences, https://doi.org/10.1101/074161
  • SINTAX taxonomy prediction algorithm
  • Fast and simple method, accuracy comparable to RDP Classifier

R.C. Edgar and H. Flyvbjerg (2015), Error filtering, pair assembly and error correction for next-generation sequencing reads, Bioinformatics 31(21) 3476-3482
  • Quality filtering by expected errors
  • Bayesian paired read assembler
  • Most paired read assemblers calculate incorrect Q scores
  • UNOISE algorithm, first denoiser for Illumina reads

R.C. Edgar et al. (2011), UCHIME improves sensitivity and speed of chimera detection, Bioinformatics 27(16) 2194-2200
  • Shows UCHIME faster and more accurate than ChimeraSlayer
  • This paper reports misleading benchmark tests; see the critique in the UCHIME2 paper

R.C. Edgar (2013), UPARSE: highly accurate OTU sequences from microbial amplicon reads, Nat. Meth. 10, 996-998
  • Describes UPARSE algorithm for 97% OTU clustering
  • Stringent error filtering and discarding singletons necessary
  • Highly accurate OTUs from paired reads without full overlap

R.C. Edgar (2010), Search and clustering orders of magnitude faster than BLAST, Bioinformatics 26(19) 2460-2461
  • USEARCH algorithm
  • Default citation for USEARCH software