USEARCH v11

Publications

R.C. Edgar (2018), Taxonomy annotation and guide tree errors in 16S rRNA databases, PeerJ 6:e5030
  • Approx. one in five SILVA and Greengenes taxonomy annotations are wrong

  • SILVA and Greengenes trees have pervasive conflicts with type strain taxonomies


R.C. Edgar (2018), Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences, PeerJ 6:e4652
  • Cross-validation by identity, novel benchmark strategy enabling realistic accuracy estimates

  • Genus accuracy of best methods is 50% on V4 sequences

  • Recent algorithms do not improve on RDP Classifier or SINTAX


R.C. Edgar and H. Flyvbjerg (2018), Octave plots for visualizing diversity of microbial OTUs, https://doi.org/10.1101/389833
  • Octave plots visualize alpha diversity as a histogram

  • Plots show shape and completeness of distribution
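As a rough illustration of the histogram behind an octave plot, the sketch below bins OTU abundances so that each bin covers twice the range of the one before it (1, 2-3, 4-7, 8-15, ...). The doubling bin boundaries and the function name `octave_counts` are assumptions made for this example, not code from the paper or from USEARCH.

```python
from collections import Counter


def octave_counts(abundances):
    """Count OTUs per octave, where octave k is assumed to hold
    abundances from 2**k to 2**(k+1) - 1 (illustrative definition)."""
    # For a positive integer a, a.bit_length() - 1 equals floor(log2(a)).
    counts = Counter(a.bit_length() - 1 for a in abundances if a > 0)
    if not counts:
        return []
    # Return a dense list so empty octaves appear as zeros in the histogram.
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


if __name__ == "__main__":
    # Hypothetical OTU abundances for one sample.
    print(octave_counts([1, 1, 2, 3, 4, 7, 8, 100]))
```

Each element of the returned list is the bar height for one octave, starting with singletons.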


R.C. Edgar (2018), UNCROSS2: identification of cross-talk in 16S rRNA OTU tables, https://doi.org/10.1101/400762
  • Cross-talk rate is approx. 1% in many Illumina datasets

  • Cross-talk can cause false positive core microbiome

  • UNCROSS2 algorithm for filtering cross-talk


R.C. Edgar (2017), Accuracy of microbial community diversity estimated by closed- and open-reference OTUs, PeerJ 5:e3889
  • QIIME closed- and open-reference clustering generates huge numbers of spurious OTUs

  • Closed-reference OTU assignment splits strains and species even when there are no sequence errors

  • Closed-reference fails to assign different hyper-variable regions to the same OTU

  • Closed-reference discards many well-known species that are present in Greengenes


R.C. Edgar (2017), SEARCH_16S: A new algorithm for identifying 16S ribosomal RNA genes in contigs and chromosomes, https://doi.org/10.1101/124131

R.C. Edgar (2017), SINAPS: Prediction of microbial traits from marker gene sequences, https://doi.org/10.1101/124156

R.C. Edgar (2017), UNBIAS: An attempt to correct abundance bias in 16S sequencing, with limited success, https://doi.org/10.1101/124149
  • Read abundance has very low correlation with species abundance

  • Bias caused by gene copy count variation and primer mismatches

  • Gene copy count and primer mismatches cannot be accurately predicted

  • Impossible to correct abundance bias


R.C. Edgar (2017), Updating the 97% identity threshold for 16S ribosomal RNA OTUs, Bioinformatics 34(14) 2371-2375
  • Standard 97% OTU identity threshold is too low

  • Optimal OTU threshold is 99% for full-length 16S, 100% for V4


R.C. Edgar (2016), UNCROSS: Filtering of high-frequency cross-talk in 16S amplicon reads, https://doi.org/10.1101/088666
  • Cross-talk is common: many reads are assigned to the wrong sample

  • UNCROSS algorithm for filtering cross-talk


R.C. Edgar (2016), UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing, https://doi.org/10.1101/081257
  • UNOISE2 algorithm, improved denoiser

  • Reduces false-positive chimeras compared to UNOISE and DADA2


R.C. Edgar (2016), UCHIME2: improved chimera prediction for amplicon sequencing, https://doi.org/10.1101/074252
  • UCHIME2 algorithm, improved chimera detection

  • "Fake" chimeras are common: valid biological sequences that match the two-parent model

  • Perfect chimera filtering impossible even with complete and correct reference

  • Realistic chimera benchmark


R.C. Edgar (2016), SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences, https://doi.org/10.1101/074161
  • SINTAX taxonomy prediction algorithm

  • Fast and simple method, accuracy comparable to RDP Classifier


R.C. Edgar and H. Flyvbjerg (2015), Error filtering, pair assembly and error correction for next-generation sequencing reads, Bioinformatics 31(21) 3476-3482
  • Quality filtering by expected errors

  • Bayesian paired read assembler

  • Most paired read assemblers calculate incorrect Q scores

  • UNOISE algorithm, first denoiser for Illumina reads
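A minimal sketch of quality filtering by expected errors: each Phred score Q implies an error probability of 10^(-Q/10), and the expected number of errors in a read is the sum of these probabilities over its bases. The function names, the Phred+33 encoding, and the threshold of 1.0 are assumptions chosen for this example, not USEARCH options.

```python
def expected_errors(quality, offset=33):
    """Expected number of errors in a read, given its FASTQ quality string.

    Assumes Phred+33 ASCII encoding by default.
    """
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(quality, max_ee=1.0):
    """Keep a read only if its expected error count is at most max_ee.

    max_ee=1.0 is an illustrative threshold, not a recommendation.
    """
    return expected_errors(quality) <= max_ee


if __name__ == "__main__":
    # Hypothetical quality strings: 'I' encodes Q40, '+' encodes Q10.
    for qual in ("IIIIIIII", "++++++++++++"):
        print(qual, round(expected_errors(qual), 4), passes_filter(qual))
```

Unlike filtering on mean Q score, the sum is dominated by the lowest-quality bases, so a few poor positions can fail a read that otherwise looks good.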


R.C. Edgar et al. (2011), UCHIME improves sensitivity and speed of chimera detection, Bioinformatics 27(16) 2194-2200
  • Shows UCHIME faster and more accurate than ChimeraSlayer

  • This paper reports misleading benchmark tests, see critique in UCHIME2 paper


R.C. Edgar (2013), UPARSE: highly accurate OTU sequences from microbial amplicon reads, Nat. Meth. 10, 996-998
  • Describes UPARSE algorithm for 97% OTU clustering

  • Stringent error filtering and discarding singletons necessary

  • Highly accurate OTUs from paired reads without full overlap


R.C. Edgar (2010), Search and clustering orders of magnitude faster than BLAST, Bioinformatics 26(19) 2460-2461
  • USEARCH algorithm

  • Default citation for USEARCH software