SINTAX algorithm

The SINTAX algorithm predicts the taxonomy of marker gene reads such as 16S or ITS. It is implemented in the sintax command.

Bootstrap confidence values are provided for all predicted ranks.

The algorithm is similar to the RDP Naive Bayesian Classifier, except that the top taxonomy is identified by k-mer similarity rather than by Bayesian posteriors, so no training step is needed.
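The idea of k-mer similarity with bootstrap confidence can be illustrated with a minimal Python sketch. This is not the code behind the sintax command. The word length (k=8), the subsample size (32 words), and the number of iterations (100) are assumptions chosen for illustration, as are the function names `kmers` and `sintax_sketch` and the toy reference data. Voting on lineage prefixes is a simplification that keeps the predicted ranks consistent with each other.

```python
import random
from collections import Counter


def kmers(seq, k=8):
    """Return the set of distinct k-mers (words) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sintax_sketch(query, references, k=8, subset_size=32, iterations=100, seed=0):
    """Predict a lineage for query with a bootstrap confidence per rank.

    references is a list of (sequence, lineage) pairs, where lineage lists
    names from the highest rank down and may stop at any rank.
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    query_words = sorted(kmers(query, k))  # sorted for reproducible sampling
    if not query_words or not references:
        return []

    # No training step: each reference is just a k-mer set plus its lineage.
    ref_index = [(kmers(seq, k), tuple(lineage)) for seq, lineage in references]

    # Each bootstrap iteration samples words from the query with replacement
    # and records the lineage of the reference sharing the most sampled words.
    top_hits = []
    for _ in range(iterations):
        subset = set(rng.choices(query_words, k=subset_size))
        _, lineage = max(ref_index, key=lambda ref: len(subset & ref[0]))
        top_hits.append(lineage)

    # Walk down the ranks. At each rank, pick the most frequent name among the
    # top hits that agree with the names already chosen; its frequency over
    # all iterations is the confidence reported for that rank.
    prediction = []
    prefix = ()
    while True:
        depth = len(prefix) + 1
        votes = Counter(
            hit[:depth]
            for hit in top_hits
            if len(hit) >= depth and hit[:depth - 1] == prefix
        )
        if not votes:
            break
        prefix, count = votes.most_common(1)[0]
        prediction.append((prefix[-1], count / iterations))
    return prediction


if __name__ == "__main__":
    # Toy references with hypothetical sequences and lineages.
    refs = [
        ("AC" * 30, ["d:Bacteria", "p:Firmicutes", "g:Bacillus"]),
        ("GT" * 30, ["d:Bacteria", "p:Proteobacteria"]),
    ]
    print(sintax_sketch("AC" * 30, refs))
```

In this sketch a reference whose lineage stops at a higher rank simply casts no vote at the lower ranks, so confidence can only stay equal or decrease from one rank to the next.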

Unlike the RDP Classifier, SINTAX does not require the lowest ("training") rank to be specified for every reference sequence, which allows large databases to be used as a reference. However, I do not recommend using SILVA or Greengenes as a taxonomy reference because these databases have high error rates: roughly one in five of their taxonomy annotations is wrong.