OTU QC: sequences on both strands

If reads are created from both strands of the gene, then you will tend to get duplicated OTUs where one is the reverse-complement of the other.
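As a small illustration of what "reverse-complement" means here, the sketch below reverses a sequence and swaps each base for its complement. The function name revcomp is hypothetical and not part of usearch, and only unambiguous A, C, G, T letters are handled.

```shell
# Hypothetical helper: print the reverse-complement of a DNA sequence.
# Handles only A, C, G, T (upper or lower case); ambiguity codes pass through unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp ACCGT   # prints ACGGT
```

Two OTUs related in this way describe the same stretch of the gene, which is why they count as duplicates.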

To check for reads or OTU sequences on both strands, use the orient command with -tabbedout orient.txt. Any reference database will do for a quick check, though a large reference database is recommended for orienting the reads in a production pipeline. To get the number of sequences on each strand, use the following Linux command:

cut -f2 orient.txt | sort | uniq -c
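The same check can be scripted so that a pipeline flags mixed strands automatically. This is a minimal sketch, assuming the strand label is in the second tab-separated column of orient.txt and that the two strands are written as + and -; the function name check_strands is hypothetical.

```shell
# Hypothetical helper: count the labels in column 2 of an orient-style
# tabbed file, then print MIXED if both + and - occur, otherwise OK.
# Assumption: strands are labelled "+" and "-" in column 2.
check_strands() {
    awk -F'\t' '
        { n[$2]++ }
        END {
            for (s in n) printf "%s\t%d\n", s, n[s]
            if (n["+"] > 0 && n["-"] > 0) print "MIXED"
            else print "OK"
        }' "$1"
}

# Usage: check_strands orient.txt
```

The per-label counts are printed first in no particular order, and the verdict is always the last line of output.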

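All your OTUs should be on the same strand. If they are not, adjust the pipeline so that orientation is performed before dereplication (the fastx_uniques step). As an additional check, you can cluster the OTUs against each other on both strands and report the strand of each hit: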

usearch -cluster_fast otus.fa -id 0.97 -strand both \
-userout user.txt -userfields query+target+qstrand
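To pull out the suspect pairs from that output, something like the following could be used. It is a sketch under two assumptions: that user.txt has three tab-separated columns in the order given by -userfields (query, target, qstrand), and that a minus-strand hit is written as -. The function name minus_hits is hypothetical.

```shell
# Hypothetical helper: list query/target pairs whose hit was on the minus strand.
# Assumption: columns are query, target, qstrand, with qstrand "+" or "-".
minus_hits() {
    awk -F'\t' '$3 == "-" { print $1 "\t" $2 }' "$1"
}

# Usage: minus_hits user.txt
```

Each line printed is an OTU that matches another OTU only after reverse-complementing, which suggests the pair is a strand duplicate.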