closed_ref command

See also
  Closed-reference OTU algorithm
  Download QIIME-compatible Greengenes 97% OTU database
  Problems with closed- and open-reference OTU assignment

The closed_ref command performs closed-reference OTU assignment using a strategy similar to that of pick_closed_reference_otus.py in QIIME.

You can download the default database from the QIIME-compatible Greengenes 97% OTU database page listed under See also.

I am providing this command because a few users have asked for it, but I do not recommend closed- or open-reference OTU assignment: my tests show that both methods have fundamental flaws.

The main use of the closed_ref command is to generate OTUs for analyses that require closed-reference OTUs, in particular PICRUSt. Given the problems with closed-reference assignment, and the difficulty of predicting traits by comparing short 16S reads to sparse reference databases (see the discussion on the SINAPS algorithm page), I am skeptical that the PICRUSt approach can make reliable predictions.

The closed_ref command can be used as an alternative to the QIIME pick_closed_reference_otus.py script, but the results will not be exactly the same. QIIME (at least as of v1.9) searches the database with the old uclust program (the predecessor of usearch), using default parameters that were designed primarily to maximize speed. The closed_ref command in usearch instead uses an improved implementation of the USEARCH algorithm, with settings designed to increase sensitivity and to report ties, i.e. cases where two or more reference sequences have the same identity to the query.
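To make the assignment rule concrete, here is a toy Python sketch of how a single read could be assigned under closed-reference OTU assignment: keep the reference hits at or above the identity threshold, pick the best one, and report any others with the same identity as ties. It assumes the read's identities to the reference sequences have already been computed by a database search and are passed in as a dict. The function name assign_closed_ref is hypothetical, and choosing the alphabetically first label among tied references is an assumption, not necessarily how usearch breaks ties. This is an illustration, not the usearch implementation.

```python
from typing import Dict, List, Optional, Tuple


def assign_closed_ref(
    hits: Dict[str, float], min_id: float = 0.97
) -> Tuple[Optional[str], float, List[str]]:
    """Toy closed-reference assignment for one read (hypothetical helper).

    hits maps each reference OTU label to the read's identity with it.
    Returns (assigned OTU or None, best identity, other OTUs tied at that identity).
    """
    # Keep only references at or above the identity threshold (cf. -id).
    passing = {otu: ident for otu, ident in hits.items() if ident >= min_id}
    if not passing:
        # No acceptable hit: the read stays unassigned; no new OTU is created.
        return None, 0.0, []
    best = max(passing.values())
    # Every reference with the top identity is a tie; sorting makes the
    # choice deterministic (an assumption, not usearch's tie-breaking rule).
    tied = sorted(otu for otu, ident in passing.items() if ident == best)
    return tied[0], best, tied[1:]


# Two references share the best identity, so one is reported as a tie.
print(assign_closed_ref({"OTU_1": 0.98, "OTU_2": 0.98, "OTU_3": 0.95}))
# ('OTU_1', 0.98, ['OTU_2'])
```

Reads with no reference at or above the threshold return None here; in closed-reference assignment such reads are left unassigned rather than forming new OTUs.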

The -strand option is required. It can be set to -strand plus (search the plus strand only) or -strand both (search both strands). If you know the reads are on the same strand as the database, use -strand plus, which makes the search a bit faster.
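Searching both strands amounts to comparing the reverse complement of each read as well as the read itself. The minimal sketch below shows which orientations would be compared for each -strand setting; the helper names are hypothetical, and only the standard nucleotide letters are handled (IUPAC ambiguity codes are ignored). In this toy model -strand both doubles the number of comparisons, which is why -strand plus is faster when the read orientation is already known.

```python
from typing import List

# Complement of each nucleotide letter; IUPAC ambiguity codes are not handled.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the order."""
    return seq.translate(_COMPLEMENT)[::-1]


def orientations(read: str, strand: str) -> List[str]:
    """Query orientations compared to the database for a -strand setting."""
    if strand == "plus":
        return [read]
    if strand == "both":
        return [read, reverse_complement(read)]
    raise ValueError("strand must be 'plus' or 'both'")


print(orientations("AACG", "both"))  # ['AACG', 'CGTT']
```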

The minimum sequence identity is set by the -id option (default 0.97). The value is a fraction between 0.0 and 1.0, so 0.97 corresponds to 97% identity.
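A minimal sketch of what the -id threshold means, assuming identity is the number of identical columns divided by the total number of columns in a pairwise alignment, with gap columns counted as mismatches. usearch's own identity definition may treat gaps, especially terminal gaps, differently, so this is an illustration only. The helper names are hypothetical.

```python
def percent_to_id(percent: float) -> float:
    """Convert a percentage (e.g. 97) to the fraction used by -id (0.97)."""
    return percent / 100.0


def alignment_identity(aligned_query: str, aligned_target: str) -> float:
    """Identical columns / total columns of a pairwise alignment.

    Assumes two pre-aligned strings of equal length with '-' for gaps;
    gap columns count as mismatches (an assumption, see lead-in).
    """
    if len(aligned_query) != len(aligned_target) or not aligned_query:
        raise ValueError("expected two non-empty aligned strings of equal length")
    matches = sum(
        1 for a, b in zip(aligned_query, aligned_target) if a == b and a != "-"
    )
    return matches / len(aligned_query)


query = "A" * 100
target = "A" * 97 + "C" * 3  # 97 of 100 columns identical
ident = alignment_identity(query, target)
print(ident, ident >= percent_to_id(97))  # 0.97 True
```

Under this definition, a 100-column alignment with 97 identical columns has identity 0.97 and just passes the default -id 0.97.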

Standard OTU table output files are supported, for example -otutabout.
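An OTU table has one row per OTU and one column per sample, with each cell giving the number of reads from that sample assigned to that OTU. The sketch below builds such a table from (read label, OTU label) pairs. How the sample name is taken from the read label is an assumption here: the hypothetical helper sample_from_label looks for a sample=NAME; annotation and otherwise uses the text before the first period. The header line is loosely modeled on the classic QIIME text layout and may not match what -otutabout writes exactly.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sample_from_label(label: str) -> str:
    """Hypothetical helper: sample name from a 'sample=NAME;' annotation,
    else the text before the first period (e.g. 'S1.42' -> 'S1')."""
    for field in label.split(";"):
        if field.startswith("sample="):
            return field[len("sample="):]
    return label.split(".", 1)[0]


def otu_table(assignments: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Count reads per (OTU, sample) from (read label, OTU label) pairs."""
    table: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for read_label, otu in assignments:
        table[otu][sample_from_label(read_label)] += 1
    return {otu: dict(counts) for otu, counts in table.items()}


def to_tsv(table: Dict[str, Dict[str, int]]) -> str:
    """Render the table with OTUs as rows and samples as columns (tab-separated)."""
    samples = sorted({s for counts in table.values() for s in counts})
    lines = ["#OTU ID\t" + "\t".join(samples)]
    for otu in sorted(table):
        lines.append(otu + "\t" + "\t".join(str(table[otu].get(s, 0)) for s in samples))
    return "\n".join(lines)


pairs = [("S1.1", "OTU_1"), ("S1.2", "OTU_1"), ("S2.1", "OTU_2"), ("r9;sample=S2;", "OTU_1")]
print(to_tsv(otu_table(pairs)))
```

Samples in which an OTU has no reads are written as 0, so every row has the same number of columns.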

The -tabbedout file reports one line per query sequence. The fields are the query label, the OTU label, the identity, and a list of ties if more than one OTU has the same identity.
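For downstream scripting, a parser for -tabbedout lines might look like the sketch below. Several layout details are assumptions not stated above: fields are separated by tabs in the order query label, OTU label, identity, ties; the ties field is missing or empty when there is no tie; and tied OTU labels are separated by commas (configurable through the hypothetical tie_sep parameter). The identity is kept as whatever number appears in the file, whether that is a fraction or a percentage, and a non-numeric value is stored as None. The example lines are made up for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class ClosedRefHit(NamedTuple):
    query: str
    otu: str
    identity: Optional[float]
    ties: List[str]


def parse_tabbed_line(line: str, tie_sep: str = ",") -> ClosedRefHit:
    """Parse one -tabbedout line under the layout assumptions in the lead-in."""
    fields = line.rstrip("\r\n").split("\t")
    query = fields[0]
    otu = fields[1] if len(fields) > 1 else ""
    identity: Optional[float] = None
    if len(fields) > 2:
        try:
            identity = float(fields[2])
        except ValueError:
            identity = None  # non-numeric placeholder, e.g. for an unassigned query
    ties = [t for t in fields[3].split(tie_sep) if t] if len(fields) > 3 else []
    return ClosedRefHit(query, otu, identity, ties)


def summarize(lines: Iterable[str]) -> dict:
    """Count parsed queries and how many of them report ties."""
    hits = [parse_tabbed_line(line) for line in lines if line.strip()]
    return {"queries": len(hits), "with_ties": sum(1 for h in hits if h.ties)}


example = ["read1\tOTU_5\t98.5\tOTU_7,OTU_9\n", "read2\tOTU_1\t100.0\n"]
print(summarize(example))  # {'queries': 2, 'with_ties': 1}
```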


usearch -closed_ref reads.fastq -db gg97.fa -otutabout otutab.txt -strand plus -tabbedout closed.txt