Taxonomic annotation of observed sequences
Contents
Taxonomic annotation of observed sequences¶
In this chapter we’ll perform annotation of the features that were observed in this study by performing taxonomic classification of the sequences. This allows us to assess what organisms are represented by the ASV sequences that we observed in this study.
Taxonomy assignment¶
We’ll start with taxonomic classification. In this step we’ll use a pre-trained Naive Bayes taxonomic classifier. This particular classifier was trained on the Greengenes 13-8 database, where sequences were trimmed to represent only the region between the 515F / 806R primers.
You can download pre-trained classifiers from the QIIME 2 documentation Data Resources page.
First, obtain the taxonomic classifier.
url = 'https://qiime2-workshops.s3.us-west-2.amazonaws.com/faes-jan2022/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
fn = 'gg-13-8-99-515-806-nb-classifier.qza'
request.urlretrieve(url, fn)
gg_13_8_99_515_806_nb_classifier = Artifact.load(fn)
wget \
  -O 'gg-13-8-99-515-806-nb-classifier.qza' \
  'https://qiime2-workshops.s3.us-west-2.amazonaws.com/faes-jan2022/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza'
def classifier_factory():
    from urllib import request
    from qiime2 import Artifact
    fp, _ = request.urlretrieve(
        'https://data.qiime2.org/2021.11/common/gg-13-8-99-515-806-nb-classifier.qza',
    )
    return Artifact.load(fp)
classifier = use.init_artifact('gg-13-8-99-515-806-nb-classifier', classifier_factory)
- Using the 
Upload Datatool: On the first tab (Regular), press the
Paste/Fetchdata button at the bottom.Set “Name” (first text-field) to:
gg-13-8-99-515-806-nb-classifier.qzaIn the larger text-area, copy-and-paste: https://qiime2-workshops.s3.us-west-2.amazonaws.com/faes-jan2022/data/030-tutorial-downstream/020-taxonomy/gg-13-8-99-515-806-nb-classifier.qza
(“Type”, “Genome”, and “Settings” can be ignored)
Press the
Startbutton at the bottom.
Next, use that classifier to assign taxonomic information to the ASV sequences.
import qiime2.plugins.feature_classifier.actions as feature_classifier_actions
taxonomy, = feature_classifier_actions.classify_sklearn(
    classifier=gg_13_8_99_515_806_nb_classifier,
    reads=filtered_sequences_1,
)
qiime feature-classifier classify-sklearn \
  --i-classifier gg-13-8-99-515-806-nb-classifier.qza \
  --i-reads filtered-sequences-1.qza \
  --o-classification taxonomy.qza
taxonomy, = use.action(
    use.UsageAction(plugin_id='feature_classifier', action_id='classify_sklearn'),
    use.UsageInputs(classifier=classifier, reads=filtered_sequences_1),
    use.UsageOutputNames(classification='taxonomy'),
)
- Using the 
qiime2 feature-classifier classify-sklearntool: Set “reads” to
#: filtered-sequences-1.qzaSet “classifier” to
#: gg-13-8-99-515-806-nb-classifier.qzaPress the
Executebutton.
- Once completed, for the new entry in your history, use the 
Editbutton to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save)#: qiime2 feature-classifier classify-sklearn [...] : classification.qzataxonomy.qza
Finally, generate a human-readable summary of the taxonomic annotations.
taxonomy_as_md_md = taxonomy.view(Metadata)
taxonomy_viz, = metadata_actions.tabulate(
    input=taxonomy_as_md_md,
)
qiime metadata tabulate \
  --m-input-file taxonomy.qza \
  --o-visualization taxonomy.qzv
taxonomy_as_md = use.view_as_metadata('taxonomy_as_md', taxonomy)
use.action(
    use.UsageAction(plugin_id='metadata', action_id='tabulate'),
    use.UsageInputs(input=taxonomy_as_md),
    use.UsageOutputNames(visualization='taxonomy'),
)
- Using the 
qiime2 metadata tabulatetool: For “input”:
Perform the following steps.
Change to
Metadata from ArtifactSet “Metadata Source” to
taxonomy.qza
Press the
Executebutton.
- Once completed, for the new entry in your history, use the 
Editbutton to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save)#: qiime2 metadata tabulate [...] : visualization.qzvtaxonomy.qzv
Filtering filters based on their taxonomy¶
Taxonomic annotations provide useful information that can also be used in quality filtering of our data. A common step in 16S analysis is to remove sequences from an analysis that aren’t assigned to a phylum. In a human microbiome study such as this, these may for example represent reads of human genome sequence that were unintentionally sequences.
This filtering can be applied as follows by providing the feature table and the
taxonomic annotations that were just created. The include parameter here
specifies that an annotation must contain the text p__, which in the
Greengenes taxonomy is the prefix for all phylum-level taxonomy assignments.
Taxonomic labels that don’t contain p__ therefore were maximally assigned to
the domain (i.e., kingdom) level. This will also remove features that are
annotated with p__; (which means that no named phylum was assigned to the
feature), as well as annotations containing Chloroplast or Mitochondria
(i.e., organelle 16S sequences).
import qiime2.plugins.taxa.actions as taxa_actions
filtered_table_3, = taxa_actions.filter_table(
    table=filtered_table_2,
    taxonomy=taxonomy,
    mode='contains',
    include='p__',
    exclude='p__;,Chloroplast,Mitochondria',
)
qiime taxa filter-table \
  --i-table filtered-table-2.qza \
  --i-taxonomy taxonomy.qza \
  --p-mode contains \
  --p-include p__ \
  --p-exclude 'p__;,Chloroplast,Mitochondria' \
  --o-filtered-table filtered-table-3.qza
filtered_table_3, = use.action(
    use.UsageAction(plugin_id='taxa', action_id='filter_table'),
    use.UsageInputs(table=filtered_table_2, taxonomy=taxonomy, mode='contains',
                    include='p__', exclude='p__;,Chloroplast,Mitochondria'),
    use.UsageOutputNames(filtered_table='filtered_table_3')
)
- Using the 
qiime2 taxa filter-tabletool: Set “table” to
#: filtered-table-2.qzaSet “taxonomy” to
#: taxonomy.qzaExpand the
additional optionssectionSet “include” to
p__Set “exclude” to
p__;,Chloroplast,MitochondriaLeave “mode” as its default value of
contains
Press the
Executebutton.
- Once completed, for the new entry in your history, use the 
Editbutton to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save)#: qiime2 taxa filter-table [...] : filtered_table.qzafiltered-table-3.qza
Filtering samples with low sequence counts¶
You may have noticed when looking at feature table summaries earlier that some of the samples contained very few ASV sequences. These often represent samples which didn’t amplify or sequence well, and when we start visualizing our data low numbers of sequences can cause misleading results, because the observed composition of the sample may not be reflective of the sample’s actual composition. For this reason it can be helpful to exclude samples with low ASV sequence counts from our samples. Here, we’ll filter out samples from which we have obtained fewer than 10,000 sequences.
filtered_table_4, = feature_table_actions.filter_samples(
    table=filtered_table_3,
    min_frequency=10000,
)
qiime feature-table filter-samples \
  --i-table filtered-table-3.qza \
  --p-min-frequency 10000 \
  --o-filtered-table filtered-table-4.qza
filtered_table_4, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_samples'),
    use.UsageInputs(table=filtered_table_3, min_frequency=10000),
    use.UsageOutputNames(filtered_table='filtered_table_4')
    )
- Using the 
qiime2 feature-table filter-samplestool: Set “table” to
#: filtered-table-3.qzaExpand the
additional optionssectionSet “min_frequency” to
10000
Press the
Executebutton.
- Once completed, for the new entry in your history, use the 
Editbutton to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save)#: qiime2 feature-table filter-samples [...] : filtered_table.qzafiltered-table-4.qza
After filtering ASVs that were not assigned a phylum, and filtering samples with low ASV sequence counts, we can remove ASV sequences that are no longer represented in our table from the collection of ASV sequences by filtering to only the features that are contained in the feature table. This step isn’t required, but can help to speed up some downstream steps.
filtered_sequences_2, = feature_table_actions.filter_seqs(
    data=rep_seqs,
    table=filtered_table_4,
)
qiime feature-table filter-seqs \
  --i-data rep-seqs.qza \
  --i-table filtered-table-4.qza \
  --o-filtered-data filtered-sequences-2.qza
filtered_sequences_2, = use.action(
    use.UsageAction(plugin_id='feature_table', action_id='filter_seqs'),
    use.UsageInputs(data=feature_sequences, table=filtered_table_4),
    use.UsageOutputNames(filtered_data='filtered_sequences_2')
    )
- Using the 
qiime2 feature-table filter-seqstool: Set “data” to
#: rep-seqs.qzaExpand the
additional optionssectionSet “table” to
#: filtered-table-4.qza
Press the
Executebutton.
- Once completed, for the new entry in your history, use the 
Editbutton to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save)#: qiime2 feature-table filter-seqs [...] : filtered_data.qzafiltered-sequences-2.qza
Exercise 4
Try summarizing the feature table that was created by this round of filtering. Expand this box if you need help.
Solution to Exercise 4
filtered_table_4_summ_exercise_viz, = feature_table_actions.summarize(
    table=filtered_table_4,
    sample_metadata=sample_metadata_md,
)
qiime feature-table summarize \
  --i-table filtered-table-4.qza \
  --m-sample-metadata-file sample-metadata.tsv \
  --o-visualization filtered-table-4-summ-exercise.qzv
use.action(
    use.UsageAction(plugin_id='feature_table', action_id='summarize'),
    use.UsageInputs(table=filtered_table_4, sample_metadata=sample_metadata),
    use.UsageOutputNames(visualization='filtered_table_4_summ_exercise'),
)
- Using the 
qiime2 feature-table summarizetool: Set “table” to
#: filtered-table-4.qzaExpand the
additional optionssectionFor “sample_metadata”:
Press the
+ Insert sample_metadatabutton to set up the next steps.Leave as
Metadata from TSVSet “Metadata Source” to
sample-metadata.tsv
Press the
Executebutton.
- Once completed, for the new entry in your history, use the 
Editbutton to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save)#: qiime2 feature-table summarize [...] : visualization.qzvfiltered-table-4-summ-exercise.qzv
Generate taxonomic composition barplots¶
We’ll now get one of our first views of our microbiome sample compositions using a taxonomic barplot. This can be generated with the following command.
taxa_bar_plots_1_viz, = taxa_actions.barplot(
    table=filtered_table_4,
    taxonomy=taxonomy,
    metadata=sample_metadata_md,
)
qiime taxa barplot \
  --i-table filtered-table-4.qza \
  --i-taxonomy taxonomy.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-visualization taxa-bar-plots-1.qzv
use.action(
    use.UsageAction(plugin_id='taxa', action_id='barplot'),
    use.UsageInputs(table=filtered_table_4, taxonomy=taxonomy,
                    metadata=sample_metadata),
    use.UsageOutputNames(visualization='taxa_bar_plots_1'),
)
- Using the 
qiime2 taxa barplottool: Set “table” to
#: filtered-table-4.qzaSet “taxonomy” to
#: taxonomy.qzaExpand the
additional optionssectionFor “metadata”:
Press the
+ Insert metadatabutton to set up the next steps.Leave as
Metadata from TSVSet “Metadata Source” to
sample-metadata.tsv
Press the
Executebutton.
- Once completed, for the new entry in your history, use the 
Editbutton to set the name as follows: (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name
“Name” to set (be sure to press
Save)#: qiime2 taxa barplot [...] : visualization.qzvtaxa-bar-plots-1.qzv