扩增子进化树的构建以及可视化

前言

我们在获得qiime2结果feature-table.qza和rep-seqs.qza文件后,可以通过如下方法获取:

  • 树文件:rep_seq_k5_cln.tree

  • 物种进化关系文件:rep_seqs_k5.tax

生成准备文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
# install software 
conda install -n qiime -c bioconda qiime -y
source activate py27
conda install clustalo -y

# step1: count -> relative abundance
biom normalize-table -i feature-table.biom -r -o feature-table_norm.biom

# step2: filter abundance 0.5%
filter_otus_from_otu_table.py --min_count_fraction 0.005 -i feature-table_norm.biom -o feature-table_norm_k5.biom

# step3: get fasta
#qiime tools export --input-path rep-seqs.qza --output-path rep-seqs
filter_fasta.py -f rep_seq.fa -o rep_seq_k5.fa -b feature-table_norm_k5.biom

# step4: multip alignment
clustalo -i rep_seq_k5.fa -o rep_seq_k5_clus.fa --seqtype=DNA --full --force --threads=30

# step5: fastree build tree file
make_phylogeny.py -i rep_seq_k5_clus.fa -o rep_seq_k5.tree

# step6: delete ' signal for R ggtree
sed "s/\'//g" rep_seq_k5.tree > rep_seq_k5_cln.tree

# step7: get annotation information
grep ">" rep_seq_k5_clus.fa | sed 's/>//g' > rep_seq_k5_clus.id
awk 'BEGIN{OFS="\t";FS="\t"} NR==FNR {a[$1]=$0} NR>FNR {print a[$1]}' taxonomy.tsv rep_seq_k5_clus.id | sed 's/; /\t/g' | cut -f 1-5 | sed 's/;/\t/g' | cut -f 1-5 > rep_seqs_k5.tax

可视化

因为qiime2的ASV名称是复杂编码,所以需要对其进行修改,但最好在得到otu table时候就统一修改

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
library(dplyr)
library(tibble)
library(ggtree)
library(ggplot2)

# load data
tree <- read.tree("rep_seq_k5_cln.tree")
tax <- read.table("rep_seqs_k5.tax", row.names=1)

# curation
colnames(tax) <- c("kingdom","phylum","class","order")
otuid <- data.frame(OTUID=tree$tip.label,
OTU_ID=paste0("OTU_", seq(1:length(tree$tip.label))))
tax <- inner_join(tax %>% rownames_to_column("OTUID"), otuid, by = "OTUID") %>%
column_to_rownames("OTU_ID") %>%
dplyr::select(-OTUID)
tree$tip.label <- paste0("OTU_", seq(1:length(tree$tip.label)))

plot 1

1
2
3
4
5
groupInfo <- split(row.names(tax), tax$phylum) 
tree <- groupOTU(tree, groupInfo)
ggtree(tree, aes(color=group))+
theme(legend.position = "right")+
geom_tiplab(size=3)

plot2

1
2
3
4
ggtree(tree, layout="fan", ladderize = FALSE, size=1.2, 
branch.length = "none", aes(color=group))+
geom_tiplab2(size=4)+
theme(legend.position = "right")

参考

  1. 扩增子进化树分析

参考文章如引起任何侵权问题,可以与我联系,谢谢。


------------- The End Thanks for reading --------