GENESPACE: an R package for visualizing genome synteny

Paper

GENESPACE: syntenic pan-genome annotations for eukaryotes

https://www.biorxiv.org/content/10.1101/2022.03.09.483468v1

At the time of writing, this is a preprint and has not yet been formally published.

GitHub repository

https://github.com/jtlovell/GENESPACE

Detailed overview

https://htmlpreview.github.io/?https://github.com/jtlovell/GENESPACE/blob/master/doc/genespaceOverview.html

GENESPACE does not yet run on Windows; it only works on macOS or Linux. I tried it on Linux.

First, install OrthoFinder:

conda install -c bioconda orthofinder

Install MCScanX:

https://github.com/wyp1125/MCScanX

git clone https://github.com/wyp1125/MCScanX.git
cd MCScanX
make

[Screenshot: output of make]

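The build reported three errors, but three executables were still produced. A quick test showed that they run; I do not know whether the errors will cause problems later on.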
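Because the build finished with errors, it may be worth confirming from R that the programs are really there before handing the directory to GENESPACE. The following is a minimal sketch; the helper name check_mcscanx is hypothetical, and the three program names are an assumption about what a standard MCScanX build produces, so adjust them to whatever appears in your own build directory.

```r
# Sketch: report which expected MCScanX programs exist and are executable.
# The default program names are assumed, not taken from the original post.
check_mcscanx <- function(path2mcscanx,
                          exes = c("MCScanX", "MCScanX_h",
                                   "duplicate_gene_classifier")) {
  full <- file.path(path.expand(path2mcscanx), exes)
  # file.access(mode = 1) returns 0 when the file can be executed
  ok <- file.exists(full) & file.access(full, mode = 1) == 0
  stats::setNames(ok, exes)
}

# Example call; replace the path with your own MCScanX build directory.
print(check_mcscanx("~/MCScanX"))
```

Any program reported as FALSE is either missing or not executable, which would be the first thing to look at if a later synteny step fails.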


Install the R packages that GENESPACE depends on:

conda install r-data.table r-dbscan r-R.utils r-devtools
conda install bioconductor-Biostrings bioconductor-rtracklayer

Install GENESPACE:

# Start R first (I use radian)
devtools::install_github("jtlovell/GENESPACE", upgrade = FALSE)

Run the example data:

library(GENESPACE)
runwd<-file.path("./testGenespace/")
make_exampleDataDir(writeDir = runwd) ## this step downloads the example data

gids<-c("human","chimp","rhesus")
gpar <- init_genespace(
  genomeIDs = gids,
  speciesIDs = gids,
  versionIDs = gids,
  ploidy = rep(1, 3),
  wd = runwd,
  gffString = "gff",
  pepString = "pep",
  path2orthofinder = "orthofinder",
  path2mcscanx = "/home/myan/scratch/apps/mingyan/Biotools/MCScanX",
  path2diamond = "diamond",
  diamondMode = "fast",
  orthofinderMethod = "fast",
  rawGenomeDir = file.path(runwd, "rawGenomes"))

parse_annotations(
  gsParam = gpar,
  gffEntryType = "gene",
  gffIdColumn = "locus",
  gffStripText = "locus=",
  headerEntryIndex = 1,
  headerSep = " ",
  headerStripText = "locus=")
# I did not fully understand what the call above is doing
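Judging from the argument names, the header* arguments seem to describe how a gene ID is pulled out of each peptide FASTA header so that it can be matched to the ID in the GFF: split the header on headerSep, take field headerEntryIndex, then strip headerStripText. The sketch below only illustrates that reading with base R. It is not GENESPACE's implementation, and both the helper extract_id and the header string are made up for the example.

```r
# Illustration only: one plausible reading of the header* arguments.
# extract_id() is a hypothetical helper, not a GENESPACE function.
extract_id <- function(header,
                       headerSep = " ",
                       headerEntryIndex = 1,
                       headerStripText = "locus=") {
  # Split the header into fields on the separator
  fields <- strsplit(header, headerSep, fixed = TRUE)[[1]]
  # Keep the requested field and remove the unwanted text from it
  gsub(headerStripText, "", fields[headerEntryIndex])
}

# Hypothetical FASTA header (without the leading ">")
header <- "locus=GeneA transcript=GeneA.1"
print(extract_id(header))  # "GeneA"
```

Under this reading, gffIdColumn and gffStripText would play the same role for the attribute column of the GFF file, so that both files end up with identical gene IDs.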

gpar<-run_orthofinder(gsParam = gpar)

## Running this line produced a warning:
Warning message:
In system2(gsParam$paths$orthofinderCall, com, stdout = TRUE, stderr = TRUE) :
running command ''orthofinder' -b ./testGenespace//orthofinder -t 4 -a 1 -X -og 2>&1' had status 120 and error message 'Interrupted system call'
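## I am not sure whether this affects later steps. It may be caused by the extra trailing slash in runwd<-file.path("./testGenespace/"). After rerunning, the problem no longer appeared.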
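Whether the trailing slash really caused the warning is unconfirmed, but the doubled slash in ./testGenespace//orthofinder is easy to avoid. A small sketch, using a hypothetical helper strip_trailing_slash, that normalizes the working directory before it is passed to init_genespace:

```r
# Sketch: drop trailing slashes so file.path() does not produce "//".
# strip_trailing_slash() is a hypothetical helper for this post.
strip_trailing_slash <- function(path) {
  sub("/+$", "", path)
}

runwd <- strip_trailing_slash("./testGenespace/")
print(runwd)                            # "./testGenespace"
print(file.path(runwd, "orthofinder"))  # "./testGenespace/orthofinder"
```

Note that this simple version would turn the root directory "/" into an empty string, so it is only meant for ordinary project directories like the one used here.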

gpar<-synteny(gsParam = gpar)

## Plot the results

pdf(file="abc.pdf",width = 10,height = 8)
plot_riparianHits(gpar)
dev.off()

[Figure: riparian plot of the human, chimp and rhesus example genomes (abc.pdf)]

More plotting options:

pdf(file="abc.pdf",width = 9.6,height = 4)
plot_riparianHits(
  gpar,
  refGenome = "chimp",
  invertTheseChrs = data.frame(genome = "rhesus", chr = 2),
  genomeIDs = c("chimp", "human", "rhesus"),
  labelTheseGenomes = c("chimp", "rhesus"),
  gapProp = .001,
  refChrCols = c("#BC4F43", "#F67243"),
  blackBg = FALSE,
  returnSourceData = TRUE,
  verbose = FALSE)
dev.off()

[Figure: riparian plot with chimp as the reference genome]

You can also restrict the plot to regions of interest:

regs <- data.frame(
  genome = c("human", "human", "chimp", "rhesus"),
  chr = c(3, 3, 4, 5),
  start = c(0, 50e6, 0, 60e6),
  end = c(10e6, 70e6, 50e6, 90e6),
  cols = c("pink", "gold", "cyan", "dodgerblue"))
pdf(file = "abc2.pdf",width = 9.6,height = 4)
plot_riparianHits(gpar, onlyTheseRegions = regs,blackBg = FALSE)
dev.off()

[Figure: riparian plot restricted to the selected regions (abc2.pdf)]

Build the pan-genome:

pg <- pangenome(gpar)

This writes a file, results/human_pangenomeDB.txt.gz.

Opening the file, part of the content looks like this:

[Screenshot: first rows of human_pangenomeDB.txt.gz]

I have not yet worked out how to interpret this output.
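A reasonable first step toward understanding the file is to load it into R and look at its columns. The sketch below uses only base R and assumes that the file is a gzip-compressed, tab-delimited table with a header row and that it sits under the testGenespace working directory; both points are assumptions, and read_pangenome is a hypothetical helper name.

```r
# Sketch: load the pan-genome table for inspection.
# Assumes a gzip-compressed, tab-delimited file with a header row.
read_pangenome <- function(path) {
  stopifnot(file.exists(path))
  read.delim(gzfile(path), stringsAsFactors = FALSE, check.names = FALSE)
}

# Assumed location; adjust to wherever pangenome() wrote its output.
pg_file <- file.path("testGenespace", "results", "human_pangenomeDB.txt.gz")
if (file.exists(pg_file)) {
  pgdb <- read_pangenome(pg_file)
  str(pgdb)          # column names and types
  print(head(pgdb))  # first few rows
}
```

Once the column names are known, ordinary data frame subsetting can be used to pull out the genes or regions of interest.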

The help documentation says:

This is the source data that can be manipulated programatically to extract your regions of interest. Future GENESPACE releases will have auxilary functions that let the user access the pan-genome by rules (e.g. contains these genes, in these regions etc.). For now, we’ll leave this work to scripting by the user.

The next step is to work out how to prepare my own data.

You are welcome to follow my WeChat public account: 小明的数据分析笔记本

