Generating an Expression Matrix from Sequencing Data
Quality control: fastp
    fastp -i sample.raw.r1.fq.gz \
        -I sample.raw.r2.fq.gz \
        -o sample.clean.r1.fq.gz \
        -O sample.clean.r2.fq.gz \
        -j sample.QC.json \
        -h sample.QC.html \
        --adapter_sequence_r2 AAAAAAAAAAA
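Because the later barcode-extraction step reads R1 and R2 in parallel, it can be worth confirming that both cleaned files still contain the same number of reads. The following is a minimal sketch of such a check; `count_fastq_reads` is a hypothetical helper name, not part of fastp, and it assumes standard four-line FASTQ records.

```shell
# Sketch: count the reads in a gzip-compressed FASTQ file (4 lines per record).
# count_fastq_reads is a hypothetical helper name, not part of fastp.
count_fastq_reads() {
    gzip -cd "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Example usage (uncomment once the fastp outputs exist):
# r1=$(count_fastq_reads sample.clean.r1.fq.gz)
# r2=$(count_fastq_reads sample.clean.r2.fq.gz)
# if [ "$r1" = "$r2" ]; then echo "paired: $r1 reads"; else echo "mismatch: $r1 vs $r2"; fi
```

If the two counts differ, the pairing was probably broken somewhere upstream and the files should be regenerated before continuing.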
Installing fastp: https://github.com/OpenGene/fastp

    wget http://opengene.org/fastp/fastp
    chmod a+x ./fastp
    umi_tools whitelist --stdin sample.clean.r1.fq.gz \
        --extract-method=regex \
        --bc-pattern="(?P<cell_1>.{9})(?P<discard_1>.{12})(?P<cell_2>.{9})(?P<discard_2>.{13})(?P<cell_3>.{9})(?P<umi_1>.{8})(?<plotT>TTTTTTTT){s<=2}.*" \
        --expect-cells=10000 \
        --plot-prefix=sample \
        --log2stderr \
        --subset-reads=100000000 \
        --knee-method=density \
        --allow-threshold-error > sample.whitelist.txt

    umi_tools extract --extract-method=regex \
        --bc-pattern="(?P<cell_1>.{9})(?P<discard_1>.{12})(?P<cell_2>.{9})(?P<discard_2>.{13})(?P<cell_3>.{9})(?P<umi_1>.{8})(?<plotT>TTTTTTTT){s<=2}.*" \
        --stdin sample.clean.r1.fq.gz \
        --stdout sample.extracted.r1.fq.gz \
        --read2-in sample.clean.r2.fq.gz \
        --read2-out=sample.extracted.r2.fq.gz \
        --filter-cell-barcode \
        --whitelist=sample.whitelist.txt
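To make the `--bc-pattern` layout concrete, the sketch below cuts a read 1 sequence at the same fixed offsets: three 9-nt cell barcode segments separated by 12-nt and 13-nt linkers, followed by an 8-nt UMI. It is only an illustration, not what umi_tools runs internally: `split_read1` is a hypothetical helper, and it skips the fuzzy poly-T match (`{s<=2}`) that the real pattern enforces.

```shell
# Sketch: split a read 1 sequence at the fixed offsets encoded in --bc-pattern.
# split_read1 is a hypothetical helper for illustration only; it does not
# check the poly-T stretch that the umi_tools pattern requires.
split_read1() {
    printf '%s\n' "$1" | awk '{
        # cell_1 (1-9) + cell_2 (22-30) + cell_3 (44-52), linkers discarded
        cell = substr($0, 1, 9) substr($0, 22, 9) substr($0, 44, 9)
        # umi_1 (53-60)
        umi = substr($0, 53, 8)
        print cell, umi
    }'
}
```

Calling `split_read1` on a read 1 sequence prints the 27-nt concatenated cell barcode and the 8-nt UMI, separated by a space.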
Installing umi_tools: python3 -m pip install umi_tools
Alignment: STAR
Building the reference genome index
    STAR --runMode genomeGenerate \
        --genomeDir /opt/star/index \
        --genomeFastaFiles GRCh38.p13.genome.fa \
        --sjdbGTFfile gencode.v43.annotation.gtf \
        --sjdbOverhang 149 \
        --runThreadN 10
Reference genome and GTF annotation downloads: https://www.gencodegenes.org/human/
- Reference genome: choose Genome sequence (GRCh38.p13), region ALL
- GTF annotation: choose Comprehensive gene annotation, region CHR
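The `--sjdbOverhang 149` value in the index command corresponds to 150-nt reads: the STAR manual recommends read length minus 1. For data of a different length, the value can be estimated from the FASTQ that will be aligned. The sketch below is one way to do that; `suggest_sjdb_overhang` is a hypothetical helper name, and scanning only the first 100000 records is an arbitrary choice made for speed.

```shell
# Sketch: suggest --sjdbOverhang as (longest read length - 1), scanning only
# the first 100000 records (400000 lines) of a gzip-compressed FASTQ.
# suggest_sjdb_overhang is a hypothetical helper name.
suggest_sjdb_overhang() {
    gzip -cd "$1" | head -n 400000 | awk '
        NR % 4 == 2 { if (length($0) > max) max = length($0) }
        END { print max - 1 }'
}

# Example usage (uncomment once the file exists):
# suggest_sjdb_overhang sample.extracted.r2.fq.gz
```

The longest read is used rather than the mean because fastp trimming shortens some reads.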
Alignment

    STAR --runThreadN 4 \
        --genomeDir /opt/star/index \
        --readFilesIn sample.extracted.r2.fq.gz \
        --readFilesCommand zcat \
        --outFilterMultimapNmax 1 \
        --outSAMtype BAM SortedByCoordinate \
        --outFileNamePrefix sample
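With `--outFilterMultimapNmax 1`, only uniquely mapped reads end up in the BAM, so the unique mapping rate is a useful first check. STAR writes it to `<prefix>Log.final.out`, which is `sampleLog.final.out` for the prefix used here. A minimal sketch for pulling the number out follows; `unique_map_rate` is a hypothetical helper name.

```shell
# Sketch: print the uniquely mapped percentage from a STAR Log.final.out file.
# unique_map_rate is a hypothetical helper name.
unique_map_rate() {
    awk -F'|' '/Uniquely mapped reads %/ {
        gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
        print $2
    }' "$1"
}

# Example usage (uncomment after the alignment has finished):
# unique_map_rate sampleLog.final.out
```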
Installing STAR: https://github.com/alexdobin/STAR

    wget https://github.com/alexdobin/STAR/archive/2.7.10b.tar.gz
    tar -xzf 2.7.10b.tar.gz
    cd STAR-2.7.10b/source
    make STAR
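Expression quantification: featureCounts

    # -R BAM writes a copy of the alignments tagged with XS (assignment status)
    # and XT (assigned gene), which umi_tools count reads below.
    featureCounts -T 4 \
        -a gencode.v43.annotation.gtf \
        -g gene_name \
        -R BAM \
        -o sample \
        sampleAligned.sortedByCoord.out.bam

    # umi_tools count needs a coordinate-sorted, indexed BAM (requires samtools).
    samtools sort sampleAligned.sortedByCoord.out.bam.featureCounts.bam -o sample.sorted.bam
    samtools index sample.sorted.bam

    umi_tools count \
        --per-gene \
        --gene-tag=XT \
        --assigned-status-tag=XS \
        --per-cell \
        --wide-format-cell-counts \
        -I sample.sorted.bam \
        -S sample.counts.tsv.gz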
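Before loading the count table into downstream tools, a quick look at its dimensions can catch an empty or truncated result. The sketch below assumes the wide-format output is a tab-separated table with one header row (a gene label followed by one column per cell barcode) and one row per gene; `matrix_shape` is a hypothetical helper name.

```shell
# Sketch: report "<genes> <cells>" for a gzip-compressed wide-format count table.
# Assumes a tab-separated file whose header row is a gene label followed by
# one column per cell barcode. matrix_shape is a hypothetical helper name.
matrix_shape() {
    gzip -cd "$1" | awk -F'\t' '
        NR == 1 { cells = NF - 1 }
        END { print NR - 1, cells }'
}

# Example usage (uncomment once the count table exists):
# matrix_shape sample.counts.tsv.gz
```

The cell count should be of the same order as the number of barcodes in `sample.whitelist.txt`.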
Installing featureCounts (download and unpack; it is ready to use): https://sourceforge.net/projects/subread/
Official workflow
Software and file preparation
- Install Docker:
  See: https://giftbear.github.io/2021/12/01/Linux%E5%AE%89%E8%A3%85Docker/
- Install the CWL reference runner:
  python3 -m pip install cwlref-runner
- Download the reference genome and annotation file:
  http://bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA/GRCh38-PhiX-gencodev29/GRCh38-PhiX-gencodev29-20181205.tar.gz
  http://bd-rhapsody-public.s3-website-us-east-1.amazonaws.com/Rhapsody-WTA/GRCh38-PhiX-gencodev29/gencodev29-20181205.gtf
- Download the pipeline cwl and yml files:
  https://bitbucket.org/CRSwDev/cwl/downloads/
Running the pipeline
- Configure the yml file: set the locations of the sequencing data, the reference genome, and the annotation file
- Run:
  /opt/software/python/bin/cwl-runner --outdir ./ rhapsody_wta_1.12.1.cwl template_wta_1.12.1.yml
*Reference documentation: https://www.bdbiosciences.com/content/dam/bdb/marketing-documents/BD_Single_Cell_Genomics_Analysis_Setup_User_Guide_v2.pdf