Transcriptome Expression Quantification

BAM-based quantification

STAR alignment

Building the index

# --sjdbOverhang should be read length - 1 (149 for 150 bp reads)
STAR --runMode genomeGenerate \
--genomeDir /opt/hg38/index \
--genomeFastaFiles /opt/hg38/ref/GRCh38.p14.genome.fa \
--sjdbGTFfile /opt/hg38/gtf/gencode.v44.annotation.gtf \
--sjdbOverhang 149 \
--runThreadN 10
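
The `--sjdbOverhang` value depends on the read length, so it can help to measure it from the data rather than assume 150 bp. A minimal sketch, assuming gzip-compressed FASTQ with four lines per record; the helper name `sjdb_overhang` and the default sample of 10,000 reads are illustrative choices, not part of STAR:

```shell
# Hypothetical helper: print (max read length - 1) over the first N reads
# of a gzip-compressed FASTQ file. N defaults to 10000.
sjdb_overhang() {
    fastq_gz=$1
    n_reads=${2:-10000}
    # Sequence lines are the 2nd line of every 4-line FASTQ record
    gzip -dc "$fastq_gz" | head -n $((n_reads * 4)) |
        awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
             END { print max - 1 }'
}
```

For example, `sjdb_overhang /opt/data/sample_R1.fq.gz` prints a value that could be passed to `--sjdbOverhang`.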

Alignment

STAR --runMode alignReads \
--quantMode TranscriptomeSAM GeneCounts \
--twopassMode Basic \
--genomeDir /opt/hg38/index \
--readFilesIn /opt/data/sample_R1.fq.gz /opt/data/sample_R2.fq.gz \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix /opt/star/sample \
--runThreadN 4
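
With `--quantMode GeneCounts`, STAR also writes a `ReadsPerGene.out.tab` file next to the BAM (with the prefix above, `/opt/star/sampleReadsPerGene.out.tab`). Columns 2 to 4 hold counts for unstranded, forward-stranded, and reverse-stranded libraries, and the first four rows are `N_*` summary lines. The sketch below, with a hypothetical helper name `star_strand_totals`, sums each column over genes; comparing the totals is one informal way to guess the library strandedness:

```shell
# Hypothetical helper: sum the three count columns of a STAR
# ReadsPerGene.out.tab file, skipping the N_* summary rows.
star_strand_totals() {
    awk -F '\t' '$1 !~ /^N_/ { u += $2; f += $3; r += $4 }
        END { printf "unstranded=%d forward=%d reverse=%d\n", u, f, r }' "$1"
}
```

If the forward and reverse totals are similar, the library is probably unstranded; if one clearly dominates, that column likely matches the protocol.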

featureCounts quantification

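# -p marks the input as paired-end; with Subread >= 2.0.2, also add
# --countReadPairs to count fragments rather than individual reads
featureCounts \
-a /opt/hg38/gtf/gencode.v44.annotation.gtf \
-g gene_name \
-o /opt/count/sample.count \
-T 4 \
-p \
/opt/star/sampleAligned.sortedByCoord.out.bam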
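
The featureCounts output starts with a `#` comment line and a header row, followed by one row per gene with annotation columns (Chr, Start, End, Strand, Length) before the count column. For downstream tools a two-column table is often enough. A minimal sketch, assuming a single BAM was counted so that the counts sit in column 7; `fc_simple_counts` is a hypothetical name:

```shell
# Hypothetical helper: reduce a featureCounts table to "gene<TAB>count".
# Assumes one input BAM, so the count is in column 7.
fc_simple_counts() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        !/^#/ && $1 != "Geneid" { print $1, $7 }' "$1"
}
```

For example, `fc_simple_counts /opt/count/sample.count > /opt/count/sample.simple.tsv`, where the output file name is only an illustration.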

RSEM quantification

Building the index

rsem-prepare-reference \
--gtf /opt/hg38/gtf/gencode.v44.annotation.gtf \
--star \
--star-path /opt/software/STAR-2.7.11a/bin/Linux_x86_64_static \
-p 8 \
/opt/hg38/ref/GRCh38.p14.genome.fa \
/opt/hg38/rsem_index/hg38

Quantification

rsem-calculate-expression \
--alignments \
--paired-end --no-bam-output --append-names \
-p 8 \
/opt/star/sampleAligned.toTranscriptome.out.bam \
/opt/hg38/rsem_index/hg38 \
/opt/count/sample
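
RSEM writes `/opt/count/sample.genes.results` and `/opt/count/sample.isoforms.results`, tab-delimited tables whose header row includes `expected_count`, `TPM`, and `FPKM`. The sketch below pulls one named column alongside the ID column; the helper `rsem_column` is hypothetical, and it looks the column up by header name rather than assuming a fixed position:

```shell
# Hypothetical helper: print "id<TAB>value" for one named column of an
# RSEM *.results file. Exits non-zero if the column is not in the header.
rsem_column() {
    results=$1
    column=$2
    awk -F '\t' -v col="$column" 'BEGIN { OFS = "\t" }
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { print $1, $c }' "$results"
}
```

For example, `rsem_column /opt/count/sample.genes.results TPM` lists gene-level TPM values.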

FASTQ-based quantification

Salmon quantification

Building the index

# Concatenate transcripts and genome (transcripts first); the genome sequences serve as decoys
cat /opt/hg38/ref/gencode.v44.transcripts.fa /opt/hg38/ref/GRCh38.p14.genome.fa > /opt/hg38/ref/GRCh38.trans.genome.fa
# The .fai index must already exist (samtools faidx GRCh38.p14.genome.fa)
cut -f 1 /opt/hg38/ref/GRCh38.p14.genome.fa.fai > /opt/hg38/ref/GRCh38.decoys.txt

salmon index \
-t /opt/hg38/ref/GRCh38.trans.genome.fa \
-d /opt/hg38/ref/GRCh38.decoys.txt \
-p 12 \
-i /opt/hg38/salmon_index \
--gencode
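
If no `.fai` index is available, the decoy list can also be derived from the FASTA headers directly. A minimal sketch, assuming an uncompressed genome FASTA; `decoy_names` is a hypothetical helper that keeps each header up to the first space and drops the leading `>`:

```shell
# Hypothetical helper: list sequence names from a FASTA file, one per line,
# for use as a Salmon decoy list.
decoy_names() {
    grep '^>' "$1" | cut -d ' ' -f 1 | sed 's/^>//'
}
```

For example, `decoy_names /opt/hg38/ref/GRCh38.p14.genome.fa > /opt/hg38/ref/GRCh38.decoys.txt`.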

Quantification

# -l A lets Salmon detect the library type automatically
salmon quant \
-i /opt/hg38/salmon_index \
-l A \
-1 /opt/data/sample_R1.fq.gz \
-2 /opt/data/sample_R2.fq.gz \
--validateMappings \
--gcBias \
-g /opt/hg38/gtf/gencode.v44.annotation.gtf \
-o /opt/count/sample \
-p 12
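
The command above handles one sample. For several samples, the paired files can be enumerated first and the same command run once per pair. The sketch assumes files follow the `<sample>_R1.fq.gz` / `<sample>_R2.fq.gz` naming used above; `list_pairs` is a hypothetical helper, and the commented loop reuses only flags already shown:

```shell
# Hypothetical helper: print "sample<TAB>R1<TAB>R2" for every complete
# pair of *_R1.fq.gz / *_R2.fq.gz files in a directory.
list_pairs() {
    dir=$1
    for r1 in "$dir"/*_R1.fq.gz; do
        [ -e "$r1" ] || continue            # no matches: glob stays literal
        r2=${r1%_R1.fq.gz}_R2.fq.gz
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2 # skip unpaired files
            continue
        fi
        sample=$(basename "$r1" _R1.fq.gz)
        printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    done
}

# Example loop (left commented out; requires salmon and the index):
# list_pairs /opt/data | while IFS="$(printf '\t')" read -r sample r1 r2; do
#     salmon quant -i /opt/hg38/salmon_index -l A -1 "$r1" -2 "$r2" \
#         -o "/opt/count/$sample" -p 12
# done
```

Writing each sample to its own output directory keeps the per-sample result files from overwriting each other.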

Kallisto quantification

Building the index

kallisto index \
-i kallisto.idx \
gencode.v44.transcripts.fa

Quantification

kallisto quant \
-i kallisto.idx \
-o /opt/count/sample \
-b 100 -t 4 \
/opt/data/sample_R1.fq.gz /opt/data/sample_R2.fq.gz
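
kallisto reports transcript-level estimates in `abundance.tsv` (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`). Because the index above was built from the GENCODE transcript FASTA, each `target_id` is a `|`-delimited string whose second field is the gene ID. Under that assumption, transcript TPMs can be summed per gene; `kallisto_gene_tpm` is a hypothetical helper and only a rough sketch of gene-level aggregation:

```shell
# Hypothetical helper: sum transcript TPM per gene from a kallisto
# abundance.tsv whose target_id looks like "ENST...|ENSG...|...".
kallisto_gene_tpm() {
    awk -F '\t' 'NR > 1 { split($1, id, "|"); tpm[id[2]] += $5 }
        END { for (g in tpm) printf "%s\t%.4f\n", g, tpm[g] }' "$1" | sort
}
```

For example, `kallisto_gene_tpm /opt/count/sample/abundance.tsv`. If the index is built from a FASTA with plain transcript IDs instead, the second field is empty and this helper does not apply.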

*To obtain GRCh38.p14.cdna.fa:

gffread gencode.v44.annotation.gtf -g GRCh38.p14.genome.fa -w GRCh38.p14.cdna.fa.tmp
cut -f 1 -d " " GRCh38.p14.cdna.fa.tmp > GRCh38.p14.cdna.fa

Quantification software

Software       Download                Manual                Note
STAR           STAR_download           STAR_manual           make
bowtie2        bowtie2_download        bowtie2_manual
RSEM           RSEM_download           RSEM_manual           make
featureCounts  featureCounts_download  featureCounts_manual
Salmon         Salmon_download         Salmon_manual
Kallisto       Kallisto_download       Kallisto_manual
cufflinks      cufflinks_download      cufflinks_manual
gffread        gffread_download        gffread_manual
Author: Giftbear
Link: https://giftbear.github.io/2023/10/20/转录组表达定量分析/
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless stating additionally.