Blast2GO Blog

b

Pairwise Differential Expression Analysis with Blast2GO

A simple use-case comparing Blast2GO with R chunks

The Blast2GO feature “Pairwise Differential Expression Analysis” is designed to perform differential expression analysis of count data arising from RNA-seq technology. This tool allows the identification of differentially expressed genes considering two different conditions based on the software package ‘edgeR’, which belongs to the Bioconductor project. This use case shows the basic analysis workflow, comparing the results obtained with R Bioconductor and Blast2GO.

Blast2GO and R Logo

 

DataSet

This analysis describes a differential expression analysis of an RNA-Seq dataset from tomato (Solanum lycopersicum) pollen undergoing chronic heat stress. Samples and counts were obtained from the laboratory of Nurit Firon. The control plants (C1-C5) were grown at 28/18 ºC and the treatment samples (T1-T5) were grown at 32/26 ºC day/night. Each collection includes one treatment and one control sample consisting of pollen pooled from multiple plants.

Download:

Video tutorial:

 

Analysis Workflow

  1. Loading Data

The read counts for the ten individual libraries were stored in one tab-delimited file. EdgeR only accepts raw counts without any type of normalization.

edgeR: 

  • The count table is loaded in R and used to build the DGEList object.
library(edgeR)

rawdata=read.delim('/data/tomato_counts.txt',
check.names=F, stringAsFactors=F,
row.names=1)

group=c('C','C','C','C','C',
'T','T','T','T','T')

dge=DGEList(rawdata, group=group)

 

Blast2GO: 

  • The file containing the count table is loaded: File → Load → Load Count Table. Note that the data are nicely formatted, with gene names as row names and sample names as column names.

 

  1. Filtering and Normalization

Genes with zero counts are eliminated since it makes no sense to test them for differential expression if they were not expressed. Normalization is applied to account for compositional difference between the libraries. Note that when normalization is applied in edgeR, the software does not change the counts. Instead, it calculates normalization factors that will be used later.

edgeR: 

  • Only genes with 0 counts for all samples are discarded. TMM is selected as normalization method ("calcNormFactors()" command).
##Filtering
keep=rowSums(cpm(dge) > 0) >= 1
dge=dge[keep, , keep.lib.sizes=F]

##Normalization
dge=calcNormFactors(dge, method='TMM')

 

Blast2GO: 

  • The adjustement of filtering and normalization steps is performed on the “Preprocessing Data” page of the main wizard: Diff Exprs → Run Differential Expression Analysis ("Pairwise Differential Expression Analysis" option).

 

  1. Experimental Design

The experimental design of this case corresponds to two experimental conditions, "Control" and "Treatment", with five replicates each one.

edgeR: 

  • A vector indicating which group each sample belongs is addded to the DGEList object.
groups = c('C','C','C','C','C',
'T','T','T','T','T')
dge$samples = groups

 

Blast2GO: 

  • On the “Experimental Design” page of the main wizard, users can load the experimental design of their data.

 

  1. Comparison and Test

For the analysis, all the control and treatment samples will be treated as replicates in orden to identify genes whose expression changed due to the heat stress treatment. This corresponds to what the edgeR manual calls the “classic approach” (Exact test).

edgeR: 

  • The "estimateDisp()" command estimates dispersions and the "exactTest()" function applies the statistical exact test. 
##Dispersions
dge=estimateDisp(dge)

##Test
dex=exactTest(dge, pair=c('C','T'))

 

Blast2GO: 

  • On the “Comparison and Test” page of the main wizard, users can choose the experimental conditions to compare and select the statistical test.

 

Results

After the analysis, interpretation of results is important to reach biological conclusions.

edgeR: 

  • The "topTags()" command extracts the top DE tags in a data fram for a given pair of groups, ranked by adjusted p-value.
TopTags(dex)

 

  • Using the "summary()" and "decideTestDGE()" commands, genes that are up- (1)  or down-regulated  (-1) are counted (FDR < 0.05).
summary(decideTestsDGE(dex, p=0.05))             

 

Blast2GO: 

  • Once the input counts has been processed and analyzed, results will be displayed on the Main Generic Table Viewer.

 

  • A result page shows a summary of the differential expression analysis results.

 

Statistics

  1. Library Size

A bar chart showing the read counts contained in each sample.

edgeR: 

  • "barplot()" command.                                                                
barplot(dge$samples$lib.size*1e-6,
names=colnames(dge$counts),
ylab='Library size (millions)')

 

Blast2GO: 

  • Diff Expr → View Results: Pairwise Analysis → Statistics → Library Size per Sample.

 

 

  1. MDS Plot

A multi-dimensional scaling (MDS) plot can show similarity between samples in which distances correspond to leading log-fold-changes between each pair of RNA samples. The leading log-fold-change is the average (root-mean-square) of the largest absolute log-fold-changes between each pair of samples. This plot can be viewed as a type of unsupervised clustering.

edgeR: 

  • "plotMDS()" command.                                                                
cn.color='blue'
tr.color='red'
main='MDS Plot for Count Data'
par(las=1)
colors=c(rep(cn.color,5),rep(tr.color,5))
plotMDS(dge, main=main,
labels=colnames(dge$counts),
col=colors, las=1)

 

Blast2GO: 

  • Diff Expr → View Results: Pairwise Analysis → Statistics → MDS Plot.

 

  1. MA Plot

This chart illustrates the relationship between logFC and average expression level for the differentially expressed genes, which are highlighted in red. The blue lines indicate genes that are up or downregulated two fold.

edgeR: 

  • "plotSmear()" command.                                                                
de=decideTestsDGE(dex, p=0.05, adjust='BH')
detags=rownames(dex)[as.logical(de)]
plotSmear(dex, de.tags=detags,
main='MA Plot for Count Data')
abline(h=c(-1,1), col='blue')

 

Blast2GO: 

  • Diff Expr → View Results: Pairwise Analysis→ Statistics → MA Plot.

 

  1. Volcano Plot

Scatter chart that is constructed by plotting the negative log of the adjusted p-values (FDR) on the y-axis versus the log of the fold changes on the x-axis.

edgeR: 

  • ggplot2" package.                                                                       
libarary(ggplot2)
ggplot(data=results, aes(x=logFC,
y=-log10(FDR), colour=series))+
geom_point(size=0.5)+
xlim(c(-10,10))+
ylim(c(0,15))+
xlab('log2 fold change')+
ylab('-log10 p-value')+
scale_color_manual(values=
c('gray', 'black', 'red', 'green'))+
ggtitle('Volcano Plot for Count Data')

 

Blast2GO: 

  • Diff Expr → View Results: Pairwise Analysis → Statistics → Volcano Plot.

 

Conclusions

As shown in this use case, the edgeR package is a powerful tool that allows statistical analysis for RNA-seq technology data.  The Blast2GO feature "Pairwise Differential Expression Analysis" uses all the edgeR statistical potential to offer an easy and simple way to perform this type of analysis, without requiring programming skills. Futhermore, users can take advantage of Blast2GO features to complete the analysis and achieve greater understanding of the biological problem that is being studied. 

 

References: 

Robinson MD, McCarthy DJ and Smyth GK (2010). “edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.” Bioinformatics, 26, pp. -1.

 

FORUM

Join our Blast2GO Google Group