Blast2GO Blog

Helpful Features, Tips and Tricks

How to use "Retrieve Blast Top-hit" to improve functional annotations with Blast2GO

RNA-seq data is sometimes difficult to match with proteins, due to the short length of the reads. When this is the case, it might be useful to try to find EST hits, which can then be used to find new protein matches. In this demo, we will show how to retrieve top EST hits and the different options that this tool offers. 

In this video, we use an RNA-seq dataset in which many sequences returned no Blast result. To further characterize the sequences without a result, we first show how to select them and extract them to a new project. Then, we perform a local Blast against an EST database and explain in detail how to retrieve the top EST hits. Finally, using the top EST hits as queries, we perform a Blastx search to find new protein matches and improve the functional characterization.
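Outside the Blast2GO interface, the idea of keeping one top hit per query can be illustrated with a few lines of Python. This is only a minimal sketch, not how Blast2GO implements the tool: it assumes the Blast results are available in the standard 12-column tabular format (BLAST+ `-outfmt 6`), that "top hit" means the highest bit score, and the function name `top_hits` and the example identifiers are made up for illustration.

```python
import csv
import io


def top_hits(tabular_text):
    """Return {query_id: (subject_id, bitscore)} with the best-scoring hit per query.

    Assumes the standard 12-column tabular layout, where column 1 is the
    query ID, column 2 the subject ID and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical hits of two reads against an EST database.
example = (
    "read1\tEST_A\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-50\t180\n"
    "read1\tEST_B\t90.0\t100\t10\t0\t1\t100\t7\t106\t1e-30\t120\n"
    "read2\tEST_C\t95.0\t80\t4\t0\t1\t80\t1\t80\t1e-35\t140\n"
)
hits = top_hits(example)
for query, (subject, score) in hits.items():
    print(query, subject, score)
```

The subject IDs collected this way would then identify the EST sequences to use as queries for the follow-up Blastx search.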

New RNA-Seq Features in Blast2GO 5

This video shows an overview of new RNA-Seq features in Blast2GO 5: FastQ Quality Control and Preprocessing, de novo Transcriptome Assembly, Transcript Quantification and Differential Expression Analysis.


How to create analysis workflows with Blast2GO

Blast2GO provides an interface to create, edit and run workflows based on the Common Workflow Language (CWL) specification. With this interface you can describe all analysis steps using the functions and tools offered by Blast2GO and connect them to perform a complete analysis in a single run.

This video shows step-by-step how to create a workflow from scratch, define the input data, configure the parameters of each step, save and export results, generate charts and more.

Please find further information in the online user manual.

 


Expression Estimation at Transcript-Level

The Transcript-level Quantification feature of Blast2GO quantifies gene and isoform expression in RNA-seq datasets.

This video shows step-by-step how to create a count table of aligned sequencing reads and explains in detail the different concepts of expression quantification at transcript level. The application is based on the RSEM software package, which assigns reads to the isoforms they came from while modelling the uncertainty that arises when multiple isoforms have overlapping sequences.
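To give a feel for how reads shared between isoforms can be split, here is a toy expectation-maximization (EM) loop in Python. RSEM uses an EM approach, but this sketch is heavily simplified and is not RSEM's actual model: it ignores transcript lengths, base qualities and fragment-length distributions, and the function name `em_abundance` and the example data are assumptions made for illustration.

```python
def em_abundance(compat, n_transcripts, iterations=100):
    """Estimate relative transcript abundances from read compatibility sets.

    compat: one entry per read, listing the indices of the transcripts
            the read aligns to (length effects are ignored in this toy).
    Returns (theta, expected_counts).
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    expected = [0.0] * n_transcripts
    for _ in range(iterations):
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        expected = [0.0] * n_transcripts
        for transcripts in compat:
            total = sum(theta[t] for t in transcripts)
            for t in transcripts:
                expected[t] += theta[t] / total
        # M-step: new abundances are the normalized expected counts.
        theta = [c / len(compat) for c in expected]
    return theta, expected


# 6 reads unique to isoform 0, 2 unique to isoform 1, 4 compatible with both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
theta, expected = em_abundance(reads, n_transcripts=2)
print(theta, expected)
```

In this example the four ambiguous reads end up being shared in the same 3:1 ratio as the uniquely mapped reads, rather than being discarded or split evenly.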

As input, sequencing reads in FASTQ format and a FASTA file containing the transcript sequences are required. The output, an un-normalized count table, can then be analysed directly within Blast2GO. Various options for differential expression analysis are available (find videos here and here). 

Find more details in the online user manual.

References: 

  • Li B, Dewey CN (2011). "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome." BMC Bioinformatics, 12:323.
  • Langmead B, Salzberg S (2012). "Fast gapped-read alignment with Bowtie 2." Nature Methods, 9:357-359.

Expression Quantification with Blast2GO

The "Create Count Table" feature of Blast2GO quantifies gene expression in RNA-seq datasets.

This video shows step-by-step how to create a count table of raw read counts and explains in detail the different concepts of expression quantification. The available parameters are inspired by the popular HTSeq Python package (reference below).

As input, aligned sequencing reads in SAM/BAM format and a GTF/GFF file with coordinates of genomic features are required. The output, an un-normalized count table, can then be analysed directly within Blast2GO. Various options for differential expression analysis are available (find videos here and here).
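The counting idea can be sketched in a few lines of Python. This is a simplified illustration in the spirit of HTSeq's "union" overlap mode, not the Blast2GO or HTSeq code: reads and genes are reduced to plain intervals instead of being parsed from SAM/BAM and GTF/GFF files, and the names `count_reads`, `ambiguous` and `no_feature` are chosen here for illustration.

```python
from collections import Counter


def count_reads(reads, genes):
    """Build an un-normalized count table.

    reads: list of (chrom, start, end) aligned-read intervals.
    genes: dict mapping gene name -> (chrom, start, end).
    Coordinates are treated as 1-based and inclusive.
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        overlapping = [
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start <= g_end and end >= g_start
        ]
        if len(overlapping) == 1:
            counts[overlapping[0]] += 1   # unambiguous assignment
        elif overlapping:
            counts["ambiguous"] += 1      # read overlaps several genes
        else:
            counts["no_feature"] += 1     # read overlaps no gene
    return counts


# Hypothetical annotation with two overlapping genes and four reads.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),    # inside geneA only
    ("chr1", 460, 510),    # overlaps geneA and geneB
    ("chr1", 1000, 1050),  # outside both genes
    ("chr1", 600, 650),    # inside geneB only
]
table = count_reads(reads, genes)
print(dict(table))
```

Reads that overlap more than one gene are tallied separately rather than counted twice, which is why the choice of overlap mode matters for genes with overlapping coordinates.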

Reference: 

Anders S, Theodor Pyl P, Huber W (2015). "HTSeq — A Python framework to work with high-throughput sequencing data." Bioinformatics, 31 (2), p. 166-169.


Coding-Potential Assessment Tool with Blast2GO

This video shows how to use the 'Coding-Potential Assessment Tool', which distinguishes coding from non-coding transcripts. This can be done with prebuilt models or by building a species-specific model from the NCBI database. The coding potential results can be cross-checked with the Blast results and may help to discover novel mRNAs.


NCBI GenBank Submission with Blast2GO 

This video shows how to use the 'Create NCBI GenBank Genome Submission Files' tool, which generates all files (e.g. the Asn1 (.sqn) file) necessary to submit your annotated sequences to the NCBI database. It combines genomic sequences and functional annotations and creates valid GenBank submission files. Additionally, this video explains how to obtain the source files (.gff and .annot files), gives hints on how to prevent common validation errors and shows how to submit a WGS project via the NCBI website.

 


Time course expression analysis with Blast2GO 

The Time Course Expression Analysis tool performs a differential expression analysis of expression data from a time course RNA-seq experiment. This application is based on the maSigPro Bioconductor package, which implements a two-step regression strategy to detect genes with significant temporal expression changes and significant differences between experimental groups.

This video shows the analysis of count data coming from an experiment in which the expression levels of tumour and normal human cells were measured at different times.  

 

References: 

Nueda MJ, Tarazona S and Conesa A (2014). "Next maSigPro: updating maSigPro Bioconductor package for RNA-seq time series." Bioinformatics, 30, pp. 2598-2602.


Pairwise differential expression analysis with Blast2GO 

The Pairwise Differential Expression Analysis tool is designed to perform differential expression analysis of count data arising from an RNA-seq experiment. The application, which is based on the software package "edgeR", allows the identification of differentially expressed genes between two experimental conditions by applying quantitative statistical methods. 

This video walks through a pairwise differential expression analysis that compares the expression of two cell types obtained from mice at different developmental stages.

 

References: 

Robinson MD, McCarthy DJ and Smyth GK (2010). "edgeR: a Bioconductor package for differential expression analysis of digital gene expression data." Bioinformatics, 26(1), pp. 139-140.


How to translate longest ORFs with Blast2GO

The Blast2GO "Translate Longest ORF" tool searches for the longest Open Reading Frame (ORF) in each nucleotide sequence and translates it into its protein sequence. You may choose one or more of the six possible reading frames, or select the reading frame based on the frame of the best blastx hit. In this video, we explain how to translate a set of Salmonella enterica genes and go through the available parameters (e.g. how to select the reading frames, which genetic codes can be used for translation, how to allow open-ended translations, etc.). We also show the "Batch Rename" functions to undo sequence name changes.
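The six-frame search can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Blast2GO implementation: it uses only the standard genetic code, requires an ORF to start with ATG and end with a stop codon (no open-ended translations), and the names `longest_orf` and `reverse_complement` are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Return (frame, protein) for the longest ATG..stop ORF in six frames.

    Frames +1..+3 are read on the given strand, -1..-3 on the reverse
    complement. Returns (None, "") if no complete ORF is found.
    """
    seq = seq.upper()
    best = (None, "")
    for strand, s in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            current = None  # residues of the ORF being read, if any
            for i in range(offset, len(s) - 2, 3):
                aa = CODON_TABLE.get(s[i:i + 3], "X")  # X for unknown codons
                if current is None:
                    if aa == "M":
                        current = ["M"]  # start codon opens an ORF
                elif aa == "*":
                    if len(current) > len(best[1]):
                        best = (strand * (offset + 1), "".join(current))
                    current = None       # stop codon closes the ORF
                else:
                    current.append(aa)
    return best


frame, protein = longest_orf("CCATGAAATTTGGGTAACC")
print(frame, protein)
```

Restricting the search to a subset of frames, or picking the frame reported by the best blastx hit, would amount to filtering the `(strand, offset)` combinations that the loops visit.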

