Blast2GO Blog


How to load Fasta sequences from a whole genome using a GFF/GTF?

Sometimes databases provide the whole genome and the GFF or GTF files but not the exon or CDS FASTA files.
With Blast2GO it is possible to extract the exons or the CDS from the genome using the GFF file.


Use Case

This example uses NCBI data for the bacterium Escherichia coli BW25113.
The sequences loaded into Blast2GO are the features of type exon (3rd column of the GFF file). The sequence name is taken from an attribute in the 9th column, e.g. exon_id.

The GFF file looks like this:

Chromosome ena gene 190 255 . + . ID=gene:BW25113_0001;Name=thrL;biotype=protein_coding;description=thr operon leader peptide;gene_id=BW25113_0001;logic_name=ena
Chromosome ena mRNA 190 255 . + . ID=transcript:AIN30539;Parent=gene:BW25113_0001;Name=thrL-1;biotype=protein_coding;transcript_id=AIN30539
Chromosome ena exon 190 255 . + . Parent=transcript:AIN30539;Name=AIN30539-1;constitutive=1;ensembl_end_phase=0;ensembl_phase=0;exon_id=AIN30539-1;rank=1
Chromosome ena CDS 190 255 . + 0 ID=CDS:AIN30539;Parent=transcript:AIN30539;protein_id=AIN30539
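To make the column layout concrete, here is a minimal Python sketch that splits one GFF3 line into its columns and reads the attributes in the 9th column. It is only an illustration, not how Blast2GO parses the file. The function name parse_gff3_line is hypothetical, and the sketch assumes the real file is tab-separated (as GFF3 is) and uses key=value attributes. GTF files use a different attribute syntax and are not covered.

```python
def parse_gff3_line(line):
    """Split one tab-separated GFF3 line and parse its attribute column."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")

    # Column 9 holds semicolon-separated key=value pairs.
    attributes = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()

    return {
        "seqid": cols[0],
        "source": cols[1],
        "feature": cols[2],      # 3rd column: gene, mRNA, exon, CDS, ...
        "start": int(cols[3]),   # 1-based, inclusive
        "end": int(cols[4]),     # 1-based, inclusive
        "strand": cols[6],
        "attributes": attributes,
    }


# The exon line from the example above, rebuilt with tab separators.
exon_line = "\t".join([
    "Chromosome", "ena", "exon", "190", "255", ".", "+", ".",
    "Parent=transcript:AIN30539;Name=AIN30539-1;constitutive=1;"
    "ensembl_end_phase=0;ensembl_phase=0;exon_id=AIN30539-1;rank=1",
])
record = parse_gff3_line(exon_line)
print(record["feature"], record["attributes"]["exon_id"])  # exon AIN30539-1
```

Splitting on tabs rather than on any whitespace matters here, because attribute values such as the gene description can contain spaces.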

Steps to retrieve the exon sequences with the exon_id as sequence name:

  1. Download the genome DNA (whole-genome FASTA file)
  2. Download the GFF file
  3. In Blast2GO go to File > Load Sequences > Load Fasta from Reference + GFF/GTF
    1. Set the parameters as shown in Figure 1:
      • Feature Level: exon
      • Group and Name by: exon_id

Once loaded, Blast2GO creates a new project containing the exon sequences, with each SeqName corresponding to the exon_id (see Figure 2).
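For readers who want to see what such an extraction involves outside of Blast2GO, the idea can be sketched in plain Python. This is an assumption-laden illustration, not Blast2GO's implementation. The helper names read_fasta and extract_features are hypothetical, the feature_level and name_by arguments only mirror the two dialog parameters, and the toy genome and GFF lines are made up. The sketch assumes GFF3 with tab-separated columns, takes the reverse complement for features on the minus strand, and assumes each name occurs only once. How Blast2GO groups several features that share the same attribute value is not modeled here.

```python
import io

# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Read a FASTA file into a dict mapping sequence name to sequence."""
    sequences, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract_features(genome, gff_handle, feature_level="exon", name_by="exon_id"):
    """Cut every feature of the given type out of the genome, keyed by an attribute."""
    extracted = {}
    for line in gff_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_level:
            continue  # keep only the requested feature type (3rd column)
        attributes = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        name = attributes.get(name_by)
        if name is None:
            continue  # the feature lacks the naming attribute
        start, end = int(cols[3]), int(cols[4])
        # GFF coordinates are 1-based and inclusive.
        seq = genome[cols[0]][start - 1:end]
        if cols[6] == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        extracted[name] = seq  # assumes each name is unique
    return extracted


# Toy input (made up for illustration, not the E. coli data).
toy_fasta = io.StringIO(">Chromosome toy genome\nAAACCC\nGGGTTT\n")
toy_gff = io.StringIO(
    "##gff-version 3\n"
    "Chromosome\ttoy\tgene\t4\t12\t.\t+\t.\tID=gene:g1\n"
    "Chromosome\ttoy\texon\t4\t6\t.\t+\t.\tParent=t1;exon_id=ex1\n"
    "Chromosome\ttoy\texon\t7\t12\t.\t-\t.\tParent=t2;exon_id=ex2\n"
)

genome = read_fasta(toy_fasta)
exons = extract_features(genome, toy_gff, feature_level="exon", name_by="exon_id")
for name, seq in exons.items():
    print(f">{name}\n{seq}")
```

Passing feature_level="CDS" together with a naming attribute that the CDS lines actually carry (protein_id in the example file above) would select the CDS features in the same way.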


Figure 1: Load fasta from reference parameters window.

Figure 2: Exon sequences loaded in Blast2GO.

