Frequently Asked Questions


How to run the Blast2GO installer on Linux systems


To install Blast2GO on a Linux system, follow these steps:

1. Open a terminal in the folder where you downloaded the installer.

2. Unzip the file:
unzip Blast2GO_unix_X_X_XX.zip
3. Make sure the .sh file has execute permission:
chmod +x Blast2GO_unix_X_X_XX.sh
4. Run the .sh file:
./Blast2GO_unix_X_X_XX.sh

An installation wizard should open. Follow the on-screen instructions.

Note: replace X_X_XX with the corresponding version number.

How to install Blast2GO on a central computer shared by multiple users


This article describes how to install Blast2GO on a central computer that multiple users can access, and what to configure so that it does not ask every new user for the license key and so that updates are performed centrally.

0.0. You need a Unix user group that is common to all the users who will run Blast2GO. If there is none, create one and add each user to it. The examples below assume a group named domain^users to which all Blast2GO users belong.

0.1. Install Blast2GO as root. By default it is installed into the folder /opt/Blast2GO.

1. Set the correct group and permissions on /opt/Blast2GO with the commands:

sudo chgrp -R domain^users /opt/Blast2GO
sudo chmod -R g+w /opt/Blast2GO
2. Set the setgid bit on the directory and on all directories below it, so that all new files and directories created under /opt/Blast2GO belong to the domain^users group:
sudo find /opt/Blast2GO -type d -exec chmod 2775 {} \;

3. Find all files in /opt/Blast2GO and add read and write permission for owner and group:

sudo find /opt/Blast2GO -type f -exec chmod ug+rw {} \;

Make sure all users log out and log in again before running Blast2GO, so that the new group membership takes effect; otherwise they will still create files with their own groups.

Also make sure the default system umask is set to 0002. If it is not, you can create a pre-launch script that runs "umask 0002" before starting Blast2GO.
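
Such a pre-launch script could look like the following minimal sketch. The launcher path /opt/Blast2GO/Blast2GO and the function name run_with_group_umask are assumptions made for illustration; check the name of the executable that your installation actually created.

```shell
#!/bin/sh
# Hypothetical pre-launch wrapper. The launcher path is an assumption:
# replace it with the executable created by your Blast2GO installation.
B2G_LAUNCHER="${B2G_LAUNCHER:-/opt/Blast2GO/Blast2GO}"

# Run a command with umask 0002, so that group write permission is not
# masked out of the files it creates. The subshell keeps the caller's
# own umask unchanged.
run_with_group_umask() {
    (
        umask 0002
        exec "$@"
    )
}

# Example: run_with_group_umask "$B2G_LAUNCHER"
```

The function only changes the umask of the process it starts, so it can be used without touching the system-wide default.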


Some parts of Blast2GO 5 (Show Blast Results, Welcome Part, Create Workflow) are not working on a Linux system.


You need to install additional packages, depending on your distribution.

For Debian:
apt-get install libxss1

For Fedora:
yum install libXScrnSaver libXScrnSaver.i686

For CentOS:
yum install libXScrnSaver



 

Error when running LocalBlast "NCBI Binaries C++ Exception".


The error looks like this:

"Critical: (110.6) CNcbiRegistry: Syntax error in system-wide configuration file: NCBI C++ Exception: "..........\src\corelib\ncbireg.cpp", line 660: Error: ncbi::IRWRegistry::x_Read() - Badly placed '\' in the registry value: 'ROOT=J:\nASNLOAD=J:\BioEdit\tables\nDATA=J:\BioEdit\tables\' (m_Pos = 4)
Error: NCBI C++ Exception: "..........\src\corelib\ncbireg.cpp", line 660: Error: ncbi::IRWRegistry::x_Read() - Badly placed '\' in the registry value: 'ROOT=J:\nASNLOAD=J:\BioEdit\tables\nDATA=J:\BioEdit\tables\' (m_Pos = 4)"


This error is reported from the NCBI binaries and not from Blast2GO.
According to a thread on Biostars, the problem is related to the BioEdit software, so first check whether it is installed on your computer.
If it is, we recommend moving the configuration file named "ncbi.ini" from the Windows folder to a different location.
Once it has been moved, try to run LocalBlast again.

How to run an Enrichment Analysis of differentially expressed genes?


  1. Open differential expression results in Blast2GO. 
  2. Filter the table by the "Tags" column. It can be filtered for UP-regulated genes, DOWN-regulated genes, or both.
  3. Select all filtered entries with Control+A (Windows and Linux) or Command+A (Mac).
  4. Once the sequences have been marked, it is possible to generate two types of ID lists, one for the Fisher Exact Test and the other for GSEA.
    1. Fisher Exact Test: right-click on the SeqName column and click on the "Create ID list of column: SeqName" option in the context menu. A new tab will open with the ID list of your entries.
    2. GSEA: right-click on the SeqName column and click on the "Create ID-Value-List of column: SeqName and LogFC" option in the context menu. A new tab will open with the ID Value list of your entries.
  5. Save the lists as a b2g project (go to File --> Save).
  6. Open the annotated sequences (.b2g project). 
  7. Go to Analysis --> Enrichment Analysis (Fisher's Exact Test). In the wizard select the ID list of DE genes generated in the previous steps as Test-Set Files. 
  8. Set the remaining parameters and run the analysis. 

How to cite Blast2GO?


Citation:

S. Götz et al. "High-throughput functional annotation and data mining with the Blast2GO suite", Nucleic Acids Research, Vol. 36, June, 2008, pp. 3420-3435.


CloudBlast is skipping some of my sequences. I still have CloudBlast ComputationUnits and if I restart CloudBlast it still does not blast them. Is there a restriction?


Each database in CloudBlast has a sequence length limit.
If you are blasting against nr and a sequence is longer than 18000 bp, CloudBlast returns an error and this sequence cannot be blasted.
The limit depends on the database size.

Sequence length limits per database:

18000 - Non-redundant protein sequences (nr)
24000 - Reference proteins (refseq_protein)
100000 - UniProtKB/Swiss-Prot (swissprot)
100000 - Protein Data Bank proteins (pdb)
100000 - Viruses (nr subset) [viruses, taxa:10239]
100000 - Archaea (nr subset) [archaea, taxa:2157]
36000 - Bacteria (nr subset) [bacteria, taxa:2]
72000 - Eukaryota (nr subset) [eukaryota, taxa:2759]
100000 - Viridiplantae (nr subset) [viridiplantae, taxa:33090]
100000 - Fungi (nr subset) [fungi, taxa:4751]
100000 - Metazoa (nr subset) [metazoa, taxa:33208]
100000 - Arthropoda (nr subset) [arthropoda, taxa:6656]
100000 - Vertebrata (nr subset) [vertebrata, taxa:7742]
100000 - Mammalia (nr subset) [mammalia, taxa:40674]
100000 - Rodentia (nr subset) [rodentia, taxa:9989]
100000 - Primates (nr subset) [primates, taxa:9443]
100000 - Arabidopsis (based on nr) [arabidopsis_thaliana, taxa:3702]
100000 - Cow (based on nr) [bos_taurus, taxa:9913]
100000 - C. elegans (based on nr) [caenorhabditis_elegans, taxa:6239]
100000 - Dog (based on nr) [canis_familiaris, taxa:9615]
100000 - Zebrafish (based on nr) [danio_rerio, taxa:7955]
100000 - D. discoideum (based on nr) [dictyostelium_discoideum, taxa:44689]
100000 - Fruit fly (based on nr) [drosophila_melanogaster, taxa:7227]
100000 - E. coli (based on nr) [escherichia_coli, taxa:562]
100000 - Chicken (based on nr) [gallus_gallus, taxa:9031]
100000 - Human (based on nr) [homo_sapiens, taxa:9606]
100000 - Mouse (based on nr) [mus_musculus, taxa:10090]
100000 - Mycoplasma pneumoniae (based on nr) [mycoplasma_pneumoniae, taxa:2104]
100000 - Rice (based on nr) [oryza_sativa, taxa:4530]
100000 - P. falciparum (based on nr) [plasmodium_falciparum, taxa:5833]
100000 - Rat (based on nr) [rattus_norvegicus, taxa:10116]
100000 - Yeast (based on nr) [saccharomyces_cerevisiae, taxa:4932]
100000 - Fission Yeast (based on nr) [schizosaccharomyces_pombe, taxa:4896]
100000 - Pig (based on nr) [sus_scrofa, taxa:9823]
100000 - Fugu (based on nr) [takifugu_rubripes, taxa:31033]
100000 - Xenopus tropicalis (based on nr) [xenopus_tropicalis, taxa:8364]
100000 - Xenopus laevis (based on nr) [xenopus_laevis, taxa:8355]
100000 - Maize (based on nr) [zea_mays, taxa:4577]
100000 - Porifera (nr subset) [sponges, taxa:6040]
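
To find out in advance which sequences exceed the limit of the chosen database, the input FASTA file can be split by length before starting CloudBlast. The following is a minimal sketch; the function name split_by_length and the file names in the usage line are hypothetical, and each sequence is written unwrapped, on a single line.

```shell
# split_by_length LIMIT INPUT OK_OUT LONG_OUT
# Writes sequences of at most LIMIT letters to OK_OUT and longer
# ones to LONG_OUT.
split_by_length() {
    # Create (or empty) both output files, so that they exist even
    # when no sequence falls into one of the two groups.
    : > "$3"
    : > "$4"
    awk -v limit="$1" -v okfile="$3" -v longfile="$4" '
        # Write the record collected so far to the matching file.
        function emit(   target) {
            if (header == "") return
            target = (length(seq) > limit) ? longfile : okfile
            printf "%s\n%s\n", header, seq >> target
        }
        /^>/ { emit(); header = $0; seq = ""; next }
             { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END  { emit() }
    ' "$2"
}
```

For example, `split_by_length 18000 input.fasta nr_ok.fasta nr_too_long.fasta` would separate the sequences that fit the nr limit from those that are too long.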

I would like to compare 2 groups (functional enrichment analysis) of sequences, which are in different annotation files. Is it possible to perform Fisher Exact Test?


It is possible to perform a Fisher Exact Test on the 2 groups even if the annotation is in different files (.annot).

First, both .annot files (group 1 and group 2) need to be loaded into Blast2GO.
The test and reference set have to be generated according to the groups you want to compare. These are normal text files with the sequence name in one column.

Example:
TestSet (group1.txt)
Seq1
Seq2
Seq3

RefSet (group2.txt)
Seq4
Seq2
Seq5

The following steps describe how to load both annotation files into Blast2GO and how to create the reference and test sets in order to compare two groups.

1) Load the .annot file of group 1 into Blast2GO. Then add the .annot file of group 2 to the already loaded project (File --> Load --> Load Annotation --> Add to existing Project).
2) Generate the test set for group 1 and the reference set for group 2. These lists of sequence IDs have to be in .txt format.
3) Once the .annot files are loaded into Blast2GO and both lists have been created, it is possible to run the Fisher's Exact Test on these 2 groups:
-> go to Analysis --> Enrichment Analysis (Fisher's Exact Test) and select group1.txt as Test set and group2.txt as Reference set.

Remember that the Fisher's Exact Test implementation is sensitive to the direction of the test: sequences that are present in both group1 and group2 will be deleted from group2, but not from group1.
If both lists have sequences in common (e.g. Seq2) and you want to perform a test that is insensitive to the direction of the comparison, select the option "Remove double IDs" when performing the test.
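
The effect of the default behaviour can be previewed on the two ID lists before running the test. This is a minimal sketch using standard Unix tools; the function names are hypothetical, and the inputs are plain text lists such as group1.txt and group2.txt from the example above, with one ID per line and no trailing spaces.

```shell
# shared_ids TEST_LIST REF_LIST
# Prints the IDs of REF_LIST that also occur in TEST_LIST.
shared_ids() {
    # grep exits with 1 when it prints nothing; treat that as success.
    grep -xFf "$1" "$2" || [ "$?" -eq 1 ]
}

# reference_without_test TEST_LIST REF_LIST
# Prints REF_LIST without the IDs that also occur in TEST_LIST, which
# mirrors the removal from the reference set described in the text.
reference_without_test() {
    grep -vxFf "$1" "$2" || [ "$?" -eq 1 ]
}
```

With the example lists, `shared_ids group1.txt group2.txt` prints Seq2, and `reference_without_test group1.txt group2.txt` prints Seq4 and Seq5.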
 

After loading the sequences, the blast icon is not functional to start the analysis. All Blast2GO analysis icons are grey, why?


The behaviour of the Blast2GO tabs has changed because it is now possible to load more than one project into Blast2GO, each displayed in its own tab.
To run an analysis on a project, its B2G table has to be the active tab (white tab).

In your case, the Progress tab or the Messages tab is probably active, and therefore you cannot analyse your sequences.
To activate the analysis buttons, click on the B2G table itself.

What is Blast2GO?


Blast2GO is a bioinformatics platform for high-quality functional annotation and analysis of genomic datasets.
It allows analyzing and visualizing newly sequenced genomes by combining state-of-the-art methodologies, standard resources and algorithms.

Why is it not possible to blast sequences with more than 8000 bp using NCBI Blast?


8000 bp is the maximum sequence length accepted by NCBI Blast via HTTP.
If you want to process sequences longer than 8000 bp, you have to perform the Blast locally or run the Blast search directly on the NCBI webpage and then import the results in XML format into Blast2GO.
You can also combine both approaches: blast the sequences below 8000 bp online and the rest locally.

Blast2GO PRO provides different types of Blast, including LocalBlast, CloudBlast and AWS Blast.

CloudBlast stopped, but there are still white sequences.


CloudBlast needs ComputationUnits to work.
If all ComputationUnits have been consumed, CloudBlast stops. Consumption depends on the analysis. The most costly analysis in terms of computation time is a blastx against the whole nr database with very long sequences. The smaller the database, the fewer ComputationUnits are used. You can monitor the number of ComputationUnits used during the blast (CloudBlast History Activity under the Help menu).

PRO:
With a Blast2GO PRO subscription, ComputationUnits can be added to the account.

TRIAL:
Trial accounts come with a limited number of ComputationUnits in order to evaluate CloudBlast. It is not possible to add ComputationUnits to a trial account.

Can I run a full functional annotation using Blast2GO Command Line in a High Performance Computing (HPC) and offline?


Yes, it is possible to use Blast2GO Command Line on an HPC system, but there is no need to.
We recommend using a normal server, which is sufficient. It is also possible to run a full functional annotation offline, without an internet connection.

If you do want to use Blast2GO Command Line (CLI) on an HPC system, be aware that the Blast2GO CLI needs to be installed on each node of the cluster, which means one license.b2g per node. With this setup each node needs a connection to the Blast2GO database, which can be installed on a server. An alternative is to have both the Blast2GO database and the Blast2GO CLI on each node.

For further questions on this, please contact the Blast2GO support team.

When executing the enrichment analysis by mistake I set the Reference set as Test set and vice versa.
Does this confusion influence the results or are these the same but in the opposite direction?


Swapping the Reference set and the Test set changes the results; they are not simply the same results in the opposite direction. When running the Fisher's Exact Test, Blast2GO by default removes from the Reference set those IDs that are also present in the Test set.

For example, if you have 100 sequence IDs in the Reference set and 20 sequence IDs in the Test set, and 4 IDs are common to both lists, these 4 IDs are removed from the Reference set, which leaves 96 reference IDs.
Removing the duplicated IDs from the Reference set is needed to create a consistent contingency table for the Fisher's Exact Test. Because only the Reference set is modified, selecting the files the wrong way round gives different results.

For other comparisons, e.g. comparing 2 libraries, the duplicated genes would need to be removed from both lists.
This option is available in the Enrichment Analysis menu.

Some of my sequences have low blast e-values and mapping results but failed to get annotated. Why?


The Blast2GO annotation algorithm is carried out by applying an annotation rule (AR) on the mapped sequences. The AR uses the highest hit similarity and the evidence code weights (ECw) to calculate an annotation score (AS) for each GO candidate. Once the AS is calculated for each GO, the AR selects the lowest term per branch that lies over a certain threshold (default=55).

Default values for the ECw and threshold were chosen to provide a good balance between quantity and quality of annotation. However, each parameter can be adjusted manually in order to achieve a more restrictive or more permissive annotation.

There are four scenarios in which the annotation can fail even though there are low blast e-values and mapped sequences.
1) The low e-values correspond to blast hits that could not be mapped.
2) If the sequences are only mapped to the root GOs (GO:0003674, GO:0008150 and GO:0005575), no annotation will be retrieved.
3) Assigning low weights to particular evidence codes, such as IEA (Inferred from Electronic Annotation), means that higher similarities are required to reach the threshold.
4) The annotation threshold is set to a high value.

Scenarios 1 and 2 cannot be resolved, but in scenarios 3 and 4 a successful annotation can still be obtained by adjusting the parameters mentioned above.
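
The effect of scenarios 3 and 4 can be illustrated with a simplified calculation. The sketch below assumes that the score of a GO candidate is the highest hit similarity (in percent) multiplied by the evidence code weight; the actual annotation rule may include further terms. The function names and the weight 0.7 used in the note below are illustrative assumptions, not Blast2GO defaults.

```shell
# annotation_score SIMILARITY ECW
# Simplified score (an assumption, see above): highest hit similarity
# in percent multiplied by the evidence code weight.
annotation_score() {
    awk -v sim="$1" -v ecw="$2" 'BEGIN { printf "%.1f\n", sim * ecw }'
}

# passes_threshold SIMILARITY ECW [THRESHOLD]
# Succeeds when the simplified score reaches the threshold (default 55).
passes_threshold() {
    awk -v sim="$1" -v ecw="$2" -v thr="${3:-55}" \
        'BEGIN { exit !(sim * ecw >= thr) }'
}
```

With a weight of 0.7, a hit with 70% similarity scores 49 and stays below the default threshold of 55, while a hit with 80% similarity scores 56 and passes. Raising the weight to 1 or lowering the threshold lets the first hit pass as well.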

No mapping results after loading my own blast xml file.


This can happen for two different reasons:

One, due to the usage of an inadequate blast program. Be sure to run blastx or blastp, since you need to get protein IDs. This is because GOs are assigned to proteins only.

Or two, when importing your own blast XML results the parameters given may not be correct. Make sure to adjust the import blast XML parameters (position, separator, xml).

The parameters depend on the XML file and can be adjusted in the import blast dialogue.
> Join Hit ID and Description: Lets you decide whether to join the hit ID and the description.
> Separator: Defines the character by which the "positions" are separated. The default is |
> Position: Defines the position where the hit description is located. It also depends on whether "Join Hit ID and Description" is checked.

Example with Join Hit ID and Description checked:
position 1: gi
position 2: 88607073
position 3: ref
position 4: YP_504641.1
position 5: ribulose-phosphate 3-epimerase [Anaplasma phagocytophilum HZ]; gi
position 6: 88598136
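
How Separator and Position interact can be reproduced on the command line by splitting a hit definition at the separator. This is a minimal sketch with a hypothetical helper name; the hit definition is reassembled from the positions listed in the example above.

```shell
# hit_field POSITION [SEPARATOR]
# Reads hit definitions on standard input and prints the field at the
# given 1-based position; the separator defaults to "|".
hit_field() {
    cut -d "${2:-|}" -f "$1"
}

hit='gi|88607073|ref|YP_504641.1|ribulose-phosphate 3-epimerase [Anaplasma phagocytophilum HZ]; gi|88598136'
printf '%s\n' "$hit" | hit_field 4   # the accession
printf '%s\n' "$hit" | hit_field 5   # the description
```

Position 4 yields YP_504641.1, the kind of protein ID that the mapping step needs, and position 5 yields the description text.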


I have 2000 sequences in my fasta file but only 1150 sequences are loaded in Blast2GO.


Check whether any sequence names are duplicated.

Sequence names have to be unique in order to be loaded into Blast2GO.
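
Duplicated names can be listed before loading the file. The sketch below assumes that the sequence name is the first word of each FASTA header line (the text after > up to the first space); the function name is hypothetical.

```shell
# duplicate_names FASTA_FILE
# Prints every sequence name that occurs more than once.
duplicate_names() {
    awk '/^>/ { print substr($1, 2) }' "$1" | sort | uniq -d
}
```

An empty result means that all names are unique. For comparison with the number of loaded sequences, `grep -c '^>' file.fasta` counts the header lines in the file.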

When using Blast2GO Basic I cannot connect to the public server database.


Check the following:

The MySQL port (3306) is open for outgoing connections at your institution (ask your system administrator). To test it, type telnet publicdb.blast2go.com 3306 in a command prompt: if a timeout occurs, port 3306 is closed.

No personal firewall is blocking the connection (in Windows under System Preferences).

The right database host (host name or IP address) and DB-Name are selected (in the Blast2GO menu under File > Preferences > Data Access).
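
If telnet is not installed, the port check can also be done from a shell. This is a minimal sketch that relies on the /dev/tcp feature of bash and on the timeout command from GNU coreutils; the function name is hypothetical, and the host name and port are the ones given above.

```shell
# port_open HOST PORT [SECONDS]
# Succeeds if a TCP connection to HOST:PORT can be opened within the
# given number of seconds (default 5).
port_open() {
    timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example:
# port_open publicdb.blast2go.com 3306 && echo "port 3306 reachable"
```

A non-zero exit status means that the connection was refused, timed out, or could not be attempted, which is the case to discuss with your system administrator.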

After running InterProScan I receive a warning message telling there is no sequence information.


InterProScan needs the sequence information to run. Blast XML results imported into Blast2GO do not contain sequence information.

Advice: Load your fasta file, run InterProScan and finally import the XML files into the existing Blast2GO project.
