Amazon Web Services (AWS) NCBI BLAST Search with Blast2GO
(This feature is now obsolete.)
In this article, I explain how to set up an EC2 instance with the BLAST+ AMI.
- This NCBI webpage takes us to the most recent Blast+ AMI in the AWS Marketplace, which we want to configure after hitting "Continue" on the right-hand side.
- We can now adjust the Region and EC2 Instance Type, both of which influence the estimated monthly price of our setup (see right-hand side). Keep in mind that NCBI states the following in its documentation:
BLAST searches will not run efficiently on smaller instances. Minimally, an instance with 32 GB of memory and a minimum of 32 GiB free space is required.
- After configuring the instance (the settings can be changed afterwards, and the default settings should be fine for a start), we continue by hitting "Launch with 1-Click" on the right-hand side.
- The pop-up that follows offers a link to our AWS Console. We open it and wait until the instance is up and running. In the meantime, we can go to the Instances tab, select our instance, and open Description at the bottom of the page. There we copy the Public DNS URL.
- Now we start Blast2GO, select Blast -> AWS Blast, and paste the URL into "AWS BLAST Server URL" in the AWS Blast Configuration step. In the Advanced Configuration, we should set the Number of Threads according to the instance type we just launched.
- The first search after completing the wizard takes some time to return results. How long depends on the size of the BLAST database we selected, because the Amazon instance synchronizes the database on the first BLAST request.
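Before settling on an instance type and a thread count, the nominal instance specs can be compared with the NCBI minimums quoted above. The following is only a minimal sketch: the helper name `meets_blast_minimum` and the example values are illustrative assumptions, and the GB figure for memory is treated as GiB for simplicity.

```shell
# Thresholds taken from the NCBI note quoted above (memory GB treated as GiB here).
MIN_MEM_GIB=32
MIN_FREE_GIB=32

# meets_blast_minimum MEM_GIB FREE_GIB
# Hypothetical helper: succeeds only if both values reach the thresholds.
meets_blast_minimum() {
  [ "$1" -ge "$MIN_MEM_GIB" ] && [ "$2" -ge "$MIN_FREE_GIB" ]
}

# Example values (assumptions): replace with the nominal specs of your instance type.
instance_mem_gib=64
free_disk_gib=100

if meets_blast_minimum "$instance_mem_gib" "$free_disk_gib"; then
  echo "Instance meets the NCBI minimum"
else
  echo "Instance is below the NCBI minimum"
fi

# When run on the instance itself, the number of online cores is a
# reasonable starting point for the Number of Threads setting.
echo "Online cores: $(getconf _NPROCESSORS_ONLN)"
```

The comparison uses nominal values rather than what the operating system reports, since the kernel usually shows slightly less memory than the instance type advertises.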
Performance and price comparison between AWS Blast and CloudBlast:
Please note: The following numbers were obtained by running blastx against the NCBI NR database with a word size of 3. Values may change over time.
AWS Blast: We used an on-demand 16-core c3.4xlarge instance with 30 GB RAM and 80 GB disk space and obtained an average of 645 nt/min on this instance. Please note that the AWS database installation takes an additional 4 hours for NR.
CloudBlast: The average performance of CloudBlast during the last 30 days for the same setup (blastx) was 3770 nt/min. This is about 5.8 times faster. Please note that these values may vary depending on the current load. Also note that CloudBlast requires no installation or maintenance at all.
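The speed comparison can be recomputed from the two throughput figures above. This is a small sketch that only redoes the arithmetic; the function names `speedup` and `per_hour` are made up for illustration.

```shell
# Throughput figures quoted above (blastx against NR).
AWS_NT_PER_MIN=645
CLOUDBLAST_NT_PER_MIN=3770

# speedup FAST SLOW -> ratio FAST/SLOW, printed with one decimal place
speedup() {
  LC_ALL=C awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

# per_hour NT_PER_MIN -> nucleotides per hour
per_hour() {
  echo $(( $1 * 60 ))
}

echo "CloudBlast vs. AWS Blast: $(speedup "$CLOUDBLAST_NT_PER_MIN" "$AWS_NT_PER_MIN")x"
echo "AWS Blast throughput: $(per_hour "$AWS_NT_PER_MIN") nt/h"
```

The hourly figure is the basis for the price estimate, since AWS bills on-demand instances by time rather than by sequence volume.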
AWS Blast: With the on-demand c3.4xlarge instance, costs are 0.87 €/h (0.956 $/h) for the location Ireland, as of December 2015. For blastx, we estimate this at about 1 € per 100,000 nucleotides, given that this instance type processes about 38,000 nt/h.
CloudBlast: Each 6,000,000 computation units allow processing 64,000,000 nt (blastx against NR). This translates into 0.31 € per 100,000 nt, which is roughly 3 times cheaper than the AWS solution (1 € per 100,000 nt). Please note that a standard 1-year Blast2GO PRO subscription includes 6M units for free. Each additional 6M units cost 200 €.
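The price per 100,000 nt can be rederived from the inputs given above. The sketch below assumes that cost scales linearly with sequence volume; `cost_per_100k` is a hypothetical helper name. With these inputs, the CloudBlast result reproduces the 0.31 € figure, while the AWS result comes out closer to 2.3 € than to the rounded 1 € used in the text, so both values are best treated as rough estimates.

```shell
# cost_per_100k PRICE_EUR NT
# Price of processing 100,000 nt, given that PRICE_EUR buys NT nucleotides.
cost_per_100k() {
  LC_ALL=C awk -v price="$1" -v nt="$2" 'BEGIN { printf "%.4f\n", price * 100000 / nt }'
}

# AWS Blast: 0.87 EUR buys one instance hour, i.e. about 38,000 nt.
echo "AWS Blast:  $(cost_per_100k 0.87 38000) EUR per 100,000 nt"

# CloudBlast: 200 EUR buys 6M computation units, i.e. 64,000,000 nt.
echo "CloudBlast: $(cost_per_100k 200 64000000) EUR per 100,000 nt"
```

Neither figure includes the roughly 4 hours of instance time spent on the initial NR database installation on AWS.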
Sometimes the initial database download on the AWS instance stalls. You will notice this by an excessive waiting time for the result retrieval. The database is downloaded and managed by the so-called fuse client, which seems to be a bit picky.
I do not recommend launching several searches in parallel until the desired database has been downloaded completely, as this seems to stall the caching mechanism inside the fuse client.
In other words, let the initial search (which triggers the database download) finish before you launch your parallel searches.
You can log in to your instance via ssh and become root with:
Now you can check the database directory (I get ~195MB for SwissProt and ~28GB for nr):
du -sh /blast
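To turn this manual check into a yes/no answer, the measured size can be compared against the expected one. This is a minimal sketch: the expected sizes are the approximate figures above and will change over time, and both the 90% threshold and the helper names are assumptions chosen for illustration.

```shell
# Approximate expected sizes in KiB, based on the figures above.
EXPECTED_NR_KB=$(( 28 * 1024 * 1024 ))     # ~28 GB
EXPECTED_SWISSPROT_KB=$(( 195 * 1024 ))    # ~195 MB

# dir_size_kb DIR -> disk usage of DIR in KiB
dir_size_kb() {
  du -sk "$1" | awk '{ print $1 }'
}

# db_looks_complete DIR EXPECTED_KB [PERCENT]
# Succeeds if DIR holds at least PERCENT (default 90) of EXPECTED_KB.
db_looks_complete() {
  local dir=$1 expected_kb=$2 percent=${3:-90}
  local size_kb
  size_kb=$(dir_size_kb "$dir")
  [ $(( size_kb * 100 )) -ge $(( expected_kb * percent )) ]
}

# Usage on the instance (run as root):
#   db_looks_complete /blast "$EXPECTED_NR_KB" && echo "nr looks complete"
```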
If the database folder size is significantly lower, it might help to restart the fuse-client:
Now, after launching another BLAST request from within Blast2GO, the database folder should grow in size.
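Rather than rerunning du by hand, the folder can be polled until it reaches the expected size, which also shows when it is safe to start parallel searches. The following is a sketch under assumptions: `wait_for_size` is a hypothetical helper, and the default of 240 checks at 60-second intervals simply mirrors the roughly 4 hours mentioned for NR.

```shell
# wait_for_size DIR MIN_KB [INTERVAL_SECONDS] [MAX_CHECKS]
# Polls DIR until its disk usage reaches MIN_KB (KiB).
# Succeeds once the size is reached, fails after MAX_CHECKS polls.
wait_for_size() {
  local dir=$1 min_kb=$2 interval=${3:-60} max_checks=${4:-240}
  local i=1 size_kb
  while [ "$i" -le "$max_checks" ]; do
    size_kb=$(du -sk "$dir" | awk '{ print $1 }')
    echo "check $i: ${size_kb} KiB"
    if [ "$size_kb" -ge "$min_kb" ]; then
      return 0
    fi
    # Only wait if another check will follow.
    if [ "$i" -lt "$max_checks" ]; then
      sleep "$interval"
    fi
    i=$(( i + 1 ))
  done
  return 1
}

# Usage on the instance, waiting for about 25 GiB of the nr database:
#   wait_for_size /blast $(( 25 * 1024 * 1024 )) && echo "database ready"
```

If the reported size stops changing well below the target, the download has probably stalled again.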