Assignment 6
Select one sequence of interest from the database (the sequence should be longer than 300 base pairs), run BLAST searches with it, and answer the following questions:
a. What are the differences between the 6 BLAST programs (blastn, blastp, blastx, tblastn, tblastx, PSI-BLAST)?
b. Use your sequence to run 3 of the 6 BLAST programs and discuss “What are the strengths and weaknesses of the BLAST programs you have selected?”
c. Show the first hit from each BLAST with its identity and/or similarity scores.
d. Summarize the results from the 3 BLAST programs you selected.
The deadline is Jan 24, 2010 at 16.30. (The system will refuse to accept your assignment after 16.30.)
Question a. What are the differences between the 6 BLAST programs (blastn, blastp, blastx, tblastn, tblastx, PSI-BLAST)?
BLAST (Basic Local Alignment Search Tool) is a similarity search program used to find regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.
1. blastn searches a nucleotide database with a nucleotide query. It is the nucleotide BLAST, with blastn as its algorithm.
2. blastp uses an amino acid (protein) query to search a protein database.
3. blastx is a type of BLAST search in which a translated nucleotide query is compared with the contents of a protein database. The query sequence is translated in all six reading frames, and each of the resulting protein sequences is used to search the database.
4. tblastn is a type of BLAST search in which an amino acid query is compared with the contents of a nucleotide sequence database. The sequences in the database are translated in all six reading frames, and the resulting protein sequences are searched for regions similar to regions of the query sequence.
5. tblastx is a type of BLAST search in which a nucleotide query is compared with the contents of a nucleotide sequence database at the protein level. Both the query sequence and the database sequences are translated in all six reading frames, and the resulting protein sequences are compared to find similar regions.
6. PSI-BLAST (Position-Specific Iterative BLAST) is a type of protein BLAST. It is an iterative search using the BLAST 2.0 algorithm. A profile (or position-specific scoring matrix, PSSM) is constructed automatically from a multiple alignment of the highest-scoring hits in an initial BLAST search. The profile then replaces the query in the next round of searching, and the process can be repeated to detect more distant relatives.
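The six descriptions above differ mainly in the molecule type of the query, the molecule type of the database, and which side is translated. A minimal Python sketch of that comparison follows; the names `BlastProgram`, `PROGRAMS`, `compared_as` and `candidates` are illustrative only and do not belong to any BLAST library.

```python
from typing import List, NamedTuple


class BlastProgram(NamedTuple):
    query: str             # molecule type of the query as submitted
    database: str          # molecule type of the database as stored
    translate_query: bool  # query translated in six frames before comparison
    translate_db: bool     # database translated in six frames before comparison


PROGRAMS = {
    "blastn":    BlastProgram("nucleotide", "nucleotide", False, False),
    "blastp":    BlastProgram("protein",    "protein",    False, False),
    "blastx":    BlastProgram("nucleotide", "protein",    True,  False),
    "tblastn":   BlastProgram("protein",    "nucleotide", False, True),
    "tblastx":   BlastProgram("nucleotide", "nucleotide", True,  True),
    "PSI-BLAST": BlastProgram("protein",    "protein",    False, False),
}


def compared_as(name: str) -> str:
    """Return the alphabet in which the alignment itself is scored."""
    p = PROGRAMS[name]
    # The comparison is at the protein level if either side is protein
    # or is translated into protein first.
    if p.translate_query or p.translate_db or p.query == "protein":
        return "protein"
    return "nucleotide"


def candidates(query: str, database: str) -> List[str]:
    """List the programs that accept this query/database combination."""
    return [name for name, p in PROGRAMS.items()
            if (p.query, p.database) == (query, database)]


for name in PROGRAMS:
    print(f"{name:10s} compares at the {compared_as(name)} level")
print(candidates("nucleotide", "nucleotide"))
```

For a DNA query against a DNA database, as in the exercise below, the sketch returns blastn and tblastx; only blastn compares the bases directly.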
In the following exercise, I chose the citrate synthase gene (gltA) of Rickettsia to run 3 of the 6 BLAST programs and discuss the strengths and weaknesses of each.
- Retrieve the DNA sequence of the type II citrate synthase (gltA) gene of Rickettsia from the NCBI database. This is the NCBI reference sequence for my gene of interest.
NCBI reference sequence; accession number NC_009882.1

I used that gene (i.e., gltA) to run blastn, blastx and tblastx.
I. blastn
1. Go to http://blast.ncbi.nlm.nih.gov/Blast.cgi.
2. Input the DNA sequence (FASTA format) into the query box.
3. Choose Database:
- Others (nr etc.):
- Search database Nucleotide collection (nr/nt)
- Use Megablast (Optimize for highly similar sequences)
4. Algorithm parameter settings
- Filter: Low complexity regions
5. Click on “BLAST”.
- The screenshots show the sequences producing significant alignments.
- The first hit from blastn, with its identity score, is shown in the screenshot.
My DNA sequence of interest (1305 bases) showed 100% identity (1305/1305) to the sequence of type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ (accession no. CP000848.1), with a score of 2354 bits and an expect value of 0.0.
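The search above was run through the web form. For reference, the same settings can be written as a call to the BLAST+ command-line program, assuming the BLAST+ suite is installed. The sketch below only builds and prints the command; the file names `gltA.fasta` and `gltA_blastn.tsv` are hypothetical.

```python
import shlex
from typing import List


def blastn_command(query_fasta: str, out_path: str) -> List[str]:
    """Build a BLAST+ blastn call that mirrors the web-form settings above."""
    return [
        "blastn",
        "-task", "megablast",   # "Highly similar sequences (megablast)"
        "-db", "nt",            # Nucleotide collection (nr/nt)
        "-query", query_fasta,  # FASTA file holding the query sequence
        "-dust", "yes",         # low-complexity filter for nucleotide queries
        "-remote",              # run the search on the NCBI servers
        "-outfmt", "6",         # tabular output, one line per alignment
        "-out", out_path,       # file that receives the results
    ]


# Hypothetical file names, used for illustration only.
cmd = blastn_command("gltA.fasta", "gltA_blastn.tsv")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the search. The results may differ from those reported here, because the databases have grown since the original search.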
Strength
- The nucleotide query is searched against the entire nucleotide collection (nr/nt).
- The program offers several algorithms to choose from, including:
-Highly similar sequences (megablast)
-More dissimilar sequences (discontiguous megablast)
-Somewhat similar sequences (blastn)
- Gives a highly significant expect value (0.0) and high identity.
Weakness
- With so many algorithms to choose from, the one best suited to the query sequence has to be selected and optimized.
II. blastx: search protein database using a translated nucleotide query
1. Input the DNA sequence (FASTA format) into the query box.
- Set genetic code to “Standard (1)”.
- Database: Non-redundant protein sequences (nr)
- Matrix: “BLOSUM62”
- Filter: Low complexity regions
2. Click on “BLAST”.
- The screenshot shows the sequences producing significant alignments.
- The first hit from blastx, with its identity score, is as follows:
- The DNA sequence of interest was translated. Reading frame +1 showed 100% identity (231/231) to type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ (accession no. YP_001495383.1), with an expect value of 6e-86.
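The identity of 231/231 is calculated over the aligned region only, not over the whole query. A small arithmetic sketch, using the 1305-base query length from the blastn result, shows how much of the translated query this alignment spans. The `coverage` figure is my own calculation and is not the "Query cover" value shown on the NCBI results page, which may combine several alignments.

```python
def percent(matches: int, length: int) -> float:
    """Percentage of `matches` out of `length` positions."""
    return 100.0 * matches / length


query_nt = 1305               # query length in bases, from the blastn result
query_codons = query_nt // 3  # 435 complete codons in one reading frame

aligned_aa = 231              # alignment length reported by blastx
identical_aa = 231            # identical residues reported by blastx

identity = percent(identical_aa, aligned_aa)  # over the aligned region only
coverage = percent(aligned_aa, query_codons)  # share of the translated query

print(f"identity {identity:.0f}%, coverage of translated query {coverage:.0f}%")
# prints: identity 100%, coverage of translated query 53%
```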
Strength
- We do not need to translate our nucleotide sequence ourselves. blastx translates the nucleotide query in all six reading frames and aligns the translations against the protein database.
- Useful for analysing which protein a query DNA sequence may encode.
Weakness
- In the graphic summary of blastx, many hits with low alignment scores appear alongside the hits with high alignment scores.
- The expect value of the first hit (6e-86) is less significant than the 0.0 obtained with blastn and tblastx.
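The six-frame translation that blastx applies to the query can be sketched in plain Python as follows. This is a simplified illustration using the standard genetic code (table 1), not the code BLAST itself runs, and the 9-base sequence at the end is a made-up toy example rather than the gltA sequence.

```python
from itertools import product
from typing import Dict

# Standard genetic code (NCBI table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna: str) -> Dict[str, str]:
    """Translate a DNA sequence in frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Toy sequence for illustration only.
for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```

Each of the six protein strings is then searched against the protein database, which is why a blastx result reports the frame (here +1) in which the hit was found.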
III. tblastx (search translated nucleotide databases using a translated nucleotide query)
- The steps for a tblastx search are similar to those for blastx, and all settings were left at their defaults.
- The screenshot shows the sequences producing significant alignments.
Strength
- The query sequence is aligned with all GenBank+EMBL+DDBJ+PDB sequences.
- Gives a highly significant expect value (0.0) and 100% identity.
Weakness
- The query sequence is not aligned with EST, STS, GSS, environmental samples or phase 0, 1 or 2 HTGS sequences.
- A tblastx search takes longer to run than blastx, because both the query and every database sequence, here including the complete genome of the matched taxon, are translated in all six reading frames before they are compared.
Summary of the results from the 3 BLAST searches
| | blastn | blastx | tblastx |
| Query | DNA | Translated DNA | Translated DNA |
| Database | DNA | Protein | Translated DNA |
| Usage | Very similar sequences | Analysis of query DNA sequence | Protein discovery and ESTs |
| Query accession no. | NC_009882.1 | NC_009882.1 | NC_009882.1 |
| First hit | type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ | type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ | type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ |
| First hit/accession no. | CP000848.1 | YP_001495383.1 | CP000848.1 |
| E value | 0.0 | 6e-86 | 0.0 |
| Score (bits) | 2354 | 322 | 1042 |
| Identity | 1305/1305 (100%) | 231/231 (100%) | 435/435 (100%) |
| Frame | - | +1 | +1/-1 |
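The E value and Score (bits) rows in the table are linked: for a bit score S', a query of length m and a database of length n, the expected number of chance hits is approximately E = m × n × 2^(−S'). The sketch below works on a log10 scale to avoid numerical underflow. It ignores BLAST's edge-effect corrections, and the database size `DB_LEN` is a hypothetical value, not the size of the databases actually searched, so the numbers will not reproduce the reported E values exactly.

```python
import math


def log10_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """log10 of E = m * n * 2**(-S'), ignoring edge-effect corrections."""
    return (math.log10(query_len) + math.log10(db_len)
            - bit_score * math.log10(2))


DB_LEN = 10**10  # hypothetical database size in residues (assumption)

# Bit scores from the table; query lengths in bases (blastn) or codons.
searches = [("blastn", 2354, 1305), ("blastx", 322, 435), ("tblastx", 1042, 435)]

for program, bits, m in searches:
    print(f"{program:8s} log10(E) is about {log10_evalue(bits, m, DB_LEN):.1f}")
```

A higher bit score therefore means a lower, more significant E value, which is consistent with blastn and tblastx reporting 0.0 while blastx, with the lowest score of the three, reports 6e-86.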