Posted by: sumrandee | January 22, 2010

Assignment 6. BLAST SEARCH

Assignment 6

 Select one sequence of interest from the database (it should be longer than 300 base pairs), run BLAST searches with it, and answer the following questions:

a. What are the differences between the six BLAST programs (blastn, blastp, blastx, tblastn, tblastx, PSI-BLAST)?

b. Run three of the six BLAST programs on your sequence and discuss the strengths and weaknesses of the programs you selected.

c. Show the first hit from each BLAST search with its identity and/or similarity scores.

d. Summarize the results of the three BLAST searches you selected.

    The deadline is Jan 24, 2010 at 16:30. (The system will refuse assignments submitted after 16:30.)

Question a. What are the differences between the six BLAST programs (blastn, blastp, blastx, tblastn, tblastx, PSI-BLAST)?

        BLAST (Basic Local Alignment Search Tool) is a similarity search program that finds regions of local similarity between sequences. It compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. BLAST can be used to infer functional and evolutionary relationships between sequences and to help identify members of gene families.

1. blastn searches a nucleotide database with a nucleotide query.

2. blastp searches a protein database with an amino acid (protein) query.

3. blastx compares a translated nucleotide query with a protein database. The query is translated in all six reading frames, and each resulting sequence is used to search the database.

4. tblastn compares an amino acid query with a nucleotide database. The database sequences are translated in all six reading frames, and the translations are searched for regions similar to the query.

5. tblastx compares a nucleotide query with a nucleotide database at the protein level. Both the query and the database sequences are translated in all six reading frames, and the translations are compared to find similar regions.

6. PSI-BLAST (Position-Specific Iterative BLAST) is a protein BLAST that searches iteratively using the BLAST 2.0 algorithm. A profile (position-specific scoring matrix, PSSM) is built automatically from a multiple alignment of the highest-scoring hits of an initial BLAST search, and that profile is then used as the query in the next iteration.
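The query and database types listed above can be put in a small lookup table. This is only an illustrative sketch: the dictionary layout and the function name `pick_programs` are my own, and PSI-BLAST is left out because it differs from blastp in its iteration, not in its input types.

```python
# Query/database molecule types for five of the programs, as described above.
# "translated" lists which side is translated in six frames before comparison.
BLAST_PROGRAMS = {
    "blastn":  {"query": "nucleotide", "database": "nucleotide", "translated": ()},
    "blastp":  {"query": "protein",    "database": "protein",    "translated": ()},
    "blastx":  {"query": "nucleotide", "database": "protein",    "translated": ("query",)},
    "tblastn": {"query": "protein",    "database": "nucleotide", "translated": ("database",)},
    "tblastx": {"query": "nucleotide", "database": "nucleotide",
                "translated": ("query", "database")},
}


def pick_programs(query_type, database_type):
    """Return the program names whose query and database types both match."""
    return sorted(
        name
        for name, info in BLAST_PROGRAMS.items()
        if info["query"] == query_type and info["database"] == database_type
    )


print(pick_programs("nucleotide", "nucleotide"))  # blastn and tblastx
print(pick_programs("nucleotide", "protein"))     # blastx
```

For a DNA query such as the one used in this assignment, only blastn, blastx and tblastx apply, which is why those three were chosen.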

  In the following exercise, I use the citrate synthase gene (gltA) of Rickettsia to run three of the six BLAST programs and discuss the strengths and weaknesses of each.

  • Retrieve the DNA sequence of the type II citrate synthase (gltA) gene of Rickettsia from the NCBI database. The NCBI reference sequence for my gene of interest is given below.

NCBI reference sequence: accession number NC_009882.1

I used this gene (gltA) to run blastn, blastx and tblastx.

I. blastn

1. Go to http://blast.ncbi.nlm.nih.gov/Blast.cgi.

2. Paste the DNA sequence (FASTA format) into the query box.

3. Choose Database:

  • Others (nr etc.):
  • Search database Nucleotide collection (nr/nt)
  • Use Megablast (Optimize for highly similar sequences)

4. Algorithm parameters setting

  • Filter: Low complexity regions

5. Click on “BLAST”.

  • The screenshots show the sequences producing significant alignments.

  • The first hit from blastn, with its identity score, is shown in the screenshot.

My DNA sequence of interest (1305 bases) was 100% identical (1305/1305) to the type II citrate synthase sequence of Rickettsia rickettsii str. ‘Sheila Smith’ (accession no. CP000848.1), with a score of 2354 bits and an expect value (E-value) of 0.0.

Strengths

  • The nucleotide query is searched against the whole nucleotide collection (nr/nt).
  • Several algorithms are available to choose from, including:

                       -Highly similar sequences (megablast)

                       -More dissimilar sequences (discontiguous megablast)

                      -Somewhat similar sequences (blastn)

  • The first hit was highly significant (E-value 0.0) with 100% identity.

 

Weakness 

  • The choice of algorithm has to be matched to the nucleotide query, because each algorithm is optimized for a different degree of similarity.

 

II. blastx: search protein database using a translated nucleotide query

1. Paste the DNA sequence (FASTA format) into the query box.

  • Set the genetic code to “Standard (1)”.
  • Database: Non-redundant protein sequences (nr)
  • Matrix: BLOSUM62
  • Filter: Low complexity regions

2. Click on “BLAST”.

  • The screenshot shows the sequences producing significant alignments.

  • The first hit from blastx, with its identity score, is as follows:

  • The DNA sequence of interest was translated. Reading frame +1 was 100% identical (231/231) to type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ (accession no. YP_001495383.1), with an E-value of 6e-86.

Strengths

  • We do not need to translate the nucleotide sequence ourselves. blastx translates the query in all six reading frames and aligns the translations against the protein database.
  • It is useful for analyzing what a nucleotide query may encode.
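To make the six-frame translation concrete, here is a minimal sketch in plain Python. It assumes the standard genetic code (the “Standard (1)” setting above); the function names and the 9-base toy sequence are hypothetical, and this is not how blastx itself is implemented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from the first base; a trailing partial codon is dropped,
    and codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


# Toy sequence for illustration only (not the gltA query).
for label, protein in sorted(six_frames("ATGGCCTAA").items()):
    print(label, protein)
```

Each of the six translations is then compared with the protein database, which is why a single nucleotide query can produce hits in more than one frame.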

Weaknesses

  • In the graphic summary of blastx, many hits with low alignment scores appeared alongside the high-scoring hits.
  • The E-value of the first hit (6e-86) was less significant than the 0.0 obtained with blastn and tblastx.

 

III. tblastx: search a translated nucleotide database using a translated nucleotide query

  • The steps for a tblastx search are similar to those for blastx; all settings were left at their defaults.
  • The screenshot shows the sequences producing significant alignments.

  • The first hit from tblastx, with its identity score, is as follows:
  • The DNA sequence of interest was translated to 434 amino acids. Reading frames +1/-1 were aligned. The translation was 100% identical to type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ (accession no. CP000848.1), with an E-value of 0.0.

    Strengths

    • The query was aligned against all GenBank+EMBL+DDBJ+PDB sequences.
    • The first hit was highly significant (E-value 0.0) with 100% identity.

    Weaknesses

    • The query was not aligned against EST, STS, GSS, environmental sample, or phase 0, 1 or 2 HTGS sequences.
    • A tblastx search takes longer than a blastx search, because both the query and every database sequence, including the whole genome of the matching taxon, are translated in six frames before they are compared.

     Summary of the results from the three BLAST searches

    Property | blastn | blastx | tblastx
    Query | DNA | Translated DNA | Translated DNA
    Database | DNA | Protein | Translated DNA
    Usage | Very similar sequences | Analysis of query DNA sequence | Protein discovery and ESTs
    Query accession no. | NC_009882.1 | NC_009882.1 | NC_009882.1
    First hit | type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ | type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’ | type II citrate synthase of Rickettsia rickettsii str. ‘Sheila Smith’
    First hit accession no. | CP000848.1 | YP_001495383.1 | CP000848.1
    E-value | 0.0 | 6e-86 | 0.0
    Score (bits) | 2354 | 322 | 1042
    Identity | 1305/1305 (100%) | 231/231 (100%) | 435/435 (100%)
    Frame | - | +1 | +1/-1
    Posted by: sumrandee | January 16, 2010

    Primer and Probe Design

    Assignment 5

    Please use bioinformatics tools to design the following items:

    1. Real-time PCR primer and probe set(s) that can distinguish 2009 swine-origin influenza A (H1N1) from other influenza subtypes.
    Please also describe which gene(s)/region(s) you chose, and explain why.

    2. A conventional PCR and sequencing primer set that can identify the oseltamivir resistance-associated NA gene mutation N1: H274Y.

    Note:
    a) Please show the size of the PCR product and the positions of the PCR primers, probe, and sequencing primers used in #1 and #2.
    b) Which programs did you use for #1 and #2?

    Hint:
    a) The algorithms used to design PCR primers, real-time PCR primers, and sequencing primers are different.

    Question 5.1. Real-time PCR primer and probe set(s) that can distinguish 2009 swine-origin influenza A (H1N1) from other influenza subtypes.

        The real-time PCR primer and probe set(s) were designed from hemagglutinin (HA) gene sequences found in the GenBank database (http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html) using BLAST.

    • The hemagglutinin (HA) gene was chosen for primer and probe design because of its high genetic diversity among influenza subtypes. There are at least 16 different HA antigens, H1 to H16.
    • In the following, I use the segment 4 hemagglutinin (HA) gene to design the primers and probe.

     Real-Time PCR Primer and Probes Design

    1. Go to http://www.ncbi.nlm.nih.gov/

    2. Search for “2009(H1N1) segment 4 hemagglutinin (HA) gene AND Thailand”.

           A second search was for “(H1N1) AND Mexico”.

    • The cDNA sequences (in FASTA format) retrieved from these searches were:

    2.1 Accession number GQ866959: Influenza A virus (A/Thailand/CU-H9/2009(H1N1)) segment 4 hemagglutinin (HA) gene, complete cds.

    2.2 Accession number GQ150342: Influenza A virus (A/Nonthaburi/102/2009(H1N1)) segment 4 hemagglutinin (HA) gene, complete cds.

    2.3 Accession number GQ169382: Influenza A virus (A/Thailand/104/2009(H1N1)) segment 4 hemagglutinin (HA) gene, complete cds, and

    2.4 Accession number GQ149641: Influenza A virus (A/Mexico/4603/2009(H1N1)) segment 4 hemagglutinin (HA) gene, complete cds.

     3. To find the most conserved region of the hemagglutinin (HA) gene, the cDNA sequences of the 4 influenza A H1N1 strains were aligned with ClustalW2 (http://www.ebi.ac.uk/Tools/), and the conserved HA region was selected using the BioEdit Sequence Alignment Editor.

    • A 300-base stretch from the most conserved region of the H1 gene was chosen for primer and probe design.

    (5′ATATACATCCGATCACAATTGGAAAATGTCCAAAATATGTAAAAAGCAC
    AAAATTGAGACTGGCCACAGGATTGAGGAATGTCCCGTCTATTCAATCT
    AGAGGCCTATTTGGGGCCATTGCCGGTTTCATTGAAGGGGGGTGGACAG
    GGATGGTAGATGGATGGTACGGTTATCACCATCAAAATGAGCAGGGGTC
    AGGATATGCAGCCGACCTGAAGAGCACACAGAATGCCATTGACGAGATT
    ACTAACAAAGTAAATTCTGTTATTGAAAAGATGAATACACAGTTCACAG
    CAGTAG3′)

      

    Primer3Plus was used to design the real-time PCR primer and probe set.

    4. Paste the 300 bases from the conserved region of the H1 gene into the box “Paste source sequence below”.

    5. Set “Product Size Range” to 150-250 100-300.

    6. In the “Internal Oligo” menu, leave the values at the program defaults.

    • The primers and probes designed with Primer3Plus are shown in the next 4 screenshots:

    • Primer and probe set 1
    Oligo | Sequence 5’-3’ | 5’ position | 3’ position | Amplicon size (bp)
    Forward primer | GCCACAGGATTGAGGAATGT | 63 | 82 | 172
    Reverse primer | TGGCATTCTGTGTGCTCTTC | 215 | 234 |
    Probe | CAGGGATGGTAGATGGATGC | 145 | 164 | -
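The primer coordinates of set 1 can be checked against the 300-base conserved region with a short script. This is a sketch with a hypothetical helper (`locate`); it reports each primer in its own 5'-to-3' direction, so the reverse primer's 5' end is the higher coordinate (234), whereas the table lists its two coordinates in forward-strand order.

```python
# The 300-base conserved H1 region given above.
REGION = (
    "ATATACATCCGATCACAATTGGAAAATGTCCAAAATATGTAAAAAGCAC"
    "AAAATTGAGACTGGCCACAGGATTGAGGAATGTCCCGTCTATTCAATCT"
    "AGAGGCCTATTTGGGGCCATTGCCGGTTTCATTGAAGGGGGGTGGACAG"
    "GGATGGTAGATGGATGGTACGGTTATCACCATCAAAATGAGCAGGGGTC"
    "AGGATATGCAGCCGACCTGAAGAGCACACAGAATGCCATTGACGAGATT"
    "ACTAACAAAGTAAATTCTGTTATTGAAAAGATGAATACACAGTTCACAG"
    "CAGTAG"
)


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def locate(primer, template, strand):
    """Return the 1-based (5' position, 3' position) of a primer on the
    template, or None if it does not match exactly. strand is '+' or '-'."""
    site = primer if strand == "+" else reverse_complement(primer)
    start = template.find(site)
    if start < 0:
        return None
    left, right = start + 1, start + len(site)
    return (left, right) if strand == "+" else (right, left)


forward = locate("GCCACAGGATTGAGGAATGT", REGION, "+")
reverse = locate("TGGCATTCTGTGTGCTCTTC", REGION, "-")
amplicon_size = reverse[0] - forward[0] + 1

print("forward primer:", forward)
print("reverse primer:", reverse)
print("amplicon size :", amplicon_size)
```

The same helper returns None for any oligo that does not match the template exactly, which makes it a quick way to catch transcription slips when copying sequences out of a design tool.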
    • The windows below show primer and probe sets 2, 3, 4 and 5.

     

    • In addition, a real-time PCR primer and probe set was designed from the same 300-bp conserved region using the RealTimeDesign software from http://www.biosearchtech.com.

    • The resulting primers and probe are shown in the following windows.

    Oligo | Sequence 5’-3’ | 5’ position | 3’ position | Amplicon size (bp)
    Forward primer | GGACAGGGATGGTAGATGGATG | 142 | 163 | 74
    Reverse primer | CAGGTCGGCTGCATATCCT | 215 | 197 |
    Probe | ACGGTTATCACCATCAAAATGAGCAGGG | 166 | 196 | -

      

          

     

    Question 5.2. A conventional PCR and sequencing primer set that can identify the oseltamivir resistance-associated NA gene mutation N1: H274Y.

    1. Go to http://www.sciencedirect.com

    Search for “Oseltamivir-resistant A (H1N1) influenza viruses”

    -The highlighted articles lead to H1N1 strains that are resistant to oseltamivir.

    • A 2008 Philippines H1N1 influenza A virus (accession no. FJ743468), shown to be resistant to oseltamivir (Tamiflu), and
    • a 2009 Mexico H1N1 influenza A virus (accession no. ACQ99625), shown to be susceptible to oseltamivir, were chosen for analysis.

    Finding the N1: H274Y mutation

    • Copy and paste the neuraminidase (NA) amino acid sequences (in FASTA format) of the oseltamivir-resistant strain (accession no. FJ743468) and the oseltamivir-susceptible strain (accession no. ACQ99625) into BioEdit.

    •      The window shows the amino acid change from histidine (H) to tyrosine (Y) at position 274 in the oseltamivir-resistant H1N1 strain.

    Finding the DNA mutation at the H274Y locus

    • Bases 823-825: CAC → TAT
    • Histidine (H) → Tyrosine (Y)
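The codon change can be checked with the standard genetic code. The sketch below uses a hypothetical helper, `residue_number`, and assumes that base 1 is the A of the start codon. Under that assumption bases 823-825 form codon 275, which is the N1 numbering of this substitution; the name H274Y follows the N2 numbering convention.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def residue_number(first_base):
    """Return the 1-based residue number of the codon that starts at the
    given 1-based base position (base 1 = first base of the start codon)."""
    if (first_base - 1) % 3 != 0:
        raise ValueError("position is not the first base of a codon")
    return (first_base - 1) // 3 + 1


susceptible_codon, resistant_codon = "CAC", "TAT"
print(CODON_TABLE[susceptible_codon], "->", CODON_TABLE[resistant_codon])
print("codon number:", residue_number(823))
```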

     

                  Design Conventional PCR Primer Sets with the Primer3Plus Program

    1. Copy and paste the cDNA sequence of the oseltamivir-resistant H1N1 strain (1449 bases) into the box “Paste source sequence below”.

    • Select “Pick left primer” and “Pick right primer”.

    -Under “General Settings”

    • Set the PCR product size ranges to 501-600 601-700 701-850 851-1000; the other values were left at the program defaults.

    • Click “Pick Primers”. The primer sets are shown as follows:

    • PCR primer pair 1
    Primer | Sequence 5’-3’ | Length (nt) | Tm (°C) | Position at 5′ | PCR product size (bp)
    Forward primer | GGTCCAGACAATGGAGCTGT | 20 | 60.1 | 609 | 565
    Reverse primer | TGTCGGTATTTGTCCATCCA | 20 | 59.8 | 1154 |

     

    -The screenshot below shows primer pairs 2 and 3.

    • PCR primer pair 2
    Primer | Sequence 5’-3’ | Length (nt) | Tm (°C) | Position at 5′ | PCR product size (bp)
    Forward primer | TTGGCTCCAAAGGAGATGTT | 20 | 59.7 | 343 | 585
    Reverse primer | AAGGTCGATTTGAACCATGC | 20 | 59.9 | 908 |

     

    • PCR primer pair 3
    Primer | Sequence 5’-3’ | Length (nt) | Tm (°C) | Position at 5′ | PCR product size (bp)
    Forward primer | TGGCTCCAAAGGAGATGTTT | 20 | 59.7 | 344 | 584
    Reverse primer | AAGGTCGATTTGAACCATGC | 20 | 59.9 | 908 |

     

    • PCR primer pair 4
    Primer | Sequence 5’-3’ | Length (nt) | Tm (°C) | Position at 5′ | PCR product size (bp)
    Forward primer | CCGGCAATTCATCTCTTTGT | 20 | 60.1 | 277 | 594
    Reverse primer | CTGGGTAACAGGAGCATTCC | 20 | 59.5 | 851 |

     

    • PCR primer pair 5
    Primer | Sequence 5’-3’ | Length (nt) | Tm (°C) | Position at 5′ | PCR product size (bp)
    Forward primer | TTGGCTCCAAAGGAGATGTT | 20 | 59.7 | 343 | 587
    Reverse primer | CCAAGGTCGATTTGAACCAT | 20 | 59.8 | 910 |
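A quick arithmetic check shows whether each amplicon contains the H274Y codon (bases 823-825). The sketch rests on an assumption inferred from the numbers themselves: the reported product sizes are reproduced only if the position listed for each reverse primer is the leftmost base of its 20-nt binding site on the forward strand. The function name `amplicon` is hypothetical.

```python
# (forward primer position, reverse primer position, reported product size)
PRIMER_PAIRS = {
    1: (609, 1154, 565),
    2: (343, 908, 585),
    3: (344, 908, 584),
    4: (277, 851, 594),
    5: (343, 910, 587),
}
PRIMER_LENGTH = 20          # every primer listed above is 20 nt long
MUTATION = (823, 825)       # codon carrying the H274Y change


def amplicon(forward_start, reverse_left, primer_length=PRIMER_LENGTH):
    """Return (start, end, size) of the amplicon, ASSUMING reverse_left is the
    leftmost forward-strand base of the reverse primer's binding site."""
    end = reverse_left + primer_length - 1
    return forward_start, end, end - forward_start + 1


for pair, (fwd, rev, reported) in PRIMER_PAIRS.items():
    start, end, size = amplicon(fwd, rev)
    # The codon should lie between the primers, not under a primer site.
    between_primers = fwd + PRIMER_LENGTH <= MUTATION[0] and MUTATION[1] < rev
    print(pair, start, end, size, size == reported, between_primers)
```

Under that assumption all five pairs give the reported product size and place the mutated codon between the two primers, so any of them could be used to amplify the region for sequencing.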

     

     

     

     

     

     

    Design Primers for Sequencing with Primer3Plus

    1. Paste the DNA sequence into the query box.

    2. Set the task to “Sequencing”.

     3. Set the product size ranges to 501-600 601-700 701-850 851-1000.

    4. Click “Pick Primers”.

    •    The following windows show the left (forward) and right (reverse) primers, their binding positions on the cDNA template, and the sequencing primer sets.

    • The pair of sequencing primers most likely to give the longest read was selected:

             -Left primer 441_F (5′ TGACCCAAGGCGCTCTATTA 3′), binding at bases 421-440.

            -Right primer 1446_R (5′ GTCAATGGTGAACGGCAAC 3′), binding at bases 1409-1427. This primer pair gives an amplicon of 1,007 bp.

    Posted by: sumrandee | January 10, 2010

    Structural bioinformatics practice

    Assignment #4

    The function of a protein being a direct consequence of its 3-D structure (shape), the logical link was established.

    Sequence >> Structure >> Function

    It is now a central concept of molecular biology devoted bioinformatics. As a consequence, an increasing proportion of the bioinformatics pie is now devoted to the development of tools to navigate between sequences and 3-D structures. (This specialized area is called structural bioinformatics.)

    Please use the following sequence of unknown (not shown ) to explain this concept.
    I. Finding the Open Reading Frame

    1. Copy the unknown DNA sequence into Notepad.
    
    
    The unknown sequence contains more than the coding sequence of the gene.
    It also includes the human promoter region, the cap site and the poly(A) tail.
    
    2. To find the open reading frame,
    
    Go to http://www.ncbi.nlm.nih.gov/projects/gorf/
    
    3. Paste the unknown DNA sequence into the query box and then click “OrfFind”.
    
    
    4. The results now show 6 open reading frames.
    
    
     
    
    5. Choose the longest open reading frame (frame +1), which is most likely the
    correct frame. The window shows frame +1, whose DNA sequence starts
    with ATG (methionine). The ORF runs from base 142 to base 5733
    (a total of 5592 bases) and encodes 1,863 amino acids.
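The idea of picking the longest ORF, and the arithmetic behind the coordinates above, can be sketched as follows. This is a simplified forward-strand scan with a hypothetical function name and a toy sequence; it is not the algorithm of the NCBI ORF Finder, which also reports ORFs on the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(dna):
    """Return (start, end, n_residues) of the longest ATG...stop ORF in the
    three forward frames, or None. Coordinates are 1-based and inclusive,
    and end includes the stop codon."""
    dna = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start         # bases, stop codon included
                if best is None or length > best[1] - best[0] + 1:
                    best = (start + 1, i + 3, (length - 3) // 3)
                start = None
    return best


# Toy sequence for illustration only.
print(longest_forward_orf("CCATGAAATTTTAGGG"))

# Arithmetic for the ORF reported above (bases 142-5733).
orf_bases = 5733 - 142 + 1          # 5592 bases, stop codon included
orf_residues = orf_bases // 3 - 1   # 1863 amino acids
print(orf_bases, orf_residues)
```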
    
    
    6. Check that this is the correct frame.
    We want to know whether our unknown protein sequence is new or
    already in Entrez, so we use blastp to compare it against the pdb database.

    • Use blastp (search a protein database using a protein query).     -Click “BLAST”.
    •  Then click “View report”.

     

    6.2 Frame +1 matches significantly (E-value = 0.0)
    with ref|NP_009225.1|  breast cancer 1, early onset isoform 1 [Homo sapiens].

     

    • Identical proteins for accession no. NP_009225.1 are shown.

    7. To obtain the open reading frame (ORF) sequence for the selected frame +1:

    • Using Notepad, delete the non-coding sequence before the ATG (bases 1-141) and after the ORF (bases 5,734-7,108).

    8. Save the edited sequence as “unknown-edited sequence”.

    • This is the open reading frame (ORF), with a length of 5,592 bases.

    II. Protein Translation

    9. Translate the unknown-edited sequence into an amino acid sequence with the Translate tool at http://www.expasy.org/tools/dna.html.

    9.1 Amino acid sequence (1,863 residues).

     

     
    III. Predicting post-translational modifications (PTM) from the protein sequence

    10. blastp showed that the 1,863-residue translation of the unknown sequence is identical to the human breast cancer 1 (BRCA1) protein. Glycosylation was assumed to follow the patterns that occur in human proteins.

    • The glycosylation prediction is shown below.

    10.1   Asn-Xaa-Ser/Thr sequons in the sequence output below are highlighted in blue.

                 Asparagines predicted to be N-glycosylated are highlighted in red.
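Locating Asn-Xaa-Ser/Thr sequons is a simple pattern search. The sketch below assumes the usual convention that Xaa is any residue except proline; the function name and the toy peptide are hypothetical. A sequon is only a candidate site, which is why the prediction output marks just some of the sequon asparagines in red.

```python
import re

# N, then any residue except P, then S or T. The lookahead lets
# overlapping sequons (e.g. N-N-S-T) all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return the 1-based positions of Asn residues that start an
    Asn-Xaa-Ser/Thr sequon (Xaa is not Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Toy peptide for illustration only (not the BRCA1 sequence).
print(find_sequons("MNGTANPSQNNSTK"))
```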

                                               

                               Finding the subcellular localization of the protein

    • The protein was predicted to be a plasma membrane protein.

     
    Predicting the presence and location of signal peptide cleavage sites in amino acid sequences

    12.1 The result of program analysis

    13. Prediction of transmembrane helices in proteins

     

    13.1 The result of program analysis

     

    Conclusion for the prediction of post-translational modification.

    14. Prediction of protein secondary structure using Markov chains in the PSSFinder program.

    http://linux1.softberry.com/berry.phtml?topic=pps&group=programs&subgroup=propt

    15. CPHmodels 3.0 is a protein homology modeling server. The template recognition is based on profile-profile alignment guided by secondary structure and exposure prediction.

    • The result of program analysis.

     

    Finding protein domains by comparison with reference proteins in the database

    • Go to http://swissmodel.expasy.org
    •  Paste the protein sequence of interest into the query box.

    The results of the analysis will appear as in the windows below.

     

    •  The screenshot shows the protein sequences with significant alignments and their domains.

                   For example, BRCT domains and Zinc finger, RING - type domains

     

     

    Searching the Protein Data Bank (PDB) for proteins similar to the unknown protein

    • The aim is to find conserved domains and structures along the protein chain.

    1. Go to http://www.ncbi.nlm.nih.gov/Structure/cblast/cblast.cgi?

    Algorithm used: blastp

    • Enter the query sequence in the query box.

    • Significant alignments were produced.

    • Then click on the first BLAST hit, which has a high alignment score and a low E-value:  pdb|1JNX|X  Chain X,

               Crystal Structure Of The Brct Repeat Regi…  468    1e-131 Related structures.        (S stands for structure of protein).

                              -Chain X, Crystal Structure Of The Brct Repeat Region From The Breast Cancer Associated Protein, Brca1

    Description: Structure Of The Brct Repeats Of Brca1 Bound To A Ctip Phosphopeptide.
    Taxonomy:
    Chain A: Homo sapiens
    • This window shows the 3D structure of the BRCT repeat region of the breast cancer associated protein BRCA1.

    • Putative conserved domains were also detected.

    •  List of conserved domains.

     

    • Details of some of the conserved domains found in the query sequence, including structure and function, are shown in the window below.

                    For example, cd00162, RING, RING-finger (Really Interesting New Gene) domain: a specialized type of Zn-finger of 40 to 60 residues that binds two atoms of zinc; defined by the ‘cross-brace’ motif C-X2-C-X(9-39)-C-X(1-3)-H-X(2-3)-(N/C/H)-X2-C-X(4-48)-C-X2-C; probably involved in mediating protein-protein interactions; identified in proteins with a wide range of functions such as viral replication, signal transduction, and development; has two variants, the C3HC4-type and a C3H2C3-type (RING-H2 finger), which have different cysteine/histidine patterns; a subset of RINGs are associated with B-Boxes (C-X2-H-X7-C-X7-C-X2-C-H-X2-H).
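The ‘cross-brace’ motif quoted above translates directly into a regular expression, where X2 becomes `.{2}` and X(9-39) becomes `.{9,39}`. The sketch below is a plain pattern scan with a hypothetical function name and a synthetic test sequence; the conserved-domain search itself scores the query against a position-specific profile, not against a literal pattern like this.

```python
import re

# C-X2-C-X(9-39)-C-X(1-3)-H-X(2-3)-(N/C/H)-X2-C-X(4-48)-C-X2-C
RING_MOTIF = re.compile(
    r"C.{2}C.{9,39}C.{1,3}H.{2,3}[NCH].{2}C.{4,48}C.{2}C"
)


def find_ring_motif(protein):
    """Return the 1-based (start, end) of the first RING-like match, or None."""
    match = RING_MOTIF.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.end()


# Synthetic sequence built to satisfy the pattern (not a real protein).
synthetic = (
    "MK" + "C" + "AA" + "C" + "A" * 9 + "C" + "A" + "H" + "AA" + "C"
    + "AA" + "C" + "AAAA" + "C" + "AA" + "C" + "GG"
)
print(find_ring_motif(synthetic))
print(find_ring_motif("ACDEFGHIK"))
```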

    • 3D view of  structure of RING-finger using Cn3D 3-D Structure Viewer software.

     

    Posted by: sumrandee | December 28, 2009

    Phylogenetic tree reconstruction

    Assignment 3

     
    Who are the ancestors of the dinosaurs?
    Science 1994 Nov 18;266(5188):1229-1232
    DNA was extracted from 80-million-year-old bone fragments found in strata of the Upper Cretaceous Blackhawk Formation in the roof of an underground coal mine in eastern Utah. This DNA was used as the template in a polymerase chain reaction that amplified and sequenced a portion of the gene encoding mitochondrial cytochrome b. These sequences differ from all other cytochrome b sequences investigated, including those in the GenBank and European Molecular Biology Laboratory databases.

    DNA isolated from these bone fragments and the resulting gene sequences demonstrate that small fragments of DNA may survive in bone for millions of years. The authors conclude that the DNA sequence,

    cccttctattattcattctcattctattcgttattcttgtactccacacatccaaacaac
    aaagcataatattccacccattgagtccattcctatcctgattcttagtccccgaacctt
    ttacactcacatg

    , appears to be from a dinosaur that lived 80 million years ago.

    Show step by step how to do a phylogenetic analysis with cytochrome b sequences. Then, what is your conclusion about the structure of the tree and the position of the dinosaur sequence relative to the following species? Use the following species:
    o Human
    o Dog
    o Rabbit
    o rhinoceros
    o dugong
    o mouse
    o whale
    o bovine
    o sicklebill
    o chicken
    o magpie
    o frog

    Step 1. State the hypotheses.

               Ho: The nucleotide sequence from the 80-million-year-old bone fragments is closer to the avian sequences than to the frog and mammal sequences.

               H1: The nucleotide sequence from the 80-million-year-old bone fragments is not closer to the avian sequences than to the frog and mammal sequences.

    Step 2. Retrieve the cytochrome b gene (CYTB) sequences from the NCBI DNA databases

    1. Go to NCBI website    http://www.ncbi.nlm.nih.gov/.

    • In the following steps, I show only how to retrieve the human CYTB sequence. For the other species, I summarize the retrieved sequences in Table 1.

     

    • Search NCBI Nucleotide for “cytochrome b AND Homo sapiens”.

     

    • Click “CYTB”.

    • The screenshot will appear.

    • Then click “reference sequence details”.
    • This leads to “NCBI Reference Sequences (RefSeq)”. The following sections contain reference sequences that belong to a specific genome build.
    • Click on YP_003024038.1 cytochrome b [Homo sapiens].

    • This gives the screenshot below.

    • To find the reference sequence for CYTB, click on “DBSOURCE    REFSEQ: accession NC_012920.1”; the screenshot will show:

     

    • Find the cytochrome b gene entry:

                     : /gene=”CYTB” and 

                    : CDS (coding sequence) with its base positions.

    • Enter the CDS (coding sequence) coordinates 14747..15887 under “Change Region Shown”:
    • choose “Selected region”, with 14747 as the begin value and 15887 as the end value.

    • Then click “Update View”.

     

    • We get NCBI Reference Sequence: NC_012920. REGION: 14747..15887 for CYTB.

    • To obtain CYTB in FASTA format, click “FASTA” in the top menu of the page.

    • The FASTA-format CYTB sequence is displayed.

    •  Copy the CYTB sequence (FASTA format) into Notepad.

    • The CYTB sequences of the other species of interest are retrieved from the NCBI database in the same way as for human.

     

    • The retrieved CYTB sequences are as follows (Table 1).

    • The FASTA-format cytochrome b sequences of all 13 species are put together in Notepad in one file, “ALL species-FASTA.txt”.

      

      

    Step 3 .  DNA Sequence  Alignment

     

    1. Download BioEdit  program from website http://www.mbio.ncsu.edu/BioEdit/bioedit.html

     

    2. Install the BioEdit program (here, on the Desktop: C:\Documents and Settings\cha\Desktop).

    3. Open the BioEdit program.

    • BioEdit's welcome page

    4.  Open the file “ALL species-FASTA.txt” (cytochrome b sequences of all 13 species in FASTA format).

                      File → Open

    5. The screenshot will appear.

    5. Run the multiple sequence alignment with ClustalW.

         Accessory Application → ClustalW multiple alignment

    6. Click “ClustalW multiple alignment”.

    -Set ClustalW alignment option as:

                     : Full Multiple Alignment

                     : Bootstrap NJ Tree  Number of bootstraps: 1000

    7. Click “Run ClustalW”.

    8.  To analyse the phylogenetic relationships, the parsimony method (a character-based method) was used.

    9. Click “Run Application”.

    10. The DNA parsimony algorithm produced one most parsimonious tree.

    11. The tree was written automatically to the file “outtree”.
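What "most parsimonious" means can be illustrated with Fitch's algorithm, which counts the minimum number of substitutions a fixed tree needs to explain an alignment. This is a minimal sketch, not the DNA parsimony program used above, which also searches over tree topologies. The four short sequences are invented toy data, not real cytochrome b, and the function names are my own.

```python
def fitch(tree, states):
    """Return (possible_states, cost) for one alignment column on a rooted
    binary tree given as nested 2-tuples of leaf names."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_states, left_cost = fitch(tree[0], states)
    right_states, right_cost = fitch(tree[1], states)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # No shared state: one substitution is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all columns of an equal-length alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Invented toy alignment, for illustration only.
toy_alignment = {"frog": "ACGT", "chicken": "ACGA", "magpie": "ACTA", "human": "GCGT"}
birds_together = ("frog", ("human", ("chicken", "magpie")))
birds_apart = ("frog", ("chicken", ("human", "magpie")))

print(parsimony_score(birds_together, toy_alignment))
print(parsimony_score(birds_apart, toy_alignment))
```

On this toy data the tree that groups the two birds needs fewer substitutions, so it is the more parsimonious of the two.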

    To display the phylogenetic relationships, the TreeView program was downloaded from  http://taxonomy.zoology.gla.ac.uk/rod/treeview.html

    • Open TreeView, then open the “outtree” file.

  • The tree can be displayed as either a cladogram or a phylogram.
  • The screenshot below shows the phylogram (with informative branch lengths) of the relationships between the cytochrome b DNA sequences of the 13 species, inferred by parsimony.
    • The dinosaur sequence had the longest branch, implying the highest number of substitutions in the cytb gene.

    12. Define the outgroup.

    • On the Tree menu choose “Define outgroup”.

                                      Tree → Define outgroup

    • To root the tree with the outgroup:

                     -On the Tree menu, click “Root with outgroup”.

    13. The resulting phylogenetic tree, with L. bannaensis (frog) set as the outgroup and the tree rooted on it.

                    Furthermore, I used ClustalX to align the sequences and the PHYLIP program dnapars to reconstruct the phylogeny by parsimony; TreeView was used to display the tree topology. The trees produced with different alignment programs (BioEdit vs. ClustalX) and different phylogenetic analysis programs (BioEdit vs. PHYLIP), but with the same method of analysis (DNA parsimony), had the same topology.

    Conclusion:

          Based on the parsimony method (a character-based method), the nucleotide sequence from the 80-million-year-old bone fragments (dinosaur) was closer to the avian sequences (birds, including chicken) than to the frog and mammal sequences. The magpie (Cissa chinensis) was the closest relative of the dinosaur sequence, and together they formed a clade with the other bird sequences, including chicken, whereas all the mammals formed another clade. Note that the available region of the cytochrome b gene from the 80-million-year-old bone fragments (133 bp) is too short for a reliable phylogenetic analysis.

    Posted by: sumrandee | November 29, 2009

    Assignment 2: Haplotype analysis with Haploview

    Haplotype analysis using Haploview

    I. Haploview requires the Java Runtime Environment (JRE) v1.4 or later.

    1. Download the Java Runtime Environment (JRE) from the following site and install it:

                http://java.sun.com

    II. Downloading Haploview

    1. Search for Haploview using Google.

    2. Open the Haploview webpage.

    3. Download the Haploview Windows installer (HapInstall.exe).

     

      

    4. Install Haploview by double-clicking the installer file. The installer creates a Haploview folder in the Start Menu.

         -To run the program, click the “Haploview.jar” file in that folder.

    5. Haploview's welcome page

     

    Question 1. What is the name of the Haploview format used in this analysis?

    The format used in this haplotype analysis is the “HapMap Project data dump” format. This file format has several header lines beginning with “#”.

    6. Load the file from the assignment into Haploview for analysis.

         6.1 Copy the SNP data from the assignment.

         6.2 Paste it into MS Word.

        6.3 Save the file as a plain text file (file name: SNP data.txt).

         6.4 The plain text file.

    7. Open the Haploview program.   

    8. Choose the HapMap format and browse to the saved file ‘SNP data.txt’.

        Then click ‘OK’.

    9. Set the HW p-value cutoff to 0.05, then click ‘Rescore Markers’.

        -The screenshot will appear:

    Question 2. Please show the marker and individual quality control of the genotype data used in the analysis.

       As the screenshot above shows, after a file is loaded Haploview displays basic data quality checks for the markers.

    The terms used are as follows:

  • # is the marker number.
  • Name is the marker ID specified (only if an info file is loaded).
  • Position is the marker position specified (only if an info file is loaded).
  • ObsHET is the marker’s observed heterozygosity.
  • PredHET is the marker’s predicted heterozygosity (i.e. 2*MAF*(1-MAF)).
  • HWpval is the Hardy-Weinberg equilibrium p value, which is the probability that its deviation from H-W equilibrium could be explained by chance.
  • %Geno is the percentage of non-missing genotypes for this marker.
  • FamTrio is the number of fully genotyped family trios for this marker (0 for datasets with unrelated individuals).
  • MendErr is the number of observed Mendelian inheritance errors (0 for datasets with unrelated individuals).
  • MAF is the minor allele frequency (using founders only) for this marker.
  • Alleles are the major and minor alleles for this marker.
  • Rating is checked if the marker passes all the tests and unchecked if it fails one or more tests (highlighted in red).
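Several of the columns defined above can be reproduced by hand from a list of genotypes. The sketch below assumes a biallelic marker typed in diploid individuals, with “NN” marking a missing call as in HapMap dump files; it ignores the special handling that hemizygous males on chromosome X would need. The function name and the ten toy genotypes are hypothetical.

```python
from collections import Counter


def marker_summary(genotypes):
    """Summarise one biallelic marker from two-letter genotype strings.
    'NN' is treated as a missing genotype."""
    called = [g for g in genotypes if g != "NN"]
    if not called:
        raise ValueError("no called genotypes for this marker")
    allele_counts = Counter("".join(called))
    total_alleles = sum(allele_counts.values())
    # MAF is 0 for a monomorphic marker (only one allele observed).
    maf = min(allele_counts.values()) / total_alleles if len(allele_counts) == 2 else 0.0
    obs_het = sum(g[0] != g[1] for g in called) / len(called)
    return {
        "ObsHET": obs_het,
        "PredHET": 2 * maf * (1 - maf),
        "MAF": maf,
        "%Geno": 100 * len(called) / len(genotypes),
    }


# Invented genotypes, for illustration only.
toy_genotypes = ["AA", "AG", "AG", "GG", "AA", "AA", "AG", "NN", "AA", "AA"]
for name, value in marker_summary(toy_genotypes).items():
    print(f"{name}: {value:.3f}")
```

A large gap between ObsHET and PredHET is the kind of deviation that the Hardy-Weinberg p-value in the same table is meant to flag.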
    10. Click LD Plot on the menu bar to show the LD map.

    Question 3. Please show us the LD map, then explain what you get from the LD map.

    • Haploview calculates several pairwise measures of LD, which it uses to create a graphical representation as shown in the screenshot above.
    • Haploview allows a number of different color schemes to represent the LD relationship.
    • It generates haplotypes and their population frequencies. The haplotype display shows lines to indicate transitions from one block to the next, with frequencies corresponding to the thickness of the lines.
    • The haplotype display also presents Hedrick’s multiallelic D', which represents the degree of LD between 2 blocks, treating each haplotype within a block as an allele of that region.

    The LD map above uses the ‘Standard (D'/LOD)’ color scheme,

    where:

  • D' is the value of D prime between the two loci.
  • LOD is the log of the likelihood odds ratio, a measure of confidence in the value of D'.
    Question 4. How many haplotype blocks are there in this region of Chromosome X, and how should they be interpreted?

    There are 3 haplotype blocks in this region of Chromosome X. The values describing the relationship between the markers in each block are shown in the white box.

    • The two most common pairwise measures of LD are D' and r2.
    • D' is defined to be 1 in the absence of obligate recombination, declining only due to recombination or recurrent mutation.
    • r2 is the squared correlation coefficient between the two SNPs. Thus, r2 is 1 when two SNPs arose on the same branch of the genealogy and remain undisrupted by recombination, but has a value less than 1 when SNPs arose on different branches, or if an initially strong correlation has been disrupted by crossing over.
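To make the two measures concrete, here is a minimal sketch that computes D, D' and r2 for one pair of biallelic SNPs from the four haplotype frequencies. The function name `pairwise_ld`, the allele labels A/a and B/b, and the example frequencies are all assumptions for illustration and are not taken from the assignment data. LOD is not computed here.

```python
def pairwise_ld(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, D', r2) from the frequencies of the four two-SNP haplotypes."""
    total = f_AB + f_Ab + f_aB + f_ab
    f_AB, f_Ab, f_aB, f_ab = (f / total for f in (f_AB, f_Ab, f_aB, f_ab))

    p_A = f_AB + f_Ab            # frequency of allele A at the first SNP
    p_B = f_AB + f_aB            # frequency of allele B at the second SNP
    d = f_AB - p_A * p_B         # raw disequilibrium coefficient D

    # D' rescales D by the largest magnitude it could reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2 is the squared correlation between the alleles at the two SNPs.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Only three of the four possible haplotypes are observed (invented frequencies).
print(pairwise_ld(0.5, 0.0, 0.2, 0.3))
# Only two haplotypes are observed: the SNPs are perfectly correlated.
print(pairwise_ld(0.6, 0.0, 0.0, 0.4))
```

In the first call one haplotype is absent, so D' is 1 while r2 stays well below 1. In the second call both measures equal 1, which matches the description of SNPs that arose on the same branch and were never separated by recombination.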

     

    • Block 1 comprises marker 8 (rs908005) and marker 9 (rs979484).

    • Block 2 comprises markers 13-17.

      For instance, the white box in this figure shows only the correlation between markers 13 and 17 of the haplotype block.

    • Block 3 comprises markers 24-29.

    For instance, the white box in this figure shows only the correlation between markers 24 and 27 of the haplotype block.

    where:

  • D' is the value of D prime between the two loci.
  • LOD is the log of the likelihood odds ratio, a measure of confidence in the value of D'.
  • r2 is the squared correlation coefficient between the two loci.

    Question 5. Could you find the tagging SNPs in each haplotype block, then explain what tagging SNPs are?

    A tag SNP is a representative single nucleotide polymorphism (SNP) in a region of the genome with high linkage disequilibrium (LD).

    •   To find the tagging SNPs in each haplotype block:

    -In the Display menu, choose ‘Show tags in blocks’.

    There are 3 haplotype blocks, each with 2 tagging SNPs:

    •       Block 1 has 2 tagging SNPs, i.e., markers 8 and 9.

                         -The frequency of GA was 33.3%.

                         -The frequency of AT was 64.4%.

                         -The frequency of GT was 2.2%.

    •       Block 2 has 2 tagging SNPs, i.e., markers 13 and 15.

                         -The frequency of TT was 48.9%.

                         -The frequency of GT was 27.8%.

                         -The frequency of TG was 22.3%.

    •       Block 3 has 2 tagging SNPs, i.e., markers 24 and 27.

                         -The frequency of CA was 73.3%.

                         -The frequency of GG was 25.6%.

                         -The frequency of CG was 1.1%.
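The idea behind tags in a block can be sketched as a search for the smallest set of markers whose alleles still tell every haplotype in the block apart. This is only an illustration of the concept, and Haploview's own selection procedure may differ. The function name `smallest_tag_set` is hypothetical, and the five-marker haplotypes below are invented: only the tag alleles TT, GT and TG at markers 13 and 15 come from the results above, while the alleles at markers 14, 16 and 17 are made up.

```python
from itertools import combinations


def smallest_tag_set(haplotypes):
    """Return indices of the fewest markers that distinguish all haplotypes."""
    n_markers = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_markers + 1):
        for idx in combinations(range(n_markers), size):
            # Alleles each haplotype carries at the candidate tag markers.
            signatures = {tuple(h[i] for i in idx) for h in haplotypes}
            if len(signatures) == n_distinct:
                return idx
    return tuple(range(n_markers))


markers = [13, 14, 15, 16, 17]
# Hypothetical Block 2 haplotypes, one letter per marker.
block2 = ["TCTAG", "GCTAG", "TCGCA"]

tags = smallest_tag_set(block2)
print([markers[i] for i in tags])
```

Because a biallelic SNP can split haplotypes into at most two groups, three haplotypes always need at least two tags, which agrees with the 2 tagging SNPs reported for each block.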

    Posted by: sumrandee | November 25, 2009

    Assignment 1: Taverna Workflow

    BUILDING THE TAVERNA WORKFLOW

    The objective of the exercise:

              -To align the DNA sequences in FASTA format from DDBJ (DNA Data Bank of Japan).

    Build a workflow from the following diagram:

    1. Open the DDBJ

    2. Put DNA sequences in FASTA format in the search box

    3. BLAST (sequence alignment and similarity searching)

    4. Blast_Report (percent similarity and identity)

    The Taverna workflow was adapted from the figure below:


    Installing the Taverna Workbench

    1. Download Taverna from http://taverna.sourceforge.net

    2. Download a modern Java Runtime Environment (JRE) from http://java.sun.com

    ►  Once Taverna has loaded, you will see 3 windows:

    • Advanced Model Explorer
    • Workflow Diagram
    • Available Services

     3. Installing Plugins

    -Go to the ‘Tools’ menu at the top of the workbench.

    -Select ‘Plugin manager’, then select ‘Find new plugins’.

    -Tick the boxes for Feta and LogBook and install these plugins. Two more options, ‘Discover’ and ‘LogBook’, will now appear at the top of the Taverna workbench alongside ‘Design’ and ‘Results’.


    4. Adding new services that were not designed for use in Taverna

     -New services can be used in Taverna if a WSDL file is supplied.

    -Go to the DDBJ list of available web services at: http://xml.nig.ac.jp/index.html

    -Click on the DDBJ Blast service (http://xml.nig.ac.jp/wsdl/Blast.wsdl) and copy the web page address.

    - Go to the ‘Available Services’ panel and right-click on ‘Available Processors’ (at the top of the list).
    - Select ‘Add new WSDL scavenger’.

    - Enter the Blast web service address.

    - Scroll down to the bottom of the ‘Available Services’ panel and look at the new DDBJ service that is now included.


    5. Adding Processor INPUT

    5.1 Import the ‘searchSimple’ service from the DDBJ service into a new workflow model. searchSimple is a processor used to execute nucleotide BLAST for a DNA query vs. a DNA database.

     -Right-click on ‘searchSimple’ and import it into the workbench by selecting ‘Add to Model’.

     -Go to the AME (Advanced Model Explorer) and expand the [+] next to the newly imported ‘searchSimple’ service. You will see 3 inputs (green arrows pointing up) and 2 outputs (purple arrows pointing down).


      6. Adding Workflow Input

     -Right-click on ‘Workflow Input’ and select ‘Create New Input’.

    -Type the name ‘FASTA_Sequence’.

    Click ‘OK’.

    - For the other two workflow inputs, ‘Database’ and ‘Program’, follow the same steps as for ‘FASTA_Sequence’.
    We will get:

    -Connect the input ‘FASTA_Sequence’ to the ‘searchSimple’ service by right-clicking on ‘FASTA_Sequence’ and connecting it to ‘searchSimple’, choosing ‘query’ as the input.

    -Connect the input ‘Database’ to the ‘searchSimple’ service by right-clicking on ‘Database’ and connecting it to ‘searchSimple’, choosing ‘database’ as the input.

     -Connect the input ‘Program’ to the ‘searchSimple’ service by right-clicking on ‘Program’ and connecting it to ‘searchSimple’, choosing ‘program’ as the input.
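The three connections above amount to mapping the workflow inputs onto the ‘query’, ‘database’ and ‘program’ ports of ‘searchSimple’. The sketch below mirrors that wiring in plain Python and adds a sanity check that the chosen program matches the type of the query. It does not call the DDBJ service. The names `BLAST_TYPES`, `looks_like_nucleotide` and `build_inputs` are hypothetical, and the nucleotide test is only a rough heuristic. The values ‘ddbjbct’ and ‘blastn’ are the ones entered later in this post when the workflow is run.

```python
# Query type and database type expected by each BLAST program.
BLAST_TYPES = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),
    "tblastn": ("protein", "nucleotide"),
    "tblastx": ("nucleotide", "nucleotide"),
}


def looks_like_nucleotide(sequence):
    """Heuristic: the sequence contains only A, C, G, T, U or N."""
    return set(sequence.lower()) <= set("acgtun")


def build_inputs(fasta_sequence, database, program):
    """Map the three workflow inputs onto the searchSimple ports."""
    if program not in BLAST_TYPES:
        raise ValueError(f"unknown BLAST program: {program}")
    lines = fasta_sequence.strip().splitlines()
    residues = "".join(l.strip() for l in lines if not l.startswith(">"))
    query_type = "nucleotide" if looks_like_nucleotide(residues) else "protein"
    expected = BLAST_TYPES[program][0]
    if query_type != expected:
        raise ValueError(f"{program} expects a {expected} query, got {query_type}")
    return {"query": fasta_sequence, "database": database, "program": program}


inputs = build_inputs(">demo\ntacggtccagactcctacgg", "ddbjbct", "blastn")
print(sorted(inputs))
```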

     

     

      

     

     

    7. Create a Workflow Output.

    -Click on ‘Workflow Output’ and create a new output named ‘blast_report’.

    -Right-click on ‘blast_report’ and connect it to ‘searchSimple’.


    8. Run the workflow by selecting ‘Run workflow’ from the ‘File’ menu at the top of the workbench.


     9. Align the FASTA-format DNA sequence of an unidentified bacterial species, Ehrlichia sp., against the DDBJ (DNA Data Bank of Japan) database. The DNA sequence of the 16S rRNA gene of Ehrlichia sp. HF565 was retrieved via accession number AB275138.

    9.1 Download the FASTA format of Ehrlichia sp. HF565 via accession number AB275138.

     - Sequence in FASTA format:
    >AB275138|Ehrlichia sp. HF565 gene for 16S rRNA, partial sequence, isolate:
    tacggtccagactcctacgggaggcagcagtggggaatattggacaatgggcgaaagcct
    gatccagctatgccgcgtgagtgaagaaggccttcgggttgtaaagctctttcaataggg
    aagataatgacggtacctatagaagaagtcccggcaaactccgtgccagcagccgcggta
    atacggagggggcaagcgttgttcggaattattgggcgtaaagggcacgtaggtggacta
    gtaagttaaaagtgaaataccaaagcttaactttggagcggcttttaatactgctagact
    agaggtcgaaagaggatagcggaattcctagtgtagaggtgaaattcgtagatattagga
    ggaacaccagtggcgaaagcggctatctggttcgatactgacactgaggtgcgaaagcgt
    ggggagcaaacaggattagataccctggtagtccacgctgtaaacgatgagtgctaaatg
    tgaggattttatctttgtattgtagctaacgcgttaagcactccgcctggggactacggt
    cgcaagactaaaactcaaaggaattgacggggacccgcacaagcggtggagcatgtggtt
    taattcgatgcaacgcgaaaaaccttaccactttttgacatgaaggtcgtatccccctaa
    cagggggagtcagtccggctggaccttacacaggtgctgcatggctgtcgtcagctcgtg
    tcgtgagatgttgggttaagtcccgcaacgagcgcaaccctcatccttagttaccaacag
    gtaatgctgggcactctaaggaaactgccagtgataaactggaggaaggtggggatgatg
    tcaagtcagcacggcccttataagg
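Before pasting a record like the one above into the workflow, it can help to confirm that it parses as FASTA and to check its length. The following is a minimal sketch with a hypothetical helper `read_fasta`; only the header and the first two sequence lines of the record are copied here to keep the example short.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]            # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Truncated copy of the AB275138 record (first two sequence lines only).
fasta_text = """>AB275138|Ehrlichia sp. HF565 gene for 16S rRNA, partial sequence, isolate:
tacggtccagactcctacgggaggcagcagtggggaatattggacaatgggcgaaagcct
gatccagctatgccgcgtgagtgaagaaggccttcgggttgtaaagctctttcaataggg
"""

for header, sequence in read_fasta(fasta_text).items():
    accession = header.split("|")[0]
    only_bases = set(sequence) <= set("acgtn")
    print(accession, len(sequence), only_bases)
```

Running the same check on the full record gives the total sequence length, which can be compared with the length reported in the database entry.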
     
     9.2 Input the FASTA sequence of Ehrlichia sp. HF565 by clicking ‘New Input’, then paste the FASTA sequence into the blank field.

     9.3 Input the ‘Database’ as ‘ddbjbct’ by clicking ‘New Input’, then type ‘ddbjbct’.

    (ddbjbct: the DDBJ Bacteria division, the reference sequence database for the FASTA query).

    9.4 Input the ‘Program’ as ‘blastn’ by clicking ‘New Input’, then type ‘blastn’.

    9.5 Click ‘Run workflow’.

    9.6 Result of BLAST (blast_report)

     
    9.7 Save the result as ‘blast-report Ehrlichia Hf5652.xml’.

    Posted by: sumrandee | August 20, 2009

    Link videos

     www.dld.go.th

    News from the Naew Na newspaper, 11 May 2551 B.E. (2008)

     21-8-2552 22-05-25

    21-8-2552 22-36-18

    21-8-2552 22-28-43

    Posted by: sumrandee | August 19, 2009

    Soft ticks collected from Asian openbill storks

    Soft ticks collected from the nesting area of Asian openbill storks at Wat Phai Lom, Pathum Thani Province.

    Posted by: sumrandee | August 19, 2009

    Hard tick collected from a snake

    A hard tick that fed on the blood of a rat snake (ngu sing) until fully engorged, growing almost as large as a little finger.

    By nature, snake ticks do not bite people, because humans are not wild animals. Snake ticks can act as vectors that carry protozoa or blood parasites to other snakes. Snakes that eat other snakes, such as the king cobra, can become infected by eating a rat snake that carries ticks harbouring the parasites.
