Learning Exercise

NCBI BLAST Search - Identifying an Unknown Bacterium

Students are given a sequence representing a segment of the 16S rRNA gene and must determine which organism it comes from using the BLAST database.
Course: General Microbiology or General Biology

This site compares a DNA or protein sequence to all of the sequences in GenBank and produces a list of the best matches.

Exercise

NCBI Search

Connect to the NCBI website at www.ncbi.nlm.nih.gov/BLAST
At this point you should be at the BLAST site. BLAST allows you to enter DNA
sequence information and search the database for matches.

Select BASIC BLAST search located in the middle of the page.

In the data entry area, pick the program Blastn, which stands for Blast
nucleotide. (The other programs allow for protein sequences to be entered,
comparisons between protein and nucleotide sequences, and comparisons of two
nucleotide sequences.)
Why might you search with a protein sequence instead of a nucleotide sequence?

The database selected for comparison is also important, and this program offers
several options, such as human, mouse, and E. coli. Using a more specific
database allows for faster searches and more relevant information on your
sequence. Select the nr database, which stands for nonredundant, meaning that
your sequence will be compared against a combined collection drawn from the
major sequence databases, with duplicate entries merged.

The format commonly used for sequence data is the FASTA format, and this
should be left as the default format. This format allows a description to be
inserted before the sequence without interfering with the sequence matching.
Open your file in Word and, in the line above the sequence, type the "greater
than" symbol (>) followed by a name for your sequence. Just below this is the
data entry box, where you can copy and paste your sequence from the disk. It is
helpful to keep Microsoft Word open to retrieve your assigned sequence. Do not
worry about deleting the numbers within your sequence, since Blastn ignores
them. If there are any backslashes after your sequence when it is pasted into
the data entry box, be sure to delete them before submitting your data. Press
submit query and the program will do the rest.
Almost immediately a new page will open displaying a request ID number for your
query. Click the Format results button and a new window will open with your
alignment results. Don't be surprised if it takes a few minutes to process the
sequence.
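
The FASTA layout described above (a ">" description line followed by the
sequence) can also be produced by a short script. The following is a minimal
Python sketch, not part of the exercise itself: the function name to_fasta, the
sequence name unknown_16S, and the short sequence fragment are all made up for
illustration. It removes position numbers, spaces, and stray backslashes before
building the record.

```python
import re


def to_fasta(name, raw_sequence, width=60):
    """Build a FASTA record: a '>' description line, then the sequence."""
    # Keep letters only; this drops position numbers, spaces, and stray
    # characters such as backslashes left over from word-processor files.
    cleaned = re.sub(r"[^A-Za-z]", "", raw_sequence).upper()
    # Wrap the sequence at a fixed width for readability.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines)


# Illustrative fragment with position numbers and a trailing backslash.
raw = "1 agagtttgat cctggctcag\\\n21 attgaacgct ggcggcaggc"
print(to_fasta("unknown_16S", raw))
```

The printed text can be pasted directly into the data entry box.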

RESULTS

Scroll down to a figure composed of the possible matches, each represented as an
individual line. The color-coded key allows you to see how well each sequence
matches your own. This output format allows you to select a match visually,
and clicking on a line will jump you to the individual sequence alignment.

Scroll further to the actual list of results. These are ranked from the most
likely match to the least likely match. The first score is the bit score, a
normalized measure of alignment quality: the longer and more nearly identical
the aligned region, the higher the score. Be cautious about this since short
sequences may give inconclusive results. That is why it is important to check
the E value, the number of matches with a score at least this good that would
be expected by chance in a database of this size. Short sequences may result in
many matches only because they are short and unspecific. If the E value is
high, the match could easily have arisen by chance; an E value close to zero
indicates a significant match.
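
The link between the bit score and the E value can be sketched with the
standard approximation E = m x n x 2^(-S), where S is the bit score, m is the
query length, and n is the total length of the database. The Python sketch
below is illustrative only: BLAST applies additional corrections to the
lengths, and the function name and the database size used here are
hypothetical values chosen for the example.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """Approximate E value: E = m * n * 2**(-S) for bit score S."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical database size, chosen only for illustration.
DATABASE_LENGTH = 10**10

# A short query with a modest score versus a long query with a high score.
for query_length, bit_score in [(20, 40.0), (500, 900.0)]:
    e_value = expected_chance_hits(bit_score, query_length, DATABASE_LENGTH)
    print(f"query {query_length} nt, bit score {bit_score}: E = {e_value:.2e}")
```

The short query cannot reach a high bit score, so its E value stays far larger
than that of the long query, which is why short sequences give inconclusive
results.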

Click on the first match to see the actual alignment of your sequence and its
match. It will give you the description of the match as well as the percentage
of identical nucleotides (listed as Identities). Copy and paste only this entry
into Word and print it out from there. Click on the score for the last match on
the list and describe how it differs from the best choice.
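
The percentage reported with each alignment can be checked by hand: count the
alignment columns in which both sequences carry the same nucleotide and divide
by the length of the alignment. A minimal Python sketch follows; the function
name and the two short aligned strings are invented for illustration, and "-"
marks a gap.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percentage of alignment columns holding the same nucleotide."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    # A column counts as a match only if both letters agree and it is
    # not a gap; gap columns still count toward the alignment length.
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Made-up alignment with one mismatch and one gap in ten columns.
print(percent_identity("ACGTACGTAC", "ACGTTCG-AC"))
```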

Assume that the first match from the list is the best. Click on the blue link to
the left of the first match. This page shows the GenBank record for this
sequence and lists information about who submitted the sequence, data on the
sequence itself, and links to more information. Print out this page.

On the same page, click on the link for the organism and this will take you to
the taxonomy and lineage of your organism. Print out this data.

Return to the previous page that contains information about the match. Click on
the Medline link and it will display an abstract from the article where the
sequence is referenced. This page also gives you the option of viewing related
articles. Print the Medline abstract.

Now that you are familiar with searching in the NCBI database, go back to your
sequence, alter it by changing nucleotides or deleting portions of it, and
perform a search on the altered sequence. Describe these changes and how they
affect the search results. Print page one of the match results for the altered
sequence.
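
The alterations can be made by hand in Word, or generated with a short script
so that the changes are easy to describe. The Python sketch below is one
possible approach, not a required part of the exercise; the function names, the
fixed random seed, and the example fragment are assumptions made for
illustration.

```python
import random


def substitute(sequence, n_changes, seed=0):
    """Replace n_changes randomly chosen positions with a different base."""
    rng = random.Random(seed)  # fixed seed so the changes are repeatable
    bases = list(sequence.upper())
    for pos in rng.sample(range(len(bases)), n_changes):
        # Choose only among bases that differ from the current one.
        alternatives = [b for b in "ACGT" if b != bases[pos]]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)


def delete_region(sequence, start, length):
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return sequence[:start] + sequence[start + length:]


original = "AGAGTTTGATCCTGGCTCAG"  # illustrative fragment
altered = substitute(original, 3)
# List the 1-based positions that were changed, for the written description.
changed = [i + 1 for i, (a, b) in enumerate(zip(original, altered)) if a != b]
print(altered, changed)
print(delete_region(original, 5, 4))
```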

TURN IN THE FOLLOWING:

The sequence alignment for the best score.
The GenBank record page with information on who submitted the
sequence.
The taxonomy page.
Medline abstract.
Match results for the altered sequence with a description of the
changes and their effects.

Technical Notes

The sequences can be downloaded onto disks and provided to the students. They can represent any organisms of interest to the instructor. If the students have already identified unknowns by traditional phenetic methods, it is recommended that the sequences come from those same lab unknowns and that this exercise be scheduled just after the students have turned in their unknown results.

Requirements

Some knowledge of computers
Understanding of 16S rRNA and its use in phylogeny

Topics

Phylogeny
Bacterial Diversity
Identification of an unknown bacterium

Learning Objectives

To explain the rationale for genetic identification using 16S rRNA gene sequences

To identify bacteria based on genetic sequences using computer databases