Make Input Files From Ensembl Data
Step 0: Check the requirements.
The Perl scripts that generate the input files for the OSfinder program
require the following software.
BioPerl (version 1.5.2 or later)
the BLASTP program
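A quick way to confirm both requirements is sketched below. It assumes that BLASTP is installed either as a "blastp" executable (BLAST+) or through the legacy "blastall" wrapper; which one the OSfinder scripts expect is not stated here, so both are checked. The function names check_cmd and check_bioperl are illustrative and not part of OSfinder.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Report the installed BioPerl version, if BioPerl can be loaded.
check_bioperl() {
    if perl -MBio::Root::Version \
            -e 'print "found: BioPerl $Bio::Root::Version::VERSION\n"' 2>/dev/null; then
        return 0
    fi
    echo "missing: BioPerl"
}

check_cmd blastp     # assumption: BLAST+ executable name
check_cmd blastall   # assumption: legacy NCBI BLAST wrapper
check_bioperl
```

Older BioPerl releases may print the version in Perl's numeric form, such as 1.006001 for 1.6.1, so compare it against 1.5.2 with that in mind.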
Step 1: Get a list of organisms.
First, run the following commands
to list the organisms
whose protein sequence files can be downloaded
from the Ensembl genome browser.
% cd osfinder_v*_*/
% ./scripts/get_organism_list_from_ensembl.pl -v 52
Then, a list of the organisms will be displayed as follows.
------ ------------------------------------------
ID     organism name
------ ------------------------------------------
1      aedes_aegypti
2      anopheles_gambiae
3      bos_taurus
4      caenorhabditis_elegans
...
Note that the "-v" option specifies the release version of the Ensembl genome browser.
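If only the organism names are needed, for example to pass them to the "-n" option in the next step, the listing can be filtered. The sketch below assumes the listing has exactly the layout shown above (an integer ID followed by a single name on each data row); the function name organism_names is illustrative.

```shell
#!/bin/sh
# Read the organism listing on standard input and print only the names.
# Rows whose first field is not an integer (rulers, header) are skipped.
organism_names() {
    awk '$1 ~ /^[0-9]+$/ && NF == 2 { print $2 }'
}
```

It could be used as "./scripts/get_organism_list_from_ensembl.pl -v 52 | organism_names", assuming the script writes the list to standard output.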
Step 2: Download protein sequence files.
To download the protein sequence files automatically
from the Ensembl genome browser,
run the following commands.
% mkdir ensembl_seqs
% ./scripts/download_from_ensembl.pl -v 52 -n aedes_aegypti -o ensembl_seqs/
Then, a new directory "ensembl_seqs/aedes_aegypti.v52/" will be created, and an MFA-formatted file named "ensembl_mfa" will be downloaded into it.
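When several organisms are needed, the download command can be repeated in a loop. The following is a minimal sketch that reuses only the "-v", "-n" and "-o" options shown above; the names RELEASE, OUT_DIR, SCRIPT_DIR and fetch_organisms are illustrative and not part of OSfinder.

```shell
#!/bin/sh
RELEASE=52              # Ensembl release, as passed to -v
OUT_DIR=ensembl_seqs    # parent directory for the downloads
SCRIPT_DIR=./scripts    # location of the OSfinder scripts

# Download the protein sequence file of every organism given as an argument.
fetch_organisms() {
    mkdir -p "$OUT_DIR"
    for name in "$@"; do
        # Same options as the single-organism command above.
        if "$SCRIPT_DIR/download_from_ensembl.pl" -v "$RELEASE" -n "$name" -o "$OUT_DIR/"; then
            echo "downloaded: $name"
        else
            echo "failed: $name" >&2
        fi
    done
}
```

For example, "fetch_organisms aedes_aegypti anopheles_gambiae" should leave one "<name>.v52/" directory per organism under "ensembl_seqs/".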
Step 3: Parse MFA-formatted files downloaded from Ensembl.
To parse the MFA-formatted protein sequence file
downloaded from the Ensembl genome browser,
run the following command.
% ./scripts/parse_ensembl_mfa.pl -i ensembl_seqs/aedes_aegypti.v52/
Then, three files will be created in the "ensembl_seqs/aedes_aegypti.v52/" directory.
"all_proteins.mfa" is an MFA-formatted file that contains all protein sequences of Aedes aegypti.
"all_proteins.pos" contains the genomic locations of all protein-coding genes of Aedes aegypti.
"chrom_map" contains a map from chromosome IDs (integers) to chromosome names (strings).
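Before moving on, it can help to confirm that the parser produced all three files. The sketch below only checks that each file exists and is non-empty, and counts the sequences in "all_proteins.mfa" by its ">" header lines. It makes no assumption about the internal layout of "all_proteins.pos" or "chrom_map". The function names are illustrative.

```shell
#!/bin/sh
# Check that the three parser outputs exist and are non-empty.
# Returns 0 when all are present, 1 otherwise.
check_parsed_dir() {
    dir=$1
    status=0
    for f in all_proteins.mfa all_proteins.pos chrom_map; do
        if [ -s "$dir/$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            status=1
        fi
    done
    return $status
}

# Count sequences in an MFA (multi-FASTA) file: one ">" header per sequence.
count_sequences() {
    grep -c '^>' "$1"
}
```

For example, "check_parsed_dir ensembl_seqs/aedes_aegypti.v52" followed by "count_sequences ensembl_seqs/aedes_aegypti.v52/all_proteins.mfa" gives a quick sanity check of the parsed data.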
Subsequent Steps...
With the steps above completed,
you are ready to run the BLASTP program.
The subsequent steps are described
on the "Execute BLASTP" page.