Make Input Files by Executing the BLASTP program
Here, we assume that you have performed Steps 1 to 3 in either of the two ways described on the page "From GenBank Data" or the page "From Ensembl Data". In other words, we assume that three files named "all_proteins.mfa", "all_proteins.pos", and "chrom_map" have been created for each genome under study.
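Before moving on, it can help to confirm that the three files are in place for each genome. The snippet below is a minimal sketch, not part of OSfinder: the function name check_genome_dir is hypothetical, and it assumes that each genome has its own directory (for example "ncbi_seqs/Candida_albicans/", as used in Step 4) holding the three files.

```shell
# Hypothetical helper (not part of OSfinder): succeed only if the given
# genome directory holds the three files produced by Steps 1 to 3.
# Assumption: a non-empty file is taken as a sign that the step succeeded.
check_genome_dir() {
    dir=$1
    missing=0
    for f in all_proteins.mfa all_proteins.pos chrom_map; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (from the OSfinder directory):
#   check_genome_dir ncbi_seqs/Candida_albicans
#   check_genome_dir ncbi_seqs/Candida_glabrata_CBS138
```

The function prints one line per missing file to standard error and returns a non-zero status if any file is absent, so it can be combined with "&&" in front of the commands of Step 4.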
Step 4: Execute the BLASTP program.
To perform an all-against-all comparison of the protein sequences encoded in two genomes A and B, type the commands below. Here, we assume that genome A is Candida albicans and genome B is Candida glabrata strain CBS138.
% cd osfinder_v*_*/
% mkdir blast_results
% ./scripts/exec_blastp.pl -i ncbi_seqs/Candida_albicans/ -d ncbi_seqs/Candida_glabrata_CBS138/ -o blast_results/
The all-against-all comparison is then executed with the BLASTP program, and a file named "Candida_albicans.vs.Candida_glabrata_CBS138.blastp" is created in the "blast_results/" directory. Note that the calculation takes a considerable amount of time. For example, in our environment (2.2 GHz Intel Core 2 Duo), comparing two mammalian genomes (Homo sapiens and Mus musculus) took almost one day.
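Because the comparison is slow, you may want to avoid repeating it when a result file is already present. The following is a rough sketch under one assumption: that the result file is always named after the last components of the two genome directories, joined as "A.vs.B.blastp". This pattern is inferred only from the single example above, and the function name blastp_output_name is hypothetical.

```shell
# Hypothetical helper (not part of OSfinder). Builds the result file name
# following the pattern of the example above: <A>.vs.<B>.blastp, where <A>
# and <B> are the last components of the two genome directories.
# The pattern is inferred from a single example and is an assumption.
blastp_output_name() {
    a=$(basename "$1")   # basename drops a trailing slash
    b=$(basename "$2")
    echo "$a.vs.$b.blastp"
}

# Usage (from the OSfinder directory): run BLASTP only if no result exists.
#   out=blast_results/$(blastp_output_name ncbi_seqs/Candida_albicans/ ncbi_seqs/Candida_glabrata_CBS138/)
#   [ -s "$out" ] || ./scripts/exec_blastp.pl -i ncbi_seqs/Candida_albicans/ -d ncbi_seqs/Candida_glabrata_CBS138/ -o blast_results/
```

Note that a non-empty file does not prove that the run finished. If an earlier run was interrupted, delete the partial result file before using this check.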
Step 5: Parse files output by the BLASTP program.
To parse the files output by the BLASTP program and to generate the input files for the OSfinder program, type the following.
% mkdir anchor_files
% ./scripts/parse_blast_result.pl -i ncbi_seqs/Candida_albicans/ -d ncbi_seqs/Candida_glabrata_CBS138/ -b blast_results/Candida_albicans.vs.Candida_glabrata_CBS138.blastp -o anchor_files/cal-cgl.blastp.anchor.txt
An anchor file named "cal-cgl.blastp.anchor.txt" is then created in the directory "anchor_files/". The anchor file can be used as input to the OSfinder program.
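If you compare several pairs of genomes, Steps 4 and 5 can be wrapped in a small shell function. The following is only a sketch: the function name make_anchor is hypothetical, the two scripts are called with exactly the options shown above, and the BLASTP result file name is assumed to follow the "A.vs.B.blastp" pattern seen in the example.

```shell
# Hypothetical wrapper (not part of OSfinder) around Steps 4 and 5.
# Arguments: genome directory A, genome directory B, anchor file to create.
# Must be run from the OSfinder directory, where ./scripts/ is located.
make_anchor() {
    a=$1
    b=$2
    anchor=$3
    # Assumption: the result file is named <A>.vs.<B>.blastp, as in the
    # example above, where <A> and <B> are the directory names.
    blast_out="blast_results/$(basename "$a").vs.$(basename "$b").blastp"

    # Create the output directories if they do not exist yet.
    mkdir -p blast_results "$(dirname "$anchor")" || return 1

    # Step 4: all-against-all BLASTP comparison.
    ./scripts/exec_blastp.pl -i "$a" -d "$b" -o blast_results/ || return 1

    # Step 5: parse the BLASTP result into an anchor file.
    ./scripts/parse_blast_result.pl -i "$a" -d "$b" -b "$blast_out" -o "$anchor"
}

# Usage:
#   make_anchor ncbi_seqs/Candida_albicans/ ncbi_seqs/Candida_glabrata_CBS138/ anchor_files/cal-cgl.blastp.anchor.txt
```

The function stops after Step 4 if the BLASTP script reports an error, so that Step 5 is not run on a missing or incomplete result.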
Having completed the steps above, you are ready to execute OSfinder. The page "Getting Started" explains how to execute OSfinder.