OSfinder: A Tool for Accurate Orthology Mapping

Make Anchor Files

Two approaches are commonly used for detecting anchors. The first detects homologous gene pairs between two genomes and treats each pair as an anchor. The second detects short genomic sequences conserved among evolutionarily related genomes and treats those homologous sequences as anchors.

The OSfinder program takes as input a file that lists the genomic locations of the anchors. You can create the anchor files in any of the following ways.

Detect Homologous Gene Pairs

The OSfinder distribution includes Perl scripts that detect homologous gene pairs using the BLASTP program and generate the input files of OSfinder. For detailed descriptions, please see the following pages.


From GenBank Data -- This page explains how to automatically download genome sequence files from the NCBI GenBank database and parse those files, which are in the GBK format.

From Ensembl Data -- This page explains how to automatically download protein sequence files from the Ensembl genome browser and parse the downloaded files.

Execute BLASTP -- This page explains how to run an all-against-all comparison of the protein sequences encoded in two genomes, parse the files output by the BLASTP program, and generate the input files of OSfinder.
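
As a rough illustration of the parsing step described above, the following Python sketch reads BLASTP results in the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and joins the surviving hits with gene coordinates. It is not the Perl code shipped with OSfinder. The function names, the E-value cutoff of 1e-5, the gene location dictionaries, and the layout of the returned tuples are all assumptions made for this example, and the tuples are not the OSfinder input format, which is defined on the "File Format" page.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    bitscore: float


# Hypothetical gene location: (chromosome, start, end, strand).
Location = Tuple[str, int, int, str]


def parse_blast_tabular(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits whose E-value is at or below max_evalue (an assumed cutoff)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a complete 12-column tabular record
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.append(Hit(fields[0], fields[1], evalue, float(fields[11])))
    return hits


def hits_to_anchors(hits: Iterable[Hit],
                    loc_a: Dict[str, Location],
                    loc_b: Dict[str, Location]) -> List[tuple]:
    """Pair each hit with the coordinates of both genes.

    The concatenated tuple is illustrative only; it is NOT the OSfinder format.
    """
    anchors = []
    for hit in hits:
        if hit.query in loc_a and hit.subject in loc_b:
            anchors.append(loc_a[hit.query] + loc_b[hit.subject])
    return anchors
```

In practice the gene coordinates would come from the parsed GenBank or Ensembl files, and the resulting records would still need to be written out in the layout that OSfinder expects.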

Detect Homologous Sequences

Murasaki detects short homologous sequences conserved among multiple genomes. Its output includes an ".anchors" file that lists the genomic locations of the anchors, and the Murasaki software includes a Perl script that converts the ".anchors" file into the input file of OSfinder. This approach can detect anchors among multiple genomes as well as between two genomes. The latest version of the Murasaki software is available for download.

Detect Anchors Based On Your Own Strategy

If you detect anchors with your own strategy, your anchor files must follow the format that the OSfinder program accepts. OSfinder accepts any anchor file that conforms to the format described on the "File Format" page.

Download Mammalian Data

The computational cost of calculating anchors between mammalian genomes can be high enough to hinder further analyses. We have therefore pre-computed anchor files between mammalian genomes, which can be downloaded from the "Mammalian Data" page.
