DIOPT Documentation




The identification of orthologs is commonly used for bioinformatics activities such as data mining and establishing models for human diseases. Moreover, our group notes that researchers analyzing the results of screens performed at the Drosophila RNAi Screening Center (DRSC) frequently wish to identify mammalian orthologs of the fly genes that were "hits" (positive results) in their screens.

In helping DRSC screeners to identify orthologs using existing tools and algorithms, we recognized a need for a user-friendly approach to viewing and comparing ortholog predictions obtained using different tools and algorithms. This was our motivation in developing DIOPT. To facilitate identification of orthologs specifically of human disease-associated genes, we further developed DIOPT-DIST. Information about our approaches to development of both tools is summarized below.

The DIOPT Approach

Many tools have emerged to meet the need to identify orthologs. However, the low coverage and heterogeneity of these tools present an obstacle to scientists who want to identify one or a few highest-confidence orthologs for a given gene of interest or, conversely, want to cast a wide net and follow up on all possible orthologs of a gene.

Our goal is to provide an easy-to-use resource that facilitates summary, comparison and access to various sources of ortholog predictions. DIOPT integrates human, mouse, fly, worm, zebrafish and yeast ortholog predictions made by Ensembl Compara, HomoloGene, Inparanoid, Isobase, OMA, orthoMCL, Phylome, RoundUp, and TreeFam. DIOPT lets users find ortholog pairs for a specified gene or genes identified by one, many or all of these published approaches. This provides a streamlined method for integration, comparison and access to orthology predictions originating from algorithms based on sequence homology, phylogenetic trees, and functional similarity. DIOPT calculates a simple score indicating the number of tools that support a given orthologous gene-pair relationship, as well as a weighted score based on functional assessment using high quality GO molecular function annotation of all fly-human orthologous pairs predicted by each tool. Differences in the algorithms used by each tool to predict orthologous relationships are one source of difference in the sets of predictions made by one tool versus another. However, some of these differences might be attributable to the use of different genome annotation releases by different tools, and not all tools cover all of the species included in DIOPT (see Tables 1, 2 and 3).

DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. These should help you to identify the most appropriate matches among multiple possible orthologs.
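As an illustration of what a percent identity value expresses, the following minimal Python sketch computes it from two already-aligned sequences. The function name `percent_identity` and the example sequences are hypothetical, and the choice of denominator (total alignment length, gaps included) is an assumption; DIOPT may use a different convention.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences share the same residue.

    Assumption: the denominator is the full alignment length, gaps included.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 7-column alignment with 5 identical columns.
print(round(percent_identity("MKT-LLV", "MKSALLV"), 1))  # 71.4
```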

The following summary figures and tables help to explain our approach and summarize the tools and algorithms included in DIOPT.

Figure 1: Summary of the DIOPT approach to integration of results from multiple ortholog prediction tools and algorithms. In green, tools based on sequence alignment. In purple, tools based on evolutionary relationships. In orange, a tool that incorporates protein-protein interaction network data into ortholog predictions.

Table 1: Summary Information and Publications for the Tools Integrated in DIOPT

Prediction Method | Source | Prediction Algorithm | Coverage | DIOPT Weight* | PMID
Compara | Ensembl | Phylogenetic approach | 70 species (vs. 81) | 0.931 | 19029536
Homologene | NCBI | Combination of BBH*, tree and synteny | 21 species (vs. 68) | 1 | 11125071
Inparanoid | Stockholm University, Sweden | BBH* approach to identify orthologs and in-paralogs | 273 species (vs. 8) | 1.005 | 11743721
Isobase | MIT | Sequence and PPI* network alignments | 5 species (vs. 2, Nov. 2014) | 0.957 | 21177658
OMA | CBRG, ETH Zurich | BBH*, global sequence alignments | 1706 species (Oct 2014) | 1.019 | 17545180
OrthoDB | University of Geneva | Phylogenetic approach | 3027 species (vs. 8) | 1.001 | 20972218
orthoMCL | University of Pennsylvania | Markov Cluster algorithm | 150 species (vs. 5) | 0.903 | 12952885
Phylome | Centre for Genomic Regulation (CRG), Spain | Reconstruction of evolutionary histories of all genes in a genome, also known as phylome | 1059 species, 120 Phylomes (vs. 4) | 0.912 | 17962297
RoundUp | Harvard Medical School | RSD*, modified BBH* | 2044 species (Apr 2013) | 1.003 | 16777906
TreeFam | Wellcome Trust Sanger Institute | Manually curated based on trees | 109 species (vs. 9) | 0.963 | 16381935
Panther | University of Southern California | Phylogenetic approach | 79 species | 1.1 | 26578592
HGNC | European Bioinformatics Institute (EMBL-EBI) | Manually curated | 2 species | 1.5 |
ZFIN | Zebrafish Model Organism Database | Sequence similarity analysis and manual curation | 4 species | 1.5 |

* DIOPT weights are based on the mean semantic similarity of high quality GO molecular function annotation of all fly-human orthologous pairs predicted by each tool.
   BBH, Best Blast Hits
   RSD, Reciprocal Smallest Distance
   PPI, Protein-Protein Interactions
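The two scores can be sketched in a few lines of Python. The weights below are copied from Table 1. The simple score follows the description above (a count of supporting tools). The weighted score is shown as the sum of the weights of the supporting tools, which is an assumption about the formula; the function names and the example tool list are hypothetical.

```python
# Weights copied from the "DIOPT Weight" column of Table 1.
DIOPT_WEIGHTS = {
    "Compara": 0.931,
    "Homologene": 1.0,
    "Inparanoid": 1.005,
    "Isobase": 0.957,
    "OMA": 1.019,
    "OrthoDB": 1.001,
    "orthoMCL": 0.903,
    "Phylome": 0.912,
    "RoundUp": 1.003,
    "TreeFam": 0.963,
    "Panther": 1.1,
    "HGNC": 1.5,
    "ZFIN": 1.5,
}


def simple_score(supporting_tools):
    """Number of distinct tools that predict the ortholog pair."""
    return len(set(supporting_tools))


def weighted_score(supporting_tools):
    """Sum of Table 1 weights over the supporting tools (assumed formula)."""
    return sum(DIOPT_WEIGHTS[tool] for tool in sorted(set(supporting_tools)))


# Hypothetical ortholog pair predicted by three tools.
tools = ["Compara", "OMA", "Panther"]
print(simple_score(tools))               # 3
print(round(weighted_score(tools), 3))   # 3.05
```

Under this reading, a pair supported by a manually curated source such as HGNC or ZFIN (weight 1.5) gains more than a pair supported by a lower-weighted tool, even when the simple scores are equal.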

Table 2A: Genome Release Information for the Tools Integrated in DIOPT

Worm Fish Fly Human Mouse Yeast Fission Yeast Frog Rat
Compara WBcel235 GRCz10 BDGP6 GRCh38.p3 GRCm38.p4 R64-1-1 JGI 4.2 Rnor_6.0
Homologene WS195 Zv9 "FlyBase r5.48" GRCh38 GRCm38.p2 R64-1-1 ASM294v2 Rnor_5.0
OMA Ensembl v73 WBcel235 Ensembl v70 Zv9 Ensembl v73 BDGP5 Ensembl v75 GRCh37 Ensembl v75 GRCm38 Ensembl v73 (EF4) Ensembl Fungi v22 (ASM294v2) Ensembl v73 (JGI_4.2) Ensembl v73 (Rnor_5.0)
Inparanoid UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013 UniProt Nov 2013
Isobase Ensembl v59 NA Ensembl v59 Ensembl v59 Ensembl v59 Ensembl v59
orthoMCL WS206 Zv8.56 BDGP5.13.56 GRCh37.56 NCBI v37.56 FungiDB GenBank Ensembl v53
OrthoDB Ensembl v75 FlyBase r5.55 Ensembl v75 Ensembl v75 UniProt Feb 2014 UniProt Feb 2014 Ensembl v75 Ensembl v75
RoundUp UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013 UniProt Apr 2013
TreeFam Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69 Ensembl v69
Phylome
Panther WormBase Apr 2014 Ensembl Apr 2014 FlyBase Apr 2014 Ensembl Apr 2014 MGI Apr 2014 SGD Apr 2014 PomBase Apr 2014 Gene Apr 2014 RGD Apr 2014
HGNC HGNC Feb 2016 HGNC Feb 2016
ZFIN ZFIN May 2016 ZFIN May 2016 ZFIN May 2016 ZFIN May 2016

Table 2B: Additional Information About Genome Releases

Other Resource | Version
WormBase release | 250
FlyBase release | 6.07
RefSeq release | 72
EntrezGene | 11-Sep-15

Table 3: Maximum DIOPT Score for Each Orthologous Relationship

Orthologous Relationship | Max score | Relevant Tools
fission yeast-baker's yeast | 8 | Inparanoid, OMA, orthoMCL, Phylome, RoundUp, TreeFam, Homologene, Panther
fission yeast-worm | 7 | Homologene, TreeFam, RoundUp, orthoMCL, Inparanoid, OMA, Panther
fission yeast-fly | 8 | Phylome, Homologene, TreeFam, RoundUp, orthoMCL, Inparanoid, OMA, Panther
fission yeast-fish | 7 | Homologene, Inparanoid, OMA, orthoMCL, RoundUp, TreeFam, Panther
fission yeast-frog | 4 | TreeFam, Inparanoid, OMA, RoundUp
fission yeast-human | 8 | Homologene, Inparanoid, OMA, orthoMCL, Phylome, RoundUp, TreeFam, Panther
fission yeast-mouse | 7 | orthoMCL, RoundUp, OMA, Inparanoid, Homologene, TreeFam, Panther
baker's yeast-worm | 10 | orthoMCL, OMA, TreeFam, RoundUp, Isobase, Compara, Inparanoid, Homologene, Phylome, Panther
baker's yeast-fly | 10 | Isobase, TreeFam, RoundUp, Phylome, OMA, Inparanoid, Homologene, Compara, orthoMCL, Panther
baker's yeast-fish | 9 | Phylome, Homologene, TreeFam, RoundUp, orthoMCL, Inparanoid, Compara, OMA, Panther
baker's yeast-frog | 6 | TreeFam, Compara, Inparanoid, OMA, Phylome, RoundUp
baker's yeast-human | 10 | orthoMCL, TreeFam, RoundUp, Phylome, Isobase, Inparanoid, Compara, Homologene, OMA, Panther
baker's yeast-mouse | 10 | Phylome, RoundUp, orthoMCL, OMA, Inparanoid, Homologene, Compara, TreeFam, Isobase, Panther
worm-fly | 11 | Phylome, RoundUp, orthoMCL, OrthoDB, Isobase, Inparanoid, Compara, Homologene, OMA, TreeFam, Panther
worm-fish | 9 | Homologene, TreeFam, RoundUp, orthoMCL, OrthoDB, Inparanoid, Compara, OMA, Panther
worm-frog | 6 | OrthoDB, OMA, TreeFam, Compara, Inparanoid, RoundUp
worm-human | 11 | RoundUp, TreeFam, Phylome, orthoMCL, OrthoDB, Isobase, Inparanoid, Homologene, Compara, OMA, Panther
worm-mouse | 10 | Inparanoid, TreeFam, RoundUp, orthoMCL, OrthoDB, Compara, Isobase, Homologene, OMA, Panther
fly-fish | 11 | OrthoDB, RoundUp, TreeFam, orthoMCL, Inparanoid, Homologene, Compara, OMA, Phylome, Panther, ZFIN
fly-frog | 7 | OMA, TreeFam, RoundUp, OrthoDB, Inparanoid, Compara, Phylome
fly-human | 11 | Inparanoid, RoundUp, Phylome, TreeFam, orthoMCL, OrthoDB, Isobase, Homologene, Compara, OMA, Panther
fly-mouse | 11 | Phylome, Compara, Homologene, Inparanoid, Isobase, OMA, OrthoDB, RoundUp, TreeFam, orthoMCL, Panther
fish-frog | 6 | Compara, Inparanoid, OMA, OrthoDB, RoundUp, TreeFam
fish-human | 11 | Compara, TreeFam, RoundUp, Phylome, orthoMCL, OrthoDB, OMA, Homologene, Inparanoid, Panther, ZFIN
fish-mouse | 10 | OrthoDB, orthoMCL, RoundUp, Inparanoid, OMA, Homologene, Compara, TreeFam, Panther, ZFIN
frog-human | 7 | Compara, Inparanoid, OMA, OrthoDB, Phylome, RoundUp, TreeFam
frog-mouse | 6 | OMA, TreeFam, OrthoDB, Inparanoid, Compara, RoundUp
human-mouse | 12 | Compara, orthoMCL, RoundUp, TreeFam, Phylome, OrthoDB, OMA, Isobase, Homologene, Inparanoid, Panther, HGNC
rat-fission yeast | 6 | Inparanoid, OMA, orthoMCL, TreeFam, Homologene, Panther
rat-baker's yeast | 7 | Compara, Homologene, Inparanoid, OMA, orthoMCL, TreeFam, Panther
rat-worm | 8 | Inparanoid, TreeFam, orthoMCL, OMA, Homologene, Compara, OrthoDB, Panther
rat-fly | 8 | Compara, Homologene, Inparanoid, OMA, OrthoDB, orthoMCL, TreeFam, Panther
rat-fish | 8 | OrthoDB, Compara, TreeFam, orthoMCL, Homologene, OMA, Inparanoid, Panther
rat-frog | 6 | Compara, Homologene, Inparanoid, OMA, OrthoDB, TreeFam
rat-human | 8 | Homologene, TreeFam, orthoMCL, OrthoDB, Inparanoid, Compara, OMA, Panther
rat-mouse | 8 | TreeFam, Compara, Homologene, Inparanoid, OMA, OrthoDB, orthoMCL, Panther
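Because the maximum score in Table 3 differs by species pair, the same DIOPT score can carry different meaning for different pairs. The Python sketch below expresses a score as a fraction of the maximum for its species pair. Only three rows of Table 3 are included as an excerpt, and the names `MAX_SCORE` and `fraction_of_max` are hypothetical; this normalization is an illustration, not something DIOPT is stated to report.

```python
# Excerpt of Table 3: maximum DIOPT score per species pair.
# frozenset keys make the lookup independent of species order.
MAX_SCORE = {
    frozenset(["fly", "human"]): 11,
    frozenset(["human", "mouse"]): 12,
    frozenset(["fission yeast", "frog"]): 4,
}


def fraction_of_max(species_a, species_b, score):
    """Return score divided by the maximum possible score for the pair."""
    max_score = MAX_SCORE[frozenset([species_a, species_b])]
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    return score / max_score


# A score of 4 is full support for fission yeast-frog,
# but only a third of the available tools for human-mouse.
print(fraction_of_max("fission yeast", "frog", 4))        # 1.0
print(round(fraction_of_max("mouse", "human", 4), 2))     # 0.33
```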

Version information

5.3 - May 2016 - Added more prediction tools (Panther, HGNC and ZFIN)
5.2.1 - April 2016 - Added orthologous rank
   High: best score both ways AND DIOPT score >=2
   Moderate: (best score forward or reverse) AND DIOPT score >=2, OR DIOPT score >=4
   Low: all others
5.2 - April 2016 - Added new species (Rattus norvegicus)
5.1.1 - December 2015 - Added Best forward and reverse columns
5.1 - November 2015 - Upgraded gene matching algorithm
5.0 - November 2015 - Upgraded data sources to version 5
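
The rank rules introduced in version 5.2.1 can be sketched as a small Python function. This is a minimal illustration under two assumptions: the conditions listed between High and Low define a single middle tier (called "moderate" here), and those two conditions are alternatives joined by OR. The function name and argument names are hypothetical.

```python
def ortholog_rank(score, best_forward, best_reverse):
    """Classify an ortholog pair from its DIOPT score and best-score flags.

    score:        DIOPT score (number of supporting tools)
    best_forward: True if this pair has the best score in the forward search
    best_reverse: True if this pair has the best score in the reverse search
    """
    if best_forward and best_reverse and score >= 2:
        return "high"
    if ((best_forward or best_reverse) and score >= 2) or score >= 4:
        return "moderate"  # assumed label for the middle tier
    return "low"


print(ortholog_rank(5, True, True))     # high
print(ortholog_rank(2, True, False))    # moderate
print(ortholog_rank(4, False, False))   # moderate
print(ortholog_rank(3, False, False))   # low
```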