The accumulation of biological and biomedical literature outpaces the ability of most researchers and clinicians to stay abreast of their own immediate fields, let alone a broader range of topics. Although available search tools support identification of relevant literature, finding relevant and key publications is not always straightforward. For example, important publications might be missed in searches with an official gene name due to gene synonyms. Moreover, ambiguity of gene names can result in retrieval of a large number of irrelevant publications. To address these issues and help researchers and physicians quickly identify relevant publications, we developed BioLitMine, an advanced literature mining tool that takes advantage of the medical subject heading (MeSH) index and gene-to-publication annotations already available for PubMed literature. Using BioLitMine, a user can identify what MeSH terms are represented in the set of publications associated with a given gene of interest, or start with a term and identify relevant publications. Users can also use the tool to find co-cited genes and build a literature co-citation network. In addition, BioLitMine can help users build a gene list relevant to a MeSH term, such as a list of genes relevant to "stem cells" or "breast neoplasms." Users can also start with a gene or pathway of interest and identify authors associated with that gene or pathway, a feature that makes it easier to identify experts who might serve as collaborators or reviewers. Altogether, BioLitMine extends the value of PubMed-indexed literature and its existing expert curation by providing a robust and gene-centric approach to retrieval of relevant information.
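The co-citation idea described above can be illustrated with a minimal sketch. It assumes that two genes count as co-cited when the same publication is annotated with both, and that the edge weight is the number of such shared publications. The function name, gene names, and publication identifiers are hypothetical, and this is not presented as BioLitMine's actual implementation.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_counts(gene_to_pubs):
    """Count, for every gene pair, the publications annotated with both genes.

    Assumption: co-citation means sharing at least one annotated publication.
    """
    # Invert the gene -> publications map into publication -> genes.
    pub_to_genes = defaultdict(set)
    for gene, pubs in gene_to_pubs.items():
        for pub in pubs:
            pub_to_genes[pub].add(gene)

    # Each publication contributes one count to every gene pair it mentions.
    counts = defaultdict(int)
    for genes in pub_to_genes.values():
        for gene_a, gene_b in combinations(sorted(genes), 2):
            counts[(gene_a, gene_b)] += 1
    return dict(counts)


# Toy gene-to-publication annotations (all identifiers are made up).
toy_annotations = {
    "geneA": {"pub1", "pub2", "pub3"},
    "geneB": {"pub2", "pub3"},
    "geneC": {"pub3", "pub4"},
}
edges = cocitation_counts(toy_annotations)
```

Each key in `edges` is an edge of the co-citation network, and its value can serve as an edge weight when ranking or filtering co-cited genes.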
Single-gene knockout experiments can fail to reveal function in the context of redundancy, which is frequently observed among duplicated genes (paralogs) with overlapping functions. We discuss the complexity associated with studying paralogs and outline how recent advances in CRISPR will help address the "phenotype gap" and impact biomedical research.
One of the most powerful ways to develop hypotheses regarding biological functions of conserved genes in a given species, such as in humans, is to first look at what is known about function in another species. Model organism databases (MODs) and other resources are rich with functional information but difficult to mine. Gene2Function (G2F) addresses a broad need by integrating information about conserved genes in a single online resource.
One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms is arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves the efficiency and accessibility of data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases, in support of clinical diagnosis and basic research.
Our understanding of the genetic mechanisms that underlie biological processes has relied extensively on loss-of-function (LOF) analyses. LOF methods target DNA, RNA or protein to reduce or to ablate gene function. By analysing the phenotypes that are caused by these perturbations, the wild-type function of genes can be elucidated. Although all LOF methods reduce gene activity, the choice of approach (for example, mutagenesis, CRISPR-based gene editing, RNA interference, morpholinos or pharmacological inhibition) can have a major effect on phenotypic outcomes. Interpretation of the LOF phenotype must take into account the biological process that is targeted by each method. The practicality and efficiency of LOF methods also vary considerably between model systems. We describe parameters for choosing the optimal combination of method and system, and for interpreting phenotypes within the constraints of each method.
The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website (http://fgr.hms.harvard.edu) serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches.
The rapid rise of CRISPR as a technology for genome engineering and related research applications has created a need for algorithms and associated online tools that facilitate design of on-target and effective guide RNAs (gRNAs). Here, we review the state-of-the-art in CRISPR gRNA design for research applications of the CRISPR-Cas9 system, including knockout, activation and inhibition. Notably, achieving good gRNA design is not solely dependent on innovations in CRISPR technology. Good design and design tools also rely on availability of high-quality genome sequence and gene annotations, as well as on availability of accumulated data regarding off-targets and effectiveness metrics.
Regulation of cell growth is a fundamental process in development and disease that integrates a vast array of extra- and intracellular information. A central player in this process is RNA polymerase I (Pol I), which transcribes ribosomal RNA (rRNA) genes in the nucleolus. Rapidly growing cancer cells are characterized by increased Pol I-mediated transcription and, consequently, nucleolar hypertrophy. To map the genetic network underlying the regulation of nucleolar size and of Pol I-mediated transcription, we performed comparative, genome-wide loss-of-function analyses of nucleolar size in Saccharomyces cerevisiae and Drosophila melanogaster coupled with mass spectrometry-based analyses of the ribosomal DNA (rDNA) promoter. With this approach, we identified a set of conserved and nonconserved molecular complexes that control nucleolar size. Furthermore, we characterized a direct role of the histone information regulator (HIR) complex in repressing rRNA transcription in yeast. Our study provides a full-genome, cross-species analysis of a nuclear subcompartment and shows that this approach can identify conserved molecular modules.
Analysis of high-throughput data increasingly relies on pathway annotation and functional information derived from Gene Ontology. This approach has limitations, in particular for the analysis of network dynamics over time or under different experimental conditions, in which modules within a network rather than complete pathways might respond and change. We report an analysis framework based on protein complexes, which are at the core of network reorganization. We generated a protein complex resource for human, Drosophila, and yeast from the literature and databases of protein-protein interaction networks, with each species having thousands of complexes. We developed COMPLEAT (http://www.flyrnai.org/compleat), a tool for data mining and visualization for complex-based analysis of high-throughput data sets, as well as analysis and integration of heterogeneous proteomics and gene expression data sets. With COMPLEAT, we identified dynamically regulated protein complexes among genome-wide RNA interference data sets that used the abundance of phosphorylated extracellular signal-regulated kinase in cells stimulated with either insulin or epidermal growth factor as the output. The analysis predicted that the Brahma complex participated in the insulin response.
BACKGROUND: Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. RESULTS: We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). CONCLUSIONS: DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
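The trade-off between casting a wide net and keeping only the highest-confidence predictions can be sketched as follows. The abstract does not specify how predictions are combined, so this example assumes a simple vote count, where a pair's support is the number of tools that predict it. The function names, tool names, and gene names are hypothetical.

```python
from collections import Counter


def integrate_predictions(predictions_by_tool):
    """Count how many tools predict each (gene, ortholog) pair.

    Assumption: every tool contributes one equal vote per predicted pair.
    """
    votes = Counter()
    for pairs in predictions_by_tool.values():
        # set() guards against a tool listing the same pair twice.
        for pair in set(pairs):
            votes[pair] += 1
    return votes


def filter_by_support(votes, min_tools):
    """Keep only pairs supported by at least `min_tools` tools."""
    return {pair: n for pair, n in votes.items() if n >= min_tools}


# Toy predictions from three hypothetical tools.
toy_predictions = {
    "tool1": [("humanX", "flyX1"), ("humanX", "flyX2")],
    "tool2": [("humanX", "flyX1")],
    "tool3": [("humanX", "flyX1"), ("humanY", "flyY1")],
}
votes = integrate_predictions(toy_predictions)
wide_net = filter_by_support(votes, min_tools=1)         # maximize sensitivity
high_confidence = filter_by_support(votes, min_tools=3)  # maximize specificity
```

Lowering `min_tools` favors sensitivity, whereas raising it restricts the results to pairs on which the tools agree.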
BACKGROUND: A genomic catalogue of protein-protein interactions is a rich source of information, particularly for exploring the relationships between proteins. Numerous systems-wide and small-scale experiments have been conducted to identify interactions; however, our knowledge of all interactions for any one species is incomplete, and alternative means to expand these network maps are needed. We therefore took a comparative biology approach to predict protein-protein interactions across five species (human, mouse, fly, worm, and yeast) and developed InterologFinder for research biologists to easily navigate this data. We also developed a confidence score for interactions based on available experimental evidence and conservation across species. RESULTS: The connectivity of the resultant networks was determined to have scale-free distribution, small-world properties, and increased local modularity, indicating that the added interactions do not disrupt our current understanding of protein network structures. We show examples of how these improved interactomes can be used to analyze a genome-scale dataset (RNAi screen) and to assign new function to proteins. Predicted interactions within this dataset were tested by co-immunoprecipitation, resulting in a high rate of validation, suggesting the high quality of networks produced. CONCLUSIONS: Protein-protein interactions were predicted in five species, based on orthology. An InteroScore, a score accounting for homology, number of orthologues with evidence of interactions, and number of unique observations of interactions, is given to each known and predicted interaction. Our website http://www.interologfinder.org provides research biologists intuitive access to this data.
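Orthology-based prediction of interactions can be illustrated with a minimal sketch: an interaction observed between two proteins in one species is transferred to every pair of their orthologs in another species. The function name and all protein identifiers are hypothetical, self-interactions are skipped as a simplifying assumption, and the sketch does not compute the InteroScore.

```python
def predict_interologs(source_interactions, orthologs):
    """Transfer interactions from a source species to a target species.

    source_interactions: iterable of (protein_a, protein_b) pairs.
    orthologs: dict mapping each source protein to its target-species orthologs.
    Returns a set of unordered target-species pairs (frozensets).
    """
    predicted = set()
    for protein_a, protein_b in source_interactions:
        for ortholog_a in orthologs.get(protein_a, ()):
            for ortholog_b in orthologs.get(protein_b, ()):
                # Simplifying assumption: ignore predicted self-interactions.
                if ortholog_a != ortholog_b:
                    predicted.add(frozenset((ortholog_a, ortholog_b)))
    return predicted


# Toy data: yeast-like source proteins and human-like orthologs (made up).
toy_interactions = [("y1", "y2"), ("y2", "y3")]
toy_orthologs = {"y1": ["h1"], "y2": ["h2a", "h2b"]}  # y3 has no ortholog
predicted = predict_interologs(toy_interactions, toy_orthologs)
```

Here the `y2`-`y3` interaction yields no prediction because `y3` has no ortholog in the target species, which reflects how gaps in orthology mapping limit the coverage of this kind of transfer.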
Genetic screens in the yeast Saccharomyces cerevisiae have identified many proteins involved in the secretory pathway, most of which have orthologues in higher eukaryotes. To investigate whether there are additional proteins that are required for secretion in metazoans but are absent from yeast, we used genome-wide RNA interference (RNAi) to look for genes required for secretion of recombinant luciferase from Drosophila S2 cells. This identified two novel components of the secretory pathway that are conserved from humans to plants. Gryzun is distantly related to, but distinct from, the Trs130 subunit of the TRAPP complex but is absent from S. cerevisiae. RNAi of human Gryzun (C4orf41) blocks Golgi exit. Kish is a small membrane protein with a previously uncharacterised orthologue in yeast. The screen also identified Drosophila orthologues of almost 60% of the yeast genes essential for secretion. Given this coverage, the small number of novel components suggests that contrary to previous indications the number of essential core components of the secretory pathway is not much greater in metazoans than in yeasts.
Damage initiates a pleiotropic cellular response aimed at cellular survival when appropriate. To identify genes required for damage survival, we used a cell-based RNAi screen against the Drosophila genome and the alkylating agent methyl methanesulphonate (MMS). Similar studies performed in other model organisms report that damage response may involve pleiotropic cellular processes other than the central DNA repair components, yet an intuitive systems level view of the cellular components required for damage survival, their interrelationship, and contextual importance has been lacking. Further, by comparing data from different model organisms, identification of conserved and presumably core survival components should be forthcoming. We identified 307 genes, representing 13 signaling, metabolic, or enzymatic pathways, affecting cellular survival of MMS-induced damage. As expected, the majority of these pathways are involved in DNA repair; however, several pathways with more diverse biological functions were also identified, including the TOR pathway, transcription, translation, proteasome, glutathione synthesis, ATP synthesis, and Notch signaling, and these were equally important in damage survival. Comparison with genomic screen data from Saccharomyces cerevisiae revealed no overlap enrichment of individual genes between the species, but a conservation of the pathways. To demonstrate the functional conservation of pathways, five were tested in Drosophila and mouse cells, with each pathway responding to alkylation damage in both species. Using the protein interactome, a significant level of connectivity was observed between Drosophila MMS survival proteins, suggesting a higher order relationship. This connectivity was dramatically improved by incorporating the components of the 13 identified pathways within the network. Grouping proteins into "pathway nodes" qualitatively improved the interactome organization, revealing a highly organized "MMS survival network." 
We conclude that identification of pathways can facilitate comparative biology analysis when direct gene/orthologue comparisons fail. A biologically intuitive, highly interconnected MMS survival network was revealed after we incorporated pathway data in our interactome analysis.
Yeast genetics and in vitro biochemical analysis have identified numerous genes involved in protein secretion. As compared with yeast, however, the metazoan secretory pathway is more complex and many mechanisms that regulate organization of the Golgi apparatus remain poorly characterized. We performed a genome-wide RNA-mediated interference screen in a Drosophila cell line to identify genes required for constitutive protein secretion. We then classified the genes on the basis of the effect of their depletion on organization of the Golgi membranes. Here we show that depletion of class A genes redistributes Golgi membranes into the endoplasmic reticulum, depletion of class B genes leads to Golgi fragmentation, depletion of class C genes leads to aggregation of Golgi membranes, and depletion of class D genes causes no obvious change. Of the 20 new gene products characterized so far, several localize to the Golgi membranes and the endoplasmic reticulum.