The accumulation of biological and biomedical literature outpaces the ability of most researchers and clinicians to stay abreast of their own immediate fields, let alone a broader range of topics. Although available search tools support identification of relevant literature, finding relevant and key publications is not always straightforward. For example, important publications might be missed in searches with an official gene name due to gene synonyms. Moreover, ambiguity of gene names can result in retrieval of a large number of irrelevant publications. To address these issues and help researchers and physicians quickly identify relevant publications, we developed BioLitMine, an advanced literature mining tool that takes advantage of the medical subject heading (MeSH) index and gene-to-publication annotations already available for PubMed literature. Using BioLitMine, a user can identify what MeSH terms are represented in the set of publications associated with a given gene of the interest, or start with a term and identify relevant publications. Users can also use the tool to find co-cited genes and a build a literature co-citation network. In addition, BioLitMine can help users build a gene list relevant to a MeSH terms, such as a list of genes relevant to "stem cells" or "breast neoplasms." Users can also start with a gene or pathway of interest and identify authors associated with that gene or pathway, a feature that makes it easier to identify experts who might serve as collaborators or reviewers. Altogether, BioLitMine extends the value of PubMed-indexed literature and its existing expert curation by providing a robust and gene-centric approach to retrieval of relevant information.
Model organism and human databases are rich with information about genetic and physical interactions. These data can be used to interpret and guide the analysis of results from new studies and develop new hypotheses. Here, we report the development of the Molecular Interaction Search Tool (MIST; http://fgrtools.hms.harvard.edu/MIST/). The MIST database integrates biological interaction data from yeast, nematode, fly, zebrafish, frog, rat and mouse model systems, as well as human. For individual or short gene lists, the MIST user interface can be used to identify interacting partners based on protein-protein and genetic interaction (GI) data from the species of interest as well as inferred interactions, known as interologs, and to view a corresponding network. The data, interologs and search tools at MIST are also useful for analyzing 'omics datasets. In addition to describing the integrated database, we also demonstrate how MIST can be used to identify an appropriate cut-off value that balances false positive and negative discovery, and present use-cases for additional types of analysis. Altogether, the MIST database and search tools support visualization and navigation of existing protein and GI data, as well as comparison of new and existing data.
One of the most powerful ways to develop hypotheses regarding biological functions of conserved genes in a given species, such as in humans, is to first look at what is known about function in another species. Model organism databases (MODs) and other resources are rich with functional information but difficult to mine. Gene2Function (G2F) addresses a broad need by integrating information about conserved genes in a single online resource.
One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publically available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms are arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research.
The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website (http://fgr.hms.harvard.edu) serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches.
Insulin regulates an essential conserved signaling pathway affecting growth, proliferation, and meta- bolism. To expand our understanding of the insulin pathway, we combine biochemical, genetic, and computational approaches to build a comprehensive Drosophila InR/PI3K/Akt network. First, we map the dynamic protein-protein interaction network sur- rounding the insulin core pathway using bait-prey interactions connecting 566 proteins. Combining RNAi screening and phospho-specific antibodies, we find that 47% of interacting proteins affect pathway activity, and, using quantitative phospho- proteomics, we demonstrate that $10% of interact- ing proteins are regulated by insulin stimulation at the level of phosphorylation. Next, we integrate these orthogonal datasets to characterize the structure and dynamics of the insulin network at the level of protein complexes and validate our method by iden- tifying regulatory roles for the Protein Phosphatase 2A (PP2A) and Reptin-Pontin chromatin-remodeling complexes as negative and positive regulators of ribosome biogenesis, respectively. Altogether, our study represents a comprehensive resource for the study of the evolutionary conserved insulin network.
The protein-protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as "indispensable," "neutral," or "dispensable," which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network's control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.
Alkylating agents are a key component of cancer chemotherapy. Several cellular mechanisms are known to be important for its survival, particularly DNA repair and xenobiotic detoxification, yet genomic screens indicate that additional cellular components may be involved. Elucidating these components has value in either identifying key processes that can be modulated to improve chemotherapeutic efficacy or may be altered in some cancers to confer chemoresistance. We therefore set out to reevaluate our prior Drosophila RNAi screening data by comparison to gene expression arrays in order to determine if we could identify any novel processes in alkylation damage survival. We noted a consistent conservation of alkylation survival pathways across platforms and species when the analysis was conducted on a pathway/process level rather than at an individual gene level. Better results were obtained when combining gene lists from two datasets (RNAi screen plus microarray) prior to analysis. In addition to previously identified DNA damage responses (p53 signaling and Nucleotide Excision Repair), DNA-mRNA-protein metabolism (transcription/translation) and proteasome machinery, we also noted a highly conserved cross-species requirement for NRF2, glutathione (GSH)-mediated drug detoxification and Endoplasmic Reticulum stress (ER stress)/Unfolded Protein Responses (UPR) in cells exposed to alkylation. The requirement for GSH, NRF2 and UPR in alkylation survival was validated by metabolomics, protein studies and functional cell assays. From this we conclude that RNAi/gene expression fusion is a valid strategy to rapidly identify key processes that may be extendable to other contexts beyond damage survival.
We present a resource of high quality lists of functionally related Drosophila genes, e.g. based on protein domains (kinases, transcription factors, etc.) or cellular function (e.g. autophagy, signal transduction). To establish these lists, we relied on different inputs, including curation from databases or the literature and mapping from other species. Moreover, as an added curation and quality control step, we asked experts in relevant fields to review many of the lists. The resource is available online for scientists to search and view, and is editable based on community input. Annotation of gene groups is an ongoing effort and scientific need will typically drive decisions regarding which gene lists to pursue. We anticipate that the number of lists will increase over time; that the composition of some lists will grow and/or change over time as new information becomes available; and that the lists will benefit the scientific community, e.g. at experimental design and data analysis stages. Based on this, we present an easily updatable online database, available at www.flyrnai.org/glad, at which gene group lists can be viewed, searched and downloaded.
A major objective of systems biology is to organize molecular interactions as networks and to characterize information flow within networks. We describe a computational framework to integrate protein-protein interaction (PPI) networks and genetic screens to predict the 'signs' of interactions (i.e., activation-inhibition relationships). We constructed a Drosophila melanogaster signed PPI network consisting of 6,125 signed PPIs connecting 3,352 proteins that can be used to identify positive and negative regulators of signaling pathways and protein complexes. We identified an unexpected role for the metabolic enzymes enolase and aldo-keto reductase as positive and negative regulators of proteolysis, respectively. Characterization of the activation-inhibition relationships between physically interacting proteins within signaling pathways will affect our understanding of many biological functions, including signal transduction and mechanisms of disease.
BACKGROUND: RNA interference (RNAi) is an effective and important tool used to study gene function. For large-scale screens, RNAi is used to systematically down-regulate genes of interest and analyze their roles in a biological process. However, RNAi is associated with off-target effects (OTEs), including microRNA (miRNA)-like OTEs. The contribution of reagent-specific OTEs to RNAi screen data sets can be significant. In addition, the post-screen validation process is time and labor intensive. Thus, the availability of robust approaches to identify candidate off-targeted transcripts would be beneficial. RESULTS: Significant efforts have been made to eliminate false positive results attributable to sequence-specific OTEs associated with RNAi. These approaches have included improved algorithms for RNAi reagent design, incorporation of chemical modifications into siRNAs, and the use of various bioinformatics strategies to identify possible OTEs in screen results. Genome-wide Enrichment of Seed Sequence matches (GESS) was developed to identify potential off-targeted transcripts in large-scale screen data by seed-region analysis. Here, we introduce a user-friendly web application that provides researchers a relatively quick and easy way to perform GESS analysis on data from human or mouse cell-based screens using short interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs), as well as for Drosophila screens using shRNAs. Online GESS relies on up-to-date transcript sequence annotations for human and mouse genes extracted from NCBI Reference Sequence (RefSeq) and Drosophila genes from FlyBase. The tool also accommodates analysis with user-provided reference sequence files. CONCLUSION: Online GESS provides a straightforward user interface for genome-wide seed region analysis for human, mouse and Drosophila RNAi screen data. With the tool, users can either use a built-in database or provide a database of transcripts for analysis. This makes it possible to analyze RNAi data from any organism for which the user can provide transcript sequences.
Drosophila melanogaster has become a system of choice for functional genomic studies. Many resources, including online databases and software tools, are now available to support design or identification of relevant fly stocks and reagents or analysis and mining of existing functional genomic, transcriptomic, proteomic, etc. datasets. These include large community collections of fly stocks and plasmid clones, "meta" information sites like FlyBase and FlyMine, and an increasing number of more specialized reagents, databases, and online tools. Here, we introduce key resources useful to plan large-scale functional genomics studies in Drosophila and to analyze, integrate, and mine the results of those studies in ways that facilitate identification of highest-confidence results and generation of new hypotheses. We also discuss ways in which existing resources can be used and might be improved and suggest a few areas of future development that would further support large- and small-scale studies in Drosophila and facilitate use of Drosophila information by the research community more generally.
Here, I discuss how RNAi screening can be used effectively to uncover gene function. Specifically, I discuss the types of high-throughput assays that can be done in Drosophila cells and in vivo, RNAi reagent design and available reagent collections, automated screen pipelines, analysis of screen results, and approaches to RNAi results verification.
Analysis of high-throughput data increasingly relies on pathway annotation and functional information derived from Gene Ontology. This approach has limitations, in particular for the analysis of network dynamics over time or under different experimental conditions, in which modules within a network rather than complete pathways might respond and change. We report an analysis framework based on protein complexes, which are at the core of network reorganization. We generated a protein complex resource for human, Drosophila, and yeast from the literature and databases of protein-protein interaction networks, with each species having thousands of complexes. We developed COMPLEAT (http://www.flyrnai.org/compleat), a tool for data mining and visualization for complex-based analysis of high-throughput data sets, as well as analysis and integration of heterogeneous proteomics and gene expression data sets. With COMPLEAT, we identified dynamically regulated protein complexes among genome-wide RNA interference data sets that used the abundance of phosphorylated extracellular signal-regulated kinase in cells stimulated with either insulin or epidermal growth factor as the output. The analysis predicted that the Brahma complex participated in the insulin response.
Reactive Oxygen Species (ROS) are a natural by-product of cellular growth and proliferation, and are required for fundamental processes such as protein-folding and signal transduction. However, ROS accumulation, and the onset of oxidative stress, can negatively impact cellular and genomic integrity. Signalling networks have evolved to respond to oxidative stress by engaging diverse enzymatic and non-enzymatic antioxidant mechanisms to restore redox homeostasis. The architecture of oxidative stress response networks during periods of normal growth, and how increased ROS levels dynamically reconfigure these networks are largely unknown. In order to gain insight into the structure of signalling networks that promote redox homeostasis we first performed genome-scale RNAi screens to identify novel suppressors of superoxide accumulation. We then infer relationships between redox regulators by hierarchical clustering of phenotypic signatures describing how gene inhibition affects superoxide levels, cellular viability, and morphology across different genetic backgrounds. Genes that cluster together are likely to act in the same signalling pathway/complex and thus make "functional interactions". Moreover we also calculate differential phenotypic signatures describing the difference in cellular phenotypes following RNAi between untreated cells and cells submitted to oxidative stress. Using both phenotypic signatures and differential signatures we construct a network model of functional interactions that occur between components of the redox homeostasis network, and how such interactions become rewired in the presence of oxidative stress. This network model predicts a functional interaction between the transcription factor Jun and the IRE1 kinase, which we validate in an orthogonal assay. We thus demonstrate the ability of systems-biology approaches to identify novel signalling events.
FlyRNAi (http://www.flyrnai.org), the database and website of the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School, serves a dual role, tracking both production of reagents for RNA interference (RNAi) screening in Drosophila cells and RNAi screen results. The database and website is used as a platform for community availability of protocols, tools, and other resources useful to researchers planning, conducting, analyzing or interpreting the results of Drosophila RNAi screens. Based on our own experience and user feedback, we have made several changes. Specifically, we have restructured the database to accommodate new types of reagents; added information about new RNAi libraries and other reagents; updated the user interface and website; and added new tools of use to the Drosophila community and others. Overall, the result is a more useful, flexible and comprehensive website and database.