Data mining

Arunachalam Vinayagam, Travis E Gibson, Ho-Joon Lee, Bahar Yilmazel, Charles Roesel, Yanhui Hu, Young Kwon, Amitabh Sharma, Yang-Yu Liu, Norbert Perrimon, and Albert-László Barabási. 5/3/2016. “Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets.” Proc Natl Acad Sci U S A, 113, 18, Pp. 4976-81.Abstract

The protein-protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as "indispensable," "neutral," or "dispensable," which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network's control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.

Yanhui Hu, Aram Comjean, Charles Roesel, Arunachalam Vinayagam, Ian Flockhart, Jonathan Zirin, Lizabeth Perkins, Norbert Perrimon, and Stephanie E Mohr. 10/11/2016. “FlyRNAi.org—the database of the Drosophila RNAi screening center and transgenic RNAi project: 2017 update.” Nucleic Acids Research. Publisher's VersionAbstract

The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website (http://fgr.hms.harvard.edu) serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches.

Screenshot of GLAD results with the hh gene

Gene List Annotation for Drosophila (GLAD) online resource updated

October 19, 2016

We recently udpated our Gene List Annotation for Drosophila (GLAD) online resource.  At GLAD you could already view the members of a gene list, such as genes grouped as members of a pathway, process, or sharing a functional domain. Now, you can also ask if a gene of interest is a member of a given group. Please see examples of the two ways to use GLAD below. As always, we welcome your feedback, including suggestions for changes or additions to the curated lists, or for addition of new...

Read more about Gene List Annotation for Drosophila (GLAD) online resource updated
Multi sequence alignments for ALL search best matches

"One vs. All" a new feature in our ortholog search tool

October 3, 2016

Our DIOPT ortholog search tool has been updated to include the option to search for orthologs of a gene in all other species included. So you can search with, for example, a fly gene, and see orthologs in human, mouse, rat, frog, worm, and yeast.

Click on the button "show summary of top scores" to see a heat map view of the top-scoring ortholog matches in other species to your query species. This feature helps you see quickly if a gene has been conserved across many species or is, for example, only found in vertebrates.

As always, our tool supports batch-mode searches (you can...

Read more about "One vs. All" a new feature in our ortholog search tool
Arunachalam Vinayagam, Meghana M Kulkarni, Richelle Sopko, Xiaoyun Sun, Yanhui Hu, Ankita Nand, Christians Villalta, Ahmadali Moghimi, Xuemei Yang, Stephanie E Mohr, Pengyu Hong, John M Asara, and Norbert Perrimon. 9/13/2016. “An Integrative Analysis of the InR/PI3K/Akt Network Identifies the Dynamic Response to Insulin Signaling.” Cell Reports, 16, 11, Pp. 3062-3074.Abstract

Insulin regulates an essential conserved signaling pathway affecting growth, proliferation, and meta- bolism. To expand our understanding of the insulin pathway, we combine biochemical, genetic, and computational approaches to build a comprehensive Drosophila InR/PI3K/Akt network. First, we map the dynamic protein-protein interaction network sur- rounding the insulin core pathway using bait-prey interactions connecting 566 proteins. Combining RNAi screening and phospho-specific antibodies, we find that 47% of interacting proteins affect pathway activity, and, using quantitative phospho- proteomics, we demonstrate that $10% of interact- ing proteins are regulated by insulin stimulation at the level of phosphorylation. Next, we integrate these orthogonal datasets to characterize the structure and dynamics of the insulin network at the level of protein complexes and validate our method by iden- tifying regulatory roles for the Protein Phosphatase 2A (PP2A) and Reptin-Pontin chromatin-remodeling complexes as negative and positive regulators of ribosome biogenesis, respectively. Altogether, our study represents a comprehensive resource for the study of the evolutionary conserved insulin network. 

Screenshot of the Online Tools Overview page

Is it a hit? On mining our data sets.

July 22, 2016

The DRSC/TRiP-FGR's FlyRNAi database stores results from the many cell-based screens done since 2003 using DRSC Drosophila RNAi libraries. It also stores information about knockdown and phenotypes resulting from specific combinations of in vivo RNAi fly stocks (including our TRiP stocks and also VDRC and NIG-Japan stocks). The in vivo data includes directly deposited data and results curated by FlyBase from the literature.

Even if you are not interested to do a fly RNAi screen, these data might help you. For...

Read more about Is it a hit? On mining our data sets.
Ian T Flockhart, Matthew Booker, Yanhui Hu, Benjamin McElvany, Quentin Gilly, Bernard Mathey-Prevot, Norbert Perrimon, and Stephanie E Mohr. 2012. “FlyRNAi.org--the database of the Drosophila RNAi screening center: 2012 update.” Nucleic Acids Res, 40, Database issue, Pp. D715-9.Abstract

FlyRNAi (http://www.flyrnai.org), the database and website of the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School, serves a dual role, tracking both production of reagents for RNA interference (RNAi) screening in Drosophila cells and RNAi screen results. The database and website is used as a platform for community availability of protocols, tools, and other resources useful to researchers planning, conducting, analyzing or interpreting the results of Drosophila RNAi screens. Based on our own experience and user feedback, we have made several changes. Specifically, we have restructured the database to accommodate new types of reagents; added information about new RNAi libraries and other reagents; updated the user interface and website; and added new tools of use to the Drosophila community and others. Overall, the result is a more useful, flexible and comprehensive website and database.

Ralph A Neumüller and Norbert Perrimon. 2011. “Where gene discovery turns into systems biology: genome-scale RNAi screens in Drosophila.” Wiley Interdiscip Rev Syst Biol Med, 3, 4, Pp. 471-8.Abstract

Systems biology aims to describe the complex interplays between cellular building blocks which, in their concurrence, give rise to the emergent properties observed in cellular behaviors and responses. This approach tries to determine the molecular players and the architectural principles of their interactions within the genetic networks that control certain biological processes. Large-scale loss-of-function screens, applicable in various different model systems, have begun to systematically interrogate entire genomes to identify the genes that contribute to a certain cellular response. In particular, RNA interference (RNAi)-based high-throughput screens have been instrumental in determining the composition of regulatory systems and paired with integrative data analyses have begun to delineate the genetic networks that control cell biological and developmental processes. Through the creation of tools for both, in vitro and in vivo genome-wide RNAi screens, Drosophila melanogaster has emerged as one of the key model organisms in systems biology research and over the last years has massively contributed to and hence shaped this discipline. WIREs Syst Biol Med 2011 3 471-478 DOI: 10.1002/wsbm.127

Amy M Wiles, Dashnamoorthy Ravi, Selvaraj Bhavani, and Alexander JR Bishop. 2008. “An analysis of normalization methods for Drosophila RNAi genomic screens and development of a robust validation scheme.” J Biomol Screen, 13, 8, Pp. 777-84.Abstract

Genome-wide RNA interference (RNAi) screening allows investigation of the role of individual genes in a process of choice. Most RNAi screens identify a large number of genes with a continuous gradient in the assessed phenotype. Screeners must decide whether to examine genes with the most robust phenotype or the full gradient of genes that cause an effect and how to identify candidate genes. The authors have used RNAi in Drosophila cells to examine viability in a 384-well plate format and compare 2 screens, untreated control and treatment. They compare multiple normalization methods, which take advantage of different features within the data, including quantile normalization, background subtraction, scaling, cellHTS2 (Boutros et al. 2006), and interquartile range measurement. Considering the false-positive potential that arises from RNAi technology, a robust validation method was designed for the purpose of gene selection for future investigations. In a retrospective analysis, the authors describe the use of validation data to evaluate each normalization method. Although no method worked ideally, a combination of 2 methods, background subtraction followed by quantile normalization and cellHTS2, at different thresholds, captures the most dependable and diverse candidate genes. Thresholds are suggested depending on whether a few candidate genes are desired or a more extensive systems-level analysis is sought. The normalization approaches and experimental design to perform validation experiments are likely to apply to those high-throughput screening systems attempting to identify genes for systems-level analysis.

Bahar Yilmazel, Yanhui Hu, Frederic Sigoillot, Jennifer A Smith, Caroline E Shamu, Norbert Perrimon, and Stephanie E Mohr. 2014. “Online GESS: prediction of miRNA-like off-target effects in large-scale RNAi screen data by seed region analysis.” BMC Bioinformatics, 15, Pp. 192.Abstract

BACKGROUND: RNA interference (RNAi) is an effective and important tool used to study gene function. For large-scale screens, RNAi is used to systematically down-regulate genes of interest and analyze their roles in a biological process. However, RNAi is associated with off-target effects (OTEs), including microRNA (miRNA)-like OTEs. The contribution of reagent-specific OTEs to RNAi screen data sets can be significant. In addition, the post-screen validation process is time and labor intensive. Thus, the availability of robust approaches to identify candidate off-targeted transcripts would be beneficial. RESULTS: Significant efforts have been made to eliminate false positive results attributable to sequence-specific OTEs associated with RNAi. These approaches have included improved algorithms for RNAi reagent design, incorporation of chemical modifications into siRNAs, and the use of various bioinformatics strategies to identify possible OTEs in screen results. Genome-wide Enrichment of Seed Sequence matches (GESS) was developed to identify potential off-targeted transcripts in large-scale screen data by seed-region analysis. Here, we introduce a user-friendly web application that provides researchers a relatively quick and easy way to perform GESS analysis on data from human or mouse cell-based screens using short interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs), as well as for Drosophila screens using shRNAs. Online GESS relies on up-to-date transcript sequence annotations for human and mouse genes extracted from NCBI Reference Sequence (RefSeq) and Drosophila genes from FlyBase. The tool also accommodates analysis with user-provided reference sequence files. CONCLUSION: Online GESS provides a straightforward user interface for genome-wide seed region analysis for human, mouse and Drosophila RNAi screen data. With the tool, users can either use a built-in database or provide a database of transcripts for analysis. This makes it possible to analyze RNAi data from any organism for which the user can provide transcript sequences.

Pages