Online tools

Multi-Species        

DIOPT ortholog search
10 species, 18 algorithms
[Demo Video]

Gene2Function
orthologs & gene info summaries (orthologs, GO, & more)
[Demo Video]

BioLitMine
literature mining tool (genes, pathways, people, MeSH terms)
[Demo Video]

MIST
protein-protein & genetic interactions (multi-source)
[Demo Video]

MARRVEL
Connect human gene variants to ortholog info (multi-source)

DIOPT-DIST
Connect disease genes to ortholog info or vice versa (OMIM & GWAS)

Fly CRISPR        

TRiP sgRNA LIMS
nominate or track TRiP-KO & -OE fly stock production

Find CRISPRs
fly sgRNA designs with genome view
(2017 version)

CRISPR3

Find CRISPRs 3
fly sgRNA designs with genome view
(2019 version)
[Demo Video]

CRIMIC
nominate for GDP gene trap fly stocks

SNP CRISPR
design allele-specific sgRNA for major model organisms

Fly RNAi

UP-TORR
cell and in vivo RNAi reagent search

SnapDragon
design dsRNAs for fly cell RNAi

RSVP Plus
in vivo CRISPR & RNAi phenotype data

Screen Summary
browse DRSC cell RNAi screen data sets

GeneLookup
(search DRSC & TRiP reagents by gene)

TRiP Batch Query
(make a TRiP fly stock list from a gene list)

More fly resources

DGET
mine bulk RNAseq data for fly

GLAD
view grouped gene lists for fly

FlyPrimerBank
find qPCR primers for fly studies

Paralogs Explorer
find paralogs & info

More fly resources and utility tools

Cell Line Expression
HRMA online tool
List of Utility Tools

Fly PTMs

iProteinDB
post-translational modifications
[Demo Video]
Network & Pathway

COMPLEAT
protein complex enrichment analysis (fly, human, yeast)
[Demo Video]

Signed PPI / DirectedPPI
(no longer available)

InsulinNet
fly insulin network data (PPI, RNAi, PTM +/- insulin)
[Demo Video]

PathOn
find networks in your RNAseq data

Fly Protocols

Drosophila Protocols Portal
Data Sets

List of Data Sets (DRSC & Perrimon) with links
Single-cell RNAseq data sets for specific tissues
InsulinNet fly PPI, RNAi, PTM data (Vinayagam et al.)
MitoMax fly mitochondrial proteomics data set (Chen et al.)
PhosphoSite fly kinase network data set (Sopko et al.)
Pooled CRISPR fly cell screen raw data sets (Viswanatha et al.)
Nucleolar fly cell RNAi image-based screen data set (Neumuller et al.)
DRSC RNA Binding RNAi library fly cell screen data set (Mohr et al.)
Table of all public DRSC cell-based RNAi screen data sets at Screen Summary
FlyBi Drosophila Y2H binary interaction data (DRSC/CCSB/BDGP)
Mosquito resources

Ortholog Mapping
(D. mel <-> mosquitos)
[Demo Video]

CRISPR sgRNA designs
(mosquitos)
[Demo Video]

Single-Cell Resource

DRscDB single-cell portal

 

CONTACT US

report a bug or make a suggestion


Citations help us keep going

Be sure to cite our online resources if they contribute to a published study. You can cite our latest Nucleic Acids Research Database Issue paper and/or the publication(s) corresponding to the specific online resource(s) that you used. See below.

Bioinformatics Publications

Raghuvir Viswanatha, Enzo Mameli, Jonathan Rodiger, Pierre Merckaert, Fabiana Feitosa-Suntheimer, Tonya M. Colpitts, Stephanie E. Mohr, Yanhui Hu, and Norbert Perrimon. Working Paper. “Bioinformatic and cell-based tools for pooled CRISPR knockout screening in mosquitos.” bioRxiv.
Mosquito-borne diseases present a worldwide public health burden. Genome-scale screening tools that could inform our understanding of mosquitos and their control are lacking. Here, we adapt a recombination-mediated cassette exchange system for delivery of CRISPR sgRNA libraries into cell lines from several mosquito species and perform pooled CRISPR screens in an Anopheles cell line. To implement this method, we engineered modified mosquito cell lines, validated promoters and developed bioinformatics tools for multiple mosquito species.
A.M. Conard, N. Goodman, Y. Hu, N. Perrimon, R. Singh, C. Lawrence, and E. Larschan. Submitted. “TIMEOR: a web-based tool to uncover temporal regulatory mechanisms from multi-omics data.” bioRxiv.
Uncovering how transcription factors (TFs) regulate their targets at the DNA, RNA and protein levels over time is critical to define gene regulatory networks (GRNs) in normal and diseased states. RNA-seq has become a standard method to measure gene regulation using an established set of analysis steps. However, none of the currently available pipeline methods for interpreting ordered genomic data (in time or space) use time series models to assign cause and effect relationships within GRNs, are adaptive to diverse experimental designs, or enable user interpretation through a web-based platform. Furthermore, methods which integrate ordered RNA-seq data with transcription factor binding data are urgently needed. Here, we present TIMEOR (Trajectory Inference and Mechanism Exploration with Omics data in R), the first web-based and adaptive time series multi-omics pipeline method which infers the relationship between gene regulatory events across time. TIMEOR addresses the critical need for methods to predict causal regulatory mechanism networks between TFs from time series multi-omics data. We used TIMEOR to identify a new link between insulin stimulation and the circadian rhythm cycle. TIMEOR is available at https://github.com/ashleymaeconard/TIMEOR.git.
Yanhui Hu, Sudhir Gopal Tattikota, Yifang Liu, Aram Comjean, Yue Gao, Corey Forman, Grace Kim, Jonathan Rodiger, Irene Papatheodorou, Gilberto Dos Santos, Stephanie E Mohr, and Norbert Perrimon. 2021. “DRscDB: A single-cell RNA-seq resource for data mining and data comparison across species.” Comput Struct Biotechnol J, 19, Pp. 2018-2026.
With the advent of single-cell RNA sequencing (scRNA-seq) technologies, there has been a spike in studies involving scRNA-seq of several tissues across diverse species including Drosophila. Although a few databases exist for users to query genes of interest within the scRNA-seq studies, search tools that enable users to find orthologous genes and their cell type-specific expression patterns across species are limited. Here, we built a new search database, DRscDB (https://www.flyrnai.org/tools/single_cell/web/), to address this need. DRscDB serves as a comprehensive repository for published scRNA-seq datasets for Drosophila and relevant datasets from human and other model organisms. DRscDB is based on manual curation of Drosophila scRNA-seq studies of various tissue types and their corresponding analogous tissues in vertebrates including zebrafish, mouse, and human. Of note, our search database provides most of the literature-derived marker genes, thus preserving the original analysis of the published scRNA-seq datasets. Finally, DRscDB serves as a web-based user interface that allows users to mine gene expression data from scRNA-seq studies and perform cell cluster enrichment analyses pertaining to various scRNA-seq studies, both within and across species.
Xuechun Feng, Víctor López Del Amo, Enzo Mameli, Megan Lee, Alena L Bishop, Norbert Perrimon, and Valentino M Gantz. 2021. “Optimized CRISPR tools and site-directed transgenesis towards gene drive development in Culex quinquefasciatus mosquitoes.” Nat Commun, 12, 1, Pp. 2960.
Culex mosquitoes are a global vector for multiple human and animal diseases, including West Nile virus, lymphatic filariasis, and avian malaria, posing a constant threat to public health, livestock, companion animals, and endangered birds. While rising insecticide resistance has threatened the control of Culex mosquitoes, advances in CRISPR genome-editing tools have fostered the development of alternative genetic strategies such as gene drive systems to fight disease vectors. However, though gene-drive technology has quickly progressed in other mosquitoes, advances have been lacking in Culex. Here, we develop a Culex-specific Cas9/gRNA expression toolkit and use site-directed homology-based transgenesis to generate and validate a Culex quinquefasciatus Cas9-expressing line. We show that gRNA scaffold variants improve transgenesis efficiency in both Culex quinquefasciatus and Drosophila melanogaster and boost gene-drive performance in the fruit fly. These findings support future technology development to control Culex mosquitoes and provide valuable insight for improving these tools in other species.
Yanhui Hu, Verena Chung, Aram Comjean, Jonathan Rodiger, Fnu Nipun, Norbert Perrimon, and Stephanie E Mohr. 2020. “BioLitMine: Advanced Mining of Biomedical and Biological Literature About Human Genes and Genes from Major Model Organisms.” G3 (Bethesda).
The accumulation of biological and biomedical literature outpaces the ability of most researchers and clinicians to stay abreast of their own immediate fields, let alone a broader range of topics. Although available search tools support identification of relevant literature, finding relevant and key publications is not always straightforward. For example, important publications might be missed in searches with an official gene name due to gene synonyms. Moreover, ambiguity of gene names can result in retrieval of a large number of irrelevant publications. To address these issues and help researchers and physicians quickly identify relevant publications, we developed BioLitMine, an advanced literature mining tool that takes advantage of the medical subject heading (MeSH) index and gene-to-publication annotations already available for PubMed literature. Using BioLitMine, a user can identify what MeSH terms are represented in the set of publications associated with a given gene of interest, or start with a term and identify relevant publications. Users can also use the tool to find co-cited genes and build a literature co-citation network. In addition, BioLitMine can help users build a gene list relevant to a MeSH term, such as a list of genes relevant to "stem cells" or "breast neoplasms." Users can also start with a gene or pathway of interest and identify authors associated with that gene or pathway, a feature that makes it easier to identify experts who might serve as collaborators or reviewers. Altogether, BioLitMine extends the value of PubMed-indexed literature and its existing expert curation by providing a robust and gene-centric approach to retrieval of relevant information.
Chiao-Lin Chen, Jonathan Rodiger, Verena Chung, Raghuvir Viswanatha, Stephanie E Mohr, Yanhui Hu, and Norbert Perrimon. 2019. “SNP-CRISPR: A Web Tool for SNP-Specific Genome Editing.” G3 (Bethesda).
CRISPR-Cas9 is a powerful genome editing technology in which a short guide RNA (sgRNA) confers target site specificity to achieve Cas9-mediated genome editing. Numerous sgRNA design tools have been developed based on reference genomes for humans and model organisms. However, existing resources are not optimal as genetic mutations or single nucleotide polymorphisms (SNPs) within the targeting region affect the efficiency of CRISPR-based approaches by interfering with guide-target complementarity. To facilitate identification of sgRNAs (1) in non-reference genomes, (2) across varying genetic backgrounds, or (3) for specific targeting of SNP-containing alleles, for example, disease relevant mutations, we developed a web tool, SNP-CRISPR (https://www.flyrnai.org/tools/snp_crispr/). SNP-CRISPR can be used to design sgRNAs based on public variant data sets or user-identified variants. In addition, the tool computes efficiency and specificity scores for sgRNA designs targeting both the variant and the reference. Moreover, SNP-CRISPR provides the option to upload multiple SNPs and target single or multiple nearby base changes simultaneously with a single sgRNA design. Given these capabilities, SNP-CRISPR has a wide range of potential research applications in model systems and potential applications for design of sgRNAs for disease-associated mutant correction.
Yanhui Hu, Richelle Sopko, Verena Chung, Marianna Foos, Romain A Studer, Sean D Landry, Daniel Liu, Leonard Rabinow, Florian Gnad, Pedro Beltrao, and Norbert Perrimon. 2019. “iProteinDB: An Integrative Database of Post-translational Modifications.” G3 (Bethesda), 9, 1, Pp. 1-11.
Post-translational modification (PTM) serves as a regulatory mechanism for protein function, influencing their stability, interactions, activity and localization, and is critical in many signaling pathways. The best characterized PTM is phosphorylation, whereby a phosphate is added to an acceptor residue, most commonly serine, threonine and tyrosine in metazoans. As proteins are often phosphorylated at multiple sites, identifying those sites that are important for function is a challenging problem. Considering that any given phosphorylation site might be non-functional, prioritizing evolutionarily conserved phosphosites provides a general strategy to identify the putative functional sites. To facilitate the identification of conserved phosphosites, we generated a large-scale phosphoproteomics dataset from embryos collected from six closely-related species. We built iProteinDB (https://www.flyrnai.org/tools/iproteindb/), a resource integrating these data with other high-throughput PTM datasets, including vertebrates, and manually curated information for Drosophila. At iProteinDB, scientists can view the PTM landscape for any protein and identify predicted functional phosphosites based on a comparative analysis of data from closely-related species. Further, iProteinDB enables comparison of PTM data from Drosophila to that of orthologous proteins from other model organisms, including human, mouse, and rat.
Yanhui Hu, Arunachalam Vinayagam, Ankita Nand, Aram Comjean, Verena Chung, Tong Hao, Stephanie E Mohr, and Norbert Perrimon. 11/16/2017. “Molecular Interaction Search Tool (MIST): an integrated resource for mining gene and protein interaction data.” Nucleic Acids Res, 46, D1, Pp. D567-D574.
Model organism and human databases are rich with information about genetic and physical interactions. These data can be used to interpret and guide the analysis of results from new studies and develop new hypotheses. Here, we report the development of the Molecular Interaction Search Tool (MIST; http://fgrtools.hms.harvard.edu/MIST/). The MIST database integrates biological interaction data from yeast, nematode, fly, zebrafish, frog, rat and mouse model systems, as well as human. For individual or short gene lists, the MIST user interface can be used to identify interacting partners based on protein-protein and genetic interaction (GI) data from the species of interest as well as inferred interactions, known as interologs, and to view a corresponding network. The data, interologs and search tools at MIST are also useful for analyzing 'omics datasets. In addition to describing the integrated database, we also demonstrate how MIST can be used to identify an appropriate cut-off value that balances false positive and negative discovery, and present use-cases for additional types of analysis. Altogether, the MIST database and search tools support visualization and navigation of existing protein and GI data, as well as comparison of new and existing data.
Julia Wang, Rami Al-Ouran, Yanhui Hu, Seon-Young Kim, Ying-Wooi Wan, Michael F Wangler, Shinya Yamamoto, Hsiao-Tuan Chao, Aram Comjean, Stephanie E Mohr, Undiagnosed Diseases Network, Norbert Perrimon, Zhandong Liu, and Hugo J Bellen. 6/1/2017. “MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.” Am J Hum Genet, 100, 6, Pp. 843-853.
One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms are arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research.
Yanhui Hu, Aram Comjean, Stephanie E Mohr, The FlyBase Consortium, and Norbert Perrimon. 8/7/2017. “Gene2Function: An Integrated Online Resource for Gene Function Discovery.” G3 (Bethesda).
One of the most powerful ways to develop hypotheses regarding biological functions of conserved genes in a given species, such as in humans, is to first look at what is known about function in another species. Model organism databases (MODs) and other resources are rich with functional information but difficult to mine. Gene2Function (G2F) addresses a broad need by integrating information about conserved genes in a single online resource.
Arunachalam Vinayagam, Travis E Gibson, Ho-Joon Lee, Bahar Yilmazel, Charles Roesel, Yanhui Hu, Young Kwon, Amitabh Sharma, Yang-Yu Liu, Norbert Perrimon, and Albert-László Barabási. 5/3/2016. “Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets.” Proc Natl Acad Sci U S A, 113, 18, Pp. 4976-81.

The protein-protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as "indispensable," "neutral," or "dispensable," which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network's control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.
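
The node classification described in this abstract can be illustrated with a small sketch. It assumes the standard maximum-matching formulation of structural controllability, in which the number of driver nodes equals the number of nodes minus the size of a maximum matching (with a minimum of one). The abstract itself does not spell out the algorithm, and the function names and toy networks below are hypothetical.

```python
# Sketch only: assumes driver nodes = N - |maximum matching| (minimum 1).
# Function names and example networks are illustrative, not from the paper.

def max_matching_size(nodes, edges):
    """Size of a maximum matching in the bipartite out-copy -> in-copy graph."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_to = {}  # in-copy v -> out-copy u currently matched to it

    def try_augment(u, seen):
        # Depth-first search for an augmenting path starting at out-copy u.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_to or try_augment(matched_to[v], seen):
                matched_to[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in nodes)


def driver_node_count(nodes, edges):
    """Number of driver nodes needed to control the directed network."""
    nodes = list(nodes)
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def classify_nodes(nodes, edges):
    """Label each node by how its removal changes the driver node count."""
    base = driver_node_count(nodes, edges)
    labels = {}
    for n in nodes:
        rest = [m for m in nodes if m != n]
        rest_edges = [(u, v) for u, v in edges if n not in (u, v)]
        after = driver_node_count(rest, rest_edges)
        if after > base:
            labels[n] = "indispensable"
        elif after < base:
            labels[n] = "dispensable"
        else:
            labels[n] = "neutral"
    return labels


if __name__ == "__main__":
    # Toy chain A -> B -> C: removing the middle node breaks the chain.
    print(classify_nodes(["A", "B", "C"], [("A", "B"), ("B", "C")]))
```

In the toy chain, removing B leaves two isolated nodes that each need their own driver, so B is labeled indispensable, while removing either end leaves the count unchanged.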

Yanhui Hu, Aram Comjean, Charles Roesel, Arunachalam Vinayagam, Ian Flockhart, Jonathan Zirin, Lizabeth Perkins, Norbert Perrimon, and Stephanie E Mohr. 10/11/2016. “FlyRNAi.org—the database of the Drosophila RNAi screening center and transgenic RNAi project: 2017 update.” Nucleic Acids Research.

The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website (http://fgr.hms.harvard.edu) serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches.

Arunachalam Vinayagam, Meghana M Kulkarni, Richelle Sopko, Xiaoyun Sun, Yanhui Hu, Ankita Nand, Christians Villalta, Ahmadali Moghimi, Xuemei Yang, Stephanie E Mohr, Pengyu Hong, John M Asara, and Norbert Perrimon. 9/13/2016. “An Integrative Analysis of the InR/PI3K/Akt Network Identifies the Dynamic Response to Insulin Signaling.” Cell Reports, 16, 11, Pp. 3062-3074.

Insulin regulates an essential conserved signaling pathway affecting growth, proliferation, and metabolism. To expand our understanding of the insulin pathway, we combine biochemical, genetic, and computational approaches to build a comprehensive Drosophila InR/PI3K/Akt network. First, we map the dynamic protein-protein interaction network surrounding the insulin core pathway using bait-prey interactions connecting 566 proteins. Combining RNAi screening and phospho-specific antibodies, we find that 47% of interacting proteins affect pathway activity, and, using quantitative phosphoproteomics, we demonstrate that ~10% of interacting proteins are regulated by insulin stimulation at the level of phosphorylation. Next, we integrate these orthogonal datasets to characterize the structure and dynamics of the insulin network at the level of protein complexes and validate our method by identifying regulatory roles for the Protein Phosphatase 2A (PP2A) and Reptin-Pontin chromatin-remodeling complexes as negative and positive regulators of ribosome biogenesis, respectively. Altogether, our study represents a comprehensive resource for the study of the evolutionarily conserved insulin network.

Stephanie E Mohr, Yanhui Hu, Benjamin Ewen-Campen, Benjamin E Housden, Raghuvir Viswanatha, and Norbert Perrimon. 2016. “CRISPR guide RNA design for research applications.” FEBS J.

The rapid rise of CRISPR as a technology for genome engineering and related research applications has created a need for algorithms and associated online tools that facilitate design of on-target and effective guide RNAs (gRNAs). Here, we review the state-of-the-art in CRISPR gRNA design for research applications of the CRISPR-Cas9 system, including knockout, activation and inhibition. Notably, achieving good gRNA design is not solely dependent on innovations in CRISPR technology. Good design and design tools also rely on availability of high-quality genome sequence and gene annotations, as well as on availability of accumulated data regarding off-targets and effectiveness metrics.

Lizabeth A Perkins, Laura Holderbaum, Rong Tao, Yanhui Hu, Richelle Sopko, Kim McCall, Donghui Yang-Zhou, Ian Flockhart, Richard Binari, Hye-Seok Shim, Audrey Miller, Amy Housden, Marianna Foos, Sakara Randkelv, Colleen Kelley, Pema Namgyal, Christians Villalta, Lu-Ping Liu, Xia Jiang, Qiao Huan-Huan, Xia Wang, Asao Fujiyama, Atsushi Toyoda, Kathleen Ayers, Allison Blum, Benjamin Czech, Ralph Neumuller, Dong Yan, Amanda Cavallaro, Karen Hibbard, Don Hall, Lynn Cooley, Gregory J Hannon, Ruth Lehmann, Annette Parks, Stephanie E Mohr, Ryu Ueda, Shu Kondo, Jian-Quan Ni, and Norbert Perrimon. 2015. “The Transgenic RNAi Project at Harvard Medical School: Resources and Validation.” Genetics, 201, 3, Pp. 843-52.

To facilitate large-scale functional studies in Drosophila, the Drosophila Transgenic RNAi Project (TRiP) at Harvard Medical School (HMS) was established along with several goals: developing efficient vectors for RNAi that work in all tissues, generating a genome-scale collection of RNAi stocks with input from the community, distributing the lines as they are generated through existing stock centers, validating as many lines as possible using RT-qPCR and phenotypic analyses, and developing tools and web resources for identifying RNAi lines and retrieving existing information on their quality. With these goals in mind, here we describe in detail the various tools we developed and the status of the collection, which is currently composed of 11,491 lines and covering 71% of Drosophila genes. Data on the characterization of the lines either by RT-qPCR or phenotype is available on a dedicated website, the RNAi Stock Validation and Phenotypes Project (RSVP, http://www.flyrnai.org/RSVP.html), and stocks are available from three stock centers, the Bloomington Drosophila Stock Center (United States), National Institute of Genetics (Japan), and TsingHua Fly Center (China).

Stephanie E Mohr, Yanhui Hu, Kirstin Rudd, Michael Buckner, Quentin Gilly, Blake Foster, Katarzyna Sierzputowska, Aram Comjean, Bing Ye, and Norbert Perrimon. 2015. “Reagent and Data Resources for Investigation of RNA Binding Protein Functions in Drosophila melanogaster Cultured Cells.” G3 (Bethesda), 5, 9, Pp. 1919-24.

RNA binding proteins (RBPs) are involved in many cellular functions. To facilitate functional characterization of RBPs, we generated an RNA interference (RNAi) library for Drosophila cell-based screens comprising reagents targeting known or putative RBPs. To test the quality of the library and provide a baseline analysis of the effects of the RNAi reagents on viability, we screened the library using a total ATP assay and high-throughput imaging in Drosophila S2R+ cultured cells. The results are consistent with production of a high-quality library that will be useful for functional genomics studies using other assays. Altogether, we provide resources in the form of an initial curated list of Drosophila RBPs; an RNAi screening library we expect to be used with additional assays that address more specific biological questions; and total ATP and image data useful for comparison of those additional assay results with fundamental information such as effects of a given reagent in the library on cell viability. Importantly, we make the baseline data, including more than 200,000 images, easily accessible online.

Yanhui Hu, Aram Comjean, Lizabeth A Perkins, Norbert Perrimon, and Stephanie E Mohr. 2015. “GLAD: an Online Database of Gene List Annotation for Drosophila.” J Genomics, 3, Pp. 75-81.

We present a resource of high quality lists of functionally related Drosophila genes, e.g. based on protein domains (kinases, transcription factors, etc.) or cellular function (e.g. autophagy, signal transduction). To establish these lists, we relied on different inputs, including curation from databases or the literature and mapping from other species. Moreover, as an added curation and quality control step, we asked experts in relevant fields to review many of the lists. The resource is available online for scientists to search and view, and is editable based on community input. Annotation of gene groups is an ongoing effort and scientific need will typically drive decisions regarding which gene lists to pursue. We anticipate that the number of lists will increase over time; that the composition of some lists will grow and/or change over time as new information becomes available; and that the lists will benefit the scientific community, e.g. at experimental design and data analysis stages. Based on this, we present an easily updatable online database, available at www.flyrnai.org/glad, at which gene group lists can be viewed, searched and downloaded.

Benjamin E Housden, Shuailiang Lin, and Norbert Perrimon. 2014. “Cas9-based genome editing in Drosophila.” Methods Enzymol, 546, Pp. 415-39.

Our ability to modify the Drosophila genome has recently been revolutionized by the development of the CRISPR system. The simplicity and high efficiency of this system allows its widespread use for many different applications, greatly increasing the range of genome modification experiments that can be performed. Here, we first discuss some general design principles for genome engineering experiments in Drosophila and then present detailed protocols for the production of CRISPR reagents and screening strategies to detect successful genome modification events in both tissue culture cells and animals.

Arunachalam Vinayagam, Jonathan Zirin, Charles Roesel, Yanhui Hu, Bahar Yilmazel, Anastasia A Samsonova, Ralph A Neumüller, Stephanie E Mohr, and Norbert Perrimon. 2014. “Integrating protein-protein interaction networks with phenotypes reveals signs of interactions.” Nat Methods, 11, 1, Pp. 94-9.

A major objective of systems biology is to organize molecular interactions as networks and to characterize information flow within networks. We describe a computational framework to integrate protein-protein interaction (PPI) networks and genetic screens to predict the 'signs' of interactions (i.e., activation-inhibition relationships). We constructed a Drosophila melanogaster signed PPI network consisting of 6,125 signed PPIs connecting 3,352 proteins that can be used to identify positive and negative regulators of signaling pathways and protein complexes. We identified an unexpected role for the metabolic enzymes enolase and aldo-keto reductase as positive and negative regulators of proteolysis, respectively. Characterization of the activation-inhibition relationships between physically interacting proteins within signaling pathways will affect our understanding of many biological functions, including signal transduction and mechanisms of disease.

Bahar Yilmazel, Yanhui Hu, Frederic Sigoillot, Jennifer A Smith, Caroline E Shamu, Norbert Perrimon, and Stephanie E Mohr. 2014. “Online GESS: prediction of miRNA-like off-target effects in large-scale RNAi screen data by seed region analysis.” BMC Bioinformatics, 15, Pp. 192.

BACKGROUND: RNA interference (RNAi) is an effective and important tool used to study gene function. For large-scale screens, RNAi is used to systematically down-regulate genes of interest and analyze their roles in a biological process. However, RNAi is associated with off-target effects (OTEs), including microRNA (miRNA)-like OTEs. The contribution of reagent-specific OTEs to RNAi screen data sets can be significant. In addition, the post-screen validation process is time and labor intensive. Thus, the availability of robust approaches to identify candidate off-targeted transcripts would be beneficial. RESULTS: Significant efforts have been made to eliminate false positive results attributable to sequence-specific OTEs associated with RNAi. These approaches have included improved algorithms for RNAi reagent design, incorporation of chemical modifications into siRNAs, and the use of various bioinformatics strategies to identify possible OTEs in screen results. Genome-wide Enrichment of Seed Sequence matches (GESS) was developed to identify potential off-targeted transcripts in large-scale screen data by seed-region analysis. Here, we introduce a user-friendly web application that provides researchers a relatively quick and easy way to perform GESS analysis on data from human or mouse cell-based screens using short interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs), as well as for Drosophila screens using shRNAs. Online GESS relies on up-to-date transcript sequence annotations for human and mouse genes extracted from NCBI Reference Sequence (RefSeq) and Drosophila genes from FlyBase. The tool also accommodates analysis with user-provided reference sequence files. CONCLUSION: Online GESS provides a straightforward user interface for genome-wide seed region analysis for human, mouse and Drosophila RNAi screen data. With the tool, users can either use a built-in database or provide a database of transcripts for analysis. This makes it possible to analyze RNAi data from any organism for which the user can provide transcript sequences.
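
The seed-region idea in the abstract above can be illustrated with a minimal sketch. It is not the Online GESS implementation. It assumes the seed is nucleotides 2-8 of the guide strand and that a match is an exact occurrence of the seed's reverse complement in a transcript sequence. The function names (`seed_site`, `seed_match_fractions`) and the example sequences are hypothetical. The sketch only compares how often active and inactive reagents carry a seed match to one transcript; a real analysis would add a statistical test across all transcripts.

```python
def reverse_complement(seq):
    """Reverse-complement a nucleotide sequence (RNA input is treated as DNA)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper().replace("U", "T")))


def seed_site(guide, start=1, end=8):
    """Transcript-side site for the guide's seed.

    Assumption: the seed is positions 2-8 (1-based) of the guide strand,
    i.e. the slice guide[1:8].
    """
    return reverse_complement(guide[start:end])


def seed_match_fractions(active, inactive, transcript):
    """Fraction of active and of inactive guides with a seed site in the transcript."""
    target = transcript.upper().replace("U", "T")

    def fraction(guides):
        if not guides:
            return 0.0
        return sum(seed_site(g) in target for g in guides) / len(guides)

    return fraction(active), fraction(inactive)


# Hypothetical example: one active and one inactive reagent, one transcript.
active_guides = ["UAGCUUAUCAGACUGAUGUUGA"]
inactive_guides = ["UGAGGUAGUAGGUUGUAUAGUU"]
transcript_seq = "GGGATAAGCTCCC"
print(seed_match_fractions(active_guides, inactive_guides, transcript_seq))
```

A transcript whose seed-match fraction is much higher among active reagents than among inactive ones would be a candidate off-target in this simplified picture.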

Xingjie Ren, Jin Sun, Benjamin E Housden, Yanhui Hu, Charles Roesel, Shuailiang Lin, Lu-Ping Liu, Zhihao Yang, Decai Mao, Lingzhu Sun, Qujie Wu, Jun-Yuan Ji, Jianzhong Xi, Stephanie E Mohr, Jiang Xu, Norbert Perrimon, and Jian-Quan Ni. 2013. “Optimized gene editing technology for Drosophila melanogaster using germ line-specific Cas9.” Proc Natl Acad Sci U S A, 110, 47, Pp. 19012-7.

The ability to engineer genomes in a specific, systematic, and cost-effective way is critical for functional genomic studies. Recent advances using the CRISPR-associated single-guide RNA system (Cas9/sgRNA) illustrate the potential of this simple system for genome engineering in a number of organisms. Here we report an effective and inexpensive method for genome DNA editing in Drosophila melanogaster whereby plasmid DNAs encoding short sgRNAs under the control of the U6b promoter are injected into transgenic flies in which Cas9 is specifically expressed in the germ line via the nanos promoter. We evaluate the off-targets associated with the method and establish a Web-based resource, along with a searchable, genome-wide database of predicted sgRNAs appropriate for genome engineering in flies. Finally, we discuss the advantages of our method in comparison with other recently published approaches.

Yanhui Hu, Richelle Sopko, Marianna Foos, Colleen Kelley, Ian Flockhart, Noemie Ammeux, Xiaowei Wang, Lizabeth Perkins, Norbert Perrimon, and Stephanie E Mohr. 2013. “FlyPrimerBank: an online database for Drosophila melanogaster gene expression analysis and knockdown evaluation of RNAi reagents.” G3 (Bethesda), 3, 9, Pp. 1607-16.

The evaluation of specific endogenous transcript levels is important for understanding transcriptional regulation. More specifically, it is useful for independent confirmation of results obtained by the use of microarray analysis or RNA-seq and for evaluating RNA interference (RNAi)-mediated gene knockdown. Designing specific and effective primers for high-quality, moderate-throughput evaluation of transcript levels, i.e., quantitative, real-time PCR (qPCR), is nontrivial. To meet community needs, predefined qPCR primer pairs for mammalian genes have been designed and sequences made available, e.g., via PrimerBank. In this work, we adapted and refined the algorithms used for the mammalian PrimerBank to design 45,417 primer pairs for 13,860 Drosophila melanogaster genes, with three or more primer pairs per gene. We experimentally validated primer pairs for ~300 randomly selected genes expressed in early Drosophila embryos, using SYBR Green-based qPCR and sequence analysis of products derived from conventional PCR. All relevant information, including primer sequences, isoform specificity, spatial transcript targeting, and any available validation results and/or user feedback, is available from an online database (www.flyrnai.org/flyprimerbank). At FlyPrimerBank, researchers can retrieve primer information for fly genes either one gene at a time or in batch mode. Importantly, we included the overlap of each predicted amplified sequence with RNAi reagents from several public resources, making it possible for researchers to choose primers suitable for knockdown evaluation of RNAi reagents (i.e., to avoid amplification of the RNAi reagent itself). We demonstrate the utility of this resource for validation of RNAi reagents in vivo.

Yanhui Hu, Charles Roesel, Ian Flockhart, Lizabeth Perkins, Norbert Perrimon, and Stephanie E Mohr. 2013. “UP-TORR: online tool for accurate and Up-to-Date annotation of RNAi Reagents.” Genetics, 195, 1, Pp. 37-45.

RNA interference (RNAi) is a widely adopted tool for loss-of-function studies but RNAi results only have biological relevance if the reagents are appropriately mapped to genes. Several groups have designed and generated RNAi reagent libraries for studies in cells or in vivo for Drosophila and other species. At first glance, matching RNAi reagents to genes appears to be a simple problem, as each reagent is typically designed to target a single gene. In practice, however, the reagent-gene relationship is complex. Although the sequences of oligonucleotides used to generate most types of RNAi reagents are static, the reference genome and gene annotations are regularly updated. Thus, at the time a researcher chooses an RNAi reagent or analyzes RNAi data, the most current interpretation of the RNAi reagent-gene relationship, as well as related information regarding specificity (e.g., predicted off-target effects), can be different from the original interpretation. Here, we describe a set of strategies and an accompanying online tool, UP-TORR (for Updated Targets of RNAi Reagents; www.flyrnai.org/up-torr), useful for accurate and up-to-date annotation of cell-based and in vivo RNAi reagents. Importantly, UP-TORR automatically synchronizes with gene annotations daily, retrieving the most current information available, and for Drosophila, also synchronizes with the major reagent collections. Thus, UP-TORR allows users to choose the most appropriate RNAi reagents at the onset of a study, as well as to perform the most appropriate analyses of results of RNAi-based studies.

Arunachalam Vinayagam, Yanhui Hu, Meghana Kulkarni, Charles Roesel, Richelle Sopko, Stephanie E Mohr, and Norbert Perrimon. 2013. “Protein complex-based analysis framework for high-throughput data sets.” Sci Signal, 6, 264, Pp. rs5.

Analysis of high-throughput data increasingly relies on pathway annotation and functional information derived from Gene Ontology. This approach has limitations, in particular for the analysis of network dynamics over time or under different experimental conditions, in which modules within a network rather than complete pathways might respond and change. We report an analysis framework based on protein complexes, which are at the core of network reorganization. We generated a protein complex resource for human, Drosophila, and yeast from the literature and databases of protein-protein interaction networks, with each species having thousands of complexes. We developed COMPLEAT (http://www.flyrnai.org/compleat), a tool for data mining and visualization for complex-based analysis of high-throughput data sets, as well as analysis and integration of heterogeneous proteomics and gene expression data sets. With COMPLEAT, we identified dynamically regulated protein complexes among genome-wide RNA interference data sets that used the abundance of phosphorylated extracellular signal-regulated kinase in cells stimulated with either insulin or epidermal growth factor as the output. The analysis predicted that the Brahma complex participated in the insulin response.

Ian T Flockhart, Matthew Booker, Yanhui Hu, Benjamin McElvany, Quentin Gilly, Bernard Mathey-Prevot, Norbert Perrimon, and Stephanie E Mohr. 2012. “FlyRNAi.org--the database of the Drosophila RNAi screening center: 2012 update.” Nucleic Acids Res, 40, Database issue, Pp. D715-9.

FlyRNAi (http://www.flyrnai.org), the database and website of the Drosophila RNAi Screening Center (DRSC) at Harvard Medical School, serves a dual role, tracking both production of reagents for RNA interference (RNAi) screening in Drosophila cells and RNAi screen results. The database and website are used as a platform for community availability of protocols, tools, and other resources useful to researchers planning, conducting, analyzing or interpreting the results of Drosophila RNAi screens. Based on our own experience and user feedback, we have made several changes. Specifically, we have restructured the database to accommodate new types of reagents; added information about new RNAi libraries and other reagents; updated the user interface and website; and added new tools of use to the Drosophila community and others. Overall, the result is a more useful, flexible and comprehensive website and database.

Marcelo Perez-Pepe, Victoria Slomiansky, Mariela Loschi, Luciana Luchelli, Maximiliano Neme, María Gabriela Thomas, and Graciela Lidia Boccaccio. 2012. “BUHO: a MATLAB script for the study of stress granules and processing bodies by high-throughput image analysis.” PLoS One, 7, 12, Pp. e51495.

The spontaneous and reversible formation of foci and filaments that contain proteins involved in different metabolic processes is common in both the nucleus and the cytoplasm. Stress granules (SGs) and processing bodies (PBs) belong to a novel family of cellular structures collectively known as mRNA silencing foci that harbour repressed mRNAs and their associated proteins. SGs and PBs are highly dynamic and they form upon stress and dissolve thus releasing the repressed mRNAs according to changes in cell physiology. In addition, aggregates containing abnormal proteins are frequent in neurodegenerative disorders. In spite of the growing relevance of these supramolecular aggregates to diverse cellular functions a reliable automated tool for their systematic analysis is lacking. Here we report a MATLAB Script termed BUHO for the high-throughput image analysis of cellular foci. We used BUHO to assess the number, size and distribution of distinct objects with minimal deviation from manually obtained parameters. BUHO successfully addressed the induction of both SGs and PBs in mammalian and insect cells exposed to different stress stimuli. We also used BUHO to assess the dynamics of specific mRNA-silencing foci termed Smaug 1 foci (S-foci) in primary neurons upon synaptic stimulation. Finally, we used BUHO to analyze the role of candidate genes on SG formation in an RNAi-based experiment. We found that FAK56D, GCN2 and PP1 govern SG formation. The role of PP1 is conserved in mammalian cells as judged by the effect of the PP1 inhibitor salubrinal, and involves dephosphorylation of the translation factor eIF2α. All these experiments were analyzed manually and by BUHO and the results differed in less than 5% of the average value. The automated analysis by this user-friendly method will allow high-throughput image processing in short times by providing a robust, flexible and reliable alternative to the laborious and sometimes unfeasible visual scrutiny.

Matthew Booker, Anastasia A Samsonova, Young Kwon, Ian Flockhart, Stephanie E Mohr, and Norbert Perrimon. 2011. “False negative rates in Drosophila cell-based RNAi screens: a case study.” BMC Genomics, 12, Pp. 50.

BACKGROUND: High-throughput screening using RNAi is a powerful gene discovery method but is often complicated by false positive and false negative results. Whereas false positive results associated with RNAi reagents have been a matter of extensive study, the issue of false negatives has received less attention. RESULTS: We performed a meta-analysis of several genome-wide, cell-based Drosophila RNAi screens, together with a more focused RNAi screen, and conclude that the rate of false negative results is at least 8%. Further, we demonstrate how knowledge of the cell transcriptome can be used to resolve ambiguous results and how the number of false negative results can be reduced by using multiple, independently-tested RNAi reagents per gene. CONCLUSIONS: RNAi reagents that target the same gene do not always yield consistent results due to false positives and weak or ineffective reagents. False positive results can be partially minimized by filtering with transcriptome data. RNAi libraries with multiple reagents per gene also reduce false positive and false negative outcomes when inconsistent results are disambiguated carefully.

Adam A Friedman, George Tucker, Rohit Singh, Dong Yan, Arunachalam Vinayagam, Yanhui Hu, Richard Binari, Pengyu Hong, Xiaoyun Sun, Maura Porto, Svetlana Pacifico, Thilakam Murali, Russell L Finley, John M Asara, Bonnie Berger, and Norbert Perrimon. 2011. “Proteomic and functional genomic landscape of receptor tyrosine kinase and ras to extracellular signal-regulated kinase signaling.” Sci Signal, 4, 196, Pp. rs10.

Characterizing the extent and logic of signaling networks is essential to understanding specificity in such physiological and pathophysiological contexts as cell fate decisions and mechanisms of oncogenesis and resistance to chemotherapy. Cell-based RNA interference (RNAi) screens enable the inference of large numbers of genes that regulate signaling pathways, but these screens cannot provide network structure directly. We describe an integrated network around the canonical receptor tyrosine kinase (RTK)-Ras-extracellular signal-regulated kinase (ERK) signaling pathway, generated by combining parallel genome-wide RNAi screens with protein-protein interaction (PPI) mapping by tandem affinity purification-mass spectrometry. We found that only a small fraction of the total number of PPI or RNAi screen hits was isolated under all conditions tested and that most of these represented the known canonical pathway components, suggesting that much of the core canonical ERK pathway is known. Because most of the newly identified regulators are likely cell type- and RTK-specific, our analysis provides a resource for understanding how output through this clinically relevant pathway is regulated in different contexts. We report in vivo roles for several of the previously unknown regulators, including CG10289 and PpV, the Drosophila orthologs of two components of the serine/threonine-protein phosphatase 6 complex; the Drosophila ortholog of TepIV, a glycophosphatidylinositol-linked protein mutated in human cancers; CG6453, a noncatalytic subunit of glucosidase II; and Rtf1, a histone methyltransferase.

Yanhui Hu, Ian Flockhart, Arunachalam Vinayagam, Clemens Bergwitz, Bonnie Berger, Norbert Perrimon, and Stephanie E Mohr. 2011. “An integrative approach to ortholog prediction for disease-focused and other functional studies.” BMC Bioinformatics, 12, Pp. 357.

BACKGROUND: Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. RESULTS: We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). CONCLUSIONS: DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
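
The integration step described in the abstract above can be pictured with a minimal sketch. It is not DIOPT's actual scoring. It assumes each prediction tool contributes a set of (query gene, ortholog) pairs and that confidence is approximated by how many tools report the same pair. The tool names, gene names, and the functions `count_support` and `filter_orthologs` are hypothetical.

```python
from collections import Counter


def count_support(predictions):
    """Count how many tools report each (query_gene, ortholog) pair.

    `predictions` maps a tool name to an iterable of pairs; duplicates
    within one tool are counted once.
    """
    votes = Counter()
    for pairs in predictions.values():
        votes.update(set(pairs))
    return votes


def filter_orthologs(votes, min_support=1):
    """Return pairs supported by at least `min_support` tools, sorted by name."""
    return sorted(pair for pair, n in votes.items() if n >= min_support)


# Hypothetical predictions from three tools for one human gene.
predictions = {
    "toolA": [("HsGene1", "DmGene1"), ("HsGene1", "DmGene2")],
    "toolB": [("HsGene1", "DmGene1")],
    "toolC": [("HsGene1", "DmGene1")],
}
votes = count_support(predictions)
print(filter_orthologs(votes, min_support=1))  # wide net
print(filter_orthologs(votes, min_support=2))  # higher-confidence subset
```

A low threshold corresponds to casting a wide net, and a higher threshold limits results to pairs on which several tools agree.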

Michael Schnall-Levin, Yong Zhao, Norbert Perrimon, and Bonnie Berger. 2010. “Conserved microRNA targeting in Drosophila is as widespread in coding regions as in 3'UTRs.” Proc Natl Acad Sci U S A, 107, 36, Pp. 15751-6.

MicroRNAs (miRNAs) are a class of short noncoding RNAs that regulate protein-coding genes posttranscriptionally. In animals, most known miRNA targeting occurs within the 3'UTR of mRNAs, but the extent of biologically relevant targeting in the ORF or 5'UTR of mRNAs remains unknown. Here, we develop an algorithm (MinoTar-miRNA ORF Targets) to identify conserved regulatory motifs within protein-coding regions and use it to estimate the number of preferentially conserved miRNA-target sites in ORFs. We show that, in Drosophila, preferentially conserved miRNA targeting in ORFs is as widespread as it is in 3'UTRs and that, while far less abundant, conserved targets in Drosophila 5'UTRs number in the hundreds. Using our algorithm, we predicted a set of high-confidence ORF targets and selected seven miRNA-target pairs from among these for experimental validation. We observed down-regulation by the miRNA in five out of seven cases, indicating our approach can recover functional sites with high confidence. Additionally, we observed additive targeting by multiple sites within a single ORF. Altogether, our results demonstrate that the scale of biologically important miRNA targeting in ORFs is extensive and that computational tools such as ours can aid in the identification of such targets. Further evidence suggests that our results extend to mammals, but that the extent of ORF and 5'UTR targeting relative to 3'UTR targeting may be greater in Drosophila.