Bioinformatics

Working Paper
Raghuvir Viswanatha, Enzo Mameli, Jonathan Rodiger, Pierre Merckaert, Fabiana Feitosa-Suntheimer, Tonya M. Colpitts, Stephanie E. Mohr, Yanhui Hu, and Norbert Perrimon. Working Paper. “Bioinformatic and cell-based tools for pooled CRISPR knockout screening in mosquitos.” bioRxiv.
Mosquito-borne diseases present a worldwide public health burden. Genome-scale screening tools that could inform our understanding of mosquitos and their control are lacking. Here, we adapt a recombination-mediated cassette exchange system for delivery of CRISPR sgRNA libraries into cell lines from several mosquito species and perform pooled CRISPR screens in an Anopheles cell line. To implement this method, we engineered modified mosquito cell lines, validated promoters and developed bioinformatics tools for multiple mosquito species. Competing Interest Statement: The authors have declared no competing interest.
Submitted
A.M. Conard, N. Goodman, Y. Hu, N. Perrimon, R. Singh, C. Lawrence, and E. Larschan. Submitted. “TIMEOR: a web-based tool to uncover temporal regulatory mechanisms from multi-omics data.” bioRxiv.
Uncovering how transcription factors (TFs) regulate their targets at the DNA, RNA and protein levels over time is critical to define gene regulatory networks (GRNs) in normal and diseased states. RNA-seq has become a standard method to measure gene regulation using an established set of analysis steps. However, none of the currently available pipeline methods for interpreting ordered genomic data (in time or space) use time series models to assign cause and effect relationships within GRNs, are adaptive to diverse experimental designs, or enable user interpretation through a web-based platform. Furthermore, methods which integrate ordered RNA-seq data with transcription factor binding data are urgently needed. Here, we present TIMEOR (Trajectory Inference and Mechanism Exploration with Omics data in R), the first web-based and adaptive time series multi-omics pipeline method which infers the relationship between gene regulatory events across time. TIMEOR addresses the critical need for methods to predict causal regulatory mechanism networks between TFs from time series multi-omics data. We used TIMEOR to identify a new link between insulin stimulation and the circadian rhythm cycle. TIMEOR is available at https://github.com/ashleymaeconard/TIMEOR.git.
2021
Yanhui Hu, Sudhir Gopal Tattikota, Yifang Liu, Aram Comjean, Yue Gao, Corey Forman, Grace Kim, Jonathan Rodiger, Irene Papatheodorou, Gilberto Dos Santos, Stephanie E Mohr, and Norbert Perrimon. 2021. “DRscDB: A single-cell RNA-seq resource for data mining and data comparison across species.” Comput Struct Biotechnol J, 19, Pp. 2018-2026.
With the advent of single-cell RNA sequencing (scRNA-seq) technologies, there has been a spike in studies involving scRNA-seq of several tissues across diverse species including Drosophila. Although a few databases exist for users to query genes of interest within the scRNA-seq studies, search tools that enable users to find orthologous genes and their cell type-specific expression patterns across species are limited. Here, we built a new search database, DRscDB (https://www.flyrnai.org/tools/single_cell/web/), to address this need. DRscDB serves as a comprehensive repository for published scRNA-seq datasets for Drosophila and relevant datasets from human and other model organisms. DRscDB is based on manual curation of Drosophila scRNA-seq studies of various tissue types and their corresponding analogous tissues in vertebrates including zebrafish, mouse, and human. Of note, our search database provides most of the literature-derived marker genes, thus preserving the original analysis of the published scRNA-seq datasets. Finally, DRscDB serves as a web-based user interface that allows users to mine gene expression data from scRNA-seq studies and perform cell cluster enrichment analyses pertaining to various scRNA-seq studies, both within and across species.
Xuechun Feng, Víctor López Del Amo, Enzo Mameli, Megan Lee, Alena L Bishop, Norbert Perrimon, and Valentino M Gantz. 2021. “Optimized CRISPR tools and site-directed transgenesis towards gene drive development in Culex quinquefasciatus mosquitoes.” Nat Commun, 12, 1, Pp. 2960.
Culex mosquitoes are a global vector for multiple human and animal diseases, including West Nile virus, lymphatic filariasis, and avian malaria, posing a constant threat to public health, livestock, companion animals, and endangered birds. While rising insecticide resistance has threatened the control of Culex mosquitoes, advances in CRISPR genome-editing tools have fostered the development of alternative genetic strategies such as gene drive systems to fight disease vectors. However, though gene-drive technology has quickly progressed in other mosquitoes, advances have been lacking in Culex. Here, we develop a Culex-specific Cas9/gRNA expression toolkit and use site-directed homology-based transgenesis to generate and validate a Culex quinquefasciatus Cas9-expressing line. We show that gRNA scaffold variants improve transgenesis efficiency in both Culex quinquefasciatus and Drosophila melanogaster and boost gene-drive performance in the fruit fly. These findings support future technology development to control Culex mosquitoes and provide valuable insight for improving these tools in other species.
2020
Yanhui Hu, Verena Chung, Aram Comjean, Jonathan Rodiger, Fnu Nipun, Norbert Perrimon, and Stephanie E Mohr. 2020. “BioLitMine: Advanced Mining of Biomedical and Biological Literature About Human Genes and Genes from Major Model Organisms.” G3 (Bethesda).
The accumulation of biological and biomedical literature outpaces the ability of most researchers and clinicians to stay abreast of their own immediate fields, let alone a broader range of topics. Although available search tools support identification of relevant literature, finding relevant and key publications is not always straightforward. For example, important publications might be missed in searches with an official gene name due to gene synonyms. Moreover, ambiguity of gene names can result in retrieval of a large number of irrelevant publications. To address these issues and help researchers and physicians quickly identify relevant publications, we developed BioLitMine, an advanced literature mining tool that takes advantage of the medical subject heading (MeSH) index and gene-to-publication annotations already available for PubMed literature. Using BioLitMine, a user can identify what MeSH terms are represented in the set of publications associated with a given gene of interest, or start with a term and identify relevant publications. Users can also use the tool to find co-cited genes and build a literature co-citation network. In addition, BioLitMine can help users build a gene list relevant to a MeSH term, such as a list of genes relevant to "stem cells" or "breast neoplasms." Users can also start with a gene or pathway of interest and identify authors associated with that gene or pathway, a feature that makes it easier to identify experts who might serve as collaborators or reviewers. Altogether, BioLitMine extends the value of PubMed-indexed literature and its existing expert curation by providing a robust and gene-centric approach to retrieval of relevant information.
2019
Yanhui Hu, Richelle Sopko, Verena Chung, Marianna Foos, Romain A Studer, Sean D Landry, Daniel Liu, Leonard Rabinow, Florian Gnad, Pedro Beltrao, and Norbert Perrimon. 2019. “iProteinDB: An Integrative Database of Post-translational Modifications.” G3 (Bethesda), 9, 1, Pp. 1-11.
Post-translational modification (PTM) serves as a regulatory mechanism for protein function, influencing their stability, interactions, activity and localization, and is critical in many signaling pathways. The best characterized PTM is phosphorylation, whereby a phosphate is added to an acceptor residue, most commonly serine, threonine and tyrosine in metazoans. As proteins are often phosphorylated at multiple sites, identifying those sites that are important for function is a challenging problem. Considering that any given phosphorylation site might be non-functional, prioritizing evolutionarily conserved phosphosites provides a general strategy to identify the putative functional sites. To facilitate the identification of conserved phosphosites, we generated a large-scale phosphoproteomics dataset from embryos collected from six closely-related species. We built iProteinDB (https://www.flyrnai.org/tools/iproteindb/), a resource integrating these data with other high-throughput PTM datasets, including vertebrates, and manually curated information for Drosophila. At iProteinDB, scientists can view the PTM landscape for any protein and identify predicted functional phosphosites based on a comparative analysis of data from closely-related species. Further, iProteinDB enables comparison of PTM data from Drosophila to that of orthologous proteins from other model organisms, including human, mouse, and rat.
Chiao-Lin Chen, Jonathan Rodiger, Verena Chung, Raghuvir Viswanatha, Stephanie E Mohr, Yanhui Hu, and Norbert Perrimon. 2019. “SNP-CRISPR: A Web Tool for SNP-Specific Genome Editing.” G3 (Bethesda).
CRISPR-Cas9 is a powerful genome editing technology in which a short guide RNA (sgRNA) confers target site specificity to achieve Cas9-mediated genome editing. Numerous sgRNA design tools have been developed based on reference genomes for humans and model organisms. However, existing resources are not optimal as genetic mutations or single nucleotide polymorphisms (SNPs) within the targeting region affect the efficiency of CRISPR-based approaches by interfering with guide-target complementarity. To facilitate identification of sgRNAs (1) in non-reference genomes, (2) across varying genetic backgrounds, or (3) for specific targeting of SNP-containing alleles, for example, disease relevant mutations, we developed a web tool, SNP-CRISPR (https://www.flyrnai.org/tools/snp_crispr/). SNP-CRISPR can be used to design sgRNAs based on public variant data sets or user-identified variants. In addition, the tool computes efficiency and specificity scores for sgRNA designs targeting both the variant and the reference. Moreover, SNP-CRISPR provides the option to upload multiple SNPs and target single or multiple nearby base changes simultaneously with a single sgRNA design. Given these capabilities, SNP-CRISPR has a wide range of potential research applications in model systems and potential applications for design of sgRNAs for disease-associated mutant correction.
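The idea of designing sgRNAs against a SNP-containing allele can be illustrated with a minimal Python sketch. It is not the SNP-CRISPR implementation: the function names, the 20-nt protospacer length, the SpCas9 NGG PAM, and the rule that the SNP must fall inside the protospacer are all assumptions made for illustration, and no efficiency or specificity scoring is attempted.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def snp_guides(ref, pos, alt, guide_len=20):
    """List (strand, protospacer, pam) for NGG sites on the variant allele.

    `ref` is the reference sequence, `pos` the 0-based SNP index and `alt`
    the variant base. Only sites whose protospacer covers the SNP are kept
    (an illustrative rule, assumed here).
    """
    variant = ref[:pos] + alt + ref[pos + 1:]
    site_len = guide_len + 3  # protospacer plus 3-nt PAM
    hits = []
    # Scan both strands; on the minus strand the SNP index is mirrored.
    for strand, seq, snp in (
        ("+", variant, pos),
        ("-", revcomp(variant), len(variant) - 1 - pos),
    ):
        for start in range(len(seq) - site_len + 1):
            site = seq[start:start + site_len]
            has_pam = site[-2:] == "GG"
            covers_snp = start <= snp < start + guide_len
            if has_pam and covers_snp:
                hits.append((strand, site[:guide_len], site[guide_len:]))
    return hits


# Hypothetical 31-nt locus with a G>T change at index 10.
locus = "TTTT" + "ACGTACGTACGTACGTACGT" + "AGG" + "TTTT"
guides = snp_guides(locus, 10, "T")
```

Calling the same function with the reference base as `alt` yields the reference-targeting design for comparison.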
2017
Yanhui Hu, Arunachalam Vinayagam, Ankita Nand, Aram Comjean, Verena Chung, Tong Hao, Stephanie E Mohr, and Norbert Perrimon. 11/16/2017. “Molecular Interaction Search Tool (MIST): an integrated resource for mining gene and protein interaction data.” Nucleic Acids Res, 46, D1, Pp. D567-D574.
Model organism and human databases are rich with information about genetic and physical interactions. These data can be used to interpret and guide the analysis of results from new studies and develop new hypotheses. Here, we report the development of the Molecular Interaction Search Tool (MIST; http://fgrtools.hms.harvard.edu/MIST/). The MIST database integrates biological interaction data from yeast, nematode, fly, zebrafish, frog, rat and mouse model systems, as well as human. For individual or short gene lists, the MIST user interface can be used to identify interacting partners based on protein-protein and genetic interaction (GI) data from the species of interest as well as inferred interactions, known as interologs, and to view a corresponding network. The data, interologs and search tools at MIST are also useful for analyzing 'omics datasets. In addition to describing the integrated database, we also demonstrate how MIST can be used to identify an appropriate cut-off value that balances false positive and negative discovery, and present use-cases for additional types of analysis. Altogether, the MIST database and search tools support visualization and navigation of existing protein and GI data, as well as comparison of new and existing data.
Yanhui Hu, Aram Comjean, Stephanie E Mohr, The FlyBase Consortium, and Norbert Perrimon. 8/7/2017. “Gene2Function: An Integrated Online Resource for Gene Function Discovery.” G3 (Bethesda).
One of the most powerful ways to develop hypotheses regarding biological functions of conserved genes in a given species, such as in humans, is to first look at what is known about function in another species. Model organism databases (MODs) and other resources are rich with functional information but difficult to mine. Gene2Function (G2F) addresses a broad need by integrating information about conserved genes in a single online resource.
Julia Wang, Rami Al-Ouran, Yanhui Hu, Seon-Young Kim, Ying-Wooi Wan, Michael F Wangler, Shinya Yamamoto, Hsiao-Tuan Chao, Aram Comjean, Stephanie E Mohr, Undiagnosed Diseases Network, Norbert Perrimon, Zhandong Liu, and Hugo J Bellen. 6/1/2017. “MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.” Am J Hum Genet, 100, 6, Pp. 843-853.
One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms is arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research.
2016
Yanhui Hu, Aram Comjean, Charles Roesel, Arunachalam Vinayagam, Ian Flockhart, Jonathan Zirin, Lizabeth Perkins, Norbert Perrimon, and Stephanie E Mohr. 10/11/2016. “FlyRNAi.org—the database of the Drosophila RNAi screening center and transgenic RNAi project: 2017 update.” Nucleic Acids Research.

The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website (http://fgr.hms.harvard.edu) serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches.

Arunachalam Vinayagam, Meghana M Kulkarni, Richelle Sopko, Xiaoyun Sun, Yanhui Hu, Ankita Nand, Christians Villalta, Ahmadali Moghimi, Xuemei Yang, Stephanie E Mohr, Pengyu Hong, John M Asara, and Norbert Perrimon. 9/13/2016. “An Integrative Analysis of the InR/PI3K/Akt Network Identifies the Dynamic Response to Insulin Signaling.” Cell Reports, 16, 11, Pp. 3062-3074.

Insulin regulates an essential conserved signaling pathway affecting growth, proliferation, and metabolism. To expand our understanding of the insulin pathway, we combine biochemical, genetic, and computational approaches to build a comprehensive Drosophila InR/PI3K/Akt network. First, we map the dynamic protein-protein interaction network surrounding the insulin core pathway using bait-prey interactions connecting 566 proteins. Combining RNAi screening and phospho-specific antibodies, we find that 47% of interacting proteins affect pathway activity, and, using quantitative phosphoproteomics, we demonstrate that ~10% of interacting proteins are regulated by insulin stimulation at the level of phosphorylation. Next, we integrate these orthogonal datasets to characterize the structure and dynamics of the insulin network at the level of protein complexes and validate our method by identifying regulatory roles for the Protein Phosphatase 2A (PP2A) and Reptin-Pontin chromatin-remodeling complexes as negative and positive regulators of ribosome biogenesis, respectively. Altogether, our study represents a comprehensive resource for the study of the evolutionarily conserved insulin network.

Arunachalam Vinayagam, Travis E Gibson, Ho-Joon Lee, Bahar Yilmazel, Charles Roesel, Yanhui Hu, Young Kwon, Amitabh Sharma, Yang-Yu Liu, Norbert Perrimon, and Albert-László Barabási. 5/3/2016. “Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets.” Proc Natl Acad Sci U S A, 113, 18, Pp. 4976-81.

The protein-protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as "indispensable," "neutral," or "dispensable," which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network's control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.
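The node classification described above can be sketched in Python under one stated assumption: the number of driver nodes is computed with the standard maximum-matching formulation of structural controllability, N_D = max(N - |M|, 1), where M is a maximum matching of the directed network. The abstract does not spell out the computation, so the function names and the toy networks below are illustrative only.

```python
def max_matching(nodes, edges):
    """Size of a maximum matching of a directed graph.

    Each edge u -> v links the 'out' copy of u to the 'in' copy of v;
    augmenting paths are found by depth-first search.
    """
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_to = {}  # in-copy of v -> out-copy of u currently matched to it

    def augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_to or augment(matched_to[v], seen):
                matched_to[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def driver_nodes(nodes, edges):
    """Minimum number of driver nodes: unmatched nodes, but at least one."""
    return max(len(nodes) - max_matching(nodes, edges), 1)


def classify(nodes, edges):
    """Label each node by how its removal changes the driver-node count."""
    base = driver_nodes(nodes, edges)
    labels = {}
    for x in nodes:
        rest = [n for n in nodes if n != x]
        kept = [(u, v) for u, v in edges if x not in (u, v)]
        n_d = driver_nodes(rest, kept)
        if n_d > base:
            labels[x] = "indispensable"
        elif n_d < base:
            labels[x] = "dispensable"
        else:
            labels[x] = "neutral"
    return labels


# Toy examples: a directed path a -> b -> c and a star a -> b, a -> c.
path_labels = classify(["a", "b", "c"], [("a", "b"), ("b", "c")])
star_labels = classify(["a", "b", "c"], [("a", "b"), ("a", "c")])
```

In the path, removing the middle node breaks the chain and raises the driver-node count, so it is labeled indispensable; in the star, removing either leaf lowers the count, so both leaves are dispensable.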

Stephanie E Mohr, Yanhui Hu, Benjamin Ewen-Campen, Benjamin E Housden, Raghuvir Viswanatha, and Norbert Perrimon. 2016. “CRISPR guide RNA design for research applications.” FEBS J.

The rapid rise of CRISPR as a technology for genome engineering and related research applications has created a need for algorithms and associated online tools that facilitate design of on-target and effective guide RNAs (gRNAs). Here, we review the state-of-the-art in CRISPR gRNA design for research applications of the CRISPR-Cas9 system, including knockout, activation and inhibition. Notably, achieving good gRNA design is not solely dependent on innovations in CRISPR technology. Good design and design tools also rely on availability of high-quality genome sequence and gene annotations, as well as on availability of accumulated data regarding off-targets and effectiveness metrics.

2015
Yanhui Hu, Aram Comjean, Lizabeth A Perkins, Norbert Perrimon, and Stephanie E Mohr. 2015. “GLAD: an Online Database of Gene List Annotation for Drosophila.” J Genomics, 3, Pp. 75-81.

We present a resource of high quality lists of functionally related Drosophila genes, e.g. based on protein domains (kinases, transcription factors, etc.) or cellular function (e.g. autophagy, signal transduction). To establish these lists, we relied on different inputs, including curation from databases or the literature and mapping from other species. Moreover, as an added curation and quality control step, we asked experts in relevant fields to review many of the lists. The resource is available online for scientists to search and view, and is editable based on community input. Annotation of gene groups is an ongoing effort and scientific need will typically drive decisions regarding which gene lists to pursue. We anticipate that the number of lists will increase over time; that the composition of some lists will grow and/or change over time as new information becomes available; and that the lists will benefit the scientific community, e.g. at experimental design and data analysis stages. Based on this, we present an easily updatable online database, available at www.flyrnai.org/glad, at which gene group lists can be viewed, searched and downloaded.

Stephanie E Mohr, Yanhui Hu, Kirstin Rudd, Michael Buckner, Quentin Gilly, Blake Foster, Katarzyna Sierzputowska, Aram Comjean, Bing Ye, and Norbert Perrimon. 2015. “Reagent and Data Resources for Investigation of RNA Binding Protein Functions in Drosophila melanogaster Cultured Cells.” G3 (Bethesda), 5, 9, Pp. 1919-24.

RNA binding proteins (RBPs) are involved in many cellular functions. To facilitate functional characterization of RBPs, we generated an RNA interference (RNAi) library for Drosophila cell-based screens comprising reagents targeting known or putative RBPs. To test the quality of the library and provide a baseline analysis of the effects of the RNAi reagents on viability, we screened the library using a total ATP assay and high-throughput imaging in Drosophila S2R+ cultured cells. The results are consistent with production of a high-quality library that will be useful for functional genomics studies using other assays. Altogether, we provide resources in the form of an initial curated list of Drosophila RBPs; an RNAi screening library we expect to be used with additional assays that address more specific biological questions; and total ATP and image data useful for comparison of those additional assay results with fundamental information such as effects of a given reagent in the library on cell viability. Importantly, we make the baseline data, including more than 200,000 images, easily accessible online.

Lizabeth A Perkins, Laura Holderbaum, Rong Tao, Yanhui Hu, Richelle Sopko, Kim McCall, Donghui Yang-Zhou, Ian Flockhart, Richard Binari, Hye-Seok Shim, Audrey Miller, Amy Housden, Marianna Foos, Sakara Randkelv, Colleen Kelley, Pema Namgyal, Christians Villalta, Lu-Ping Liu, Xia Jiang, Qiao Huan-Huan, Xia Wang, Asao Fujiyama, Atsushi Toyoda, Kathleen Ayers, Allison Blum, Benjamin Czech, Ralph Neumuller, Dong Yan, Amanda Cavallaro, Karen Hibbard, Don Hall, Lynn Cooley, Gregory J Hannon, Ruth Lehmann, Annette Parks, Stephanie E Mohr, Ryu Ueda, Shu Kondo, Jian-Quan Ni, and Norbert Perrimon. 2015. “The Transgenic RNAi Project at Harvard Medical School: Resources and Validation.” Genetics, 201, 3, Pp. 843-52.

To facilitate large-scale functional studies in Drosophila, the Drosophila Transgenic RNAi Project (TRiP) at Harvard Medical School (HMS) was established along with several goals: developing efficient vectors for RNAi that work in all tissues, generating a genome-scale collection of RNAi stocks with input from the community, distributing the lines as they are generated through existing stock centers, validating as many lines as possible using RT-qPCR and phenotypic analyses, and developing tools and web resources for identifying RNAi lines and retrieving existing information on their quality. With these goals in mind, here we describe in detail the various tools we developed and the status of the collection, which is currently composed of 11,491 lines and covering 71% of Drosophila genes. Data on the characterization of the lines either by RT-qPCR or phenotype is available on a dedicated website, the RNAi Stock Validation and Phenotypes Project (RSVP, http://www.flyrnai.org/RSVP.html), and stocks are available from three stock centers, the Bloomington Drosophila Stock Center (United States), National Institute of Genetics (Japan), and TsingHua Fly Center (China).

2014
Benjamin E Housden, Shuailiang Lin, and Norbert Perrimon. 2014. “Cas9-based genome editing in Drosophila.” Methods Enzymol, 546, Pp. 415-39.

Our ability to modify the Drosophila genome has recently been revolutionized by the development of the CRISPR system. The simplicity and high efficiency of this system allows its widespread use for many different applications, greatly increasing the range of genome modification experiments that can be performed. Here, we first discuss some general design principles for genome engineering experiments in Drosophila and then present detailed protocols for the production of CRISPR reagents and screening strategies to detect successful genome modification events in both tissue culture cells and animals.

Arunachalam Vinayagam, Jonathan Zirin, Charles Roesel, Yanhui Hu, Bahar Yilmazel, Anastasia A Samsonova, Ralph A Neumüller, Stephanie E Mohr, and Norbert Perrimon. 2014. “Integrating protein-protein interaction networks with phenotypes reveals signs of interactions.” Nat Methods, 11, 1, Pp. 94-9.

A major objective of systems biology is to organize molecular interactions as networks and to characterize information flow within networks. We describe a computational framework to integrate protein-protein interaction (PPI) networks and genetic screens to predict the 'signs' of interactions (i.e., activation-inhibition relationships). We constructed a Drosophila melanogaster signed PPI network consisting of 6,125 signed PPIs connecting 3,352 proteins that can be used to identify positive and negative regulators of signaling pathways and protein complexes. We identified an unexpected role for the metabolic enzymes enolase and aldo-keto reductase as positive and negative regulators of proteolysis, respectively. Characterization of the activation-inhibition relationships between physically interacting proteins within signaling pathways will affect our understanding of many biological functions, including signal transduction and mechanisms of disease.

Bahar Yilmazel, Yanhui Hu, Frederic Sigoillot, Jennifer A Smith, Caroline E Shamu, Norbert Perrimon, and Stephanie E Mohr. 2014. “Online GESS: prediction of miRNA-like off-target effects in large-scale RNAi screen data by seed region analysis.” BMC Bioinformatics, 15, Pp. 192.

BACKGROUND: RNA interference (RNAi) is an effective and important tool used to study gene function. For large-scale screens, RNAi is used to systematically down-regulate genes of interest and analyze their roles in a biological process. However, RNAi is associated with off-target effects (OTEs), including microRNA (miRNA)-like OTEs. The contribution of reagent-specific OTEs to RNAi screen data sets can be significant. In addition, the post-screen validation process is time and labor intensive. Thus, the availability of robust approaches to identify candidate off-targeted transcripts would be beneficial. RESULTS: Significant efforts have been made to eliminate false positive results attributable to sequence-specific OTEs associated with RNAi. These approaches have included improved algorithms for RNAi reagent design, incorporation of chemical modifications into siRNAs, and the use of various bioinformatics strategies to identify possible OTEs in screen results. Genome-wide Enrichment of Seed Sequence matches (GESS) was developed to identify potential off-targeted transcripts in large-scale screen data by seed-region analysis. Here, we introduce a user-friendly web application that provides researchers a relatively quick and easy way to perform GESS analysis on data from human or mouse cell-based screens using short interfering RNAs (siRNAs) or short hairpin RNAs (shRNAs), as well as for Drosophila screens using shRNAs. Online GESS relies on up-to-date transcript sequence annotations for human and mouse genes extracted from NCBI Reference Sequence (RefSeq) and Drosophila genes from FlyBase. The tool also accommodates analysis with user-provided reference sequence files. CONCLUSION: Online GESS provides a straightforward user interface for genome-wide seed region analysis for human, mouse and Drosophila RNAi screen data. With the tool, users can either use a built-in database or provide a database of transcripts for analysis. This makes it possible to analyze RNAi data from any organism for which the user can provide transcript sequences.
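The general idea of seed-region enrichment can be illustrated with a simplified Python sketch. It is not the GESS algorithm or the Online GESS code: the seed definition (guide positions 2-8), the use of a one-sided Fisher exact test, the function names, and the toy sequences are all assumptions chosen for illustration. Guides are written 5' to 3' in the DNA alphabet.

```python
from math import comb


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def seed_site(guide, start=1, length=7):
    """Sequence a transcript must contain to pair with guide positions 2-8."""
    return revcomp(guide[start:start + length])


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return tail / comb(n, col1)


def seed_enrichment(active, inactive, transcript):
    """Test whether seed matches to `transcript` are enriched among active reagents."""
    a = sum(seed_site(g) in transcript for g in active)
    c = sum(seed_site(g) in transcript for g in inactive)
    return fisher_greater(a, len(active) - a, c, len(inactive) - c)


# Hypothetical data: three active guides share the seed GATTACA.
transcript = "GGGTGTAATCGGG"
active = ["AGATTACAAAAAAAAAAAAA", "CGATTACATTTTTTTTTTTT", "TGATTACACACACACACACA"]
inactive = ["ACCCCCCCAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAAAA", "ATTTTTTTAAAAAAAAAAAA"]
p_value = seed_enrichment(active, inactive, transcript)
```

A genome-wide analysis would repeat this test for every transcript in the reference set and correct for multiple testing, a step omitted here.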

