Data mining


Missed us at ADRC 2018? View our workshop slides!

April 19, 2018
Thank you to all those who attended our workshop at last week's Annual Drosophila Research Conference in Philadelphia, PA, USA. It was great to talk fly stocks, cell screens, and bioinformatics with the community. We are here to help and look forward to continued feedback on the resources we are building to empower your research. PDFs of our workshop presentations are attached to this news item. The slides will help you learn more about our in vivo resources for CRISPR, new pooled cell-based CRISPR screen technology, and bioinformatics resources at our facility. Feel free to contact...
Cartoon of essential gene pooled screen (made using

Pooled-format CRISPR screens in Drosophila cells

March 22, 2018

The DRSC/TRiP-FGR is pleased to support collaborations on pooled CRISPR screens using the method recently reported in eLife by Viswanatha et al. From the abstract: "... Here, we developed a site-specific integration strategy for library delivery and performed a genome-wide CRISPR knockout screen in Drosophila S2R+ cells. Under basal growth conditions, 1235 genes were essential for cell fitness at a false-discovery rate of 5...

Yanhui Hu, Arunachalam Vinayagam, Ankita Nand, Aram Comjean, Verena Chung, Tong Hao, Stephanie E Mohr, and Norbert Perrimon. 11/16/2017. “Molecular Interaction Search Tool (MIST): an integrated resource for mining gene and protein interaction data.” Nucleic Acids Res, 46, D1, Pp. D567-D574.
Model organism and human databases are rich with information about genetic and physical interactions. These data can be used to interpret and guide the analysis of results from new studies and develop new hypotheses. Here, we report the development of the Molecular Interaction Search Tool (MIST). The MIST database integrates biological interaction data from yeast, nematode, fly, zebrafish, frog, rat and mouse model systems, as well as human. For individual or short gene lists, the MIST user interface can be used to identify interacting partners based on protein-protein and genetic interaction (GI) data from the species of interest as well as inferred interactions, known as interologs, and to view a corresponding network. The data, interologs and search tools at MIST are also useful for analyzing 'omics datasets. In addition to describing the integrated database, we also demonstrate how MIST can be used to identify an appropriate cut-off value that balances false positive and negative discovery, and present use-cases for additional types of analysis. Altogether, the MIST database and search tools support visualization and navigation of existing protein and GI data, as well as comparison of new and existing data.
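To make the interolog idea in the abstract above concrete, here is a minimal Python sketch of how an interaction known in one species might be projected onto another through an ortholog mapping. It is an illustration only, not MIST's pipeline or API: the function name `infer_interologs`, the input shapes, and all gene identifiers are hypothetical.

```python
from itertools import product


def infer_interologs(source_interactions, orthologs):
    """Project source-species interactions onto a target species.

    source_interactions: iterable of (gene_a, gene_b) pairs in the source species.
    orthologs: dict mapping a source gene to a set of target-species orthologs.
    Returns a set of unordered target-species pairs, stored as sorted tuples.
    """
    inferred = set()
    for a, b in source_interactions:
        # Every combination of orthologs of a and b is a candidate interolog.
        for ta, tb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if ta != tb:  # skip self-pairs that arise from shared orthologs
                inferred.add(tuple(sorted((ta, tb))))
    return inferred


# Toy data with made-up identifiers (assumption: H3 has no known ortholog).
human_ppi = [("H1", "H2"), ("H2", "H3")]
human_to_fly = {"H1": {"F1"}, "H2": {"F2a", "F2b"}}

fly_interologs = infer_interologs(human_ppi, human_to_fly)
print(sorted(fly_interologs))
```

In this toy case only the H1-H2 interaction can be projected, because H3 lacks an ortholog in the mapping. A real resource would also need to weigh ortholog confidence and the quality of the source evidence, which this sketch ignores.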
2018 Apr 13

DRSC & TRiP Workshop at ADRC

1:45pm to 3:45pm


Philadelphia, PA, USA
The DRSC & TRiP will be hosting a workshop at the Annual Drosophila Research Conference in Philadelphia, PA. The workshop is scheduled for Friday, April 13th from 1:45 to 3:45 PM. Come hear from DRSC & TRiP leaders Norbert Perrimon, Jonathan Zirin (organizer), Claire Yanhui Hu, and Stephanie Mohr. At the workshop, you will learn about new opportunities for community nomination and experiments using CRISPR knockout and activation, as well as learn what's new and popular among our online software and database tools. There will be something for everyone -- we will provide information...
Julia Wang, Rami Al-Ouran, Yanhui Hu, Seon-Young Kim, Ying-Wooi Wan, Michael F Wangler, Shinya Yamamoto, Hsiao-Tuan Chao, Aram Comjean, Stephanie E Mohr, Norbert Perrimon, Zhandong Liu, and Hugo J Bellen. 6/1/2017. “MARRVEL: Integration of Human and Model Organism Genetic Resources to Facilitate Functional Annotation of the Human Genome.” Am J Hum Genet, 100, 6, Pp. 843-853.
One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms are arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves efficiency and accessibility to data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research.
Yanhui Hu, Aram Comjean, Stephanie E Mohr, The FlyBase Consortium, and Norbert Perrimon. 8/7/2017. “Gene2Function: An Integrated Online Resource for Gene Function Discovery.” G3 (Bethesda).
One of the most powerful ways to develop hypotheses regarding biological functions of conserved genes in a given species, such as in humans, is to first look at what is known about function in another species. Model organism databases (MODs) and other resources are rich with functional information but difficult to mine. Gene2Function (G2F) addresses a broad need by integrating information about conserved genes in a single online resource.
Arunachalam Vinayagam, Travis E Gibson, Ho-Joon Lee, Bahar Yilmazel, Charles Roesel, Yanhui Hu, Young Kwon, Amitabh Sharma, Yang-Yu Liu, Norbert Perrimon, and Albert-László Barabási. 5/3/2016. “Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets.” Proc Natl Acad Sci U S A, 113, 18, Pp. 4976-81.

The protein-protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as "indispensable," "neutral," or "dispensable," which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network's control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.
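The node classification described in this abstract can be sketched in a few lines of Python. The sketch assumes the commonly used maximum-matching formulation of structural controllability, in which the minimum number of driver nodes is max(N - |M|, 1) for a maximum matching M of the directed network; whether the paper computes it exactly this way is an assumption here. The function names and the toy graph are hypothetical and stand in for the real 6,339-protein network.

```python
def max_matching_size(nodes, edges):
    """Size of a maximum matching in the bipartite view of a directed graph.

    Each edge u -> v links the 'out' copy of u to the 'in' copy of v.
    Uses simple recursive augmenting paths, which is fine for toy graphs only.
    """
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_in = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-matched.
            if v not in matched_in or augment(matched_in[v], seen):
                matched_in[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def driver_node_count(nodes, edges):
    """Minimum number of driver nodes: unmatched nodes, but at least one."""
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def classify_node(node, nodes, edges):
    """Label a node by how its removal changes the driver-node count."""
    before = driver_node_count(nodes, edges)
    rest = [n for n in nodes if n != node]
    rest_edges = [(u, v) for u, v in edges if node not in (u, v)]
    after = driver_node_count(rest, rest_edges)
    if after > before:
        return "indispensable"
    if after < before:
        return "dispensable"
    return "neutral"


# Toy directed network (made-up): A -> B, B -> C, B -> D
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("B", "D")]
labels = {n: classify_node(n, nodes, edges) for n in nodes}
print(driver_node_count(nodes, edges), labels)
```

In the toy network, removing B disconnects everything and raises the driver-node count, so B is indispensable; removing either leaf lowers the count, and removing A leaves it unchanged. On a network of realistic size, an iterative matching algorithm such as Hopcroft-Karp would be a more suitable choice than this recursive version.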

Yanhui Hu, Aram Comjean, Charles Roesel, Arunachalam Vinayagam, Ian Flockhart, Jonathan Zirin, Lizabeth Perkins, Norbert Perrimon, and Stephanie E Mohr. 10/11/2016. “—the database of the Drosophila RNAi screening center and transgenic RNAi project: 2017 update.” Nucleic Acids Research.

The FlyRNAi database of the Drosophila RNAi Screening Center (DRSC) and Transgenic RNAi Project (TRiP) at Harvard Medical School and associated DRSC/TRiP Functional Genomics Resources website serve as a reagent production tracking system, screen data repository, and portal to the community. Through this portal, we make available protocols, online tools, and other resources useful to researchers at all stages of high-throughput functional genomics screening, from assay design and reagent identification to data analysis and interpretation. In this update, we describe recent changes and additions to our website, database and suite of online tools. Recent changes reflect a shift in our focus from a single technology (RNAi) and model species (Drosophila) to the application of additional technologies (e.g. CRISPR) and support of integrated, cross-species approaches to uncovering gene function using functional genomics and other approaches.

Screenshot of GLAD results with the hh gene

Gene List Annotation for Drosophila (GLAD) online resource updated

October 19, 2016

We recently updated our Gene List Annotation for Drosophila (GLAD) online resource. At GLAD you could already view the members of a gene list, such as genes grouped as members of a pathway, process, or sharing a functional domain. Now, you can also ask if a gene of interest is a member of a given group. Please see examples of the two ways to use GLAD below. As always, we welcome your feedback, including suggestions for changes or additions to the curated lists, or for addition of new...
