The protein-protein interaction (PPI) network is crucial for cellular information processing and decision-making. With suitable inputs, PPI networks drive the cells to diverse functional outcomes such as cell proliferation or cell death. Here, we characterize the structural controllability of a large directed human PPI network comprising 6,339 proteins and 34,813 interactions. This network allows us to classify proteins as "indispensable," "neutral," or "dispensable," which correlates to increasing, no effect, or decreasing the number of driver nodes in the network upon removal of that protein. We find that 21% of the proteins in the PPI network are indispensable. Interestingly, these indispensable proteins are the primary targets of disease-causing mutations, human viruses, and drugs, suggesting that altering a network's control property is critical for the transition between healthy and disease states. Furthermore, analyzing copy number alterations data from 1,547 cancer patients reveals that 56 genes that are frequently amplified or deleted in nine different cancers are indispensable. Among the 56 genes, 46 of them have not been previously associated with cancer. This suggests that controllability analysis is very useful in identifying novel disease genes and potential drug targets.
TIMEOR: a web-based tool to uncover temporal regulatory mechanisms from multi-omics data.” BioRxiv. Publisher's VersionAbstract. Submitted. “
Uncovering how transcription factors (TFs) regulate their targets at the DNA, RNA and protein levels over time is critical to define gene regulatory networks (GRNs) in normal and diseased states. RNA-seq has become a standard method to measure gene regulation using an established set of analysis steps. However, none of the currently available pipeline methods for interpreting ordered genomic data (in time or space) use time series models to assign cause and effect relationships within GRNs, are adaptive to diverse experimental designs, or enable user interpretation through a web-based platform. Furthermore, methods which integrate ordered RNA-seq data with transcription factor binding data are urgently needed. Here, we present TIMEOR (Trajectory Inference and Mechanism Exploration with Omics data in R), the first web-based and adaptive time series multi-omics pipeline method which infers the relationship between gene regulatory events across time. TIMEOR addresses the critical need for methods to predict causal regulatory mechanism networks between TFs from time series multi-omics data. We used TIMEOR to identify a new link between insulin stimulation and the circadian rhythm cycle. TIMEOR is available at https://github.com/ashleymaeconard/TIMEOR.git.
BioLitMine: Advanced Mining of Biomedical and Biological Literature About Human Genes and Genes from Major Model Organisms.” G3 (Bethesda).Abstract. 2020. “
The accumulation of biological and biomedical literature outpaces the ability of most researchers and clinicians to stay abreast of their own immediate fields, let alone a broader range of topics. Although available search tools support identification of relevant literature, finding relevant and key publications is not always straightforward. For example, important publications might be missed in searches with an official gene name due to gene synonyms. Moreover, ambiguity of gene names can result in retrieval of a large number of irrelevant publications. To address these issues and help researchers and physicians quickly identify relevant publications, we developed BioLitMine, an advanced literature mining tool that takes advantage of the medical subject heading (MeSH) index and gene-to-publication annotations already available for PubMed literature. Using BioLitMine, a user can identify what MeSH terms are represented in the set of publications associated with a given gene of the interest, or start with a term and identify relevant publications. Users can also use the tool to find co-cited genes and a build a literature co-citation network. In addition, BioLitMine can help users build a gene list relevant to a MeSH terms, such as a list of genes relevant to "stem cells" or "breast neoplasms." Users can also start with a gene or pathway of interest and identify authors associated with that gene or pathway, a feature that makes it easier to identify experts who might serve as collaborators or reviewers. Altogether, BioLitMine extends the value of PubMed-indexed literature and its existing expert curation by providing a robust and gene-centric approach to retrieval of relevant information.
Molecular Interaction Search Tool (MIST): an integrated resource for mining gene and protein interaction data.” Nucleic Acids Res, 46, D1, Pp. D567-D574.Abstract. 11/16/2017. “
Model organism and human databases are rich with information about genetic and physical interactions. These data can be used to interpret and guide the analysis of results from new studies and develop new hypotheses. Here, we report the development of the Molecular Interaction Search Tool (MIST; http://fgrtools.hms.harvard.edu/MIST/). The MIST database integrates biological interaction data from yeast, nematode, fly, zebrafish, frog, rat and mouse model systems, as well as human. For individual or short gene lists, the MIST user interface can be used to identify interacting partners based on protein-protein and genetic interaction (GI) data from the species of interest as well as inferred interactions, known as interologs, and to view a corresponding network. The data, interologs and search tools at MIST are also useful for analyzing 'omics datasets. In addition to describing the integrated database, we also demonstrate how MIST can be used to identify an appropriate cut-off value that balances false positive and negative discovery, and present use-cases for additional types of analysis. Altogether, the MIST database and search tools support visualization and navigation of existing protein and GI data, as well as comparison of new and existing data.
2018 Apr 13
Gene2Function: An Integrated Online Resource for Gene Function Discovery.” G3 (Bethesda).Abstract. 8/7/2017. “
One of the most powerful ways to develop hypotheses regarding biological functions of conserved genes in a given species, such as in humans, is to first look at what is known about function in another species. Model organism databases (MODs) and other resources are rich with functional information but difficult to mine. Gene2Function (G2F) addresses a broad need by integrating information about conserved genes in a single online resource.
Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets.” Proc Natl Acad Sci U S A, 113, 18, Pp. 4976-81.Abstract. 5/3/2016. “