COMPLEAT Documentation

URL

http://www.flyrnai.org/compleat/

Data upload page

Species: The user may choose human, fly or yeast.

Data files: The user may upload multiple datasets either in one file or in multiple files. Tab-separated text files and Excel files (.xls, .xlsx, .xlsm) are accepted.

Table 3. Format of data file(s) for COMPLEAT

Format | Example of file 1 | Example of file 2
Tab-separated values, single file with one or more data columns | FlyRNAi_data_baseline_vs_EGF.tsv |
Tab-separated values, multiple files with one or more data columns per file (use shift-click or ctrl-click in the file browser to select multiple files) | FlyRNAi_data_baseline.tsv | FlyRNAi_data_EGF.tsv
Excel file, single sheet with multiple data columns | FlyRNAi_data_baseline_vs_EGF.xls |
Excel file, multiple sheets with single data columns | FlyRNAi_data_baseline_vs_EGF_2_Sheets.xls |
Tab-separated, one data column | HumanRNAiDNARepairScreen.tsv |
Tab-separated, one data column | HumanRNAiStemCellDeterminantsScreen.tsv |
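
As an illustration of the single-file, multiple-column layout in Table 3, the following Python sketch produces a small tab-separated text. The column layout (one identifier column followed by one column per dataset, with a header row), the column names and the score values are assumptions made for illustration; the example files listed above show the layout COMPLEAT actually expects.

```python
import csv
import io

# Hypothetical rows: (gene symbol, baseline score, EGF score). Values are made up.
ROWS = [
    ("cdc2c", 1.25, -0.40),
    ("RpL27", -0.75, 0.10),
]


def to_tsv(rows, header=("gene", "baseline", "EGF")):
    """Return the rows as tab-separated text with a header line (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing the returned text to a file with a .tsv extension would give a file that can be selected on the upload page.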

Gene/protein identifiers: COMPLEAT incorporates an ID mapping table to allow the user to upload data with Entrez GeneID, gene symbol, UniProt accession number, FlyBase GeneID for fly data or locus_tag (ORF name) for yeast data.

Table 4. Gene/protein identifiers for COMPLEAT

Source | Human | Fly | Yeast
NCBI EntrezGene | GeneID or symbol | GeneID or symbol | GeneID, Locus-tag or symbol
FlyBase | | FBgn, CG or symbol |
UniProt | Accession number | Accession number | Accession number

Advanced options: COMPLEAT returns detailed information only for complexes below a p-value cutoff (default 0.1). This detailed information is required for network visualization and further data mining; the restriction was introduced to optimize the performance of the tool. The user may specify a more stringent p-value cutoff than the default. In addition, the user may choose the background from which random complexes are built. The default background (auto option) is selected based on the coverage of the input dataset. For genome-scale or near-genome-scale datasets (i.e. input data larger than the complex resource; see Table 1 on the “About” page), the user input data are used as the background. If the user input is smaller than the complex resource, the complex data are used as the background. Users can override this default by specifying their preferred background data.
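
The auto background rule and the p-value cutoff described above can be sketched as follows. This is a minimal illustration, not COMPLEAT's code: the function names, the return labels "input" and "complexes", the handling of equal sizes and the strict comparison against the cutoff are all assumptions.

```python
def choose_background(n_input_genes, n_complex_genes, override=None):
    """Pick the background used to build random complexes.

    Returns "input" or "complexes"; `override` models a user-specified choice.
    """
    if override is not None:
        return override
    # Input larger than the complex resource: treated as genome-scale.
    if n_input_genes > n_complex_genes:
        return "input"
    # Smaller input; equal sizes also land here (assumption, not documented).
    return "complexes"


def complexes_with_details(p_values, cutoff=0.1):
    """Keep complexes whose p-value is below the cutoff (default 0.1)."""
    return {name: p for name, p in p_values.items() if p < cutoff}
```

For example, complexes_with_details({"A": 0.02, "B": 0.3}) keeps only complex "A", and passing cutoff=0.01 would make the selection more stringent.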

Result page – global view

Scatter plot: COMPLEAT displays the enrichment results using an interactive scatter plot, where each point corresponds to a single complex. The position of a point corresponds to the complex score (IQM score), its size reflects the relative complex size, and its color corresponds to the p-value. For a single dataset, the y-axis corresponds to the complex score (IQM score) and the x-axis corresponds to the rank of the complex (based on the complex score). For multiple datasets, the x-axis shows the complex score from one dataset and the y-axis shows the complex score from another dataset. If the input contains more than two datasets (up to four are allowed), two datasets at a time can be compared, with the option to change which datasets are displayed on the x- and y-axes. The complexes are color-coded to distinguish significant, insignificant and (for multiple dataset inputs) dataset-specific complexes (Table 5). The user may change the p-value threshold using the p-value adjustment sliders. The user can mine additional data by entering keywords to select subsets of complexes associated with the keyword (e.g. enter “kinases” to view complexes that contain proteins annotated as kinases). The logical operators “AND” and “OR” can also be used in the search box to search with multiple keywords.
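
This page does not define the IQM score. Assuming it denotes the interquartile mean of the input values of a complex's members, a simplified calculation might look like the sketch below. The function name is hypothetical, and the trimming rule (dropping a whole number of values from each end, without fractional weighting of boundary values) is a simplification that may differ from what COMPLEAT computes.

```python
def interquartile_mean(values):
    """Mean of the middle half of the values (simplified interquartile mean).

    Drops floor(n / 4) values from each end of the sorted list; boundary
    values are not fractionally weighted.
    """
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)
    trim = len(ordered) // 4
    middle = ordered[trim:len(ordered) - trim]
    return sum(middle) / len(middle)
```

Because the extremes are discarded, a single outlying member value has little effect on the score of a large complex.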

Table 5. Color-coding of scatter plot.

Color | Category
magenta | enriched in both datasets but in opposite directions
black | enriched in both datasets and in the same direction
cyan | enriched only in the dataset shown on the x-axis
blue | enriched only in the dataset shown on the y-axis
grey | not enriched in any dataset
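
The color categories of Table 5 can be expressed as a small decision function. This is a sketch under stated assumptions: a complex counts as enriched when its p-value is below the slider threshold, and the direction of enrichment is taken from the sign of the complex score. Neither rule is spelled out on this page, and the function name is hypothetical.

```python
def complex_color(score_x, p_x, score_y, p_y, threshold):
    """Return the Table 5 color for one complex compared across two datasets."""
    enriched_x = p_x < threshold  # assumed meaning of "enriched"
    enriched_y = p_y < threshold
    if enriched_x and enriched_y:
        # Direction is assumed to be the sign of the complex score.
        same_direction = (score_x >= 0) == (score_y >= 0)
        return "black" if same_direction else "magenta"
    if enriched_x:
        return "cyan"
    if enriched_y:
        return "blue"
    return "grey"
```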

Search Options

COMPLEAT supports simple searches that apply across all indexed fields, as well as field-specific searches and boolean operators. Parentheses specify the execution order of the search. Searches are not case-sensitive, except for the operator keywords AND, OR and NOT.

Basic search:

cyclin

Advanced search:

(gene:cdc2c OR gene:rpl27) AND source:literature NOT database:GO

Field Codes

Field Code | Values | Example
gene | Gene ID, symbol, accession number, FBgn, CG, or locus tag (varies by organism; see Table 4) | gene:cdc2c
source | Literature or Predicted | source:literature
database | CORUM, PINdb, CYC2008, Gene Ontology (GO), DPiM, KEGG, SignaLink | database:corum
name | Words within complex names | name:cyclin
species | Fly, Yeast, Human (original species from which ortholog was obtained) | species:human
method | Prediction methods | method:coimmunoprecipitation
reference | PubMed ID | reference:8560263
citation | PubMed ID | citation:8560263
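
Query strings that use these field codes can also be assembled programmatically before being pasted into the search box. The helper functions below are hypothetical and are not part of COMPLEAT; they only concatenate text following the syntax of the advanced search example.

```python
# Field codes as listed in the table above.
FIELD_CODES = {"gene", "source", "database", "name",
               "species", "method", "reference", "citation"}


def term(field, value):
    """Build a field-specific term such as gene:cdc2c."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}:{value}"


def any_of(*terms):
    """Join terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms):
    """Join terms with AND."""
    return " AND ".join(terms)


def exclude(query, unwanted):
    """Append a NOT clause, mirroring the advanced search example."""
    return f"{query} NOT {unwanted}"


genes = any_of(term("gene", "cdc2c"), term("gene", "rpl27"))
query = exclude(all_of(genes, term("source", "literature")),
                term("database", "GO"))
```

Here query reproduces the advanced search string shown earlier in this section.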

Network visualization of the complex: Users can click or select sets of complexes shown as dots in the interactive scatter plot for network visualization. When one or more complexes are selected, network representations of the selected complexes are displayed in the Cytoscape Web panel (right panel of the same page). In the network visualization, the node colors correspond to the user input values and range from green to red: green corresponds to the lowest value and red to the highest. Gray nodes represent missing values (i.e. a protein that is present in the complex but missing in the user input data). Solid edges correspond to known PPIs, and broken edges are interologs (i.e. interactions between proteins whose orthologs in another species are known to physically interact). The network visualization and its interactivity are supported by Cytoscape Web, including the ability to move nodes and to zoom into specific parts of the network.
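
The green-to-red node coloring can be pictured with the sketch below. The linear interpolation in RGB space, the hex codes and the function name are assumptions chosen for illustration; the actual color scale used by the Cytoscape panel may differ.

```python
def node_color(value, lowest, highest):
    """Map an input value to a hex color between green (low) and red (high).

    A value of None stands for a protein missing from the user input
    and is drawn in gray.
    """
    if value is None:
        return "#808080"
    if highest == lowest:
        fraction = 0.5  # all values equal: use the midpoint (assumption)
    else:
        fraction = (value - lowest) / (highest - lowest)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range values
    red = round(255 * fraction)
    green = round(255 * (1 - fraction))
    return f"#{red:02x}{green:02x}00"
```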

Table view: The user may view enriched complexes in a sortable table that includes the complex name, input scores and COMPLEAT-computed p-values.

Detailed complex information: From the network visualization or table views, the user may select complexes to view additional details, including complex name, purification method and reference citation (for literature-based complexes), prediction algorithm (for predicted complexes), co-citation and co-localization information. Cytoscape images of selected complexes will be displayed in pairs showing the data from original files as well as binary interactions.

Data download: Both the scatter plot and the Cytoscape images of selected complexes can be saved as PNG or JPG files. The table can be saved as a tab-separated text file.

Question | Answer
Are the files I upload saved on the server? | Uploaded files are not saved. Calculations are performed in memory, and the results are returned to your browser.
