COMPLEAT Documentation

URL

http://www.flyrnai.org/compleat/

Data upload page

Species: The user may choose human, fly or yeast.

Data files: The user may upload multiple datasets in one file or in multiple files. Tab-separated text files and Excel files (.xls, .xlsx, .xlsm) are accepted.

 

Table 3. Format of data file(s) for COMPLEAT

Format
Tab-separated values, single file with one or more data columns
Tab-separated values, multiple files with one or more data columns per file (use shift-click or ctrl-click in the file browser to select multiple files)
Excel file, single sheet with multiple data columns
Excel file, multiple sheets with single data columns
Tab-separated values, one data column

Gene/protein identifiers: COMPLEAT incorporates an ID mapping table to allow the user to upload data with Entrez GeneID, gene symbol, UniProt accession number, FlyBase GeneID for fly data or locus_tag (ORF name) for yeast data.

Table 4. Gene/protein identifiers for COMPLEAT

Source | Human | Fly | Yeast
NCBI EntrezGene | GeneID or symbol | GeneID or symbol | GeneID, Locus-tag or symbol
FlyBase | - | FBgn, CG or symbol | -
UniProt | Accession number | Accession number | Accession number
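
The file formats and identifier types described above can be combined into a small input file. The following is a minimal sketch only: it assumes the identifier sits in the first column and that a header row is present, neither of which is stated in this documentation, and the column names (gene, screen_1, screen_2) and values are hypothetical.

```python
import csv
import io

# Hypothetical example data: fly gene symbols followed by two data columns.
# The layout (identifier first, header row) is an assumption for illustration.
rows = [
    ("cdc2c", 1.8, -0.4),
    ("rpl27", -2.1, 0.3),
]


def write_tsv(handle, header, data_rows):
    """Write a header and identifier/data rows as tab-separated text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data_rows)


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_tsv(buffer, ("gene", "screen_1", "screen_2"), rows)
tsv_text = buffer.getvalue()
print(tsv_text)
```

To produce a file for upload, the same function could be called with a handle from open("my_data.txt", "w", newline="").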

Advanced options: COMPLEAT returns detailed information only for complexes below a p-value cutoff (default 0.1). This detailed information is required for network visualization and further data mining, and the restriction was introduced to optimize the performance of the tool. The user may specify a more stringent p-value cutoff than the default. In addition, the user may choose the background from which random complexes are built. The default background (auto option) is selected based on the coverage of the input dataset. For genome-scale or near-genome-scale datasets (i.e. input data larger than the complex resource; see Table 1 on the “About” page), the user input data are used as the background. If the user input is smaller than the complex resource, the complex data are used as the background. Users can override this default by specifying their preferred background data.
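
The auto background rule and the cutoff restriction described above can be sketched as follows. This is an illustration of the stated rule, not COMPLEAT's actual code: the function names and return labels are hypothetical, and the handling of an input exactly equal in size to the complex resource is an assumption, since the documentation does not cover that case.

```python
DEFAULT_P_VALUE_CUTOFF = 0.1  # default stated in the documentation


def choose_background(input_size, complex_resource_size):
    """Sketch of the 'auto' option: pick the background by input coverage.

    Returns "input" when the input is larger than the complex resource,
    otherwise "complexes". Equal sizes fall back to "complexes", which is
    an assumption made for this sketch.
    """
    if input_size > complex_resource_size:
        return "input"
    return "complexes"


def validate_cutoff(cutoff):
    """Accept only cutoffs at least as stringent as the default."""
    if not 0 < cutoff <= DEFAULT_P_VALUE_CUTOFF:
        raise ValueError(
            f"cutoff must be in (0, {DEFAULT_P_VALUE_CUTOFF}], got {cutoff}"
        )
    return cutoff


# The sizes below are made-up numbers used only to exercise both branches.
print(choose_background(input_size=12000, complex_resource_size=5000))
print(choose_background(input_size=300, complex_resource_size=5000))
print(validate_cutoff(0.05))
```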

Result page – global view

Scatter plot: COMPLEAT displays the enrichment results in an interactive scatter plot in which each point corresponds to a single complex. The position of a point corresponds to the complex score (IQM score), its size reflects the relative complex size, and its color corresponds to the p-value. For a single dataset, the y-axis shows the complex score and the x-axis shows the complexes ranked by that score. For multiple datasets, the x-axis shows the complex score from one dataset and the y-axis shows the complex score from another. If more than two datasets are uploaded (up to four are allowed), two datasets can be compared at a time, with the option to change which datasets are displayed on the x- and y-axes. The complexes are color-coded to distinguish significant, insignificant and (for multiple-dataset inputs) dataset-specific complexes (Table 5). The user may change the p-value threshold using the p-value adjustment sliders. The user can mine additional data by entering keywords to select subsets of complexes associated with the keyword (e.g. enter “kinases” to view complexes that contain proteins annotated as kinases). The logical operators “AND” and “OR” can also be used in the search box to combine multiple keywords.

Table 5. Color-coding of scatter plot.

Color | Category
magenta | enriched in both datasets but in opposite directions
black | enriched in both datasets and in the same direction
cyan | enriched only in the dataset shown on the x-axis
blue | enriched only in the dataset shown on the y-axis
grey | not enriched in any dataset
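
The color categories in Table 5 can be expressed as a small decision function. This is a sketch under two assumptions that the documentation does not spell out: a complex counts as enriched in a dataset when its p-value is below the chosen threshold, and the direction of enrichment is the sign of the complex score. The function name and the default threshold of 0.1 are hypothetical.

```python
def classify_complex(score_x, p_x, score_y, p_y, threshold=0.1):
    """Return the Table 5 color for one complex in a two-dataset comparison.

    Assumptions for this sketch: 'enriched' means p-value < threshold, and
    'direction' is the sign of the complex score.
    """
    enriched_x = p_x < threshold
    enriched_y = p_y < threshold
    if enriched_x and enriched_y:
        same_direction = (score_x > 0) == (score_y > 0)
        return "black" if same_direction else "magenta"
    if enriched_x:
        return "cyan"
    if enriched_y:
        return "blue"
    return "grey"


# Made-up scores and p-values covering each category.
print(classify_complex(1.5, 0.01, -2.0, 0.02))  # opposite directions
print(classify_complex(1.5, 0.01, 2.0, 0.02))   # same direction
print(classify_complex(1.5, 0.01, 0.2, 0.50))   # x-axis dataset only
```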

Search Options

COMPLEAT supports simple searches that apply across all indexed fields, as well as field-specific searches and boolean operators. Parentheses are used to specify the execution order of the search. The search is not case-sensitive with the exception of keywords AND, OR, NOT.
Basic search:
cyclin 
Advanced search:
(gene:cdc2c OR gene:rpl27) AND source:literature NOT database:GO

Field Codes

Field Code | Values | Example
gene | Gene ID, symbol, accession number, FBgn, CG, or locus tag (varies by organism, see Table 4) | gene:cdc2c
source | Literature or Predicted | source:literature
database | CORUM, PINdb, CYC2008, Gene Ontology (GO), DPiM, KEGG, SignaLink | database:corum
name | words within complex names | name:cyclin
species | Fly, Yeast, Human (original species from which ortholog was obtained) | species:human
method | Prediction methods | method:coimmunoprecipitation
reference | PubMed ID | reference:8560263
citation | PubMed ID | citation:8560263
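
Query strings using the field codes above can be assembled programmatically before being pasted into the search box. The helper names below (term, any_of) are hypothetical conveniences and not part of COMPLEAT; the sketch only reproduces the syntax of the advanced search example given earlier.

```python
# Field codes listed in the Field Codes table.
FIELD_CODES = {
    "gene", "source", "database", "name",
    "species", "method", "reference", "citation",
}


def term(field, value):
    """Build a field-specific search term such as 'gene:cdc2c'."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}:{value}"


def any_of(*terms):
    """Group terms with OR inside parentheses to fix the execution order."""
    return "(" + " OR ".join(terms) + ")"


# Boolean keywords are written in upper case, as in the documented example.
query = " ".join([
    any_of(term("gene", "cdc2c"), term("gene", "rpl27")),
    "AND", term("source", "literature"),
    "NOT", term("database", "GO"),
])
print(query)
```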

Network visualization of the complex: Users can click on or select sets of complexes, shown as dots in the interactive scatter plot, for network visualization. When one or more complexes are selected, a network representation of each selected complex is displayed in the Cytoscape Web panel (right panel of the same page). In the network visualization, the node colors correspond to the user input values and range from green (lowest value) to red (highest value). Gray nodes represent missing values (i.e. a protein that is present in the complex but missing from the user input data). Solid edges correspond to known PPIs, and broken edges are interologs (i.e. protein pairs whose orthologs in another species are known to physically interact). The network visualization and its interactivity, including the ability to move nodes and zoom into specific parts of the network, are supported by Cytoscape Web.
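
The node coloring described above can be illustrated with a simple mapping from input value to color. This is only a sketch: the documentation states the endpoints (green for the lowest value, red for the highest, gray for missing), while the linear interpolation between them, the exact RGB values and the function name are assumptions.

```python
def node_color(value, lowest, highest):
    """Map an input value to an (R, G, B) tuple on a green-to-red scale.

    Missing values (None) are drawn gray. Linear interpolation between the
    endpoints is an assumption made for this sketch.
    """
    if value is None:
        return (128, 128, 128)  # protein in the complex but not in the input
    if highest == lowest:
        fraction = 0.5  # avoid division by zero when all values are equal
    else:
        fraction = (value - lowest) / (highest - lowest)
    fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range values
    red = round(255 * fraction)
    green = round(255 * (1 - fraction))
    return (red, green, 0)


# Made-up input values for three proteins, one of them missing.
values = {"cdc2c": -2.0, "rpl27": 3.0, "unmeasured": None}
measured = [v for v in values.values() if v is not None]
colors = {
    name: node_color(v, min(measured), max(measured))
    for name, v in values.items()
}
print(colors)
```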

Table view: The user may view enriched complexes in a sortable table that includes the complex name, input scores and COMPLEAT-computed p-values.

Detailed complex information: From the network visualization or table views, the user may select complexes to view additional details, including complex name, purification method and reference citation (for literature-based complexes), prediction algorithm (for predicted complexes), co-citation and co-localization information. Cytoscape images of selected complexes will be displayed in pairs showing the data from original files as well as binary interactions.

Data download: Both the scatter plot and the Cytoscape images of selected complexes can be saved as PNG or JPG files. The table can be saved as a tab-separated text file.

Question: Are the files I upload saved on the server?

Answer: Uploaded files are not saved. Calculations are performed in memory, and the results are returned to your browser.
