The Gene Ontology (GO) is a database maintained by the Gene Ontology Consortium. It provides a standardized, species-independent vocabulary for defining and describing the functions of genes and proteins, and it is updated as research progresses. This controlled but evolving vocabulary describes the roles that genes and proteins play within cells, giving a comprehensive description of the attributes of genes and gene products in organisms. GO is divided into three main categories: Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). These describe, respectively, the biological processes a gene product participates in, its location in the cell, and the molecular functions it may perform. The basic unit of GO is the node. Each node has a name, such as 'Cell', 'Fibroblast Growth Factor Receptor Binding', or 'Signal Transduction', and a unique ID of the form 'GO:nnnnnnn'.
Starting from the IDs of the identified proteins, GO annotations are retrieved from the UniProt database by ID mapping and used to classify and annotate the proteins functionally. For the GO nodes in BP, CC, and MF, the number of corresponding proteins is counted, and statistical charts are drawn for the secondary classification of the expressed proteins.
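The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used: it assumes the annotations have already been retrieved and stored as a mapping from protein ID to (GO ID, category) pairs. The protein IDs (`PROT_A` and so on), the variable names, and the function `count_proteins_per_term` are hypothetical; only the 'GO:nnnnnnn' ID format and the BP/CC/MF split come from the text.

```python
import re
from collections import defaultdict

# GO IDs have the form "GO:" followed by seven digits.
GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")

# Hypothetical input: protein ID -> list of (GO ID, category) annotations.
annotations = {
    "PROT_A": [("GO:0007165", "BP"), ("GO:0005634", "CC")],
    "PROT_B": [("GO:0007165", "BP"), ("GO:0005515", "MF")],
    "PROT_C": [("GO:0005737", "CC"), ("GO:0005515", "MF"), ("GO:0005634", "CC")],
}


def count_proteins_per_term(annotations):
    """Return {category: {GO ID: number of distinct proteins}}."""
    proteins_by_term = {"BP": defaultdict(set), "CC": defaultdict(set), "MF": defaultdict(set)}
    for protein, terms in annotations.items():
        for go_id, category in terms:
            if not GO_ID_PATTERN.match(go_id):
                raise ValueError(f"malformed GO ID: {go_id!r}")
            # A set ensures each protein is counted once per GO node.
            proteins_by_term[category][go_id].add(protein)
    return {
        category: {go_id: len(proteins) for go_id, proteins in terms.items()}
        for category, terms in proteins_by_term.items()
    }


term_counts = count_proteins_per_term(annotations)
for category, counts in term_counts.items():
    print(category, counts)
```

Storing proteins in sets rather than incrementing counters keeps the count correct even if the same annotation appears twice for one protein.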
1. GO Secondary Classification Statistics Chart
Figure 1: GO Secondary Annotation of Differentially Expressed Proteins
Note: The horizontal axis shows the GO classifications; the left vertical axis shows the percentage of proteins and the right vertical axis shows the number of proteins. The chart shows how the upregulated and downregulated differentially expressed proteins are distributed across the secondary functions, and therefore the weight of each secondary function in each direction of regulation. A secondary function whose proportion differs markedly between upregulated and downregulated proteins has a different enrichment trend in the two groups, and it is worth examining whether that function is related to the differential expression.
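The comparison described in the note can be illustrated with a short Python sketch. It assumes, since the text does not say so explicitly, that each percentage is the number of proteins in a secondary function divided by the total number of proteins in that regulation group. All counts, totals, and names below are made up for illustration.

```python
def term_percentages(term_counts, total_proteins):
    """Convert per-term protein counts to percentages of a group total."""
    if total_proteins <= 0:
        raise ValueError("total_proteins must be positive")
    return {term: 100.0 * n / total_proteins for term, n in term_counts.items()}


# Hypothetical counts per secondary function for each regulation direction.
up_counts = {"metabolic process": 30, "binding": 45}
down_counts = {"metabolic process": 10, "binding": 40}
up_total, down_total = 60, 50  # hypothetical group sizes (assumed denominators)

up_pct = term_percentages(up_counts, up_total)
down_pct = term_percentages(down_counts, down_total)

# Difference in percentage points; a large absolute value marks a function
# whose share differs between upregulated and downregulated proteins.
pct_difference = {
    term: up_pct.get(term, 0.0) - down_pct.get(term, 0.0)
    for term in sorted(set(up_pct) | set(down_pct))
}

for term, diff in pct_difference.items():
    print(f"{term}: up {up_pct.get(term, 0.0):.1f}%, "
          f"down {down_pct.get(term, 0.0):.1f}%, difference {diff:+.1f}")
```

A difference in proportions is only a descriptive cue; it does not by itself establish statistical significance.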
2. GO Levels Classification Statistics Chart
GO nodes at different levels are ranked by the number of proteins annotated to them, and the top 20 nodes are selected for display, as shown in the figure:
Figure 2: Statistical Chart of Protein Annotations at Different Levels
Note: The horizontal axis represents the percentage of enriched proteins, and the GO nodes on the vertical axis are arranged in ascending order of level. Different colors represent different levels, and the number at the end of each bar is the number of proteins in that classification.
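The selection and ordering behind such a chart can be sketched as follows. The node records, the field names, and the function `top_nodes` are hypothetical, and the tie-breaking rule within a level (larger counts first) is an assumption that the text does not specify.

```python
def top_nodes(nodes, n=20):
    """Keep the n nodes with the most proteins, then order them by level."""
    # Step 1: rank all nodes by protein count and keep the top n.
    top = sorted(nodes, key=lambda r: r["proteins"], reverse=True)[:n]
    # Step 2: arrange the kept nodes in ascending order of GO level;
    # within a level, larger counts come first (assumed tie-break).
    return sorted(top, key=lambda r: (r["level"], -r["proteins"]))


# Hypothetical records: one per GO node, with its level and protein count.
nodes = [
    {"term": "node_a", "level": 2, "proteins": 120},
    {"term": "node_b", "level": 3, "proteins": 95},
    {"term": "node_c", "level": 2, "proteins": 40},
    {"term": "node_d", "level": 4, "proteins": 60},
    {"term": "node_e", "level": 3, "proteins": 15},
]

for row in top_nodes(nodes, n=3):
    print(row["level"], row["term"], row["proteins"])
```

With real data, `n` would stay at its default of 20, matching the number of nodes displayed in the figure.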
3. topGO Protein Enrichment Analysis
BiotechPack uses topGO to perform enrichment analysis on the differentially expressed proteins, which gives the enrichment significance of these proteins for each GO node. The significantly enriched GO nodes (terms) and their hierarchical relationships within the GO system are displayed as a directed acyclic graph. In this graph, functional descriptions become more specific from top to bottom, and each arrow represents containment: all proteins annotated to a node are also annotated to its parent node.
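Two ideas in this paragraph can be made concrete with a small Python sketch: the containment rule (annotations propagate from a node to all of its ancestors) and a term-by-term enrichment test. topGO itself is an R/Bioconductor package, and the text does not state which of its algorithms or test statistics is used, so the code below is not topGO. It is a plain one-sided hypergeometric test, shown only as an assumption-laden illustration. The protein IDs and function names are hypothetical, and the three-level parent map is a tiny excerpt chosen for illustration.

```python
from math import comb


def propagate_to_ancestors(direct, parents):
    """Apply the containment rule: a protein annotated to a node is also
    annotated to every ancestor of that node in the graph."""
    full = {}
    for term, proteins in direct.items():
        stack, seen = [term], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            full.setdefault(current, set()).update(proteins)
            stack.extend(parents.get(current, ()))
    return full


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n proteins are drawn from N, of which K carry the term."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Illustrative child -> parents map (a small excerpt of a GO-like hierarchy).
parents = {
    "GO:0005515": ["GO:0005488"],
    "GO:0005488": ["GO:0003674"],
    "GO:0003824": ["GO:0003674"],
}
# Hypothetical direct annotations: GO node -> set of protein IDs.
direct = {"GO:0005515": {"PROT_A", "PROT_B"}, "GO:0003824": {"PROT_C"}}

full = propagate_to_ancestors(direct, parents)

# Example: 10 background proteins, 4 annotated to a term,
# 3 differentially expressed, 2 of which carry the term.
p_value = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
print(sorted(full["GO:0003674"]), round(p_value, 4))
```

Because parent nodes inherit the proteins of their children, tests on neighboring nodes are not independent, which is the motivation for graph-aware enrichment methods.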
The directed acyclic graph of topGO molecular function for differentially expressed proteins is as follows:
Figure 3: Directed Acyclic Graph of topGO Enrichment for Molecular Function of Differentially Expressed Proteins
Note: Enrichment is tested for each GO node. The diagram shows the 10 most significant nodes together with their hierarchical relationships. Each box (or ellipse) gives the description and the enrichment significance value of a GO node. Different colors represent different levels of enrichment significance, with darker colors indicating higher significance.