Unlike Principal Component Analysis (PCA), which is unsupervised, Partial Least Squares Discriminant Analysis (PLS-DA) and Orthogonal PLS-DA (OPLS-DA) are supervised discriminant analysis methods. They model the relationship between metabolite expression levels and sample categories so that the category of a sample can be predicted. PLS-DA models (Figure 1) or OPLS-DA models (Figure 2) are established for pairwise group comparisons, and the parameters obtained from model evaluation are provided in tabular form.
R^2X and R^2Y represent the proportion of variation in the X and Y matrices, respectively, that the model explains, and Q^2 indicates the predictive ability of the model. In theory, the closer R^2 and Q^2 are to 1, the better the model fits and predicts; the lower they are, the poorer the model. Generally, R^2 and Q^2 values greater than 0.5 (50%) are considered good and values above 0.4 are acceptable, and the difference between R^2 and Q^2 should not be too large. For clinical samples with large individual differences and uncontrollable factors, especially in large cohorts, R^2 and Q^2 values around 0.2 are also acceptable. Figure 3 shows the permutation test of the PLS-DA model, where a large slope of the regression line and a low Q^2 intercept (the value at which the Q^2 regression line crosses the vertical axis) indicate that the PLS-DA model is not overfitted.
Additionally, the Variable Importance for the Projection (VIP) is calculated to measure how strongly each metabolite's expression pattern contributes to the classification and discrimination of the sample groups, thereby aiding the selection of marker metabolites (usually with VIP > 1.0 as the selection criterion) (Figure 4).
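To make the model parameters above more concrete, the following is a minimal sketch of how R^2X, R^2Y, Q^2 and the permutation-test slope and intercept could be computed for a two-group comparison. It is not the software used to produce the figures in this report. The function names (pls1_fit, pls1_predict, q2_cv, permutation_test), the 0/1 group coding, the autoscaling, the 7-fold cross-validation, the number of permutations and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    """NIPALS PLS with one response: X is autoscaled, y (coded 0/1) is centred."""
    x_mean, x_std, y_mean = X.mean(axis=0), X.std(axis=0, ddof=1), y.mean()
    E, f = (X - x_mean) / x_std, y - y_mean          # working residuals
    ssx0, ssy0 = (E ** 2).sum(), (f ** 2).sum()      # total sums of squares
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                       # unit-norm weight vector
        t = E @ w                                    # sample scores
        p, c = E.T @ t / (t @ t), f @ t / (t @ t)    # X loading, y loading
        E, f = E - np.outer(t, p), f - c * t         # deflate X and y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return {"coef": W @ np.linalg.solve(P.T @ W, q),
            "x_mean": x_mean, "x_std": x_std, "y_mean": y_mean,
            "R2X": 1 - (E ** 2).sum() / ssx0,        # explained X variation
            "R2Y": 1 - (f ** 2).sum() / ssy0}        # explained Y variation

def pls1_predict(model, X):
    """Predict the (continuous) class score for new samples."""
    Xs = (X - model["x_mean"]) / model["x_std"]
    return Xs @ model["coef"] + model["y_mean"]

def q2_cv(X, y, n_components=2, n_folds=7, seed=0):
    """Q^2 = 1 - PRESS / TSS, with PRESS from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    press = 0.0
    for test_idx in np.array_split(rng.permutation(len(y)), n_folds):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = pls1_fit(X[train], y[train], n_components)
        press += ((y[test_idx] - pls1_predict(model, X[test_idx])) ** 2).sum()
    return 1 - press / ((y - y.mean()) ** 2).sum()

def permutation_test(X, y, n_perm=20, n_components=2, seed=0):
    """Regress Q^2 on |corr(original y, permuted y)|; return slope, intercept."""
    rng = np.random.default_rng(seed)
    corr, q2_values = [1.0], [q2_cv(X, y, n_components)]   # unpermuted model
    for _ in range(n_perm):
        y_perm = rng.permutation(y)
        corr.append(abs(np.corrcoef(y, y_perm)[0, 1]))
        q2_values.append(q2_cv(X, y_perm, n_components))
    slope, intercept = np.polyfit(corr, q2_values, 1)
    return slope, intercept

# Simulated data (assumption): 10 samples per group, 50 metabolites,
# of which the first 10 differ between the two groups.
rng = np.random.default_rng(42)
y = np.repeat([0.0, 1.0], 10)
X = rng.normal(size=(20, 50))
X[y == 1, :10] += 2.0

model = pls1_fit(X, y)
q2 = q2_cv(X, y)
slope, intercept = permutation_test(X, y)
print(f"R2X={model['R2X']:.2f}  R2Y={model['R2Y']:.2f}  Q2={q2:.2f}")
print(f"permutation test: slope={slope:.2f}  Q2 intercept={intercept:.2f}")
```

In this sketch the Q^2 intercept is the fitted Q^2 value at zero correlation between the permuted and the original labels, which corresponds to the point where the regression line in a permutation plot meets the vertical axis. Dedicated packages may differ in scaling, cross-validation scheme and the number of permutations, so the values are not expected to match those in the report tables.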
Figure 1 PLS-DA score plot for the sham-operated group and the model group
Figure 2 OPLS-DA model for the sham-operated group and the model group
Figure 3 Permutation test plot of the PLS-DA model
Figure 4 PLS-DA model loading plot for the sham-operated group and the model group
Note: Points enclosed in red boxes are metabolites with VIP > 1
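As an illustration of the VIP > 1 selection criterion, the sketch below computes VIP scores from the scores, weights and y-loadings of a fitted PLS model using the commonly used VIP formula. The function name vip_scores and the randomly generated T, W and q arrays are hypothetical stand-ins for the outputs of a real model, not values from this analysis.

```python
import numpy as np

def vip_scores(T, W, q):
    """VIP per variable.

    T: (n_samples, n_components) score matrix
    W: (n_features, n_components) weight matrix
    q: (n_components,) y-loadings
    """
    W = W / np.linalg.norm(W, axis=0)          # unit-norm weight vectors
    ssy = (T ** 2).sum(axis=0) * q ** 2        # Y variation explained per component
    n_features = W.shape[0]
    return np.sqrt(n_features * (W ** 2 @ ssy) / ssy.sum())

# Hypothetical model outputs: 20 samples, 6 metabolites, 2 components.
rng = np.random.default_rng(0)
T = rng.normal(size=(20, 2))
W = rng.normal(size=(6, 2))
q = np.array([0.8, 0.3])

vip = vip_scores(T, W, q)
selected = np.flatnonzero(vip > 1.0)           # indices of candidate markers
print(np.round(vip, 2), selected)
```

Because the squared VIP values average to 1 across all variables, the threshold VIP > 1.0 selects the metabolites whose contribution to the model is above average.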