Beginner's Guide to Mass Spectrometry-Based Proteomics

Genes are the fundamental units of heredity, but they exert their effects largely through the proteins they encode. Proteins are the major functional players in biology, involved in processes ranging from biochemical reactions and signal transduction to structural support. The proteome is the collection of all proteins present in the fluids, cells, and tissues of an organism, and it reflects the functional state of a biological system. Proteomics is the qualitative and quantitative study of the proteome; it is often used to compare cell states and is widely applied in biomedicine. For example, comparing the proteomes of virus-infected and uninfected cells can reveal the cellular pathways and proteins required for viral infection and replication, and drugs targeting these proteins can then be developed to slow infection. Proteomics is particularly well suited to uncovering biochemical mechanisms because it characterizes thousands of proteins in a single experiment. This article focuses on using mass spectrometry (MS) for systematic characterization of the proteome, specifically bottom-up proteomics, in which proteins are first digested into peptides that are then analyzed by MS.

I. Basic Knowledge of Mass Spectrometry

The mass spectrometer was invented in 1912 and has developed continuously since then, with major improvements in detection limits, speed, and applicability. Mass spectrometry uses basic properties of molecules, such as mass and net charge, to detect the presence and abundance of peptides (or other biomolecules such as metabolites, lipids, and intact proteins). Peptides that acquire a net charge, usually by gaining one or more protons, are referred to as peptide ions.
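As a small worked example of how mass and charge combine into the quantity the instrument actually measures, the sketch below computes the mass-to-charge ratio of a protonated peptide ion. The function names and the 1000 Da example mass are illustrative assumptions, not values from the article.

```python
PROTON_MASS = 1.007276  # Da, mass of one proton

def peptide_mz(neutral_mass, charge):
    """m/z of a peptide ion formed by adding `charge` protons to a neutral peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

def neutral_mass_from_mz(mz, charge):
    """Invert peptide_mz: recover the neutral mass from an observed m/z and charge."""
    return charge * (mz - PROTON_MASS)

# Hypothetical peptide with a neutral monoisotopic mass of 1000 Da.
mz_by_charge = {z: peptide_mz(1000.0, z) for z in (1, 2, 3)}
for z, value in mz_by_charge.items():
    print(f"z={z}: m/z = {value:.4f}")
```

The same peptide therefore appears at a different m/z for each charge state, which is why the charge has to be known before a mass can be assigned.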

All mass spectrometers have three basic components: an ion source, a mass analyzer, and a detector (Figure 1A). Since mass spectrometers can only analyze gaseous ions, methods such as electrospray ionization (ESI) are used to convert peptides in solution into gas-phase ions. The liquid containing peptides is pumped through a micron-sized orifice held at high voltage (2-4 kV). On leaving this emitter, the stable liquid flow breaks up into tiny, highly charged droplets that evaporate rapidly, leaving peptide ions in the gas phase. The ESI signal depends on the concentration of peptide in the spray, so using the lowest practical flow rate improves detection sensitivity. In proteomics, high-performance liquid chromatography (HPLC) is commonly used to separate peptide mixtures before ionization, with flow rates of a few hundred nanoliters per minute, far lower than those of conventional HPLC.

Figure 1. Overview of sample preparation and instrumentation used in MS-based proteomics

The primary function of the mass analyzer is to separate ions according to their mass-to-charge ratio (m/z). Fundamentally, all analyzers do this by manipulating ion trajectories in an electric field, but the principles they use differ, and this determines where each type is applied. In proteomics, quadrupole mass analyzers are common and are often combined with time-of-flight (TOF) or Orbitrap analyzers. A quadrupole applies an oscillating electric field between four parallel cylindrical rods, with the two pairs of opposing rods carrying radio frequency fields that are shifted in phase relative to each other. Together these fields shape a pseudo-potential surface that can be configured either to transmit all ions or to pass only ions within a chosen m/z window, which is how the quadrupole selects ions.

TOF mass analyzers accelerate ions through a potential of about 20 kV and then separate them by the time they take to reach the detector. TOF instruments can resolve arrival-time differences well below a microsecond, which allows mass differences to be measured at the parts-per-million (ppm) level. In contrast, the Orbitrap mass analyzer distinguishes ions by their oscillation frequency. Ions are injected tangentially and trapped in the Orbitrap, where they oscillate along the axis of the central metal spindle (Figure 1B). Although the Orbitrap is only a few centimeters long, ions rapidly travel several kilometers within it, giving very high resolution (typically in the tens of thousands or higher) and mass accuracy down to the ppm level.
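The time-of-flight principle can be illustrated with a simple energy balance: an ion of charge z accelerated through a potential V gains kinetic energy z·e·V, so its velocity, and hence its flight time over a fixed path, depends only on m/z. The sketch below assumes an idealized linear flight tube with a hypothetical 1 m path length; real instruments use reflectrons and calibration, which are not modeled here.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms

def flight_time_s(mz, accel_voltage=20_000.0, path_length_m=1.0):
    """Idealized field-free flight time (seconds) for an ion of the given m/z.

    Energy balance: z*e*V = 0.5*m*v**2, so v = sqrt(2*V / (m / (z*e))).
    The 1 m default path length is an assumption for illustration only.
    """
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE  # kg per coulomb
    velocity = math.sqrt(2.0 * accel_voltage / mass_per_charge)
    return path_length_m / velocity

def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

# Flight time grows with the square root of m/z: 4x the m/z takes 2x as long.
ratio = flight_time_s(400.0) / flight_time_s(100.0)
print(f"t(400) / t(100) = {ratio:.3f}")
print(f"t(m/z 1000) = {flight_time_s(1000.0) * 1e6:.1f} microseconds")
```

Because flight time scales with the square root of m/z, a small relative change in mass produces an even smaller relative change in arrival time, which is why fast, precise timing electronics are essential for ppm-level accuracy.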

In proteomics instruments, the quadrupole is typically followed by a 'collision cell,' a device used specifically for ion fragmentation. Either intact peptide ions or their fragment ions then enter the final stage containing the detector. Spectra of intact ions are called MS1 or precursor ion spectra; spectra of fragments are called MS2, product ion, or MS/MS spectra. TOF instruments use microchannel plate (MCP) detectors, which release electrons whenever an ion strikes the surface; these electrons are then amplified, allowing individual ions to be measured. This very high sensitivity comes with a drawback: the detector can saturate when too many ions arrive at once. In contrast, the Orbitrap measures the 'image current' induced by rapidly oscillating ions, which reflects the intensity of each ion packet. The current is recorded in the time domain and converted into the frequency domain by a Fourier transform. Although advances in signal processing have substantially improved the resolution achievable within a given transient length, Orbitrap acquisition is still much slower than TOF acquisition: a single TOF pulse takes only about 100 microseconds, whereas an Orbitrap scan requires tens to hundreds of milliseconds.
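The step from a time-domain image current to a frequency-domain spectrum can be sketched with a plain discrete Fourier transform. The example below builds a synthetic transient from two hypothetical ion packets oscillating at different frequencies and recovers both the frequencies and the relative intensities. It is a toy illustration using only the standard library, not the signal processing used by any real instrument.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitude of the discrete Fourier transform for bins 0 .. N/2 - 1."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        magnitudes.append(abs(acc))
    return magnitudes

# Synthetic transient: two ion packets (assumed), one twice as intense as the other.
N = 64
transient = [1.0 * math.cos(2 * math.pi * 5 * t / N)
             + 0.5 * math.cos(2 * math.pi * 12 * t / N)
             for t in range(N)]

magnitudes = dft_magnitudes(transient)
ranked_bins = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)
top_two_bins = sorted(ranked_bins[:2])
print("strongest frequency bins:", top_two_bins)
print("relative intensity:", magnitudes[12] / magnitudes[5])
```

In an Orbitrap the axial oscillation frequency is proportional to the inverse square root of m/z, so each frequency peak maps to an m/z value, and a longer transient separates nearby frequencies more cleanly.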

How does a mass spectrometer sequence or identify peptides? First, the instrument uses a quadrupole or another ion selection device to isolate precursor ions with a specific mass-to-charge ratio (m/z). These ions then collide with an inert gas (such as N2, He, or Ar) in the collision cell, which causes fragmentation. Ions break mainly at their lowest-energy bonds, typically the amide (peptide) bonds that connect amino acid residues. The result is an MS/MS spectrum containing ladders of peaks in which the spacing between neighboring peaks corresponds to the mass of an amino acid. This ladder information is highly specific and is the key to peptide sequence identification. A short stretch of sequence read from the ladder, together with the masses on either side of it (a peptide sequence tag), can be enough to identify a specific peptide from the human proteome. In practice, database searching is more common: theoretical fragment spectra are generated for all candidate peptides in a protein sequence database and compared with the experimental spectra, and each match is scored statistically.
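The peak ladder can be made concrete with a short sketch that computes singly charged b- and y-ion masses for a peptide and then reads residues back from the spacing between neighboring peaks. The residue masses are standard monoisotopic values; the peptide AGLEK and the 0.01 Da tolerance are assumptions chosen for illustration.

```python
PROTON = 1.007276   # Da
WATER = 18.010565   # Da

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def b_ions(peptide):
    """Singly charged b-ions b1 .. b(n-1): N-terminal fragments."""
    total, ions = PROTON, []
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions

def y_ions(peptide):
    """Singly charged y-ions y1 .. y(n-1): C-terminal fragments."""
    total, ions = WATER + PROTON, []
    for residue in reversed(peptide[1:]):
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions

def read_ladder(peaks, tolerance=0.01):
    """Translate spacings between consecutive peaks into candidate residues."""
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        matches = sorted(r for r, m in RESIDUE_MASS.items() if abs(m - delta) <= tolerance)
        calls.append("".join(matches) or "?")
    return calls

b_series = b_ions("AGLEK")
y_series = y_ions("AGLEK")
print("b ladder reads:", read_ladder(b_series))  # residues 2..4, N to C
print("y ladder reads:", read_ladder(y_series))  # residues 4..2, C to N
```

Leucine and isoleucine have identical masses, so the ladder reports them as the ambiguous call "IL"; this is one reason database searching, which restricts candidates to known sequences, is usually preferred over reading sequences directly.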

Chromatographic retention time is crucial when matching data sets against previous measurements and is central to 'targeted proteomics' techniques. Ion mobility, which provides a further dimension of peptide ion separation, has also become increasingly common in recent years. Ions can be filtered according to their collisional cross-section (FAIMS, high-field asymmetric waveform ion mobility spectrometry) or physically separated during the analysis (T-Wave, or TIMS, trapped ion mobility spectrometry). TIMS is the foundation of parallel accumulation-serial fragmentation (PASEF), which improves sensitivity while increasing sequencing speed about tenfold.

II. Sample Preparation and Specific Enrichment

Mass spectrometry-based proteomics can analyze the protein content of virtually any sample. Besides primary samples such as cells, it can be applied to formalin-fixed paraffin-embedded (FFPE) biopsy tissue and even to fossils that are hundreds of thousands of years old. This is possible because proteins are highly stable, considerably more so than RNA. Depending on the purpose of the experiment, proteins are typically isolated after a suitable biochemical enrichment step, such as cell fractionation, affinity enrichment, or proximity labeling.

Proteomics sample preparation demands both technical skill and scientific rigor. Its end result is the digestion of proteins into peptides (Figure 1A). Trypsin is the enzyme most often used for this step. It cleaves specifically on the C-terminal side of arginine and lysine, so each resulting peptide carries a positively charged residue at its C-terminus, which improves peptide ionization and fragmentation. Polymers and detergents should be avoided throughout sample preparation, or removed before analysis, because they interfere with peptide ionization. By the end of sample preparation, tens of thousands of proteins have been converted into hundreds of thousands of purified peptides whose concentrations can differ by a million-fold or more.
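The cleavage rule described above can be sketched as a small in silico digest. This version cleaves after every lysine (K) or arginine (R) and optionally allows missed cleavages; it deliberately ignores refinements used by real search engines, such as reduced cleavage before proline. The protein sequence is a made-up example.

```python
def tryptic_digest(protein, missed_cleavages=0):
    """Split a protein sequence after every K or R.

    With missed_cleavages > 0, also return peptides that span up to that
    many uncleaved sites. Simplified rule: no proline exception is applied.
    """
    # First cut at every K/R to get the fully cleaved pieces.
    pieces, start = [], 0
    for i, residue in enumerate(protein):
        if residue in "KR":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal piece without K/R

    # Then join neighboring pieces to model missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

sequence = "MKWVTFRSLLK"  # hypothetical sequence for illustration
print(tryptic_digest(sequence))
print(tryptic_digest(sequence, missed_cleavages=1))
```

Running the same function over every entry of a protein sequence database yields the list of candidate peptides that a search engine compares against the measured spectra.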

III. Monitoring Post-Translational Modifications

The primary structure of a protein, its amino acid sequence, often carries modifications. These post-translational modifications (PTMs) are an efficient and finely tuned regulatory mechanism that can significantly alter protein activity and even function. PTMs are usually sub-stoichiometric, meaning that only a fraction of the copies of a given protein carry the modification at any time, which makes capturing and detecting them challenging. Most strategies enrich modified peptides either with antibodies against the PTM or by exploiting the unique chemical properties of the modifying group. Phosphorylation, the most studied PTM, is often enriched with high specificity using titanium dioxide-based beads. With mass spectrometry-based proteomics it is now possible to map more than 10,000 modification sites at single amino acid resolution, and with them extensive cellular signaling networks, in about 2 hours, something that was previously very difficult to achieve. Proteomics has likewise become a routine tool for studying the roles of ubiquitination, SUMOylation, acetylation, and glycosylation. Less common PTMs, especially those without highly specific antibodies, remain challenging to analyze.

IV. Data Acquisition and Quantification Strategies

At any given moment during data acquisition, hundreds or thousands of peptides are ionized and enter the mass spectrometer simultaneously. Historically, these peptides were analyzed mainly by data-dependent acquisition (DDA), in which the user sets rules (based on m/z, charge, intensity, and cross-section, for example) that determine which peptide ions are selected for MS/MS (Figure 2A). Because far more peptides are present than can be sequenced in the time available, this selection is partly stochastic, and peptides that are not selected in a given run appear as missing values. Data-independent acquisition (DIA) takes a different approach. Here the quadrupole cycles continuously through the entire mass range in relatively wide isolation windows (20-40 m/z) (Figure 2B), so that all ions within each window are fragmented and detected together, giving comprehensive and unbiased coverage of the ions in the sample. The price is that the resulting MS/MS spectra are very complex. Modern software deconvolves them by comparison with previously acquired 'peptide libraries,' identifying multiple peptides per spectrum, and increasingly this can be done without a library. New scanning modes are still being developed to address the 'dynamic range problem': how to detect extremely low-abundance proteins in the presence of high-abundance ones. In blood, for example, the abundance difference between cytokines and albumin can reach up to 12 orders of magnitude, which makes detecting the low-abundance proteins particularly challenging.
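The contrast between the two acquisition strategies can be sketched in a few lines: DDA picks the most intense precursors from each MS1 scan, while DIA steps through fixed isolation windows regardless of what is present. The peak list, the top-N value, the 400-1000 m/z range, and the 25 m/z window width are all assumptions for illustration.

```python
def dda_top_n(ms1_peaks, n):
    """DDA-style selection: keep the n most intense (m/z, intensity) peaks."""
    return sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)[:n]

def dia_windows(start_mz, end_mz, width):
    """DIA-style scheme: contiguous isolation windows covering the full range."""
    windows = []
    low = start_mz
    while low < end_mz:
        windows.append((low, min(low + width, end_mz)))
        low += width
    return windows

# Hypothetical MS1 scan as (m/z, intensity) pairs.
ms1_scan = [(450.2, 1.0e5), (612.8, 8.0e6), (733.4, 3.0e4), (801.1, 2.0e6)]

selected = dda_top_n(ms1_scan, n=2)
windows = dia_windows(400, 1000, 25)
print("DDA selects:", selected)
print("DIA uses", len(windows), "windows, first:", windows[0], "last:", windows[-1])
```

In this toy scan the two weakest precursors are never fragmented under DDA, which is how missing values arise, whereas every one of them falls inside some DIA window.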

Figure 2. Two Common Data Acquisition Strategies (DDA and DIA)

Peptide quantification can be label-free or label-based. In label-free quantification (LFQ), the mass spectrometric signal of each peptide (usually at the MS1 level) is extracted from the raw data, normalized, and compared across the conditions of interest. This approach is intuitive and economical and gives great flexibility in project design. However, its quantitative variance is relatively high, and without care, differences in sample preparation and instrument performance between runs can distort comparisons between individual samples.
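The normalization and comparison step can be illustrated with median centering on a log2 scale, one simple and commonly used scheme; the article does not prescribe a specific method, so this choice is an assumption, as are the intensities below. It relies on the further assumption that most peptides do not change between conditions.

```python
import math
from statistics import median

def median_center_log2(intensities):
    """Convert intensities to log2 and subtract the sample median."""
    logged = {peptide: math.log2(value) for peptide, value in intensities.items()}
    center = median(logged.values())
    return {peptide: value - center for peptide, value in logged.items()}

def log2_fold_changes(sample_a, sample_b):
    """Per-peptide log2(B / A) after median centering each sample."""
    norm_a = median_center_log2(sample_a)
    norm_b = median_center_log2(sample_b)
    return {p: norm_b[p] - norm_a[p] for p in norm_a if p in norm_b}

# Hypothetical MS1 intensities. Sample B was loaded at twice the amount,
# and pep3 is additionally four times more abundant in B.
sample_a = {"pep1": 100.0, "pep2": 200.0, "pep3": 400.0}
sample_b = {"pep1": 200.0, "pep2": 400.0, "pep3": 3200.0}

fold_changes = log2_fold_changes(sample_a, sample_b)
for peptide, value in fold_changes.items():
    print(f"{peptide}: log2 fold change = {value:.2f}")
```

Without normalization pep3 would appear eight times higher in sample B; after removing the loading difference, the remaining four-fold change is the part attributable to the peptide itself.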

Label-based methods use stable isotopes to label proteins or peptides from different conditions. The labeled forms have the same physicochemical properties but differ predictably in mass. Isotopes can be introduced metabolically, as cells incorporate labeled building blocks through their normal metabolism, or chemically, by attaching tags that are later 'read' by the mass spectrometer. A widely used form of chemical labeling relies on isobaric tags: every tag has the same total mass, but the isotopes are distributed differently within it, so the tags become distinguishable after fragmentation and reveal the contribution of each sample. In a set of 6 to 16 differently tagged samples, provided the samples are labeled and combined consistently and reproducibly, the quantitative variance is generally lower than with LFQ. However, isobaric labeling methods such as TMT (Tandem Mass Tags) have limitations. In particular, co-isolated and co-fragmented peptides contribute to the same tag signals and shrink the measured differences between samples, a phenomenon known as 'ratio compression' that reduces quantitative accuracy.
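Ratio compression follows from simple arithmetic: if a co-fragmented peptide that does not change between samples adds the same signal to every channel, the observed ratio is pulled toward 1. The sketch below uses made-up reporter intensities to show the effect; the function name and numbers are illustrative assumptions.

```python
def observed_ratio(target_a, target_b, interference_a=0.0, interference_b=0.0):
    """Ratio of two reporter channels when co-isolated peptides add signal to both."""
    return (target_a + interference_a) / (target_b + interference_b)

# Hypothetical target peptide that is truly 10-fold higher in channel A.
true_ratio = observed_ratio(1000.0, 100.0)

# A co-fragmented, unchanging peptide adds 200 units to each channel.
compressed_ratio = observed_ratio(1000.0, 100.0, 200.0, 200.0)

print(f"true ratio: {true_ratio:.1f}, observed with interference: {compressed_ratio:.1f}")
```

The more interfering signal is co-isolated relative to the target peptide, the closer the observed ratio moves to 1, which is why narrower isolation or additional separation helps.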

Regardless of the quantification and scanning mode used, the output of a mass spectrometer always consists of MS1 and MS/MS spectra. Numerous software packages have been developed to process these data. They begin with signal detection, or 'feature discovery,' and then use search engines to match MS/MS spectra to peptide sequences in a database. Next, the software reassembles the identified peptides into proteins, which requires solving the 'protein inference problem' because a peptide sequence can occur in more than one protein. Finally, quantification is performed at the peptide or protein level (Figure 2C).
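One simple way to approach protein inference is greedy parsimony: repeatedly choose the protein that explains the most still-unexplained peptides until every peptide is covered. This is only one of several strategies and is shown here as an assumption, not as the method of any particular software; the peptide and protein names are hypothetical.

```python
def infer_proteins(peptide_to_proteins):
    """Greedy parsimony: a small set of proteins that explains all peptides.

    peptide_to_proteins maps each identified peptide to the set of proteins
    whose sequence contains it.
    """
    # Invert the mapping: which peptides does each protein explain?
    protein_to_peptides = {}
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides.setdefault(protein, set()).add(peptide)

    unexplained = set(peptide_to_proteins)
    chosen = []
    while unexplained:
        # Sorting first makes tie-breaking deterministic (alphabetical).
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained),
                   default=None)
        if best is None or not protein_to_peptides[best] & unexplained:
            break  # remaining peptides map to no protein
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen

evidence = {
    "pepA": {"P1"},
    "pepB": {"P1", "P2"},
    "pepC": {"P2", "P3"},
    "pepD": {"P3"},
}
inferred = infer_proteins(evidence)
print("inferred proteins:", inferred)
```

Here P2 is not reported because both of its peptides are already explained by P1 and P3, each of which also has a unique peptide; whether P2 is truly absent cannot be decided from these data, which is the essence of the inference problem.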

In simple terms, the output is a matrix listing the identified proteins and their abundances in each sample, filtered at a chosen false discovery rate (FDR) threshold. Analysis increasingly goes beyond this matrix: current efforts extend it with standard or proteomics-specific bioinformatics workflows (including machine learning) and combine proteomics data with other types of omics data, such as those from various next-generation sequencing (NGS) methods.
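FDR filtering is often implemented with a target-decoy strategy, in which spectra are also searched against reversed or shuffled decoy sequences and the number of decoy hits above a score cutoff estimates the number of false target hits. The article does not say which estimator is used, so the sketch below is an assumption, and the scores are invented.

```python
def filter_by_fdr(matches, max_fdr=0.01):
    """Keep target matches above the lowest score cutoff with FDR <= max_fdr.

    matches is a list of (score, is_decoy) tuples; higher scores are better.
    FDR at a cutoff is estimated as (#decoys) / (#targets) above that cutoff.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    keep = 0  # number of top-ranked matches to retain
    for position, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = position
    return [m for m in ranked[:keep] if not m[1]]

# Hypothetical peptide-spectrum matches: (score, is_decoy).
psms = [(9.1, False), (8.7, False), (8.0, False), (7.2, True),
        (6.9, False), (5.5, True), (5.1, False)]

accepted = filter_by_fdr(psms, max_fdr=0.25)
print("accepted scores:", [score for score, _ in accepted])
```

The loose 25% threshold is used only because the toy list is tiny; real analyses typically have many thousands of matches and use a much stricter cutoff such as 1%.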

V. Multidimensional Readouts of Functional Cellular States

Mass spectrometry has matured to the point where it supports proteome identification and quantification, protein-protein interaction mapping (interactomics), organelle proteomics, and the detection of post-translational modifications, among many other applications (Figure 3). The technology is now widely used in medicine, especially for biomarker identification. Although mass spectrometry-based proteomics can be more complex to operate than antibody-based methods, its detection specificity and comprehensiveness compensate for this drawback.

Figure 3. Some Common Applications of MS-Based Proteomics in Biology

Proteomics plays a crucial role in biological research by bridging the gap between genotype and phenotype. A genetic abnormality does not necessarily affect cellular function directly. Proteomics can assess the actual impact of such genomic abnormalities at the protein level, and in doing so can provide more specific biomarkers or new therapeutic targets for disease subtypes.

In recent years, major gains in the sensitivity of mass spectrometry have opened the way to single-cell proteomics. This approach allows individual cells to be studied in depth while preserving spatial information about their cellular environment. Proteins are present in far more copies per cell than mRNAs, which makes single-cell proteomic measurements more robust and reliable. Mass spectrometry-based single-cell proteomics can directly reveal dynamic differences between cells (such as receptor-ligand interactions between cells and their microenvironment), offering new perspectives on the complexity of cell communication and behavior.

In summary, this article has aimed to clarify some basic concepts of mass spectrometry-based proteomics. Proteins are multifunctional biomolecules whose functions do not depend on abundance alone. Mass spectrometry-based proteomics is correspondingly diverse and can be adapted to investigate many aspects of protein function, from identifying and quantifying proteins to analyzing protein-protein interactions and modifications.

References:

Sinha A, Mann M. A beginner's guide to mass spectrometry-based proteomics. Biochem (Lond). 2020; 42(5): 64-69.
